Agent Smith Needs to Kill Git
Git Is a Tool for Slow Typists
Git is past twenty now. It was designed for a workflow where a human edits a few files, stages them, writes a commit message, and pushes. The unit of change is a line of text. The unit of conflict is the same line edited by two people. The unit of history is a commit, which is a manual act of attention.
None of these assumptions survive contact with an agent. An agent does not type. It rewrites. It can fork a hundred variants of a function in parallel, test each against a fitness function, and promote the survivor. It does not pause to write a commit message. It does not care which file a function lives in.
The substrate has to match the worker. Git matches humans. It does not match agents.
Agents Edit Geometry, Not Text
Go one layer deeper than "the agent does not type". An LLM does not understand a program as a sequence of characters. It understands it as a point in a high-dimensional space. Tokens are embedded into vectors, attention warps those vectors against every other vector in the context, and the output is another point in the same space projected back down to tokens. What looks like code generation from the outside is a geometric transformation on the inside. (The underlying lift from symbols to vectors is developed in Language as Geometry.)
Refactoring, under this view, is rotation. Renaming is translation along an axis the model has already learned. Generalizing a function is a shear that preserves the semantic direction while relaxing a constraint. The model is not editing lines. It is moving a program through a manifold of programs and reading off the nearest lawful point.
The printed source is the shadow of that transformation. Text is a projection chosen for humans, the way a blueprint is a projection of a building. Asking an agent to operate through Git is asking it to move a point in a high-dimensional space by hand-editing the blueprint, line by line, and then reconstructing the geometry from the blueprint on every step. The loss at each projection boundary is where merge conflicts, reformatting noise, and phantom deletes are born.
If the worker moves programs as geometry, the substrate has to live closer to geometry than to text. Anything that forces a round trip through a byte stream on every edit is paying a translation tax the worker was built to avoid.
The Real Break: Parallel Swarms
The argument against Git sharpens when you stop thinking about one agent and start thinking about hundreds or thousands of them working in parallel. Agentic coding is not a faster human workflow. It is a swarm workflow. Fork a thousand variants of a refactor, let each explore a slightly different path, keep the survivors, merge the rest.
Git cannot carry that load. Not because it is slow, but because textual merge does not compose. Every pairwise merge introduces some chance of a conflict. Stack a thousand of them and the probability that something collides approaches one. The conflicts are almost never semantic. They are two agents touching the same line for unrelated reasons, or one agent reformatting while another edits, or the same function moved in two directions.
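The compounding can be made concrete with a back-of-the-envelope model. The sketch below assumes, purely for illustration, that each merge conflicts independently with the same probability; the 1% figure is hypothetical, not a measurement.

```python
def p_any_conflict(p_single: float, n_merges: int) -> float:
    """Probability that at least one of n independent merges conflicts."""
    return 1.0 - (1.0 - p_single) ** n_merges


if __name__ == "__main__":
    # Hypothetical 1% conflict chance per merge, at growing swarm sizes.
    for n in (10, 100, 1000):
        print(n, round(p_any_conflict(0.01, n), 4))
```

Real merges are not independent, so this is only a shape argument: any fixed per-merge conflict rate, however small, is driven toward certainty by the exponent.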
Merging that pile is next to impossible even for another agent. The conflict is stated in the vocabulary of text ("these two lines disagree"), not in the vocabulary of the program ("these two facts disagree"). An agent trying to resolve a textual conflict has to re-infer the semantics from the lines, decide which side is right, and hope the context around the conflict is stable. At swarm scale this re-inference becomes the bottleneck, and the error rate compounds.
Three failure modes follow from the same root:
- Reordering looks like rewriting. Move ten functions inside a file and Git sees a 500-line delete and a 500-line add. The semantic change (permute the :ast/index of ten entities) is drowned in noise. Every other agent editing that file now has a conflict.
- Forks are heavyweight. A branch is a pointer into the commit DAG, so the smallest thing it can fork is a snapshot of the whole repository. Agents want to fork at the granularity of a single function, run fifty variants, and throw forty-nine of them away. Whole-repository branches, and the working trees needed to run them, are too coarse and too expensive for that, and the forty-nine losers still have to be torn down.
- Conflicts are textual, not semantic. One agent renames a function. Another adds a parameter to the same function. Both edits land on the signature line. In the text these are a three-way conflict; in the AST they are two independent facts. Someone (human or agent) has to adjudicate a disagreement that has no semantic content.
The common root is that Git versions the wrong thing. It versions the serialization (text) rather than the meaning (AST). At one-agent scale that mismatch is an annoyance. At swarm scale it is a wall.
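The reordering failure mode can be reproduced in a few lines. This is a sketch under stated assumptions: Python's difflib stands in for a line-based diff (Git uses its own algorithm, Myers by default), the ten functions are synthetic, and the position map is a hypothetical stand-in for an :ast/index fact per entity.

```python
import difflib


def make_file(order):
    """Render synthetic functions f0..f9 as lines of text in the given order."""
    lines = []
    for i in order:
        lines += [f"def f{i}():", f"    return {i}", ""]
    return lines


def text_lines_changed(before, after):
    """Count lines that a line-based diff reports as deleted or added."""
    sm = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal")


def index_facts_changed(before_order, after_order):
    """Count entities whose position (an :ast/index-style fact) changed."""
    old = {fn: pos for pos, fn in enumerate(before_order)}
    new = {fn: pos for pos, fn in enumerate(after_order)}
    return sum(1 for fn in old if old[fn] != new[fn])


before_order = list(range(10))
after_order = list(reversed(before_order))   # permute the ten functions
text_changed = text_lines_changed(make_file(before_order), make_file(after_order))
facts_changed = index_facts_changed(before_order, after_order)
print("text lines touched:", text_changed, "| index facts touched:", facts_changed)
```

No function body changed, yet the textual diff touches most of the file, while the fact-level change is one index per function.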
The Solution: Code as Datoms, Not Files
The fix is not a better diff algorithm or a smarter merge tool. The fix is to stop representing code as files and start representing it as datoms. A file is a linear byte sequence that happens to encode a program. A datom is a fact about the program itself. Versioning the file versions the encoding. Versioning the datoms versions the meaning.
In Yin.vm the canonical form of code is the Universal AST, and the AST is a graph of datoms. A datom is a 5-tuple:
[e a v t m]
Entity, attribute, value, transaction, meta. Every AST node is an entity. Every property of that node (its name, its type, its parent, its position in a sequence, its docstring) is a separate fact. A function is not a blob of text, it is a set of datoms:
[fn-17 :ast/type :function t1 m1]
[fn-17 :ast/name "calculate" t1 m1]
[fn-17 :ast/params [p-1 p-2] t1 m1]
[fn-17 :ast/body expr-42 t1 m1]
[fn-17 :ast/doc "totals " t2 m2]
Editing is transacting new datoms. History is automatic, not a manual git commit. The log of transactions is the version control system.
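A minimal sketch of "the log is the version control system", assuming an in-memory append-only list and last-write-wins per attribute. The class and method names (DatomLog, transact, as_of) are illustrative, not Yin.vm's or DaoDB's actual API.

```python
from collections import namedtuple

Datom = namedtuple("Datom", "e a v t m")


class DatomLog:
    """Append-only log; current state is derived from it, never overwritten."""

    def __init__(self):
        self.log = []
        self.next_t = 1

    def transact(self, facts, meta=None):
        """Append (entity, attribute, value) facts under one new transaction id."""
        t = self.next_t
        self.next_t += 1
        self.log.extend(Datom(e, a, v, t, meta) for e, a, v in facts)
        return t

    def as_of(self, entity, t):
        """Latest value per attribute for an entity, using transactions <= t."""
        view = {}
        for d in self.log:  # the log is already in transaction order
            if d.e == entity and d.t <= t:
                view[d.a] = d.v
        return view


db = DatomLog()
t1 = db.transact([
    ("fn-17", ":ast/type", ":function"),
    ("fn-17", ":ast/name", "calculate"),
    ("fn-17", ":ast/params", ("p-1", "p-2")),
    ("fn-17", ":ast/body", "expr-42"),
], meta="m1")
t2 = db.transact([("fn-17", ":ast/name", "total")], meta="m2")

print(db.as_of("fn-17", t1)[":ast/name"])  # the name before the rename
print(db.as_of("fn-17", t2)[":ast/name"])  # the name after the rename
```

Nothing is ever deleted: the rename is one more fact, and the earlier name remains reachable by asking for an earlier transaction.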
Git's Best Idea, Kept and Sharpened
Git's deepest idea was not the commit log, it was content addressing. A blob's name is the hash of its bytes. Two files with identical content have identical names, everywhere, forever, with no coordinating authority. That single move is why Git can be distributed, why pushes and pulls are just transfers of missing hashes, and why integrity comes for free.
The trouble is Git hashes the wrong object. It hashes the serialized byte stream of a file. Rename a local variable and the hash changes. Add a trailing newline and the hash changes. Reformat and the hash changes. The identity is pinned to the surface, not the substance.
Datoms keep the idea and move it one layer down. Entity IDs are hashes of structural content, not of text. A function's identity is the hash of its AST subtree: same parameters, same body, same facts, same ID, regardless of which file it lives in, which language it was written in, or how it was indented when a human last looked at it. Alpha-equivalent definitions collapse to one entity. Reformatting is invisible because formatting was never part of the hashed content in the first place.
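To make "hash the substance, not the surface" concrete, here is a sketch that uses Python's own ast module as a stand-in for the Universal AST. It assumes a single top-level function, treats parameters and assigned locals as the only bound names, and replaces them with positional placeholders before hashing. The function structural_hash is hypothetical; a real implementation would need proper scope analysis.

```python
import ast
import hashlib


class _Rename(ast.NodeTransformer):
    """Rewrite bound names to positional placeholders (v0, v1, ...)."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def structural_hash(source: str) -> str:
    """Hash one function definition by AST shape rather than by its text."""
    fn = ast.parse(source).body[0]
    bound = []
    for node in ast.walk(fn):  # traversal order depends only on tree shape
        if isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
        else:
            continue
        if name not in bound:
            bound.append(name)
    mapping = {name: f"v{i}" for i, name in enumerate(bound)}
    normalized = _Rename(mapping).visit(fn)
    # ast.dump omits line and column attributes by default, so layout is ignored.
    return hashlib.sha256(ast.dump(normalized).encode()).hexdigest()


a = "def total(xs, rate):\n    s = sum(xs)\n    return s * (1 + rate)\n"
b = "def total( items,r ):\n\n    acc = sum( items )\n    return acc*(1+r)\n"
c = "def total(xs, rate):\n    s = sum(xs)\n    return s * (1 - rate)\n"
print(structural_hash(a) == structural_hash(b))  # layout and local names differ
print(structural_hash(a) == structural_hash(c))  # the logic differs
```

In this sketch the function's own name is still part of the hashed content; whether the name belongs to identity or is a separate fact about the entity is a design choice the sketch does not settle.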
A few consequences fall out immediately:
- Structural sharing is free. Fifty forked variants of a function share every unchanged subtree by hash. The store holds the deltas, not fifty copies. A swarm that forks ten thousand variants pays for ten thousand differences, not ten thousand programs.
- Cache keys are semantic. Compiled bytecode, type-check results, test outcomes, and proofs can all be keyed to the hash of the subtree they describe. "Did I already test this function?" becomes a hash lookup, not a rerun. The same function rediscovered in another repository hits the same cache.
- Cross-repository dedup. Two teams independently write the same helper and it has the same ID in both codebases. Libraries stop being copied; they are referred to by hash. Dependency is attribution, not import.
- Merge by union. Two offline swarms producing disjoint fact sets can be merged by set union, because equal hashes mean identical content by construction. There is no "did we both invent the same thing" ambiguity: if the hashes match, we did.
- Provenance is verifiable. "Agent X asserted fact Y at transaction T" is checkable by anyone, because Y is content-addressed. Audit logs stop being narratives and start being proofs.
- Queries over fragments. "Find every call site that uses this exact expression" becomes a hash lookup, not a regex. Copy-paste detection, supply-chain auditing, and refactor-impact analysis stop being heuristics.
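Two of the consequences above, structural sharing and merge by union, can be sketched with a small content-addressed node store. The tree encoding (tuples of kind plus children, scalar leaves), the NodeStore class, and the variant function are all assumptions made for illustration.

```python
import hashlib
import json


class NodeStore:
    """Content-addressed node store: a node's id is the hash of its content."""

    def __init__(self):
        self.nodes = {}

    def put(self, tree):
        """Store a tree given as (kind, child, ...) tuples with scalar leaves."""
        if isinstance(tree, tuple):
            kind, *children = tree
            content = {"kind": kind, "children": [self.put(c) for c in children]}
        else:
            content = {"leaf": tree}
        payload = json.dumps(content, sort_keys=True)
        node_id = hashlib.sha256(payload.encode()).hexdigest()
        self.nodes.setdefault(node_id, payload)  # identical subtrees stored once
        return node_id


def variant(k):
    """A hypothetical function whose variants differ only in the constant k."""
    return ("fn", ("params", "a", "b"), ("body", ("mul", ("add", "a", "b"), k)))


store = NodeStore()
base_id = store.put(variant(2))
base_count = len(store.nodes)
variant_ids = [store.put(variant(k)) for k in range(3, 53)]  # fifty forks
print("nodes for base:", base_count, "| nodes after fifty forks:", len(store.nodes))

# Merging two stores is a dictionary union: equal ids imply equal content.
other = NodeStore()
other.put(variant(2))
other.put(variant(99))
merged = {**store.nodes, **other.nodes}
```

Each fork adds only the nodes on the path from the changed leaf to the root; the unchanged parameter list and the inner addition are stored once and shared by every variant.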
Git showed that content addressing is the right primitive for a versioned, distributed, tamper-evident system. Datoms finish the argument by applying the primitive to the object the worker actually edits: the fact, not the file.
DaoDB Already Is Git
The claim sharpens once you put the pieces together. Datoms live as a content-addressed stream inside DaoDB. A stream of content-addressed, append-only, hash-linked records replicated by hash transfer is, structurally, what Git is. Git's object graph is a Merkle DAG of content-addressed commits, trees, and blobs; our datom stream is a Merkle sequence of content-addressed facts. Same primitive, same guarantees.
So the comparison is not "datoms instead of Git" as if something is being given up. DaoDB plus content-addressed datoms is Git at the core. Distribution, offline work, cryptographic integrity, replayable history, pull-by-missing-hashes: all present as properties of the substrate, not as features to reimplement. Any capability Git has at the blob layer, DaoDB has at the datom layer, by isomorphism.
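The structural claim can be illustrated with a hash-linked, append-only stream. This is a sketch of the general technique only: the record layout and the functions append, verify, and pull are hypothetical and say nothing about DaoDB's actual storage format or replication protocol.

```python
import hashlib
import json


def record_id(prev_id, facts):
    """Id of a record: hash of its predecessor's id plus its own facts."""
    payload = json.dumps({"prev": prev_id, "facts": facts}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append(stream, facts):
    """Append a record that is hash-linked to the current head of the stream."""
    prev_id = stream[-1]["id"] if stream else None
    stream.append({"id": record_id(prev_id, facts), "prev": prev_id, "facts": facts})


def verify(stream):
    """Recompute every link; altering any record breaks the chain."""
    prev_id = None
    for rec in stream:
        if rec["prev"] != prev_id or rec["id"] != record_id(prev_id, rec["facts"]):
            return False
        prev_id = rec["id"]
    return True


def pull(local, remote):
    """Replicate by transferring only the records whose ids are missing locally."""
    have = {rec["id"] for rec in local}
    missing = [rec for rec in remote if rec["id"] not in have]
    local.extend(missing)
    return len(missing)


remote = []
append(remote, [["fn-17", ":ast/name", "calculate"]])
append(remote, [["fn-17", ":ast/name", "total"]])
local = [dict(remote[0])]  # a replica that has only the first record
transferred = pull(local, remote)
print("records transferred:", transferred, "| chain valid:", verify(local))
```

Integrity and pull-by-missing-hashes both fall out of the same property: a record's name is derived from its content and its predecessor.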
What changes is the range above the floor. Because the addressed object is a semantic fact rather than a byte blob, everything Git could not do (structural sharing across forks, hash-keyed semantic caches, merge by set union, sub-atomic fork and promote, cross-repository dedup of alpha-equivalent code) falls out as additional reach from the same primitive. Git is the floor. Datoms are the ceiling lifted.
Programmers Keep Their Tools
The datom representation is the substrate, not the interface. Programmers do not have to learn a new language, a new editor, or a new workflow. They keep writing Clojure, Python, JavaScript, Dart, or whatever they already use. They keep using their editor, their formatter, their linter. They keep saving files.
The Yang compiler sits at the boundary. It reads source in the programmer's language of choice and emits the canonical datom stream. Saving a file is what produces a transaction; the programmer never sees the datoms unless they want to. The source file is still there (a view, a serialization), but the facts that get versioned, forked, queried, and merged are the datoms that Yang produced.
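As a rough picture of that boundary, the sketch below turns Python source into datom tuples using the standard ast module. It is not the Yang compiler: the function source_to_datoms, the positional entity names, and the choice of attributes are assumptions made for illustration, and only top-level functions are covered.

```python
import ast


def source_to_datoms(source: str, t: int, m=None):
    """Emit [e a v t m] tuples for each top-level function in a module."""
    datoms = []
    module = ast.parse(source)
    functions = [n for n in module.body if isinstance(n, ast.FunctionDef)]
    for index, fn in enumerate(functions):
        e = f"fn-{index}"  # hypothetical entity name; the text uses content hashes
        datoms += [
            (e, ":ast/type", ":function", t, m),
            (e, ":ast/name", fn.name, t, m),
            (e, ":ast/params", tuple(a.arg for a in fn.args.args), t, m),
            (e, ":ast/index", index, t, m),
        ]
        doc = ast.get_docstring(fn)
        if doc is not None:
            datoms.append((e, ":ast/doc", doc, t, m))
    return datoms


src = (
    "def calculate(price, qty):\n"
    '    """totals"""\n'
    "    return price * qty\n"
    "\n"
    "def helper():\n"
    "    pass\n"
)
saved = source_to_datoms(src, t=1)  # one save, one transaction
for d in saved:
    print(d)
```

The programmer only ever touches src; everything downstream (versioning, forking, merging) would operate on the tuples.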
This matters because it means the transition is not a rewrite of the ecosystem. A team can adopt the datom substrate while every developer keeps their familiar tools on the surface. Agents operate on the datom layer, where merges compose. Humans operate on the text layer, where habits live. Yang keeps the two in sync.
The Round Trip
Because the datom AST preserves the full semantics of the original source, the trip goes both ways. Source compiles down to datoms through Yang, and datoms regenerate source through a pretty printer. Nothing meaningful is lost on the way down, so nothing meaningful is invented on the way back up.
This is what makes the substrate safe for humans. A developer can pull the latest datoms from a swarm of agents and read the result as ordinary Python, Clojure, or Dart, formatted however they prefer. The regenerated source is not a reconstruction guess, it is a rendering of the same facts the agents were editing. The AST is canonical; the source is a view. The view is always available, and it always matches.
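The round-trip property can be checked in miniature with Python's ast.parse and ast.unparse (Python 3.9 or later), used here as a stand-in for Yang and its pretty printer. One caveat worth stating: Python's AST drops comments, so a substrate that wants to keep them would have to carry them as facts of their own.

```python
import ast


def canonical(source: str) -> str:
    """Parse to an AST and render it back with one fixed pretty printer."""
    return ast.unparse(ast.parse(source))


messy = "def total( xs,rate ):\n\n        return   sum(xs)*(1+rate)   # tax\n"
once = canonical(messy)
twice = canonical(once)

print(once)
print("stable after one pass:", once == twice)
print("same AST as the original:",
      ast.dump(ast.parse(messy)) == ast.dump(ast.parse(once)))
```

The rendered text differs from what the developer typed, but it parses back to the same tree, and rendering it again changes nothing.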
The round-trip property is developed in detail elsewhere: Many Syntaxes, One AST (round-trip stability across syntaxes), Yin.vm: Chinese Characters for Programming Languages (bijective translation through the Universal AST), and AST Datom Streams and Bytecode Performance (source regenerated from AST datoms).
The Conflict Surface Collapses
Now replay the three Git failure modes:
- Reordering. Permuting function order is updating :ast/index on ten entities. Ten small facts. No phantom deletes.
- Forking. A fork is a new transaction namespace scoped to a subset of entities. Fork one function without forking the rest of the graph. Fifty variants live side by side as fifty parallel fact sets, and the loser variants are simply never promoted.
- Rename plus add-parameter. One agent writes [fn-17 :ast/name "total" t3 m3]. Another writes [fn-17 :ast/params [p-1 p-2 p-3] t3 m3]. Different attributes, different facts, no conflict. The merge is a set union.
Formatting changes vanish entirely. Whitespace, bracket style, import ordering are all properties of the text rendering, not of the AST. Two agents that disagree about formatting are not disagreeing about anything the store records.
The only real conflict is two agents asserting different values for the same attribute on the same entity at the same transaction. That is a narrow, precise, semantic disagreement. A human (or a policy) can resolve it with full context, because the conflict is stated in the vocabulary of the program, not of the file.
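The narrowed conflict rule is small enough to write down. In this sketch each agent's work is a change set keyed by (entity, attribute), and a conflict requires the same key with different values. The function merge_changes and the dictionary encoding are illustrative assumptions.

```python
def merge_changes(ours, theirs):
    """Union two change sets keyed by (entity, attribute).

    Returns (merged, conflicts). A conflict needs the same entity AND the
    same attribute with different values; everything else is a plain union.
    """
    merged = dict(ours)
    conflicts = []
    for key, value in theirs.items():
        if key in merged and merged[key] != value:
            conflicts.append((key, merged[key], value))
        else:
            merged[key] = value
    return merged, conflicts


rename = {("fn-17", ":ast/name"): "total"}
add_param = {("fn-17", ":ast/params"): ("p-1", "p-2", "p-3")}
other_rename = {("fn-17", ":ast/name"): "sum"}

merged, conflicts = merge_changes(rename, add_param)
_, name_conflicts = merge_changes(rename, other_rename)
print("rename + add-parameter conflicts:", conflicts)
print("two competing renames:", name_conflicts)
```

The rename and the added parameter land on the same signature line in text but on different keys here, so they merge cleanly. Only the two competing renames are reported, and the report names the entity, the attribute, and both values.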
What Agents Can Finally Do
Once code is datoms, the agent gets capabilities Git never offered:
- Sub-atomic forking. Fork a single function. Run a swarm of variants against a fitness constraint. Promote the winner. No .git, no branch cleanup.
- Datalog over the codebase. "Find every call site that uses the legacy connection without a timeout." That is a query, not a regex.
- Semantic diffs. The diff is a list of facts added and retracted, not a pile of line changes. "Renamed total to sum, added tax-rate parameter, no logic change."
- Parallel swarms that actually merge. N agents working on N parts of the graph produce N disjoint fact sets. The merge is a transact, not a negotiation.
- Time travel for free. Every fact has a t. Ask what the function looked like before the refactor. Ask which transaction introduced the regression. The log is the answer.
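Two of these capabilities, semantic diffs and queries over the codebase, reduce to set operations once code is a set of facts. The sketch below uses plain Python sets in place of a Datalog engine; the attributes :call/target and :call/timeout and the entity names are hypothetical.

```python
def semantic_diff(before, after):
    """Diff two snapshots given as sets of (entity, attribute, value) facts."""
    return {"retracted": before - after, "added": after - before}


def calls_without(facts, callee, missing_attr):
    """Entities that call `callee` and carry no fact for `missing_attr`."""
    callers = {e for e, a, v in facts if a == ":call/target" and v == callee}
    have_attr = {e for e, a, _ in facts if a == missing_attr}
    return sorted(callers - have_attr)


before = {
    ("fn-17", ":ast/name", "total"),
    ("fn-17", ":ast/params", ("p-1", "p-2")),
    ("fn-17", ":ast/body", "expr-42"),
}
after = {
    ("fn-17", ":ast/name", "sum"),
    ("fn-17", ":ast/params", ("p-1", "p-2", "p-3")),
    ("fn-17", ":ast/body", "expr-42"),
}
diff = semantic_diff(before, after)

calls = {
    ("call-1", ":call/target", "legacy-connect"),
    ("call-1", ":call/timeout", 30),
    ("call-2", ":call/target", "legacy-connect"),
    ("call-3", ":call/target", "new-connect"),
}
risky = calls_without(calls, "legacy-connect", ":call/timeout")
print(diff)
print(risky)
```

The unchanged body fact appears on neither side of the diff, which is the "no logic change" part of the summary stated as data rather than prose.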
Git Becomes a View
This does not mean Git disappears tomorrow. Text is still how humans read code on a plane, and git log is still how humans narrate change to each other. In a datom-native world both are views. The canonical form is the fact stream. Text, patch files, and commit history are projections for humans who want them, the same way text is a projection over the AST.
An agent can emit a Git patch for a reviewer. A reviewer can approve the patch. Behind the scenes the patch is a packaging of the underlying datom transactions. Git is demoted from substrate to serialization.
The Line That Matters
Git solved version control for files. Agents do not edit files. They edit meaning. The substrate has to match the worker, and the worker is no longer a human with a keyboard. It is a continuation that can fork, query, and merge at the speed of the graph.
Killing Git is not hostility. It is promotion. Git did its job. The job changed.
See also: