Yin.VM and DaoDB: How Persistent ASTs Make Self-Modification Practical
Every runtime makes a fundamental choice about what to preserve and what to discard. Java erases generic type parameters during compilation, so the JVM never sees them. JavaScript engines discard AST details after parsing. Even Smalltalk images, which preserve everything, don't make the program structure queryable in the way a database is queryable.
Yin.VM makes a different choice. It preserves the complete Abstract Syntax Tree as datoms and makes it queryable via Datalog. This single decision... keeping the AST as first-class, queryable data... unlocks capabilities that seem impossible in traditional runtimes.
What gets erased: a comparison
To understand what Yin.VM preserves, we must first see what other systems discard:
| System Aspect | Static Type Systems (Java, Go, Rust) | Dynamic Type Systems (JavaScript, Python) | Yin.VM + DaoDB |
|---|---|---|---|
| Type Information | Compile-time only; erased at runtime | Runtime values only; not structurally queryable | Persistent datoms; queryable at any point in history |
| AST Structure | Discarded after bytecode generation | Discarded after initial parsing | Preserved as indexed datom stream |
| Execution State | Ephemeral stack/heap | Ephemeral stack/heap | Serializable continuation (also datoms) |
| Historical Changes | No record (unless versioned externally) | No record (unless versioned externally) | Complete append-only log |
| Queryability | Reflection only (limited) | Introspection only (object-local) | Full Datalog over entire program structure |
In traditional systems, the compiler or interpreter is a destructive transformation. High-level structure is consumed to produce low-level execution artifacts. In Yin.VM, the transformation is constructive. Structure is never destroyed... only augmented with execution state.
The architecture: DaoDB as the indexer
The key to making this practical is the division of labor between two components:
DaoDB: the specialized indexer
DaoDB is an interpreter that subscribes to the datom stream and continuously builds efficient query structures. It's responsible for:
- Ingesting the append-only stream of datoms
- Building and maintaining indexes optimized for Datalog queries
- Providing snapshot isolation for consistent reads
- Handling both local and distributed/P2P deployments
- Managing storage tiers (hot in RAM, cold on disk)
DaoDB is like Datomic but can run locally, distributed, or peer-to-peer. It can even split responsibilities... one instance handles transactions, another handles queries, a third handles storage... all coordinating over the same stream.
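To make the indexer's job concrete, here is a minimal in-memory sketch in Python. It is not DaoDB's actual design: the `ToyIndexer` class, its two indexes, and the four-element `(entity, attribute, value, tx)` datom shape are assumptions chosen for illustration. It only shows the general idea of consuming an append-only stream and keeping lookup structures beside it.

```python
from collections import defaultdict


class ToyIndexer:
    """Illustrative stand-in for an indexer; not DaoDB's real design."""

    def __init__(self):
        self.log = []                    # the append-only stream, never rewritten
        self.eavt = defaultdict(list)    # entity -> [(attribute, value, tx)]
        self.avet = defaultdict(set)     # (attribute, value) -> {entity}

    def ingest(self, datom):
        """Consume one (entity, attribute, value, tx) datom and update both indexes."""
        e, a, v, tx = datom
        self.log.append(datom)
        self.eavt[e].append((a, v, tx))
        self.avet[(a, v)].add(e)

    def entity(self, e):
        """Everything known about one entity, for example one AST node."""
        return list(self.eavt.get(e, []))

    def lookup(self, a, v):
        """Entities whose attribute `a` has value `v`, without scanning the log."""
        return set(self.avet.get((a, v), set()))


indexer = ToyIndexer()
for datom in [
    ("fn-123", ":fn/name", "process-user", 1),
    ("fn-123", ":fn/param", "param-1", 1),
    ("param-1", ":param/type", "type-user", 1),
    ("type-user", ":type/name", "User", 1),
]:
    indexer.ingest(datom)

print(indexer.lookup(":type/name", "User"))   # {'type-user'}
print(indexer.entity("fn-123"))
```

The log stays the source of truth. The indexes are derived data that could be rebuilt at any time by replaying the stream.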
Yin.VM: the execution engine
Yin.VM is the interpreter that executes code. It's a CESK machine (Control, Environment, Store, Continuation) that operates on AST datoms. It's responsible for:
- Querying DaoDB for AST nodes at execution time
- Caching hot execution paths for performance
- Maintaining continuations (serializable execution state)
- Appending new datoms to record execution traces or state changes
- Subscribing to DaoDB for notifications when cached AST becomes stale
Critically, Yin.VM doesn't build its own data structures to represent the AST. It works directly with DaoDB's indexed view of the datom stream.
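A rough Python sketch of what "operating on AST datoms" could look like follows. It is a toy, not Yin.VM: the node attributes `:ast/op`, `:ast/left`, `:ast/right`, `:ast/value`, and `:ast/name` are hypothetical, calls are limited to two-argument primitives, and datoms are plain `(entity, attribute, value, tx)` tuples. The point is that the machine reads every node from the datoms at each step and keeps its continuation as plain data.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def attr(datoms, e, a):
    """Latest value of attribute `a` on entity `e` (highest tx wins)."""
    hits = [(tx, v) for (e2, a2, v, tx) in datoms if e2 == e and a2 == a]
    return max(hits, key=lambda hit: hit[0])[1]


def run(datoms, root, env, store):
    """Evaluate the AST rooted at `root` by stepping an explicit state machine."""
    control = ("eval", root)   # C: a node to evaluate, or a computed value
    kont = []                  # K: list of frames; plain data, so serializable
    while True:
        tag, x = control
        if tag == "eval":
            kind = attr(datoms, x, ":ast/type")
            if kind == ":ast.type/literal":
                control = ("value", attr(datoms, x, ":ast/value"))
            elif kind == ":ast.type/variable":
                address = env[attr(datoms, x, ":ast/name")]   # E: name -> address
                control = ("value", store[address])            # S: address -> value
            elif kind == ":ast.type/call":
                kont.append(("eval-right", x))
                control = ("eval", attr(datoms, x, ":ast/left"))
            else:
                raise ValueError(f"unknown node type: {kind}")
        elif not kont:
            return x           # a value with no pending frames is the result
        else:
            frame = kont.pop()
            if frame[0] == "eval-right":
                kont.append(("apply", frame[1], x))
                control = ("eval", attr(datoms, frame[1], ":ast/right"))
            else:              # "apply": both operands are now values
                _, node, left = frame
                control = ("value", OPS[attr(datoms, node, ":ast/op")](left, x))


# (+ 2 (* x 3)) expressed as datoms
ast = [
    ("n1", ":ast/type", ":ast.type/call", 1), ("n1", ":ast/op", "+", 1),
    ("n1", ":ast/left", "n2", 1), ("n1", ":ast/right", "n3", 1),
    ("n2", ":ast/type", ":ast.type/literal", 1), ("n2", ":ast/value", 2, 1),
    ("n3", ":ast/type", ":ast.type/call", 1), ("n3", ":ast/op", "*", 1),
    ("n3", ":ast/left", "n4", 1), ("n3", ":ast/right", "n5", 1),
    ("n4", ":ast/type", ":ast.type/variable", 1), ("n4", ":ast/name", "x", 1),
    ("n5", ":ast/type", ":ast.type/literal", 1), ("n5", ":ast/value", 3, 1),
]
env, store = {"x": "addr-0"}, {"addr-0": 5}

before = run(ast, "n1", env, store)        # 2 + 5 * 3
ast.append(("n5", ":ast/value", 4, 2))     # a later datom changes the literal
after = run(ast, "n1", env, store)         # 2 + 5 * 4
print(before, after)                       # 17 22
```

Because nothing is compiled away, appending one datom is enough to change what the next run computes.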
The synchronization dance
The relationship between DaoDB and Yin.VM creates an elegant synchronization pattern:
Query for execution
When Yin.VM needs to execute code, it queries DaoDB for the relevant AST nodes. This query can target:
- The latest snapshot (normal execution)
- A specific point in time (debugging historical behavior)
- A range of time (analyzing how code evolved)
- A filtered view (only certain attributes or entities)
The query returns datoms that represent the AST structure. Yin.VM can cache these locally for hot paths, avoiding repeated queries for the same code.
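The four query targets listed above can be modeled as simple filters over the log. The Python sketch below assumes datoms are `(entity, attribute, value, tx)` tuples and that a higher `tx` means later in time. The helper names `as_of`, `between`, `only`, and `latest` are invented for this example and are not DaoDB's API.

```python
LOG = [
    ("fn-123", ":fn/name", "process-user", 1),
    ("fn-123", ":fn/body", "body-v1", 1),
    ("fn-456", ":fn/name", "audit", 3),
    ("fn-123", ":fn/body", "body-v2", 5),
]


def as_of(datoms, tx):
    """Point in time: only datoms asserted at or before transaction `tx`."""
    return [d for d in datoms if d[3] <= tx]


def between(datoms, start, end):
    """Range of time: datoms asserted in the inclusive range [start, end]."""
    return [d for d in datoms if start <= d[3] <= end]


def only(datoms, attribute):
    """Filtered view: datoms for a single attribute."""
    return [d for d in datoms if d[1] == attribute]


def latest(datoms, e, a):
    """Most recent value of attribute `a` on entity `e` in a view, or None."""
    hits = [(tx, v) for (e2, a2, v, tx) in datoms if e2 == e and a2 == a]
    return max(hits, key=lambda hit: hit[0])[1] if hits else None


print(latest(LOG, "fn-123", ":fn/body"))             # body-v2 (normal execution)
print(latest(as_of(LOG, 3), "fn-123", ":fn/body"))   # body-v1 (historical debugging)
print(between(LOG, 2, 5))                            # what changed in that window
print(only(LOG, ":fn/name"))                         # names only
```

Every view is just a subset of the same immutable log, which is why the views compose: `latest(as_of(...))` needs no special support.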
Subscribe for updates
Yin.VM registers with DaoDB to receive notifications when relevant datoms change. A predicate can be provided to filter notifications:
```clojure
;; Only notify about changes to function definitions in namespace 'core'
(subscribe db
  [:where
   [?e :ast/type :ast.type/function]
   [?e :ast/namespace "core"]])
```

When new datoms arrive that match the predicate, Yin.VM invalidates its cache and re-queries. This enables live updates... code can be modified while running, and the interpreter adapts.
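The subscribe-and-invalidate pattern can be sketched in a few lines of Python. Everything here is hypothetical: `ToyStream`, `ToyVM`, the `:fn/body` attribute, and the way the predicate is expressed as a function are stand-ins for illustration, not the real DaoDB or Yin.VM interfaces.

```python
class ToyStream:
    """Append-only log that notifies subscribers whose predicate matches."""

    def __init__(self):
        self.log = []
        self.subscribers = []            # list of (predicate, callback)

    def subscribe(self, predicate, callback):
        self.subscribers.append((predicate, callback))

    def append(self, datom):
        self.log.append(datom)
        for predicate, callback in self.subscribers:
            if predicate(self.log, datom):
                callback(datom)


def is_core_function(log, datom):
    """Python rendering of the predicate above: a function in namespace 'core'."""
    facts = {(a, v) for (e, a, v, _tx) in log if e == datom[0]}
    return ((":ast/type", ":ast.type/function") in facts
            and (":ast/namespace", "core") in facts)


class ToyVM:
    """Caches function bodies and drops them when notified of a change."""

    def __init__(self, stream):
        self.stream = stream
        self.cache = {}
        self.queries = 0                 # how often we had to go back to the log
        stream.subscribe(is_core_function, self.invalidate)

    def invalidate(self, datom):
        self.cache.pop(datom[0], None)

    def body(self, fn):
        if fn not in self.cache:
            self.queries += 1
            bodies = [v for (e, a, v, _tx) in self.stream.log
                      if e == fn and a == ":fn/body"]
            self.cache[fn] = bodies[-1]  # log order is time order
        return self.cache[fn]


stream = ToyStream()
vm = ToyVM(stream)
for datom in [
    ("fn-1", ":ast/type", ":ast.type/function", 1),
    ("fn-1", ":ast/namespace", "core", 1),
    ("fn-1", ":fn/body", "body-v1", 1),
    ("fn-2", ":ast/type", ":ast.type/function", 1),
    ("fn-2", ":ast/namespace", "util", 1),
]:
    stream.append(datom)

first = vm.body("fn-1")                            # queries the log
second = vm.body("fn-1")                           # served from cache
stream.append(("fn-2", ":fn/body", "util-v1", 2))  # not in 'core': no notification
stream.append(("fn-1", ":fn/body", "body-v2", 2))  # matches: cache entry dropped
third = vm.body("fn-1")                            # re-queries and sees the update
print(first, second, third, vm.queries)            # body-v1 body-v1 body-v2 2
```

In this sketch the subscription covers only `core`, so a cache that also held `util` functions would go stale. A real runtime would presumably subscribe to everything it caches.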
Append for effects
When Yin.VM executes code that produces side effects... writing to application state, logging events, or even modifying the AST itself... it appends new datoms to the stream. These flow back through DaoDB, are indexed, and become immediately queryable.
This creates a feedback loop:
- Yin.VM queries AST datoms from DaoDB
- Executes code, producing new datoms
- Appends datoms to the stream
- DaoDB indexes the new datoms
- Yin.VM receives notification of changes
- Cycle continues
Types as queryable structure
One of the most powerful implications of this architecture is that type information never disappears.
In Java, generic type parameters like `List<String>` are erased at compile time for backward compatibility; at runtime the JVM only sees `List`. In Python, type hints are stored as annotations but are never enforced by the interpreter.
In Yin.VM, type declarations are datoms in the stream:
```clojure
[fn-123 :fn/name "process-user" tx-1 {}]
[fn-123 :fn/param param-1 tx-1 {}]
[param-1 :param/name "user" tx-1 {}]
[param-1 :param/type type-user tx-1 {}]
[type-user :type/name "User" tx-1 {}]
[fn-123 :fn/return-type type-result tx-1 {}]
```

This enables queries that are impossible in traditional systems:
```clojure
;; Find all functions that accept User and return types defined today.
;; ?today-start is passed in as a query input rather than left unbound.
[:find ?fn-name ?return-type
 :in $ ?today-start
 :where
 [?fn :fn/param ?param]
 [?param :param/type ?user-type]
 [?user-type :type/name "User"]
 [?fn :fn/name ?fn-name]
 [?fn :fn/return-type ?return-type]
 [?return-type :type/defined-at ?time]
 [(> ?time ?today-start)]]
```

Or:
```clojure
;; Find all call sites that might be affected by changing User's fields
[:find ?call-site ?line
 :where
 [?call :ast/type :ast.type/call]
 [?call :ast/function ?fn]
 [?fn :fn/param ?param]
 [?param :param/type ?type]
 [?type :type/name "User"]
 [?call :ast/location ?call-site]
 [?call :ast/line ?line]]
```

These aren't hypothetical queries. They're practical tools for understanding code impact, planning refactors, and building AI assistants that can reason about program structure.
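For readers new to Datalog, the mechanics of a `:where` clause are easy to sketch: each pattern is matched against every datom, and shared variables force the matches to agree. The Python below is a naive evaluator written for this article's call-site query. It has no indexes or query planning, the sample data (including the file name and line numbers) is invented, and it is not how DaoDB executes queries.

```python
def match(pattern, datom, binding):
    """Extend `binding` so `pattern` matches `datom`; return None on conflict."""
    extended = dict(binding)
    for p, d in zip(pattern, datom):          # the tx element is ignored by zip
        if isinstance(p, str) and p.startswith("?"):
            if extended.setdefault(p, d) != d:
                return None                   # variable already bound elsewhere
        elif p != d:
            return None                       # constant does not match
    return extended


def query(find, where, datoms):
    """Join all patterns left to right and project the `find` variables."""
    bindings = [{}]
    for pattern in where:
        bindings = [b2 for b in bindings for d in datoms
                    if (b2 := match(pattern, d, b)) is not None]
    return {tuple(b[v] for v in find) for b in bindings}


DATOMS = [
    ("fn-123", ":fn/param", "param-1", 1),
    ("param-1", ":param/type", "type-user", 1),
    ("type-user", ":type/name", "User", 1),
    ("fn-200", ":fn/param", "param-2", 1),
    ("param-2", ":param/type", "type-order", 1),
    ("type-order", ":type/name", "Order", 1),
    ("call-9", ":ast/type", ":ast.type/call", 2),
    ("call-9", ":ast/function", "fn-123", 2),
    ("call-9", ":ast/location", "src/app.clj", 2),
    ("call-9", ":ast/line", 42, 2),
    ("call-10", ":ast/type", ":ast.type/call", 2),
    ("call-10", ":ast/function", "fn-200", 2),
    ("call-10", ":ast/location", "src/app.clj", 2),
    ("call-10", ":ast/line", 7, 2),
]

affected = query(
    ["?call-site", "?line"],
    [
        ("?call", ":ast/type", ":ast.type/call"),
        ("?call", ":ast/function", "?fn"),
        ("?fn", ":fn/param", "?param"),
        ("?param", ":param/type", "?type"),
        ("?type", ":type/name", "User"),
        ("?call", ":ast/location", "?call-site"),
        ("?call", ":ast/line", "?line"),
    ],
    DATOMS,
)
print(affected)   # {('src/app.clj', 42)}
```

Only the call to the function that takes a `User` survives the join; the call on line 7 is dropped at the `:type/name` pattern.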
Self-modification through streams
The most profound capability enabled by persistent ASTs is safe self-modification.
In traditional systems, self-modifying code is dangerous. It introduces hidden side effects, is difficult to debug, and is often a security vulnerability. But when the AST is explicit, versioned, and queryable, self-modification becomes just another kind of data transformation.
Consider an optimization agent that observes execution patterns:
- Observation: Yin.VM executes code, recording performance traces as datoms
- Analysis: An optimization agent queries DaoDB for hot paths and identifies an inefficiency
- Proposal: The agent generates new AST datoms representing an optimized version
- Evaluation: Another agent or human reviews the proposal by querying differences between ASTs
- Application: If approved, the new AST datoms are appended to the stream
- Notification: Yin.VM receives notification, invalidates cache, and begins using optimized code
The entire process is auditable. At any point, you can query: "Why did this function's implementation change?" and trace back through the datom history to see the performance observations, the agent's analysis, and the approval.
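The audit trail described above can be sketched as follows. This is an illustrative Python model only: the `:trace/*`, `:proposal/*`, and `:tx/*` attributes, the `transact` helper, and the convention of storing transaction metadata as datoms on a `tx-N` entity are all assumptions, not the system's actual schema.

```python
log = []


def transact(tx, facts, meta):
    """Append domain facts plus facts about the transaction itself."""
    for e, a, v in facts:
        log.append((e, a, v, tx))
    for a, v in meta.items():
        log.append((f"tx-{tx}", a, v, tx))


def facts_about(e):
    """Latest attribute -> value map for one entity (log order is time order)."""
    return {a: v for (e2, a, v, _tx) in log if e2 == e}


def explain_change(fn):
    """Answer: why did this function's implementation last change?"""
    tx = max(t for (e, a, _v, t) in log if e == fn and a == ":fn/body")
    tx_meta = facts_about(f"tx-{tx}")
    proposal = facts_about(tx_meta.get(":tx/applies"))
    return {
        "tx": tx,
        "author": tx_meta.get(":tx/author"),
        "approved_by": proposal.get(":proposal/approved-by"),
        "evidence": facts_about(proposal.get(":proposal/evidence")),
    }


# Original definition
transact(1, [("fn-123", ":fn/body", "body-v1")], {":tx/author": "alice"})
# Observation: the VM records a performance trace
transact(2, [("trace-1", ":trace/fn", "fn-123"),
             ("trace-1", ":trace/mean-ms", 840)], {":tx/author": "yin-vm"})
# Analysis and proposal: an agent suggests a new body, citing the trace
transact(3, [("prop-1", ":proposal/target", "fn-123"),
             ("prop-1", ":proposal/body", "body-v2"),
             ("prop-1", ":proposal/evidence", "trace-1")],
         {":tx/author": "optimizer-agent"})
# Evaluation and application: approval and the new body land together
transact(4, [("prop-1", ":proposal/approved-by", "alice"),
             ("fn-123", ":fn/body", "body-v2")],
         {":tx/author": "optimizer-agent", ":tx/applies": "prop-1"})

print(explain_change("fn-123"))
```

The explanation is not a separate audit log that could drift from reality. It is assembled from the same datoms that caused the change.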
Making Git obsolete
This architecture suggests that traditional version control might be unnecessary within the system's domain. Git tracks changes to files as snapshots. The datom stream tracks changes to facts in continuous time.
| Capability | Git (File-Based) | Datom Stream (Fact-Based) |
|---|---|---|
| Unit of Change | Files (blobs) | Facts (datoms) |
| History Model | Snapshots at commits | Continuous append-only log |
| Branching | Explicit operation; merge conflicts | Implicit via stream forks; facts integrate naturally |
| Blame | `git blame` shows line-level history | Datalog query shows fact-level history with full context |
| Diff | Line-by-line text comparison | Semantic diff via queries (what facts changed, not what lines) |
Within Datom.World, you don't need Git to track code changes. The stream is the version control system. You can query any historical state, compare semantic differences, and trace the complete provenance of any fact.
Git remains useful for interoperating with the existing ecosystem... exporting code for traditional repositories, or importing external dependencies. But within the system, it's redundant.
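To illustrate the fact-level diff and blame rows of the table, here is a small Python sketch. It assumes `(entity, attribute, value, tx)` datoms, treats every attribute as single-valued, and does not model retractions. The function names are invented for the example.

```python
LOG = [
    ("fn-123", ":fn/name", "process-user", 1),
    ("param-1", ":param/name", "user", 1),
    ("param-1", ":param/type", "type-user", 1),
    ("type-account", ":type/name", "Account", 2),
    ("param-1", ":param/type", "type-account", 2),
]


def snapshot(log, tx):
    """(entity, attribute) -> value as of transaction `tx`; later datoms win."""
    state = {}
    for e, a, v, t in log:
        if t <= tx:
            state[(e, a)] = v
    return state


def semantic_diff(log, tx_old, tx_new):
    """Which facts were added or changed between two points in time."""
    old, new = snapshot(log, tx_old), snapshot(log, tx_new)
    added = {k: new[k] for k in new.keys() - old.keys()}
    changed = {k: (old[k], new[k])
               for k in new.keys() & old.keys() if old[k] != new[k]}
    return added, changed


def blame(log, e, a):
    """Fact-level history: every value this attribute has held, with its tx."""
    return [(t, v) for (e2, a2, v, t) in log if e2 == e and a2 == a]


added, changed = semantic_diff(LOG, 1, 2)
print(added)     # {('type-account', ':type/name'): 'Account'}
print(changed)   # {('param-1', ':param/type'): ('type-user', 'type-account')}
print(blame(LOG, "param-1", ":param/type"))
```

The diff says "the type of parameter `param-1` changed from `type-user` to `type-account`", which is a statement about the program rather than about lines of text.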
The engineering trade-offs
This architecture is practical, but it involves deliberate trade-offs:
Performance: the query cost
Querying DaoDB for AST nodes adds latency compared to direct pointer access in traditional VMs. This is mitigated through:
- Caching: Hot execution paths are cached locally in Yin.VM
- Indexing: DaoDB uses specialized structures (like HISA) for fast queries
- Locality: Queries fetch only the AST subtrees needed, not the entire program
The result is that the "query overhead" only applies at boundaries... when entering a new function, invalidating cache after an update, or doing historical analysis. Within hot loops, execution speed approaches that of traditional interpreters.
Storage: the append-only cost
Never deleting anything means storage grows continuously. This is addressed through:
- Compression: Immutable logs compress extremely well
- Tiering: Hot data in RAM, warm on SSD, cold on cheap archival storage
- Garbage collection: DaoDB can prune obsolete indexes while preserving the log
The trade-off is deliberate: storage is cheap and getting cheaper, while lost history (the ability to debug past failures, understand evolution, or replay for compliance) cannot be recovered at any price.
Complexity: the mental model
The biggest barrier is not technical... it's conceptual. Developers must learn to think in:
- Streams: not request/response
- Interpreters: not services
- Queries: not call stacks
- Temporal reasoning: not just current state
This is a paradigm shift comparable to moving from imperative to functional programming. It requires new tools, new debugging techniques, and new intuitions about how programs work.
But the payoff is systems that can:
- Evolve themselves based on observed behavior
- Provide perfect auditability for compliance and debugging
- Collaborate seamlessly across distributed nodes
- Time-travel to any historical state for analysis
- Integrate AI agents that reason about code structure
The recursive property
The ultimate implication of this architecture is that it's turtles all the way down.
The datom stream contains application data. It also contains the AST of the application code. It also contains the AST of DaoDB itself. And the AST of Yin.VM. Everything is datoms observing datoms.
This creates a self-describing system where the boundary between "system" and "application" blurs. An AI agent can query not just your application's structure, but the structure of the runtime itself. Optimizations can apply to any layer. Debugging tools work the same way whether you're inspecting user data, application logic, or VM internals.
This is homoiconicity taken to its logical conclusion. Not just code-as-data, but everything-as-datoms.
When this architecture wins
This model is not universally optimal. It makes sense when:
- Auditability matters: Compliance, debugging, understanding system evolution
- Collaboration is key: Multiple agents or humans working on the same system
- Evolution is expected: Requirements will change in unpredictable ways
- User sovereignty matters: Data and code should be user-owned, not platform-owned
- AI integration is planned: Agents need to reason about and modify code
It's less suitable when:
- Microsecond latency is critical (high-frequency trading, real-time graphics)
- The problem domain is fully understood and stable
- Hardware resources are severely constrained
- The team cannot adopt a new paradigm
Conclusion: persistence as power
The key innovation of Yin.VM and DaoDB is making program structure persistent and queryable. This single decision cascades into capabilities that seem magical from a traditional perspective:
- Types that survive runtime and can be queried
- Self-modification that's auditable and safe
- Time travel debugging by default
- Agents that understand code structure
- Version control as a natural property of the system
These aren't bolted-on features. They're emergent properties of treating the AST as data and data as first-class.
The cost is complexity... a new mental model, new tooling, and careful attention to performance. The benefit is software that can evolve, adapt, and explain itself.
For systems where evolution and understanding matter more than raw speed, persistence is not a cost... it's the foundation of power.
Learn more:
- Yin.VM: the CESK continuation machine
- DaoDB: materialized views from streams
- Datom.World: platform overview
- Beyond Interfaces: why streams mirror evolution
- Universal AST vs Assembly: code representation
- Datalog as Compiler Infrastructure: queryable compilation
- Structure vs Interpretation: meaning through observation