Why LLMs Need Structured Code: The Yin.vm Approach

The Core Insight: AST as Datom

Most virtual machines treat code and data as separate domains. Source code compiles to bytecode, bytecode executes and manipulates data, and introspection requires complex debugging infrastructure. Yin.vm takes a radically different approach: everything is a datom.

A datom is an immutable five-element tuple: [entity attribute value time causality]. In Yin.vm, your program's abstract syntax tree (AST) is encoded directly as datoms:

;; A simple function represented as datoms
[fn-1 :yin/op :apply t c]
[fn-1 :yin/name "add" t c]
[fn-1 :yin/args [arg-1 arg-2] t c]
[arg-1 :yin/type :i64 t c]
[arg-2 :yin/type :i64 t c]
[fn-1 :yin/body body-1 t c]
[body-1 :yin/op :+ t c]
[body-1 :yin/operands [arg-1 arg-2] t c]
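
As a minimal illustration (a Python sketch, not the Yin.vm runtime), an evaluator can walk this datom encoding directly; the `t` and `c` components are elided since evaluation only needs entity, attribute, and value:

```python
# Sketch: evaluating the datom-encoded AST from the example above.
datoms = [
    ("fn-1",   ":yin/name",     "add"),
    ("fn-1",   ":yin/args",     ["arg-1", "arg-2"]),
    ("fn-1",   ":yin/body",     "body-1"),
    ("body-1", ":yin/op",       ":+"),
    ("body-1", ":yin/operands", ["arg-1", "arg-2"]),
]

def attr(entity, attribute):
    """Look up a single attribute value for an entity."""
    for e, a, v in datoms:
        if e == entity and a == attribute:
            return v
    return None

def call(fn_entity, *values):
    """Bind argument entities to values, then evaluate the body."""
    env = dict(zip(attr(fn_entity, ":yin/args"), values))
    body = attr(fn_entity, ":yin/body")
    if attr(body, ":yin/op") == ":+":
        return sum(env[operand] for operand in attr(body, ":yin/operands"))
    raise ValueError("unknown op")

print(call("fn-1", 2, 3))  # 5
```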

This seemingly simple choice has profound implications. When your AST is data, stored in DaoDB's distributed tuple store, you get:

  • Queryable code: Use Datalog to ask "show me all functions that access the filesystem"
  • Mobile continuations: Pause execution, serialize as datoms, resume anywhere in the network
  • Time-traveling execution: Query what your program looked like at any point in history
  • Cross-language interop: The same AST can render as Python, Rust, or Clojure
  • LLM integration: Models operate on structured graphs, not text

From Programming Languages to Yin

Traditional languages each have their own surface syntax and runtime semantics. Yin.vm provides a universal semantic layer—a canonical AST that multiple languages can compile to and from.

Why Not Just Use Lisp/Clojure?

Lisp and Clojure code is famously close to being an AST—the parenthesized S-expressions directly reflect tree structure. This is why Lisp has powerful macro systems and homoiconicity (code as data). However, Lisp AST carries hidden assumptions that prevent it from being truly universal:

  • Evaluation model: Lisp assumes eager evaluation by default, with special forms for delayed evaluation. Other languages need laziness, call-by-need, or different evaluation orders as first-class primitives.
  • Namespace and scoping: Lisp's dynamic scope (in some dialects) and lexical scope assumptions don't map cleanly to all languages.
  • Data structure semantics: Lisp lists are linked lists with specific performance characteristics. Other languages need vectors, arrays, or different collection semantics.
  • Type assumptions: Clojure assumes dynamic typing with optional specs. Compiling from statically-typed languages requires preserving type information that Clojure's model doesn't natively represent.
  • Mutation model: Clojure is immutable-by-default with explicit mutation. Languages like C++ or Rust need fine-grained control over mutation, ownership, and borrowing that Lisp doesn't express.

Yin.vm's universal AST makes everything explicit. Instead of inheriting Lisp's evaluation model, it provides explicit continuation primitives. Instead of assuming immutability, it makes mutation explicit with clear semantics. Instead of Lisp's list-oriented data model, it uses datoms that don't carry collection-type assumptions.

This means front-end languages—whether Python, Rust, Java, or Clojure itself—can compile to Yin AST without ambiguity or hidden semantic mismatches. The universal AST is a lower-level semantic primitive than even Lisp, precisely because it doesn't assume any particular evaluation strategy or data model.

Language Compilation: Deterministic and Sound

For conventional programming languages, Yin acts as a stable compilation target:

  • Each language has a deterministic compiler: Lang AST → Yin AST
  • Type-checked subsets can provide formal guarantees
  • Verified transformations preserve semantics
  • Cross-language calls become: "pass a continuation/datom sequence to Yin"

Consider how Clojure code becomes callable from Rust. You don't expose raw AST datoms to Rust—instead, a build pipeline generates type-safe bindings:

;; Clojure function annotated for FFI export
^{:yin/export true
  :yin/ffi {:name "add_i64"
            :params [[:i64 :a] [:i64 :b]]
            :ret :i64}}
(defn add [a b]
  (+ a b))

The Yang compiler (Clojure → Yin) emits:

  1. Yin bytecode with a :code-id
  2. An FFI manifest describing exported symbols

A Rust build.rs script reads the manifest and generates:

// Auto-generated Rust wrapper
pub fn add_i64(vm: &mut YinVm, a: i64, b: i64) -> Result<i64, Error> {
    let args = [Value::I64(a), Value::I64(b)];
    let res = vm.call("my.app.math/add", &args)?;
    match res {
        Value::I64(n) => Ok(n),
        _ => Err(Error::TypeMismatch),
    }
    }
}

To the Rust compiler, this is just a normal function with clear types. The AST/datom magic stays hidden inside the Yin runtime. Cross-language interop becomes straightforward because all languages speak the same continuation language.

LLMs as Meta-Compilers

When an LLM interacts with code, the relationship is fundamentally different from traditional compilers. LLMs don't provide deterministic mappings—they propose probabilistic transformations. But Yin's AST-as-datom design provides a structured semantic bottleneck that constrains LLM behavior.

Structured Generation, Not Text Synthesis

Instead of generating free-form code text, LLMs emit structured datoms:

  • Use function calling / tools to generate EDN/JSON schemas
  • Output Yin AST nodes with known opcodes and arities
  • Validation layer checks schema, types, and policies
  • Invalid proposals are rejected before execution

The universal AST provides a small, closed vocabulary of operations—far easier to align an LLM to than full Python or C++. The model learns to compose from primitives like :apply, :let, :if, :loop rather than inventing arbitrary syntax.
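
A validation layer of this kind can be sketched in a few lines; the opcode table and arities below are illustrative, not the actual Yin.vm schema:

```python
# Sketch: validating LLM-proposed AST nodes against a closed opcode
# vocabulary before they are admitted. Opcodes/arities are illustrative.
KNOWN_OPS = {":apply": None, ":let": 2, ":if": 3, ":loop": 2, ":+": 2}
# None means variadic.

def validate_node(node):
    """Return a list of policy violations for one proposed AST node."""
    errors = []
    op = node.get("op")
    if op not in KNOWN_OPS:
        errors.append(f"unknown opcode {op!r}")
    else:
        arity = KNOWN_OPS[op]
        if arity is not None and len(node.get("operands", [])) != arity:
            errors.append(f"{op} expects {arity} operands")
    return errors

# A well-formed proposal passes; an invented opcode is rejected.
assert validate_node({"op": ":if", "operands": ["c", "t", "e"]}) == []
assert validate_node({"op": ":frobnicate", "operands": []}) != []
```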

Continuations as Agent State

Because everything in Yin is a continuation, LLMs can:

  • Request the current continuation as data (VM snapshot)
  • Transform it (insert logging, branch, replace sub-computation)
  • Return a new continuation as a patch or datom stream

The LLM isn't just writing programs once—it's inspecting, editing, and splicing new behavior into running computations. This turns Yin into a live substrate for interactive agent workflows.
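
As a sketch of that workflow (with a hypothetical frame layout, since the real continuation format is not specified here):

```python
# Sketch: treating a paused continuation as plain data that an agent
# can inspect and transform. The frame layout is illustrative.
snapshot = {
    "cont-7": {
        "state": "paused",
        "frames": [{"op": ":apply", "fn": "my.app/step"}],
    }
}

def splice_logging(cont):
    """Return a NEW continuation with a logging frame inserted before
    every :apply frame; the original snapshot is left untouched."""
    new_frames = []
    for frame in cont["frames"]:
        if frame["op"] == ":apply":
            new_frames.append({"op": ":log", "msg": f"calling {frame['fn']}"})
        new_frames.append(frame)
    return {**cont, "frames": new_frames}

patched = splice_logging(snapshot["cont-7"])
assert len(patched["frames"]) == 2
assert snapshot["cont-7"]["frames"][0]["op"] == ":apply"  # original intact
```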

Contract Layer for Safety

LLMs are probabilistic, so you need contracts on the universal AST:

  • Type/shape checking: Ensure opcodes exist and arguments match expected types
  • Capability checks: No I/O ops in sandboxed contexts
  • Gas/cost annotations: Ops and continuations carry resource budgets
  • Policy enforcement: Datalog rules reject datoms violating constraints

When a contract fails, Yin rejects the continuation and the LLM receives an error. It can then propose a different patch. The AST becomes a sandboxable capability language rather than unconstrained code generation.
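
A contract check along these lines might look like the following sketch; the gas costs and capability names are illustrative:

```python
# Sketch: contract checks applied to a proposed continuation before
# execution. Forbidden capabilities and gas fields are illustrative.
SANDBOX_FORBIDDEN = {":op/shell-exec", ":net/http/request"}

def check_contracts(cont, gas_budget, sandboxed=True):
    """Reject continuations that exceed budget or use forbidden caps."""
    errors = []
    cost = sum(op.get("gas", 1) for op in cont["ops"])
    if cost > gas_budget:
        errors.append(f"gas {cost} exceeds budget {gas_budget}")
    if sandboxed:
        used = {op["op"] for op in cont["ops"]} & SANDBOX_FORBIDDEN
        for op in sorted(used):
            errors.append(f"capability {op} not allowed in sandbox")
    return errors

cont = {"ops": [{"op": ":+", "gas": 1}, {"op": ":op/shell-exec", "gas": 5}]}
errs = check_contracts(cont, gas_budget=10)
assert errs == ["capability :op/shell-exec not allowed in sandbox"]
```

On failure the list of violations goes back to the proposer, matching the reject-and-retry loop described above.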

AST + Continuations in DaoDB

Storing Yin's AST and continuations as datoms in DaoDB—a distributed, P2P tuple store—fundamentally changes what "code" means.

Code as Queryable Data

Programs aren't files anymore—they're datasets:

;; Find all continuations blocked on HTTP requests
(d/q '[:find ?cont
       :where
       [?cont :yin/state :blocked]
       [?cont :yin/blocked-on ?op]
       [?op :yin/type :net/http/request]]
     db)

;; Find functions that capture many free variables (memory hogs)
(d/q '[:find ?fn
       :where
       [?fn :yin/free-vars ?vars]
       [(count ?vars) ?n]
       [(> ?n 10)]]
     db)

Datalog becomes your global introspection language for the entire distributed computation graph. You can enforce invariants like "no continuation tagged :untrusted may call :op/shell-exec".

Mobile Code and Live Patching

Because ASTs and continuations are datoms:

  • Shipping code means replicating datoms to another node
  • Migrating computation means copying continuation datoms and resuming elsewhere
  • Hot patching means writing new code datoms and updating code-id references

Programs become truly mobile—they can pause on one machine, travel across the network as datom streams, and resume on different hardware or even in a different language runtime.
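
The migration step can be sketched with in-memory stores standing in for DaoDB nodes; a real deployment would replicate the datoms over the network:

```python
# Sketch: "migrating" a computation by copying its continuation datoms
# to another node's store and resuming there. Stores are plain dicts.
import copy

node_a = {"cont-3": {"state": "paused", "pc": 7, "env": {"x": 41}}}
node_b = {}

def migrate(cont_id, src, dst):
    """Copy the continuation's serialized state to the destination."""
    dst[cont_id] = copy.deepcopy(src[cont_id])
    return dst[cont_id]

def resume(cont):
    """Toy resume: continue the computation from the serialized state."""
    cont["state"] = "running"
    return cont["env"]["x"] + 1

migrate("cont-3", node_a, node_b)
assert resume(node_b["cont-3"]) == 42
assert node_a["cont-3"]["state"] == "paused"  # source copy untouched
```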

System-Wide Optimization

With all code queryable in DaoDB, you can run scheduled Datalog rules that:

  • Identify hot paths (frequently executed code-ids)
  • Mark them for JIT compilation or specialization
  • Track provenance: which code derived from which templates
  • Audit: which peers contributed to library evolution

The distributed database becomes an optimization oracle and code history system.
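
Hot-path marking, for example, reduces to counting executions per code-id; the threshold and attribute name below are illustrative:

```python
# Sketch: a scheduled rule that scans an execution log and emits datoms
# tagging frequently executed code-ids as JIT candidates.
from collections import Counter

exec_log = ["code-1", "code-2", "code-1", "code-1", "code-3", "code-1"]
JIT_THRESHOLD = 3

def mark_hot_paths(log, threshold):
    """Return new datoms tagging code-ids executed >= threshold times."""
    counts = Counter(log)
    return [(cid, ":yin/jit-candidate", True)
            for cid, n in counts.items() if n >= threshold]

assert mark_hot_paths(exec_log, JIT_THRESHOLD) == [
    ("code-1", ":yin/jit-candidate", True)
]
```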

LLMs Operating Over Distributed Code

When you combine LLMs with AST-as-datom in DaoDB, the model's role shifts from "code generator" to meta-compiler and refactoring agent.

Retrieval-Augmented Programming

Instead of generating from scratch, the LLM:

  1. Queries DaoDB for existing functions and continuations
  2. Finds components that implement required capabilities
  3. Synthesizes new continuations that compose existing pieces

This reduces hallucination—the model recombines known, working code rather than inventing everything.

Program Repair and Explanation

When a failure occurs:

  1. Yin writes trace datoms describing the failed continuation
  2. LLM queries AST and trace slices from DaoDB
  3. Proposes structural fixes as AST patches
  4. Yin validates under policy constraints
  5. New code-id versions are published

The LLM can also narrate what happened by bridging low-level AST datoms and high-level human context. It becomes a debugging assistant operating on structured execution traces.
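
The repair loop above can be sketched end to end, with a stub standing in for the LLM and a trivial opcode check standing in for Yin's policy engine:

```python
# Sketch of the repair loop: a trace datom comes in, a stub "model"
# proposes a patch, and a new code-id version is published only if the
# patch passes validation. All names here are illustrative.
trace = {"cont": "cont-9", "error": ":div-by-zero", "op": ":/"}

def propose_fix(failed_trace):
    """Stub LLM: guard the failing division behind a conditional."""
    return {"op": ":if", "operands": ["zero?-check", ":default", failed_trace["op"]]}

def validate(patch):
    """Stub policy engine: only known opcodes are admitted."""
    return patch["op"] in {":if", ":let", ":apply", ":/"}

def repair(failed_trace):
    patch = propose_fix(failed_trace)
    if not validate(patch):
        raise ValueError("patch rejected by policy")
    return {"code-id": "code-9-v2", "patch": patch}  # publish new version

assert repair(trace)["code-id"] == "code-9-v2"
```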

Autonomous Optimization

LLMs can periodically inspect:

  • Which code paths dominate CPU time
  • Which continuations often fail
  • Where type instability causes overhead

Then propose rewritten or specialized AST variants. DaoDB versioning allows safe rollout and rollback.

Eliminating Hallucinations with Ontology as Type System

In another design discussion, we explored using a knowledge graph with ontology as an invariant representation—a third "basis" alongside LLM vectors and natural language. This technique integrates perfectly with Yin/DaoDB.

Ontology Embedded in DaoDB

DaoDB can hold both world knowledge (entities, relationships) and program knowledge (ASTs, continuations):

;; Ontology layer
[:person/alice :rdf/type :Person t c]
[:person/alice :person/age 42 t c]
[:person/age :rdf/domain :Person t c]
[:person/age :rdf/range :xsd/int t c]

;; Program layer
[fn-1 :yin/op :apply t c]
[fn-1 :yin/operates-on :Account t c]  ; references ontology

Programs reference ontology concepts. Functions declare they operate on specific types like :Account or :Payment. Validators ensure:

  • All referenced predicates exist in the ontology
  • Argument types match domain/range constraints
  • Capabilities (caps, tenancy, auth) are enforced

Hallucinations as Invalid Graph Moves

When an LLM proposes changes, the system enforces:

  • Knowledge claims must be backed by datoms in the graph or marked as speculative
  • Code patches must pass ontology consistency checks
  • References to nonexistent entities/predicates are rejected

Example: if the LLM invents :account/magicField, the validator checks the ontology, finds no such predicate, and rejects the patch. The hallucination dies at the boundary.
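
That boundary check is easy to sketch; the ontology entries and store layout below are illustrative:

```python
# Sketch: rejecting a patch that references a predicate missing from
# the ontology. The :account/magicField example matches the text above.
ontology = {
    ":person/age":       {"domain": ":Person",  "range": ":xsd/int"},
    ":account/balance":  {"domain": ":Account", "range": ":xsd/int"},
}

def check_patch(patch_datoms):
    """Every attribute used by the patch must exist in the ontology."""
    errors = []
    for _entity, attribute, _value in patch_datoms:
        if attribute not in ontology:
            errors.append(f"unknown predicate {attribute}")
    return errors

good = [(":acct-1", ":account/balance", 100)]
bad  = [(":acct-1", ":account/magicField", 7)]  # hallucinated predicate

assert check_patch(good) == []
assert check_patch(bad) == ["unknown predicate :account/magicField"]
```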

Ontology as Type System

The ontology + knowledge graph effectively is a rich type system:

  • Logical types: Person, Account, Payment
  • Relations: owns, debited-from, authorized-by
  • Constraints: domain/range, cardinality, uniqueness

This is stronger than traditional type systems because it encodes domain semantics, not just data shapes. An LLM operating within this ontology can't generate semantically invalid operations—the graph structure prevents it.

Trust Boundaries via Provenance

Tag datoms by source:

  • :source/human (manually authored)
  • :source/system (verified and trusted)
  • :source/llm (proposed by model, pending review)
  • :source/oracle (from external authoritative source)

Business-critical decisions only depend on datoms with high-trust provenance. LLM-generated datoms remain low-trust until upgraded by validation or human approval. Hallucinations become quarantined suggestions, not silently absorbed facts.
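
A trust-threshold filter over provenance tags might look like this sketch (the numeric trust levels are illustrative):

```python
# Sketch: filtering datoms by provenance before a trusted decision.
# Source tags match the list above; trust scores are illustrative.
TRUST = {":source/human": 3, ":source/system": 3,
         ":source/oracle": 2, ":source/llm": 0}

datoms = [
    (":acct-1", ":account/balance", 100,  ":source/system"),
    (":acct-1", ":account/flagged", True, ":source/llm"),
]

def trusted_view(all_datoms, min_trust=2):
    """Only datoms at or above the trust threshold feed decisions;
    lower-trust datoms stay quarantined as suggestions."""
    return [(e, a, v) for e, a, v, src in all_datoms
            if TRUST[src] >= min_trust]

assert trusted_view(datoms) == [(":acct-1", ":account/balance", 100)]
```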

A New Computational Substrate

This architecture creates something genuinely novel:

  • Programs that are datasets, queryable, versioned, and distributed
  • Continuations that migrate: pausing, serializing, and resuming anywhere
  • LLMs as refactoring agents, operating on structured graphs instead of text
  • Ontology-enforced correctness, with hallucinations rejected at system boundaries
  • Self-observing, self-modifying systems within strict policy constraints

The boundaries between static/dynamic, data/code, and local/distributed dissolve. What emerges is a distributed computational medium where:

  • Structure (datoms) is the primitive
  • Semantics (AST) is preserved across transformations
  • Interpretation (Yin VM, LLMs, other agents) varies by context
  • Truth (ontology + contracts) is enforced by the substrate

Implications for Software Development

If code is data, continuations are mobile, and LLMs are meta-compilers:

For Traditional Developers

  • Write in your preferred language (Clojure, Rust, Python)
  • Compile to Yin's universal AST with semantic preservation
  • Interop across languages without FFI pain
  • Query your entire codebase with Datalog
  • Time-travel debug any execution state

For AI-Assisted Development

  • LLMs propose structured AST patches, not text
  • Contracts and ontologies catch errors before execution
  • Retrieval-augmented programming reduces hallucination
  • Autonomous optimization and repair in production
  • Explainable systems through trace introspection

For Distributed Systems

  • Mobile code that migrates between nodes seamlessly
  • P2P synchronization via DaoDB's entanglement
  • Global introspection across the computation graph
  • Policy-enforced security at the datom level
  • Live patching without downtime

Conclusion: Beyond the VM

Yin.vm isn't just another virtual machine. By encoding the AST as datoms and storing everything in DaoDB's distributed tuple space, it creates a new kind of computational substrate:

  • Code and data share the same representation
  • Execution state (continuations) is queryable and mobile
  • LLMs operate as constrained meta-compilers over structured graphs
  • Ontologies act as semantic type systems preventing hallucination
  • The entire system is distributed, versioned, and introspectable

This is conceptual compression at its finest: one shape of data (datom), one shape of computation (continuation), one substrate (stream). Everything else emerges from these primitives.

When AST becomes data, programs become mobile, and LLMs become architects, we transcend the traditional boundaries of programming. What emerges is closer to a self-aware distributed operating system—one where code, knowledge, and execution coexist in a unified, queryable, evolving computational fabric.
