The Babashka Path: Why I Switched Yin.VM from Rust to Clojure
The Rust Mistake
I started Yin.VM in Rust. The reasoning seemed sound: building a high-performance virtual machine requires low-level control, memory safety, speed, and cross-platform portability. Rust offered all four. The problem? I'm not a Rust expert.
What followed was predictable: months spent fighting the borrow checker, struggling with lifetimes, reimplementing basic data structures, and learning a language while simultaneously trying to build a novel VM architecture. Progress crawled. The vision was clear, but the foundation was quicksand.
Meanwhile, Clojure—a language I'm actually competent in—sat unused. The persistent data structures, powerful macros, and excellent REPL-driven development workflow were all available. But the assumption held: "serious systems need low-level languages."
That assumption was wrong.
The Babashka Lesson
Babashka is a fast-starting Clojure scripting environment built by Michiel Borkent. At its core is SCI (Small Clojure Interpreter), a Clojure interpreter written in Clojure; Babashka compiles it to a native binary with GraalVM to get its fast startup.
On the surface, this seems circular: why implement Clojure in Clojure? Why not write the interpreter in C or Rust for maximum performance?
Because Babashka didn't need maximum performance. It needed:
- Fast iteration - REPL-driven development beats compile cycles
- Rich ecosystem - Leverage existing Clojure libraries
- Maintainability - Write in a language the author knows deeply
- Focus - Spend time on novel features, not reinventing wheels
By standing on the shoulders of the Clojure/JVM ecosystem and writing in Clojure, Babashka achieved something remarkable: a production-ready tool with fast startup times, extensive library support, and a small, maintainable codebase. The constraint of using a "slower" language led to better design decisions.
The LuxLang Parallel
LuxLang took the same path. It's a Lisp that compiles to multiple targets (JVM, JavaScript, Python, Ruby, Lua). Instead of building a custom runtime from scratch, it started by bootstrapping on the JVM.
The result:
- Self-hosted compiler (Lux compiles itself)
- Multiple backend targets without rewriting the frontend
- Fast development velocity from day one
- Production-ready within a reasonable timeframe
LuxLang proved that ambitious language projects don't need to start from bare metal. They need to start from solid ground.
My Pivot
After months of slow progress in Rust, the decision became clear: switch to Clojure and follow the Babashka path.
The new architecture:
- Implement Yin.VM in Clojure - Leverage my actual expertise
- Use Datascript for AST persistence - A mature datalog database instead of building DaoDB from scratch
- Bootstrap from the JVM - Get a working system quickly
- Progressive enhancement - Self-host later, once the core is proven
This isn't admitting defeat. It's admitting reality.
Why This Matters
1. Competence Beats Performance
A working system in a "slower" language I know beats a non-existent system in a "faster" language I don't.
Rust's performance advantage is meaningless if I spend 80% of my time debugging lifetime errors instead of implementing features. Clojure's slower execution speed is irrelevant when the REPL enables 10x faster iteration.
The bottleneck wasn't the language's runtime. It was my expertise.
2. Datascript Removes Uncertainty Cheaply
The core hypothesis of Yin.VM—that ASTs can be stored as queryable datoms—is unproven. Building DaoDB from scratch in Rust meant months of work before validating the idea.
Datascript changes the equation. It's a mature, battle-tested datalog database written in portable Clojure, so it runs on both the JVM and JavaScript. It can:
- Store AST nodes as entities with relationships
- Query with datalog (the same query language I'm planning for DaoDB)
- Run in-memory for fast iteration
- Provide a compatibility target if I eventually build DaoDB
Within hours, not months, I can test the viability of "AST as datoms":
(require '[datascript.core :as d])

;; Define AST schema.  Nested maps in a transaction require ref-typed
;; attributes in DataScript, so every child-bearing attribute is a ref.
(def ast-schema
  {:node/id        {:db/unique :db.unique/identity}
   :node/type      {}
   :node/parent    {:db/valueType :db.type/ref}
   :node/children  {:db/valueType :db.type/ref
                    :db/cardinality :db.cardinality/many}
   :node/params    {:db/valueType :db.type/ref
                    :db/cardinality :db.cardinality/many}
   :node/body      {:db/valueType :db.type/ref}
   :node/condition {:db/valueType :db.type/ref}
   :node/args      {:db/valueType :db.type/ref
                    :db/cardinality :db.cardinality/many}})

;; Create AST database
(def conn (d/create-conn ast-schema))

;; Store a function AST
(d/transact! conn
  [{:node/id     "factorial"
    :node/type   :lambda
    :node/params [{:node/type :param :node/name "n"}]
    :node/body   {:node/type :if
                  :node/condition {:node/type :call
                                   :node/fn   "="
                                   :node/args [{:node/type :var :node/name "n"}
                                               {:node/type :lit :node/value 0}]}}}])

;; Query for all function calls
(d/q '[:find ?call ?fn-name
       :where
       [?call :node/type :call]
       [?call :node/fn ?fn-name]]
     @conn)

If it works, I keep building. If it doesn't, I pivot quickly. Either way, discovery happens in days, not months.
3. Focus on What's Actually Novel
Yin.VM's innovation isn't:
- A datalog database (Datomic, Datascript, and others exist)
- Persistent data structures (Clojure has these)
- A runtime (the JVM is mature and battle-tested)
The novel parts are:
- Universal AST as the execution substrate - Code across languages shares a semantic foundation
- Continuations as mobile agents - Computation moves to data
- Queryable runtime state - Datalog introspection of live execution
By using Clojure and Datascript, I can focus 100% of my effort on these novel aspects instead of reimplementing infrastructure that already exists.
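The "continuations as mobile agents" idea can be pictured with a toy sketch: a suspended computation reified as plain data that could be serialized, shipped to another node, and resumed where the data lives. The shapes and names here are hypothetical illustrations, not Yin.VM's actual design.

```clojure
;; Hypothetical sketch: a continuation as plain data.  The suspended
;; computation records which operation is in flight, what has already
;; been accumulated, and what remains.  Because it is just a map, it
;; can be printed, stored, or sent over the wire, then resumed.
(defn suspend [op pending done]
  {:cont/op op :cont/pending pending :cont/done done})

(defn resume [{:keys [cont/op cont/pending cont/done]}]
  (apply ({:sum +} op) (into done pending)))

;; (resume (suspend :sum [3 4] [1 2]))
;; => 10
```

A real implementation would carry environments and program counters rather than raw values, but the principle is the same: the continuation is data, so computation can move to where the data is.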
4. Self-Hosting Becomes Tractable
The path to self-hosting is the same regardless of implementation language. But getting there faster matters.
In Clojure, the self-hosting roadmap is straightforward:
- Phase 1: Implement Yin.VM in Clojure, running on the JVM
- Phase 2: Implement the Yang compiler (Clojure → Universal AST)
- Phase 3: Use Yang to compile Yin.VM's own source code to Universal AST
- Phase 4: Yin.VM interprets/compiles its own Universal AST representation
- Phase 5: Bootstrap complete—Yin.VM runs itself
This is exactly how LuxLang achieved self-hosting. Start on a mature platform, build the compiler, then use the compiler to compile itself.
The Experimental Validation Plan
The Clojure + Datascript approach enables rapid experimentation. Here's my plan:
Phase 1: Validate Core Hypothesis (Weeks 1-4)
Experiment 1: Storage Overhead
Measure memory overhead of representing ASTs as datoms vs traditional tree structures. Use real-world code samples (Python stdlib, JavaScript libraries). Is 3x overhead acceptable? Where's the breaking point?
(defn compare-representations [ast]
  ;; ast->datascript is the (to-be-written) converter from plain
  ;; AST maps to a DataScript database of datoms.
  (let [tree-size  (count (pr-str ast))
        datom-db   (ast->datascript ast)
        datom-size (count (pr-str @datom-db))]
    {:tree  tree-size
     :datom datom-size
     :ratio (/ datom-size tree-size)}))

Experiment 2: Query Expressiveness
Can I naturally express compiler analyses as datalog queries?
;; Dead code detection: lambdas that no call node targets
(d/q '[:find ?node
       :where
       [?node :node/type :lambda]
       (not [?call :node/target ?node])]
     @conn)

;; Inline opportunities: small, single-use lambdas.
;; single-use is a datalog rule, supplied via the % input.
(d/q '[:find ?fn
       :in $ %
       :where
       [?fn :node/type :lambda]
       [?fn :node/size ?size]
       [(< ?size 10)]
       (single-use ?fn)]
     @conn
     rules)

If these queries feel awkward or perform poorly, that's critical feedback.
Experiment 3: Incremental Compilation
Can I express optimization passes as declarative rules?
;; all-literal and compute are further rules to be defined
(def constant-folding-rules
  '[[(constant-foldable ?node ?result)
     [?node :node/type :call]
     [?node :node/fn ?op]
     [(contains? #{+ - * /} ?op)]
     [?node :node/args ?args]
     (all-literal ?args)
     (compute ?op ?args ?result)]])

Phase 2: Build a Working System (Weeks 5-12)
Implement a minimal language (simple Lisp/Python subset) that:
- Parses to Universal AST
- Stores in Datascript
- Renders to multiple syntaxes
- Evaluates directly from AST
- Supports lightweight continuations
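The "evaluates directly from AST" step can be sketched with a toy evaluator over map-shaped nodes. The node shapes (`:node/type`, `:node/value`, and so on) are hypothetical stand-ins, not the real Universal AST schema:

```clojure
;; Minimal sketch: evaluate a tiny expression language directly from
;; its AST maps, with no intermediate bytecode.
(defn eval-node [env node]
  (case (:node/type node)
    :lit  (:node/value node)                       ; literal: return its value
    :var  (get env (:node/name node))              ; variable: look up in env
    :if   (if (eval-node env (:node/condition node))
            (eval-node env (:node/then node))
            (eval-node env (:node/else node)))
    :call (let [f    (get {'+ + '* * '= =} (:node/fn node))
                args (map #(eval-node env %) (:node/args node))]
            (apply f args))))

;; (eval-node {"n" 3}
;;            {:node/type :call :node/fn '+
;;             :node/args [{:node/type :var :node/name "n"}
;;                         {:node/type :lit :node/value 4}]})
;; => 7
```

The same node maps could be the entity maps transacted into Datascript, so evaluation and querying share one representation.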
Phase 3: Decide on DaoDB
After validation, the data will show whether I need DaoDB:
Build DaoDB if:
- Storage overhead >5x becomes problematic
- Query performance is a bottleneck
- Distributed persistence is required (Datascript is in-memory only)
- Append-only immutable history (like Datomic) is essential
Stick with Datascript if:
- Overhead is acceptable
- Performance is sufficient
- In-memory storage works for my use cases
- Maturity and community support matter more than custom optimization
If I build DaoDB later, Datascript becomes my compatibility baseline. The API will be compatible, making migration straightforward.
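One way to keep that migration cheap is a small storage protocol that the rest of Yin.VM codes against. This is a hypothetical sketch (the protocol and names are mine, not an existing API), shown with a trivial in-memory backend; a DataScript-backed record could satisfy the same interface by delegating to d/transact! and d/q:

```clojure
;; Hypothetical abstraction layer: Yin.VM talks to AstStore, never to
;; a concrete database, so DataScript can later be swapped for DaoDB.
(defprotocol AstStore
  (store-nodes!  [this nodes] "Add AST node maps to the store.")
  (nodes-of-type [this t]     "Return ids of nodes whose :node/type is t."))

;; Trivial in-memory implementation for illustration only.
(defrecord AtomStore [state]
  AstStore
  (store-nodes! [_ nodes]
    (swap! state into (map (juxt :node/id identity) nodes)))
  (nodes-of-type [_ t]
    (for [[id n] @state :when (= t (:node/type n))] id)))
```

A DaoDB-backed record would later slot in behind the same two functions, which is what makes Datascript a workable compatibility baseline.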
The Hard Lesson
Here's what the Rust experience taught me:
Ambitious projects fail not because the vision is wrong, but because the foundation is unsuitable.
My vision for Yin.VM was always solid: a VM where ASTs are queryable data, where continuations are mobile agents, where code is a universal semantic substrate. That vision didn't change.
What changed was accepting that building it in Rust—a language I'm not expert in—was slowing everything down. The "right" choice on paper (low-level control, performance) was the wrong choice in practice (unfamiliarity, slow iteration).
Switching to Clojure wasn't giving up on performance. It was prioritizing progress. And here's the surprising part: I might never need to rewrite in Rust.
LuxLang proved this. It started in Clojure on the JVM, achieved self-hosting, and now compiles to multiple targets—all without ever rewriting the compiler in a lower-level language. The Clojure implementation is the production implementation.
Could I circle back to Rust later? Maybe. But LuxLang demonstrates it might not even be necessary. Self-hosting from a Clojure base is entirely viable.
Why Babashka and LuxLang Got It Right
Both projects understood something fundamental:
The goal isn't to build everything from scratch. The goal is to build what's novel on top of what's proven.
- Babashka didn't reimplement the JVM. It leveraged GraalVM's native-image and focused on fast-starting Clojure scripts.
- LuxLang didn't build a custom runtime. It compiled to existing platforms (JVM, JavaScript) and focused on the language design.
- Yin.VM doesn't need a custom datalog database yet. I can use Datascript and focus on AST execution semantics.
This isn't compromise. It's focus. Babashka is production-ready. LuxLang is self-hosted and still implemented in Clojure. Both achieved their goals by building on mature foundations—and neither needed to rewrite in C or Rust to be successful.
The Path Forward
My immediate work:
- Implement Yin.VM core in Clojure
- Store ASTs in Datascript with a clean abstraction layer
- Build the Yang compiler (Clojure → Universal AST)
- Validate the core hypothesis with real code samples
- Measure everything: overhead, query performance, developer ergonomics
Within a month, I'll know if "AST as datoms" is viable. Within three months, I'll have a working prototype. Within six months, self-hosting becomes possible.
And then? The system might stay in Clojure indefinitely, like LuxLang. Or I might eventually port performance-critical components to Rust once I understand exactly where the bottlenecks are. The difference: I'll make that decision based on data from a working system, not speculation about what might be needed.
If I'd continued the Rust path? None of these milestones would be realistic. I'd still be fighting the borrow checker instead of implementing features.
The Meta-Lesson
Use the tools you know. Stand on platforms that work. Build only what's truly novel.
The temptation with visionary projects is to build everything from first principles. "If I'm building something revolutionary, shouldn't I use revolutionary tools?"
No.
Revolutionary ideas are hard enough. Implementing them with unfamiliar tools makes an already-difficult problem impossible.
Babashka used Clojure. LuxLang used the JVM. Yin.VM uses Clojure and Datascript. None of these choices diminish the novelty of what they're building. They enable it.
The fastest path to innovation isn't starting from zero. It's standing on the shoulders of giants and focusing your novel work on the pieces that truly need to be novel.
For Yin.VM, that means the Universal AST execution model, mobile continuations, and queryable runtime state. Everything else? Use what works.
Learn more:
- AST Datom Streams: Bytecode Performance with Semantic Preservation (how AST datoms achieve bytecode-like performance)
- Why yin.vm Succeeds Where Previous Attempts Failed (addressing skeptical objections to this architecture)
- DaoDB (the eventual production database, built after validation)
- Yin VM Documentation (technical deep dive)