A Tale of Four Virtual Machines: Universal AST Benchmarks
In datom.world, the Universal AST serves as the canonical representation of code. Because everything is data, we can project this same AST into multiple execution models. We recently benchmarked our four primary VM implementations using a tail-recursive countdown (n=1000) to measure the overhead of each interpretation strategy.
Benchmark Results
The results confirm that the closer we get to machine-like abstractions, the higher the performance ceiling. Notably, the Semantic VM—once the slowest—now outpaces the AST Walker by a significant margin:
| VM Implementation | Mean Execution Time | Relative Performance |
|---|---|---|
| Register VM | 2.45 ms | 1.0x (Fastest) |
| Stack VM | 3.82 ms | 1.6x slower |
| Semantic VM | 4.38 ms | 1.8x slower |
| AST Walker VM | 6.59 ms | 2.7x slower |
Why the Register VM Wins
The Register VM remains our performance leader due to two primary architectural advantages:
1. Higher Instruction Density
A register machine expresses complex operations with fewer instructions. A single 'sub' instruction replaces the load-push-sub-store cycle of a stack machine, significantly reducing the overhead of the VM's internal dispatch loop.
2. Minimized Data Shuffling
Unlike the Stack VM, which must shuffle operands through repeated peek and pop operations on its persistent stack, the Register VM accesses operands via direct index lookups. This minimizes allocation pressure and structural updates.
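To make the contrast concrete, here is a toy Java sketch of both dispatch styles computing a single subtraction (illustrative only; the opcodes and encodings are invented for this example, not taken from the datom.world VMs). The stack machine needs four trips through its dispatch loop for what the register machine's three-address 'sub' does in one:

```java
// Toy comparison: instructions needed to compute locals[2] = locals[0] - locals[1].
public class DispatchCost {
    // Stack machine: LOAD a, LOAD b, SUB, STORE -> four trips through the loop.
    static int stackMachine(int a, int b) {
        int[] locals = {a, b, 0};
        int[] code = {0, 0, 0, 1, 1, 2, 2}; // LOAD 0, LOAD 1, SUB, STORE 2
        int[] stack = new int[8];
        int sp = 0, pc = 0, dispatches = 0;
        while (pc < code.length) {
            dispatches++;
            switch (code[pc++]) {
                case 0: stack[sp++] = locals[code[pc++]]; break; // LOAD
                case 1: sp--; stack[sp - 1] -= stack[sp]; break; // SUB
                case 2: locals[code[pc++]] = stack[--sp]; break; // STORE
            }
        }
        System.out.println("stack dispatches: " + dispatches);
        return locals[2];
    }

    // Register machine: one three-address SUB, a single dispatch.
    static int registerMachine(int a, int b) {
        int[] regs = {a, b, 0};
        int[] code = {0, 2, 0, 1}; // SUB dst=2, srcA=0, srcB=1
        int pc = 0, dispatches = 0;
        while (pc < code.length) {
            dispatches++;
            switch (code[pc++]) {
                case 0: regs[code[pc]] = regs[code[pc + 1]] - regs[code[pc + 2]]; pc += 3; break;
            }
        }
        System.out.println("register dispatches: " + dispatches);
        return regs[2];
    }

    public static void main(String[] args) {
        System.out.println(stackMachine(10, 3));    // prints 7 after "stack dispatches: 4"
        System.out.println(registerMachine(10, 3)); // prints 7 after "register dispatches: 1"
    }
}
```

Both machines produce the same result, but the stack machine pays the dispatch-loop overhead four times per subtraction.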
The Rise of the Semantic VM
The most surprising result is the Semantic VM's massive performance leap, moving from ~27ms to ~4.4ms. By embracing its 'Compiled into Datoms' nature and implementing a linear execution stream, it now executes 50% faster than the AST Walker (which interprets native Clojure maps).
How Datoms Beat Trees
We achieved this by projecting the flat datom stream into a high-performance execution substrate:
- Linear Execution Stream: By projecting the datom stream into a flat, array-backed index with primitive integer dispatch, we eliminated the overhead of persistent map lookups.
- Flat Array Stack: We replaced the persistent vector-based continuation stack with a high-performance Object array in the hot loop, drastically reducing allocation pressure.
- Compact Node Projection: Datoms are projected into compact records (type tag + attribute map), improving cache locality and memory efficiency.
- Integer Dispatch: The hot loop utilizes primitive integer 'case' dispatch, enabling the JVM to generate optimal branch tables (tableswitch).
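A hedged sketch of how these pieces fit together — the node layout, tags, and countdown encoding below are assumptions for illustration, not the actual datom.world projection. Nodes live in parallel primitive arrays, and the hot loop dispatches on an int type tag, which the JVM can compile to a tableswitch:

```java
// Sketch (assumed shapes): the flat datom stream becomes parallel arrays,
// and the hot loop dispatches on a primitive int type tag -- no map lookups.
public class LinearStream {
    static final int CONST = 0, DEC = 1, JNZ = 2, HALT = 3;

    public static long run(long n) {
        // One "node" per slot: an int tag plus an int operand.
        int[] tag     = {CONST, DEC, JNZ, HALT};
        int[] operand = {0,     0,   1,   0}; // JNZ jumps back to node 1
        long acc = n;
        int pc = 0;
        while (true) {
            switch (tag[pc]) {          // primitive int dispatch -> tableswitch
                case CONST: pc++; break;
                case DEC:   acc--; pc++; break;
                case JNZ:   pc = (acc != 0) ? operand[pc] : pc + 1; break;
                case HALT:  return acc;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(1000)); // the countdown workload ends at 0
    }
}
```

Because the tags are small contiguous integers, the JIT can emit a branch table rather than a chain of comparisons, which is exactly the property the persistent-map version of the loop could not offer.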
The Robustness Tax
During our optimization pass, we observed a roughly 5% performance difference between a 'naked' array implementation (~4.5 ms) and a 'robust' implementation (~4.7 ms). This small delta represents the cost of restoring the safety guarantees inherent in Clojure's original persistent data structures.
Trading O(1) Copying for O(1) Stepping
Clojure's persistent vectors provide near-zero overhead copying through structural sharing, but they incur an O(log32 n) traversal tax on every push and pop. Our Object array stack provides true O(1) stepping, but requires explicit O(n) memory copies when growing the stack or materializing state for effects.
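A minimal sketch of this trade-off (illustrative, not the actual stack code): pushes and pops on a raw Object array are single array writes, but growth requires copying the entire backing array.

```java
import java.util.Arrays;

// Sketch of the trade-off: O(1) push/pop on a raw Object array,
// paid for with an O(n) copy whenever the stack must grow.
public class ArrayStack {
    private Object[] items = new Object[8];
    private int top = 0;

    public void push(Object x) {
        if (top == items.length) {
            // O(n) copy on growth -- the price of O(1) stepping elsewhere.
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[top++] = x; // O(1) step: one write, one increment
    }

    public Object pop() {
        return items[--top]; // O(1) step: no tree traversal, no node allocation
    }

    public int size() { return top; }
}
```

Contrast this with a persistent vector, where every push allocates path nodes and every access walks a 32-way trie: cheap to snapshot, but never a single array write.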
Where the 5% Goes
The robust version introduces three critical safeguards:
- Dynamic Resizing: The stack now checks its capacity on every push and grows when needed, preventing out-of-bounds crashes during deep recursion.
- Safe Index Lookups: Every node transition now includes a range guard, safely falling back to map lookups for cross-namespace or dynamically generated entity IDs.
- Transient Materialization: State synchronization now uses transient-based loops to minimize GC pressure, adding slight complexity compared to raw vector coercion.
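The range guard in the second safeguard might be sketched as follows — the class, field names, and id scheme here are hypothetical stand-ins for whatever the real projection uses:

```java
import java.util.Map;

// Sketch of a 'robust' node lookup (assumed shapes): entity ids inside the
// projected range hit the flat array; anything else falls back to a map.
public class GuardedLookup {
    private final Object[] fastNodes;        // dense projection, index = entity id - base
    private final int base;
    private final Map<Long, Object> slowNodes; // cross-namespace / dynamic ids

    public GuardedLookup(Object[] fastNodes, int base, Map<Long, Object> slowNodes) {
        this.fastNodes = fastNodes;
        this.base = base;
        this.slowNodes = slowNodes;
    }

    public Object node(long entityId) {
        long idx = entityId - base;
        if (idx >= 0 && idx < fastNodes.length) {
            return fastNodes[(int) idx];  // fast path: O(1) array hit
        }
        return slowNodes.get(entityId);   // safe fallback: map lookup
    }
}
```

The guard is two integer comparisons per transition — cheap enough to live in the hot loop, but it is one of the places the ~5% robustness tax is paid.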
We consider this 'robustness tax' a mandatory investment. It ensures the Semantic VM remains as stable as the AST Walker while retaining its newly won 50% performance lead over tree-walking.
Reproducibility
To reproduce these results, you can run the benchmarking suite directly from the project root using the Clojure CLI:
```
clj -M:bench --fast-only 1000
```

This command evaluates the 'fast path' (optimized VM loop) for all four implementations. You can also target specific VMs using flags like '--semantic-only' or '--register-only'.