A tale of 4 Virtual Machines: Universal AST Benchmarks

In datom.world, the Universal AST serves as the canonical representation of code. Because everything is data, we can project this same AST into multiple execution models. We recently benchmarked our four primary VM implementations using a tail-recursive countdown (n=1000) to measure the overhead of each interpretation strategy.

Benchmark Results

The results confirm that the closer we get to machine-like abstractions, the higher the performance ceiling. Notably, the Semantic VM—once the slowest—now outpaces the AST Walker by a significant margin:

VM ImplementationMean Execution TimeRelative Performance
Register VM2.45 ms1.0x (Fastest)
Stack VM3.82 ms1.6x slower
Semantic VM4.38 ms1.8x slower
AST Walker VM6.59 ms2.7x slower

Why the Register VM Wins

The Register VM remains our performance leader due to three primary architectural advantages:

1. Reduced Instruction Density

A register machine expresses complex operations with fewer instructions. A single 'sub' instruction replaces the load-push-sub-store cycle of a stack machine, significantly reducing the overhead of the VM's internal dispatch loop.

2. Minimized Data Shuffling

Unlike the Stack VM, which must 'shuffle' operands via persistent peek and pop operations, the Register VM accesses operands via direct index lookups. This stability minimizes allocation pressure and structural updates.

The Rise of the Semantic VM

The most surprising result is the Semantic VM's massive performance leap, moving from ~27ms to ~4.4ms. By embracing its 'Compiled into Datoms' nature and implementing a linear execution stream, it now executes 50% faster than the AST Walker (which interprets native Clojure maps).

How Datoms Beat Trees

We achieved this by projecting the flat datom stream into a high-performance execution substrate:

  • Linear Execution Stream: By projecting the datom stream into a flat, array-backed index with primitive integer dispatch, we eliminated the overhead of persistent map lookups.
  • Flat Array Stack: We replaced the persistent vector-based continuation stack with a high-performance Object array in the hot loop, drastically reducing allocation pressure.
  • Compact Node Projection: Datoms are projected into compact records (type tag + attribute map), improving cache locality and memory efficiency.
  • Integer Dispatch: The hot loop utilizes primitive integer 'case' dispatch, enabling the JVM to generate optimal branch tables (tableswitch).

The Robustness Tax

During our optimization pass, we observed a 5% performance difference between a 'naked' array implementation (~4.5ms) and a 'robust' implementation (~4.7ms). This small delta represents the cost of restoring the safety guarantees inherent in Clojure's original persistent data structures.

Trading O(1) Copying for O(1) Stepping

Clojure's persistent vectors provide near-zero overhead copying through structural sharing, but they incur an O(log32 n) traversal tax on every push and pop. Our Object array stack provides true O(1) stepping, but requires explicit O(n) memory copies when growing the stack or materializing state for effects.

Where the 5% Goes

The robust version introduces three critical safeguards:

  • Dynamic Resizing: The stack now checks its capacity on every push and grows iteratively when needed, preventing array-bound crashes during deep recursion.
  • Safe Index Lookups: Every node transition now includes a range guard, safely falling back to map lookups for cross-namespace or dynamically generated entity IDs.
  • Transient Materialization: State synchronization now uses transient-based loops to minimize GC pressure, adding slight complexity compared to raw vector coercion.

We consider this 'robustness tax' a mandatory investment. It ensures the Semantic VM remains as stable as the AST Walker while retaining its newly won 50% performance lead over tree-walking.

Reproducibility

To reproduce these results, you can run the benchmarking suite directly from the project root using the Clojure CLI:

clj -M:bench --fast-only 1000

This command evaluates the 'fast path' (optimized VM loop) for all four implementations. You can also target specific VMs using flags like --semantic-only or --register-only.