Semantic Bytecode Benchmarks: The Cost of Queryability

The Experiment

In Universal AST vs Assembly, we argued that the Universal AST should be "RISC for semantics": low-level in form (flat, explicit datoms) but high-level in meaning (preserving semantic relationships). We implemented this idea in yin.bytecode, creating two bytecode representations:

  • Traditional (numeric): Sequential bytes like [1 0 1 1 4 2] (opcode + operands)
  • Semantic (datoms): Queryable triples like [[:node/1 :op/type :literal] [:node/1 :op/value 42]]

The question: what does queryability cost?

Methodology

We ran identical benchmarks across three Clojure platforms:

  • CLJ (JVM): Clojure on Java 21
  • CLJS (Node.js): ClojureScript on Node 22
  • CLJD (Dart): ClojureDart on Dart 3.4

Each benchmark:

  1. Warms up the runtime (500 iterations)
  2. Runs 5000 iterations of compilation
  3. Runs 5000 iterations of execution
  4. Reports microseconds per operation
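On the JVM, that harness can be sketched in a few lines. This is a simplified, JVM-flavored version for illustration, not the actual benchmark code; `bench-us-per-op` and its options map are assumed names:

```clojure
;; Minimal benchmark harness sketch: warm up, then time N iterations
;; and report microseconds per operation. `f` stands in for the
;; compile or execute function under test.
(defn bench-us-per-op
  [f input {:keys [warmup iters] :or {warmup 500 iters 5000}}]
  ;; Warm up so the JIT-compiled steady state is what gets measured
  (dotimes [_ warmup] (f input))
  (let [start (System/nanoTime)]
    (dotimes [_ iters] (f input))
    ;; nanoseconds -> microseconds, averaged over iterations
    (/ (- (System/nanoTime) start) 1000.0 iters)))
```

The CLJS and CLJD variants would swap `System/nanoTime` for each platform's high-resolution timer.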

Test cases range from trivial (literal value) to complex (nested lambda with multiple operations):

;; Literal
{:type :literal :value 42}

;; Addition
{:type :application
 :operator {:type :variable :name '+}
 :operands [{:type :literal :value 10}
            {:type :literal :value 20}]}

;; Lambda
((fn [x] (+ x 1)) 10)

;; Nested
((fn [x] (+ x (* 2 3))) 10)

;; Conditional
(if (< 5 10) :yes :no)

Results: Compilation Performance

How much slower is semantic bytecode to compile?

Test Case     CLJ (JVM)   CLJS (Node)   CLJD (Dart)
Literal       6.4x        7.2x          11.4x
Addition      5.1x        6.2x          10.1x
Lambda        9.3x        4.9x          5.7x
Nested        6.2x        3.9x          4.3x
Conditional   3.3x        4.9x          6.9x

Semantic bytecode is 3-11x slower to compile. This overhead comes from:

  • Generating unique node IDs for each instruction
  • Creating datom triples instead of appending bytes
  • Building the constant pool with keyword references
  • Preserving metadata (arity, params, value types)
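A minimal sketch of where that work goes, using hypothetical helpers rather than the actual yin.bytecode implementation:

```clojure
;; Sketch: compiling a literal in both styles (illustrative opcodes
;; and attribute names, not the actual yin.bytecode implementation).
(def node-counter (atom 0))

(defn fresh-node-id []
  ;; Unique node IDs are one source of the semantic compile overhead
  (keyword "node" (str (swap! node-counter inc))))

;; Traditional: append an opcode and a constant-pool index.
(defn compile-literal-numeric [bytecode pool-idx]
  (conj bytecode 1 pool-idx))            ; 1 = PUSH_CONST (assumed)

;; Semantic: allocate a node ID and emit two datom triples.
(defn compile-literal-semantic [datoms value]
  (let [id (fresh-node-id)]
    (conj datoms
          [id :op/type :literal]
          [id :op/value value])))
```

The numeric path appends two integers; the semantic path allocates a keyword ID and two vector triples, which is where most of the measured compile-time gap comes from.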

Results: Execution Performance

How much slower is semantic bytecode to execute?

Test Case     CLJ (JVM)   CLJS (Node)   CLJD (Dart)
Literal       2.3x        7.3x          4.3x
Addition      1.0x        2.0x          2.3x
Lambda        1.7x        2.5x          3.0x
Nested        2.5x        2.3x          2.4x
Conditional   1.1x        1.4x          1.9x

Semantic bytecode is 1-7x slower to execute. The overhead comes from:

  • Graph traversal by node reference (vs sequential byte reading)
  • Map lookups for node attributes (vs array indexing)
  • Building node attribute maps at each step

Notably, the JVM shows the smallest execution overhead (often near 1x). The JVM's optimizing JIT compiler can inline the map lookups and eliminate much of the abstraction cost.
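The difference in access patterns can be shown with a stripped-down interpreter for the literal case. The opcode numbering and attribute names are assumptions for illustration:

```clojure
;; Numeric style: sequential byte reading with a program counter.
(defn exec-numeric [bytecode pool]
  (loop [pc 0 stack []]
    (if (>= pc (count bytecode))
      (peek stack)
      (case (long (nth bytecode pc))
        1 (recur (+ pc 2)                 ; PUSH_CONST: opcode + operand
                 (conj stack (nth pool (nth bytecode (inc pc)))))))))

;; Semantic style: follow a node reference and build its attribute
;; map on the fly -- map lookups instead of array indexing.
(defn node-attrs [datoms id]
  (into {} (for [[e a v] datoms :when (= e id)] [a v])))

(defn exec-semantic [datoms root]
  (let [attrs (node-attrs datoms root)]
    (case (:op/type attrs)
      :literal (:op/value attrs))))
```

Building the attribute map at each step is exactly the allocation the JVM's JIT can often optimize away, which matches the near-1x JVM results above.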

Results: Query Performance

The key advantage of semantic bytecode is queryability. How fast are queries?

Query               CLJ       CLJS      CLJD
find-applications   3.51 us   8.62 us   9.78 us
find-lambdas        3.23 us   5.66 us   4.67 us
find-variables      1.45 us   6.18 us   4.68 us

These queries find all nodes of a given type in the compiled bytecode. With traditional numeric bytecode, these queries are impossible without parsing the byte stream and reconstructing semantic information.

Example queries enabled by semantic bytecode:

;; Find all function applications
(bc/find-applications datoms)  ; => #{:node/4 :node/7 :node/9}

;; Find all lambdas
(bc/find-lambdas datoms)       ; => #{:node/5}

;; Get attributes for a specific node
(bc/get-node-attrs datoms :node/5)
; => {:op/type :lambda
;     :op/arity 1
;     :op/params [x]
;     :op/captures-env? true}
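Under the hood, a type query need be nothing more than a linear scan over the datoms. A self-contained sketch (the actual bc/ implementations may use indexes):

```clojure
;; Sketch: a type query as a linear scan over datoms. The real
;; bc/find-* functions could be built on top of this pattern.
(defn find-nodes-of-type [datoms op-type]
  (set (for [[e a v] datoms
             :when (and (= a :op/type) (= v op-type))]
         e)))

(def example-datoms
  [[:node/1 :op/type :literal] [:node/1 :op/value 10]
   [:node/2 :op/type :lambda]  [:node/2 :op/arity 1]
   [:node/3 :op/type :apply]   [:node/3 :op/operator-node :node/2]])

(find-nodes-of-type example-datoms :lambda)  ; => #{:node/2}
```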

The Real Win: Semantic JIT Optimization

Fast queries are the biggest benefit of semantic bytecode: they enable optimizations that are impossible with traditional bytecode, and those optimizations can pay back the execution overhead.

Consider what a JIT compiler can do with queryable bytecode:

  • Find all call sites for a function and inline them
  • Detect pure functions (no mutations) and memoize results
  • Identify hot loops by querying application patterns
  • Specialize polymorphic calls based on observed argument types
  • Eliminate dead code by finding unreachable nodes
  • Constant-fold expressions where all inputs are literals

With traditional bytecode, the JIT must reconstruct this semantic information by analyzing opcode patterns. With semantic bytecode, it's a 3-10 microsecond query. This changes what optimizations are practical at runtime.

;; JIT optimization example: find all pure function calls
[:find ?call ?fn
 :where
 [?call :op/type :apply]
 [?call :op/operator-node ?fn]
 [?fn :op/type :lambda]
 [?fn :op/mutates? false]]  ; Only possible with semantic bytecode
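As a concrete instance, constant folding becomes a query plus a rewrite. A sketch, with attribute names that follow the examples above but are otherwise illustrative:

```clojure
;; Sketch: constant-fold an addition whose operands are both literals,
;; rewriting the application node into a literal node in place.
(defn literal-value [datoms id]
  (some (fn [[e a v]]
          (when (and (= e id) (= a :op/value)) v))
        datoms))

(defn fold-add [datoms apply-id lhs rhs]
  (let [l (literal-value datoms lhs)
        r (literal-value datoms rhs)]
    (when (and l r)
      ;; Drop the three old nodes, emit one pre-computed literal.
      (conj (vec (remove (fn [[e _ _]] (#{apply-id lhs rhs} e)) datoms))
            [apply-id :op/type :literal]
            [apply-id :op/value (+ l r)]))))
```

With numeric bytecode the same transformation requires decoding the instruction stream, recognizing the pattern, and re-encoding; here it is a scan and a `conj`.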

The execution overhead of semantic bytecode may be self-correcting: the same queryability that costs 2x in naive interpretation enables JIT optimizations that can recover (or exceed) that performance.

Size Comparison

For the nested AST ((fn [x] (+ x (* 2 3))) 10):

Representation   Units       Pool Entries
Semantic         39 datoms   7
Traditional      22 bytes    7

Semantic bytecode is currently more verbose in terms of instruction units, but each datom carries queryable semantic information that raw bytes lack. This overhead is not fundamental: datoms can be bit-packed into 64-bit numbers. With that optimization in place, semantic bytecode should approach the size of traditional bytecode while retaining its superior queryability.
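A sketch of what such bit-packing could look like, assuming interned entity, attribute, and value IDs that each fit a fixed-width field. The field layout here (24-bit entity, 16-bit attribute, 24-bit value index) is illustrative, not a committed format:

```clojure
;; Sketch: pack one [e a v] datom of small interned IDs into a long.
(defn pack-datom [e a v]
  (bit-or (bit-shift-left e 40)     ; entity in bits 40-63
          (bit-shift-left a 24)     ; attribute in bits 24-39
          v))                       ; value index in bits 0-23

(defn unpack-datom [packed]
  [(bit-shift-right packed 40)
   (bit-and (bit-shift-right packed 24) 0xFFFF)
   (bit-and packed 0xFFFFFF)])
```

Attribute keywords like :op/type and values would be interned into small integer tables, the same way the constant pool already works.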

Platform Comparison

Raw execution times for the nested AST benchmark (microseconds per call):

Platform      Semantic Compile   Legacy Compile   Semantic Execute   Legacy Execute
CLJ (JVM)     21.28              3.42             16.22              6.52
CLJS (Node)   80.18              20.42            71.40              30.78
CLJD (Dart)   46.91              10.82            42.95              17.81

The JVM is fastest overall, with Dart in the middle and Node.js slowest. The relative overhead of semantic bytecode is roughly consistent across platforms (approximately 4-6x compile, 2-2.5x execute for this test case).

The Tradeoff

Traditional bytecode optimizes for execution speed:

  • Compact representation (bytes)
  • Sequential access (program counter)
  • Minimal allocation during execution

Semantic bytecode optimizes for introspection:

  • Queryable representation (datoms)
  • Graph traversal (node references)
  • Preserved semantic relationships

This mirrors the distinction in Universal AST vs Assembly: assembly optimizes for hardware execution, while the Universal AST optimizes for semantic preservation. Crucially, semantic preservation can be used to optimize for hardware execution (via JIT), but the reverse is not true: you cannot recover lost semantics from optimized assembly.

When to Use Each

Use Traditional Bytecode When:

  • Execution speed is critical
  • Memory is constrained
  • You only need to run code, not analyze it
  • The bytecode is a throwaway intermediate representation

Use Semantic Bytecode When:

  • You need to query or transform the code
  • The bytecode is a persistent representation (stored in DaoDB)
  • You want to analyze code structure (find all mutations, all calls, etc.)
  • You need provenance tracking or debugging
  • Cross-language translation requires semantic preservation

The Hybrid Approach

In practice, Yin.vm can use both representations simultaneously:

  1. Semantic bytecode stored in DaoDB for queries and analysis
  2. Traditional bytecode compiled on-demand for hot execution paths

This is the "full-speed-plus-full-introspection" approach mentioned in the Yin.vm overview. The AST (as semantic datoms) is the canonical representation. Traditional bytecode is an optimization, compiled from the AST when needed.
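A minimal sketch of that strategy: the datoms stay canonical, and a numeric compile is memoized per hot node. The function names here are stand-ins, not the Yin.vm API:

```clojure
;; Sketch: hybrid dispatch. Semantic datoms are canonical; a numeric
;; compile is cached and used on subsequent calls to the same node.
(def hot-cache (atom {}))

(defn execute-hybrid
  [datoms root compile-numeric exec-numeric exec-semantic]
  (if-let [compiled (get @hot-cache root)]
    (exec-numeric compiled)                    ; fast path: cached bytes
    (let [result (exec-semantic datoms root)]  ; introspectable path
      ;; A real VM would compile only past a hotness threshold,
      ;; not on the very first call.
      (swap! hot-cache assoc root (compile-numeric datoms root))
      result)))
```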

Room for Optimization

Important: These benchmarks reflect an unoptimized implementation. No performance tuning has been applied to the semantic bytecode VM. The current implementation prioritizes clarity over speed.

Potential optimizations include:

  • Bitpacking datoms into 64-bit numbers for memory parity with traditional bytecode
  • Pre-indexing datoms by node ID (O(1) lookup instead of linear scan)
  • Caching node attribute maps during execution
  • Using arrays instead of vectors for hot paths
  • Compiling frequently-executed nodes to native closures
  • Lazy datom materialization (only create what's queried)
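The pre-indexing item is the most mechanical of these. A sketch: one pass over the datoms builds a nested map, after which attribute lookup is a single map access instead of a linear scan:

```clojure
;; Sketch: index datoms by node ID, turning O(n) scans into O(1) lookups.
(defn index-by-node [datoms]
  (reduce (fn [idx [e a v]] (assoc-in idx [e a] v)) {} datoms))

(def idx
  (index-by-node [[:node/1 :op/type :literal]
                  [:node/1 :op/value 42]]))

(get-in idx [:node/1 :op/value])  ; => 42
```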

We believe that with these optimizations, semantic bytecode execution can approach traditional bytecode performance while retaining full queryability. The JVM results (often near 1x overhead) suggest the abstraction cost is not fundamental, just unoptimized.

Conclusion

The benchmarks quantify the current, unoptimized state:

  • Semantic bytecode costs 3-11x in compilation time
  • Semantic bytecode costs 1-7x in execution time (unoptimized)
  • Semantic bytecode enables queries that take 1-10 microseconds
  • Traditional bytecode cannot be queried without parsing the byte stream and reconstructing the semantics it discards

The cost of introspection is real but bounded, and likely reducible with optimization. For most applications, even the current 2-3x execution overhead is acceptable when the alternative is losing the ability to query and analyze code at all.

More importantly, these benchmarks validate the architectural choice: semantic preservation has a measurable cost, and that cost is worth paying when code needs to be more than just executed.

Related posts: