Semantic Bytecode Benchmarks: The Cost of Queryability
The Experiment
In Universal AST vs Assembly, we argued that the Universal AST should be "RISC for semantics": low-level in form (flat, explicit datoms) but high-level in meaning (preserving semantic relationships). We implemented this idea in yin.bytecode, creating two bytecode representations:
- Traditional (numeric): sequential bytes like [1 0 1 1 4 2] (opcode + operands)
- Semantic (datoms): queryable triples like [[:node/1 :op/type :literal] [:node/1 :op/value 42]]
The question: what does queryability cost?
Methodology
We ran identical benchmarks across three Clojure platforms:
- CLJ (JVM): Clojure on Java 21
- CLJS (Node.js): ClojureScript on Node 22
- CLJD (Dart): ClojureDart on Dart 3.4
Each benchmark:
- Warms up the runtime (500 iterations)
- Runs 5000 iterations of compilation
- Runs 5000 iterations of execution
- Reports microseconds per operation
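A minimal sketch of that harness shape on the JVM (the function name bench and its options are illustrative, not the actual yin.bytecode API):

```clojure
(defn bench
  "Warm up the runtime, then time `iters` calls of `f`,
  returning microseconds per operation."
  [f & {:keys [warmup iters] :or {warmup 500 iters 5000}}]
  (dotimes [_ warmup] (f))                           ; JIT warmup
  (let [start (System/nanoTime)]
    (dotimes [_ iters] (f))
    (/ (- (System/nanoTime) start) 1000.0 iters)))   ; total ns -> us/op
```

Each ratio in the tables below is then one measurement divided by another, e.g. (/ (bench semantic-compile) (bench legacy-compile)).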
Test cases range from trivial (literal value) to complex (nested lambda with multiple operations):
;; Literal
{:type :literal :value 42}
;; Addition
{:type :application
 :operator {:type :variable :name '+}
 :operands [{:type :literal :value 10}
            {:type :literal :value 20}]}
;; Lambda
((fn [x] (+ x 1)) 10)
;; Nested
((fn [x] (+ x (* 2 3))) 10)
;; Conditional
(if (< 5 10) :yes :no)
Results: Compilation Performance
How much slower is semantic bytecode to compile?
| Test Case | CLJ (JVM) | CLJS (Node) | CLJD (Dart) |
|---|---|---|---|
| Literal | 6.4x | 7.2x | 11.4x |
| Addition | 5.1x | 6.2x | 10.1x |
| Lambda | 9.3x | 4.9x | 5.7x |
| Nested | 6.2x | 3.9x | 4.3x |
| Conditional | 3.3x | 4.9x | 6.9x |
Semantic bytecode is 4-11x slower to compile. This overhead comes from:
- Generating unique node IDs for each instruction
- Creating datom triples instead of appending bytes
- Building the constant pool with keyword references
- Preserving metadata (arity, params, value types)
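To make the first two sources concrete, here is a hedged sketch of the two emission styles (function and attribute names are illustrative, not the real yin.bytecode compiler):

```clojure
(def node-counter (atom 0))

(defn fresh-node-id []
  ;; every semantic instruction pays for a unique ID allocation
  (keyword "node" (str (swap! node-counter inc))))

(defn emit-literal-semantic [value]
  ;; two datom triples per literal, each a fresh vector allocation
  (let [id (fresh-node-id)]
    [[id :op/type :literal]
     [id :op/value value]]))

(defn emit-literal-traditional [bytecode constant-pool value]
  ;; one opcode byte plus a pool index, appended in place
  {:bytecode      (conj bytecode 1 (count constant-pool))
   :constant-pool (conj constant-pool value)})
```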
Results: Execution Performance
How much slower is semantic bytecode to execute?
| Test Case | CLJ (JVM) | CLJS (Node) | CLJD (Dart) |
|---|---|---|---|
| Literal | 2.3x | 7.3x | 4.3x |
| Addition | 1.0x | 2.0x | 2.3x |
| Lambda | 1.7x | 2.5x | 3.0x |
| Nested | 2.5x | 2.3x | 2.4x |
| Conditional | 1.1x | 1.4x | 1.9x |
Semantic bytecode is 1-7x slower to execute. The overhead comes from:
- Graph traversal by node reference (vs sequential byte reading)
- Map lookups for node attributes (vs array indexing)
- Building node attribute maps at each step
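An illustrative contrast between the two dispatch styles, simplified to a single literal opcode (a sketch, not the real VM loop):

```clojure
(defn step-traditional [^longs code ^long pc constants]
  ;; sequential byte reading: array index, then bump the program counter
  (case (aget code pc)
    1 {:value (nth constants (aget code (inc pc))) :pc (+ pc 2)}))

(defn step-semantic [datoms node-id]
  ;; graph traversal: rebuild the node's attribute map, then dispatch on it
  (let [attrs (into {} (for [[e a v] datoms :when (= e node-id)] [a v]))]
    (case (:op/type attrs)
      :literal {:value (:op/value attrs)})))
```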
Notably, the JVM shows the smallest execution overhead (often near 1x). The JVM's optimizing JIT compiler can inline the map lookups and eliminate much of the abstraction cost.
Results: Query Performance
The key advantage of semantic bytecode is queryability. How fast are queries?
| Query | CLJ | CLJS | CLJD |
|---|---|---|---|
| find-applications | 3.51 us | 8.62 us | 9.78 us |
| find-lambdas | 3.23 us | 5.66 us | 4.67 us |
| find-variables | 1.45 us | 6.18 us | 4.68 us |
These queries find all nodes of a given type in the compiled bytecode. With traditional numeric bytecode, these queries are impossible without parsing the byte stream and reconstructing semantic information.
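Under the hood, such a type query can be a single linear pass over the datoms. A minimal sketch (the real find-applications may use an index instead):

```clojure
(defn find-nodes-of-type
  "Return the set of node IDs whose :op/type equals `op-type`."
  [datoms op-type]
  (into #{}
        (for [[e a v] datoms
              :when (and (= a :op/type) (= v op-type))]
          e)))
```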
Example queries enabled by semantic bytecode:
;; Find all function applications
(bc/find-applications datoms) ; => #{:node/4 :node/7 :node/9}
;; Find all lambdas
(bc/find-lambdas datoms) ; => #{:node/5}
;; Get attributes for a specific node
(bc/get-node-attrs datoms :node/5)
; => {:op/type :lambda
;     :op/arity 1
;     :op/params [x]
;     :op/captures-env? true}
The Real Win: Semantic JIT Optimization
Fast queries are the biggest benefit of semantic bytecode: they enable optimizations that are impossible with traditional bytecode, and those optimizations can pay back the execution overhead.
Consider what a JIT compiler can do with queryable bytecode:
- Find all call sites for a function and inline them
- Detect pure functions (no mutations) and memoize results
- Identify hot loops by querying application patterns
- Specialize polymorphic calls based on observed argument types
- Eliminate dead code by finding unreachable nodes
- Constant-fold expressions where all inputs are literals
With traditional bytecode, the JIT must reconstruct this semantic information by analyzing opcode patterns. With semantic bytecode, it's a 3-10 microsecond query. This changes what optimizations are practical at runtime.
;; JIT optimization example: find all pure function calls
[:find ?call ?fn
:where
[?call :op/type :apply]
[?call :op/operator-node ?fn]
[?fn :op/type :lambda]
[?fn :op/mutates? false]] ; only possible with semantic bytecode
The execution overhead of semantic bytecode may be self-correcting: the same queryability that costs 2x in naive interpretation enables JIT optimizations that can recover (or exceed) that performance.
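As a concrete sketch of one optimization from the list above, here is a constant-foldability check over datoms (the attribute :op/operand-nodes is an assumed name, not confirmed yin.bytecode schema):

```clojure
(defn node-attrs [datoms id]
  (into {} (for [[e a v] datoms :when (= e id)] [a v])))

(defn foldable?
  "True when every operand of the application node is a literal,
  i.e. the whole application can be folded to a constant."
  [datoms app-id]
  (let [operands (:op/operand-nodes (node-attrs datoms app-id))]
    (and (seq operands)
         (every? #(= :literal (:op/type (node-attrs datoms %))) operands))))
```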
Size Comparison
For the nested AST ((fn [x] (+ x (* 2 3))) 10):
| Representation | Units | Pool Entries |
|---|---|---|
| Semantic | 39 datoms | 7 |
| Traditional | 22 bytes | 7 |
Semantic bytecode is currently more verbose in terms of instruction units, but each datom carries queryable semantic information that raw bytes lack. This overhead is not fundamental: datoms can be bit-packed into 64-bit integers, which should bring semantic bytecode close to size parity with traditional bytecode while retaining its queryability.
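A hedged sketch of what that bit-packing could look like, assuming small interned IDs (a 24-bit entity ID, 16-bit attribute ID, and 24-bit value-pool index; the field widths are illustrative):

```clojure
(defn pack-datom ^long [^long eid ^long aid ^long vid]
  ;; [eid:24 | aid:16 | vid:24] packed into one 64-bit long
  (bit-or (bit-shift-left eid 40)
          (bit-shift-left aid 24)
          vid))

(defn unpack-datom [^long packed]
  [(unsigned-bit-shift-right packed 40)                   ; entity id
   (bit-and (unsigned-bit-shift-right packed 24) 0xFFFF)  ; attribute id
   (bit-and packed 0xFFFFFF)])                            ; value-pool index
```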
Platform Comparison
Raw execution times for the nested AST benchmark (microseconds per call):
| Platform | Semantic Compile | Legacy Compile | Semantic Execute | Legacy Execute |
|---|---|---|---|---|
| CLJ (JVM) | 21.28 | 3.42 | 16.22 | 6.52 |
| CLJS (Node) | 80.18 | 20.42 | 71.40 | 30.78 |
| CLJD (Dart) | 46.91 | 10.82 | 42.95 | 17.81 |
The JVM is fastest overall, with Dart in the middle and Node.js slowest. The relative overhead of semantic bytecode is roughly consistent across platforms (approximately 4-6x compile, 2-2.5x execute for this test case).
The Tradeoff
Traditional bytecode optimizes for execution speed:
- Compact representation (bytes)
- Sequential access (program counter)
- Minimal allocation during execution
Semantic bytecode optimizes for introspection:
- Queryable representation (datoms)
- Graph traversal (node references)
- Preserved semantic relationships
This mirrors the distinction in Universal AST vs Assembly: assembly optimizes for hardware execution, while the Universal AST optimizes for semantic preservation. Crucially, semantic preservation can be used to optimize for hardware execution (via JIT), but the reverse is not true: you cannot recover lost semantics from optimized assembly.
When to Use Each
Use Traditional Bytecode When:
- Execution speed is critical
- Memory is constrained
- You only need to run code, not analyze it
- The bytecode is a throwaway intermediate representation
Use Semantic Bytecode When:
- You need to query or transform the code
- The bytecode is a persistent representation (stored in DaoDB)
- You want to analyze code structure (find all mutations, all calls, etc.)
- You need provenance tracking or debugging
- Cross-language translation requires semantic preservation
The Hybrid Approach
In practice, Yin.vm can use both representations simultaneously:
- Semantic bytecode stored in DaoDB for queries and analysis
- Traditional bytecode compiled on-demand for hot execution paths
This is the "full-speed-plus-full-introspection" approach mentioned in the Yin.vm overview. The AST (as semantic datoms) is the canonical representation. Traditional bytecode is an optimization, compiled from the AST when needed.
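One hedged way to sketch that hybrid dispatch: count executions per node and switch to a compiled closure once a node gets hot (the threshold and function names are illustrative, not the Yin.vm implementation):

```clojure
(defn make-hybrid-executor
  "Interpret via `exec-semantic` until a node has run `threshold`
  times, then switch to the closure produced by `compile-fast`."
  [exec-semantic compile-fast threshold]
  (let [counts   (atom {})
        compiled (atom {})]
    (fn [datoms node-id]
      (if-let [fast (get @compiled node-id)]
        (fast)                                         ; hot path: compiled closure
        (let [n (get (swap! counts update node-id (fnil inc 0)) node-id)]
          (when (>= n threshold)                       ; promote hot node
            (swap! compiled assoc node-id (compile-fast datoms node-id)))
          (exec-semantic datoms node-id))))))          ; cold path: interpret
```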
Room for Optimization
Important: These benchmarks reflect an unoptimized implementation. No performance tuning has been applied to the semantic bytecode VM. The current implementation prioritizes clarity over speed.
Potential optimizations include:
- Bitpacking datoms into 64-bit numbers for memory parity with traditional bytecode
- Pre-indexing datoms by node ID (O(1) lookup instead of linear scan)
- Caching node attribute maps during execution
- Using arrays instead of vectors for hot paths
- Compiling frequently-executed nodes to native closures
- Lazy datom materialization (only create what's queried)
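For example, the pre-indexing optimization is a single group-by up front, turning the per-step linear scan into a hash lookup (a sketch under the datom shapes used above, not the actual implementation):

```clojure
(defn index-by-node
  "Build a node-id -> datoms index once, instead of scanning per step."
  [datoms]
  (group-by first datoms))

(defn node-attrs-indexed [index node-id]
  ;; O(1) hash lookup, then a small local pass over one node's datoms
  (into {} (map (fn [[_ a v]] [a v])) (get index node-id)))
```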
We believe that with these optimizations, semantic bytecode execution can approach traditional bytecode performance while retaining full queryability. The JVM results (often near 1x overhead) suggest the abstraction cost is not fundamental, just unoptimized.
Conclusion
The benchmarks quantify the current, unoptimized state:
- Semantic bytecode costs 4-11x in compilation time
- Semantic bytecode costs 1-7x in execution time (unoptimized)
- Semantic bytecode enables queries that take 1-10 microseconds
- Traditional bytecode cannot be queried at all
The cost of introspection is real but bounded, and likely reducible with optimization. For most applications, even the current 2-3x execution overhead is acceptable when the alternative is losing the ability to query and analyze code at all.
More importantly, these benchmarks validate the architectural choice: semantic preservation has a measurable cost, and that cost is worth paying when code needs to be more than just executed.
Related posts:
- Universal AST vs Assembly (the conceptual foundation)
- AST as Higher Dimensional Datom Streams (why ASTs are materialized views)
- Yin.vm: Chinese Characters for Programming Languages (the Universal Semantic AST)