Semantic Bytecode Benchmarks: The Cost of Queryability
The Experiment
In Universal AST vs Assembly, we argued that the Universal AST should be "RISC for semantics": low-level in form (flat, explicit datoms) but high-level in meaning (preserving semantic relationships). We implemented this idea in yin.bytecode, creating two bytecode representations:
- Traditional (numeric): sequential bytes like [1 0 1 1 4 2] (opcode + operands)
- Semantic (datoms): queryable triples like [[:node/1 :op/type :literal] [:node/1 :op/value 42]]
The question: what does queryability cost?
Methodology
We ran identical benchmarks across three Clojure platforms:
- CLJ (JVM): Clojure on Java 21
- CLJS (Node.js): ClojureScript on Node 22
- CLJD (Dart): ClojureDart on Dart 3.4
Each benchmark:
- Warms up the runtime (500 iterations)
- Runs 5000 iterations of compilation
- Runs 5000 iterations of execution
- Reports microseconds per operation
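A minimal sketch of that harness shape on the JVM (the function name bench and its options are illustrative, not the actual yin.bytecode API):

```clojure
(defn bench
  "Warm up the runtime, then time `iters` calls of `f`,
  returning microseconds per operation."
  [f & {:keys [warmup iters] :or {warmup 500 iters 5000}}]
  (dotimes [_ warmup] (f))                           ; JIT warmup
  (let [start (System/nanoTime)]
    (dotimes [_ iters] (f))
    (/ (- (System/nanoTime) start) 1000.0 iters)))   ; total ns -> us/op
```

Each ratio in the tables below is then one measurement divided by another, e.g. (/ (bench semantic-compile) (bench legacy-compile)).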
Test cases range from trivial (literal value) to complex (nested lambda with multiple operations):
;; Literal
{:type :literal :value 42}
;; Addition
{:type :application
 :operator {:type :variable :name '+}
 :operands [{:type :literal :value 10}
            {:type :literal :value 20}]}
;; Lambda
((fn [x] (+ x 1)) 10)
;; Nested
((fn [x] (+ x (* 2 3))) 10)
;; Conditional
(if (< 5 10) :yes :no)
Results: Compilation Performance
How much slower is semantic bytecode to compile?
| Test Case | CLJ (JVM) | CLJS (Node) | CLJD (Dart) |
|---|---|---|---|
| Literal | 6.4x | 7.2x | 11.4x |
| Addition | 5.1x | 6.2x | 10.1x |
| Lambda | 9.3x | 4.9x | 5.7x |
| Nested | 6.2x | 3.9x | 4.3x |
| Conditional | 3.3x | 4.9x | 6.9x |
Semantic bytecode is 4-11x slower to compile. This overhead comes from:
- Generating unique node IDs for each instruction
- Creating datom triples instead of appending bytes
- Building the constant pool with keyword references
- Preserving metadata (arity, params, value types)
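To make the first two sources concrete, here is a hedged sketch of the two emission styles (function and attribute names are illustrative, not the real yin.bytecode compiler):

```clojure
(def node-counter (atom 0))

(defn fresh-node-id []
  ;; every semantic instruction pays for a unique ID allocation
  (keyword "node" (str (swap! node-counter inc))))

(defn emit-literal-semantic [value]
  ;; two datom triples per literal, each a fresh vector allocation
  (let [id (fresh-node-id)]
    [[id :op/type :literal]
     [id :op/value value]]))

(defn emit-literal-traditional [bytecode constant-pool value]
  ;; one opcode byte plus a pool index, appended in place
  {:bytecode      (conj bytecode 1 (count constant-pool))
   :constant-pool (conj constant-pool value)})
```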
Results: Execution Performance
How much slower is semantic bytecode to execute?
| Test Case | CLJ (JVM) | CLJS (Node) | CLJD (Dart) |
|---|---|---|---|
| Literal | 2.3x | 7.3x | 4.3x |
| Addition | 1.0x | 2.0x | 2.3x |
| Lambda | 1.7x | 2.5x | 3.0x |
| Nested | 2.5x | 2.3x | 2.4x |
| Conditional | 1.1x | 1.4x | 1.9x |
Semantic bytecode is 1-7x slower to execute. The overhead comes from:
- Graph traversal by node reference (vs sequential byte reading)
- Map lookups for node attributes (vs array indexing)
- Building node attribute maps at each step
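An illustrative contrast between the two dispatch styles, simplified to a single literal opcode (a sketch, not the real VM loop):

```clojure
(defn step-traditional [^longs code ^long pc constants]
  ;; sequential byte reading: array index, then bump the program counter
  (case (aget code pc)
    1 {:value (nth constants (aget code (inc pc))) :pc (+ pc 2)}))

(defn step-semantic [datoms node-id]
  ;; graph traversal: rebuild the node's attribute map, then dispatch on it
  (let [attrs (into {} (for [[e a v] datoms :when (= e node-id)] [a v]))]
    (case (:op/type attrs)
      :literal {:value (:op/value attrs)})))
```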
Notably, the JVM shows the smallest execution overhead (often near 1x). The JVM's optimizing JIT compiler can inline the map lookups and eliminate much of the abstraction cost.
Results: Query Performance
The key advantage of semantic bytecode is queryability. How fast are queries?
| Query | CLJ | CLJS | CLJD |
|---|---|---|---|
| find-applications | 3.51 us | 8.62 us | 9.78 us |
| find-lambdas | 3.23 us | 5.66 us | 4.67 us |
| find-variables | 1.45 us | 6.18 us | 4.68 us |
These queries find all nodes of a given type in the compiled bytecode. With traditional numeric bytecode, these queries are impossible without parsing the byte stream and reconstructing semantic information.
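Under the hood, such a type query can be a single linear pass over the datoms. A minimal sketch (the real find-applications may use an index instead):

```clojure
(defn find-nodes-of-type
  "Return the set of node IDs whose :op/type equals `op-type`."
  [datoms op-type]
  (into #{}
        (for [[e a v] datoms
              :when (and (= a :op/type) (= v op-type))]
          e)))
```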
Example queries enabled by semantic bytecode:
;; Find all function applications
(bc/find-applications datoms) ; => #{:node/4 :node/7 :node/9}
;; Find all lambdas
(bc/find-lambdas datoms) ; => #{:node/5}
;; Get attributes for a specific node
(bc/get-node-attrs datoms :node/5)
; => {:op/type :lambda
;     :op/arity 1
;     :op/params [x]
;     :op/captures-env? true}
The Real Win: Semantic JIT Optimization
Fast queries are the biggest benefit of semantic bytecode: they enable optimizations that are impossible with traditional bytecode, and those optimizations can pay back the execution overhead.
Consider what a JIT compiler can do with queryable bytecode:
- Find all call sites for a function and inline them
- Detect pure functions (no mutations) and memoize results
- Identify hot loops by querying application patterns
- Specialize polymorphic calls based on observed argument types
- Eliminate dead code by finding unreachable nodes
- Constant-fold expressions where all inputs are literals
With traditional bytecode, the JIT must reconstruct this semantic information by analyzing opcode patterns. With semantic bytecode, it's a 3-10 microsecond query. This changes what optimizations are practical at runtime.
;; JIT optimization example: find all pure function calls
[:find ?call ?fn
:where
[?call :op/type :apply]
[?call :op/operator-node ?fn]
[?fn :op/type :lambda]
[?fn :op/mutates? false]] ; only possible with semantic bytecode
The execution overhead of semantic bytecode may be self-correcting: the same queryability that costs 2x in naive interpretation enables JIT optimizations that can recover (or exceed) that performance.
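As a concrete sketch of one optimization from the list above, here is a constant-foldability check over datoms (the attribute :op/operand-nodes is an assumed name, not confirmed yin.bytecode schema):

```clojure
(defn node-attrs [datoms id]
  (into {} (for [[e a v] datoms :when (= e id)] [a v])))

(defn foldable?
  "True when every operand of the application node is a literal,
  i.e. the whole application can be folded to a constant."
  [datoms app-id]
  (let [operands (:op/operand-nodes (node-attrs datoms app-id))]
    (and (seq operands)
         (every? #(= :literal (:op/type (node-attrs datoms %))) operands))))
```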
Size Comparison
For the nested AST ((fn [x] (+ x (* 2 3))) 10):
| Representation | Units | Pool Entries |
|---|---|---|
| Semantic | 39 datoms | 7 |
| Traditional | 22 bytes | 7 |
Semantic bytecode is currently more verbose in terms of instruction units, but each datom carries queryable semantic information that raw bytes lack. This overhead is not fundamental: datoms can be bit-packed into 64-bit integers, which should bring semantic bytecode close to size parity with traditional bytecode while retaining its queryability.
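A hedged sketch of what that bit-packing could look like, assuming small interned IDs (a 24-bit entity ID, 16-bit attribute ID, and 24-bit value-pool index; the field widths are illustrative):

```clojure
(defn pack-datom ^long [^long eid ^long aid ^long vid]
  ;; [eid:24 | aid:16 | vid:24] packed into one 64-bit long
  (bit-or (bit-shift-left eid 40)
          (bit-shift-left aid 24)
          vid))

(defn unpack-datom [^long packed]
  [(unsigned-bit-shift-right packed 40)                   ; entity id
   (bit-and (unsigned-bit-shift-right packed 24) 0xFFFF)  ; attribute id
   (bit-and packed 0xFFFFFF)])                            ; value-pool index
```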
Platform Comparison
Raw execution times for the nested AST benchmark (microseconds per call):
| Platform | Semantic Compile | Legacy Compile | Semantic Execute | Legacy Execute |
|---|---|---|---|---|
| CLJ (JVM) | 21.28 | 3.42 | 16.22 | 6.52 |
| CLJS (Node) | 80.18 | 20.42 | 71.40 | 30.78 |
| CLJD (Dart) | 46.91 | 10.82 | 42.95 | 17.81 |
The JVM is fastest overall, with Dart in the middle and Node.js slowest. The relative overhead of semantic bytecode is roughly consistent across platforms (approximately 4-6x compile, 2-2.5x execute for this test case).
The Tradeoff
Traditional bytecode optimizes for execution speed:
- Compact representation (bytes)
- Sequential access (program counter)
- Minimal allocation during execution
Semantic bytecode optimizes for introspection:
- Queryable representation (datoms)
- Graph traversal (node references)
- Preserved semantic relationships
This mirrors the distinction in Universal AST vs Assembly: assembly optimizes for hardware execution, while the Universal AST optimizes for semantic preservation. Crucially, semantic preservation can be used to optimize for hardware execution (via JIT), but the reverse is not true: you cannot recover lost semantics from optimized assembly.
When to Use Each
Use Traditional Bytecode When:
- Execution speed is critical
- Memory is constrained
- You only need to run code, not analyze it
- The bytecode is a throwaway intermediate representation
Use Semantic Bytecode When:
- You need to query or transform the code
- The bytecode is a persistent representation (stored in DaoDB)
- You want to analyze code structure (find all mutations, all calls, etc.)
- You need provenance tracking or debugging
- Cross-language translation requires semantic preservation
The Hybrid Approach
In practice, Yin.vm can use both representations simultaneously:
- Semantic bytecode stored in DaoDB for queries and analysis
- Traditional bytecode compiled on-demand for hot execution paths
This is the "full-speed-plus-full-introspection" approach mentioned in the Yin.vm overview. The AST (as semantic datoms) is the canonical representation. Traditional bytecode is an optimization, compiled from the AST when needed.
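One hedged way to sketch that hybrid dispatch: count executions per node and switch to a compiled closure once a node gets hot (the threshold and function names are illustrative, not the Yin.vm implementation):

```clojure
(defn make-hybrid-executor
  "Interpret via `exec-semantic` until a node has run `threshold`
  times, then switch to the closure produced by `compile-fast`."
  [exec-semantic compile-fast threshold]
  (let [counts   (atom {})
        compiled (atom {})]
    (fn [datoms node-id]
      (if-let [fast (get @compiled node-id)]
        (fast)                                         ; hot path: compiled closure
        (let [n (get (swap! counts update node-id (fnil inc 0)) node-id)]
          (when (>= n threshold)                       ; promote hot node
            (swap! compiled assoc node-id (compile-fast datoms node-id)))
          (exec-semantic datoms node-id))))))          ; cold path: interpret
```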
Room for Optimization
Important: These benchmarks reflect an unoptimized implementation. No performance tuning has been applied to the semantic bytecode VM. The current implementation prioritizes clarity over speed.
Potential optimizations include:
- Bitpacking datoms into 64-bit numbers for memory parity with traditional bytecode
- Pre-indexing datoms by node ID (O(1) lookup instead of linear scan)
- Caching node attribute maps during execution
- Using arrays instead of vectors for hot paths
- Compiling frequently-executed nodes to native closures
- Lazy datom materialization (only create what's queried)
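For example, the pre-indexing optimization is a single group-by up front, turning the per-step linear scan into a hash lookup (a sketch under the datom shapes used above, not the actual implementation):

```clojure
(defn index-by-node
  "Build a node-id -> datoms index once, instead of scanning per step."
  [datoms]
  (group-by first datoms))

(defn node-attrs-indexed [index node-id]
  ;; O(1) hash lookup, then a small local pass over one node's datoms
  (into {} (map (fn [[_ a v]] [a v])) (get index node-id)))
```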
We believe that with these optimizations, semantic bytecode execution can approach traditional bytecode performance while retaining full queryability. The JVM results (often near 1x overhead) suggest the abstraction cost is not fundamental, just unoptimized.
Conclusion
The benchmarks quantify the current, unoptimized state:
- Semantic bytecode costs 4-11x in compilation time
- Semantic bytecode costs 1-7x in execution time (unoptimized)
- Semantic bytecode enables queries that take 1-10 microseconds
- Traditional bytecode cannot be queried at all
The cost of introspection is real but bounded, and likely reducible with optimization. For most applications, even the current 2-3x execution overhead is acceptable when the alternative is losing the ability to query and analyze code at all.
More importantly, these benchmarks validate the architectural choice: semantic preservation has a measurable cost, and that cost is worth paying when code needs to be more than just executed.
Related posts:
- Universal AST vs Assembly (the conceptual foundation)
- AST as Higher Dimensional Datom Streams (why ASTs are materialized views)
- Yin.vm: Chinese Characters for Programming Languages (the Universal Semantic AST)