RDF Triples vs Datoms

The RDF Foundation

The Resource Description Framework (RDF) models knowledge as triples: (subject predicate object). The concept is elegant. But RDF has many serialization formats: RDF/XML, JSON-LD, Turtle, N-Triples, N-Quads, TriG, RDF/JSON. Here's a single fact, "Alice knows Bob", in JSON-LD:

{
  "@context": {"@vocab": "http://xmlns.com/foaf/0.1/"},
  "@id": "http://example.org/person/alice",
  "knows": {
    "@id": "http://example.org/person/bob"
  }
}

Each format is optimized for a different context: RDF/XML for traditional XML tooling, JSON-LD for web APIs, Turtle for human readability, N-Triples for streaming.

The triple concept is compositional and enables powerful graph queries and inference. The semantic web was built on this foundation, and for good reason.

But wouldn't it be simpler if an RDF triple could be as straightforward as this?

[alice :knows bob]

A literal vector with subject, predicate, object. No XML tags, no JSON context negotiation. Just data.

The Datom Structure: [e a v t m]

Datomic extended the RDF triple model to make time a first-class dimension, introducing the datom as a five-tuple [e a v t op]. The fourth element t is transaction time, and the fifth element op is a boolean indicating whether the datom is an assertion (add) or retraction. Datom.world extends this model further, replacing the boolean with m, a metadata entity:

  • e: entity (like RDF's subject)
  • a: attribute (like RDF's predicate)
  • v: value or entity reference (like RDF's object)
  • t: transaction ID (monotonic within its stream)
  • m: metadata entity (not a boolean, but an entity reference for extensible context)

This isn't just "RDF plus ordering." It's a recognition that causality and provenance are first-class dimensions of data, not afterthoughts. The metadata entity m allows arbitrary context to be attached to each datom, going beyond Datomic's simple add/retract distinction.
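As a rough illustration (not the actual datom.world implementation), the five-tuple can be modeled as a simple record whose field names mirror the e/a/v/t/m positions:

```python
from collections import namedtuple

# A datom is a five-tuple: entity, attribute, value, transaction, metadata.
Datom = namedtuple("Datom", ["e", "a", "v", "t", "m"])

# An assertion with no metadata entity attached (m is None).
d = Datom(e=42, a=":person/name", v="Alice", t=1001, m=None)

# The datom is just the tuple; destructuring recovers each dimension.
e, a, v, t, m = d
```

The record adds nothing beyond the positions themselves, which is the point: a datom carries no hidden structure.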

Time (t): Causality as Structure

In RDF, to track when Alice started knowing Bob, you need reification: turning the triple itself into an entity so you can make statements about it. Here's the JSON-LD:

{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/terms/"
  },
  "@id": "http://example.org/statement/1",
  "@type": "rdf:Statement",
  "rdf:subject": {"@id": "http://example.org/person/alice"},
  "rdf:predicate": {"@id": "foaf:knows"},
  "rdf:object": {"@id": "http://example.org/person/bob"},
  "dc:created": "2025-01-01T00:00:00Z"
}

This nested map expands to five separate RDF triples. You need four triples to identify which triple you're talking about (the subject, predicate, object, and type), plus one for the timestamp. The original semantic triple is now embedded in a meta-structure.
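To make the expansion concrete, here are the same five triples as plain (subject, predicate, object) tuples, sketched in Python with the long URIs abbreviated to prefixed names:

```python
stmt = "http://example.org/statement/1"

# The five triples produced by reifying one foaf:knows assertion:
# four to identify the statement, plus one for the timestamp.
triples = [
    (stmt, "rdf:type",      "rdf:Statement"),
    (stmt, "rdf:subject",   "http://example.org/person/alice"),
    (stmt, "rdf:predicate", "foaf:knows"),
    (stmt, "rdf:object",    "http://example.org/person/bob"),
    (stmt, "dc:created",    "2025-01-01T00:00:00Z"),
]
```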

In datom.world, you could model this as an entity (like RDF's reified statement), but you don't need to. Time is intrinsic:

[alice :knows bob 1001 nil]

Transaction 1001 is a monotonic ID generated by the stream. Every fact exists at a transaction within its stream. You can query as-of any transaction without creating statement entities. Causality is built into the data model, not added as metadata on top.
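Because t is part of every datom, an as-of view is just a filter over the stream. A minimal sketch (illustrative names, not the datom.world API):

```python
# A small stream of [e a v t m] datoms as plain tuples.
datoms = [
    (1, ":person/name", "Alice", 1000, None),
    (1, ":knows", 2, 1001, None),
    (1, ":knows", 3, 1005, None),
]

def as_of(datoms, tx):
    """Return the datoms visible as of transaction tx (t <= tx)."""
    return [d for d in datoms if d[3] <= tx]

snapshot = as_of(datoms, 1001)  # excludes the t=1005 fact
```

No statement entities are created; the time dimension is already there to filter on.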

Metadata (m): Extensible Context and Cross-Stream Causality

The fifth element, m, is an entity ID used for metadata about the datom itself. This enables:

  • Provenance (who asserted this?)
  • Confidence scores
  • Access control tokens
  • Capability references
  • Cross-stream causality: since t is stream-local, determining causality across different streams requires interpreting the metadata graph via m

Instead of creating parallel vocabularies, you extend the model itself. Metadata is data, queryable through the same mechanisms. Crucially, m is how you establish causal relationships between facts from different streams, since transaction IDs have no meaning outside their originating stream.
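Since m is an entity reference, the metadata is itself described by ordinary datoms and queried through the same mechanism. A sketch with hypothetical :meta/* attribute names:

```python
# Fact datom: its fifth element points at metadata entity 500.
fact = (1, ":knows", 2, 1001, 500)

# The metadata entity is described by plain datoms in turn.
meta_datoms = [
    (500, ":meta/asserted-by", "agent-7", 1001, None),
    (500, ":meta/confidence", 0.93, 1001, None),
]

def metadata_of(fact, datoms):
    """Collect the (attribute, value) pairs describing a fact's m entity."""
    return {a: v for (e, a, v, t, m) in datoms if e == fact[4]}

ctx = metadata_of(fact, meta_datoms)
```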

What RDF Gets Right

  • Universality: Any fact can be expressed as subject-predicate-object
  • Composability: Triples can be composed into graphs

RDF triples are data about data. They describe relationships without prescribing storage, execution, or inference models.

Where RDF Falls Short

Despite the elegance of the triple model, RDF never achieved widespread adoption outside of specialized domains like academic research, libraries, and government data. Why?

1. Serialization Complexity

RDF's original serialization format was RDF/XML, which was notoriously verbose. While later formats (JSON-LD, Turtle, N-Triples) improved readability, they're still verbose compared to the simplicity of the underlying triple concept. Each format requires its own parser, validator, and tooling.

More critically, the serialization format is not the data model. A triple is a conceptual structure. RDF/XML, JSON-LD, and Turtle are different representations of that structure. You negotiate which format to use, then parse it into an internal representation (often integers for performance, as we'll see).

Datoms take the opposite approach: the serialization format is the data model. A datom is a five-element vector. That's not a representation of a datom; it is a datom. There's no parsing step where you transform from one structure to another. The vector literal [42 :person/name "Alice" 1001 nil] is what gets indexed, what gets queried, what gets sent over the wire as raw EDN or encoded with Transit. The format is the data.
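The claim is easy to demonstrate: a datom survives a serialization round trip as the same plain vector, with no model-to-format mapping step. A sketch using JSON in place of EDN/Transit:

```python
import json

datom = [42, ":person/name", "Alice", 1001, None]

# Serialize and deserialize: what comes back is the datom itself,
# not a representation that must be mapped into another structure.
wire = json.dumps(datom)
restored = json.loads(wire)
```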

2. The Consensus Bottleneck

The semantic web vision required universal agreement on ontologies before applications could interoperate. While you could technically build an application with your own private ontology, doing so removed the main benefit: automatic data integration across systems. But agreement is expensive.

For a software agent to book a flight and hotel by reading data from different websites, both the airline and hotel needed to describe price, date, and booking using shared vocabularies. If American Airlines used vocab:cost and Hilton used vocab:price, machine integration would fail. Achieving this agreement across industries proved incredibly slow and political.

Organizations had immediate problems to solve and couldn't wait for W3C committees to standardize vocabularies. It was infinitely cheaper and faster for two engineers to agree on a JSON format: {"id": 123, "cost": 99.99}. They didn't need the whole world to agree, just the two systems talking to each other. The industry voted with its feet: relational databases and JSON APIs won.

The datom model avoids this trap by shifting from global specification to local interpretation:

  • Local IDs, Late-Binding Identity: Entity IDs are integers assigned locally within each stream, sized by the stream type (as small as 16-bit on constrained devices), requiring zero consensus. When datoms migrate to a remote DaoDB, the remote upgrades entity IDs to 128-bit by using the originating node's 64-bit ID as the high bits. The node ID serves as a namespace jail: the jail itself has node ID zero, and all entity references in the migrated datoms are relative to the originating node ID. Correlation between systems uses shared unique attributes (:person/email) at query time, not during storage. This moves the "agreement" cost from the storage layer (hard, upfront) to the query layer (flexible, as-needed).
  • Agents Replace Ontologies: Instead of passive dictionaries requiring industry-wide agreement, datom.world uses executable agents that process data. Meaning is defined by what the agent does, not what a specification says. You don't wait for a standards committee to define standard:invoice. You write an agent that processes your invoice datoms.
  • Fixed Structure, No Meta-Agreement: The five-tuple structure is non-negotiable. Time and provenance are built-in dimensions, not interpretive layers requiring additional agreement on how to track history. Systems don't need to agree on how to handle change; they just replay the append-only log.

Datoms solve the local problem first: a high-performance, immutable database that works immediately for your application. Integration with other systems is a separate, downstream task handled by query logic and agent interpretation, not an upfront barrier to entry.

3. No Answer to Execution

RDF correctly focuses on data representation, not execution. But the Semantic Web vision required some answer to: who runs the code? How do agents process data? How does computation happen?

The vision had no cohesive answer. Think of the difference between a PDF and a web browser:

  • RDF is like a PDF: It perfectly preserves information. But a PDF cannot do anything. It can't calculate, it can't update itself, it can't act. You need an external program to view it and a human to interpret it.
  • The Semantic Web dream: Autonomous agents that navigate, learn, and act on their own. But because RDF has no execution model, you can't send RDF to a server and have it "run."

Developers didn't build on RDF. Relational databases already solved execution: stored procedures, triggers, application servers with JDBC connections. RDF became a dumb datastore, a static format for knowledge representation without computational power.

Datom.world provides an answer: agents are yin.vm continuations. Because yin.vm unifies functions, closures, and continuations, agents are simply functions that consume streams of datoms and can migrate across nodes. The Universal AST serves as the compilation target, preserving semantics across languages.

Continuations exist as datoms. It's datoms all the way down. The execution state (control, environment, store, continuation) is stored as datoms in DaoDB. This means agents can be serialized, migrated across nodes, and resumed from any point. Code and runtime state share the same representation.

Instead of relying on consensus about what a predicate means, you have executable agents that process datoms according to explicit rules. The meaning isn't in the ontology: it's in the agent's behavior.
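An agent in this sense is just a function over a datom stream. The sketch below (hypothetical :invoice/* attributes, amounts as integer cents, Python standing in for a yin.vm continuation) shows meaning living in code rather than in a vocabulary:

```python
def invoice_total_agent(datoms):
    """Fold over a stream, summing values of :invoice/amount.
    The 'meaning' of the attribute is exactly what this agent does with it."""
    total = 0
    for (e, a, v, t, m) in datoms:
        if a == ":invoice/amount":
            total += v
    return total

stream = [
    (10, ":invoice/amount", 9999, 2001, None),  # cents
    (11, ":invoice/amount", 2500, 2002, None),
    (11, ":invoice/status", "paid", 2003, None),
]
```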

4. Conflation of Identity and Performance

RDF uses global URIs (http://example.org/alice) for entity identity. This conflates two distinct concerns: semantic identity (who is this person?) and internal references (efficient database indexing).

In practice, RDF implementations work around URI performance costs by hashing URIs to integers or using dictionary encoding (mapping URIs to sequential IDs). But this creates a new problem: the model says entities are URIs, but the implementation uses integers. You've now split identity across two layers:

  • The conceptual model (URIs as first-class entities)
  • The implementation reality (hashed integers for performance)

This mismatch means developers must think in URIs but optimize for integers. Query planning, caching strategies, and distributed coordination all depend on understanding both representations.

Datoms separate these concerns cleanly:

Entity IDs: Stream-Local Integers

Entity IDs are internal references within a stream. Streams are typed, and the stream type determines the size of each position in the datom [e a v t m]:

[42 :person/name "Alice" 1001 nil]

Stream types determine entity ID size. Constrained edge devices can use streams typed with entity IDs as small as 16-bit. When a continuation migrates to the cloud, the remote DaoDB upgrades entity IDs to 128-bit: the originating node's 64-bit ID becomes the high bits, and the original entity ID is zero-extended to fill the low bits. The node ID acts as a namespace jail. The remote DaoDB's own entities have node ID zero, while migrated entities retain their originating node ID. All entity references within the migrated datoms are interpreted relative to the originating node ID.

This structure accommodates trillions of devices. The 64-bit node ID space supports billions of nodes, each with its own 64-bit entity ID space, ensuring isolation while allowing entities to coexist on the same remote DaoDB. This mirrors IPv6's design: 64-bit network prefix plus 64-bit interface ID.
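The upgrade described above can be sketched as bit arithmetic, assuming the stated layout: the 64-bit node ID in the high bits, the zero-extended local entity ID in the low bits:

```python
def upgrade_entity_id(node_id, local_eid):
    """Compose a 128-bit global ID: node ID (high 64) | local ID (low 64)."""
    assert 0 <= node_id < 2**64 and 0 <= local_eid < 2**64
    return (node_id << 64) | local_eid

def split_entity_id(global_id):
    """Recover (node_id, local_eid) from a 128-bit global ID."""
    return global_id >> 64, global_id & (2**64 - 1)

gid = upgrade_entity_id(node_id=7, local_eid=42)
```

Note that a node ID of zero leaves the ID unchanged, matching the rule that the jail itself has node ID zero.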

Entity IDs 0-1024 are reserved for system entities (built-in attributes like :db/ident, type markers, and other primitives) with universal meaning across all namespaces. User entities start at 1025, which serves as the "zero basis" for user data. On migration, the zero basis changes but entity IDs remain unchanged because they are relative offsets.

Why not UUIDs or sequential UUIDs (squuids)?

  • Index efficiency: 64-bit integers are faster to compare (single CPU instruction vs two for 128-bit). In sorted index segments, binary search over 8-byte keys is more cache-friendly than 16-byte keys.
  • Storage density: Every datom and every index entry stores an entity ID. 8 bytes vs 16 bytes matters at scale.
  • Privacy: Sequential UUIDs (squuids) leak creation timestamps. Local IDs reveal nothing.
  • Simplicity: Local sequential assignment requires no coordination, no timestamp embedding, no randomness generation.

Semantic Identity: Unique Attributes

Identity is not a property of entity IDs. It's a semantic property expressed through unique attributes:

;; DaoDB-A: Personal database
[1 :person/name "Alice"]
[1 :person/uuid #uuid "550e8400-..."]
[1 :person/email "alice@example.com"]

;; DaoDB-B: Work database
[7 :person/uuid #uuid "550e8400-..."]  ;; Same UUID
[7 :person/department "Engineering"]

Different entity IDs (1 vs 7), but the same person. Correlation happens through the shared :person/uuid attribute, not the entity ID.

Attributes marked as :db.unique/identity enable:

  • Lookup refs: [:person/email "alice@example.com"] resolves to the local entity ID
  • Upsert semantics: facts added to a unique attribute find or create the entity
  • Cross-DaoDB queries: join on shared unique values

Different entities can use different identity schemes: :person/uuid for globally unique entities, :person/email for natural identity, :document/sha256 for content-addressed identity, or no unique attributes for private, non-correlatable entities.

Entity IDs are internal. Identity is semantic. Correlation is explicit.
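Query-time correlation on a shared unique attribute can be sketched as a join. The two datom sets below mirror the DaoDB-A/DaoDB-B example, using the UUID string as the shared value:

```python
daodb_a = [
    (1, ":person/name", "Alice", 10, None),
    (1, ":person/uuid", "550e8400-...", 10, None),
]
daodb_b = [
    (7, ":person/uuid", "550e8400-...", 30, None),
    (7, ":person/department", "Engineering", 31, None),
]

def correlate(db1, db2, unique_attr):
    """Pair entity IDs across databases that share a value of unique_attr."""
    by_value = {v: e for (e, a, v, t, m) in db1 if a == unique_attr}
    return [(by_value[v], e) for (e, a, v, t, m) in db2
            if a == unique_attr and v in by_value]

pairs = correlate(daodb_a, daodb_b, ":person/uuid")  # entity 1 <-> entity 7
```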

Interpreters Materialize Datoms

The datom [e a v t m] is canonical. What you do with it depends on the interpreter.

DaoDB is one interpreter. It materializes datoms into covered indexes (EAVT, AEVT, AVET, VAET) for efficient querying. But DaoDB is not the only way to interpret a datom stream.

A content-addressing interpreter can materialize datoms differently. For universal structures like AST nodes, it hashes (a, v) to produce SHA-256 content hashes. The same expression (+ 1 2) produces the same hash everywhere, enabling deduplication and verification across the network.
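A minimal sketch of such an interpreter, hashing each (attribute, value) pair with SHA-256. The canonical byte encoding below is an assumption for illustration; datom.world may normalize pairs differently:

```python
import hashlib

def content_hash(a, v):
    """Hash an (attribute, value) pair; identical pairs hash identically
    everywhere, enabling deduplication and verification."""
    payload = f"{a}\x00{v}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

h1 = content_hash(":ast/expr", "(+ 1 2)")
h2 = content_hash(":ast/expr", "(+ 1 2)")  # same pair, same hash
```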

This is the power of separating structure from interpretation:

  • The datom stream is the single source of truth
  • Interpreters materialize views optimized for different purposes
  • New interpreters can be added without changing the underlying data

Streams: Bounded and Unbounded

Streams can be open (unbounded, still receiving datoms) or bounded (closed at some transaction t, finite, stable for reasoning). A bounded stream is the database value. There is no separate "value" type. Stability comes from bounding, not from a different abstraction.

Datalog queries (d/q) work on any bounded stream. A raw stream without indexes requires O(n) scan. A DaoDB-managed stream with EAVT/AEVT/AVET/VAET indexes enables O(log n) lookup. Same interface, same semantics, different performance. Indexes are a performance optimization, not a semantic requirement.
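The O(n) path is just a linear scan with pattern matching. A sketch of a tiny matcher over a bounded stream, with None standing in for a Datalog variable:

```python
datoms = [
    (1, ":person/name", "Alice", 100, None),
    (2, ":person/name", "Bob", 101, None),
    (1, ":knows", 2, 102, None),
]

def scan(datoms, pattern):
    """O(n) match: pattern is (e, a, v); None matches anything."""
    pe, pa, pv = pattern
    return [d for d in datoms
            if (pe is None or d[0] == pe)
            and (pa is None or d[1] == pa)
            and (pv is None or d[2] == pv)]

names = scan(datoms, (None, ":person/name", None))  # both name datoms
```

An index replaces the scan with a sorted lookup; the pattern and the results are the same.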

Restrictions as Features

Datoms are more restrictive than RDF:

  • Attributes are namespaced keywords, not arbitrary URIs
  • Entities are opaque IDs (sized by stream type, globally referenceable as 128-bit after migration)
  • Time is monotonic and append-only within each stream (not globally)
  • Values must be ground (no variables in storage)

These restrictions enable:

  • Efficient indexing: EAVT, AEVT, AVET, VAET covering indexes
  • Immutability: datoms never change, simplifying caching and distribution
  • Consistent reads: as-of queries are trivial
  • Clear semantics: no ambiguity about what a fact means

For a deeper exploration of how these restrictions enable powerful capabilities, see The Power of Restriction: Why Datom Tuples Work.

Graphs: Constructed, Not Assumed

Another crucial difference: RDF is a graph model. Datoms are tuples from which graphs may be constructed.

In datom.world:

  • Tuples are the primitive
  • Graphs are constructed by following entity references in queries
  • Relationships are explicit, not assumed from predicate semantics
  • Graph structure emerges from interpretation, not from syntax

Don't assume structure. Don't rely on consensus vocabularies to define meaning. Make interpretation explicit through executable code.

Conclusion: Everything Is Data

RDF triples proved that knowledge can be decomposed into atomic facts. Datoms extend this insight: applications need time and context as native dimensions.

In datom.world:

  • Time isn't metadata: it's part of the datom itself
  • Provenance isn't annotation: it's intrinsic via the metadata dimension
  • Graphs aren't assumed: they're constructed from tuples
  • Everything is data, including the context in which data exists, even code and its runtime state in the form of datoms

Five dimensions. No hidden state. Explicit causality.

This is how we build systems where everything is data.