What Makes Datalog Datalog: Semantics, Not Syntax

Published Nov 16, 2025

The Question

Datomic implements Datalog with an EDN-based syntax that looks like structured data. Prolog-style Datalog uses predicates and horn clauses. Souffle uses a C-like syntax. LogicBlox has its own notation. SQL's WITH RECURSIVE enables Datalog-style queries.

Are these all actually Datalog? Or are some pretenders?

The answer: Datalog is defined by its semantics, not its syntax. If the evaluation model is the same, it's Datalog—no matter what the surface representation looks like.

What Datalog Actually Is

Datalog is a restricted subset of first-order logic with a specific evaluation strategy. At its core:

1. Relational Foundation

Data is stored as relations—sets of tuples. Not nested objects. Not trees. Flat tuples.

parent(alice, bob)
parent(bob, charlie)
parent(charlie, diana)

This is a relation parent with two columns. Each row is a fact.

2. Horn Clause Logic

Queries are Horn clauses—logical implications of the form:

head :- body₁, body₂, ..., bodyₙ

Read as: "The head is true if all body clauses are true."

Example:

ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

This defines ancestor recursively:

Base case: If parent(X, Y), then ancestor(X, Y)
Recursive case: If parent(X, Z) and ancestor(Z, Y), then ancestor(X, Y)

3. Safe Recursion

Datalog supports recursive rules with guaranteed termination. This is enforced via:

Range restriction — Every variable in the head must appear in a positive (non-negated) body clause
Stratification — Negation cannot create circular dependencies
No function symbols — Only constants and variables (prevents infinite term generation)

Example of safe recursion:

reachable(X, Y) :- edge(X, Y).
reachable(X, Y) :- edge(X, Z), reachable(Z, Y).

This terminates because:

Variables X, Y, Z are bound by edge facts
No new nodes are generated (only existing nodes from edge)
The number of possible reachable facts is finite

4. Fixed-Point Evaluation

Datalog evaluates rules by iterating until a fixed point is reached:

Iteration 0: Base facts (from database)
Iteration 1: Apply rules → derive new facts
Iteration 2: Apply rules → derive more facts
...
Iteration N: Apply rules → no new facts
FIXED POINT REACHED ✓

This is called least fixed-point semantics. The result is the smallest set of facts that satisfies all rules.

5. Set Semantics

Results are sets, not bags (multisets) or lists:

Result: {(alice, bob), (alice, charlie)}

NOT: [(alice, bob), (alice, bob), (alice, charlie)]  # No duplicates!

Each fact is derived at most once. Adding the same rule multiple times doesn't change the result.

6. Monotonicity

Adding facts never removes previously derived conclusions (until you add stratified negation):

If ancestor(alice, bob) was derived,
Adding more parent() facts won't un-derive it.

This property enables incremental evaluation. You can add new facts and recompute only the affected conclusions.

7. Stratified Negation

Datalog supports negation, but it must be stratified—no circular dependencies through negation:

unreachable(X, Y) :- node(X), node(Y), not reachable(X, Y).

This is safe because:

reachable is fully computed first (stratum 0)
unreachable uses the result (stratum 1)
No circular dependency: unreachable doesn't feed back into reachable

Stratification ensures negation is computable and meaningful.

Syntax Is Just Surface Representation

Different Datalog implementations use wildly different syntaxes, but they all share the same evaluation model:

Prolog-Style (Classic)

parent(alice, bob).
parent(bob, charlie).

ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

?- ancestor(alice, charlie).

This is the canonical Datalog syntax, borrowed from Prolog.

Datomic/Datascript (EDN/Clojure)

[:find ?ancestor ?descendant
 :where
 [?ancestor :parent ?child]
 [?child :parent ?descendant]]

This looks like structured data (EDN—Extensible Data Notation). It's a vector containing keywords and symbols.

Key insight: This is homoiconic. The query is data that Clojure code can manipulate:

;; Compose queries as data
(def base-query
  '[:find ?person
    :where
    [?person :person/name ?name]])

(def age-filter
  '[[?person :person/age ?age]
    [(> ?age 18)]])

;; Programmatically build queries
(concat base-query [:where] age-filter)

Despite the syntax difference, this is Datalog because:

Variables (?person, ?name) unify
Clauses are conjunctive (AND semantics)
Supports recursion via rules
Set semantics (:find returns a set)
Stratified negation (not, not-join)

Souffle (Research-Oriented)

.decl parent(x: symbol, y: symbol)
.decl ancestor(x: symbol, y: symbol)
.input parent

ancestor(x, y) :- parent(x, y).
ancestor(x, y) :- parent(x, z), ancestor(z, y).

.output ancestor

Souffle adds type declarations and explicit input/output directives. It compiles to C++ for high performance.

LogicBlox (Commercial)

ancestor(x, y) <- parent(x, y).
ancestor(x, y) <- parent(x, z), ancestor(z, y).

Uses <- instead of :-. Same semantics, different arrow.

SQL WITH RECURSIVE

WITH RECURSIVE ancestor AS (
  SELECT parent, child FROM parent_table
  UNION
  SELECT p.parent, a.child
  FROM parent_table p
  JOIN ancestor a ON p.child = a.parent
)
SELECT * FROM ancestor;

SQL's WITH RECURSIVE enables Datalog-style recursive queries. Same fixed-point evaluation, SQL syntax.

All Datalog, Different Clothes

These are all Datalog because they share the same semantic properties:

Relational model (tuples, not nested terms)
Horn clause logic (if-then rules)
Fixed-point semantics (iterate until stable)
Safe recursion (guaranteed termination)
Stratified negation (safe NOT)
Set semantics (no duplicates)
Monotonicity (adding facts preserves conclusions)

The syntax is irrelevant. Datalog is an evaluation strategy, not a programming language.

What Datalog Is NOT

To clarify the boundaries:

Datalog Is NOT Full Prolog

Prolog has:

Function symbols — Terms like f(g(h(x))) can nest infinitely
Backtracking search — Explores multiple solutions with chronological backtracking
Cut operator (!) — Non-logical control for pruning search
Non-monotonic updates — assert/retract modify the database
Turing-complete — Can express arbitrary computation (doesn't always terminate)

Datalog restricts:

No function symbols — Only constants and variables (prevents infinite term generation)
No backtracking — Set-based evaluation (all solutions at once)
No cut — Declarative only (no procedural control)
Monotonic reasoning — Facts only accumulate (until stratified negation)
Guaranteed termination — Finite domain + range restriction = always halts

Example of legal Prolog, illegal Datalog:

% Prolog: Function symbols allow infinite nesting
list([]).
list([H|T]) :- list(T).
append([], L, L).
append([H|T1], L2, [H|T3]) :- append(T1, L2, T3).

?- append([1,2], [3,4], X).  % Builds X = [1,2,3,4]

This uses function symbols ([H|T] is cons(H, T)). Datalog cannot express this because:

The term [1,[2,[3,[4,[]]]]] is a nested function
Prolog can generate infinitely many different terms
Datalog requires finite domains

Datalog Is NOT SQL (But Close)

SQL has:

Aggregate functions — SUM, AVG, COUNT, GROUP BY
Nulls — Three-valued logic (true/false/unknown)
Order-dependent operations — LIMIT, OFFSET, ORDER BY
Procedural extensions — Stored procedures, triggers
Imperative updates — UPDATE, DELETE, INSERT

Classic Datalog has:

Simple selection/projection/join — No aggregates
No nulls — Closed-world assumption (missing = false)
Order-independent — Set semantics (no ORDER BY)
Declarative rules only — No procedures
Immutable facts — No updates (monotonic reasoning)

Modern Datalog extensions (Datomic, Souffle) add aggregates, but core Datalog is simpler and more restricted than SQL.

Why the Restrictions Matter

Datalog's restrictions seem limiting. No function symbols? No loops? No updates? Why bother?

Because these restrictions unlock powerful guarantees:

1. Guaranteed Termination

Every Datalog query always terminates. No infinite loops. No non-termination.

This is why Datalog is perfect for compiler infrastructure. You can run arbitrary user-defined queries without worrying about hanging the compiler.

2. Efficient Evaluation

The restrictions enable aggressive optimization:

Join ordering — Query planner chooses optimal join order
Indexing — Multi-dimensional indexes for fast lookup
Semi-naive evaluation — Incremental fixed-point computation
Magic sets — Push down selections for efficiency
Tabling — Memoize intermediate results

These optimizations are impossible in full Prolog (backtracking and function symbols break them).

3. Parallel Evaluation

Datalog rules can be applied in parallel. No shared mutable state. No race conditions.

// Apply rules in parallel
parallel_for_each rule in rules:
    derive_facts(rule, current_facts)

// Merge results (set union)
new_facts = union(all derived facts)

This enables massive parallelism on modern hardware.

4. Incremental Maintenance

Monotonicity enables incremental updates:

Add new fact: parent(diana, eve)

Recompute only affected conclusions:
  ancestor(diana, eve)  ← new
  ancestor(charlie, eve) ← new (via diana)
  ancestor(bob, eve)     ← new (via charlie)
  ancestor(alice, eve)   ← new (via bob)

Everything else unchanged!

Traditional databases must recompute everything. Datalog recomputes only what changed.

5. Declarative Optimization

You write what you want, not how to compute it. The query planner chooses the execution strategy.

;; User writes this
[:find ?x ?z
 :where
 [?x :related-to ?y]
 [?y :connected-to ?z]
 [(> ?z 100)]]

;; Engine optimizes:
Option 1: Filter z > 100 first, then join
Option 2: Join first, then filter
Option 3: Use index on :connected-to for z > 100

Chooses best based on statistics!

This is why Datalog as compiler infrastructure is so powerful. Compiler writers express semantic truths, and the engine optimizes automatically.

The Power of 'Just' Datalog

Datalog seems simple compared to Prolog or SQL. But this simplicity is intentional:

Expressive enough for complex analyses (reachability, type inference, escape analysis)
Restricted enough for efficiency (guaranteed termination, parallelism, indexing)
Declarative enough for composability (rules are independent)
Safe enough for user-programmable compilers (no hanging queries)

This is the sweet spot for compiler optimization.

Homoiconicity: Why Datomic's Syntax Matters

While syntax doesn't change what Datalog is, Datomic's choice to represent queries as data structures (EDN) unlocks unique capabilities:

Queries as Data

;; A query is just a data structure
(def my-query
  '[:find ?person ?name
    :where
    [?person :person/name ?name]])

Programmatic Composition

;; Build queries dynamically
(defn add-age-filter [query min-age]
  (update query :where conj
    ['?person :person/age '?age]
    [list '> '?age min-age]))

(add-age-filter my-query 18)
;; => [:find ?person ?name
;;     :where
;;     [?person :person/name ?name]
;;     [?person :person/age ?age]
;;     [(> ?age 18)]]

Macro Generation

;; Generate queries from specifications
(defmacro defquery [name spec]
  `(def ~name
     ~(compile-spec-to-query spec)))

(defquery ancestors
  {:entity :person
   :relation :parent
   :transitive true})

Serialization

;; Serialize queries as EDN
(pr-str my-query)
;; => "[:find ?person ?name :where [?person :person/name ?name]]"

;; Send over network, store in database, etc.
;; Then read back:
(read-string query-string)

This homoiconicity (code as data) is why Lisp-family languages excel at metaprogramming. Datomic's EDN syntax makes queries first-class data that can be manipulated, composed, and generated programmatically.

DaoDB's Datalog Design

When building DaoDB for Yin.vm, the design choices are clear:

Syntax: Datomic's EDN-Based Datalog

DaoDB implements Datomic's syntax of Datalog—using Clojure's EDN (Extensible Data Notation) for homoiconic queries:

[:find ?fn ?optimization
 :where
 [?fn :ast/type :function]
 [?fn :purity/pure true]
 [?call :ast/calls ?fn]
 [?loop :ast/type :loop]
 [?call :ast/ancestor ?loop]
 [(vector :hoist-call ?call ?loop) ?optimization]]

Semantics: Pure Datalog

Horn clauses (rules with heads and bodies)
Fixed-point evaluation (iterate until stable)
Stratified negation (safe NOT)
Range restriction (guaranteed termination)
Set semantics (no duplicates)

Extensions: Modern Datalog Features

Aggregates — sum, count, max for optimization metrics
Temporal queries — Query across transaction time (history)
Provenance — Which rule derived this fact? (for debugging)
User-defined predicates — Custom functions in queries
Pull API — Fetch entity graphs (Datomic-style)

Optimizations: High-Performance Datalog

Semi-naive evaluation — Incremental fixed-point (only new facts)
Multi-indexed EAVT — Fast lookup on entity/attribute/value/time
Join ordering — Cost-based query planner
Parallel evaluation — Rules applied concurrently
Compilation to execution streams — Bytecode-like performance

The Essence: Evaluation Model, Not Surface Syntax

Datalog is not defined by its syntax. It's defined by:

Relational model — Data is tuples
Horn clause logic — Rules are if-then implications
Fixed-point evaluation — Iterate until no new facts
Safe recursion — Guaranteed termination
Stratified negation — Computable NOT
Set semantics — No duplicates
Monotonicity — Facts accumulate

These properties make Datalog:

Powerful — Recursive queries, transitive closure, fixpoints
Efficient — Indexing, join ordering, parallelism
Safe — Always terminates, no infinite loops
Declarative — Describe what, not how
Composable — Rules are independent

This is why Datalog is the perfect foundation for compiler infrastructure. It provides:

Expressive power for complex analyses
Efficiency for whole-program reasoning
Safety for user-programmable optimizations
Declarative clarity for maintainability

Conclusion: Look Beyond the Syntax

When you see Datalog queries that look different:

Prolog-style: ancestor(X,Y) :- parent(X,Y).
Datomic-style: [:find ?x :where [?x :parent ?y]]
SQL-style: WITH RECURSIVE ancestor AS ...

Don't be fooled by the surface representation. Ask:

Does it use Horn clauses?
Does it evaluate via fixed-point?
Does it guarantee termination?
Does it support safe recursion?
Does it have set semantics?

If yes, it's Datalog—no matter what it looks like.

Syntax is just clothing. Semantics are the soul.

This is why datom.world chose Datalog as the foundation for DaoDB and Yin.vm. The evaluation model—not the syntax—is what makes Datalog the perfect query language for code as data.

Learn more:

Datalog as Compiler Infrastructure (why Datalog makes every optimization queryable)
AST as Higher Dimensional Construction of Datom Streams (code as queryable datoms)
AST Datom Streams: Bytecode Performance with Semantic Preservation (execution + queries)
DaoDB Documentation (the Datalog database powering datom.world)
Yin VM Documentation (technical implementation)