Logseq's Knowledge Graph Paradox: Why Markdown Files Aren't Datoms

Logseq is a beautiful product. Built with Clojure/ClojureScript. Uses DataScript for queries. Claims to be a "knowledge graph." It should be perfect.

But there's a fundamental architectural tension at its core.

Logseq tries to be a knowledge graph while storing data as Markdown files.

And that tension explains both its strengths and its limitations.

What Logseq Got Right

Logseq made excellent technology choices:

  • ClojureScript: A functional Lisp with immutable data structures
  • DataScript: In-memory Datalog database
  • Block-based outliner: Everything is a block, blocks reference each other
  • Local-first: Files stored on disk, not in the cloud
  • Privacy-focused: No tracking, no data collection

The vision is clear: your notes as a queryable knowledge graph.

You write:

- Reading [[The Information]] by James Gleick
  - Core insight: [[Information Theory]] connects to [[Entropy]]
  - Related: [[Claude Shannon]]'s work on [[Communication]]

Logseq should understand:

  • "The Information" is a book
  • "Information Theory" is a concept
  • "Claude Shannon" is a person
  • These entities are related via semantic links

You should be able to query:

;; Intent: find all books related to Information Theory
;; (as written, this returns the text of every block that links to that page)
[:find ?book
 :where
 [?b :block/content ?book]
 [?b :block/refs ?concept]
 [?concept :block/original-name "Information Theory"]]

This is the promise. And it almost works.

The Markdown Impedance Mismatch

Architecture: Two Worlds Colliding

Logseq's original architecture (file-based graphs):

  1. Storage layer: Markdown files on disk
  2. Parse layer: Read Markdown, extract blocks and links
  3. Database layer: Load parsed data into DataScript (in-memory)
  4. Query layer: Datalog queries over in-memory graph
  5. Write layer: Serialize back to Markdown on disk

The problem:

Markdown is not a data format. It's a text format with conventions.

DataScript expects datoms: [e a v t] (entity, attribute, value, transaction)

Markdown gives you: lines of text with [[brackets]]

What Gets Lost in Translation

When you write:

- [[Claude Shannon]] invented [[Information Theory]] in 1948

You mean:

[claude-shannon :person/invented information-theory]
[information-theory :concept/year 1948]
[claude-shannon :entity/type :person]
[information-theory :entity/type :concept]

But Logseq sees:

[block-123 :block/content "[[Claude Shannon]] invented [[Information Theory]] in 1948"]
[block-123 :block/refs page-shannon]
[block-123 :block/refs page-info-theory]
[page-shannon :block/original-name "Claude Shannon"]
[page-info-theory :block/original-name "Information Theory"]

The semantic relationship is trapped inside the text.
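
To make that concrete, here is a minimal sketch of a link-extracting parser. It is not Logseq's actual parser; the function names `page-refs` and `line->datoms` are made up for illustration, and page names stand in for page entity ids. The point is only what such a parser can and cannot produce from the line above.

```clojure
(require '[clojure.string :as str])

(defn page-refs
  "Returns the page names found inside [[double brackets]], in order."
  [line]
  (mapv second (re-seq #"\[\[(.+?)\]\]" line)))

(defn line->datoms
  "Turns one Markdown bullet into [e a v] triples: one content datom
  plus one ref datom per linked page. (Hypothetical helper.)"
  [block-id line]
  (let [content (str/replace line #"^\s*-\s+" "")]
    (into [[block-id :block/content content]]
          (for [page (page-refs content)]
            [block-id :block/refs page]))))

(def parsed
  (line->datoms :block-123
                "- [[Claude Shannon]] invented [[Information Theory]] in 1948"))

;; Every attribute the parser could emit:
(set (map second parsed))
;; => #{:block/content :block/refs}
```

Nothing in the output says who invented what, or that 1948 is a year. "invented" and "1948" survive only as characters inside the `:block/content` string.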

Consequences

Because semantics live in text, not structure:

1. Ambiguous Relationships

- [[Python]] is influenced by [[Lisp]]
- [[Python]] (the snake) lives in [[Asia]]

Logseq can't distinguish Python-the-language from Python-the-snake: both lines link to the same page named "Python." The type of each relationship ("influenced by", "lives in") is lost as well.
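
A small sketch of why the two collide, assuming page identity is derived from the lower-cased page name (`page-id` and `linked-pages` are hypothetical helpers, not Logseq functions):

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

(defn page-id
  "Assumed identity rule: a page is identified by its lower-cased name."
  [page-name]
  (keyword "page" (str/replace (str/lower-case page-name) #"\s+" "-")))

(defn linked-pages
  "Set of page ids linked from one line of Markdown."
  [line]
  (set (map (comp page-id second) (re-seq #"\[\[(.+?)\]\]" line))))

(def language-line "- [[Python]] is influenced by [[Lisp]]")
(def snake-line    "- [[Python]] (the snake) lives in [[Asia]]")

;; Both lines resolve [[Python]] to the same entity.
(set/intersection (linked-pages language-line) (linked-pages snake-line))
;; => #{:page/python}
```

The "(the snake)" hint sits outside the brackets, so it never reaches the identity rule. With typed entities, the language and the animal would simply be two different entity ids.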

2. Lost Metadata

- Met with [[Alice]] on 2024-11-15
  - Discussed [[Project X]]
  - Action items:
    - TODO Follow up with [[Bob]]

This encodes:

  • A meeting event (timestamp, participants)
  • A discussion topic
  • A task (status, assignee, context)

But Logseq just sees blocks with page references. The :meeting/participants, :task/assignee, and :discussion/topics attributes don't exist.

3. Brittle Queries

Want to find "all tasks assigned to Bob"?

;; This returns nothing: no :task/assignee attribute was ever stored
[:find ?task
 :where
 [?task :task/assignee "Bob"]]

;; You have to approximate it with fragile text matching:
;; any block whose text contains "TODO" and that links to Bob
[:find ?block
 :where
 [?block :block/content ?content]
 [(clojure.string/includes? ?content "TODO")]
 [?block :block/refs ?person]
 [?person :block/original-name "Bob"]]

You're grepping, not querying. The structure is gone.
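
Here is a sketch of how that approximation misfires. The blocks are invented sample data and `text-match-tasks` is a hypothetical stand-in for the query above, written as plain Clojure filters:

```clojure
(require '[clojure.string :as str])

(def blocks
  [{:id 1 :content "TODO Follow up with [[Bob]]"            :refs #{"Bob"}}
   {:id 2 :content "[[Bob]] closed every TODO on his list"  :refs #{"Bob"}}
   {:id 3 :content "TODO Ask [[Alice]] to review the draft" :refs #{"Alice"}}])

(defn text-match-tasks
  "Mimics the text-matching query: the content mentions TODO
  and the block links to the given person."
  [blocks person]
  (->> blocks
       (filter #(str/includes? (:content %) "TODO"))
       (filter #(contains? (:refs %) person))
       (mapv :id)))

(text-match-tasks blocks "Bob")
;; => [1 2]
```

Block 2 is a status note, not a task, and it still matches. The query has no way to ask whether Bob is the assignee, because that fact was never stored.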

4. No Schema Evolution

What if you decide "meetings should track duration"?

With datoms:

;; Add a new attribute (d is datascript.core, conn is a DataScript connection)
(d/transact! conn [[:db/add meeting-1 :meeting/duration-minutes 30]])

With Markdown:

- Met with [[Alice]] on 2024-11-15 (duration: 30 min)
  ^-- Hope you remember this convention!
  ^-- Hope you parse it correctly!
  ^-- Hope it doesn't break old queries!

Convention, not structure.
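
A sketch of what "parsing the convention" looks like in practice. The regex encodes one assumed spelling, "(duration: N min)", and `duration-minutes` is a hypothetical helper (JVM Clojure, since it uses `Long/parseLong`):

```clojure
(defn duration-minutes
  "Extracts N from a \"(duration: N min)\" note, or returns nil when the
  line does not follow that exact convention."
  [line]
  (when-let [[_ n] (re-find #"\(duration:\s*(\d+)\s*min\)" line)]
    (Long/parseLong n)))

(duration-minutes "- Met with [[Alice]] on 2024-11-15 (duration: 30 min)")
;; => 30

(duration-minutes "- Met with [[Bob]] on 2024-11-16 (30 minutes)")
;; => nil

(duration-minutes "- Met with [[Carol]] on 2024-11-17 (duration: half an hour)")
;; => nil
```

Two of the three meetings silently drop out of any duration report. Nothing failed; the text just didn't match.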

The Database Version: Logseq's Pivot

The Logseq team knows this is a problem. That's why they built the DB version.

The New Architecture

Logseq DB (alpha, as of late 2024):

  • SQLite + DataScript: Persistent database with in-memory Datalog layer
  • Schema-first: Built-in properties and classes
  • Scriptable: nbb-logseq scripts can read/write datoms directly
  • EDN export: Full graph export as Clojure data structures

This is a fundamental shift:

  • Storage: Markdown → SQLite
  • Schema: Implicit (text conventions) → Explicit (property types)
  • Export: Markdown → EDN (Clojure datoms)

Why This Matters

With the DB version:

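;; Define a schema (illustrative shape, not Logseq DB's literal property format)
{:person/name {:type :string}
 :person/invented {:type :ref :many true}
 :concept/year {:type :number}}

;; Store actual datoms (d is datascript.core, conn is a DataScript connection;
;; claude-shannon and info-theory stand for entity ids)
(d/transact! conn
  [[:db/add claude-shannon :person/name "Claude Shannon"]
   [:db/add claude-shannon :person/invented info-theory]
   [:db/add info-theory :concept/year 1948]])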

Now queries are structural, not textual:

;; Find all concepts invented by people
[:find ?concept ?year ?inventor
 :where
 [?person :person/invented ?concept]
 [?concept :concept/year ?year]
 [?person :person/name ?inventor]]

This is what a knowledge graph should be.
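
To see what "structural" means mechanically, here is the same query written as a hand-rolled join over a set of [e a v] triples. This is a sketch in plain Clojure, not DataScript's engine; keywords stand in for entity ids, and `concepts-with-inventors` is a made-up name.

```clojure
(def datoms
  #{[:claude-shannon :person/name "Claude Shannon"]
    [:claude-shannon :person/invented :info-theory]
    [:info-theory    :concept/year 1948]})

(defn concepts-with-inventors
  "Joins the three query patterns on shared entity ids."
  [datoms]
  (set
   (for [[person a1 concept] datoms
         :when (= a1 :person/invented)
         [c a2 year] datoms
         :when (and (= a2 :concept/year) (= c concept))
         [p a3 inventor] datoms
         :when (and (= a3 :person/name) (= p person))]
     [concept year inventor])))

(concepts-with-inventors datoms)
;; => #{[:info-theory 1948 "Claude Shannon"]}
```

Every condition compares attributes and entity ids. No string is ever searched.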

But There's a Trade-Off

The DB version gains:

  • ✅ True semantic structure
  • ✅ Queryable relationships
  • ✅ Schema evolution
  • ✅ Type safety

But loses:

  • ❌ Plain-text interoperability
  • ❌ Git-friendly diffs
  • ❌ Markdown portability
  • ❌ Simple file system browsing

And there's a deeper issue:

The Sync Problem

Logseq DB stores graphs in:

~/logseq/graphs/GRAPH-NAME/db.sqlite

SQLite is not designed for multi-device sync. A file-sync tool such as Dropbox treats the .sqlite file as one opaque blob, so two devices that edit offline end up with conflicting copies instead of a merged graph.

Logseq's solution: Logseq Sync (a paid cloud service).

But this reintroduces the problem Logseq was supposed to solve: data ownership.

Why Datom.world Would Solve This

The core problem:

Logseq is trying to be two things:

  1. A plain-text note system (Markdown)
  2. A semantic knowledge graph (DataScript)

These are fundamentally incompatible.

Datom.world resolves this by making datoms the storage format:

1. Datoms Are the Source of Truth

Instead of:

Markdown files → Parse → DataScript (in-memory) → Serialize → Markdown

Just:

Datoms → DaoDB → Query

No impedance mismatch. No round-tripping. No lost semantics.

2. Markdown Is a View

You can still render datoms as Markdown:

;; Query for meeting notes
[:find ?block ?content
 :where
 [?meeting :meeting/date "2024-11-15"]
 [?meeting :meeting/participant ?person]
 [?person :person/name "Alice"]
 [?meeting :meeting/notes ?block]
 [?block :block/content ?content]]

;; Render as Markdown
"- Met with [[Alice]] on 2024-11-15
  - Discussed [[Project X]]
  - Action items:
    - TODO Follow up with [[Bob]]"

Markdown becomes a presentation format, not the storage format.
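
A minimal sketch of the rendering step, assuming the query results have already been assembled into a map of block entities with ordered children (the `blocks` shape and the `render` function are hypothetical):

```clojure
(require '[clojure.string :as str])

(def blocks
  "Hypothetical block entities: text plus ordered child ids."
  {1 {:text "Met with [[Alice]] on 2024-11-15" :children [2 3]}
   2 {:text "Discussed [[Project X]]"          :children []}
   3 {:text "Action items:"                    :children [4]}
   4 {:text "TODO Follow up with [[Bob]]"      :children []}})

(defn render
  "Depth-first walk that emits one indented Markdown bullet per block."
  ([blocks id] (render blocks id 0))
  ([blocks id depth]
   (let [{:keys [text children]} (get blocks id)
         indent (apply str (repeat depth "  "))]
     (cons (str indent "- " text)
           (mapcat #(render blocks % (inc depth)) children)))))

;; Prints the same outline shown above.
(println (str/join "\n" (render blocks 1)))
```

The outline is regenerated from the data on demand, so changing the bullet style or indentation is a change to `render`, not to the stored graph.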

3. Multi-Device Sync Without Servers

DaoDB is designed for distributed sync:

  • Append-only datom streams
  • CRDT merge semantics
  • Vector clocks for causality
  • No central server required

Your graph syncs peer-to-peer. No Logseq Sync subscription. No cloud intermediary. Pure local-first.
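
DaoDB's actual merge algorithm isn't specified here, so the following is only a sketch of the simplest CRDT that fits append-only datoms: a grow-only set merged by union. It ignores retractions and vector clocks, and the replica names are invented.

```clojure
(require '[clojure.set :as set])

;; State both devices agreed on before going offline.
(def shared #{[:meeting-1 :meeting/date "2024-11-15"]})

;; Each device appends a different datom while offline.
(def laptop (conj shared [:meeting-1 :meeting/participant :alice]))
(def phone  (conj shared [:meeting-1 :meeting/duration-minutes 30]))

(defn merge-replicas
  "Grow-only set merge: union is commutative, associative, and idempotent,
  so replicas converge regardless of sync order."
  [a b]
  (set/union a b))

(= (merge-replicas laptop phone) (merge-replicas phone laptop))
;; => true

(count (merge-replicas laptop phone))
;; => 3

;; Contrast: file-level sync keeps whichever copy was uploaded last.
(def file-level-sync phone)
(contains? file-level-sync [:meeting-1 :meeting/participant :alice])
;; => false
```

Union keeps both offline edits. Replacing one file with the other loses the laptop's edit without any error.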

4. Schema Evolution Is Native

Add a new attribute:

(transact! [[:db/add meeting-1 :meeting/duration-minutes 30]])

Old queries still work. New queries use the new attribute. No migration.
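
A quick sketch of that claim over plain [e a v] triples (sample data and function names are invented for illustration):

```clojure
(def db-v1
  #{[:meeting-1 :meeting/date "2024-11-15"]
    [:meeting-2 :meeting/date "2024-11-20"]})

(defn meeting-dates
  "The 'old' query: every meeting with its date."
  [db]
  (set (for [[e a v] db :when (= a :meeting/date)] [e v])))

;; Introduce the new attribute by asserting one more datom.
(def db-v2 (conj db-v1 [:meeting-1 :meeting/duration-minutes 30]))

(defn durations
  "The 'new' query: durations for the meetings that have one."
  [db]
  (into {} (for [[e a v] db :when (= a :meeting/duration-minutes)] [e v])))

(= (meeting-dates db-v1) (meeting-dates db-v2))
;; => true

(durations db-v2)
;; => {:meeting-1 30}
```

Meetings without a duration are simply absent from the new query's result. There are no rows to backfill and no text to re-parse.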

5. Queryable Provenance

Datoms track transaction time:

;; Find when I first linked Python to Lisp
[:find ?tx-time
 :where
 [?python :concept/influenced-by ?lisp ?tx]
 [?tx :db/txInstant ?tx-time]
 [?python :concept/name "Python"]
 [?lisp :concept/name "Lisp"]]

Markdown has no direct equivalent. Git history tracks text changes, not semantic ones, so the best you can do is search old diffs for a string and hope the wording never changed.
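
For illustration, here is the same lookup over hand-written [e a v tx] tuples. The data, the timestamps, and the `linked-at` helper are all made up; a real store would answer this through its query engine.

```clojure
(def datoms
  "Hypothetical [e a v tx] datoms; each tx entity carries a :db/txInstant."
  [[:python :concept/name "Python" :tx-1]
   [:lisp   :concept/name "Lisp"   :tx-1]
   [:tx-1   :db/txInstant #inst "2024-03-01T09:00:00Z" :tx-1]
   [:python :concept/influenced-by :lisp :tx-2]
   [:tx-2   :db/txInstant #inst "2024-11-15T18:30:00Z" :tx-2]])

(defn linked-at
  "Returns the instant of the transaction that asserted [from attr to],
  or nil if that fact was never asserted."
  [datoms from attr to]
  (first
   (for [[e a v tx] datoms
         :when (= [e a v] [from attr to])
         [t a2 instant] datoms
         :when (and (= t tx) (= a2 :db/txInstant))]
     instant)))

;; Returns the instant recorded on :tx-2.
(linked-at datoms :python :concept/influenced-by :lisp)
```

The answer comes from the fact itself, so it stays correct even if the sentence that first expressed the link was later reworded or moved to another page.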

The Obsidian Parallel

Andrew Ng recently praised Obsidian for its local-first Markdown storage.

Obsidian is excellent for what it is: a text editor that understands links.

But Obsidian doesn't claim to be a knowledge graph. It's honestly just Markdown.

Logseq wants to be more. It uses DataScript. It has Datalog queries. It has graph visualization.

It deserves datoms, not Markdown.

What Logseq Could Become

Imagine if Logseq:

  • Stored datoms natively (via DaoDB)
  • Rendered Markdown as a view (editable)
  • Synced via CRDT merge (no cloud)
  • Allowed schema-on-write (first write defines type)
  • Supported AST-level editing (semantic operations)

You'd have:

  1. True knowledge graph: semantic relationships, not text conventions
  2. Local-first sync: peer-to-peer, no servers
  3. Markdown compatibility: export/import, but not storage
  4. Time-travel queries: full history via transaction IDs
  5. User-defined types: meetings, tasks, concepts, books, people

This would be what Logseq promises but can't fully deliver.

The Lesson: Storage Format Is Destiny

Logseq is a testament to Clojure's power. DataScript is elegant. The UI is beautiful. The vision is right.

But Markdown files fundamentally limit what's possible.

You cannot build a true knowledge graph on top of text files.

You can approximate one. You can simulate one. But you can't have one.

Because:

  • Text has no inherent structure (only conventions)
  • Parsing is lossy (semantics trapped in strings)
  • Round-tripping is fragile (serialize/deserialize introduces drift)
  • Sync is brittle (text diffs aren't semantic diffs)
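
The round-tripping point is easy to demonstrate. The sketch below uses a toy parser and serializer (not Logseq's) that keep only what the data model needs, nesting depth and text, and write blocks back in one canonical style:

```clojure
(defn parse-line
  "Keeps only nesting depth and text; bullet style and spacing are discarded."
  [line]
  (let [[_ indent text] (re-matches #"(\s*)[-*+]\s+(.*?)\s*" line)]
    {:depth (quot (count indent) 2) :text text}))

(defn serialize-line
  "Writes a block back using a single canonical bullet style."
  [{:keys [depth text]}]
  (str (apply str (repeat depth "  ")) "- " text))

(def original "*   Reading [[The Information]]  ")
(def round-tripped (serialize-line (parse-line original)))

round-tripped
;; => "- Reading [[The Information]]"

(= original round-tripped)
;; => false
```

The meaning is unchanged, but the bytes are not, so the file on disk shows up as modified in Git and in any file-sync tool even though the user edited nothing.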

Storage format is destiny.

If you store Markdown, you get a text editor with links.

If you store datoms, you get a queryable knowledge graph.

Logseq's Path Forward

The DB version shows Logseq understands this. They're moving in the right direction.

But they're constrained by:

  • Legacy Markdown users
  • SQLite's sync limitations
  • Desire for cloud revenue (Logseq Sync)

A better path:

  1. Adopt datoms as native storage (via DaoDB or similar)
  2. Keep Markdown as export/import (for portability)
  3. Enable local-first CRDT sync (no servers)
  4. Let users define schemas (schema-on-write)
  5. Expose Datalog as first-class (not hidden behind UI)

This would make Logseq what it should be: a true knowledge graph, locally owned, infinitely queryable.

Conclusion: The Right Tool for the Right Job

If you want:

  • Plain-text notes with Git versioning → Obsidian
  • Block-based outliner with Markdown export → Logseq (file-based)
  • Queryable knowledge graph with text rendering → Logseq DB (getting there)
  • True semantic graph, locally synced, datom-native → Datom.world

Logseq is amazing for what it is. But it's trapped between two worlds:

  • The simplicity of Markdown
  • The power of Datalog

You can't fully have both.

Datom.world makes a choice: datoms first. Everything else - text, UI, visualizations - is a view.

That's the only way to build a real knowledge graph.

Note: This critique comes from deep respect for Logseq's vision. The DB version is a major step forward. The tension described here is not a flaw - it's the natural consequence of trying to bridge two incompatible paradigms. Logseq's evolution toward database-first storage validates the core thesis: knowledge graphs need datoms, not Markdown.