Logseq's Knowledge Graph Paradox: Why Markdown Files Aren't Datoms
Logseq is a beautiful product. Built with Clojure/ClojureScript. Uses DataScript for queries. Claims to be a "knowledge graph." It should be perfect.
But there's a fundamental architectural tension at its core.
Logseq tries to be a knowledge graph while storing data as Markdown files.
And that tension explains both its strengths and its limitations.
What Logseq Got Right
Logseq made excellent technology choices:
- ClojureScript: Functional, immutable, Lisp
- DataScript: In-memory Datalog database
- Block-based outliner: Everything is a block, blocks reference each other
- Local-first: Files stored on disk, not in cloud
- Privacy-focused: No tracking, no data collection
The vision is clear: your notes as a queryable knowledge graph.
You write:
- Reading [[The Information]] by James Gleick
- Core insight: [[Information Theory]] connects to [[Entropy]]
- Related: [[Claude Shannon]]'s work on [[Communication]]
Logseq should understand:
- "The Information" is a book
- "Information Theory" is a concept
- "Claude Shannon" is a person
- These entities are related via semantic links
You should be able to query:
;; Find all books related to Information Theory
[:find ?book
:where
[?b :block/content ?book]
[?b :block/refs ?concept]
[?concept :block/original-name "Information Theory"]]
This is the promise. And it almost works.
The Markdown Impedance Mismatch
Architecture: Two Worlds Colliding
Logseq's original architecture (file-based graphs):
- Storage layer: Markdown files on disk
- Parse layer: Read Markdown, extract blocks and links
- Database layer: Load parsed data into DataScript (in-memory)
- Query layer: Datalog queries over in-memory graph
- Write layer: Serialize back to Markdown on disk
The problem:
Markdown is not a data format. It's a text format with conventions.
DataScript expects datoms: [e a v t]
Markdown gives you: lines of text with [[brackets]]
What Gets Lost in Translation
When you write:
- [[Claude Shannon]] invented [[Information Theory]] in 1948
You mean:
[claude-shannon :person/invented information-theory]
[information-theory :concept/year 1948]
[claude-shannon :entity/type :person]
[information-theory :entity/type :concept]
But Logseq sees:
[block-123 :block/content "[[Claude Shannon]] invented [[Information Theory]] in 1948"]
[block-123 :block/refs page-shannon]
[block-123 :block/refs page-info-theory]
[page-shannon :block/original-name "Claude Shannon"]
[page-info-theory :block/original-name "Information Theory"]
The semantic relationship is trapped inside the text.
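To make the loss concrete, here is a minimal Python sketch of what a parse layer can recover from that one line. It is not Logseq's actual parser: the `block_to_datoms` helper and the regex are assumptions for illustration, and page references are kept as plain names rather than entity ids.

```python
import re

# Hypothetical, simplified stand-in for a Markdown parse layer.
# It can only see what the text syntax marks up: [[page references]].
REF_PATTERN = re.compile(r"\[\[([^\[\]]+)\]\]")

def block_to_datoms(block_id, text):
    """Return the (entity, attribute, value) facts recoverable from raw text."""
    datoms = [(block_id, ":block/content", text)]
    for name in REF_PATTERN.findall(text):
        # One datom per referenced page; the verb between them is not captured.
        datoms.append((block_id, ":block/refs", name))
    return datoms

datoms = block_to_datoms(
    "block-123",
    "[[Claude Shannon]] invented [[Information Theory]] in 1948",
)
attributes = {a for _, a, _ in datoms}
```

The only attributes that come out are `:block/content` and `:block/refs`. Nothing like `:person/invented` or `:concept/year` can appear, because nothing in the text syntax marks "invented" or "1948" as data.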
Consequences
Because semantics live in text, not structure:
1. Ambiguous Relationships
- [[Python]] is influenced by [[Lisp]]
- [[Python]] (the snake) lives in [[Asia]]
Logseq can't distinguish between Python-the-language and Python-the-snake. Both are pages named "Python." The type of relationship is lost.
2. Lost Metadata
- Met with [[Alice]] on 2024-11-15
  - Discussed [[Project X]]
  - Action items:
    - [[TODO]] Follow up with [[Bob]]
This encodes:
- A meeting event (timestamp, participants)
- A discussion topic
- A task (status, assignee, context)
But Logseq just sees blocks with page references. The :meeting/participants, :task/assignee, and :discussion/topics attributes don't exist.
3. Brittle Queries
Want to find "all tasks assigned to Bob"?
;; This doesn't work
[:find ?task
:where
[?task :task/assignee "Bob"]]
;; You have to do this (fragile text matching)
[:find ?block
:where
[?block :block/content ?content]
[(clojure.string/includes? ?content "TODO")]
[?block :block/refs ?person]
[?person :block/original-name "Bob"]]
You're grepping, not querying. The structure is gone.
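A small Python sketch shows why the text-matching version is fragile. The block records and the `:task/*` attributes below are hypothetical, chosen only to contrast the two approaches.

```python
# Hypothetical parsed blocks: raw content plus extracted page references.
blocks = [
    {"id": 1, "content": "TODO Follow up with [[Bob]]", "refs": ["Bob"]},
    {"id": 2, "content": "[[Bob]] recommended a TODO app", "refs": ["Bob"]},
    {"id": 3, "content": "Ask [[Bob]] about the budget", "refs": ["Bob"]},
]

def tasks_by_text(blocks, person):
    """Text matching: any block that contains 'TODO' and references the person."""
    return [b["id"] for b in blocks
            if "TODO" in b["content"] and person in b["refs"]]

# The same task expressed as explicit attributes (assumed schema).
datoms = [
    (1, ":task/status", "todo"),
    (1, ":task/assignee", "Bob"),
]

def tasks_by_attribute(datoms, person):
    """Structural lookup: only entities that actually have an assignee."""
    return [e for e, a, v in datoms if a == ":task/assignee" and v == person]

text_hits = tasks_by_text(blocks, "Bob")        # [1, 2]
attr_hits = tasks_by_attribute(datoms, "Bob")   # [1]
```

Block 2 is a false positive: it mentions Bob and the word "TODO" but is not a task. The structural lookup cannot make that mistake, because assignment is a fact rather than a coincidence of substrings.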
4. No Schema Evolution
What if you decide "meetings should track duration"?
With datoms:
;; Add new attribute
(transact! [[:db/add meeting-1 :meeting/duration-minutes 30]])
With Markdown:
- Met with [[Alice]] on 2024-11-15 (duration: 30 min)
^-- Hope you remember this convention!
^-- Hope you parse it correctly!
^-- Hope it doesn't break old queries!
Convention, not structure.
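As a rough illustration of the datom side of that comparison, the sketch below models facts as plain Python tuples. The attribute names follow the examples above, and the `meetings_with` helper is hypothetical. Adding an attribute is just appending one more fact, so queries written before the change keep returning the same results.

```python
# Facts as (entity, attribute, value) tuples.
datoms = [
    ("meeting-1", ":meeting/date", "2024-11-15"),
    ("meeting-1", ":meeting/participant", "Alice"),
]

def meetings_with(datoms, person):
    """A query written before durations existed."""
    return sorted({e for e, a, v in datoms
                   if a == ":meeting/participant" and v == person})

before = meetings_with(datoms, "Alice")

# "Schema evolution": assert one new fact, no rewrite of existing data.
datoms.append(("meeting-1", ":meeting/duration-minutes", 30))

after = meetings_with(datoms, "Alice")
durations = {e: v for e, a, v in datoms if a == ":meeting/duration-minutes"}
```

The old query ignores the new attribute entirely, while a new query can read it directly without parsing "(duration: 30 min)" out of a string.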
The Database Version: Logseq's Pivot
The Logseq team knows this is a problem. That's why they built the DB version.
The New Architecture
Logseq DB (alpha, as of late 2024):
- SQLite + DataScript: Persistent database with in-memory Datalog layer
- Schema-first: Built-in properties and classes
- Scriptable: nbb-logseq scripts can read/write datoms directly
- EDN export: Full graph export as Clojure data structures
This is a fundamental shift:
- Storage: Markdown → SQLite
- Schema: Implicit (text conventions) → Explicit (property types)
- Export: Markdown → EDN (Clojure datoms)
Why This Matters
With the DB version:
;; Define a schema
{:person/name {:type :string}
:person/invented {:type :ref :many true}
:concept/year {:type :number}}
;; Store actual datoms
(transact!
[[:db/add claude-shannon :person/name "Claude Shannon"]
[:db/add claude-shannon :person/invented info-theory]
[:db/add info-theory :concept/year 1948]])
Now queries are structural, not textual:
;; Find all concepts invented by people
[:find ?concept ?year ?inventor
:where
[?person :person/invented ?concept]
[?concept :concept/year ?year]
[?person :person/name ?inventor]]
This is what a knowledge graph should be.
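For readers less familiar with Datalog, the query above amounts to a join across three attributes. The Python sketch below spells that join out over plain tuples. It is an illustration of the idea, not how DataScript evaluates queries, and the `values` and `inventions` helpers are made up for this example.

```python
# Facts as (entity, attribute, value) tuples, mirroring the transaction above.
datoms = [
    ("claude-shannon", ":person/name", "Claude Shannon"),
    ("claude-shannon", ":person/invented", "info-theory"),
    ("info-theory", ":concept/year", 1948),
]

def values(datoms, entity, attribute):
    """All values of one attribute on one entity."""
    return [v for e, a, v in datoms if e == entity and a == attribute]

def inventions(datoms):
    """Join :person/invented with :concept/year and :person/name."""
    rows = []
    for person, attr, concept in datoms:
        if attr != ":person/invented":
            continue
        for year in values(datoms, concept, ":concept/year"):
            for name in values(datoms, person, ":person/name"):
                rows.append((concept, year, name))
    return rows

rows = inventions(datoms)
```

No string is inspected at any point: the join follows entity references, which is exactly what the text-based representation could not offer.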
But There's a Trade-Off
The DB version gains:
- ✅ True semantic structure
- ✅ Queryable relationships
- ✅ Schema evolution
- ✅ Type safety
But loses:
- ❌ Plain-text interoperability
- ❌ Git-friendly diffs
- ❌ Markdown portability
- ❌ Simple file system browsing
And there's a deeper issue:
The Sync Problem
Logseq DB stores graphs in:
~/logseq/graphs/GRAPH-NAME/db.sqlite
SQLite is not designed for multi-device sync. You can't just Dropbox a .sqlite file and expect it to merge correctly.
Logseq's solution: Logseq Sync (a paid cloud service).
But this reintroduces the problem Logseq was supposed to solve: data ownership.
Why Datom.world Would Solve This
The core problem:
Logseq is trying to be two things:
- A plain-text note system (Markdown)
- A semantic knowledge graph (DataScript)
These are fundamentally incompatible.
Datom.world resolves this by making datoms the storage format:
1. Datoms Are the Source of Truth
Instead of:
Markdown files → Parse → DataScript (in-memory) → Serialize → Markdown
Just:
Datoms → DaoDB → Query
No impedance mismatch. No round-tripping. No lost semantics.
2. Markdown Is a View
You can still render datoms as Markdown:
;; Query for meeting notes
[:find ?block ?content
:where
[?meeting :meeting/date "2024-11-15"]
[?meeting :meeting/participant ?person]
[?person :person/name "Alice"]
[?meeting :meeting/notes ?block]
[?block :block/content ?content]]
;; Render as Markdown
"- Met with [[Alice]] on 2024-11-15
  - Discussed [[Project X]]
  - Action items:
    - TODO Follow up with [[Bob]]"
Markdown becomes a presentation format, not the storage format.
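The rendering step can be sketched in a few lines of Python. The record shape and the `render_meeting` function are assumptions made for this example, not part of Logseq or Datom.world. The point is only that the outline is derived from structured data, never parsed back out of it.

```python
def render_meeting(meeting):
    """Render a structured meeting record as a Markdown outline."""
    lines = [f"- Met with [[{meeting['participant']}]] on {meeting['date']}"]
    for topic in meeting["topics"]:
        lines.append(f"  - Discussed [[{topic}]]")
    if meeting["tasks"]:
        lines.append("  - Action items:")
        for task in meeting["tasks"]:
            lines.append(f"    - TODO {task['title']} [[{task['assignee']}]]")
    return "\n".join(lines)

# Hypothetical query result, already assembled into one record.
meeting = {
    "date": "2024-11-15",
    "participant": "Alice",
    "topics": ["Project X"],
    "tasks": [{"title": "Follow up with", "assignee": "Bob"}],
}
markdown = render_meeting(meeting)
```

Changing the renderer changes every note's appearance at once, and the underlying facts stay untouched.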
3. Multi-Device Sync Without Servers
DaoDB is designed for distributed sync:
- Append-only datom streams
- CRDT merge semantics
- Vector clocks for causality
- No central server required
Your graph syncs peer-to-peer. No Logseq Sync subscription. No cloud intermediary. Pure local-first.
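To see why append-only facts merge so easily, consider the simplest CRDT, a grow-only set. The Python sketch below is a toy under stated assumptions: it is not DaoDB's actual protocol, it ignores retractions and vector clocks, and the replica names and `(replica, counter)` transaction ids are invented for the example.

```python
# Each replica holds an append-only set of (entity, attribute, value, tx) facts.
# tx is a (replica, counter) pair so ids never collide across devices.
laptop = {
    ("note-1", ":block/content", "Read The Information", ("laptop", 1)),
    ("note-1", ":block/tag", "books", ("laptop", 2)),
}
phone = {
    ("note-1", ":block/content", "Read The Information", ("laptop", 1)),
    ("note-2", ":block/content", "Call Alice", ("phone", 1)),
}

def merge(a, b):
    """Grow-only set merge: the union of all facts either side has seen."""
    return a | b

merged_lp = merge(laptop, phone)
merged_pl = merge(phone, laptop)
merged_again = merge(merged_lp, phone)
```

Set union is commutative, associative, and idempotent, so replicas converge regardless of the order or number of times they exchange facts. A binary SQLite file offers no comparable merge operation, which is why syncing it through a file service is risky.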
4. Schema Evolution Is Native
Add a new attribute:
(transact! [[:db/add meeting-1 :meeting/duration-minutes 30]])
Old queries still work. New queries use the new attribute. No migration.
5. Queryable Provenance
Datoms track transaction time:
;; Find when I first linked Python to Lisp
[:find ?tx-time
:where
[?python :concept/influenced-by ?lisp ?tx]
[?tx :db/txInstant ?tx-time]
[?python :concept/name "Python"]
[?lisp :concept/name "Lisp"]]
This is impossible with Markdown. Git history doesn't track semantic changes, only text changes.
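The mechanics behind that query are simple once every fact carries a transaction id. Here is a hedged Python sketch: the tuple layout, the sample data, and the `first_linked` helper are assumptions for illustration, and it presumes transaction ids increase over time.

```python
from datetime import datetime, timezone

# Datoms as (entity, attribute, value, tx). Transactions are entities too,
# each carrying its own :db/txInstant.
datoms = [
    ("python", ":concept/name", "Python", 100),
    ("lisp", ":concept/name", "Lisp", 100),
    ("python", ":concept/influenced-by", "lisp", 205),
    (100, ":db/txInstant", datetime(2024, 1, 3, tzinfo=timezone.utc), 100),
    (205, ":db/txInstant", datetime(2024, 3, 9, tzinfo=timezone.utc), 205),
]

def entity_named(datoms, name):
    """Resolve a concept name to its entity id."""
    return next(e for e, a, v, _ in datoms
                if a == ":concept/name" and v == name)

def first_linked(datoms, source_name, target_name):
    """Timestamp of the earliest transaction asserting the link."""
    source = entity_named(datoms, source_name)
    target = entity_named(datoms, target_name)
    txs = [tx for e, a, v, tx in datoms
           if e == source and a == ":concept/influenced-by" and v == target]
    first_tx = min(txs)  # assumes tx ids are monotonically increasing
    return next(v for e, a, v, _ in datoms
                if e == first_tx and a == ":db/txInstant")

when = first_linked(datoms, "Python", "Lisp")
```

The answer comes from the link fact itself, not from scanning file revisions for the first commit in which two page names happened to share a line.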
The Obsidian Parallel
Andrew Ng recently praised Obsidian for its local-first Markdown storage.
Obsidian is excellent for what it is: a text editor that understands links.
But Obsidian doesn't claim to be a knowledge graph. It's honestly just Markdown.
Logseq wants to be more. It uses DataScript. It has Datalog queries. It has graph visualization.
It deserves datoms, not Markdown.
What Logseq Could Become
Imagine if Logseq:
- Stored datoms natively (via DaoDB)
- Rendered Markdown as a view (editable)
- Synced via CRDT merge (no cloud)
- Allowed schema-on-write (first write defines type)
- Supported AST-level editing (semantic operations)
You'd have:
- True knowledge graph: semantic relationships, not text conventions
- Local-first sync: peer-to-peer, no servers
- Markdown compatibility: export/import, but not storage
- Time-travel queries: full history via transaction IDs
- User-defined types: meetings, tasks, concepts, books, people
This would be what Logseq promises but can't fully deliver.
The Lesson: Storage Format Is Destiny
Logseq is a testament to Clojure's power. DataScript is elegant. The UI is beautiful. The vision is right.
But Markdown files fundamentally limit what's possible.
You cannot build a true knowledge graph on top of text files.
You can approximate one. You can simulate one. But you can't be one.
Because:
- Text has no inherent structure (only conventions)
- Parsing is lossy (semantics trapped in strings)
- Round-tripping is fragile (serialize/deserialize introduces drift)
- Sync is brittle (text diffs aren't semantic diffs)
Storage format is destiny.
If you store Markdown, you get a text editor with links.
If you store datoms, you get a queryable knowledge graph.
Logseq's Path Forward
The DB version shows Logseq understands this. They're moving in the right direction.
But they're constrained by:
- Legacy Markdown users
- SQLite's sync limitations
- Desire for cloud revenue (Logseq Sync)
A better path:
- Adopt datoms as native storage (via DaoDB or similar)
- Keep Markdown as export/import (for portability)
- Enable local-first CRDT sync (no servers)
- Let users define schemas (schema-on-write)
- Expose Datalog as first-class (not hidden behind UI)
This would make Logseq what it should be: a true knowledge graph, locally owned, infinitely queryable.
Conclusion: The Right Tool for the Right Job
If you want:
- Plain-text notes with Git versioning → Obsidian
- Block-based outliner with Markdown export → Logseq (file-based)
- Queryable knowledge graph with text rendering → Logseq DB (getting there)
- True semantic graph, locally synced, datom-native → Datom.world
Logseq is amazing for what it is. But it's trapped between two worlds:
- The simplicity of Markdown
- The power of Datalog
You can't fully have both.
Datom.world makes a choice: datoms first. Everything else - text, UI, visualizations - is a view.
That's the only way to build a real knowledge graph.
Learn More
- DaoDB: Distributed Datalog database for datom storage
- Structure vs Interpretation: Why storage format matters
- Datoms as Streams: Local-first sync architecture
- When the IDE Edits AST, Not Text: Semantic editing principles
- The Coming AI Data Crisis: Why data ownership matters
- Datom.world: The datom-native ecosystem