Use case: migrate a markdown knowledge base

You have a knowledge base in markdown — an engineering wiki, a docs folder, an Obsidian vault, a Notion export. It’s searchable as text but flat: the links between notes, the entities they describe, and their metadata aren’t queryable.

This guide migrates that markdown into little big brain so it becomes a temporal knowledge graph with hybrid search. The migration itself is an agent task driven through the MCP tool belt: an agent (Claude in Claude Code or Cursor, or any MCP client) reads each file with its own file tools and writes typed facts with little big brain’s tools. Afterward you retrieve, query, and reason over the result.

Connect the MCP server to your agent and point it at a graph for the KB (e.g. kb). For a hosted stack:

{
  "mcpServers": {
    "lbb": { "url": "https://mcp.littlebigbrain.com/mcp/<your-stack-slug>?graph=kb" }
  }
}

  1. Model the domain (define an ontology). You don’t have to invent the schema by hand. The agent can already read the files, so the fastest start is to have it propose an ontology from the KB itself for you to review and refine.

    A prompt like:

    Prompt: propose an ontology
    Read a representative sample of the markdown under docs/ — a dozen or so
    files across the different sections — and propose an ontology for a "kb"
    graph: the entity types and the relations between them, in the shape the
    lbb_configure define_ontology tool expects. Keep it small and purposeful
    (about 5–15 entity types), use the vocabulary the docs already use, and make
    the relations things I'd actually want to query or traverse. For each type
    and relation, cite an example file that motivates it, and flag any facts in
    the sample the ontology can't yet express. Show me the proposal for review
    before defining anything.

    Tips for getting a good ontology out of the model:

    • Sample, don’t dump. Have it read a diverse slice across sections and doc kinds — enough to see the domain’s shape without burning its whole context on every file.
    • Ask for the define_ontology shape directly, so the proposal is runnable rather than prose you have to translate.
    • Push for a small vocabulary. Coarse entity types plus typed attributes beat one type per noun. A kind, status, or tag almost always belongs as a property on an entity, not as its own type — tell the model to prefer attributes and only mint a type for things it will connect with relations.
    • Keep genuine subtypes you’ll want to reason over separately (e.g. Runbook as a specialization of Doc). Those pay off via type-closure reasoning — a query for Doc returns runbooks too, while you can still filter to just runbooks.
    • Model relations as verbs you’ll traverse (OWNED_BY, DEPENDS_ON, LINKS_TO), not every incidental co-occurrence. Ask which relations map to questions you actually want to answer.
    • Make it justify itself and check coverage. Requiring a source file per type/relation grounds the proposal and curbs invented types; the list of facts it can’t express tells you what to add or merge.
    • Review before you commit. Once defined, the ontology can only be extended: you can additively evolve it later (widen a relation’s domain or add types) or start a fresh graph, but you can’t rename or delete in place. Iterate on the proposal, prune and merge, then define.

    Once you’re happy with the proposal, define it. For an engineering wiki the result might be:

    // lbb_configure
    {
      "action": "define_ontology",
      "graph": "kb",
      "entity_types": [
        { "name": "Doc" }, { "name": "Service" }, { "name": "Team" },
        { "name": "Concept" }, { "name": "Runbook" }
      ],
      "relations": [
        { "name": "DOCUMENTS", "source": ["Doc"], "target": ["Service", "Concept"] },
        { "name": "OWNED_BY", "source": ["Doc", "Service"], "target": ["Team"] },
        { "name": "LINKS_TO", "source": ["Doc"], "target": ["Doc"], "reducer": "append_only" },
        { "name": "MENTIONS", "source": ["Doc"], "target": ["Concept", "Service"], "reducer": "append_only" }
      ]
    }

    Confirm it landed with lbb_inspect action: "ontology" before you start committing facts.

  2. Drive the migration. Point the agent at your markdown folder and let it loop over the files. For each file, it reads the prose, extracts entities and the relations between them, and calls lbb_commit. A prompt like:

    Prompt: run the migration
    Migrate every markdown file under docs/ into the "kb" graph. For each file,
    create a Doc entity keyed by its path, store the prose as evidence, capture
    front-matter (title, owner, tags, updated date) as entity properties, and
    turn [[wiki-links]] and markdown links into LINKS_TO edges and concept/service
    mentions into MENTIONS/DOCUMENTS edges. Use lbb_index when you're done.

    Each file becomes one lbb_commit. The prose goes in as evidence (so the full text is searchable and every fact keeps its provenance), front-matter becomes typed entity properties, and links become edges:

    // lbb_commit — one markdown file → typed facts
    {
      "graph": "kb",
      "triplets": [
        { "source": { "type": "Doc", "name": "Billing Service Overview" },
          "relation": "DOCUMENTS",
          "target": { "type": "Service", "name": "billing-service" },
          "evidence": "The billing service reads customer identity from user-db and charges via the payments provider." },
        { "source": { "type": "Doc", "name": "Billing Service Overview" },
          "relation": "LINKS_TO",
          "target": { "type": "Doc", "name": "Payments Runbook" },
          "evidence": "See [[Payments Runbook]] for on-call steps." },
        { "source": { "type": "Doc", "name": "Billing Service Overview" },
          "relation": "OWNED_BY",
          "target": { "type": "Team", "name": "Payments Team" } }
      ],
      "entity_properties": [
        { "entity": { "type": "Doc", "name": "Billing Service Overview" },
          "key": "docs/billing/overview.md",
          "properties": {
            "path": "docs/billing/overview.md",
            "title": "Billing Service Overview",
            "owner": "Payments Team",
            "tags": ["billing", "payments"],
            "updated_at": "2026-06-30"
          } }
      ],
      "edge_idempotency": "skip_unchanged"
    }

    You don’t hand-write these — the agent generates them from your prose. A few patterns worth prompting for:

    • Key each Doc by its file path so re-migrating updates the same entity instead of creating duplicates.
    • Preserve the link graph. [[wiki-links]] and markdown links become LINKS_TO edges, and concept or service mentions become MENTIONS edges. This is what turns a pile of files into a navigable graph.
    • Keep the prose as evidence, and promote the claims worth querying (a service dependency, an owner, a concept) to typed relations.
  3. Build indexes. Refresh the persisted BM25, vector, and adjacency indexes so search and traversal cover everything you just wrote:

    // lbb_index
    { "graph": "kb" }

    (On the hosted product, base indexes also build automatically.)
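
To make the per-file mapping in step 2 concrete, here is a minimal Python sketch of what the agent effectively does for one file: split off the front-matter, turn [[wiki-links]] into LINKS_TO triplets, and key the Doc by its path. The payload shape is copied from the lbb_commit example above; the helper names (split_front_matter, build_commit) and the flat "key: value" front-matter parsing are assumptions made for illustration, not part of little big brain. It only builds the JSON payload; sending it is left to your MCP client.

```python
import json
import re

# Captures the page name in [[Page]], [[Page|alias]] and [[Page#Section]].
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def split_front_matter(text):
    """Split '---'-delimited front-matter from the body.

    Assumption: front-matter is flat 'key: value' lines, with lists written
    inline as [a, b]. Use a real YAML parser for anything richer.
    """
    meta = {}
    if not text.startswith("---\n"):
        return meta, text
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:  # no closing delimiter: treat the whole file as body
        return meta, text
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if not colon:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            meta[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            meta[key.strip()] = value
    return meta, body


def build_commit(path, text, graph="kb"):
    """Build an lbb_commit-shaped payload (hypothetical helper) for one file."""
    meta, body = split_front_matter(text)
    doc = {"type": "Doc", "name": meta.get("title", path)}
    triplets = []
    for line in body.splitlines():
        for target in WIKI_LINK.findall(line):
            triplets.append({
                "source": doc,
                "relation": "LINKS_TO",
                "target": {"type": "Doc", "name": target.strip()},
                "evidence": line.strip(),  # the sentence that contains the link
            })
    if "owner" in meta:
        triplets.append({
            "source": doc,
            "relation": "OWNED_BY",
            "target": {"type": "Team", "name": meta["owner"]},
        })
    return {
        "graph": graph,
        "triplets": triplets,
        # Keyed by path so re-migrating updates the same entity.
        "entity_properties": [
            {"entity": doc, "key": path, "properties": {"path": path, **meta}}
        ],
        "edge_idempotency": "skip_unchanged",
    }


if __name__ == "__main__":
    sample = (
        "---\ntitle: Billing Service Overview\nowner: Payments Team\n"
        "tags: [billing, payments]\n---\n"
        "See [[Payments Runbook]] for on-call steps.\n"
    )
    print(json.dumps(build_commit("docs/billing/overview.md", sample), indent=2))
```

A script like this only covers the mechanical parts (front-matter and explicit links). Extracting DOCUMENTS and MENTIONS facts from prose is the part that still benefits from the agent reading each file.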

Your markdown is now a queryable graph. The same MCP tool belt — or the SDKs from your own app — reads it back.

Hybrid search answers natural-language questions and returns the docs and the entities/relations behind them, each with its source evidence:

// lbb_search
{ "graph": "kb", "query": "how does the billing service authenticate customers?" }
  • Reason over it. Because Runbook specializes Doc, a query for Doc surfaces runbooks too via type closure; rules can entail transitive DEPENDS_ON across services the docs describe.
  • Use it as retrieval for an app or agent. Point the TypeScript or Python SDK at the kb graph for graph-aware RAG, or hand it to an agent as long-term memory.
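
The two reasoning behaviours mentioned above, type closure and transitive relations, can be pictured with a small self-contained Python sketch. This is a conceptual illustration only, not how little big brain implements reasoning: the SUBTYPE_OF, ENTITIES, and DEPENDS_ON tables are hypothetical in-memory stand-ins for the graph, and the dependency data is made up.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the graph (illustration only).
SUBTYPE_OF = {"Runbook": "Doc"}  # Runbook specializes Doc; assumed acyclic
ENTITIES = {
    "Billing Service Overview": "Doc",
    "Payments Runbook": "Runbook",
    "billing-service": "Service",
}
DEPENDS_ON = {
    "billing-service": ["user-db", "payments-provider"],
    "user-db": ["postgres"],
}


def is_a(entity_type, wanted):
    """True if entity_type equals wanted or specializes it, at any depth."""
    current = entity_type
    while current is not None:
        if current == wanted:
            return True
        current = SUBTYPE_OF.get(current)
    return False


def query_by_type(wanted):
    """Type closure: asking for a type also returns its subtypes."""
    return sorted(name for name, t in ENTITIES.items() if is_a(t, wanted))


def transitive_dependencies(service):
    """Breadth-first walk that follows DEPENDS_ON edges to any depth."""
    seen = set()
    queue = deque(DEPENDS_ON.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(DEPENDS_ON.get(dep, []))
    return sorted(seen)


if __name__ == "__main__":
    print(query_by_type("Doc"))      # includes the runbook
    print(query_by_type("Runbook"))  # only the runbook
    print(transitive_dependencies("billing-service"))
```

The point of the sketch is the asymmetry: querying the parent type widens the result, while querying the subtype narrows it, which is why genuine subtypes are worth keeping in the ontology.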

Dropping the files into a vector store gives you similarity search and nothing else. Migrating into little big brain keeps what that throws away:

  • Structure. Links become edges, so you can traverse (“what does this doc depend on?”), not just match similar chunks.
  • Provenance. Every fact cites the file it came from — answers are traceable, not black-box.
  • Typed metadata. Owners, tags, and dates are queryable attributes, so exact filters and aggregates work (“runbooks owned by Payments, updated this quarter”).
  • Temporal. Re-migrate over time and the graph keeps history — you can see how a doc’s facts changed and pin a query to a past snapshot.
  • Hybrid ranking. Lexical (BM25), vector, and graph-neighborhood signals are fused, so recall doesn’t hinge on embeddings alone.

Re-run the same migration whenever the KB changes. Content-based idempotency keys and edge_idempotency: "skip_unchanged" keep it clean: unchanged files are no-ops, edited files add new evidence and update properties, and the temporal history records the change. A scheduled job that migrates changed files and calls lbb_index keeps the graph current.
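
For the scheduled job, one way to find the changed files is to keep a manifest of content hashes on the client side and diff it on each run. The sketch below is a minimal version under stated assumptions: the manifest file and the function names are hypothetical, and this is a local optimization that is separate from little big brain’s own idempotency keys. It only saves the agent from re-reading files that have not changed.

```python
import hashlib
import json
from pathlib import Path


def content_hash(data: bytes) -> str:
    """SHA-256 of the file bytes, used as a change fingerprint."""
    return hashlib.sha256(data).hexdigest()


def scan(root):
    """Map each markdown file under root (by relative path) to its hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): content_hash(p.read_bytes())
        for p in sorted(root.rglob("*.md"))
    }


def diff_manifests(previous, current):
    """Return (changed_or_new, removed) relative paths, both sorted."""
    changed = sorted(
        rel for rel, digest in current.items() if previous.get(rel) != digest
    )
    removed = sorted(set(previous) - set(current))
    return changed, removed


def load_manifest(path):
    """Read the previous manifest, or an empty one on the first run."""
    path = Path(path)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_manifest(path, manifest):
    """Persist the manifest; call this only after the migration succeeds."""
    Path(path).write_text(
        json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8"
    )


if __name__ == "__main__":
    manifest_file = ".kb-manifest.json"  # hypothetical location
    current = scan("docs")
    changed, removed = diff_manifests(load_manifest(manifest_file), current)
    print("migrate:", changed)
    print("removed since last run:", removed)
    # After the agent has migrated `changed` and called lbb_index:
    # save_manifest(manifest_file, current)
```

Saving the manifest only after a successful run means a failed migration is retried in full next time, and the server-side idempotency described above makes that retry harmless.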