Use case: ingest → index → search

This is the canonical little big brain pipeline: take some domain knowledge, write it as typed facts, build search indexes, and answer natural-language questions over it. We’ll model a small “systems and data” knowledge graph — the kind of thing you’d feed an agent that answers questions about your infrastructure.

  1. Model the domain. Decide your entity types and relations. Here: SERVICE, DATABASE, and DATASET entities, connected by WRITES_TO, READS_FROM, and STORES relations. On the hosted product the default ontology already covers generic AI-context vocabulary; for a bespoke schema, define a graph ontology first (see SPARQL & SHACL or the MCP define_ontology).

  2. Write facts as triplets. Each triplet carries a confidence and evidence string — the provenance stays attached to the assertion.

    import { LbbClient } from "@lbb/client";

    const lbb = new LbbClient({
      baseUrl: "https://db.eu.littlebigbrain.com",
      apiKey: process.env.LBB_API_KEY,
    });

    await lbb.graph("main").facts.create(
      {
        triplets: [
          { source: { type: "SERVICE", name: "auth-service" }, relation: "WRITES_TO",
            target: { type: "DATABASE", name: "user-db" },
            confidence: 0.93, evidence: "auth-service writes identity records to user-db" },
          { source: { type: "SERVICE", name: "billing-service" }, relation: "READS_FROM",
            target: { type: "DATABASE", name: "user-db" },
            confidence: 0.9, evidence: "billing reads customer identity for invoicing" },
          { source: { type: "DATABASE", name: "user-db" }, relation: "STORES",
            target: { type: "DATASET", name: "customer identity data" },
            confidence: 0.97, evidence: "user-db is the system of record for customer identity" },
        ],
      },
      { idempotencyKey: "systems-import-v1" },
    );

    For large loads (thousands+ of facts), use bulk NDJSON ingest — see Hybrid retrieval and the POST /v1/graph/import endpoint in the HTTP API.

  3. Build indexes. BM25, vector/ANN, and adjacency runs are derived from the snapshot. wait: true blocks until they’re ready.

    await lbb.indexes.run({ wait: true });

  4. Ask a question. Hybrid search fuses lexical + BM25 + vector + ontology + graph-neighborhood signals. The query never has to match keywords exactly — “customer identity” surfaces user-db and the assertions that connect it.

    const results = await lbb.search.hybrid(
      "which systems store customer identity data",
      { topK: 5, source: "persisted", consistency: "strong", targets: ["entities", "assertions"] },
    );
    // Print each matching assertion as "relation -> target score".
    for (const a of results.assertions ?? []) {
      console.log(a.relation?.name, "->", a.target?.name, a.score);
    }

  5. Follow the graph. Once search finds an anchor entity, traverse its neighborhood or ask why an assertion exists — the evidence you wrote in step 2 comes back as provenance.

    await lbb.traverse({
      entity: { entity_type: "DATABASE", name: "user-db" },
      relation: "READS_FROM",
      direction: "in",
    });
    await lbb.why({ /* the assertion you want to justify */ });

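
To make step 5 concrete, here is a minimal in-memory sketch of what an incoming READS_FROM traversal over the three facts from step 2 amounts to. It is an illustration only: the `Triplet` type and the `traverseIn` helper are hypothetical names, not part of `@lbb/client`, and the real `traverse` response shape is not shown on this page.

```typescript
// Hypothetical local model of the triplets written in step 2.
// None of these names come from @lbb/client.
type EntityRef = { type: string; name: string };
type Triplet = {
  source: EntityRef;
  relation: string;
  target: EntityRef;
  confidence: number;
  evidence: string;
};

const step2Facts: Triplet[] = [
  { source: { type: "SERVICE", name: "auth-service" }, relation: "WRITES_TO",
    target: { type: "DATABASE", name: "user-db" },
    confidence: 0.93, evidence: "auth-service writes identity records to user-db" },
  { source: { type: "SERVICE", name: "billing-service" }, relation: "READS_FROM",
    target: { type: "DATABASE", name: "user-db" },
    confidence: 0.9, evidence: "billing reads customer identity for invoicing" },
  { source: { type: "DATABASE", name: "user-db" }, relation: "STORES",
    target: { type: "DATASET", name: "customer identity data" },
    confidence: 0.97, evidence: "user-db is the system of record for customer identity" },
];

// Direction "in": keep only edges of the given relation that point at the anchor.
function traverseIn(all: Triplet[], anchor: EntityRef, relation: string): Triplet[] {
  return all.filter(
    (t) =>
      t.relation === relation &&
      t.target.type === anchor.type &&
      t.target.name === anchor.name,
  );
}

const readers = traverseIn(step2Facts, { type: "DATABASE", name: "user-db" }, "READS_FROM");
for (const t of readers) {
  // The evidence string travels with the edge, which is what makes "why" answerable.
  console.log(`${t.source.name} ${t.relation} ${t.target.name} (${t.evidence})`);
}
```

With the step 2 data, the only incoming READS_FROM edge on user-db comes from billing-service, and its evidence string is returned alongside it.
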
  • Facts are append-only. Re-running the import with the same idempotency key is a no-op; new evidence on an existing edge is recorded, not overwritten. You get a temporal record for free — read history to see how a relationship changed.
  • Indexes are disposable. If you change your embedding model or tokenizer, rebuild — the graph is the source of truth, indexes are derived.
  • Consistency is explicit. strong overlays any facts committed after the last index run, so search never lags a write you just made.
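
The append-only and idempotency behavior described above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not how the service is implemented: `FactLog`, `commit`, and `history` are hypothetical names, and the second key and evidence string are made up for the example.

```typescript
// Hypothetical in-memory model of append-only fact storage.
type Edge = { source: string; relation: string; target: string };
type Assertion = Edge & { evidence: string; seq: number };

class FactLog {
  private readonly seenKeys = new Set<string>();
  private readonly log: Assertion[] = [];

  // Returns how many assertions were appended (0 when the key was already used).
  commit(idempotencyKey: string, facts: Array<Edge & { evidence: string }>): number {
    if (this.seenKeys.has(idempotencyKey)) return 0; // replayed import: no-op
    this.seenKeys.add(idempotencyKey);
    for (const f of facts) {
      // Append only: earlier assertions on the same edge are never modified.
      this.log.push({ ...f, seq: this.log.length + 1 });
    }
    return facts.length;
  }

  // Every assertion ever recorded for one edge, oldest first.
  history(edge: Edge): Assertion[] {
    return this.log.filter(
      (a) =>
        a.source === edge.source &&
        a.relation === edge.relation &&
        a.target === edge.target,
    );
  }
}

const writesEdge: Edge = { source: "auth-service", relation: "WRITES_TO", target: "user-db" };
const factLog = new FactLog();

const firstRun = factLog.commit("systems-import-v1", [
  { ...writesEdge, evidence: "auth-service writes identity records to user-db" },
]);
const replayedRun = factLog.commit("systems-import-v1", [
  { ...writesEdge, evidence: "auth-service writes identity records to user-db" },
]);
// A new key with new evidence (both invented here) adds a second record for the same edge.
const laterRun = factLog.commit("systems-import-v2", [
  { ...writesEdge, evidence: "confirmed in a later review" },
]);

const edgeHistory = factLog.history(writesEdge);
console.log(firstRun, replayedRun, laterRun, edgeHistory.length);
```

The replayed import appends nothing, while the later import leaves the original assertion intact and adds a second one, so the edge's history holds both pieces of evidence in order.
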

As new facts arrive, commit them and either let the hosted product rebuild base indexes automatically or call indexes.delta / indexes.run yourself. Use indexes.gc to retire superseded runs. For a scheduled sync, a small script that commits a day’s facts and refreshes indexes is enough — this is exactly how the project dogfoods its own product-development graph.
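
A scheduled sync along those lines might prepare each day's commit as follows. This is only a sketch: the helper names and the `prefix-YYYY-MM-DD` key format are assumptions for illustration, and the only client calls referenced (in comments) are the ones already shown in steps 2 and 3.

```typescript
// Hypothetical helpers for a daily sync job. The key format is an assumption.
type EntityRef = { type: string; name: string };
type Triplet = {
  source: EntityRef;
  relation: string;
  target: EntityRef;
  confidence: number;
  evidence: string;
};

function dailyIdempotencyKey(prefix: string, day: Date): string {
  // toISOString() is always UTC, so the key does not depend on the host time zone.
  return `${prefix}-${day.toISOString().slice(0, 10)}`;
}

function buildDailyCommit(prefix: string, day: Date, triplets: Triplet[]) {
  return {
    payload: { triplets },
    options: { idempotencyKey: dailyIdempotencyKey(prefix, day) },
  };
}

const dailyCommit = buildDailyCommit("systems-sync", new Date(Date.UTC(2024, 0, 15)), [
  { source: { type: "SERVICE", name: "billing-service" }, relation: "READS_FROM",
    target: { type: "DATABASE", name: "user-db" },
    confidence: 0.9, evidence: "billing reads customer identity for invoicing" },
]);

console.log(dailyCommit.options.idempotencyKey); // "systems-sync-2024-01-15"

// In the real job these values would feed the calls shown earlier:
//   await lbb.graph("main").facts.create(dailyCommit.payload, dailyCommit.options);
//   await lbb.indexes.run({ wait: true });
```

Deriving the key from the date means a retried run on the same day reuses the same key, which, given the no-op behavior described above, should make retries safe. The trade-off is that a second, different batch on the same day would need its own key.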