Use case: agent long-term memory
A language model has no memory between sessions. little big brain gives an agent a durable, typed, queryable memory: the agent writes facts as it learns them and retrieves them later with natural-language search or graph queries. Because the store is a temporal graph with provenance, the agent’s memory is auditable — you can see what it knows, when it learned it, and why.
There are two integration styles.
Style 1: MCP tool belt (recommended for agents)
Connect the agent to the MCP server. The agent gets six
task-shaped tools — lbb_search, lbb_inspect, lbb_query, lbb_commit,
lbb_configure, lbb_index — and reads/writes the graph as tool calls. No glue
code: the tools are self-describing and the error messages are self-correcting.
Set it up
For a hosted stack (OAuth sign-in, key never touches the agent’s machine):

    {
      "mcpServers": {
        "lbb": { "url": "https://mcp.littlebigbrain.com/mcp/<your-stack-slug>" }
      }
    }

For local development against lbb-server, use the stdio shim with a key; see
the MCP guide.
A typical agent loop
- Give the agent a typed graph (once). lbb_configure with action: "define_ontology" creates a graph whose vocabulary matches the agent’s domain, e.g. Customer / Ticket / Agent entities and OPENED, RESOLVED, ESCALATED_TO relations.
- Write as it works. When the agent learns a fact, it calls lbb_commit with triplets (and optional entity_properties for typed attributes like status or priority). Idempotency keys are content-derived, so retries are safe by default.
- Recall when it needs to. lbb_search answers “have I seen this customer’s issue before?” with hybrid retrieval; lbb_inspect action=entity pulls the exact neighborhood of a known entity; lbb_query runs analytical questions like “how many tickets did team X escalate last month?”.
- Refresh acceleration. lbb_index rebuilds persisted indexes after a batch of writes (the hosted product also builds base indexes automatically).
The tools return compact structured envelopes ({summary, data, counts?, next?})
sized for a model’s context window, with cursor paging for large results and
supernode sampling so a high-degree entity never floods the response.
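If you consume these envelopes from your own code rather than letting the model page through them, the cursor handling might look like the sketch below. It is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for whatever issues the tool call, `data` is assumed to be a list, and `next` is assumed to be an opaque cursor that is passed back unchanged and is absent on the last page.

```python
from typing import Any, Callable, Iterator, Optional

Envelope = dict[str, Any]  # assumed shape: {"summary", "data", "counts"?, "next"?}


def iter_results(
    fetch_page: Callable[[Optional[str]], Envelope],
    max_pages: int = 10,
) -> Iterator[Any]:
    """Yield items page by page, following the (assumed) `next` cursor."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a runaway cursor cannot loop forever
        envelope = fetch_page(cursor)
        yield from envelope.get("data", [])
        cursor = envelope.get("next")
        if not cursor:  # no cursor means this was the last page
            break


# Stand-in for real tool calls: two canned pages keyed by cursor.
pages = {
    None: {"summary": "page 1", "data": ["ticket:1", "ticket:2"], "next": "c1"},
    "c1": {"summary": "page 2", "data": ["ticket:3"]},
}
items = list(iter_results(lambda cursor: pages[cursor]))
```

The page cap is a design choice for agent loops: it bounds how much of the context window a single recall step can consume.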
Style 2: SDK inside your own agent framework
If you’re orchestrating the agent yourself (LangGraph, a custom loop, an API backend), call the TypeScript or Python SDK directly from your tools. This gives you full control over what the model can do.
    # A "remember" tool your agent can call.
    def remember(fact: dict) -> None:
        lbb.graph("memory").facts.create(
            {"triplets": [fact]},
            idempotency_key=content_hash(fact),
        )

    # A "recall" tool.
    def recall(question: str) -> list[dict]:
        res = lbb.search.hybrid(
            question,
            top_k=8,
            source="persisted",
            consistency="strong",
            targets=["entities", "assertions", "observations"],
        )
        return res.get("assertions", [])

Store the model’s own notes as observations so the evidence text is searchable, and promote stable facts to assertions with a relation so they become graph-navigable.
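The `remember` tool calls a `content_hash` helper that the snippet does not define. One possible sketch, assuming the key only needs to be a deterministic function of the fact’s content: serialize the fact to canonical JSON and hash it. The triplet field names here are illustrative rather than the documented schema, and the SDK may derive its own keys differently.

```python
import hashlib
import json


def content_hash(fact: dict) -> str:
    """Return a stable hex digest for a fact, independent of key order."""
    # Canonical form: sorted keys, no insignificant whitespace, UTF-8 bytes.
    canonical = json.dumps(
        fact, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical triplets: same content, different key order.
a = {"subject": "ticket:42", "predicate": "ESCALATED_TO", "object": "agent:7"}
b = {"object": "agent:7", "predicate": "ESCALATED_TO", "subject": "ticket:42"}
same_key = content_hash(a) == content_hash(b)
```

Because the key depends only on content, a retried `remember` call for the same fact reuses the same key, which is what makes the retry safe.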
What makes this a good memory
- Typed, not a blob. Facts are entities and relations with attributes, so the agent can ask precise graph questions, not just do fuzzy recall.
- Temporal. Nothing is destructively overwritten. history shows how a fact evolved; you can pin a query to a past snapshot with as_of_commit_seq.
- Provenance-first. Every assertion carries the evidence that produced it, so the agent (and you) can justify what it “knows”, which is critical for trust and for debugging hallucinated writes.
- Multi-tenant isolation. Each stack is an isolated tenant; per-user or per-workspace memory is just a separate stack or graph.
Retrieval quality tips
- Prefer consistency: "strong" so the agent immediately sees facts it just wrote in the same session.
- Use search.multi (RRF) when a question decomposes into several sub-questions.
- Turn on managed embeddings for real semantic recall, and grade results with search_feedback to fine-tune retrieval to your domain; see Embeddings & feedback tuning.
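For intuition about the RRF mentioned in the tips, here is a sketch of generic reciprocal rank fusion: each sub-question yields a ranked list, and an item’s fused score is the sum of 1 / (k + rank) over the lists it appears in. This is the textbook formulation with the commonly used k = 60, not a description of how search.multi is implemented; the function name and item ids are made up for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists with reciprocal rank fusion (best first)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Two sub-question result lists (hypothetical ids).
fused = rrf_fuse([
    ["ticket:42", "customer:9", "ticket:17"],
    ["ticket:42", "agent:7", "customer:9"],
])
fused_ids = [item_id for item_id, _ in fused]
```

Items that rank well for several sub-questions rise to the top even when no single list puts them first, which is why fusion suits decomposed questions.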
Close the loop: feedback that makes retrieval better
An agent that only reads and writes memory is static. The interesting property is that an agent loop can tell little big brain how good its retrieval was, and little big brain uses that signal to retrieve better next time. This turns your agent into a system that improves with use.
The mechanism is graded relevance feedback: after the agent uses retrieved
context, it grades each retrieved item — 3 (this was relevant/useful), 1
(partially), 0 (irrelevant). Those grades accumulate as qrels (query →
relevance judgments), and qrels fine-tune the managed embedding model. A
better-tuned model ranks genuinely useful memories higher in the vector leg of
hybrid search — so the next loop starts from better context.
         ┌───────────────────────────────────────────────────────────┐
         │                                                           │
    ┌────▼─────┐   ┌──────┐   ┌──────────────┐   ┌───────────────────┴──┐
    │ retrieve │──▶│ act  │──▶│   observe    │──▶│  grade each result   │
    │lbb_search│   │(LLM) │   │   outcome    │   │ search_feedback 3/1/0│
    └──────────┘   └──────┘   └──────────────┘   └───────────┬──────────┘
         ▲                                                   │
         │            fine-tune embeddings ◀── export qrels ─┘
         └──────────────── activate tuned model ──────────────

An agentic turn with a feedback step
    import os
    from lbb import LbbClient

    lbb = LbbClient("https://db.eu.littlebigbrain.com", api_key=os.environ["LBB_API_KEY"])

    def agent_turn(question: str) -> str:
        # 1. RETRIEVE: pull candidate context from memory.
        hits = lbb.search.hybrid(
            question,
            top_k=8,
            source="persisted",
            consistency="strong",
            targets=["entities", "assertions", "observations"],
        )
        candidates = hits.get("entities", []) + hits.get("assertions", [])

        # 2. ACT: the model answers, and tells you which items it actually used.
        answer, used_ids = llm_answer(question, candidates)  # your model call

        # 3. OBSERVE: did the turn succeed? (user accepted, tests passed,
        # the tool call worked, a downstream check confirmed the answer…)
        outcome = observe_outcome(answer)  # your success signal

        # 4. GRADE: send feedback per retrieved item so lbb learns what helped.
        for c in candidates:
            if c["id"] in used_ids:
                grade = 3 if outcome.success else 1  # used + worked / used but weak
            else:
                grade = 0  # retrieved but ignored
            lbb.search_feedback(query=question, target=c["selector"], grade=grade)

        return answer

The grade heuristic is yours to design. Common, cheap signals: did the model
cite the item, did the user accept the answer, did a downstream check /
test / tool call succeed, was the turn retried. Even a coarse
used-and-worked → 3 / used-but-failed → 1 / ignored → 0 rule produces a usable
training signal over enough turns.
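One way to keep that heuristic testable is to isolate it in a small pure function. The sketch below is an assumption-laden example, not a prescribed rule: `TurnSignals` and its fields are hypothetical names, and treating a retried turn as only partially useful is one possible choice among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnSignals:
    cited: bool      # the model cited or otherwise used the item
    succeeded: bool  # user accepted, or a test / tool call / check passed
    retried: bool    # the turn had to be retried before it succeeded


def grade(signals: TurnSignals) -> int:
    """Map cheap turn signals to a 3 / 1 / 0 relevance grade."""
    if not signals.cited:
        return 0  # retrieved but ignored
    if signals.succeeded and not signals.retried:
        return 3  # used and worked first time
    return 1      # used, but the turn failed or needed a retry


grades = [
    grade(TurnSignals(cited=True, succeeded=True, retried=False)),
    grade(TurnSignals(cited=True, succeeded=False, retried=False)),
    grade(TurnSignals(cited=False, succeeded=True, retried=False)),
]
```

Keeping the mapping separate from the agent loop lets you change the rule later without touching retrieval or feedback calls.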
Turn feedback into a better model
Grading accumulates qrels; periodically (nightly, or every N turns) convert them into a fine-tuned embedding model and activate it:
    qrels = lbb.search_feedback_export()  # accumulated judgments as customer qrels
    # Fine-tune on the qrels, then activate the resulting model for the graph.
    # The console drives this end to end: Models → Training runs the fine-tune,
    # Retrieval → Embeddings activates the new managed model per graph.

After activating a tuned model, confirm the hybrid vector leg is actually using
it — add explain: true to a search and check explain.vector_model_id. See
Embeddings & feedback tuning for the full mechanics.
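That check can be automated as a small guard in a deployment script. The sketch below assumes the search response is dict-like (as in the examples above) and exposes the model id at `explain.vector_model_id` when explain is enabled; the helper name and the model ids are placeholders.

```python
def uses_model(response: dict, expected_model_id: str) -> bool:
    """True if the explain block reports the expected vector model."""
    explain = response.get("explain") or {}  # missing when explain was not requested
    return explain.get("vector_model_id") == expected_model_id


# Canned responses standing in for a search made with explain enabled.
tuned = {"assertions": [], "explain": {"vector_model_id": "tuned-model-v2"}}
stale = {"assertions": [], "explain": {"vector_model_id": "base-model"}}
checks = (uses_model(tuned, "tuned-model-v2"), uses_model(stale, "tuned-model-v2"))
```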
The result is a closed loop: retrieve → act → observe → grade → fine-tune → retrieve better. Reads inform writes to the model; the agent’s own experience becomes the training data that sharpens its memory.
Why RDF & SHACL fit agent loops
little big brain’s graph is backed by RDF terms and validated with SHACL, and both are unusually well-suited to agents:
- RDF triples are the natural write unit for an LLM. One subject–predicate–object fact at a time, self-describing and composable, with stable identifiers so the same entity merges across turns, sessions, and even across different agents — no schema migration to add a new kind of fact.
- SHACL is a guardrail for what the agent writes. Validate agent-authored facts against shapes to catch malformed, incomplete, or hallucinated writes before they pollute memory — and the conformance report is machine-readable, so the agent can read exactly which constraint failed and self-correct. That validate step is a second feedback loop, this time on writes, complementing the retrieval loop above on reads.
- Inference rules mean the graph gets smarter without more LLM calls. SHACL-AF rules derive new facts deterministically from existing ones, so the agent doesn’t have to re-reason what the system can entail.
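The validate-then-self-correct loop on writes can be pictured with a deliberately simplified stand-in. The code below is plain Python, not SHACL: `validate` mimics the shape of a machine-readable conformance report with two toy constraints, and `repair` and `commit` are hypothetical callables (the model’s correction step and the real write). A real setup would validate against SHACL shapes instead.

```python
from typing import Callable

Triplet = dict[str, str]
Violation = dict[str, str]

REQUIRED = ("subject", "predicate", "object")
ALLOWED_PREDICATES = {"OPENED", "RESOLVED", "ESCALATED_TO"}


def validate(triplet: Triplet) -> list[Violation]:
    """Return a list of violations; an empty list means the write conforms."""
    violations: list[Violation] = []
    for field in REQUIRED:
        if not triplet.get(field):
            violations.append({"path": field, "constraint": "required"})
    predicate = triplet.get("predicate")
    if predicate and predicate not in ALLOWED_PREDICATES:
        violations.append({"path": "predicate", "constraint": "allowed-values"})
    return violations


def commit_with_repair(
    triplet: Triplet,
    repair: Callable[[Triplet, list[Violation]], Triplet],
    commit: Callable[[Triplet], None],
    max_attempts: int = 3,
) -> bool:
    """Validate before writing; hand the report to `repair` and retry on failure."""
    for _ in range(max_attempts):
        violations = validate(triplet)
        if not violations:
            commit(triplet)
            return True
        triplet = repair(triplet, violations)  # e.g. ask the model to fix the write
    return False  # never committed: memory stays clean


committed: list[Triplet] = []
ok = commit_with_repair(
    {"subject": "ticket:42", "predicate": "ESCALATES", "object": "agent:7"},
    repair=lambda t, violations: {**t, "predicate": "ESCALATED_TO"},
    commit=committed.append,
)
```

The important property is that nothing reaches `commit` until the report is empty, so a malformed or hallucinated write costs a retry rather than a polluted graph.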
The full rationale and examples are in Why RDF & SHACL suit AI agents.
Related
- MCP server reference
- Migrate a markdown knowledge base: seed an agent’s memory from existing docs through the same MCP tool belt.
- Ingest → index → search
- Hybrid retrieval, filters & facets