Use case: agent long-term memory

A language model has no memory between sessions. little big brain gives an agent a durable, typed, queryable memory: the agent writes facts as it learns them and retrieves them later with natural-language search or graph queries. Because the store is a temporal graph with provenance, the agent’s memory is auditable — you can see what it knows, when it learned it, and why.

There are two integration styles.

Style 1: MCP tool belt (recommended for agents)

Connect the agent to the MCP server. The agent gets six task-shaped tools — lbb_search, lbb_inspect, lbb_query, lbb_commit, lbb_configure, lbb_index — and reads/writes the graph as tool calls. No glue code: the tools are self-describing and the error messages are self-correcting.

For a hosted stack (OAuth sign-in, key never touches the agent’s machine):

{
  "mcpServers": {
    "lbb": { "url": "https://mcp.littlebigbrain.com/mcp/<your-stack-slug>" }
  }
}

For local development against lbb-server, use the stdio shim with a key — see the MCP guide.

  1. Give the agent a typed graph (once). lbb_configure with action: "define_ontology" creates a graph whose vocabulary matches the agent’s domain — e.g. Customer/Ticket/Agent entities and OPENED, RESOLVED, ESCALATED_TO relations.
  2. Write as it works. When the agent learns a fact, it calls lbb_commit with triplets (and optional entity_properties for typed attributes like status or priority). Idempotency keys are content-derived, so retries are safe by default.
  3. Recall when it needs to. lbb_search answers “have I seen this customer’s issue before?” with hybrid retrieval; lbb_inspect action=entity pulls the exact neighborhood of a known entity; lbb_query runs analytical questions like “how many tickets did team X escalate last month?”.
  4. Refresh acceleration. lbb_index rebuilds persisted indexes after a batch of writes (the hosted product also builds base indexes automatically).

The tools return compact structured envelopes ({summary, data, counts?, next?}) sized for a model’s context window, with cursor paging for large results and supernode sampling so a high-degree entity never floods the response.
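
A paging loop over these envelopes might look like the sketch below. It assumes that `next` holds an opaque cursor, that the cursor is sent back under a `cursor` argument, and that `data` is a list; none of that is confirmed here. `call_tool` is a hypothetical stand-in for however your MCP client invokes a tool.

```python
from typing import Any, Callable, Iterator

Envelope = dict[str, Any]


def iter_pages(call_tool: Callable[[dict[str, Any]], Envelope],
               args: dict[str, Any],
               max_pages: int = 20) -> Iterator[Any]:
    """Yield items from successive envelopes until no cursor remains.

    Assumptions (not confirmed by the docs): `next` is an opaque cursor,
    `data` is a list, and the cursor is sent back under the key "cursor".
    """
    cursor = None
    for _ in range(max_pages):  # hard cap so a caller never pages forever
        page_args = dict(args)
        if cursor is not None:
            page_args["cursor"] = cursor
        envelope = call_tool(page_args)
        yield from envelope.get("data", [])
        cursor = envelope.get("next")
        if not cursor:
            break
```

The page cap is a design choice for agents: it bounds how much of a large result set can reach the context window in one go.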

Style 2: SDK inside your own agent framework

If you’re orchestrating the agent yourself (LangGraph, a custom loop, an API backend), call the TypeScript or Python SDK directly from your tools. This gives you full control over what the model can do.

# A "remember" tool your agent can call.
def remember(fact: dict) -> None:
    lbb.graph("memory").facts.create({"triplets": [fact]},
                                     idempotency_key=content_hash(fact))

# A "recall" tool.
def recall(question: str) -> list[dict]:
    res = lbb.search.hybrid(question, top_k=8, source="persisted", consistency="strong",
                            targets=["entities", "assertions", "observations"])
    return res.get("assertions", [])
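
The snippet above calls `content_hash` without defining it. One possible implementation, as a sketch: hash a canonical JSON encoding of the fact so that the same content always yields the same key. Whether the server expects a particular key format is not stated here, so treat the SHA-256 hex digest as an assumption.

```python
import hashlib
import json


def content_hash(fact: dict) -> str:
    """Derive a stable key from a fact's content.

    Canonical JSON (sorted keys, fixed separators) means two dicts with the
    same content hash identically regardless of key order.
    """
    canonical = json.dumps(fact, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the key depends only on the fact's content, a retried `remember` call reuses the same idempotency key.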

Store the model’s own notes as observations so the evidence text is searchable, and promote stable facts to assertions with a relation so they become graph-navigable.

  • Typed, not a blob. Facts are entities and relations with attributes, so the agent can ask precise graph questions, not just do fuzzy recall.
  • Temporal. Nothing is destructively overwritten. history shows how a fact evolved; you can pin a query to a past snapshot with as_of_commit_seq.
  • Provenance-first. Every assertion carries the evidence that produced it, so the agent (and you) can justify what it “knows” — critical for trust and debugging hallucinated writes.
  • Multi-tenant isolation. Each stack is an isolated tenant; per-user or per-workspace memory is just a separate stack or graph.
  • Prefer consistency: "strong" so the agent immediately sees facts it just wrote in the same session.
  • Use search.multi (RRF) when a question decomposes into several sub-questions.
  • Turn on managed embeddings for real semantic recall, and grade results with search_feedback to fine-tune retrieval to your domain — see Embeddings & feedback tuning.
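
For the `search.multi` tip above, it may help to see what reciprocal rank fusion (RRF) does. The sketch below is a generic, client-side illustration of the technique, not little big brain's implementation: each ranked list contributes 1 / (k + rank) to an item's score. The constant k = 60 is a commonly used default; the value the service uses is not stated here.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an item's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

An item that ranks reasonably well for several sub-questions can outrank one that tops a single list, which is why fusion suits decomposed questions.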

Close the loop: feedback that makes retrieval better

An agent that only reads and writes memory is static. The interesting property is that an agent loop can tell little big brain how good its retrieval was, and little big brain uses that signal to retrieve better next time. This turns your agent into a system that improves with use.

The mechanism is graded relevance feedback: after the agent uses retrieved context, it grades each retrieved item — 3 (this was relevant/useful), 1 (partially), 0 (irrelevant). Those grades accumulate as qrels (query → relevance judgments), and qrels fine-tune the managed embedding model. A better-tuned model ranks genuinely useful memories higher in the vector leg of hybrid search — so the next loop starts from better context.

     ┌───────────────────────────────────────────────────────────┐
     │                                                           │
┌────▼─────┐   ┌──────┐   ┌──────────────┐   ┌───────────────────┴──┐
│ retrieve │──▶│ act  │──▶│ observe      │──▶│ grade each result    │
│lbb_search│   │(LLM) │   │ outcome      │   │ search_feedback 3/1/0│
└──────────┘   └──────┘   └──────────────┘   └───────────┬──────────┘
     ▲                                                   │
     │            fine-tune embeddings ◀── export qrels ─┘
     └──────────────── activate tuned model ──────────────

import os
from lbb import LbbClient

lbb = LbbClient("https://db.eu.littlebigbrain.com", api_key=os.environ["LBB_API_KEY"])

def agent_turn(question: str) -> str:
    # 1. RETRIEVE — pull candidate context from memory.
    hits = lbb.search.hybrid(
        question, top_k=8, source="persisted", consistency="strong",
        targets=["entities", "assertions", "observations"],
    )
    candidates = hits.get("entities", []) + hits.get("assertions", [])

    # 2. ACT — the model answers, and tells you which items it actually used.
    answer, used_ids = llm_answer(question, candidates)  # your model call

    # 3. OBSERVE — did the turn succeed? (user accepted, tests passed,
    #    the tool call worked, a downstream check confirmed the answer…)
    outcome = observe_outcome(answer)  # your success signal

    # 4. GRADE — send feedback per retrieved item so lbb learns what helped.
    for c in candidates:
        if c["id"] in used_ids:
            grade = 3 if outcome.success else 1  # used + worked / used but weak
        else:
            grade = 0  # retrieved but ignored
        lbb.search_feedback(query=question, target=c["selector"], grade=grade)

    return answer

The grade heuristic is yours to design. Common, cheap signals: did the model cite the item, did the user accept the answer, did a downstream check / test / tool call succeed, was the turn retried. Even a coarse used-and-worked → 3 / used-but-failed → 1 / ignored → 0 rule produces a usable training signal over enough turns.
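
As an illustration of combining several of those signals, the sketch below maps them onto the 3 / 1 / 0 scale. `TurnSignals` and `grade_item` are hypothetical names, and the rule itself (a retry downgrades an otherwise successful turn) is one possible choice, not a recommendation from the product.

```python
from dataclasses import dataclass


@dataclass
class TurnSignals:
    """Hypothetical per-turn signals; collect whichever ones your loop has."""
    user_accepted: bool = False
    check_passed: bool = False
    retried: bool = False


def grade_item(cited: bool, signals: TurnSignals) -> int:
    """Map cheap signals onto the 3 / 1 / 0 scale."""
    if not cited:
        return 0  # retrieved but ignored
    succeeded = (signals.user_accepted or signals.check_passed) and not signals.retried
    return 3 if succeeded else 1  # used and worked / used but weak
```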

Grading accumulates qrels; periodically (nightly, or every N turns) convert them into a fine-tuned embedding model and activate it:

qrels = lbb.search_feedback_export() # accumulated judgments as customer qrels
# Fine-tune on the qrels, then activate the resulting model for the graph.
# The console drives this end to end: Models → Training runs the fine-tune,
# Retrieval → Embeddings activates the new managed model per graph.

After activating a tuned model, confirm the hybrid vector leg is actually using it — add explain: true to a search and check explain.vector_model_id. See Embeddings & feedback tuning for the full mechanics.
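
That check can be automated. The sketch below assumes the search response is a dict with an `explain` object containing `vector_model_id`, as the path `explain.vector_model_id` suggests; the exact response shape and how the SDK accepts the explain flag are assumptions, and both helper names are hypothetical.

```python
from typing import Optional


def vector_model_in_use(search_response: dict) -> Optional[str]:
    """Return explain.vector_model_id from a search response, if present."""
    return (search_response.get("explain") or {}).get("vector_model_id")


def assert_tuned_model_active(search_response: dict, expected_model_id: str) -> None:
    """Raise if the vector leg is not using the model you just activated."""
    actual = vector_model_in_use(search_response)
    if actual != expected_model_id:
        raise RuntimeError(
            f"vector leg is using {actual!r}, expected {expected_model_id!r}"
        )
```

Running this once after each activation, for example in the nightly job that performs the fine-tune, catches a graph that is still on the previous model.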

The result is a closed loop: retrieve → act → observe → grade → fine-tune → retrieve better. Feedback on what the agent read drives updates to the embedding model; the agent’s own experience becomes the training data that sharpens its memory.

little big brain’s graph is backed by RDF terms and validated with SHACL, and both are unusually well-suited to agents:

  • RDF triples are the natural write unit for an LLM. One subject–predicate–object fact at a time, self-describing and composable, with stable identifiers so the same entity merges across turns, sessions, and even across different agents — no schema migration to add a new kind of fact.
  • SHACL is a guardrail for what the agent writes. Validate agent-authored facts against shapes to catch malformed, incomplete, or hallucinated writes before they pollute memory — and the conformance report is machine-readable, so the agent can read exactly which constraint failed and self-correct. That validate step is a second feedback loop, this time on writes, complementing the retrieval loop above on reads.
  • Inference rules mean the graph gets smarter without more LLM calls. SHACL-AF rules derive new facts deterministically from existing ones, so the agent doesn’t have to re-reason what the system can entail.
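
The write-side loop described in the list above can be sketched without committing to a particular SHACL engine. Here `validate` is a stand-in that returns a list of violation messages (in a real deployment this would come from the SHACL conformance report), and `propose` and `commit` are hypothetical callables for the model and the store.

```python
from typing import Callable

Fact = dict[str, str]


def write_with_validation(
    propose: Callable[[list[str]], Fact],
    validate: Callable[[Fact], list[str]],
    commit: Callable[[Fact], None],
    max_attempts: int = 3,
) -> bool:
    """Ask for a fact, validate it, and feed violations back until it conforms."""
    violations: list[str] = []
    for _ in range(max_attempts):
        fact = propose(violations)   # the model sees the previous violations
        violations = validate(fact)  # an empty list means the fact conforms
        if not violations:
            commit(fact)
            return True
    return False                     # give up rather than write a bad fact
```

Returning `False` after a bounded number of attempts keeps a nonconforming fact out of memory instead of writing it anyway.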

The full rationale and examples are in Why RDF & SHACL suit AI agents.