Core concepts

A short tour of the ideas the SDKs and API assume. If you just want to run something, start with the Quickstart; come back here when a term is unfamiliar.

Every operation is scoped by three names that together form a GraphKey:

  • tenant — an isolation boundary. On the hosted SaaS a stack maps to a tenant; your stack API key can only touch its own tenant.
  • graph — a named graph within the tenant (default main). A graph’s ontology is fixed at creation.
  • branch — a copy-on-write snapshot of a graph (default main). A child branch reuses the parent’s WAL and index objects until it writes new commits.

In the SDKs you scope writes with client.graph("name"); in the HTTP API you pass the ?graph= and ?branch= query parameters.
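As an illustration of the three-part scope, here is a minimal sketch. The GraphKey class below is a hypothetical stand-in, not the SDK's type; it assumes the tenant is derived from the API key, so only graph and branch are sent as query parameters.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class GraphKey:
    """Hypothetical stand-in for the (tenant, graph, branch) scope."""

    tenant: str
    graph: str = "main"   # default graph name, per the list above
    branch: str = "main"  # default branch name, per the list above

    def query_string(self) -> str:
        # Assumption: the tenant comes from the API key, so only the
        # graph and branch travel as ?graph= / ?branch= parameters.
        return urlencode({"graph": self.graph, "branch": self.branch})


key = GraphKey(tenant="acme", graph="infra")
print(key.query_string())  # graph=infra&branch=main
```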

little big brain (lbb) does not destructively update facts by default. Ingest appends immutable graph events to a write-ahead log (WAL):

  • entities (named nodes like auth-service or user-db)
  • observations (evidence text / source metadata)
  • relationship events (assertions between entities)
  • supersession links and provenance references

Current state is a reducer projection over that event history. Because history is retained, you can read how a relationship changed over time and pin a query to a past snapshot. A tombstone/retraction path exists for the cases that genuinely need deletion.
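The reducer idea can be sketched in a few lines. This is an illustration under simplifying assumptions: the Event fields and the project function are hypothetical and do not reflect the actual WAL schema. It shows how current state falls out of folding the log, and how stopping the fold at an earlier commit sequence yields a past snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified relationship event."""

    seq: int                          # commit sequence of this event
    subject: str
    relation: str
    obj: str
    supersedes: Optional[int] = None  # seq of an earlier event this replaces
    retracted: bool = False           # tombstone: removes without asserting


def project(wal: list[Event], at_seq: Optional[int] = None) -> set[tuple[str, str, str]]:
    """Fold the log into the assertions live at `at_seq` (the head if None)."""
    live: dict[int, Event] = {}
    for ev in sorted(wal, key=lambda e: e.seq):
        if at_seq is not None and ev.seq > at_seq:
            break  # pinned snapshot: ignore later commits
        if ev.supersedes is not None:
            live.pop(ev.supersedes, None)  # the older event is no longer current
        if not ev.retracted:
            live[ev.seq] = ev
    return {(e.subject, e.relation, e.obj) for e in live.values()}


wal = [
    Event(1, "auth-service", "DEPENDS_ON", "user-db"),
    Event(2, "auth-service", "DEPENDS_ON", "session-cache", supersedes=1),
]
print(project(wal))            # current state: only the session-cache edge
print(project(wal, at_seq=1))  # past snapshot: only the user-db edge
```

Nothing in the log is rewritten: event 1 is still present, it is just no longer part of the projection at the head.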

  • Entity — a named node.
  • Relation — a typed edge predicate such as DEPENDS_ON, WRITES_TO, or STORES.
  • Assertion — a relationship event connecting two entities.
  • Observation — the evidence (text or source) that supports an extracted entity or assertion. Provenance stays visible in graph and search results.

A snapshot is identified by a commit sequence. Graph traversal, current-state reads, BM25, vector search, facets, and hybrid search all agree on the snapshot they serve. Search explain payloads report which persisted index runs were used and how much of the WAL tail was overlaid.

Persisted search offers two consistency modes:

  • strong (default) — load the latest compatible persisted index run and overlay any WAL entries committed after that run, so results match the current graph head.
  • eventual — search only the latest persisted index run (cheaper, may lag the head by the unindexed tail).
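The difference between the two modes can be illustrated with a toy model. Everything here is an assumption made for the sketch: the index run is a plain dict, the WAL is a list of (seq, text) pairs, and substring matching stands in for real search.

```python
def search(query, index_run, wal, mode="strong"):
    """Return the commit sequences of matching documents.

    index_run: {"through_seq": int, "docs": {seq: text}} (hypothetical shape)
    wal:       list of (seq, text) pairs in commit order
    """
    docs = dict(index_run["docs"])
    if mode == "strong":
        # Overlay the unindexed tail: entries committed after the run was built.
        for seq, text in wal:
            if seq > index_run["through_seq"]:
                docs[seq] = text
    elif mode != "eventual":
        raise ValueError(f"unknown consistency mode: {mode}")
    return sorted(seq for seq, text in docs.items() if query in text)


wal = [
    (1, "auth-service depends on user-db"),
    (2, "billing writes to ledger"),
    (3, "auth-service stores tokens"),  # committed after the index run
]
index_run = {"through_seq": 2, "docs": dict(wal[:2])}

print(search("auth-service", index_run, wal, mode="strong"))    # [1, 3]
print(search("auth-service", index_run, wal, mode="eventual"))  # [1]
```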

BM25, vector/ANN, and adjacency indexes are disposable acceleration structures built from the graph, stored as immutable object-storage runs with manifests. Object storage is the durable source of truth; index runs can be garbage-collected and rebuilt from it at any time. In the hosted product, base indexes are built automatically, and search never returns a 404 for a missing index: it falls back to an ephemeral index while a persisted run is built.

Semantic graph search fuses five signals over one snapshot:

  1. lexical matching
  2. ontology resolution and concept expansion
  3. BM25 full-text
  4. vector / ANN search
  5. graph-neighborhood traversal

Multi-search runs several semantic subqueries and combines them with reciprocal-rank fusion (RRF). Search can target entities, assertions, observations, relation types, ontology terms/concepts, and neighborhoods.
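Reciprocal-rank fusion itself is a small, well-known formula: each item scores the sum of 1 / (k + rank) over the rankings it appears in. The sketch below uses k = 60, a commonly used constant; the value lbb uses is not stated here, so treat it as an assumption.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Fuse ranked lists: score(item) = sum of 1 / (k + rank), rank from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three subquery results (hypothetical), best match first in each list.
rankings = [
    ["user-db", "auth-service", "billing"],
    ["user-db", "billing"],
    ["auth-service", "user-db"],
]
print(rrf(rankings))  # ['user-db', 'auth-service', 'billing']
```

Because only ranks are used, the subqueries do not need comparable raw scores, which is what makes RRF convenient for combining lexical and vector results.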

  • Filters are structured JSON expressions: equality, set membership, existence, numeric comparison, token containment, glob, regex, and boolean and/or/not. They apply across BM25, vector, and hybrid paths.
  • Facets count metadata buckets after filters — common fields include target_kind, entity_type, relation, source_id, label, target, and text.
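To make the filter and facet behaviour concrete, here is a small evaluator. The JSON shape used below (single-key objects such as {"eq": [field, value]}) is invented for this sketch and is not the lbb filter syntax; it covers only some of the operator families listed above.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase


def matches(expr, record):
    """Evaluate a hypothetical single-key filter expression against a record."""
    (op, arg), = expr.items()
    if op == "and":
        return all(matches(e, record) for e in arg)
    if op == "or":
        return any(matches(e, record) for e in arg)
    if op == "not":
        return not matches(arg, record)
    if op == "exists":
        return arg in record
    field, value = arg
    actual = record.get(field)
    if op == "eq":
        return actual == value
    if op == "in":
        return actual in value
    if op == "gt":
        return actual is not None and actual > value
    if op == "glob":
        return isinstance(actual, str) and fnmatchcase(actual, value)
    if op == "regex":
        return isinstance(actual, str) and re.search(value, actual) is not None
    raise ValueError(f"unknown operator: {op}")


def facet(records, expr, field):
    """Count values of `field` among the records that pass the filter."""
    return Counter(r[field] for r in records if field in r and matches(expr, r))


records = [
    {"target_kind": "entity", "entity_type": "service", "label": "auth-service"},
    {"target_kind": "entity", "entity_type": "database", "label": "user-db"},
    {"target_kind": "assertion", "relation": "DEPENDS_ON",
     "label": "auth-service DEPENDS_ON user-db"},
]
expr = {"and": [{"glob": ["label", "auth-*"]},
                {"not": {"eq": ["entity_type", "database"]}}]}
print(facet(records, expr, "target_kind"))
```

The facet counts are computed only over records that survive the filter, matching the "after filters" ordering described above.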

The default AI-context ontology gives lbb entity types, relation types, concepts, terms, and close/broader/narrower mappings, plus phrase-based relation resolution. Semantic search uses ontology resolution to canonicalize query terms and expand related concepts. You can import a custom ontology (JSON-LD, Turtle, RDF/XML, CSV/TSV, or the friendly “spec” shape) at graph creation.

Three complementary read surfaces sit over the same object-storage permutation view:

  • Structured query — a SPARQL-subset SELECT/ASK over a basic graph pattern with typed filters, GROUP BY (including scalar and date-bucket keys), aggregates, HAVING, and ORDER BY.
  • SPARQL 1.1 text — the conformant engine; also served at a native /sparql Protocol endpoint for tools like YASGUI and Protégé.
  • SHACL — Core + SHACL-SPARQL + SHACL-AF for validation, shape selection, property paths, and rule-based inference.

See SPARQL, structured query & SHACL.

Reads can return facts you never explicitly wrote, derived from the ones you did:

  • Type closure — an RDFS subClassOf hierarchy, so a query for a broad type also matches its subtypes. On by default; resolved at query time (no rebuild).
  • Rule inference — SHACL-AF rules entail new edges to a bounded fixpoint (transitivity, classification, role derivation).

Both fold into ordinary queries via the entailment and reason controls, are deterministic and transparent, and can be pinned to a historical snapshot. See Reasoning & inference.
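Both mechanisms can be sketched in plain Python. This is not SHACL-AF and not the lbb engine; the function names, the max_rounds bound, and the example types are assumptions chosen to show subclass closure at query time and a transitivity rule run to a bounded fixpoint.

```python
def subtypes(sub_class_of, broad):
    """All types at or below `broad` in a (child, parent) subClassOf list."""
    found = {broad}
    frontier = [broad]
    while frontier:
        parent = frontier.pop()
        for child, par in sub_class_of:
            if par == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def transitive_closure(edges, relation, max_rounds=10):
    """Apply 'a R b and b R c entails a R c' until nothing new, or the bound."""
    facts = set(edges)
    for _ in range(max_rounds):
        derived = {
            (a, relation, d)
            for (a, r1, b) in facts if r1 == relation
            for (c, r2, d) in facts if r2 == relation and b == c
        } - facts
        if not derived:
            break  # fixpoint reached
        facts |= derived
    return facts


sub_class_of = [("Database", "DataStore"), ("Cache", "DataStore"), ("Redis", "Cache")]
print(subtypes(sub_class_of, "DataStore"))

edges = {("auth-service", "DEPENDS_ON", "user-db"), ("user-db", "DEPENDS_ON", "disk")}
print(transitive_closure(edges, "DEPENDS_ON"))
```

The derived edge from auth-service to disk was never written; it exists only in the result of the read, and the same input always yields the same output.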

The ObjectBackend contract underpins everything. Two backends exist: LocalFsObjectBackend (filesystem) and S3ObjectBackend (S3-compatible; verified against MinIO and Hetzner Object Storage). Both provide immutable writes, conditional updates via CAS (If-None-Match / If-Match), range reads, content-hash ETags, checksum validation, and listing.
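The CAS semantics follow the standard HTTP preconditions: If-None-Match: * means "create only if absent", and If-Match: <etag> means "update only if unchanged". The in-memory class below is a hypothetical illustration of that contract, not the ObjectBackend interface; its method names and the SHA-256 ETag are assumptions.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when a conditional write loses the race."""


class InMemoryObjectBackend:
    """Toy backend showing compare-and-swap via ETag preconditions."""

    def __init__(self):
        self._objects = {}

    @staticmethod
    def etag(data: bytes) -> str:
        # Content-hash ETag: identical bytes always produce the same tag.
        return hashlib.sha256(data).hexdigest()

    def put(self, key, data, *, if_none_match=False, if_match=None):
        current = self._objects.get(key)
        if if_none_match and current is not None:
            raise PreconditionFailed(f"{key} already exists")
        if if_match is not None and (current is None or self.etag(current) != if_match):
            raise PreconditionFailed(f"{key} changed since it was read")
        self._objects[key] = data
        return self.etag(data)

    def get_range(self, key, start, end):
        # Range read: bytes in [start, end).
        return self._objects[key][start:end]


backend = InMemoryObjectBackend()
tag_v1 = backend.put("manifest", b"run-1", if_none_match=True)  # create
tag_v2 = backend.put("manifest", b"run-2", if_match=tag_v1)     # swap
try:
    backend.put("manifest", b"run-3", if_match=tag_v1)          # stale tag
except PreconditionFailed as exc:
    print("rejected:", exc)
```

A writer that is rejected re-reads the object, picks up the new ETag, and retries, so two concurrent updaters cannot silently overwrite each other.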

An in-process metrics registry records graph-operation counters/latencies, object-backend counters/bytes/latencies, and HTTP counters/latencies. The server exposes JSON at /api/metrics and Prometheus text at /metrics. Metric labels are deliberately low-cardinality.
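To show what low-cardinality labels look like in practice, here is a toy counter registry that renders the Prometheus text exposition format. The Registry class and the metric name http_requests_total are hypothetical; the point is that labels hold a bounded set of values (an endpoint, a status class) and never per-request data such as IDs or query text.

```python
from collections import Counter


class Registry:
    """Toy in-process counter registry (illustrative only)."""

    def __init__(self):
        self._counters = Counter()

    def inc(self, name, **labels):
        # Sort labels so the same label set always maps to the same series.
        self._counters[(name, tuple(sorted(labels.items())))] += 1

    def render(self):
        lines = []
        for (name, labels), value in sorted(self._counters.items()):
            if labels:
                label_text = ",".join(f'{k}="{v}"' for k, v in labels)
                lines.append(f"{name}{{{label_text}}} {value}")
            else:
                lines.append(f"{name} {value}")
        return "\n".join(lines) + "\n"


registry = Registry()
registry.inc("http_requests_total", route="/api/metrics", status="2xx")
registry.inc("http_requests_total", route="/api/metrics", status="2xx")
registry.inc("http_requests_total", route="/metrics", status="2xx")
print(registry.render(), end="")
```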