Console: Models & feedback

The Models pillar closes the retrieval-quality loop: grade results, curate labeled data, and fine-tune embedding models. It has three views: Feedback, Labeled data, and Training.

In the Feedback view, grade search results to build relevance judgments. For a query, mark each result 3 (relevant), 1 (partially relevant), or 0 (irrelevant); these are the same grades the search_feedback SDK method records. You can judge against your own queries or against public evaluation sets (e.g. BEIR · SciFact, MS MARCO · dev) to get a comparable baseline.

Graded feedback accumulates as customer qrels (query relevance judgments), which are the training signal for fine-tuning.
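
As a rough illustration of what a set of graded judgments looks like as data, the sketch below validates the 3 / 1 / 0 grades described above and groups them by query. The `Judgment` class, its field names, and `to_qrels` are hypothetical and are not part of the SDK; the product's internal qrels format may differ.

```python
from dataclasses import dataclass

# Grades named in the text: 3 = relevant, 1 = partial, 0 = irrelevant.
VALID_GRADES = {3, 1, 0}


@dataclass(frozen=True)
class Judgment:
    """One graded result. Field names are assumptions for illustration."""
    query_id: str
    doc_id: str
    grade: int

    def __post_init__(self):
        # Reject anything outside the three grades the console offers.
        if self.grade not in VALID_GRADES:
            raise ValueError(
                f"grade must be one of {sorted(VALID_GRADES)}, got {self.grade}"
            )


def to_qrels(judgments):
    """Group judgments into {query_id: {doc_id: grade}}.

    If the same (query, doc) pair is judged twice, the later grade wins.
    """
    qrels = {}
    for j in judgments:
        qrels.setdefault(j.query_id, {})[j.doc_id] = j.grade
    return qrels


judgments = [
    Judgment("q1", "doc-a", 3),
    Judgment("q1", "doc-b", 0),
    Judgment("q2", "doc-c", 1),
]
qrels = to_qrels(judgments)
```

The nested mapping keeps each query's judgments together, which is the shape most evaluation and training tooling expects to consume.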

In the Labeled data view, manage the labeled sets produced by feedback:

  • Filter by All / Pending / Judged to see what still needs grading.
  • Flag duplicate judgments.
  • Export JSONL to take the qrels out of the product.
  • Delete a labeled set when it’s no longer needed.
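
Once qrels are exported as JSONL, the same Pending / Judged split and duplicate check can be approximated offline. This is a minimal sketch only: the field names `query`, `doc_id`, and `grade`, and the convention that an ungraded row has a null grade, are assumptions, since the export schema is not documented here.

```python
import json
from collections import defaultdict


def load_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def split_by_status(rows):
    """Return (pending, judged), assuming pending rows have no grade yet."""
    pending = [r for r in rows if r.get("grade") is None]
    judged = [r for r in rows if r.get("grade") is not None]
    return pending, judged


def find_duplicates(rows):
    """Map each (query, doc_id) pair judged more than once to its grades."""
    seen = defaultdict(list)
    for r in rows:
        if r.get("grade") is not None:
            seen[(r["query"], r["doc_id"])].append(r["grade"])
    return {pair: grades for pair, grades in seen.items() if len(grades) > 1}


# Hypothetical export contents, for illustration only.
sample = "\n".join([
    '{"query": "reset password", "doc_id": "kb-12", "grade": 3}',
    '{"query": "reset password", "doc_id": "kb-12", "grade": 1}',
    '{"query": "reset password", "doc_id": "kb-40", "grade": null}',
])
rows = load_jsonl(sample)
pending, judged = split_by_status(rows)
duplicates = find_duplicates(rows)
```

Duplicates with conflicting grades, as in the sample, are the ones most worth resolving before a training run.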

This is the curation surface between raw grading and a training run.

In the Training view, kick off and monitor embedding fine-tuning runs against your labeled data. A run takes the exported qrels and produces a fine-tuned model, which becomes a managed model you can activate per graph in Retrieval → Embeddings. The view surfaces run progress through the training stages.

Fine-tuning is what turns generic semantic recall into domain-tuned recall — the model learns which results your users consider relevant.
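
To make "the model learns which results your users consider relevant" concrete: a common way to feed graded judgments to an embedding fine-tune is to pair each relevant result with an irrelevant one for the same query. The sketch below shows that general technique. It is not a description of how Training runs work internally; the function name and the grade thresholds are assumptions.

```python
from itertools import product


def qrels_to_triplets(qrels, positive_grade=3, negative_grade=0):
    """Build (query_id, positive_doc, negative_doc) triplets from qrels.

    qrels is {query_id: {doc_id: grade}}. Documents graded 1 (partial)
    are skipped here because they are ambiguous as either example type.
    """
    triplets = []
    for query_id, docs in qrels.items():
        positives = sorted(d for d, g in docs.items() if g >= positive_grade)
        negatives = sorted(d for d, g in docs.items() if g <= negative_grade)
        # Every positive is contrasted with every negative for that query.
        triplets.extend(
            (query_id, pos, neg) for pos, neg in product(positives, negatives)
        )
    return triplets


example_qrels = {
    "q1": {"doc-a": 3, "doc-b": 0, "doc-c": 1},
    "q2": {"doc-d": 3},  # no negatives, so q2 yields no triplets
}
triplets = qrels_to_triplets(example_qrels)
```

A query contributes training pairs only when it has both a relevant and an irrelevant judgment, which is one reason to grade more than the top result.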

search → grade results (Feedback) → curate qrels (Labeled data)
→ fine-tune (Training) → activate model (Embeddings) → better search