Console: Models & feedback
The Models pillar closes the retrieval-quality loop: grade results, curate labeled data, and fine-tune embedding models. It has three views: Feedback, Labeled data, and Training.
Feedback
Grade search results to build relevance judgments. For a query, mark each result
3 (relevant), 1 (partial), or 0 (irrelevant) — the same grades the
search_feedback SDK method records. You can judge against your own queries or
against public evaluation sets (e.g. BEIR · SciFact, MS MARCO · dev) to
get a comparable baseline.
Graded feedback accumulates as customer qrels (query-relevance judgments), which are the training signal for fine-tuning.
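As a rough illustration of how graded judgments can become a baseline number, the sketch below computes nDCG@k over the 3 / 1 / 0 grades. This page does not say which metric the Console reports, so nDCG is an assumption here, as are the in-memory qrels layout and the names `dcg` and `ndcg_at_k`.

```python
import math

# Grades as described above: 3 = relevant, 1 = partial, 0 = irrelevant.
# The layout (query -> {doc_id: grade}) is an assumption for illustration,
# not the product's storage format.
qrels = {
    "q1": {"doc-a": 3, "doc-b": 1, "doc-c": 0},
}

def dcg(grades):
    # Discounted cumulative gain: linear gain, log2 discount by position.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades))

def ndcg_at_k(ranked_doc_ids, judged, k=10):
    # Documents without a judgment are treated as grade 0.
    gains = [judged.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(judged.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

perfect = ndcg_at_k(["doc-a", "doc-b", "doc-c"], qrels["q1"])
reversed_order = ndcg_at_k(["doc-c", "doc-b", "doc-a"], qrels["q1"])
```

A ranking that lists the judged documents in descending grade order scores 1.0; any other order scores lower, which is what makes the number comparable across runs or models.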
Labeled data
Manage the labeled sets produced by feedback:
- Filter by All / Pending / Judged to see what still needs grading.
- Flag duplicate judgments.
- Export JSONL to take the qrels out of the product.
- Delete a labeled set when it’s no longer needed.
This is the curation surface between raw grading and a training run.
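Once qrels are exported as JSONL, the same duplicate check can be approximated offline. The sketch below is a minimal example under stated assumptions: the export schema is not documented on this page, so the field names `query`, `doc_id`, and `grade` are hypothetical, as are the helper names.

```python
import json
from collections import defaultdict

# Hypothetical export: one JSON object per line. The field names
# "query", "doc_id", and "grade" are assumptions, not the documented schema.
sample_export = "\n".join([
    '{"query": "aspirin dosage", "doc_id": "d1", "grade": 3}',
    '{"query": "aspirin dosage", "doc_id": "d2", "grade": 0}',
    '{"query": "aspirin dosage", "doc_id": "d1", "grade": 1}',
])

def load_judgments(jsonl_text):
    # Parse each non-empty line into a dict.
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

def find_duplicates(judgments):
    # Group grades by (query, doc_id); a pair judged more than once is a duplicate.
    grades = defaultdict(list)
    for judgment in judgments:
        grades[(judgment["query"], judgment["doc_id"])].append(judgment["grade"])
    return {pair: g for pair, g in grades.items() if len(g) > 1}

judgments = load_judgments(sample_export)
duplicates = find_duplicates(judgments)
```

Here `duplicates` maps each repeated (query, document) pair to every grade it received, so conflicting judgments (such as 3 and 1 for the same pair) are easy to spot before a training run.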
Training
Kick off and monitor embedding fine-tuning runs against your labeled data. A run takes the exported qrels and produces a fine-tuned model, which becomes a managed model that you can activate per graph in Retrieval → Embeddings. The view shows each run's progress through the training stages.
Fine-tuning is what turns generic semantic recall into domain-tuned recall — the model learns which results your users consider relevant.
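This page does not describe how a training run consumes qrels. One common approach in embedding fine-tuning is to turn graded judgments into (query, positive, negative) triplets for a contrastive objective. The sketch below shows that idea only; the function name, the grade thresholds, and the qrels layout are all assumptions, not the product's implementation.

```python
from itertools import product

def build_triplets(qrels, positive_grade=3, negative_grade=0):
    # qrels: query -> {doc_id: grade}, using the 3 / 1 / 0 grading scale.
    # Partial matches (grade 1) are left out here; that is a design choice
    # of this sketch, not a documented behavior.
    triplets = []
    for query, judged in qrels.items():
        positives = [d for d, g in judged.items() if g >= positive_grade]
        negatives = [d for d, g in judged.items() if g <= negative_grade]
        # Pair every positive with every negative for the same query.
        triplets.extend((query, p, n) for p, n in product(positives, negatives))
    return triplets

example_qrels = {"q1": {"a": 3, "b": 1, "c": 0, "d": 0}}
triplets = build_triplets(example_qrels)
```

Each triplet tells the model that, for this query, the positive document should embed closer than the negative one, which is one way the grades can encode what users consider relevant.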
The full loop
search → grade results (Feedback) → curate qrels (Labeled data) → fine-tune (Training) → activate model (Embeddings) → better search