Console: Models & feedback

The Models pillar closes the retrieval-quality loop: grade results, curate labeled data, and fine-tune embedding models. It has three views: Feedback, Labeled data, and Training.

Feedback

Grade search results to build relevance judgments. For a query, mark each result 3 (relevant), 1 (partial), or 0 (irrelevant) — the same grades the search_feedback SDK method records. You can judge against your own queries or against public evaluation sets (e.g. BEIR · SciFact, MS MARCO · dev) to get a comparable baseline.

Graded feedback accumulates as customer qrels (query relevance judgments), which are the training signal for fine-tuning.
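
As a rough illustration of how graded judgments accumulate into qrels, the sketch below keeps them in a plain in-memory mapping. The record_judgment helper and the query and document IDs are hypothetical stand-ins; this is not the signature of the search_feedback SDK method, which this page does not document. Only the 3 / 1 / 0 grade scale comes from the text above.

```python
from collections import defaultdict

# Grade scale from the Feedback view: 3 = relevant, 1 = partial, 0 = irrelevant.
VALID_GRADES = {3: "relevant", 1: "partial", 0: "irrelevant"}

# qrels[query][doc_id] -> grade (assumed in-memory layout, for illustration only)
qrels = defaultdict(dict)


def record_judgment(query: str, doc_id: str, grade: int) -> None:
    """Hypothetical helper: store one graded judgment, rejecting unknown grades."""
    if grade not in VALID_GRADES:
        raise ValueError(f"grade must be one of {sorted(VALID_GRADES)}, got {grade!r}")
    qrels[query][doc_id] = grade


# Made-up example judgments for a single query.
record_judgment("reset api key", "doc-17", 3)
record_judgment("reset api key", "doc-42", 1)
record_judgment("reset api key", "doc-08", 0)
```

Rejecting any grade outside 3 / 1 / 0 at write time keeps the accumulated qrels on a single scale, which matters once they are used for training.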

Labeled data

Manage the labeled sets produced by feedback:

Filter by All / Pending / Judged to see what still needs grading.
Flag duplicate judgments.
Export a labeled set as JSONL to use its qrels outside the product.
Delete a labeled set when it’s no longer needed.

This is the curation surface between raw grading and a training run.
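
The curation actions above can be sketched on plain Python records. The field names (query, doc_id, grade), the use of None to mean "pending", and the JSONL layout are all assumptions made for illustration; the console's actual export schema is not specified on this page.

```python
import json
from collections import Counter

# Hypothetical labeled-set rows; grade None means the result is still pending.
rows = [
    {"query": "reset api key", "doc_id": "doc-17", "grade": 3},
    {"query": "reset api key", "doc_id": "doc-42", "grade": None},
    {"query": "reset api key", "doc_id": "doc-17", "grade": 1},  # duplicate judgment
    {"query": "rotate secrets", "doc_id": "doc-08", "grade": 0},
]


def filter_rows(rows, status="all"):
    """Mimic the All / Pending / Judged filter."""
    if status == "all":
        return list(rows)
    if status == "pending":
        return [r for r in rows if r["grade"] is None]
    if status == "judged":
        return [r for r in rows if r["grade"] is not None]
    raise ValueError(f"unknown status: {status!r}")


def find_duplicates(rows):
    """Return (query, doc_id) pairs that were judged more than once."""
    counts = Counter(
        (r["query"], r["doc_id"]) for r in rows if r["grade"] is not None
    )
    return sorted(pair for pair, n in counts.items() if n > 1)


def to_jsonl(rows):
    """Serialize judged rows as JSONL: one JSON object per line."""
    judged = filter_rows(rows, "judged")
    return "\n".join(json.dumps(r, sort_keys=True) for r in judged)


pending = filter_rows(rows, "pending")
duplicates = find_duplicates(rows)
jsonl = to_jsonl(rows)
```

In this sketch, duplicates are only flagged, not resolved; deciding which of two conflicting grades to keep is left to the person curating the set.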

Training

Start and monitor embedding fine-tuning runs against your labeled data. A run takes the exported qrels and produces a fine-tuned model, which becomes a managed model that you can activate per graph in Retrieval → Embeddings. The view shows each run's progress through the training stages.

Fine-tuning is what turns generic semantic recall into domain-tuned recall — the model learns which results your users consider relevant.
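
This page does not describe how a training run consumes qrels. One common way to learn from graded judgments, shown here purely as an assumption, is to form (query, positive, negative) triplets for contrastive training. The build_triplets function, the choice of grade 3 as positive and grade 0 as negative, and the sample data are all hypothetical.

```python
from itertools import product

# Hypothetical qrels: query -> {doc_id: grade}, using the 3 / 1 / 0 scale.
qrels = {
    "reset api key": {"doc-17": 3, "doc-42": 1, "doc-08": 0},
    "rotate secrets": {"doc-03": 3, "doc-11": 0, "doc-12": 0},
}


def build_triplets(qrels, positive_grade=3, negative_grade=0):
    """Pair every fully relevant doc with every irrelevant doc of the same query.

    Partial (grade 1) results are skipped; that is a simplifying assumption
    of this sketch, not documented product behavior.
    """
    triplets = []
    for query, judged in qrels.items():
        positives = [d for d, g in judged.items() if g == positive_grade]
        negatives = [d for d, g in judged.items() if g == negative_grade]
        triplets.extend((query, p, n) for p, n in product(positives, negatives))
    return triplets


triplets = build_triplets(qrels)
```

Each triplet states that, for the given query, the positive document should be embedded closer to the query than the negative one, which is the sense in which a model can learn what your users consider relevant.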

The full loop

search  →  grade results (Feedback)  →  curate qrels (Labeled data)
        →  fine-tune (Training)       →  activate model (Embeddings)  →  better search
