Prerequisites: `pip install mellea faiss-cpu sentence-transformers`, plus Ollama running locally.
Retrieval-augmented generation (RAG) reduces hallucination by grounding the
model’s answer in documents you supply. Mellea adds two things a plain RAG loop
lacks: an LLM-based relevance filter before generation, and optional
groundedness checking after.
## The pipeline
### Step 1: Index your documents
Use any embedding model and vector store. This example uses `sentence-transformers` and a FAISS flat inner-product index:
`IndexFlatIP` scores by inner product, which is equivalent to cosine similarity for L2-normalised embeddings. Many sentence-transformers models normalise their output by default; passing `normalize_embeddings=True` to `encode()` guarantees it regardless of model.
Choosing `k`: start with 5. Too small risks missing the relevant document; too large floods the filter step and the context window. Tune after measuring filter acceptance rates.
### Step 2: Filter candidates with `@generative`
Vector similarity finds topically related documents but cannot determine whether a document actually answers the question. Add an `@generative` LLM filter:
Calling `del embedding_model` before starting the Mellea session avoids having both models resident simultaneously, which matters on memory-constrained machines.
If all candidates are filtered out, fall back gracefully rather than calling `m.instruct()` with an empty context:
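One way to structure the fallback; the message text and the `generate` callback are illustrative:

```python
def answer_or_fallback(question: str, relevant_docs: list[str], generate) -> str:
    """`generate` is whatever function performs the grounded m.instruct() call."""
    if not relevant_docs:
        # Every candidate was filtered out: admit it rather than produce
        # an ungrounded answer from parametric knowledge.
        return "I couldn't find information about that in the indexed documents."
    return generate(question, relevant_docs)
```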
### Step 3: Generate with `grounding_context`
Pass the surviving documents as named entries in `grounding_context`. Mellea injects them into the prompt and tracks them as separate context components:
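A sketch of the generation call; the key naming scheme (`doc_0`, `doc_1`, ...) and prompt wording are illustrative choices:

```python
def generate_grounded(m, question: str, relevant_docs: list[str]) -> str:
    # Each named entry becomes a separately rendered and traced
    # context component in the session.
    result = m.instruct(
        f"Using only the provided documents, answer: {question}",
        grounding_context={f"doc_{i}": d for i, d in enumerate(relevant_docs)},
    )
    return str(result)
```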
`grounding_context` is kept separate from `user_variables` so that each component is rendered and traced independently. Without it, `m.instruct()` generates from the model's parametric knowledge, with no grounding at all.
### Step 4: Add requirements to the answer (optional)
Use `requirements` to enforce answer format, length, or citation style:
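For example, with plain-string requirements on `m.instruct()` (the specific requirement texts are illustrative):

```python
def generate_with_requirements(m, question: str, relevant_docs: list[str]) -> str:
    result = m.instruct(
        f"Using only the provided documents, answer: {question}",
        grounding_context={f"doc_{i}": d for i, d in enumerate(relevant_docs)},
        requirements=[
            "Answer in at most three sentences.",
            "Cite the supporting document key (e.g. doc_0) for each claim.",
        ],
    )
    return str(result)
```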
### Step 5: Check groundedness (optional)
After generation, use `GuardianCheck` with `GuardianRisk.GROUNDEDNESS` to verify the answer does not hallucinate beyond the retrieved documents:
Pass the same text to `context_text` that you used in `grounding_context`; this ensures the groundedness model evaluates the answer against exactly what the generator was given.
Backend note: `GuardianCheck` requires `granite3-guardian:2b` pulled in Ollama. Run `ollama pull granite3-guardian:2b` before using it.
## Putting it together
## What to tune
| Parameter | Effect | Starting point |
|---|---|---|
| `k` in `search()` | Candidates passed to the filter | 5 |
| `is_relevant` docstring | How strictly the filter interprets relevance | Adjust phrasing to match your domain |
| `grounding_context` key names | Tracing and debugging in spans | Use descriptive names in production |
| `requirements` on `m.instruct()` | Answer length, citation, tone | Add after baseline quality is good |
| `GuardianCheck` `context_text` | What the groundedness model checks against | Match exactly what you pass to `grounding_context` |
See also: Resilient RAG with Fallback Filtering | Making Agents Reliable | The Requirements System