- Building a tool-using agent with `instruct()` and `ModelOption.TOOLS`
- Enforcing structured output with requirements and a retry budget
- Inspecting `SamplingResult` to understand failures
- Detecting harmful outputs with `GuardianCheck`
- Grounding safety checks against retrieved context
Prerequisites: `pip install mellea`, and Ollama running locally with the `granite4:micro` model downloaded.

## Step 1: A simple tool-using agent

Start with two tools (a search stub and a calculator) and wire them into an `instruct()` call:
## Step 2: Adding output requirements

Require the agent to format its answer as a short structured response:

## Step 3: Inspecting failures and handling a retry budget

Use `RejectionSamplingStrategy` with `return_sampling_results=True` to observe what happens when requirements fail:
`result.success` is `True` when at least one attempt satisfied all requirements. `result.sample_generations` gives you every attempt in order, which is useful for debugging or for choosing the best available output when the budget runs out.
## Step 4: Adding Guardian harm detection

`GuardianCheck` wraps a `MelleaSession` call and evaluates the output against a set of `GuardianRisk` categories. Run it after your agent responds to flag harmful outputs before they reach downstream code.
Note: Each `m.validate()` call evaluates the checks against the most recent session output. Run it immediately after the `instruct()` call, before any other session activity modifies the context.
`GuardianCheck` runs as an independent inference call against your local Ollama instance. The results are `ValidationResult` objects with `._result` (bool) and `._reason` (str).
## Step 5: Sharing a backend across Guardian checks

When you run multiple `GuardianCheck` instances, each one loads or contacts the model separately by default. Pass `backend=shared_backend` to reuse a single loaded backend and avoid the overhead of repeated initialisation:
`GuardianRisk` values you can check: `HARM`, `GROUNDEDNESS`, `PROFANITY`, `ANSWER_RELEVANCE`, `JAILBREAK`, `FUNCTION_CALL`, `SOCIAL_BIAS`, `VIOLENCE`, `SEXUAL_CONTENT`, `UNETHICAL_BEHAVIOR`.
## Step 6: Groundedness checks with retrieved context

When your agent retrieves documents before answering, add a `GROUNDEDNESS` check to confirm the response is grounded in what was retrieved rather than hallucinated:
Tip: Pass the same text you supplied as `grounding_context` to `context_text` in `GuardianCheck`. This ensures the groundedness model evaluates the response against exactly what the agent was given.
## Step 7: A ReACT agent with Guardian checks

For goal-driven agentic loops, combine `react()` with Guardian validation. The `react()` function is an async built-in that runs the Reason-Act loop until the goal is reached or the step budget is exhausted:
Advanced: `react()` implements the Reason + Act loop: the LLM alternates between producing a reasoning step ("Thought") and invoking a tool ("Action") until it determines the goal is satisfied or the step budget runs out. You can inspect the intermediate steps via the second return value (the trace list). For fine-grained control over each reasoning step, build a custom loop using `m.instruct()` with `ModelOption.TOOLS` directly.
## What you built

A progression from a basic tool-using agent to a safety-validated, grounded agentic system:

| Layer | What it adds |
|---|---|
| `instruct()` + `ModelOption.TOOLS` | LLM can call Python tools |
| requirements + `simple_validate` | Deterministic and LLM-judged output constraints |
| `RejectionSamplingStrategy` | Explicit retry budget |
| `return_sampling_results=True` | Inspect every attempt for debugging |
| `GuardianCheck` | Post-generation safety risk detection |
| Shared backend | Amortise model loading across multiple checks |
| `GuardianRisk.GROUNDEDNESS` + `context_text` | Detect hallucination relative to retrieved context |
| `react()` | Goal-driven multi-step agentic loop |
See also: The Requirements System | Security and Taint Tracking | Tools and Agents