This tutorial shows how to build a tool-using agent with Mellea and progressively add reliability layers: output requirements, retry budgets, and Guardian safety checks that detect harmful or off-topic responses before they reach your users. By the end you will have covered:
  • Building a tool-using agent with instruct() and ModelOption.TOOLS
  • Enforcing structured output with requirements and a retry budget
  • Inspecting SamplingResult to understand failures
  • Detecting harmful outputs with GuardianCheck
  • Grounding safety checks against retrieved context
Prerequisites: Tutorial 02 and Tutorial 03 complete, pip install mellea, Ollama running locally with granite4:micro downloaded.

Step 1: A simple tool-using agent

Start with two tools — a search stub and a calculator — and wire them into an instruct() call:
import mellea
from mellea.backends import ModelOption, tool

@tool
def web_search(query: str) -> str:
    """Search the web for information about a topic.

    Args:
        query: The search query.
    """
    # Stub — replace with a real search client in production.
    return f"Top result for '{query}': Mellea is a Python framework for generative programs."

@tool(name="calculator")
def calculate(expression: str) -> str:
    """Evaluate a safe arithmetic expression and return the result as a string.

    Args:
        expression: An arithmetic expression, e.g. '12 * 7 + 3'.
    """
    allowed = set("0123456789 +-*/(). ")
    if not all(c in allowed for c in expression):
        return "Error: expression contains disallowed characters."
    return str(eval(expression))  # noqa: S307 — only safe characters pass the guard above

m = mellea.start_session()

response = m.instruct(
    "What is Mellea, and how many characters are in the word 'Mellea'?",
    model_options={ModelOption.TOOLS: [web_search, calculate]},
)
print(str(response))
# Output will vary — LLM responses depend on model and temperature.
The model can call either or both tools during its response. With no requirements attached, the output format is up to the model.
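A caveat on the calculator stub: its character allowlist is a minimal guard, and it still admits expressions such as '9**9**9' (only digits and '*'), which eval would expand into an enormous number. A stricter alternative, not part of the tutorial's code, walks the parsed AST and permits only a fixed set of arithmetic node types:

```python
import ast
import operator

# Map permitted AST operator nodes to their implementations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> str:
    """Evaluate arithmetic by walking the AST; reject anything non-arithmetic."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return f"Error: {exc}"

print(safe_calculate("12 * 7 + 3"))  # 87
print(safe_calculate("9**9**9"))     # rejected: Pow is not a permitted operator
```

Either approach works as the `calculate` tool body; the AST walk simply fails closed on anything outside plain arithmetic.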

Step 2: Adding output requirements

Require the agent to format its answer as a short structured response:
import mellea
from mellea.backends import ModelOption, tool
from mellea.stdlib.requirements import req, simple_validate

@tool
def web_search(query: str) -> str:
    """Search the web for information about a topic.

    Args:
        query: The search query.
    """
    return f"Top result for '{query}': Mellea is a Python framework for generative programs."

@tool(name="calculator")
def calculate(expression: str) -> str:
    """Evaluate a safe arithmetic expression.

    Args:
        expression: An arithmetic expression.
    """
    allowed = set("0123456789 +-*/(). ")
    if not all(c in allowed for c in expression):
        return "Error: expression contains disallowed characters."
    return str(eval(expression))  # noqa: S307

m = mellea.start_session()

response = m.instruct(
    "What is Mellea, and how many characters are in the word 'Mellea'?",
    model_options={ModelOption.TOOLS: [web_search, calculate]},
    requirements=[
        req("The response must answer both questions."),
        req(
            "The response must be 50 words or fewer.",
            validation_fn=simple_validate(
                lambda x: (
                    len(x.split()) <= 50,
                    f"Response is {len(x.split())} words; must be 50 or fewer.",
                )
            ),
        ),
    ],
)
print(str(response))
# Output will vary — LLM responses depend on model and temperature.
The word-count requirement runs deterministically. The “answer both questions” requirement falls back to LLM-as-a-judge. If either fails, Mellea retries with the failure reason embedded in the repair request.
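The deterministic branch is just an ordinary Python predicate returning a (passed, reason) pair, as the lambda above shows. Pulling it into a named factory makes the same check reusable and unit-testable on its own; this refactor is independent of Mellea:

```python
def word_limit(max_words: int):
    """Return a validator that checks a response stays within max_words.

    Matches the (passed, reason) tuple shape used with simple_validate above.
    """
    def check(text: str) -> tuple[bool, str]:
        count = len(text.split())
        return (
            count <= max_words,
            f"Response is {count} words; must be {max_words} or fewer.",
        )
    return check

validate_50 = word_limit(50)
passed, reason = validate_50("Mellea is a Python framework. 'Mellea' has 6 characters.")
print(passed)  # True
```

The returned `check` can be handed to `simple_validate` wherever the inline lambda appeared, and the same factory covers other limits (say, `word_limit(100)`) without duplicating the reason string.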

Step 3: Inspecting failures and handling a retry budget

Use RejectionSamplingStrategy with return_sampling_results=True to observe what happens when requirements fail:
import mellea
from mellea.backends import ModelOption, tool
from mellea.stdlib.requirements import req, simple_validate
from mellea.stdlib.sampling import RejectionSamplingStrategy

@tool
def web_search(query: str) -> str:
    """Search the web for information about a topic.

    Args:
        query: The search query.
    """
    return f"Top result for '{query}': Mellea is a Python framework for generative programs."

@tool(name="calculator")
def calculate(expression: str) -> str:
    """Evaluate a safe arithmetic expression.

    Args:
        expression: An arithmetic expression.
    """
    allowed = set("0123456789 +-*/(). ")
    if not all(c in allowed for c in expression):
        return "Error: expression contains disallowed characters."
    return str(eval(expression))  # noqa: S307

m = mellea.start_session()

result = m.instruct(
    "What is Mellea, and how many characters are in the word 'Mellea'?",
    model_options={ModelOption.TOOLS: [web_search, calculate]},
    requirements=[
        req("The response must answer both questions."),
        req(
            "The response must be 50 words or fewer.",
            validation_fn=simple_validate(
                lambda x: (
                    len(x.split()) <= 50,
                    f"Response is {len(x.split())} words; must be 50 or fewer.",
                )
            ),
        ),
    ],
    strategy=RejectionSamplingStrategy(loop_budget=3),
    return_sampling_results=True,
)

if result.success:
    print("Passed:", str(result.result))
else:
    print(f"All {len(result.sample_generations)} attempts failed.")
    for i, attempt in enumerate(result.sample_generations):
        print(f"  Attempt {i + 1}: {str(attempt.value)[:80]}...")
result.success is True when at least one attempt satisfied all requirements. result.sample_generations gives you every attempt in order — useful for debugging or for choosing the best available output when the budget runs out.
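One way to "choose the best available output" when the budget runs out is to score each attempt by how many deterministic checks it passes. The sketch below works on plain strings; in real code you would feed it `[str(a.value) for a in result.sample_generations]` and your validators, which is an assumption about how you wire it up, not a Mellea API:

```python
def best_attempt(attempt_texts: list[str], checks) -> str:
    """Pick the attempt that passes the most checks; ties go to the earliest.

    Each check is a callable returning a (passed, reason) tuple, like the
    word-count validator in Step 2.
    """
    def score(text: str) -> int:
        return sum(1 for check in checks if check(text)[0])
    return max(attempt_texts, key=score)

checks = [
    lambda t: (len(t.split()) <= 10, "too long"),
    lambda t: ("Mellea" in t, "missing keyword"),
]
attempts = [
    "A very long answer " * 20,
    "Mellea is a Python framework for generative programs.",
]
print(best_attempt(attempts, checks))
```

This only makes sense as a degraded-mode fallback; an attempt that failed an LLM-judged requirement may still be unusable even if it tops the deterministic score.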

Step 4: Adding Guardian harm detection

GuardianCheck evaluates a MelleaSession's output against one or more GuardianRisk categories. Run it after your agent responds to flag unsafe outputs before they reach downstream code.
import mellea
from mellea.backends import ModelOption, tool
from mellea.stdlib.requirements import req, simple_validate
from mellea.stdlib.requirements.safety.guardian import GuardianCheck, GuardianRisk
from mellea.stdlib.sampling import RejectionSamplingStrategy

@tool
def web_search(query: str) -> str:
    """Search the web for information about a topic.

    Args:
        query: The search query.
    """
    return f"Top result for '{query}': Mellea is a Python framework for generative programs."

@tool(name="calculator")
def calculate(expression: str) -> str:
    """Evaluate a safe arithmetic expression.

    Args:
        expression: An arithmetic expression.
    """
    allowed = set("0123456789 +-*/(). ")
    if not all(c in allowed for c in expression):
        return "Error: expression contains disallowed characters."
    return str(eval(expression))  # noqa: S307

m = mellea.start_session()

response = m.instruct(
    "What is Mellea, and how many characters are in the word 'Mellea'?",
    model_options={ModelOption.TOOLS: [web_search, calculate]},
    requirements=[
        req("The response must answer both questions."),
        req(
            "The response must be 50 words or fewer.",
            validation_fn=simple_validate(
                lambda x: (
                    len(x.split()) <= 50,
                    f"Response is {len(x.split())} words; must be 50 or fewer.",
                )
            ),
        ),
    ],
    strategy=RejectionSamplingStrategy(loop_budget=3),
)

output_text = str(response)

# Run Guardian checks on the agent output.
harm_check = GuardianCheck(
    GuardianRisk.HARM,
    backend_type="ollama",
    ollama_url="http://localhost:11434",
)
jailbreak_check = GuardianCheck(
    GuardianRisk.JAILBREAK,
    backend_type="ollama",
    ollama_url="http://localhost:11434",
)

# session.validate() returns a list of ValidationResult objects.
validation_results = m.validate([harm_check, jailbreak_check])

safe = all(r._result for r in validation_results)
if safe:
    print("Output passed safety checks:", output_text)
else:
    for check_result in validation_results:
        if not check_result._result:
            print(f"Safety check failed — {check_result._reason}")
Note: m.validate() evaluates the checks against the most recent session output. Run it immediately after the instruct() call before any other session activity modifies the context.
Each GuardianCheck runs as an independent inference call against your local Ollama instance. The results are ValidationResult objects with ._result (bool) and ._reason (str).
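A small aggregation helper keeps the pass/fail handling out of your main flow. The stub dataclass below only mimics the `._result`/`._reason` shape described above so the helper can run standalone; with Mellea installed you would pass the real ValidationResult objects instead:

```python
from dataclasses import dataclass

@dataclass
class StubResult:
    """Stand-in for ValidationResult: same _result/_reason fields."""
    _result: bool
    _reason: str = ""

def safety_report(check_names, results):
    """Pair each check name with its outcome; return (all_safe, failure lines)."""
    failures = [
        f"{name}: {r._reason or 'no reason given'}"
        for name, r in zip(check_names, results)
        if not r._result
    ]
    return (not failures, failures)

safe, failures = safety_report(
    ["harm", "jailbreak"],
    [StubResult(True), StubResult(False, "prompt injection detected")],
)
print(safe)      # False
print(failures)  # ['jailbreak: prompt injection detected']
```

Centralising this also gives you one place to decide policy, e.g. block on any failure versus block only on HARM.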

Step 5: Sharing a backend across Guardian checks

When you run multiple GuardianCheck instances, each one loads or contacts the model separately by default. Pass backend=shared_backend to reuse a single loaded backend and avoid the overhead of repeated initialisation:
import mellea
from mellea.backends import ModelOption, model_ids, tool
from mellea.backends.ollama import OllamaModelBackend
from mellea.stdlib.requirements.safety.guardian import GuardianCheck, GuardianRisk

@tool
def web_search(query: str) -> str:
    """Search the web for information about a topic.

    Args:
        query: The search query.
    """
    return f"Top result for '{query}': Mellea is a Python framework for generative programs."

m = mellea.start_session()

response = m.instruct(
    "What is Mellea?",
    model_options={ModelOption.TOOLS: [web_search]},
)

# Create a single Guardian backend and reuse it across all checks.
# Pull the model first: ollama pull granite3-guardian:2b
guardian_backend = OllamaModelBackend(model_ids.IBM_GRANITE_GUARDIAN_3_0_2B.ollama_name)

checks = [
    GuardianCheck(GuardianRisk.HARM, backend=guardian_backend),
    GuardianCheck(GuardianRisk.PROFANITY, backend=guardian_backend),
    GuardianCheck(GuardianRisk.ANSWER_RELEVANCE, backend=guardian_backend),
    GuardianCheck(GuardianRisk.JAILBREAK, backend=guardian_backend),
]

results = m.validate(checks)

for check, result in zip(checks, results):
    status = "PASS" if result._result else "FAIL"
    print(f"[{status}] {check}: {result._reason or 'ok'}")
The full list of GuardianRisk values you can check: HARM, GROUNDEDNESS, PROFANITY, ANSWER_RELEVANCE, JAILBREAK, FUNCTION_CALL, SOCIAL_BIAS, VIOLENCE, SEXUAL_CONTENT, UNETHICAL_BEHAVIOR.

Step 6: Groundedness checks with retrieved context

When your agent retrieves documents before answering, add a GROUNDEDNESS check to confirm the response is grounded in what was retrieved rather than hallucinated:
import mellea
from mellea.backends import ModelOption, tool
from mellea.stdlib.requirements.safety.guardian import GuardianCheck, GuardianRisk

RETRIEVED_CONTEXT = (
    "Mellea is an open-source Python framework for building generative programs. "
    "It provides instruct(), @generative, and @mify as its core primitives. "
    "Mellea is backend-agnostic and supports Ollama, OpenAI, and custom backends."
)

@tool
def retrieve_docs(topic: str) -> str:
    """Retrieve documentation about a topic.

    Args:
        topic: The topic to retrieve documentation for.
    """
    # In production, call your vector store or search index here.
    return RETRIEVED_CONTEXT

m = mellea.start_session()

response = m.instruct(
    "Using the retrieved documentation, describe what Mellea is.",
    model_options={ModelOption.TOOLS: [retrieve_docs]},
    grounding_context={"docs": RETRIEVED_CONTEXT},
)

output_text = str(response)

# Check the response is grounded in the retrieved context.
groundedness_check = GuardianCheck(
    GuardianRisk.GROUNDEDNESS,
    backend_type="ollama",
    ollama_url="http://localhost:11434",
    context_text=RETRIEVED_CONTEXT,
)

results = m.validate([groundedness_check])
grounded = results[0]._result

if grounded:
    print("Grounded response:", output_text)
else:
    print("Response may contain hallucinated content.")
    print("Reason:", results[0]._reason)
Tip: Pass the same text you supplied as grounding_context to context_text in GuardianCheck. This ensures the groundedness model evaluates the response against exactly what the agent was given.
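Since the Guardian groundedness check is itself an inference call, a cheap lexical pre-filter can catch blatant hallucinations before you spend one. This token-overlap heuristic is not part of Mellea and is far weaker than the model-based check; treat it as a complementary fast path only, with illustrative thresholds:

```python
def lexical_overlap(response: str, context: str) -> float:
    """Fraction of distinctive response words that also appear in the context."""
    stopwords = {"a", "an", "the", "is", "are", "and", "or", "for", "of", "to", "in", "it"}
    resp_words = {w.strip(".,:;!?'\"").lower() for w in response.split()} - stopwords
    ctx_words = {w.strip(".,:;!?'\"").lower() for w in context.split()} - stopwords
    if not resp_words:
        return 0.0
    return len(resp_words & ctx_words) / len(resp_words)

context = "Mellea is an open-source Python framework for building generative programs."
grounded = "Mellea is a Python framework for generative programs."
ungrounded = "Mellea was written in Rust by the Apache Foundation in 1999."

print(lexical_overlap(grounded, context))    # high overlap
print(lexical_overlap(ungrounded, context))  # low overlap
```

A sensible pattern is to escalate to GuardianCheck only when the cheap score falls in an ambiguous middle band, keeping Guardian for the cases that actually need judgment.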

Step 7: A ReACT agent with Guardian checks

For goal-driven agentic loops, combine react() with Guardian validation. The react() function is an async built-in that runs the Reason-Act loop until the goal is reached or the step budget is exhausted:
import asyncio
import mellea
from mellea.backends import tool
from mellea.stdlib.context import ChatContext
from mellea.stdlib.frameworks.react import react
from mellea.stdlib.requirements.safety.guardian import GuardianCheck, GuardianRisk

@tool
def web_search(query: str) -> str:
    """Search the web for information about a topic.

    Args:
        query: The search query.
    """
    return f"Search result for '{query}': Mellea is a Python framework."

@tool(name="calculator")
def calculate(expression: str) -> str:
    """Evaluate a safe arithmetic expression.

    Args:
        expression: An arithmetic expression.
    """
    allowed = set("0123456789 +-*/(). ")
    if not all(c in allowed for c in expression):
        return "Error: expression contains disallowed characters."
    return str(eval(expression))  # noqa: S307

m = mellea.start_session()

async def run_agent(goal: str) -> str:
    result, _ = await react(
        goal=goal,
        context=ChatContext(),
        backend=m.backend,
        tools=[web_search, calculate],
    )
    return str(result)

output = asyncio.run(run_agent(
    "Find out what Mellea is, then calculate how many characters are in 'Mellea'."
))

# Validate the agent's final output.
harm_check = GuardianCheck(
    GuardianRisk.HARM,
    backend_type="ollama",
    ollama_url="http://localhost:11434",
)
results = m.validate([harm_check])

if results[0]._result:
    print("Agent output (safe):", output)
else:
    print("Agent output flagged:", results[0]._reason)
# Output will vary — LLM responses depend on model and temperature.
Advanced: react() implements the Reason + Act loop: the LLM alternates between producing a reasoning step (“Thought”) and invoking a tool (“Action”) until it determines the goal is satisfied or the step budget runs out. You can inspect the intermediate steps via the second return value (the trace list). For fine-grained control over each reasoning step, build a custom loop using m.instruct() with ModelOption.TOOLS directly.
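The closing suggestion, a custom loop over m.instruct(), can be sketched without Mellea at all. Here `generate` is a stand-in for the model call (in real code it would wrap m.instruct() with ModelOption.TOOLS), and the loop only shows the Thought/Action/Observation plumbing and the step budget:

```python
def react_loop(goal, generate, tools, max_steps=5):
    """Minimal Reason-Act skeleton: alternate model steps and tool calls.

    `generate` maps the transcript so far to either ("final", answer) or
    ("action", tool_name, argument). It stands in for an LLM call.
    """
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        step = generate(transcript)
        if step[0] == "final":
            return step[1]
        _, tool_name, argument = step
        observation = tools[tool_name](argument)          # Action
        transcript.append(f"Action: {tool_name}({argument!r})")
        transcript.append(f"Observation: {observation}")  # fed back next step
    return "Step budget exhausted."

# A scripted stand-in model: search first, then answer from the observation.
def scripted_model(transcript):
    if not any(line.startswith("Observation:") for line in transcript):
        return ("action", "web_search", "Mellea")
    return ("final", transcript[-1].removeprefix("Observation: "))

tools = {"web_search": lambda q: f"Mellea is a Python framework ({q})."}
print(react_loop("What is Mellea?", scripted_model, tools))
```

The real react() handles prompt construction, tool schemas, and termination judgment for you; a hand-rolled loop like this only pays off when you need to intervene between steps, e.g. to run a GuardianCheck on each intermediate action.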

What you built

A progression from a basic tool-using agent to a safety-validated, grounded agentic system:
  • instruct() + ModelOption.TOOLS: the LLM can call Python tools
  • requirements + simple_validate: deterministic and LLM-judged output constraints
  • RejectionSamplingStrategy: explicit retry budget
  • return_sampling_results=True: inspect every attempt for debugging
  • GuardianCheck: post-generation safety risk detection
  • Shared backend: amortise model loading across multiple checks
  • GuardianRisk.GROUNDEDNESS + context_text: detect hallucination relative to retrieved context
  • react(): goal-driven multi-step agentic loop

See also: The Requirements System | Security and Taint Tracking | Tools and Agents