Mellea-specific terms used throughout this guide. Terms are listed alphabetically. Cross-links from guide pages point here on first use only.

act() / aact()

act() is the generic session method that runs any Component and returns a result. Every higher-level method (instruct(), chat(), query(), transform()) builds a Component and delegates to act(). Use act() directly when working with custom components or building your own inference loops. aact() is the async counterpart — same signature, same return types. See: act() and aact()

aLoRA (Activated LoRA)

An Activated LoRA (aLoRA) is a LoRA adapter dynamically loaded by LocalHFBackend at inference time to serve as a lightweight requirement verifier. Instead of running a full LLM call to check a requirement, the adapter is activated on the same model weights already in memory. See: LoRA and aLoRA Adapters

@generative

A decorator that converts a typed Python function into an AI-powered function. @generative uses the function’s name, docstring, parameters, and return type annotation to instruct the LLM. The output is constrained to match the return type. Write the function in idiomatic Python — the more natural the signature and docstring, the better the model understands and imitates it.
from mellea import generative, start_session

@generative
def classify_language(code: str) -> str:
    """Return the programming language of the code snippet."""
    ...

m = start_session()
lang = classify_language(m, code="print('hello')")
See: Generative Functions

Backend

A backend is an inference engine that Mellea uses to run LLM calls. Examples: OllamaModelBackend, OpenAIBackend, LocalHFBackend, LocalVLLMBackend, WatsonxAIBackend. Backends are configured via MelleaSession or start_session(). See: Backends and Configuration

ChatContext

The standard multi-turn context implementation. ChatContext accumulates the full conversation history and passes it to the backend on each call. Create one at the start of a session and pass it through all calls to maintain state:
from mellea.stdlib import ChatContext
ctx = ChatContext()
Use window_size to cap how many turns are sent to the backend:
ctx = ChatContext(window_size=10)
Use SimpleContext instead for stateless, single-turn calls. See: Context and Sessions
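The effect of window_size can be pictured with a bounded queue. The sketch below is not Mellea's implementation: WindowedHistory and its methods are hypothetical names, and it assumes the window counts whole turns and drops the oldest turn first.

```python
from collections import deque


class WindowedHistory:
    """Hypothetical stand-in for a windowed chat history (not a Mellea class)."""

    def __init__(self, window_size=None):
        # maxlen=None keeps every turn; an integer keeps only the newest turns.
        self._turns = deque(maxlen=window_size)

    def add_turn(self, user, assistant):
        # Appending to a full deque silently discards the oldest turn.
        self._turns.append((user, assistant))

    def turns_for_backend(self):
        # What would be sent on the next call under this assumption.
        return list(self._turns)


history = WindowedHistory(window_size=2)
for i in range(4):
    history.add_turn(f"question {i}", f"answer {i}")
```

After four turns with a window of two, only the last two turns remain available to send.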

CBlock

A CBlock (content block) is the low-level unit of content in Mellea. A CBlock holds text (or image data) and is assembled by a Component into the prompt sent to the backend. Multiple CBlocks compose into a single LLM request. See: Mellea Core Internals

Component

A Component is a reusable, composable unit in Mellea that encapsulates a prompt structure, its requirements, and its parsing logic. Instruction, Message, MObject, and Document are all Component subclasses. Components are the building blocks of generative programs. See: Building Custom Components

ComponentParseError

The exception raised by Component.parse() when the model’s output cannot be parsed into the component’s declared return type S. parse() catches any exception from _parse() and re-raises it as ComponentParseError so all callers get a consistent error type regardless of the underlying parse implementation.
from mellea.core import ComponentParseError

try:
    result = form.parse(thunk)
except ComponentParseError as e:
    print(f"Parsing failed: {e}")
See: Building Custom Components
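The catch-and-re-raise behaviour described above can be sketched in plain Python. SketchParseError and IntComponent are hypothetical stand-ins, and the sketch parses a raw string rather than a thunk; it only illustrates the pattern of normalising every failure to one error type.

```python
class SketchParseError(Exception):
    """Hypothetical stand-in for ComponentParseError."""


class IntComponent:
    """Hypothetical component whose declared return type is int."""

    def _parse(self, raw):
        # Implementation-specific parsing; may raise ValueError, TypeError, ...
        return int(raw.strip())

    def parse(self, raw):
        try:
            return self._parse(raw)
        except Exception as exc:
            # Normalise every failure to one error type, keeping the cause.
            raise SketchParseError(f"could not parse {raw!r}") from exc


parsed = IntComponent().parse(" 42 ")

try:
    IntComponent().parse("forty-two")
except SketchParseError as err:
    caught = err  # the original ValueError is preserved as caught.__cause__
```

Chaining with `from exc` keeps the underlying exception available for debugging while callers handle a single type.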

ContextTurn

A single turn of model input and model output stored inside a Context. Each call to m.instruct(), m.chat(), or m.act() appends a ContextTurn to the active context. Turns are consumed by the backend formatter to build the conversation history sent to the model.

Context

A Context holds the conversation history threaded through a MelleaSession. Mellea provides SimpleContext (single-turn) and ChatContext (multi-turn). Push and pop operations let you branch and restore context state across calls. See: Context and Sessions

Document

A Component that wraps a plain-text reference document for inclusion in a prompt. Pass one or more Document objects in the _docs field of a Message or directly as grounding context in an Instruction. Unlike RichDocument, Document holds pre-extracted text rather than a parsed file.
from mellea.stdlib.components.docs.document import Document
doc = Document(text="...", title="My doc", doc_id="ref-1")

Generative function

A Python function decorated with @generative. Mellea uses the function’s type annotation as the output schema and its docstring as the prompt. Generative functions are called with a MelleaSession as the first argument and return the annotated type. See: Generative Functions

Generative program

Any computer program that contains calls to an LLM. Mellea is a library for writing robust, composable generative programs. See: Generative Programming

GenerateLog

A dataclass that captures a single model call in detail. Pass a list[GenerateLog] to m.validate() via the generate_logs= parameter to record the judge prompt and raw verdict for each requirement validation:
from mellea import start_session
from mellea.core import GenerateLog
from mellea.stdlib.requirements import req

logs: list[GenerateLog] = []
m = start_session()
result = m.instruct("Summarise this text.")
m.validate([req("Must be under 30 words.")], generate_logs=logs)

for log in logs:
    print(log.prompt)   # full judge prompt sent to the model
    print(log.result.value if log.result else None)  # raw verdict string
Key fields: prompt, result (ModelOutputThunk | None), backend, model_options, is_final_result. See: Evaluate with LLM-as-a-Judge

grounding_context

The grounding_context parameter of m.instruct() accepts a dictionary of named text entries that Mellea injects into the prompt as grounding evidence. Each entry is tracked as a separate context component, so it can be traced and rendered independently from the instruction template. Use grounding_context to anchor the model’s output to retrieved documents, knowledge-base passages, or any reference material — without mixing that content into user_variables:
answer = m.instruct(
    "Answer the question: {{question}}",
    user_variables={"question": query},
    grounding_context={"doc0": doc_text_0, "doc1": doc_text_1},
)
Without grounding_context, m.instruct() generates from the model’s parametric knowledge only. It is the primary integration point for RAG pipelines. See: Build a RAG Pipeline

GuardianCheck

A safety requirement in Mellea that validates LLM outputs against defined safety rules before they are returned to the caller. Uses the Granite Guardian model as a verifier. Constructed with a GuardianRisk value and optional backend and context_text parameters. See: Making Agents Reliable | Security and Taint Tracking

GuardianRisk

An enum that specifies which safety risk category GuardianCheck should detect. Each check runs as an independent inference call against the Guardian model. Available values: HARM, GROUNDEDNESS, PROFANITY, ANSWER_RELEVANCE, JAILBREAK, FUNCTION_CALL, SOCIAL_BIAS, VIOLENCE, SEXUAL_CONTENT, UNETHICAL_BEHAVIOR.
from mellea.stdlib.requirements.safety.guardian import GuardianCheck, GuardianRisk

harm_check = GuardianCheck(GuardianRisk.HARM, backend_type="ollama")
See: Making Agents Reliable

KV smashing

The technique of concatenating key-value attention caches from separately prefilled prompt chunks along the time axis, producing a single merged DynamicCache that covers the full context. Used by LocalHFBackend to avoid re-running forward passes on content that has already been cached. When a prompt contains a mix of cached and uncached CBlock objects, Mellea prefills each block independently, then smashes the resulting caches together before generation — giving results identical to a single full-context forward pass at a fraction of the prefill cost. See: Prefix Caching and KV Blocks
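As a toy illustration of concatenating along the time axis, the sketch below merges per-layer key/value lists from two separately prefilled chunks. It is a simplification under stated assumptions: real caches are tensors held in a DynamicCache, and the sketch ignores batch and head dimensions and positional handling. The function name smash and the nested-list layout are hypothetical.

```python
def smash(caches):
    """Concatenate per-layer key/value lists along the time (token) axis.

    Each cache is a list of layers; each layer is a (keys, values) pair whose
    entries hold one vector per token position.
    """
    num_layers = len(caches[0])
    merged = []
    for layer in range(num_layers):
        keys, values = [], []
        for cache in caches:
            layer_keys, layer_values = cache[layer]
            keys.extend(layer_keys)      # time axis grows; vectors are untouched
            values.extend(layer_values)
        merged.append((keys, values))
    return merged


# Two prefilled chunks for a one-layer toy model: 2 tokens, then 3 tokens.
chunk_a = [([[0.1], [0.2]], [[1.0], [2.0]])]
chunk_b = [([[0.3], [0.4], [0.5]], [[3.0], [4.0], [5.0]])]
merged = smash([chunk_a, chunk_b])
```

The merged cache covers five token positions, in chunk order, and the input chunks are left unmodified so they can stay in the cache for reuse.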

LiteLLM / LiteLLMBackend

LiteLLMBackend wraps LiteLLM — a unified interface over 100+ model providers. Use it to reach providers not covered by Mellea’s native backends: Bedrock via IAM, Vertex AI, Together AI, Cohere, and others.
pip install 'mellea[litellm]'
import mellea

m = mellea.start_session(
    backend_name="litellm",
    model_id="bedrock/converse/us.amazon.nova-pro-v1:0",
)
See: Backends and Configuration

LLM-as-a-judge

The default validation strategy for req() in Mellea. After the model generates an output, a second LLM call is made using the requirement’s description as the evaluation criterion. Mellea converts the judge’s response to True / False by looking for "yes" (case-insensitive) in the reply. Use simple_validate instead when the criterion is deterministic (word count, regex, type check) — no second LLM call is needed. See: Evaluate with LLM-as-a-Judge
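The verdict conversion described above can be approximated as a case-insensitive substring test. This is a sketch of the stated rule only, not Mellea's code; verdict_to_bool is a hypothetical name, and the exact matching logic in the library may differ.

```python
def verdict_to_bool(reply: str) -> bool:
    """Return True when the judge reply contains "yes", ignoring case."""
    return "yes" in reply.lower()


passed = verdict_to_bool("Yes, the summary is under 30 words.")
failed = verdict_to_bool("NO")
```

Under this assumption any reply containing the letters "yes" counts as a pass, including words such as "yesterday", which is one more reason to prefer simple_validate for deterministic criteria.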

ImageBlock

A Mellea type that represents an image in a backend-agnostic, encoded form. Use ImageBlock.from_pil_image(pil_image) to convert a Pillow Image object into an ImageBlock. Both raw PIL images and ImageBlock objects are accepted in the images=[...] parameter of instruct() and chat(). Use ImageBlock when you need an already-encoded representation, or when the PIL image is not directly available (e.g., passing between functions or caching). See: Use Images and Vision Models

Intrinsic

An Intrinsic is a backend-level primitive in Mellea — a structured generation operation with special handling (e.g., constrained decoding, RAG retrieval). The LocalHFBackend exposes Intrinsics directly; server backends route them through adapter endpoints. See: Intrinsics

Instruction

The core Component in the IVR loop. An Instruction wraps a prompt description, optional requirements, in-context examples, and grounding context into a single object that m.act() can execute. m.instruct() is a convenience wrapper that builds an Instruction for you.
from mellea import start_session
from mellea.stdlib.components.instruction import Instruction
from mellea.stdlib.requirements import req

m = start_session()
instr = Instruction(
    description="Summarise the following text: {{text}}",
    requirements=[req("Must be under 50 words.")],
    user_variables={"text": "..."},
)
result = m.act(instr)

IVR (Instruct-Validate-Repair)

A core generative programming pattern in Mellea:
  1. Instruct — call the LLM with a prompt.
  2. Validate — check the output against a Requirement.
  3. Repair — if validation fails, retry or fix the output.
See: Instruct, Validate, Repair
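The three steps can be sketched as a small loop. This is a minimal illustration with a stub in place of the LLM, not Mellea's implementation; instruct_validate_repair, fake_model, and under_five_words are hypothetical names, and the repair step here simply appends the failure reason to the prompt.

```python
def instruct_validate_repair(generate, validate, prompt, loop_budget=3):
    """Run a minimal IVR loop and return (output, passed)."""
    attempt_prompt = prompt
    output = None
    for _ in range(loop_budget):
        output = generate(attempt_prompt)          # 1. Instruct
        passed, reason = validate(output)          # 2. Validate
        if passed:
            return output, True
        # 3. Repair: retry with the failure reason appended to the prompt.
        attempt_prompt = f"{prompt}\nPrevious attempt failed: {reason}"
    return output, False


def fake_model(prompt):
    # Stub in place of an LLM: answers tersely only after seeing feedback.
    if "failed" in prompt:
        return "Short answer."
    return "A very long and rambling answer that goes on and on."


def under_five_words(output):
    ok = len(output.split()) < 5
    return ok, None if ok else "must be under five words"


result, success = instruct_validate_repair(fake_model, under_five_words, "Summarise.")
```

The first attempt fails validation, the second sees the feedback and passes, so the loop stops before exhausting its budget.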

m decompose

m decompose is a CLI tool that takes a complex task description and uses an LLM to break it into ordered subtasks, extract constraints, and generate a ready-to-run Python script.
m decompose run --prompt-file task.txt --out-dir ./output/
The output includes a JSON breakdown of subtasks and a result.py you can run immediately. Also available programmatically via cli.decompose.pipeline.decompose().

MelleaSession

The primary entry point for Mellea. A MelleaSession wraps a backend and provides instruct(), chat(), act(), aact(), query(), and transform() as session-level methods. Use mellea.start_session() to create one with defaults.
import mellea
m = mellea.start_session()  # returns a MelleaSession

mify / @mify

The @mify decorator turns any Python class into an MObject — an LLM-queryable, tool-accessible wrapper around your data. You specify which fields and methods are visible to the LLM; everything else remains hidden. See: MObjects and mify

MObject

An MObject is a Python class decorated with @mify. It wraps existing data objects so they can be queried and transformed by the LLM via m.query() and m.transform(). Unlike @generative, @mify does not change the class’s Python interface — it adds a layer that the LLM can see and call. See: MObjects and mify

ModelOption

An enum (mellea.backends.ModelOption) of backend-agnostic inference options: TEMPERATURE, SEED, MAX_NEW_TOKENS, SYSTEM_PROMPT, etc. Using ModelOption keys ensures the same options work across all backends.
from mellea.backends import ModelOption
See: Configure Model Options

ModelOutputThunk

The return type of m.instruct(), m.act(), and most session-level generative calls. It wraps the model’s raw output and an optional parsed representation typed to your output schema (accessible via .result). The value is computed lazily — the underlying inference call may not have completed when the thunk is returned. Accessing .value blocks until the result is ready. For async code, use await thunk.avalue() to await completion, or await thunk.astream() to consume output chunk by chunk as it arrives. You can also call str(thunk) to get the raw string output directly. Use thunk.is_computed() to check whether the value has already been filled without triggering evaluation.
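The lazy behaviour can be pictured with a future running in the background. The sketch below is an analogy, not ModelOutputThunk itself: LazyThunk is a hypothetical class built on the standard library, and an Event stands in for inference latency so the example is deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyThunk:
    """Hypothetical stand-in for a lazily computed model output."""

    _pool = ThreadPoolExecutor(max_workers=4)

    def __init__(self, fn):
        # Work starts in the background; the constructor returns immediately.
        self._future = self._pool.submit(fn)

    def is_computed(self):
        # Non-blocking check: has the background call finished?
        return self._future.done()

    @property
    def value(self):
        # Blocks until the background call has finished.
        return self._future.result()


release = threading.Event()


def slow_call():
    release.wait()          # stands in for inference latency
    return "hello"


thunk = LazyThunk(slow_call)
computed_before = thunk.is_computed()   # the call is still waiting here
release.set()
text = thunk.value                      # blocks until slow_call returns
```

Checking is_computed() never blocks, while reading value waits for the result, which mirrors the distinction drawn in the entry above.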

PreconditionException

Raised when a requirement attached to a @generative function’s input arguments fails — i.e., before the LLM call is made. Catch it to handle pre-call validation failures gracefully.
from mellea.stdlib.components.genslot import PreconditionException

try:
    result = my_generative_fn(m, ...)
except PreconditionException as e:
    print(e.validation)  # list of ValidationResult
See: Handling Exceptions and Failures

Purple elephant effect

The tendency for a model to produce the very thing you instructed it to avoid, because the instruction draws attention to it. Named after the cognitive phenomenon: “Don’t think about a purple elephant” — and now you are. In Mellea, avoid it by using check() instead of req() for negative constraints. check() validates the output without including the constraint description in the generation prompt:
from mellea.stdlib.requirements import req, check

requirements=[
    req("Mention key features."),                        # model is told this
    check("Must not use the phrase 'industry-leading'"), # model is not told this
]
See: Evaluate with LLM-as-a-Judge
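The difference between the two constructors comes down to which descriptions reach the generation prompt. The sketch below illustrates that idea with hypothetical names (SketchRequirement, build_prompt); the prompt layout is an assumption and does not reflect Mellea's actual templates.

```python
from dataclasses import dataclass


@dataclass
class SketchRequirement:
    """Hypothetical stand-in for a requirement (not Mellea's Requirement)."""
    description: str
    check_only: bool = False


def build_prompt(instruction, requirements):
    # Only non-check requirements are shown to the model.
    visible = [r.description for r in requirements if not r.check_only]
    lines = [instruction] + [f"- {d}" for d in visible]
    return "\n".join(lines)


reqs = [
    SketchRequirement("Mention key features."),
    SketchRequirement("Must not use the phrase 'industry-leading'", check_only=True),
]
prompt = build_prompt("Write a product blurb.", reqs)
```

Both requirements would still be validated afterwards; only the first one appears in the text the model sees.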

ReAct

Reason + Act — a goal-driven agentic loop where the LLM alternates between reasoning about the next step and calling a tool, repeating until the goal is achieved. Mellea provides mellea.stdlib.frameworks.react.react() as a built-in async implementation:
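from mellea.stdlib import ChatContext
from mellea.stdlib.frameworks.react import react
# Call from inside an async function; m is an existing MelleaSession.
result, _ = await react(goal="...", context=ChatContext(), backend=m.backend, tools=[...])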
See: Tools and Agents

Requirement

A Requirement is a validation constraint applied to a generative function’s output. Requirements can be programmatic (lambda, regex, type check) or generative (another LLM call), and they drive the Validate step of the IVR pattern. req() and check() are the common shorthand constructors from mellea.stdlib.requirements; the list below also covers two related helpers:
  • req(description) — creates a Requirement whose description is included in the prompt, so the model knows to aim for it.
  • check(description) — creates a check-only Requirement whose description is not included in the prompt (avoids the “purple elephant effect” — mentioning a forbidden thing often makes the model produce it).
  • simple_validate(fn) — wraps a lambda or function into a validation_fn, bypassing LLM-as-a-judge for fast deterministic checks.
  • PythonExecutionReq — verifies that Python code in the LLM’s output runs without raising an exception. Import from mellea.stdlib.requirements.python_reqs. Accepts timeout, allowed_imports, and use_sandbox (Docker-based isolation).
See: Requirements System

RichDocument

A RichDocument wraps a Docling parsed document to make PDFs, tables, and structured files queryable by the LLM. Extract tables as Table objects and pass them directly to m.transform() or m.query().
pip install 'mellea[docling]'
See: Working with Data

SimpleLRUCache

An LRU (least-recently-used) cache for storing DynamicCache KV blocks in LocalHFBackend. Pass one at construction time to enable prefix caching:
from mellea.backends.cache import SimpleLRUCache

# LocalHFBackend must already be imported; its import is not shown here.
backend = LocalHFBackend(
    model_id="ibm-granite/granite-3.3-2b-instruct",
    cache=SimpleLRUCache(capacity=5),
)
When the cache reaches capacity, the least recently used block is evicted and its GPU memory freed. Choose capacity based on available VRAM and block size — 1–3 for large documents, up to 10 for small reused fragments. See: Prefix Caching and KV Blocks
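The eviction policy can be sketched with an OrderedDict. TinyLRU is a hypothetical class for illustration, not SimpleLRUCache: it stores plain strings and does not free GPU memory, and its get/put method names are assumptions.

```python
from collections import OrderedDict


class TinyLRU:
    """Hypothetical LRU cache; not Mellea's SimpleLRUCache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)        # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


cache = TinyLRU(capacity=2)
cache.put("doc-a", "kv-a")
cache.put("doc-b", "kv-b")
cache.get("doc-a")            # doc-a is now the most recently used
cache.put("doc-c", "kv-c")    # over capacity: doc-b is evicted
```

Reading an entry refreshes it, so frequently reused blocks survive while stale ones are dropped first.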

SimpleContext

A stateless context where each call is independent — no conversation history is accumulated or sent to the backend. Use it for single-shot tasks where prior turns are irrelevant.
from mellea.stdlib import SimpleContext
ctx = SimpleContext()
For multi-turn conversations, use ChatContext instead. See: Context and Sessions

Sampling strategy

A SamplingStrategy controls how the IVR loop behaves when a requirement fails. Mellea’s built-in strategies:
| Strategy | Behaviour |
| --- | --- |
| RejectionSamplingStrategy | Retry up to loop_budget times; return first passing result |
| RepairTemplateStrategy | Like rejection sampling but appends failure reasons to the original instruction |
| MultiTurnStrategy | Add validation failures as a new chat turn; model revises its previous attempt |
| MajorityVotingStrategyForMath | Generate N candidates; return the one supported by most (math expressions) |
| MBRDRougeLStrategy | Minimum Bayes Risk decoding using ROUGE-L; best for text generation tasks |
| SOFAISamplingStrategy | Fast System-1 generation verified by a slower System-2 model |
| BudgetForcingSamplingStrategy | Inject thinking tokens to expand reasoning budget |
| BaseSamplingStrategy | Abstract base; extend to implement custom repair and selection logic |
See: Inference-Time Scaling
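The "generate N candidates and keep the most supported one" idea behind majority voting can be sketched with a Counter. This is an illustration only: majority_vote is a hypothetical helper that compares whitespace-stripped strings, whereas a strategy aimed at math would need to compare expressions for equivalence.

```python
from collections import Counter


def majority_vote(candidates, normalise=str.strip):
    """Return the most common candidate after normalisation."""
    counts = Counter(normalise(c) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    return winner


# Five sampled answers; three agree once whitespace is ignored.
answers = ["42", " 42", "41", "42 ", "40"]
best = majority_vote(answers)
```

Ties are resolved in favour of the candidate seen first, which is a property of Counter.most_common rather than a deliberate design choice.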

SamplingResult

The return type of session calls made with return_sampling_results=True, and of the serve() function used with m serve. Holds .result (the selected output), .success (whether a requirement was met), and .sample_generations (all candidates generated).

Table

An MObject wrapping a single table extracted from a RichDocument. Supports m.query() and m.transform() directly, plus .to_markdown() and .transpose().
tables = rich_doc.get_tables()
summary = m.query(tables[0], "What is the total in the last row?")
See: Working with Data

TestBasedEval

A Component in mellea.stdlib.components.unit_test_eval that formats an LLM-as-a-judge evaluation task for structured test cases loaded from JSON. Use it in offline evaluation pipelines to verify model behaviour against a set of input/target pairs.
from mellea.stdlib.components.unit_test_eval import TestBasedEval

test_evals = TestBasedEval.from_json_file("tests/eval_data/cases.json")
for eval_case in test_evals:
    verdict = judge_session.instruct(eval_case)
See: Unit Test Generative Code

TemplateFormatter

A ChatFormatter subclass that renders prompts using Jinja2 templates instead of the default chat-message format. Use it when you need precise control over how components are serialised into the final prompt string. Configured per-backend. See: Template Formatting

TemplateRepresentation

The data class a Component returns from format_for_llm() to describe itself to the TemplateFormatter. It carries the component’s template string, named arguments, tool definitions, and field list — everything the formatter needs to render the component into a prompt fragment. See: Mellea Core Internals

SOFAI

SOFAI (System-1 / System-2 AI) is a sampling strategy in Mellea that mirrors dual-process cognition: a fast “System 1” model generates candidates and a slower “System 2” model verifies them. Uses SOFAISamplingStrategy. See: Inference-Time Scaling

Tool

A Python function decorated with @tool (or registered via MelleaSession) that Mellea exposes to an LLM for function calling. Tools have typed inputs and outputs so the LLM can call them reliably without free-form parsing. See: Tools and Agents

ValidationResult

The return type of a custom verifier function. Holds a boolean result (pass/fail) and optional metadata — reason (string explanation), score (float), and thunk (the raw ModelOutputThunk if the verifier used an LLM call internally).
from mellea.core.requirement import ValidationResult

def my_verifier(output: str) -> ValidationResult:
    passed = len(output.split()) < 50
    return ValidationResult(passed, reason="Too long" if not passed else None)
See: Write Custom Verifiers

Thunk

See ModelOutputThunk.

wait_for_all_mots

A helper from mellea.helpers.async_helpers that concurrently resolves a list of ModelOutputThunk objects. All thunks in the list are awaited in parallel; the call returns when every thunk has been computed.
from mellea.helpers.async_helpers import wait_for_all_mots

thunks = [await m.ainstruct(...) for _ in items]
await wait_for_all_mots(thunks)
# All thunks are now resolved — access .value on each.
Total wall-clock time is roughly the latency of the slowest single call rather than the sum of all calls. Use SimpleContext (the default) when calling wait_for_all_mots; concurrent writes to ChatContext can corrupt state. See: Tutorial 02: Streaming and Async
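The wall-clock claim can be checked with plain asyncio. The sketch below does not use wait_for_all_mots; it uses asyncio.gather as an analogue, with fake_call and resolve_all as hypothetical helpers and sleeps standing in for model latency.

```python
import asyncio
import time


async def fake_call(delay, label):
    # Stands in for one in-flight model call.
    await asyncio.sleep(delay)
    return label


async def resolve_all(delays):
    start = time.perf_counter()
    # gather schedules every coroutine at once and waits for all of them.
    results = await asyncio.gather(
        *(fake_call(d, f"result-{i}") for i, d in enumerate(delays))
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(resolve_all([0.05, 0.10, 0.15]))
```

The three delays sum to 0.30 seconds, but the elapsed time stays close to the slowest call (0.15 seconds), and results come back in the order the calls were submitted.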