act() / aact()
act() is the generic session method that runs any Component and returns a
result. Every higher-level method (instruct(), chat(), query(),
transform()) builds a Component and delegates to act(). Use act() directly
when working with custom components or building your own inference loops.
aact() is the async counterpart — same signature, same return types.
See: act() and aact()
aLoRA (Activated LoRA)
An Activated LoRA (aLoRA) is a LoRA adapter dynamically loaded by LocalHFBackend at inference time to serve as a lightweight requirement verifier.
Instead of running a full LLM call to check a requirement, the adapter is activated
on the same model weights already in memory.
See: LoRA and aLoRA Adapters
@generative
A decorator that converts a typed Python function into an AI-powered function. @generative uses the function’s name, docstring, parameters, and return type
annotation to instruct the LLM. The output is constrained to match the return type.
Write the function in idiomatic Python — the more natural the signature and
docstring, the better the model understands and imitates it.
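As a schematic of what the decorator can draw on (this is illustrative only, not Mellea's actual implementation), the sketch below builds a prompt-style task description from a function's name, signature, and docstring using only the standard library:

```python
import inspect
from typing import get_type_hints

def describe_for_llm(fn):
    """Build a prompt-style description from a function's signature and
    docstring, a schematic of what a @generative-style decorator can use."""
    hints = get_type_hints(fn)
    ret = hints.pop("return", str).__name__
    params = ", ".join(f"{name}: {t.__name__}" for name, t in hints.items())
    return (f"Task: {fn.__name__}({params}) -> {ret}\n"
            f"Description: {inspect.getdoc(fn)}")

def summarize(text: str) -> str:
    """Summarize the text in one sentence."""
    ...

desc = describe_for_llm(summarize)
```

The more idiomatic the signature and docstring, the more informative this derived description becomes, which is the point of writing generative functions in natural Python.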
Backend
A backend is an inference engine that Mellea uses to run LLM calls. Examples: OllamaModelBackend, OpenAIBackend, LocalHFBackend, LocalVLLMBackend,
WatsonxAIBackend. Backends are configured via MelleaSession or
start_session().
See: Backends and Configuration
ChatContext
The standard multi-turn context implementation. ChatContext accumulates the full
conversation history and passes it to the backend on each call. Create one at the
start of a session and pass it through all calls to maintain state. Set
window_size to cap how many turns are sent to the backend. Use
SimpleContext instead for stateless, single-turn calls.
See: Context and Sessions
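The window_size behaviour can be sketched in plain Python. This is illustrative only, not ChatContext's real implementation, and retaining the older turns locally is an assumption made for the demo:

```python
class WindowedHistory:
    """Toy sketch of a windowed chat history: all turns are kept here,
    but only the last `window_size` turns are handed to the backend."""
    def __init__(self, window_size=None):
        self.turns = []              # full conversation history
        self.window_size = window_size

    def append(self, user, assistant):
        self.turns.append((user, assistant))

    def for_backend(self):
        # Cap what is sent to the model without discarding local state.
        if self.window_size is None:
            return self.turns
        return self.turns[-self.window_size:]

h = WindowedHistory(window_size=2)
for i in range(4):
    h.append(f"u{i}", f"a{i}")
```

With four turns recorded and a window of two, only the last two turns would reach the backend.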
CBlock
A CBlock (content block) is the low-level unit of content in Mellea. A CBlock
holds text (or image data) and is assembled by a Component into the prompt sent
to the backend. Multiple CBlocks compose into a single LLM request.
See: Mellea Core Internals
Component
A Component is a reusable, composable unit in Mellea that encapsulates a prompt
structure, its requirements, and its parsing logic. Instruction, Message,
MObject, and Document are all Component subclasses. Components are the building
blocks of generative programs.
See: Building Custom Components
ComponentParseError
The exception raised by Component.parse() when the model’s output cannot be
parsed into the component’s declared return type S. parse() catches any
exception from _parse() and re-raises it as ComponentParseError so all callers
get a consistent error type regardless of the underlying parse implementation.
ContextTurn
A single turn of model input and model output stored inside a Context. Each call
to m.instruct(), m.chat(), or m.act() appends a ContextTurn to the active
context. Turns are consumed by the backend formatter to build the conversation
history sent to the model.
Context
A Context holds the conversation history threaded through a MelleaSession.
Mellea provides SimpleContext (single-turn) and ChatContext (multi-turn). Push
and pop operations let you branch and restore context state across calls.
See: Context and Sessions
Document
A Component that wraps a plain-text reference document for inclusion in a prompt.
Pass one or more Document objects in the _docs field of a Message or directly
as grounding context in an Instruction. Unlike RichDocument, Document holds
pre-extracted text rather than a parsed file.
Generative function
A Python function decorated with @generative. Mellea uses the function’s type
annotation as the output schema and its docstring as the prompt. Generative
functions are called with a MelleaSession as the first argument and return the
annotated type.
See: Generative Functions
Generative program
Any computer program that contains calls to an LLM. Mellea is a library for writing robust, composable generative programs.
See: Generative Programming
GenerateLog
A dataclass that captures a single model call in detail. Pass a list[GenerateLog]
to m.validate() via the generate_logs= parameter to record the judge prompt and
raw verdict for each requirement validation. Fields include prompt, result
(ModelOutputThunk | None), backend, model_options, and is_final_result.
See: Evaluate with LLM-as-a-Judge
grounding_context
The grounding_context parameter of m.instruct() accepts a dictionary of
named text entries that Mellea injects into the prompt as grounding evidence.
Each entry is tracked as a separate context component, so it can be traced
and rendered independently from the instruction template.
Use grounding_context to anchor the model’s output to retrieved documents,
knowledge-base passages, or any reference material — without mixing that content
into user_variables. Without grounding_context, m.instruct() generates from the
model’s parametric knowledge only. It is the primary integration point for RAG
pipelines.
See: Build a RAG Pipeline
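A schematic illustration of the idea follows. The tag-style rendering below is an assumption made for the demo, not Mellea's actual template:

```python
def render_prompt(description, grounding_context=None):
    """Schematic (not Mellea's real template) of how named grounding entries
    can be injected into a prompt as separate, traceable evidence blocks."""
    parts = []
    for name, text in (grounding_context or {}).items():
        # Each named entry becomes its own block, so it can be rendered
        # and traced independently of the instruction itself.
        parts.append(f"<{name}>\n{text}\n</{name}>")
    parts.append(description)
    return "\n\n".join(parts)

prompt = render_prompt(
    "Answer using only the evidence above.",
    grounding_context={"policy_doc": "Refunds are issued within 30 days."},
)
```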
GuardianCheck
A safety requirement in Mellea that validates LLM outputs against defined safety rules before they are returned to the caller. Uses the Granite Guardian model as a verifier. Constructed with a GuardianRisk value and optional backend and
context_text parameters.
See: Making Agents Reliable | Security and Taint Tracking
GuardianRisk
An enum that specifies which safety risk category GuardianCheck should detect.
Each check runs as an independent inference call against the Guardian model.
Available values: HARM, GROUNDEDNESS, PROFANITY, ANSWER_RELEVANCE,
JAILBREAK, FUNCTION_CALL, SOCIAL_BIAS, VIOLENCE, SEXUAL_CONTENT,
UNETHICAL_BEHAVIOR.
KV smashing
The technique of concatenating key-value attention caches from separately prefilled prompt chunks along the time axis, producing a single merged DynamicCache that
covers the full context. Used by LocalHFBackend to avoid re-running forward
passes on content that has already been cached.
When a prompt contains a mix of cached and uncached CBlock objects, Mellea
prefills each block independently, then smashes the resulting caches together
before generation — giving results identical to a single full-context forward pass
at a fraction of the prefill cost.
See: Prefix Caching and KV Blocks
LiteLLM / LiteLLMBackend
LiteLLMBackend wraps LiteLLM — a unified interface
over 100+ model providers. Use it to reach providers not covered by Mellea’s
native backends: Bedrock via IAM, Vertex AI, Together AI, Cohere, and others.
LLM-as-a-judge
The default validation strategy for req() in Mellea. After the model generates
an output, a second LLM call is made using the requirement’s description as the
evaluation criterion. Mellea converts the judge’s response to True / False by
looking for "yes" (case-insensitive) in the reply.
Use simple_validate instead when the criterion is deterministic (word count,
regex, type check) — no second LLM call is needed.
See: Evaluate with LLM-as-a-Judge
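The verdict conversion described above is simple enough to show directly; a minimal sketch of the case-insensitive "yes"-detection rule:

```python
def verdict_to_bool(judge_reply: str) -> bool:
    """Convert a judge model's free-text reply to pass/fail by looking
    for "yes" case-insensitively, as described above."""
    return "yes" in judge_reply.lower()
```

This is why judge prompts typically ask the model to answer with an explicit yes or no.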
ImageBlock
A Mellea type that represents an image in a backend-agnostic, encoded form. Use ImageBlock.from_pil_image(pil_image) to convert a Pillow
Image object into an ImageBlock. Both raw PIL images and ImageBlock objects are
accepted in the images=[...] parameter of instruct() and chat().
Use ImageBlock when you need an already-encoded representation, or when the PIL image
is not directly available (e.g., passing between functions or caching).
See: Use Images and Vision Models
Intrinsic
An Intrinsic is a backend-level primitive in Mellea — a structured generation
operation with special handling (e.g., constrained decoding, RAG retrieval). The
LocalHFBackend exposes Intrinsics directly; server backends route them through
adapter endpoints.
See: Intrinsics
Instruction
The core Component in the IVR loop. An Instruction wraps a prompt description,
optional requirements, in-context examples, and grounding context into a single
object that m.act() can execute. m.instruct() is a convenience wrapper that
builds an Instruction for you.
IVR (Instruct-Validate-Repair)
A core generative programming pattern in Mellea:
- Instruct — call the LLM with a prompt.
- Validate — check the output against a Requirement.
- Repair — if validation fails, retry or fix the output.
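A minimal pure-Python skeleton of the pattern, with stubbed model and validator (illustrative only, not Mellea's implementation):

```python
def ivr(instruct, validate, prompt, budget=3):
    """Instruct-Validate-Repair loop sketch: call the model, check the
    output, and on failure append the failure reason and retry."""
    for _ in range(budget):
        output = instruct(prompt)
        ok, reason = validate(output)
        if ok:
            return output
        # Repair step: feed the failure reason back into the prompt.
        prompt = f"{prompt}\nPrevious attempt failed: {reason}"
    return None

# Stub model: produces a short answer only once repair feedback appears.
def fake_model(prompt):
    return "short answer" if "failed" in prompt else "way too long answer indeed"

def max_three_words(out):
    ok = len(out.split()) <= 3
    return ok, None if ok else "answer exceeds three words"

result = ivr(fake_model, max_three_words, "Answer briefly.")
```

The first attempt fails the word-count check, the failure reason is appended, and the second attempt passes.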
m decompose
m decompose is a CLI tool that takes a complex task description and uses an LLM
to break it into ordered subtasks, extract constraints, and generate a ready-to-run
Python script.
The output is a result.py script you can run immediately. Also available
programmatically via cli.decompose.pipeline.decompose().
MelleaSession
The primary entry point for Mellea. A MelleaSession wraps a backend and provides
instruct(), chat(), act(), aact(), query(), and transform() as
session-level methods. Use mellea.start_session() to create one with defaults.
mify / @mify
The @mify decorator turns any Python class into an MObject — an
LLM-queryable, tool-accessible wrapper around your data. You specify which fields
and methods are visible to the LLM; everything else remains hidden.
See: MObjects and mify
MObject
An MObject is a Python class decorated with @mify. It wraps existing data
objects so they can be queried and transformed by the LLM via m.query() and
m.transform(). Unlike @generative, @mify does not change the class’s Python
interface — it adds a layer that the LLM can see and call.
See: MObjects and mify
ModelOption
An enum (mellea.backends.ModelOption) of backend-agnostic inference options:
TEMPERATURE, SEED, MAX_NEW_TOKENS, SYSTEM_PROMPT, etc. Using ModelOption
keys ensures the same options work across all backends.
ModelOutputThunk
The return type of m.instruct(), m.act(), and most session-level generative
calls. It wraps the model’s raw output and an optional parsed representation typed
to your output schema (accessible via .result).
The value is computed lazily — the underlying inference call may not have completed
when the thunk is returned. Accessing .value blocks until the result is ready.
For async code, use await thunk.avalue() to await completion, or
await thunk.astream() to consume output chunk by chunk as it arrives.
You can also call str(thunk) to get the raw string output directly.
Use thunk.is_computed() to check whether the value has already been filled
without triggering evaluation.
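The lazy-evaluation behaviour can be sketched with a toy class. This is conceptual only, not Mellea's actual ModelOutputThunk:

```python
class LazyThunk:
    """Toy lazily computed value: accessing .value triggers the computation
    on first use; is_computed() checks without triggering it."""
    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._done = False

    def is_computed(self):
        # Inspect state without forcing evaluation.
        return self._done

    @property
    def value(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value

    def __str__(self):
        # Like str(thunk): forces the value and returns it as a string.
        return str(self.value)

t = LazyThunk(lambda: "hello")
```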
PreconditionException
Raised when a requirement attached to a @generative function’s input arguments
fails — i.e., before the LLM call is made. Catch it to handle pre-call validation
failures gracefully.
Purple elephant effect
The tendency for a model to produce the very thing you instructed it to avoid, because the instruction draws attention to it. Named after the cognitive phenomenon: “Don’t think about a purple elephant” — and now you are. In Mellea, avoid it by using check() instead of req() for negative constraints: check() validates the output without including the constraint description in the generation prompt.
ReAct
Reason + Act — a goal-driven agentic loop where the LLM alternates between reasoning about the next step and calling a tool, repeating until the goal is achieved. Mellea provides mellea.stdlib.frameworks.react.react() as a built-in async implementation.
Requirement
A Requirement is a validation constraint applied to a generative function’s
output. Requirements can be programmatic (lambda, regex, type check) or generative
(another LLM call). Used in the IVR pattern.
req() and check() are the common shorthand constructors from mellea.stdlib.requirements:
- req(description) — creates a Requirement whose description is included in the prompt, so the model knows to aim for it.
- check(description) — creates a check-only Requirement whose description is not included in the prompt (avoids the “purple elephant effect” — mentioning a forbidden thing often makes the model produce it).
- simple_validate(fn) — wraps a lambda or function into a validation_fn, bypassing LLM-as-a-judge for fast deterministic checks.
- PythonExecutionReq — verifies that Python code in the LLM’s output runs without raising an exception. Import from mellea.stdlib.requirements.python_reqs. Accepts timeout, allowed_imports, and use_sandbox (Docker-based isolation).
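A pure-Python sketch of the deterministic-validation idea behind simple_validate. The wrapper name and shape here are illustrative, not Mellea's API:

```python
import re

def simple_validate_sketch(fn):
    """Illustrative stand-in for simple_validate: wrap a plain predicate so
    it can act as a validation function without an LLM judge call."""
    def validation_fn(output: str) -> bool:
        return bool(fn(output))
    return validation_fn

# Deterministic checks: word count and a regex match.
under_50_words = simple_validate_sketch(lambda out: len(out.split()) < 50)
has_email = simple_validate_sketch(
    lambda out: re.search(r"\b\S+@\S+\.\S+\b", out) is not None)
```

Checks like these run instantly and cost nothing, which is why they are preferred over LLM-as-a-judge whenever the criterion is mechanical.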
RichDocument
A RichDocument wraps a Docling parsed document
to make PDFs, tables, and structured files queryable by the LLM. Extract tables as
Table objects and pass them directly to m.transform() or m.query().
SimpleLRUCache
An LRU (least-recently-used) cache for storing DynamicCache KV blocks in
LocalHFBackend. Pass one at construction time to enable prefix caching. When the
cache exceeds its capacity, the least recently used block is evicted and
its GPU memory freed. Choose capacity based on available VRAM and block size —
1–3 for large documents, up to 10 for small reused fragments.
See: Prefix Caching and KV Blocks
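The eviction policy can be sketched with a toy LRU built on OrderedDict. This illustrates the policy only, not SimpleLRUCache's implementation:

```python
from collections import OrderedDict

class LRUSketch:
    """Toy LRU cache: exceeding capacity evicts the least recently
    used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)      # mark as recently used
            return self._data[key]
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the LRU entry

cache = LRUSketch(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now the most recently used entry
cache.put("c", 3)    # capacity exceeded: "b" is evicted
```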
SimpleContext
A stateless context where each call is independent — no conversation history is accumulated or sent to the backend. Use it for single-shot tasks where prior turns are irrelevant; use ChatContext instead for multi-turn conversations.
See: Context and Sessions
Sampling strategy
A SamplingStrategy controls how the IVR loop behaves when a requirement fails.
Mellea’s built-in strategies:
| Strategy | Behaviour |
|---|---|
| RejectionSamplingStrategy | Retry up to loop_budget times; return first passing result |
| RepairTemplateStrategy | Like rejection sampling, but appends failure reasons to the original instruction |
| MultiTurnStrategy | Adds validation failures as a new chat turn; the model revises its previous attempt |
| MajorityVotingStrategyForMath | Generate N candidates; return the one supported by most (math expressions) |
| MBRDRougeLStrategy | Minimum Bayes Risk decoding using ROUGE-L; best for text generation tasks |
| SOFAISamplingStrategy | Fast System-1 generation verified by a slower System-2 model |
| BudgetForcingSamplingStrategy | Inject thinking tokens to expand the reasoning budget |
| BaseSamplingStrategy | Abstract base; extend to implement custom repair and selection logic |
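As an example of the selection logic, here is a minimal majority-voting sketch in the spirit of MajorityVotingStrategyForMath. The candidates are stubbed strings standing in for model outputs:

```python
from collections import Counter

def majority_vote(candidates):
    """Generate-N-and-vote selection: return the answer supported by
    the most candidates."""
    counts = Counter(candidates)
    answer, _ = counts.most_common(1)[0]
    return answer

# Five stubbed candidate answers to the same math question.
samples = ["42", "42", "41", "42", "40"]
winner = majority_vote(samples)
```

Real strategies would first normalise each candidate (e.g. extract the final math expression) before counting votes.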
SamplingResult
The return type of session calls made with return_sampling_results=True, and of
the serve() function used with m serve. Holds .result (the selected output),
.success (whether a requirement was met), and .sample_generations (all
candidates generated).
Table
An MObject wrapping a single table extracted from a RichDocument. Supports
m.query() and m.transform() directly, plus .to_markdown() and .transpose().
TestBasedEval
A Component in mellea.stdlib.components.unit_test_eval that formats an
LLM-as-a-judge evaluation task for structured test cases loaded from JSON. Use it
in offline evaluation pipelines to verify model behaviour against a set of
input/target pairs.
TemplateFormatter
A ChatFormatter subclass that renders prompts using Jinja2 templates instead of
the default chat-message format. Use it when you need precise control over how
components are serialised into the final prompt string. Configured per-backend.
See: Template Formatting
TemplateRepresentation
The data class a Component returns from format_for_llm() to describe itself to
the TemplateFormatter. It carries the component’s template string, named
arguments, tool definitions, and field list — everything the formatter needs to
render the component into a prompt fragment.
See: Mellea Core Internals
SOFAI
SOFAI (System-1 / System-2 AI) is a sampling strategy in Mellea that mirrors dual-process cognition: a fast “System 1” model generates candidates and a slower “System 2” model verifies them. Uses SOFAISamplingStrategy.
See: Inference-Time Scaling
Tool
A Python function decorated with @tool (or registered via MelleaSession) that
Mellea exposes to an LLM for function calling. Tools have typed inputs and outputs
so the LLM can call them reliably without free-form parsing.
See: Tools and Agents
ValidationResult
The return type of a custom verifier function. Holds a boolean result (pass/fail)
and optional metadata — reason (string explanation), score (float), and
thunk (the raw ModelOutputThunk if the verifier used an LLM call internally).
Thunk
See: ModelOutputThunk
wait_for_all_mots
A helper from mellea.helpers.async_helpers that concurrently resolves a list
of ModelOutputThunk objects. All thunks in the list are
awaited in parallel; the call returns when every thunk has been computed.
Use SimpleContext (the default) when calling wait_for_all_mots; concurrent
writes to ChatContext can corrupt state.
See: Tutorial 02: Streaming and Async
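The concurrent-resolution behaviour is analogous to asyncio.gather. A self-contained sketch with stubbed model calls (this is an analogy, not the helper's actual implementation):

```python
import asyncio

async def resolve_all(pending):
    """Await every pending result concurrently; return once all have
    completed, in the order they were passed in."""
    return await asyncio.gather(*pending)

async def fake_model_call(tag, delay):
    # Stub standing in for an in-flight inference call.
    await asyncio.sleep(delay)
    return f"{tag}: done"

async def main():
    pending = [fake_model_call("a", 0.02), fake_model_call("b", 0.01)]
    return await resolve_all(pending)

results = asyncio.run(main())
```

Even though "b" finishes first, gather preserves the input order of the list.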