Looking to use this in code? See Context and Sessions for practical examples and session extension patterns.
The four layers
Components
A Component is the structured representation of a single interaction with an LLM.
When you call m.instruct(...), Mellea creates an Instruction component — a
composite data structure that holds the description, requirements, user variables,
grounding context, and ICL examples for that call.
Components are composable: a component can contain other components. This is how
Mellea keeps prompts modular. An Instruction contains Requirement objects;
a Requirement is itself a component. The composition forms a directed acyclic
graph (DAG) that the backend renders into a prompt.
The leaf nodes of the DAG are CBlock objects — atomic content blocks that hold
raw text or a parsed representation of a model output.
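The composition can be sketched with plain dataclasses. This is an illustrative stand-in, not Mellea's actual classes — the names CBlock, Requirement, and Instruction mirror the concepts above, but the fields and the leaves() helper are assumptions for the sketch:

```python
from dataclasses import dataclass, field

@dataclass
class CBlock:
    """Leaf node: an atomic block of raw text."""
    text: str

@dataclass
class Requirement:
    """A requirement is itself a component; here it wraps one leaf CBlock."""
    check: CBlock

@dataclass
class Instruction:
    """Composite component: a description plus child Requirement components."""
    description: CBlock
    requirements: list = field(default_factory=list)

    def leaves(self):
        """Walk the DAG and yield the leaf CBlocks a backend would render."""
        yield self.description
        for req in self.requirements:
            yield req.check

inst = Instruction(
    description=CBlock("Summarize the report in two sentences."),
    requirements=[Requirement(CBlock("Use plain language."))],
)
leaf_texts = [leaf.text for leaf in inst.leaves()]
```

Walking the tree bottoms out at the CBlock leaves, which is exactly what a backend needs in order to render the composite into a prompt.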
Backends
A Backend takes a Component, formats it into a prompt, sends it to an LLM, and
returns the model output as a ModelOutputThunk. The Thunk is a lazy wrapper: it
holds the raw model output and parses it on access (via .value or str()).
The backend is responsible for:
- Rendering the component tree into the prompt format the model expects (chat messages, template strings, etc.)
- Making the network or process call to the LLM
- Parsing the response into a typed representation where applicable
A Component does not know which backend will render it.
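The lazy-wrapper behaviour can be illustrated with a minimal thunk. This is a sketch of the idea only, not Mellea's ModelOutputThunk — the parse_count field exists purely to make the laziness observable:

```python
class LazyThunk:
    """Holds raw model output and parses it only on first access."""

    def __init__(self, raw: str, parser):
        self._raw = raw
        self._parser = parser
        self._parsed = None
        self.parse_count = 0          # for demonstration: how often we parsed

    @property
    def value(self):
        if self._parsed is None:      # parse lazily, exactly once
            self.parse_count += 1
            self._parsed = self._parser(self._raw)
        return self._parsed

    def __str__(self):
        return self.value

thunk = LazyThunk('  {"answer": 42}  ', parser=lambda raw: raw.strip())
before = thunk.parse_count            # nothing has been parsed yet
text = thunk.value                    # first access triggers the parse
after = thunk.parse_count             # subsequent accesses reuse the result
```

Deferring the parse means the backend can return immediately after the network call, and parsing cost is paid only if the caller actually reads the value.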
Contexts
A Context records the history of interactions during a session. It is a linked
list (or tree, when you clone a session) of components and their outputs.
The context serves two purposes:
- Prompt construction — the backend calls ctx.view_for_generation() to get the components that should appear in the prompt. For ChatContext, this includes all prior turns. For SimpleContext, it includes only the current instruction.
- Validation — during the IVR loop, requirement validators receive the Context object. They can call ctx.last_output() to inspect the most recent model output, or examine the full history for more complex checks.
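The two view policies can be sketched side by side. SimpleView and ChatView are illustrative stand-ins, not Mellea's SimpleContext and ChatContext implementations — only the view_for_generation() behaviour described above is modelled:

```python
class SimpleView:
    """Stateless policy: only the current component reaches the prompt."""

    def __init__(self):
        self._history = []

    def add(self, component):
        self._history.append(component)

    def view_for_generation(self):
        return self._history[-1:]         # current turn only

class ChatView:
    """Stateful policy: every prior turn reaches the prompt."""

    def __init__(self):
        self._history = []

    def add(self, component):
        self._history.append(component)

    def view_for_generation(self):
        return list(self._history)        # full history

simple, chat = SimpleView(), ChatView()
for turn in ["first instruction", "second instruction"]:
    simple.add(turn)
    chat.add(turn)
```

Both objects record the same history; they differ only in how much of it they hand to the backend at generation time.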
Sessions
MelleaSession is the developer-facing layer. It wraps a backend and a context,
exposes the instruct(), chat(), validate(), and other methods you use in your
code, and handles the bookkeeping that ties components, context updates, and backend
calls together.
start_session() returns a MelleaSession with defaults: Ollama backend, Granite 4
Micro model, and SimpleContext.
SimpleContext vs ChatContext
The two built-in context types implement very different history policies.
SimpleContext
SimpleContext is stateless between calls. Each instruct() or chat() call sees
only the current instruction — no prior turns. The prompt is entirely determined by
the current component.
Use SimpleContext (the default) when:
- Calls are logically independent (a batch of classification tasks, extraction from different documents)
- You are composing @generative functions whose results flow through Python code, not through chat history
- You want predictable, isolated calls with no context accumulation
ChatContext
ChatContext preserves the full message history across calls. The model sees all
prior turns on every new request.
Use ChatContext when:
- You are building a stateful conversation (a chat assistant, an interactive planning session)
- The model needs to refer back to prior turns to give a coherent response
- You are implementing agentic loops where each step builds on previous results
The context window trade-off
ChatContext accumulates history indefinitely. As history grows, prompts become
larger, latency increases, and cost rises. For long sessions, consider using
ctx.reset_to_new() or m.reset() to clear history at a natural breakpoint.
The ChatContext constructor accepts a window_size parameter that limits how many
prior turns are retained.
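The effect of a turn window can be sketched with a bounded deque. WindowedHistory is an illustrative stand-in, not the actual ChatContext implementation — it only models the "keep the last N turns" behaviour that window_size provides:

```python
from collections import deque

class WindowedHistory:
    """Keep only the most recent window_size turns."""

    def __init__(self, window_size: int):
        # deque with maxlen silently drops the oldest entry when full
        self._turns = deque(maxlen=window_size)

    def add_turn(self, turn):
        self._turns.append(turn)

    def view(self):
        return list(self._turns)

history = WindowedHistory(window_size=2)
for turn in ["turn 1", "turn 2", "turn 3"]:
    history.add_turn(turn)
```

After the third call, "turn 1" has been evicted, so the prompt size stays bounded no matter how long the session runs.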
For most programs, SimpleContext (the default) is the right choice. Reserve
ChatContext for applications where conversational coherence is genuinely
required.
Why explicit context management matters
Implicit context — a global chat history that grows without bounds — is a common source of subtle failures in generative programs:
- Prompt degradation: A very long history can cause the model to lose focus on the current instruction, producing outputs that drift from what was asked.
- Context window overflow: Every LLM has a maximum token budget. Exceeding it causes truncation or errors.
- Hard-to-debug behaviour: When context is implicit and global, it is hard to reproduce failures — the same instruction can produce different results depending on what happened earlier in the session.
SimpleContext ensures independence by default; ChatContext
is opt-in for cases where history is genuinely needed.
Session cloning
m.clone() creates a copy of a session at its current context state. Both the
original and the clone start from the same history and then diverge
independently. Cloning is useful for:
- Exploring multiple continuations of the same context (tree-structured reasoning)
- Running parallel comparisons with the same conversational history
- Implementing best-of-N sampling at the conversation level rather than the single-turn level
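The clone-then-diverge behaviour can be sketched with a toy session. TinySession is an illustrative stand-in, not MelleaSession — only the "copy the context, then let the histories evolve separately" idea is modelled:

```python
import copy

class TinySession:
    """Stand-in session: just a history of instructions."""

    def __init__(self, history=None):
        self.history = history if history is not None else []

    def instruct(self, text):
        self.history.append(text)

    def clone(self):
        # deep-copy the context so the two sessions diverge independently
        return TinySession(copy.deepcopy(self.history))

base = TinySession()
base.instruct("shared setup")

branch_a = base.clone()
branch_b = base.clone()
branch_a.instruct("continuation A")
branch_b.instruct("continuation B")
```

Each clone carries the shared prefix, but later instructions touch only its own history — the essence of tree-structured exploration.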
Inspecting context
The ctx object exposes helpers for reading the current session state:
last_turn() returns a ContextTurn with .input and .output fields. It is
useful for observability or when you need to log exactly what the model received and
produced.
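The shape of a turn record can be sketched as follows. The ContextTurn dataclass here is an illustrative stand-in matching the .input and .output fields described above; the last_turn helper is an assumption for the sketch:

```python
from dataclasses import dataclass

@dataclass
class ContextTurn:
    """One interaction: what the model received and what it produced."""
    input: str
    output: str

def last_turn(history):
    """Return the most recent turn, or None for an empty history."""
    return history[-1] if history else None

history = [ContextTurn(input="Summarize the doc", output="The doc covers...")]
turn = last_turn(history)
```

Logging turn.input and turn.output at each step gives an exact record of the session for observability.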
Extending sessions
MelleaSession is a regular Python class. Subclassing it lets you inject custom
behaviour — input filtering, output validation, logging, rate limiting — into
every call. See the Context and Sessions how-to
for a worked example.
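The subclassing pattern can be sketched with a stand-in base class. BaseSession here plays the role of MelleaSession (whose real constructor and instruct() signature may differ); the override shows where a logging or filtering hook would go:

```python
class BaseSession:
    """Stand-in for MelleaSession; instruct() fakes a backend call."""

    def instruct(self, text):
        return f"output for: {text}"

class LoggedSession(BaseSession):
    """Inject logging around every call by overriding instruct()."""

    def __init__(self):
        self.log = []

    def instruct(self, text):
        self.log.append(text)             # logging / input-filtering hook
        return super().instruct(text)     # delegate to the normal behaviour

session = LoggedSession()
result = session.instruct("hello")
```

Because the subclass delegates via super(), every existing call site keeps working while the extra behaviour applies uniformly to all calls.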
See also: Context and Sessions how-to | Async and Streaming