Advanced: This page is for contributors, backend developers, and anyone who wants to understand what happens when Mellea executes a request. If you are building applications with Mellea, you do not need this material.

Mellea's high-level API (`m.chat()`, `m.instruct()`, `@generative`) is built on three
core data structures. Understanding these structures and the abstraction layers above
them explains how Mellea achieves lazy evaluation, parallel dispatch, and composable
context management.
The three core data structures
CBlock
A `CBlock` (content block) is a wrapper around a string that marks a tokenisation
and KV caching boundary.
CBlocks are the leaf nodes of every data dependency graph in Mellea. Importantly,
CBlock boundaries affect tokenisation: `tokenise(a) + tokenise(b)` is not, in general,
the same as `tokenise(a + b)`. When you care about KV cache reuse, CBlock
boundaries let you control exactly where the tokeniser makes splits.
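To see why boundaries matter, here is a toy greedy longest-match tokeniser (the vocabulary and matching rule are made up for illustration; real subword tokenisers such as BPE behave analogously). Tokenising two pieces separately yields different tokens than tokenising their concatenation:

```python
def tokenise(text, vocab=frozenset({"hell", "he", "lo", "o"})):
    """Greedy longest-match tokeniser, standing in for a real subword tokeniser."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(len(text) - i, 0, -1):
            if text[i:i + size] in vocab:
                tokens.append(text[i:i + size])
                i += size
                break
        else:
            tokens.append(text[i])  # unknown character: emit it alone
            i += 1
    return tokens

a, b = "hel", "lo"
print(tokenise(a) + tokenise(b))  # ['he', 'l', 'lo']
print(tokenise(a + b))            # ['hell', 'o']
```

Splitting the input at a CBlock boundary pins the token split to that point, which is what makes cached KV prefixes reusable.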
Component
A Component is a declarative structure that can depend on other Components or
CBlocks. Components are the unit of composition in Mellea. Message,
Instruction, @mify objects, and @generative functions all produce Components.
ModelOutputThunk
A ModelOutputThunk is a lazy reference to a computation result. It represents the
future output of an LLM call — the call may or may not have been dispatched yet
when you receive the thunk. You can pass a thunk as an input to another Component
before the underlying computation has completed.
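The idea can be sketched with a minimal lazy handle. This is an illustration only, not Mellea's `ModelOutputThunk`; the method names `is_computed()` and `avalue()` mirror those mentioned in the layer summary table:

```python
import asyncio

class Thunk:
    """Minimal sketch of a lazy result handle; not Mellea's implementation."""

    def __init__(self, coro):
        self._coro = coro      # the not-yet-awaited computation
        self._value = None
        self._done = False

    def is_computed(self):
        return self._done

    async def avalue(self):
        if not self._done:
            self._value = await self._coro
            self._done = True
        return self._value

async def fake_llm_call(prompt):
    await asyncio.sleep(0)     # pretend to wait on a model server
    return f"response to: {prompt}"

async def main():
    thunk = Thunk(fake_llm_call("hello"))
    print(thunk.is_computed())   # False: nothing has run yet
    print(await thunk.avalue())  # response to: hello
    print(thunk.is_computed())   # True

asyncio.run(main())
```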
The abstraction layers
Each layer below is a thinner wrapper around the one beneath it. You work at whatever level of abstraction the task requires.

Layer 1: MelleaSession
The entry point for most programs. The session bundles a backend, a context, and
high-level methods. Everything is handled for you. When you call `m.chat()`, the session:

- Wraps your string in a `Message` component
- Passes the component and context to the backend
- Updates the context with the result
- Returns the response as a `Message`
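Those four steps can be sketched with stand-in types. The real `MelleaSession` is much richer; `Message`, `EchoBackend`, and `Context` here are simplified placeholders, not Mellea's classes:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str
    content: str

@dataclass
class Context:
    turns: list = field(default_factory=list)

class EchoBackend:
    """Stand-in for a real model backend."""
    def generate(self, component, ctx):
        return Message("assistant", f"echo: {component.content}")

class Session:
    def __init__(self, backend, ctx=None):
        self.backend = backend
        self.ctx = ctx or Context()

    def chat(self, text):
        msg = Message("user", text)                 # 1. wrap string in a Message
        out = self.backend.generate(msg, self.ctx)  # 2. pass component + context
        self.ctx.turns += [msg, out]                # 3. update the context
        return out                                  # 4. return the response

m = Session(EchoBackend())
print(m.chat("hi").content)  # echo: hi
print(len(m.ctx.turns))      # 2
```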
Layer 2: Functional API with explicit context
The functional API (mfuncs) exposes the same operations as stateless functions.
Context is threaded explicitly — you pass it in and get a new context back:
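The shape of that API can be sketched as a pure function that takes a context and returns a fresh one. Names and signatures here are illustrative, not Mellea's:

```python
def chat(text, backend, ctx):
    """Stateless chat: no mutation; a new context is returned with the result."""
    msg = ("user", text)
    out = ("assistant", f"echo: {text}")   # stand-in for a real model call
    new_ctx = ctx + [msg, out]             # the input context is left untouched
    return out, new_ctx

ctx0 = []
reply1, ctx1 = chat("hello", None, ctx0)
reply2, ctx2 = chat("again", None, ctx1)
print(ctx0, len(ctx1), len(ctx2))  # [] 2 4
```

Because the input context is never mutated, you can branch a conversation by reusing an old context object.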
Layer 3: Direct component construction with mfuncs.act()
mfuncs.act() accepts any component or CBlock directly. All other mfuncs
functions (chat, instruct, etc.) are thin wrappers that construct a component
and then call `act()`.
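The wrapper pattern looks roughly like this (a sketch with made-up component tuples, not Mellea's signatures):

```python
def act(component, ctx):
    """Generic entry point: run any component-like value against a context."""
    result = f"ran {component[0]}"     # stand-in for real dispatch to a backend
    return result, ctx + [component]

def chat(text, ctx):
    """Thin wrapper: build a message component, then delegate to act()."""
    return act(("message", "user", text), ctx)

def instruct(description, ctx):
    """Another thin wrapper, this time building an instruction component."""
    return act(("instruction", description), ctx)

out, ctx = chat("hi", [])
print(out)  # ran message
```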
Layer 4: Async execution with mfuncs.aact()
Mellea’s core is async. The synchronous API wraps the async operations with
`asyncio.run()`. For each method in `mfuncs` there is an `a*`-prefixed async version.
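The sync-over-async pattern is the standard one; a minimal sketch (the function bodies are stand-ins):

```python
import asyncio

async def aact(component):
    """Async core; the real library awaits the backend here."""
    await asyncio.sleep(0)
    return f"result for {component}"

def act(component):
    """Synchronous convenience wrapper: run the async core to completion."""
    return asyncio.run(aact(component))

print(act("my-component"))  # result for my-component
```

Note that `asyncio.run()` starts a fresh event loop, so the synchronous wrappers cannot be called from inside an already-running loop.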
Layer 5: Lazy computation via backend.generate_from_context()
`mfuncs.aact()` is itself a convenience wrapper around the backend’s
`generate_from_context()` method. Calling it directly returns a `ModelOutputThunk`
rather than an evaluated response.
Layer 6: Composing lazy computations
Because thunks are lazy, you can pass a thunk as an input to a second computation before the first one has been evaluated. This lets the backend optimise across the full dependency graph: if a thunk `z` takes thunks `x` and `y` as inputs, the backend sees `z`’s dependency on `x` and `y`, evaluates them in order (or
in parallel if the backend supports it), and returns `z`’s result.
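A toy dependency graph of lazy nodes makes the evaluation order concrete. This illustrates the idea only; it is not Mellea's `ModelOutputThunk`:

```python
import asyncio

class Thunk:
    """Toy lazy node: a function plus input thunks, evaluated on demand."""

    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs
        self._value, self._done = None, False

    async def avalue(self):
        if not self._done:
            # Evaluate all inputs concurrently before computing this node.
            args = await asyncio.gather(*(t.avalue() for t in self.inputs))
            self._value = self.fn(*args)
            self._done = True
        return self._value

x = Thunk(lambda: "draft")
y = Thunk(lambda: "critique")
z = Thunk(lambda a, b: f"revise {a} using {b}", x, y)  # z depends on x and y

# Forcing z forces x and y first; until then, nothing has run.
print(asyncio.run(z.avalue()))  # revise draft using critique
```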
Layer summary
| Layer | Entry point | Who uses it |
|---|---|---|
| `MelleaSession` | `m.chat()`, `m.instruct()` | Application developers |
| `mfuncs` synchronous | `mfuncs.chat()`, `mfuncs.act()` | Application developers needing context control |
| `mfuncs` async | `mfuncs.aact()`, `mfuncs.achat()` | Advanced users building async pipelines |
| `backend.generate_from_context()` | Thunks, `is_computed()`, `avalue()` | Backend developers, advanced users |
| Composition | `SimpleComponent` with thunk inputs | Backend developers |
Template and prompt engineering
TemplateFormatter
Mellea formats Python objects into LLM-readable text using a `TemplateFormatter`.
It uses Jinja2 templates stored in a templates/prompts/ directory. Each
component class can have its own template, looked up by class name.
The formatter resolves templates in this order:
- Cached templates (from recent lookups)
- The formatter’s configured template path
- The package that owns the component (`mellea` or a third-party package)

Within a templates directory, model-specific subpaths are tried before falling back to `default/`: `ibm-granite/granite-3.2-8b-instruct` matches `granite/granite-3-2/instruct`
but not `ibm/` — only one path should match in any given templates directory.
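The cache-then-search part of the resolution order can be sketched as follows. The file naming scheme and `.jinja2` extension are assumptions for illustration, not Mellea's actual lookup code:

```python
from pathlib import Path

def resolve_template(name, cache, search_paths):
    """Sketch: cached lookups first, then each configured path in order."""
    if name in cache:                       # 1. recently looked-up templates
        return cache[name]
    for root in search_paths:               # 2./3. configured path, then package dir
        candidate = Path(root) / f"{name}.jinja2"
        if candidate.exists():
            cache[name] = candidate         # remember the hit for next time
            return candidate
    raise FileNotFoundError(f"no template for {name}")
```

A cached hit short-circuits the filesystem search entirely, which matters when the same component class is rendered many times per request.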
TemplateRepresentation
Each component’s format_for_llm() method returns either a string or a
TemplateRepresentation. The TemplateRepresentation specifies:
- A reference to the component instance
- A dictionary of arguments passed to the template renderer
- A list of tools or functions related to the component
- Either a `template` (an inline Jinja2 string) or a `template_order` (a list of template file names to look up, where `*` means the class name)
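As a rough shape, the four fields above could be modelled like this. The field names are guesses based on the description, not necessarily Mellea's:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TemplateRepresentation:
    """Sketch of the fields described above; names are assumptions."""
    obj: object                                   # the component instance
    args: dict = field(default_factory=dict)      # passed to the template renderer
    tools: list = field(default_factory=list)     # related tools/functions
    template: Optional[str] = None                # inline Jinja2 string, or...
    template_order: Optional[list] = None         # ...file names; "*" = class name

rep = TemplateRepresentation(
    obj="doc",
    args={"title": "Q3 report"},
    template_order=["*", "Document"],  # try the class-name template first
)
print(rep.template_order)
```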
Customising templates for an existing class
To change how an existing component is rendered, subclass it and override `format_for_llm()`. Then create a new template file at the appropriate path.
See docs/examples/mify/rich_document_advanced.py
for a worked example.
See also: Generative Programming | Working with Data | Async and Streaming