In this tutorial:

- `ainstruct()` and the async session method naming convention
- `ModelOption.STREAM` and `mot.astream()` for incremental output
- `wait_for_all_mots` for fan-out concurrent generation
- Context behaviour with concurrent async calls
Prerequisites: `pip install mellea`, and Ollama running locally with `granite4:micro` downloaded.
## Step 1: Your first async call
Every sync method on `MelleaSession` has an `a`-prefixed async counterpart with
the same signature and return type. Replace `instruct()` with `ainstruct()` and
wrap the call in an `async def`:
`ainstruct()` returns a `ModelOutputThunk`. Awaiting it starts generation
immediately; `str(result)` resolves the value when it is ready. Every other
method follows the same pattern: `achat()`, `aact()`, `aquery()`,
`atransform()`, `avalidate()`.
## Step 2: Streaming tokens
Enable streaming by passing `ModelOption.STREAM: True` in `model_options`.
Consume chunks with `mot.astream()` as they arrive, which is useful for displaying
output progressively rather than waiting for the full response:
How `astream()` works:

- Each call returns only the new content since the previous call.
- When generation is complete, `is_computed()` returns `True` and the final `astream()` call returns the remaining content.
- Do not call `astream()` from multiple coroutines on the same thunk simultaneously.
## Step 3: Concurrent batch processing
The pipeline from Tutorial 01 processes one feedback item at a time, and each
call blocks until the previous one completes. With `ainstruct()` you can fire
all the calls immediately and resolve them together.
Use `wait_for_all_mots` to await a list of thunks concurrently:
## Step 4: Mixing parallel and sequential steps
Some pipeline steps are independent; others depend on earlier results. You can
resolve dependencies explicitly without blocking unrelated work. In the
Tutorial 01 pipeline, `extract_issues` is independent of `summarize`;
both take the raw feedback. Run them in parallel, then feed the resolved summary
into `classify_sentiment`:
## Step 5: Context and concurrency
By default, `start_session()` uses `SimpleContext`, which is safe for concurrent
async calls. If you switch to `ChatContext`, Mellea logs a warning because
concurrent writes can corrupt the context state.
Note: This warning appears whenever `ChatContext` is used with async methods,
even if you `await` each call sequentially. It is safe to ignore as long as
each call is fully resolved before the next one starts.

If you need `ChatContext` (for multi-turn conversation), await each call before
starting the next:
For concurrent fan-out workloads, stay with the default `SimpleContext`.
## What you built
| Pattern | What it gives you |
|---|---|
| `ainstruct()` / `achat()` / `aact()` | Non-blocking LLM calls |
| `ModelOption.STREAM` + `astream()` | Token-by-token output for responsive UIs |
| `wait_for_all_mots` | Fan-out: all thunks resolve concurrently |
| Explicit dependency ordering | Sequential where needed, parallel everywhere else |
| `SimpleContext` (default) | Safe concurrent access with no state corruption |
See also: Async and Streaming (full API reference) | Tutorial 03: Using Generative Slots