Installation
granite4.1:3b not found
Python 3.13: outlines install failure
outlines requires a Rust compiler. Either install Rust or pin Python to 3.12.
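To confirm that this failure mode applies before picking one of the two options, a quick check of the interpreter version and the Rust toolchain can help. The following is a minimal stdlib-only sketch. The helper name `outlines_build_hint` is hypothetical and is not part of Mellea or outlines; the 3.13 cutoff is taken from the heading above.

```python
import shutil
import sys


def outlines_build_hint(version_info=None, rustc_path=None) -> str:
    """Return a short hint about whether the outlines install may need attention.

    Hypothetical helper for illustration only. It inspects the interpreter
    version and whether `rustc` is on PATH, nothing else.
    """
    if version_info is None:
        version_info = sys.version_info[:2]
    if rustc_path is None:
        # shutil.which returns None when the executable is not found on PATH.
        rustc_path = shutil.which("rustc")

    if tuple(version_info) < (3, 13):
        return "ok: Python older than 3.13"
    if rustc_path:
        return "ok: Rust compiler found"
    return "action needed: install Rust or pin Python to 3.12"


if __name__ == "__main__":
    print(outlines_build_hint())
```

Both arguments are optional so the function can be exercised with explicit values as well as against the current environment.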
Intel Mac: torch errors
Create a Conda environment, install torchvision, then install Mellea inside it.
Missing optional dependency
Ollama connectivity
Connection refused
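A refused connection usually means nothing is listening at the address being used. The sketch below probes an Ollama server with the standard library only. It assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint (which lists local models); the helper name `ollama_reachable` is hypothetical and not part of Mellea.

```python
import http.client
import urllib.error
import urllib.request


def ollama_reachable(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP 200 comes back from `<base_url>/api/tags`.

    Hypothetical helper: the default URL and endpoint are assumptions about
    a standard Ollama setup, not values read from Mellea.
    """
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, and malformed URLs.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_reachable())
```

Passing a different `base_url` is a quick way to test a non-default host or port before configuring it elsewhere.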
Wrong Ollama URL
If Ollama is running on a non-default host or port, pass the URL explicitly.
Requirements and sampling
Requirements always failing — output looks fine
If the model keeps retrying but the output looks correct, the validation function may be too strict. Inspect what is being rejected.
return_sampling_results=True makes instruct() return a SamplingResult instead
of a ModelOutputThunk. Use result.success to check whether the budget was
exhausted without a passing output.
Budget exhausted — result.success is False
The model failed all loop_budget attempts. Options:
- Increase loop_budget.
- Simplify or relax the requirement.
- Provide a more specific validation function that gives the model useful feedback via ValidationResult.reason; the reason string is passed back to the model on retry.
- Switch to SOFAISamplingStrategy to escalate to a stronger model when the primary model fails.
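To make the interaction between the budget and the feedback reason concrete, here is a toy retry loop in plain Python. It is a sketch of the general idea only, not Mellea's implementation: `ToyResult`, `sample_with_budget`, `validate_short`, and `fake_model` are all hypothetical stand-ins, and the "model" is a deterministic function.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ToyResult:
    """Stand-in for a sampling result; this is not Mellea's SamplingResult."""
    success: bool
    output: str
    reasons: List[str] = field(default_factory=list)


def sample_with_budget(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], Tuple[bool, str]],
    loop_budget: int,
) -> ToyResult:
    """Try up to `loop_budget` generations, passing each failure reason to the next attempt."""
    reasons: List[str] = []
    feedback: Optional[str] = None
    output = ""
    for _ in range(loop_budget):
        output = generate(feedback)
        passed, reason = validate(output)
        if passed:
            return ToyResult(True, output, reasons)
        reasons.append(reason)
        feedback = reason  # the next attempt sees why this one was rejected
    return ToyResult(False, output, reasons)


def validate_short(text: str) -> Tuple[bool, str]:
    """Accept outputs of at most 10 characters; otherwise return a specific reason."""
    if len(text) <= 10:
        return True, ""
    return False, f"Output has {len(text)} characters; use at most 10."


def fake_model(feedback: Optional[str]) -> str:
    """Deterministic stand-in for a model: it shortens its answer once it receives feedback."""
    return "short" if feedback else "a much longer first answer"


result = sample_with_budget(fake_model, validate_short, loop_budget=3)
print(result.success, result.output)
```

With a budget of 1 the same call fails, because the stand-in model only improves after it has seen a reason. A vague reason (or none) would leave it nothing to act on, which is the motivation for writing specific validation messages.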
PreconditionException from @generative
The @generative function failed before generation. This is
intentional: the function declared that its inputs do not meet a precondition.
Check the function's @precondition decorators and validate your inputs before calling.
Agents and tools
react() raises RuntimeError
react() exhausted its loop_budget without finding a final answer. Either
increase the budget or check that the tool functions are returning the information
the model needs to reach a conclusion.
Tool not called / wrong tool called
If the model is not calling tools as expected:
- Verify ModelOption.TOOLS is set in the session's model options.
- Check the tool's docstring; the model uses it to decide when to call the tool. A vague or absent docstring leads to poor tool selection.
- Use guardian_check(context, backend, criteria="function_call") from the Guardian Intrinsics to detect function call hallucinations.
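As an illustration of the docstring point, the sketch below contrasts a tool function that says when it should be used with one that does not. Everything here is hypothetical: `get_weather` is a stub, and `docstring_looks_useful` is a rough word-count heuristic, not a Mellea feature.

```python
import inspect


def get_weather(city: str) -> str:
    """Return the current weather for a city.

    Use this tool when the user asks about present weather conditions in a
    named city. Do not use it for forecasts or historical data.

    Args:
        city: Name of the city, for example "Paris".
    """
    # Hypothetical stub: a real tool would query a weather service here.
    return f"Weather lookup for {city} is not implemented in this sketch."


def vague_tool(x):
    """Does stuff."""
    return x


def docstring_looks_useful(func, min_words: int = 8) -> bool:
    """Heuristic: True if the function's docstring has at least `min_words` words."""
    doc = inspect.getdoc(func) or ""
    return len(doc.split()) >= min_words


if __name__ == "__main__":
    for tool in (get_weather, vague_tool):
        print(tool.__name__, docstring_looks_useful(tool))
```

A word count cannot judge quality, but it is a cheap way to flag tools whose docstrings are missing or too short to guide tool selection.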
Async
RuntimeError: no running event loop
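An async method (ainstruct, achat, aact) was called outside a running event loop. Await it inside an async function, or wrap the call in asyncio.run()
if you are at the top level.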
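The pattern can be sketched with the standard library alone. Here `fake_ainstruct` is a hypothetical stand-in for an async session method such as ainstruct; it makes no Mellea call, so the example runs anywhere.

```python
import asyncio


async def fake_ainstruct(prompt: str) -> str:
    """Stand-in for an async session method; not a Mellea call."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"response to: {prompt}"


async def main() -> str:
    # Inside a coroutine, await the async method directly.
    return await fake_ainstruct("Summarize the report.")


if __name__ == "__main__":
    # At the top level of a script no loop is running, so asyncio.run() starts one.
    print(asyncio.run(main()))
```

Keeping all awaits inside `main()` and calling `asyncio.run()` exactly once at the entry point avoids starting a second loop by accident.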
asyncio.run() inside a Jupyter notebook
Jupyter notebooks already run an event loop. Use await directly, or install
nest_asyncio and call nest_asyncio.apply() before using asyncio.run().
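One way to write code that works both in a script and in a notebook is to detect whether a loop is already running. The sketch below assumes the third-party nest_asyncio package is installed when it is needed; `loop_is_running` and `run_coro` are hypothetical helper names, not part of Mellea.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from inside a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_coro(coro):
    """Run `coro` from synchronous code, patching asyncio only if a loop is already active."""
    if loop_is_running():
        # Imported lazily so plain scripts do not need the extra dependency.
        import nest_asyncio

        nest_asyncio.apply()  # allows asyncio.run() to be re-entered
    return asyncio.run(coro)


async def answer() -> int:
    await asyncio.sleep(0)
    return 42


if __name__ == "__main__":
    print(run_coro(answer()))
```

In a notebook cell, a plain `await answer()` remains the simpler choice; the helper is mainly useful for library code that cannot know where it will be called from.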
Guardian / safety validation
Guardian Intrinsics (guardian_check(), policy_guardrails(),
factuality_detection(), factuality_correction()) require LocalHFBackend
with an IBM Granite model.
See Safety Guardrails for full usage.
guardian_check() returns unexpected scores
- Double-check the criteria argument: use a key from CRITERIA_BANK (e.g. "harm", "groundedness") or a free-text criteria string.
- For groundedness checks, attach source documents via documents=[Document(...)] on the Message("assistant", ...) in the evaluation context, not as a separate user message.
- Scores below 0.5 are safe; at or above 0.5 indicates risk detected.
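The threshold rule in the last item is easy to get backwards, so a small helper can make the boundary explicit. This sketch assumes the score arrives as a plain float; `RISK_THRESHOLD` and `risk_detected` are hypothetical names introduced here, not Mellea identifiers.

```python
RISK_THRESHOLD = 0.5  # from the guidance above: below is safe, at or above is risk


def risk_detected(score: float) -> bool:
    """Return True when a score is at or above the risk threshold."""
    return score >= RISK_THRESHOLD


if __name__ == "__main__":
    for score in (0.12, 0.5, 0.87):
        label = "risk detected" if risk_detected(score) else "safe"
        print(f"{score:.2f} -> {label}")
```

Note that a score of exactly 0.5 counts as risk under this rule.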
Deprecated GuardianCheck warnings
Replace GuardianCheck / GuardianRisk imports with the Guardian Intrinsics API.
See Safety Guardrails for migration guidance.
Getting more help
- GitHub Issues: github.com/generative-computing/mellea/issues
- Examples: docs/examples/
- Enable telemetry to inspect what is happening at each step; see Telemetry.
See also: Quick Start | Inference-Time Scaling | Safety Guardrails