Every m.act() call, @generative call, and LLM request
produces spans you can inspect in Jaeger, Grafana Tempo, Honeycomb, or any
OTLP-compatible backend.
Note: Tracing is an optional feature. Mellea works normally without it.
All telemetry calls are no-ops when the [telemetry] extra is not installed.
## Install and enable tracing
Install the telemetry extra:

## Configuring an OTLP exporter
Set `OTEL_EXPORTER_OTLP_ENDPOINT` to any OTLP-compatible endpoint. Mellea uses
the gRPC OTLP exporter, so the endpoint must accept gRPC (default port 4317).
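A typical local setup, assuming the package is published on PyPI as `mellea` and your collector listens on the conventional OTLP/gRPC port:

```shell
pip install "mellea[telemetry]"     # once, to enable telemetry support
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```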
### Jaeger

Run Jaeger locally, then open http://localhost:16686 to browse traces.
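For local experiments, Jaeger's all-in-one image exposes both the OTLP gRPC ingest port and the UI (image name, flag, and ports are Jaeger's defaults, not Mellea-specific):

```shell
docker run --rm -d --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 -p 16686:16686 \
  jaegertracing/all-in-one:latest
```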
### Grafana Tempo

Run Tempo locally, point Grafana at it (http://localhost:3200), and use the
Explore panel to query by service name.
### Other backends
Any OTLP-compatible backend works with the same environment variables: Honeycomb, Datadog, New Relic, AWS X-Ray (via the OTEL collector), and Google Cloud Trace all accept OTLP over gRPC.

## Checking trace status programmatically
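The status-check snippet is not reproduced on this page. As a stand-in, you can inspect the same environment variable the exporter reads; the helper below is ours, not part of Mellea's API:

```python
import os


def tracing_configured() -> bool:
    """Best-effort check: is an OTLP endpoint set for this process?

    Hypothetical helper -- Mellea's real status API may differ. It reads
    the same OTEL_EXPORTER_OTLP_ENDPOINT variable the gRPC exporter uses.
    """
    return bool(os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT"))


if __name__ == "__main__":
    print("tracing configured:", tracing_configured())
```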
## What spans Mellea emits
Mellea has two independent trace scopes. Enable them separately to reduce noise during debugging.

### Application spans (`mellea.application`)
Application spans cover user-facing Mellea operations. They appear whenever you
call m.act(), m.instruct(), m.chat(), or a @generative function.
| Attribute | Description |
|---|---|
| `mellea.backend` | Backend class name (e.g., `OllamaModelBackend`) |
| `mellea.action_type` | Component class being executed (e.g., `Instruction`) |
| `mellea.context_size` | Length of the context at call time |
| `mellea.has_format` | Whether a format constraint was specified |
| `sampling_success` | Whether the sampling strategy succeeded |
| `num_generate_logs` | Number of generation attempts (>1 means retries occurred) |
| `response` | Model response, truncated to 500 characters |
### Backend spans (`mellea.backend`)
Backend spans cover individual LLM API calls. They follow the
OpenTelemetry Gen-AI Semantic Conventions.
| Attribute | Description |
|---|---|
| `gen_ai.system` | Backend system name mapped from class (e.g., `ollama`, `openai`) |
| `gen_ai.request.model` | Model ID requested |
| `gen_ai.operation.name` | `"chat"` for `generate_from_context`; `"text_completion"` for `generate_from_raw` |
| `gen_ai.usage.input_tokens` | Input tokens consumed |
| `gen_ai.usage.output_tokens` | Output tokens generated |
| `gen_ai.usage.total_tokens` | Total tokens (input + output) |
| `gen_ai.response.finish_reasons` | List of finish reasons (e.g., `["stop"]`) |
| `gen_ai.response.id` | Response identifier from the backend |
### Span hierarchy
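One plausible shape for a single instruction with requirement checking, with span names inferred from the attributes documented above (illustrative, not exhaustive):

```
act (mellea.application)
├── chat (mellea.backend)                       LLM call for the instruction
└── requirement_validation (mellea.application)
    └── chat (mellea.backend)                   LLM call for validation
```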
When both scopes are active, backend spans nest inside application spans.

## Reading traces in a typical agent run
When you open a trace in your backend, look for these patterns:

**High input token counts on early spans.** A single `act` span with
`gen_ai.usage.input_tokens` much larger than expected usually means the context
has accumulated many previous messages. Use prefix caching to reduce cost.

**Repeated `requirement_validation` spans beneath one `act`.** The value of
`num_generate_logs` in the parent span tells you how many retries occurred.
If the model keeps retrying, read the `response` attribute on each attempt to
understand why validation is failing.

**Long gaps between spans.** A gap between the start of a backend `chat` span
and the next application span usually indicates time spent waiting for the LLM.
This is normal for large models but worth tracking across deploys.

**`gen_ai.response.finish_reasons` containing `"length"`.** The model hit the
maximum output token limit and was cut off. Increase `max_tokens` in your
backend options or shorten your prompts.
## Full working example
The example at `docs/examples/telemetry/telemetry_example.py`
runs a session with `instruct()`, `@generative`, and `m.chat()` and prints trace
status to stdout. Run it to verify your setup.
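A typical invocation from the repository root, pointing the exporter at a local collector (the endpoint value is an example):

```shell
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
  python docs/examples/telemetry/telemetry_example.py
```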
## Disabling tracing
Tracing is disabled by default. If you have set the environment variables globally and need to turn tracing off for a test run or performance measurement, unset them or set them to false.
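For example, in the shell that runs your tests (`OTEL_EXPORTER_OTLP_ENDPOINT` is the variable named in this guide; unset any other enabling variables from your setup the same way):

```shell
unset OTEL_EXPORTER_OTLP_ENDPOINT
```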
Warning: Setting the environment variables after `mellea.telemetry` has been imported has no effect. The tracing module reads the variables once at module load time and caches the result.

Tip: In pytest, use a session-scoped fixture to set environment variables before any test imports Mellea, or use `monkeypatch.setenv` combined with `importlib.reload(mellea.telemetry.tracing)` to reset state between tests.
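A sketch of the `conftest.py` approach: pytest imports `conftest.py` before any test module, so clearing the variable here happens before `mellea.telemetry` is first loaded (file name and placement follow pytest convention; the reload lines mirror the tip above):

```python
# conftest.py -- imported by pytest before any test module.
import os

# Clear the endpoint before mellea.telemetry can read and cache it.
os.environ.pop("OTEL_EXPORTER_OTLP_ENDPOINT", None)

# To flip tracing on inside a single test, reset the cached module state:
#   monkeypatch.setenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
#   importlib.reload(mellea.telemetry.tracing)
```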
## Next steps
- Metrics and Telemetry — enable metrics collection alongside tracing, and learn how to instrument your own code with counters and histograms.
- Evaluate with LLM-as-a-Judge — add automated quality evaluation to your pipeline and correlate evaluation results with trace data.