Note: Metrics are an optional feature. All instrument calls are no-ops
when metrics are disabled or the [telemetry] extra is not installed.
Enable metrics
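Enabling metrics is configuration-only. A minimal setup might look like the following; the pip extra name comes from the note above, and the environment variables are the ones referenced in the troubleshooting section below:

```shell
# Install the optional telemetry dependencies and turn metrics on.
pip install "mellea[telemetry]"
export MELLEA_METRICS_ENABLED=true
# Choose at least one exporter; console output is handy for local debugging.
export MELLEA_METRICS_CONSOLE=true
```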
Token usage metrics
Mellea records token consumption automatically after each LLM call completes. No code changes are required.
Token instruments
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.tokens.input | Counter | tokens | Total input/prompt tokens processed |
| mellea.llm.tokens.output | Counter | tokens | Total output/completion tokens generated |
Token attributes
All token metrics include these attributes, following the Gen-AI semantic conventions:
| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
Backend support
| Backend | Streaming | Non-Streaming | Source |
|---|---|---|---|
| OpenAI | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| Ollama | Yes | Yes | prompt_eval_count and eval_count |
| WatsonX | No | Yes | input_token_count and generated_token_count (streaming API limitation) |
| LiteLLM | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences |
Note: Token usage metrics are only tracked for `generate_from_context` requests; `generate_from_raw` calls do not record token metrics.
Token recording timing
Token metrics are recorded after the full response is received, not incrementally during streaming:
- Non-streaming: Metrics are recorded immediately after `await mot.avalue()` completes.
- Streaming: Metrics are recorded after the stream is fully consumed (all chunks received).
Latency histograms
Mellea tracks request duration and time to first token (TTFB) automatically after each LLM call. No code changes are required.
Latency instruments
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.request.duration | Histogram | s | Total request duration, from call to full response |
| mellea.llm.ttfb | Histogram | s | Time to first token (streaming requests only) |
Latency attributes
| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
| streaming | Whether streaming mode was used (duration only) | True, False |
Histogram buckets
Custom bucket boundaries are configured for LLM-sized latencies:
- mellea.llm.request.duration: 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60, 120 seconds
- mellea.llm.ttfb: 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10 seconds
Latency recording timing
- mellea.llm.request.duration: Recorded for every `generate_from_context` call, both streaming and non-streaming.
- mellea.llm.ttfb: Recorded only for streaming requests, measuring elapsed time from the `generate_from_context` call until the first chunk arrives.
Error metrics
Mellea records LLM errors automatically after each failed backend call. No code changes are required. Errors are classified into semantic categories for consistent filtering across providers.
Error counter
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.errors | Counter | {error} | Total LLM errors categorized by type |
Error attributes
All error metrics include these attributes:
| Attribute | Description | Example Values |
|---|---|---|
| error_type | Semantic error category (mellea-specific) | rate_limit, timeout, auth, content_policy, invalid_request, transport_error, server_error, unknown |
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
| error.type | Python exception class name (standard OTel) | RateLimitError, TimeoutError, AuthenticationError |
Error type categories
The error_type attribute maps exceptions to human-friendly semantic labels:
| Category | Description | Matched exceptions |
|---|---|---|
| rate_limit | Request throttled by provider | openai.RateLimitError, class names containing ratelimit |
| timeout | Request or connection timed out | TimeoutError, openai.APITimeoutError, class names containing timeout |
| auth | Authentication or authorization failure | openai.AuthenticationError, openai.PermissionDeniedError, class names containing auth |
| content_policy | Request rejected by content moderation | openai.BadRequestError with code="content_policy_violation", class names containing content_policy |
| invalid_request | Malformed or unsupported request | openai.BadRequestError (non-content-policy) |
| transport_error | Network or connection failure | ConnectionError, openai.APIConnectionError, class names containing connection/transport |
| server_error | Provider-side internal error | openai.InternalServerError, class names containing server |
| unknown | Unrecognized exception type | Any exception not matched above |
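The class-name heuristics in the table can be sketched as a small classifier. This is an illustrative approximation only, not Mellea's actual implementation, which also matches the provider-specific exception types listed above (openai.RateLimitError, openai.BadRequestError, and so on):

```python
def classify_error(exc: Exception) -> str:
    """Map an exception to a semantic error_type label (sketch).

    Uses only the generic class-name heuristics from the table; the
    invalid_request category is omitted because it requires inspecting
    provider-specific BadRequestError details.
    """
    name = type(exc).__name__.lower()
    if "ratelimit" in name:
        return "rate_limit"
    if isinstance(exc, TimeoutError) or "timeout" in name:
        return "timeout"
    if "auth" in name:
        return "auth"
    if "content_policy" in name:
        return "content_policy"
    if isinstance(exc, ConnectionError) or "connection" in name or "transport" in name:
        return "transport_error"
    if "server" in name:
        return "server_error"
    return "unknown"
```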
When errors are recorded
Error metrics are recorded when a backend raises an exception during generation, after the request has been dispatched to the provider. Construction-time errors (e.g. a missing API key) are not captured by the error counter.
Cost metrics
Mellea estimates request cost automatically after each LLM call when pricing data is available. No code changes are required.
Cost instrument
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.llm.cost.usd | Counter | USD | Estimated request cost in US dollars |
Cost attributes
| Attribute | Description | Example Values |
|---|---|---|
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-5.4, claude-sonnet-4-6 |
Pricing data
Mellea ships with built-in pricing for current OpenAI and Anthropic models. Prices are approximate and may become stale as providers update their rates. For models without built-in pricing, cost is not recorded and a warning is logged instead.
Custom pricing
Override built-in prices or add pricing for any model using a JSON file.
Operational metrics
Mellea records metrics for its internal sampling, validation, and tool execution loops. These counters give visibility into retry behavior, validation failure rates, and tool call health, independent of the underlying LLM provider.
Sampling counters
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.sampling.attempts | Counter | {attempt} | Sampling attempts per loop iteration |
| mellea.sampling.successes | Counter | {sample} | Sampling loops that produced a passing sample |
| mellea.sampling.failures | Counter | {failure} | Sampling loops that exhausted the loop budget without success |
| Attribute | Description | Example Values |
|---|---|---|
| strategy | Sampling strategy class name | RejectionSamplingStrategy, MultiTurnStrategy, RepairTemplateStrategy |
Requirement counters
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.requirement.checks | Counter | {check} | Requirement validation checks performed |
| mellea.requirement.failures | Counter | {failure} | Requirement validation checks that failed |
| Attribute | Description | Example Values |
|---|---|---|
| requirement | Requirement class name | LLMaJRequirement, PythonExecutionReq, ALoraRequirement, GuardianCheck |
| reason | Human-readable failure reason (mellea.requirement.failures only) | "Output did not satisfy constraint", "unknown" |
Tool counter
| Metric Name | Type | Unit | Description |
|---|---|---|---|
| mellea.tool.calls | Counter | {call} | Tool invocations by name and status |
| Attribute | Description | Example Values |
|---|---|---|
| tool | Name of the invoked tool | "search", "calculator" |
| status | Execution outcome | success, failure |
Metrics export configuration
Mellea supports multiple metrics exporters that can be used independently or simultaneously.
Warning: If MELLEA_METRICS_ENABLED=true but no exporter is configured,
Mellea logs a warning. Metrics are collected but not exported.
Console exporter (debugging)
Print metrics to the console for local debugging without setting up an observability backend.
OTLP exporter (production)
Export metrics to an OTLP collector for production observability platforms (Jaeger, Grafana, Datadog, etc.).
Prometheus exporter
Register metrics with the prometheus_client default registry via PrometheusMetricReader. Your application is responsible for exposing the registry; the simplest approach is a standalone HTTP server. Once exposed, visit http://localhost:9090 and query metrics like mellea_llm_tokens_input.
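The standalone-server approach can be as small as one call. This sketch assumes the prometheus_client package is installed and uses port 9090 to match the URL above:

```python
# Serve the prometheus_client default registry (which PrometheusMetricReader
# writes into) at http://localhost:9090/metrics. start_http_server launches
# a background thread and returns immediately, so your application can
# continue its normal work; just keep the process alive for scraping.
from prometheus_client import start_http_server

start_http_server(9090)
```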
Multiple exporters simultaneously
You can enable multiple exporters at once, for example OTLP export to a collector together with the prometheus_client registry for Prometheus scraping.
Typical combinations:
- Development: Console + Prometheus for local testing
- Production: OTLP + Prometheus for comprehensive monitoring
- Debugging: Console only for quick verification
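As a sketch of the production combination, the relevant environment variables might look like this. The MELLEA_METRICS_* names are taken from the troubleshooting section below; the OTLP endpoint variable shown is the standard OpenTelemetry one and its use here is an assumption:

```shell
# Production-style setup: OTLP to a collector plus Prometheus scraping.
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317  # standard OTel variable (assumed)
export MELLEA_METRICS_PROMETHEUS=true
```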
Custom metrics
The metrics API exposes create_counter, create_histogram, and create_up_down_counter for instrumenting your own application code. These return no-ops when metrics are disabled, so you can call them unconditionally.
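The "safe to call unconditionally" property can be illustrated with a self-contained sketch of the pattern. The class names and environment handling below are assumptions for illustration, not Mellea's internals:

```python
import os

class _NoOpCounter:
    """Stand-in returned when metrics are disabled: silently discards data."""
    def add(self, amount, attributes=None):
        pass

class _Counter:
    """Minimal real counter used when metrics are enabled."""
    def __init__(self, name):
        self.name = name
        self.value = 0
    def add(self, amount, attributes=None):
        self.value += amount

def create_counter(name):
    # When metrics are off, hand back a no-op so callers never need to
    # check the setting themselves.
    if os.environ.get("MELLEA_METRICS_ENABLED", "false").lower() != "true":
        return _NoOpCounter()
    return _Counter(name)

# Safe to call unconditionally: this is a no-op when metrics are disabled.
requests_counter = create_counter("myapp.requests")
requests_counter.add(1, {"route": "/chat"})
```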
Programmatic access
You can check at runtime whether metrics are enabled. On a completed ModelOutputThunk, the generation attribute is a GenerationMetadata dataclass. Its usage field is a dictionary with three keys: prompt_tokens, completion_tokens, and total_tokens. All backends populate this consistently. The streaming and ttfb_ms fields are set automatically based on whether streaming mode was used.
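As a sketch of reading this metadata, using a stand-in dataclass that mirrors the fields described above (the real GenerationMetadata is defined inside Mellea):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GenerationMetadata:
    """Stand-in mirroring the fields described in the text above."""
    usage: dict = field(default_factory=dict)  # prompt/completion/total tokens
    streaming: bool = False
    ttfb_ms: Optional[float] = None            # set only for streaming requests

# Reading usage from a completed response's metadata:
meta = GenerationMetadata(
    usage={"prompt_tokens": 12, "completion_tokens": 48, "total_tokens": 60},
    streaming=True,
    ttfb_ms=210.0,
)
print(meta.usage["total_tokens"])  # 60
```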
Performance
- Zero overhead when disabled: When MELLEA_METRICS_ENABLED=false (the default), no auto-registered metrics plugins are active and all instrument calls are no-ops.
- Minimal overhead when enabled: Counter increments and histogram recordings are extremely fast (~nanoseconds per operation).
- Async export: Metrics are batched and exported asynchronously (default: every 60 seconds).
- Non-blocking: Metric recording never blocks LLM calls.
- Automatic collection: Metrics are recorded via hooks after generation completes — no manual instrumentation needed.
Troubleshooting
Metrics not appearing:
- Verify MELLEA_METRICS_ENABLED=true is set.
- Check that at least one exporter is configured (Console, OTLP, or Prometheus).
- For OTLP: Verify MELLEA_METRICS_OTLP=true and the endpoint is reachable.
- For Prometheus: Verify MELLEA_METRICS_PROMETHEUS=true and your application exposes the registry (curl http://localhost:PORT/metrics).
- Enable console output (MELLEA_METRICS_CONSOLE=true) to verify metrics are being collected.
OTLP export problems:
- Verify the OTLP collector is running: docker ps | grep otel
- Check that the endpoint URL is correct (default: http://localhost:4317).
- Verify network connectivity: curl http://localhost:4317
- Check collector logs for errors.
Metrics delayed:
- Metrics are exported at intervals (default: 60 seconds). Wait for the export cycle.
- Reduce the export interval for testing: export OTEL_METRIC_EXPORT_INTERVAL=10000 (10 seconds).
- For Prometheus: Metrics update on scrape, not continuously.
- Verify LLM calls are actually being made and completing successfully.
Exporter quick reference:
- Console: export MELLEA_METRICS_CONSOLE=true
- OTLP: export MELLEA_METRICS_OTLP=true plus an endpoint
- Prometheus: export MELLEA_METRICS_PROMETHEUS=true
Full example: docs/examples/telemetry/metrics_example.py
See also: