Prerequisites: Telemetry introduces the environment variables and telemetry architecture. This page covers metrics collection in detail.

Mellea automatically records LLM metrics across all backends using OpenTelemetry. Metrics follow the Gen-AI Semantic Conventions for standardized observability. The metrics API also lets you create your own counters, histograms, and up-down counters for application-level instrumentation.
Note: Metrics are an optional feature. All instrument calls are no-ops when metrics are disabled or the [telemetry] extra is not installed.

Enable metrics

export MELLEA_METRICS_ENABLED=true
You also need at least one exporter configured — see Metrics export configuration below.

Token usage metrics

Mellea records token consumption automatically after each LLM call completes. No code changes are required.

Token instruments

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| mellea.llm.tokens.input | Counter | tokens | Total input/prompt tokens processed |
| mellea.llm.tokens.output | Counter | tokens | Total output/completion tokens generated |

Token attributes

All token metrics include these attributes following Gen-AI semantic conventions:
| Attribute | Description | Example Values |
| --- | --- | --- |
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |

Backend support

| Backend | Streaming | Non-Streaming | Source |
| --- | --- | --- | --- |
| OpenAI | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| Ollama | Yes | Yes | prompt_eval_count and eval_count |
| WatsonX | No | Yes | input_token_count and generated_token_count (streaming API limitation) |
| LiteLLM | Yes | Yes | usage.prompt_tokens and usage.completion_tokens |
| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences |
Note: Token usage metrics are only tracked for generate_from_context requests. generate_from_raw calls do not record token metrics.

Token recording timing

Token metrics are recorded after the full response is received, not incrementally during streaming:
  • Non-streaming: Metrics recorded immediately after await mot.avalue() completes.
  • Streaming: Metrics recorded after the stream is fully consumed (all chunks received).
This ensures accurate token counts from the backend’s usage metadata, which is only available after the complete response.
mot, _ = await backend.generate_from_context(msg, ctx)

# Metrics NOT recorded yet (stream still in progress)
await mot.astream()

# Metrics recorded here (after stream completion)
await mot.avalue()

Latency histograms

Mellea tracks request duration and time-to-first-token (TTFB) automatically after each LLM call. No code changes are required.

Latency instruments

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| mellea.llm.request.duration | Histogram | s | Total request duration, from call to full response |
| mellea.llm.ttfb | Histogram | s | Time to first token (streaming requests only) |

Latency attributes

| Attribute | Description | Example Values |
| --- | --- | --- |
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
| streaming | Whether streaming mode was used (duration only) | True, False |

Histogram buckets

Custom bucket boundaries are configured for LLM-sized latencies:
  • mellea.llm.request.duration: 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60, 120 seconds
  • mellea.llm.ttfb: 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10 seconds
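Each boundary is the inclusive upper bound of a bucket, so a recorded value increments the first bucket whose boundary is greater than or equal to it. As an illustration of that rule (not Mellea code), bisect can find the bucket a duration falls into:

```python
import bisect

# Bucket boundaries for mellea.llm.request.duration, in seconds
DURATION_BUCKETS = [0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60, 120]

def bucket_for(value: float, boundaries: list[float]) -> str:
    """Return a label for the histogram bucket a value falls into."""
    i = bisect.bisect_left(boundaries, value)
    if i == len(boundaries):
        return f"> {boundaries[-1]}s"  # overflow bucket
    return f"<= {boundaries[i]}s"

print(bucket_for(0.8, DURATION_BUCKETS))    # <= 1s
print(bucket_for(45.0, DURATION_BUCKETS))   # <= 60s
print(bucket_for(300.0, DURATION_BUCKETS))  # > 120s
```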

Latency recording timing

  • mellea.llm.request.duration: Recorded for every generate_from_context call, both streaming and non-streaming.
  • mellea.llm.ttfb: Recorded only for streaming requests, measuring elapsed time from the generate_from_context call until the first chunk arrives.
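Conceptually, TTFB is a timestamp taken when the first chunk arrives. The self-contained sketch below illustrates the idea with a stand-in async generator (fake_stream and consume_with_ttfb are illustrative, not Mellea APIs):

```python
import asyncio
import time

async def fake_stream():
    """Stand-in for a streaming LLM response."""
    await asyncio.sleep(0.01)  # simulated delay before the first token
    yield "Hello"
    yield ", world"

async def consume_with_ttfb(stream):
    """Consume a stream, recording time-to-first-token in milliseconds."""
    start = time.monotonic()
    ttfb_ms = None
    chunks = []
    async for chunk in stream:
        if ttfb_ms is None:  # first chunk: record time to first token
            ttfb_ms = (time.monotonic() - start) * 1000
        chunks.append(chunk)
    return ttfb_ms, "".join(chunks)

ttfb_ms, text = asyncio.run(consume_with_ttfb(fake_stream()))
print(f"ttfb={ttfb_ms:.1f} ms, text={text!r}")
```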
Access latency data directly from a ModelOutputThunk:
from mellea import start_session
from mellea.backends import ModelOption

with start_session() as m:
    result = m.instruct(
        "Explain quantum entanglement briefly",
        model_options={ModelOption.STREAM: True},
    )
    if result.generation.streaming and result.generation.ttfb_ms is not None:
        print(f"Time to first token: {result.generation.ttfb_ms:.1f} ms")

Error metrics

Mellea records LLM errors automatically after each failed backend call. No code changes are required. Errors are classified into semantic categories for consistent filtering across providers.

Error counter

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| mellea.llm.errors | Counter | {error} | Total LLM errors categorized by type |

Error attributes

All error metrics include these attributes:
| Attribute | Description | Example Values |
| --- | --- | --- |
| error_type | Semantic error category (mellea-specific) | rate_limit, timeout, auth, content_policy, invalid_request, transport_error, server_error, unknown |
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-4, llama3.2:7b, granite-3.1-8b-instruct |
| error.type | Python exception class name (standard OTel) | RateLimitError, TimeoutError, AuthenticationError |

Error type categories

The error_type attribute maps exceptions to human-friendly semantic labels:
| Category | Description | Matched exceptions |
| --- | --- | --- |
| rate_limit | Request throttled by provider | openai.RateLimitError, class names containing ratelimit |
| timeout | Request or connection timed out | TimeoutError, openai.APITimeoutError, class names containing timeout |
| auth | Authentication or authorization failure | openai.AuthenticationError, openai.PermissionDeniedError, class names containing auth |
| content_policy | Request rejected by content moderation | openai.BadRequestError with code="content_policy_violation", class names containing content_policy |
| invalid_request | Malformed or unsupported request | openai.BadRequestError (non-content-policy) |
| transport_error | Network or connection failure | ConnectionError, openai.APIConnectionError, class names containing connection/transport |
| server_error | Provider-side internal error | openai.InternalServerError, class names containing server |
| unknown | Unrecognized exception type | Any exception not matched above |
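The mapping above amounts to a small exception classifier. The helper below paraphrases the table's matching rules as a self-contained sketch; classify_error and its exact rule order are illustrative, not Mellea's actual implementation:

```python
def classify_error(exc: BaseException) -> str:
    """Map an exception to a semantic error_type label (illustrative sketch)."""
    name = type(exc).__name__.lower()
    if "ratelimit" in name:
        return "rate_limit"
    if isinstance(exc, TimeoutError) or "timeout" in name:
        return "timeout"
    if "auth" in name or "permissiondenied" in name:
        return "auth"
    if "content_policy" in name:
        return "content_policy"
    if "badrequest" in name:
        return "invalid_request"
    if isinstance(exc, ConnectionError) or "connection" in name or "transport" in name:
        return "transport_error"
    if "server" in name:
        return "server_error"
    return "unknown"

print(classify_error(TimeoutError()))      # timeout
print(classify_error(ConnectionError()))   # transport_error
print(classify_error(ValueError("oops")))  # unknown
```

Note that name-based matching lets the classifier handle provider SDK exceptions (for example openai.RateLimitError) without importing every SDK.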

When errors are recorded

Error metrics are recorded when a backend raises an exception during generation, after the request has been dispatched to the provider. Construction-time errors (e.g. missing API key) are not captured by the error counter.

Cost metrics

Mellea estimates request cost automatically after each LLM call when pricing data is available. No code changes are required.

Cost instrument

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| mellea.llm.cost.usd | Counter | USD | Estimated request cost in US dollars |

Cost attributes

| Attribute | Description | Example Values |
| --- | --- | --- |
| gen_ai.provider.name | Backend provider name | openai, ollama, watsonx, litellm, huggingface |
| gen_ai.request.model | Model identifier | gpt-5.4, claude-sonnet-4-6 |

Pricing data

Mellea ships with built-in pricing for current OpenAI and Anthropic models. Prices are approximate and may become stale as providers update their rates. For models without built-in pricing, cost is not recorded and a warning is logged instead.

Custom pricing

Override built-in prices or add pricing for any model using a JSON file:
export MELLEA_PRICING_FILE=/path/to/my-pricing.json
The file format maps model IDs to per-million-token rates:
{
  "my-custom-model": {"input_per_1m": 1.0, "output_per_1m": 2.0},
  "gpt-5.4": {"input_per_1m": 2.5, "output_per_1m": 15.0}
}
Custom entries override built-in prices. Errors loading the file are logged as warnings and built-in prices are used as a fallback.
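The cost arithmetic is simply token counts multiplied by per-million-token rates. A self-contained sketch using the file format above (estimate_cost_usd is illustrative, not Mellea's internal function):

```python
import json

# Same shape as a MELLEA_PRICING_FILE entry
pricing_json = '{"my-custom-model": {"input_per_1m": 1.0, "output_per_1m": 2.0}}'

def estimate_cost_usd(model, prompt_tokens, completion_tokens, pricing):
    """Return the estimated cost in USD, or None if the model has no pricing."""
    rates = pricing.get(model)
    if rates is None:
        return None  # no pricing: cost is not recorded
    return (prompt_tokens * rates["input_per_1m"]
            + completion_tokens * rates["output_per_1m"]) / 1_000_000

pricing = json.loads(pricing_json)
cost = estimate_cost_usd("my-custom-model", 1_000, 500, pricing)
print(f"${cost:.6f}")  # 1000 * 1.0/1M + 500 * 2.0/1M = $0.002000
```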

Operational metrics

Mellea records metrics for its internal sampling, validation, and tool execution loops. These counters give visibility into retry behavior, validation failure rates, and tool call health — independent of the underlying LLM provider.

Sampling counters

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| mellea.sampling.attempts | Counter | {attempt} | Sampling attempts per loop iteration |
| mellea.sampling.successes | Counter | {sample} | Sampling loops that produced a passing sample |
| mellea.sampling.failures | Counter | {failure} | Sampling loops that exhausted the loop budget without success |

All sampling metrics include:

| Attribute | Description | Example Values |
| --- | --- | --- |
| strategy | Sampling strategy class name | RejectionSamplingStrategy, MultiTurnStrategy, RepairTemplateStrategy |

Requirement counters

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| mellea.requirement.checks | Counter | {check} | Requirement validation checks performed |
| mellea.requirement.failures | Counter | {failure} | Requirement validation checks that failed |

| Attribute | Description | Example Values |
| --- | --- | --- |
| requirement | Requirement class name | LLMaJRequirement, PythonExecutionReq, ALoraRequirement, GuardianCheck |
| reason | Human-readable failure reason (mellea.requirement.failures only) | "Output did not satisfy constraint", "unknown" |

Tool counter

| Metric Name | Type | Unit | Description |
| --- | --- | --- | --- |
| mellea.tool.calls | Counter | {call} | Tool invocations by name and status |

| Attribute | Description | Example Values |
| --- | --- | --- |
| tool | Name of the invoked tool | "search", "calculator" |
| status | Execution outcome | success, failure |

Metrics export configuration

Mellea supports multiple metrics exporters that can be used independently or simultaneously.
Warning: If MELLEA_METRICS_ENABLED=true but no exporter is configured, Mellea logs a warning. Metrics are collected but not exported.

Console exporter (debugging)

Print metrics to console for local debugging without setting up an observability backend:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
python your_script.py
Metrics are printed as JSON at the configured export interval (default: 60 seconds).

OTLP exporter (production)

Export metrics to an OTLP collector for production observability platforms (Jaeger, Grafana, Datadog, etc.):
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_OTLP=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Optional: metrics-specific endpoint (overrides general endpoint)
export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318

# Optional: set service name
export OTEL_SERVICE_NAME=my-mellea-app

# Optional: adjust export interval (milliseconds, default: 60000)
export OTEL_METRIC_EXPORT_INTERVAL=30000
OTLP collector setup example:
cat > otel-collector-config.yaml <<EOF
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus, debug]
EOF

docker run -p 4317:4317 -p 8889:8889 \
  -v $(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml \
  otel/opentelemetry-collector:latest

Prometheus exporter

Register metrics with the prometheus_client default registry for Prometheus scraping:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_PROMETHEUS=true
When enabled, Mellea registers its OpenTelemetry metrics with the prometheus_client default registry via PrometheusMetricReader. Your application is responsible for exposing the registry. Common approaches:

Standalone HTTP server (simplest):
from prometheus_client import start_http_server

start_http_server(9464)
FastAPI middleware:
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/metrics")
def metrics():
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
Flask route:
from prometheus_client import CONTENT_TYPE_LATEST, generate_latest
from flask import Flask, Response

app = Flask(__name__)

@app.route("/metrics")
def metrics():
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)
Verify with:
curl http://localhost:9464/metrics
Prometheus server configuration:
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'mellea'
    static_configs:
      - targets: ['localhost:9464']
docker run -p 9090:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
Access Prometheus UI at http://localhost:9090 and query metrics like mellea_llm_tokens_input.
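As the query above suggests, the dotted OpenTelemetry metric names appear in Prometheus with underscores. A quick sketch of that translation (the helper is illustrative; exporters may additionally append suffixes such as _total for counters):

```python
def otel_to_prometheus(name: str) -> str:
    """Approximate the Prometheus-exposed name of an OTel metric:
    dots are replaced with underscores."""
    return name.replace(".", "_")

print(otel_to_prometheus("mellea.llm.tokens.input"))      # mellea_llm_tokens_input
print(otel_to_prometheus("mellea.llm.request.duration"))  # mellea_llm_request_duration
```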

Multiple exporters simultaneously

You can enable multiple exporters at once:
export MELLEA_METRICS_ENABLED=true
export MELLEA_METRICS_CONSOLE=true
export MELLEA_METRICS_OTLP=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export MELLEA_METRICS_PROMETHEUS=true
This configuration prints metrics to console for immediate feedback, exports to an OTLP collector for centralized observability, and registers with the prometheus_client registry for Prometheus scraping. Typical combinations:
  • Development: Console + Prometheus for local testing
  • Production: OTLP + Prometheus for comprehensive monitoring
  • Debugging: Console only for quick verification

Custom metrics

The metrics API exposes create_counter, create_histogram, and create_up_down_counter for instrumenting your own application code. These return no-ops when metrics are disabled, so you can call them unconditionally.
from mellea.telemetry import create_counter, create_histogram, create_up_down_counter

# Monotonically increasing values
requests = create_counter("myapp.requests", unit="1", description="Total requests")
requests.add(1, {"backend": "ollama", "model": "granite4:micro"})

# Value distributions
latency = create_histogram("myapp.latency", unit="ms", description="Request latency")
latency.record(120.5, {"backend": "ollama"})

# Values that increase or decrease
active = create_up_down_counter(
    "myapp.sessions.active", unit="1", description="Active sessions"
)
active.add(1)   # session started
active.add(-1)  # session ended

Programmatic access

Check if metrics are enabled:
from mellea.telemetry import is_metrics_enabled

if is_metrics_enabled():
    print("Metrics are being collected")
Access token usage and latency data from a ModelOutputThunk:
from mellea import start_session
from mellea.backends import ModelOption

with start_session() as m:
    result = m.instruct("Write a haiku about programming")

    if result.generation.usage:
        print(f"Prompt tokens: {result.generation.usage['prompt_tokens']}")
        print(f"Completion tokens: {result.generation.usage['completion_tokens']}")
        print(f"Total tokens: {result.generation.usage['total_tokens']}")

    # Streaming mode also exposes time-to-first-token
    streamed = m.instruct(
        "Describe the solar system",
        model_options={ModelOption.STREAM: True},
    )
    print(f"Streaming: {streamed.generation.streaming}")
    if streamed.generation.ttfb_ms is not None:
        print(f"Time to first token: {streamed.generation.ttfb_ms:.1f} ms")
The generation attribute is a GenerationMetadata dataclass. Its usage field is a dictionary with three keys: prompt_tokens, completion_tokens, and total_tokens. All backends populate this consistently. streaming and ttfb_ms are set automatically based on whether streaming mode was used.
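Because every backend populates usage with the same three keys, per-call dictionaries can be summed generically to get session totals. A self-contained sketch (the sample dicts are illustrative stand-ins for result.generation.usage):

```python
from collections import Counter

# Illustrative usage dicts, shaped like result.generation.usage
calls = [
    {"prompt_tokens": 12, "completion_tokens": 40, "total_tokens": 52},
    {"prompt_tokens": 30, "completion_tokens": 18, "total_tokens": 48},
]

# Counter.update adds values key-by-key across calls
totals = Counter()
for usage in calls:
    totals.update(usage)

print(dict(totals))
# {'prompt_tokens': 42, 'completion_tokens': 58, 'total_tokens': 100}
```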

Performance

  • Zero overhead when disabled: When MELLEA_METRICS_ENABLED=false (default), no auto-registered metrics plugins are active and all instrument calls are no-ops.
  • Minimal overhead when enabled: Counter increments and histogram recordings are extremely fast (~nanoseconds per operation).
  • Async export: Metrics are batched and exported asynchronously (default: every 60 seconds).
  • Non-blocking: Metric recording never blocks LLM calls.
  • Automatic collection: Metrics are recorded via hooks after generation completes — no manual instrumentation needed.

Troubleshooting

Metrics not appearing:
  1. Verify MELLEA_METRICS_ENABLED=true is set.
  2. Check that at least one exporter is configured (Console, OTLP, or Prometheus).
  3. For OTLP: Verify MELLEA_METRICS_OTLP=true and the endpoint is reachable.
  4. For Prometheus: Verify MELLEA_METRICS_PROMETHEUS=true and your application exposes the registry (curl http://localhost:PORT/metrics).
  5. Enable console output (MELLEA_METRICS_CONSOLE=true) to verify metrics are being collected.
Missing OpenTelemetry dependency:
ImportError: No module named 'opentelemetry'
Install telemetry dependencies:
pip install "mellea[telemetry]"
OTLP connection refused:
Failed to export metrics via OTLP
  1. Verify the OTLP collector is running: docker ps | grep otel
  2. Check the endpoint URL is correct (default: http://localhost:4317).
  3. Verify network connectivity: curl http://localhost:4317
  4. Check collector logs for errors.
Metrics not updating:
  1. Metrics are exported at intervals (default: 60 seconds). Wait for the export cycle.
  2. Reduce the export interval for testing: export OTEL_METRIC_EXPORT_INTERVAL=10000 (10 seconds).
  3. For Prometheus: Metrics update on scrape, not continuously.
  4. Verify LLM calls are actually being made and completing successfully.
No exporter configured warning:
WARNING: Metrics are enabled but no exporters are configured
Enable at least one exporter:
  • Console: export MELLEA_METRICS_CONSOLE=true
  • OTLP: export MELLEA_METRICS_OTLP=true + endpoint
  • Prometheus: export MELLEA_METRICS_PROMETHEUS=true
Full example: docs/examples/telemetry/metrics_example.py

See also:
  • Telemetry — overview of all telemetry features and configuration.
  • Tracing — distributed traces with Gen-AI semantic conventions.
  • Logging — console logging and OTLP log export.