Prerequisites: pip install mellea, Ollama running locally, and pytest installed.
Contributing to Mellea itself? See the Contributing Guide for Mellea’s own test markers, fixtures, and CI setup.

Testing generative code requires you to separate concerns: some assertions are always deterministic (the output is the right type), while others depend on model behaviour and are inherently qualitative. This page shows you how to structure both categories, configure the right pytest markers, and make your CI pipeline fast and reliable.
Three levels of assertion
Every test for a @generative function falls into one of three levels:
| Level | What you assert | Deterministic? |
|---|---|---|
| Type check | isinstance(result, bool) | Yes — constrained decoding always returns the declared type |
| Structural check | result in ["positive", "negative"] or field names present | Yes — schema enforcement is deterministic |
| Qualitative check | assert result is True | No — depends on the model and prompt |
Qualitative checks are marked with @pytest.mark.qualitative and are skipped in CI when CICD=1 is set.
Setting up a test session fixture
Use a backend fixture to handle CI versus local configuration, and a
function-scoped session fixture to give each test a clean slate:
Note: Scoping backend to module and session to function strikes a balance between setup cost and test isolation. Each test gets a clean context, but the backend connection is created once per module.
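The scoping pattern described above can be sketched as follows. This is only an illustration of the fixture layout: PlaceholderBackend, PlaceholderSession, and make_backend are hypothetical stand-ins rather than Mellea classes, and the max_new_tokens option used in CI is an assumption. Swap in your real backend and session constructors.

```python
import os

import pytest


class PlaceholderBackend:
    """Hypothetical stand-in for a real backend object."""

    def __init__(self, model_options: dict):
        self.model_options = model_options


class PlaceholderSession:
    """Hypothetical stand-in for a real session object."""

    def __init__(self, backend: PlaceholderBackend):
        self.backend = backend
        self.history: list = []  # per-test context lives here


def make_backend(environ=None) -> PlaceholderBackend:
    # In CI (CICD=1), cap output length to keep runs fast.
    # The option name and value are assumptions for this sketch.
    env = os.environ if environ is None else environ
    options = {"max_new_tokens": 64} if env.get("CICD") == "1" else {}
    return PlaceholderBackend(model_options=options)


@pytest.fixture(scope="module")
def backend():
    # Built once per test module, so the setup cost is paid once.
    return make_backend()


@pytest.fixture(scope="function")
def session(backend):
    # Built fresh for every test, so no context leaks between tests.
    return PlaceholderSession(backend)
```

Keeping the construction logic in a plain function such as make_backend makes the CI branch easy to check without starting pytest.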
Module-level markers
Declare markers at the top of your test file with pytestmark so they apply to
every test in the module without repetition. Register your own markers in
pyproject.toml under [tool.pytest.ini_options] markers to avoid warnings:
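A minimal sketch of module-level markers, using the requires_ollama and slow marker names that appear in the CI table on this page. The test body is a placeholder; which markers your module needs is an assumption to adapt.

```python
import pytest

# A module-level list named pytestmark applies every marker in it
# to all tests collected from this file.
pytestmark = [pytest.mark.requires_ollama, pytest.mark.slow]


def test_placeholder():
    # Inherits both module-level markers without per-test decorators.
    assert True
```

With this in place, pytest -m "not slow" deselects every test in the module.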
Testing @generative functions
Type assertions — always deterministic
The return type of a @generative function is enforced by constrained decoding
or output parsing. An isinstance check never depends on model behaviour:
Structural assertions — always deterministic
For Literal return types, membership in the allowed values is enforced before
your test sees the result. The assertion is still deterministic:
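One way to write the structural check is to derive the allowed values from the Literal annotation itself, so the test and the function signature cannot drift apart. The sketch below uses only the standard library; the Sentiment alias and the check_structure helper are hypothetical names, and the result would come from your own @generative function.

```python
from typing import Literal, get_args

# Hypothetical return type shared by the function under test and its tests.
Sentiment = Literal["positive", "negative"]


def allowed_values(literal_type) -> tuple:
    # get_args returns the values listed inside Literal[...].
    return get_args(literal_type)


def check_structure(result: str) -> None:
    # Type check and membership check: both are deterministic.
    assert isinstance(result, str)
    assert result in allowed_values(Sentiment)
```

In a test, call check_structure on the value returned by the generative function instead of repeating the list of allowed strings.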
Qualitative assertions — mark and skip in CI
When you want to assert on the content of a response, add @pytest.mark.qualitative. These tests are skipped automatically in CI
(CICD=1) and are intended to run locally or in a dedicated quality gate:
Warning: Do not assert on qualitative behaviour without @pytest.mark.qualitative. A deterministic-looking assertion like assert score > 5 can flake across model versions, temperatures, and quantisation levels.
Testing instruct() calls
instruct() calls are non-qualitative when you test structure, not content.
Assert that the call returns a value and that the value has the right type:
Inspecting logged model options
_generate_log.model_options lets you confirm that options you passed were
forwarded to the model. This is useful when testing custom model option handling:
Note: _generate_log is an internal attribute. Its structure may change
between Mellea versions. Use it for debugging and option-forwarding tests, not
as a primary correctness check.
Using simple_validate for deterministic checks
simple_validate wraps a plain function into a validation callable that
Requirement accepts. Use it to assert deterministic structural constraints
inside the IVR loop, or directly in tests to verify that your validator logic
behaves correctly:
When you attach a simple_validate callable to a Requirement, it checks the last model
output as a string, regardless of how the output was parsed:
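Because the validator receives the output as a string, its logic can be tested as a plain function with no model involved. The sketch below is an assumption-laden illustration: is_short and its ten-word limit are hypothetical, and the step of wrapping the function with simple_validate is left out.

```python
import pytest


def is_short(output: str) -> bool:
    # Plain predicate over the raw output string; the limit is arbitrary.
    return len(output.split()) <= 10


@pytest.mark.parametrize(
    ("text", "expected"),
    [
        ("A brief reply.", True),
        ("", True),
        (" ".join(["word"] * 11), False),
    ],
)
def test_is_short(text, expected):
    # Deterministic: no backend, no session, no qualitative marker needed.
    assert is_short(text) is expected
```

Tests like this run in CI unconditionally, since they never call a model.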
The unit_test_eval component
mellea.stdlib.components.unit_test_eval provides TestBasedEval, a
Component that formats an LLM-as-a-judge evaluation task. You load test cases
from a JSON file and pass them to a judge session. This is useful for offline
evaluation pipelines, not for individual pytest assertions.
JSON file format
Each entry in the JSON array defines one test:
Loading and running evaluations
Note: TestBasedEval calls the judge model once per input. For large
evaluation sets, consider batching or running evaluations asynchronously.
CI strategy
A simple conftest.py that skips qualitative tests in CI:
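A minimal sketch of such a conftest.py, assuming the CICD=1 convention used earlier on this page. The running_in_ci helper is a hypothetical name; pytest_collection_modifyitems, get_closest_marker, and add_marker are standard pytest hooks and methods.

```python
import os

import pytest


def running_in_ci(environ=None) -> bool:
    # Assumption: CI is signalled by the environment variable CICD=1.
    env = os.environ if environ is None else environ
    return env.get("CICD") == "1"


def pytest_collection_modifyitems(config, items):
    # Outside CI, leave the collected tests untouched.
    if not running_in_ci():
        return
    skip_qualitative = pytest.mark.skip(
        reason="qualitative tests are skipped when CICD=1"
    )
    for item in items:
        # Covers markers set per test, per class, or via pytestmark.
        if item.get_closest_marker("qualitative") is not None:
            item.add_marker(skip_qualitative)
```

Skipping at collection time keeps the skipped tests visible in the report, which makes it obvious what CI did not cover.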
| Test category | Marker | Runs in CI? |
|---|---|---|
| Type and structural checks | (none needed) | Yes |
| Qualitative content checks | @pytest.mark.qualitative | No (skipped when CICD=1 is set) |
| Tests needing a running backend | @pytest.mark.requires_ollama | Only if Ollama is in CI |
| Long-running tests | @pytest.mark.slow | Optionally excluded |
Next steps
- The Requirements System: understand how Requirement, simple_validate, and check interact with the IVR loop
- Handling Exceptions: catch and diagnose errors that occur during generation