pip install mellea.
LLM-as-a-judge (LLMaJ) uses a second model call to evaluate whether a generated
output meets a criterion expressed in natural language. In Mellea this is the
default validation strategy for req() — you describe what good output looks
like, and Mellea asks the model whether the output satisfies that description.
How it works
When aRequirement has no validation_fn, Mellea runs a separate LLM call
after generation. The requirement’s description and the model output are
formatted into a judge prompt, and the model returns a verdict. Mellea converts
the verdict to True / False by looking for "yes" (case-insensitive) in the
response.
loop_budget
limit) and feeds the failure reason back into the next attempt.
Standalone validation with m.validate()
Run requirements against an existing output without triggering a new generation:m.validate() returns a list of ValidationResult objects, one per requirement.
Capture judge reasoning with generate_logs
To inspect the full judge prompt and verdict, pass aGenerateLog list:
GenerateLog captures the prompt sent to the judge model and the raw verdict
string, which is useful for debugging requirements that are failing unexpectedly.
Avoid the purple elephant effect with check()
Including a requirement description in the generation prompt can cause the model to fixate on the thing you want to avoid — the purple elephant effect. Usecheck() to validate without including the description in the generation prompt:
req() shapes what the model aims for. check() enforces a constraint the model
should satisfy naturally — without being told about it.
Replace LLMaJ with a fast programmatic check
For deterministic criteria (length, format, regex), usesimple_validate to
bypass the LLM judge entirely:
simple_validate wraps a function that receives the model output as a string and
returns bool (or a (bool, reason) tuple). No LLM call is made for validation.
Combine LLMaJ and programmatic checks
Use both in the samerequirements list:
req() steers the model toward a valid postcode. The second uses
simple_validate to enforce the regex — cheaply, without a second LLM call.
Return validation metadata with SamplingResult
To access the full validation outcome alongside the generated output, usereturn_sampling_results=True:
SamplingResult.success is True if at least one attempt satisfied all
requirements. sample_generations lists every attempt made.
See also: The Requirements System |
Write Custom Verifiers |
Handling Exceptions and Failures