Prerequisites: `pip install mellea`, with Ollama running locally.
A sampling strategy controls what happens after the first generation: whether to
retry on failure, how to repair output, and whether to escalate to a more powerful
model. You pass a strategy to instruct() via the strategy parameter.
Rejection sampling
RejectionSamplingStrategy is the default. It generates once, validates all
requirements, and retries from scratch up to loop_budget times on failure:
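Conceptually, the control flow looks like the sketch below. This is plain Python, not the Mellea API: the `generate` and `validate` callables are hypothetical stand-ins for the backend call and the requirement checks.

```python
def rejection_sample(generate, validate, loop_budget=2):
    """Retry from scratch until validation passes or the budget runs out.

    `validate` returns a list of (passed, reason) pairs, one per requirement.
    """
    last = None
    for _ in range(loop_budget):
        last = generate()                      # fresh generation each attempt
        if all(ok for ok, _ in validate(last)):
            return last, True                  # first fully valid output wins
    return last, False                         # budget exhausted: keep last attempt
```

Note that each retry starts from a clean state; nothing from the failed attempt is carried forward.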
With `return_sampling_results=True`, `instruct()` returns a `SamplingResult` with:
- `result.success` — whether any attempt passed all requirements
- `result.result` — the passing output (if any)
- `result.sample_generations` — all intermediate generations
With `return_sampling_results=False` (the default), `instruct()` returns a `ModelOutputThunk` directly (the last generation, regardless of whether validation passed).
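The shape of the result object described above can be sketched as a stand-in dataclass (illustrative only, not the real `SamplingResult` class):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SamplingResultSketch:
    success: bool                   # whether any attempt passed all requirements
    result: Optional[str]           # the passing output, if any
    sample_generations: List[str] = field(default_factory=list)  # every attempt made
```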
The default strategy when you don’t pass strategy explicitly is
RejectionSamplingStrategy(loop_budget=2).
Validation feedback
The repair loop works best when failing requirements provide a reason. The `ValidationResult.reason` string is included in the repair prompt sent to the model:
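A sketch of how failure reasons can be folded into a repair prompt. This is plain Python, not Mellea's internal template; the prompt wording and the `build_repair_prompt` helper are invented for illustration.

```python
def build_repair_prompt(instruction, attempt, failures):
    """Append each failing requirement's reason so the model knows what to fix."""
    lines = [instruction, "", "Your previous attempt:", attempt, "", "It failed because:"]
    lines += [f"- {reason}" for reason in failures]
    return "\n".join(lines)

prompt = build_repair_prompt(
    "Write a one-sentence summary.",
    "A very long rambling draft...",
    ["Output must be a single sentence.", "Output must be under 20 words."],
)
```

A bare `reason=None` gives the model nothing to act on, which is why specific reasons matter.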
SOFAI — dual-model escalation
Advanced: SOFAI (Slow and Fast AI) uses two backends: S1 (fast, small) handles most cases; S2 (slower, larger) escalates when S1 exhausts its budget.
SOFAISamplingStrategy is useful when a fast local model handles easy inputs but
you need a more capable model for hard cases:
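The escalation control flow, sketched in plain Python. The `s1` and `s2` callables are hypothetical stand-ins for the two backends; this is the idea, not the Mellea implementation.

```python
def sofai_solve(s1, s2, validate, s1_budget=3):
    """Try the fast model first; escalate to the slow model if it can't pass."""
    attempt = None
    for _ in range(s1_budget):
        attempt = s1()
        if validate(attempt):
            return attempt, "s1"     # fast path succeeded, S2 never runs
    return s2(), "s2"                # S1 budget exhausted: escalate to S2
```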
s2_solver_mode controls how S2 starts when escalated:
| Mode | Behavior |
|---|---|
| `"fresh_start"` | S2 receives a clean context with no S1 history |
| `"continue_chat"` | S2 continues from S1's conversation history |
| `"best_attempt"` | S2 starts from S1's best attempt so far |
The `ValidationResult.reason` string is passed to both S1 and S2 as repair guidance, so write specific, actionable failure reasons for best results.
Full example: docs/examples/sofai/sofai_graph_coloring.py
Budget forcing
Advanced: `BudgetForcingSamplingStrategy` controls thinking-token budgets for models that support extended reasoning (e.g., models with `<think>` tokens).
Note: `BudgetForcingSamplingStrategy` is not exported from `mellea.stdlib.sampling` directly; import it from `mellea.stdlib.sampling.budget_forcing`. Token defaults are `think_max_tokens=4096` and `answer_max_tokens=None`. The strategy wraps `RejectionSamplingStrategy`, so you can combine it with requirements and `loop_budget`.
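The core idea in miniature: cap the model's reasoning at a token budget, then continue with the answer. This toy sketch is not the Mellea implementation; "tokens" here are just whitespace-split words, and `answer_fn` is an invented stand-in for answer generation.

```python
def budget_force(thinking, answer_fn, think_max_tokens=4096):
    """Truncate the thinking stream at the budget, then force a final answer."""
    tokens = thinking.split()
    kept = tokens[:think_max_tokens]    # hard cap on reasoning tokens
    return answer_fn(" ".join(kept))    # generation continues with the answer
```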
Majority voting
Advanced: MajorityVotingStrategyForMath generates multiple independent
answers and selects the most common one — useful for math and reasoning tasks where
the correct answer should appear frequently across independent samples.
Note: `MajorityVotingStrategyForMath` is designed for numeric math expressions (it normalises and compares parsed values). `MBRDRougeLStrategy` uses ROUGE-L scoring for text tasks; pass `number_of_samples` to control how many independent generations are compared. Neither is exported from `mellea.stdlib.sampling` directly; import from `mellea.stdlib.sampling.majority_voting`.
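The selection step can be sketched in a few lines of plain Python. Real normalisation of math expressions is more involved than the whitespace-stripping default used here; this is an illustrative sketch, not the library's code.

```python
from collections import Counter

def majority_vote(samples, normalise=lambda s: s.strip()):
    """Pick the most common answer across independent generations."""
    counts = Counter(normalise(s) for s in samples)
    answer, _ = counts.most_common(1)[0]
    return answer

best = majority_vote([" 42", "41", "42 ", "42", "40"])
```

With three of five samples agreeing on `42`, that value is selected even though no single generation is trusted on its own.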
Other built-in strategies
Two additional strategies are exported from `mellea.stdlib.sampling`:
RepairTemplateStrategy — like RejectionSamplingStrategy but appends
validation failure reasons to a copy of the original instruction rather than
retrying from a clean state. Use this when you want the repair prompt to include
the full original instruction plus a “what went wrong” addendum:
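In effect, the repair prompt is built like this (a plain-Python sketch, not the actual template; the addendum wording is invented):

```python
def repair_template(original_instruction, reasons):
    """Return a copy of the instruction with a 'what went wrong' addendum appended."""
    addendum = "\n\nThe previous attempt failed these requirements:\n"
    addendum += "\n".join(f"- {r}" for r in reasons)
    return original_instruction + addendum   # original stays intact, reasons follow
```

Unlike plain rejection sampling, the model sees the full task restated alongside the specific failures.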
MultiTurnStrategy — multi-turn repair that adds validation failures as a
new chat turn rather than rewriting the original instruction. The model sees
its previous attempt in the context and is asked to revise it. Use with
ChatContext for agentic repair loops:
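The multi-turn shape, sketched as a plain message list (illustrative only, not Mellea's ChatContext API; the feedback wording is invented):

```python
def add_repair_turn(messages, failures):
    """Append validation failures as a new user turn; the prior attempt stays visible."""
    feedback = "Your answer failed validation:\n" + "\n".join(f"- {f}" for f in failures)
    return messages + [{"role": "user", "content": feedback}]

chat = [
    {"role": "user", "content": "Write a haiku about autumn."},
    {"role": "assistant", "content": "Not a haiku at all."},
]
chat = add_repair_turn(chat, ["Must have 5-7-5 syllable structure."])
```

Because the failed attempt remains in the context, the model can revise it rather than start over.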
Building a custom strategy
Extend `BaseSamplingStrategy` to implement your own repair logic. You must implement two static methods:
- `repair(old_ctx, new_ctx, past_actions, past_results, past_val)` — returns a `(Component, Context)` tuple for the next generation attempt.
- `select_from_failure(sampled_actions, sampled_results, sampled_val)` — returns the index of the best result when the budget is exhausted with no success.
Pass your strategy to `instruct()` just like the built-in ones:
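A skeleton of the shape described above, using a stand-in base class so the sketch stays self-contained (the real base is `BaseSamplingStrategy` from Mellea; the class name and method bodies here are illustrative):

```python
class BaseSamplingStrategySketch:
    """Stand-in for Mellea's BaseSamplingStrategy, for illustration only."""

class LastAttemptStrategy(BaseSamplingStrategySketch):
    """Naive policy: always retry the latest action; on failure, keep the last result."""

    @staticmethod
    def repair(old_ctx, new_ctx, past_actions, past_results, past_val):
        # Return the (component, context) pair for the next generation attempt.
        return past_actions[-1], new_ctx

    @staticmethod
    def select_from_failure(sampled_actions, sampled_results, sampled_val):
        # Budget exhausted with no success: fall back to the final attempt's index.
        return len(sampled_results) - 1
```

A real `repair` would typically inspect `past_val` and rewrite the action based on the failure reasons, rather than retrying it unchanged.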