mellea.stdlib.sampling.sampling_algos.budget_forcing_alg

Functions

FUNC `think_budget_forcing`

think_budget_forcing(backend: OllamaModelBackend, action: CBlock | Component) -> ModelOutputThunk

Generate with budget forcing using the completions APIs. This relies on raw autocompletion and assumes the model’s output is structured in the following form: ‘<think> … </think> summary answer’.

The budget forcing method was proposed in the paper https://arxiv.org/abs/2501.19393. This implementation follows the key outlines of the paper while aiming for stable, fail-safe operation.

Generation is multi-step: the model is called repeatedly until the requirements are met, so the response is assembled conditionally.

Args:

backend: The OllamaModelBackend used for generation.
action: The last item of the context should be passed in as an action instead of as part of the ctx. See docs/dev/generate_signature_decisions.md.
think_max_tokens: Budget in number of tokens allocated for the think block
answer_max_tokens: Budget in number of tokens allocated for the summary and answer block. None means the answer is unbounded and generation continues until EoS.
start_think_token: String indicating start of think block, default <think>
end_think_token: String indicating end of think block, default </think>
begin_response_token: Used by certain models, string indicating start of response block, e.g. “<response>”, default None
think_more_suffix: String to append to force continued thinking, e.g. “\nWait”. If set to None, no additional thinking is forced; use None for the upper-bound budget case.
answer_suffix: String to append to force a final answer
model_options: Any model options to upsert into the defaults for this call.
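
The multi-step assembly described above can be illustrated with a minimal sketch. This is not the mellea implementation: `complete` is a hypothetical stand-in for a raw completion call that returns the generated text and the number of tokens used, the default `answer_suffix` is a placeholder, and the sketch omits `begin_response_token` and `model_options`. It only shows how the think budget, `think_more_suffix`, and `answer_suffix` might interact.

```python
from typing import Callable, Optional, Tuple

# Hypothetical raw-completion callable (an assumption, not a mellea API):
# (prompt, max_tokens) -> (generated_text, tokens_used)
Complete = Callable[[str, Optional[int]], Tuple[str, int]]


def budget_forcing_sketch(
    complete: Complete,
    prompt: str,
    think_max_tokens: int,
    answer_max_tokens: Optional[int] = None,
    start_think_token: str = "<think>",
    end_think_token: str = "</think>",
    think_more_suffix: Optional[str] = "\nWait",
    answer_suffix: str = "\nFinal answer:",  # placeholder default
) -> str:
    response = start_think_token
    remaining = think_max_tokens

    # Phase 1: keep the model thinking until the think budget is spent.
    while remaining > 0:
        text, used = complete(prompt + response, remaining)
        remaining -= max(used, 1)  # always make progress, even on empty output

        if end_think_token not in text:
            # Budget hit mid-thought, or the model stopped without closing
            # the think block; either way, move on to the answer phase.
            response += text
            break

        # The model tried to stop thinking: drop the end token and what follows.
        response += text.split(end_think_token, 1)[0]
        if think_more_suffix is None or remaining <= 0:
            break
        # Force continued thinking with the remaining budget.
        response += think_more_suffix

    # Phase 2: close the think block and force a final answer.
    response += end_think_token + answer_suffix
    answer, _ = complete(prompt + response, answer_max_tokens)
    return response + answer
```

With `think_more_suffix=None` the loop runs at most once, so `think_max_tokens` acts purely as an upper bound on thinking, matching the upper-bound budget case noted for that argument.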
