
Functions

FUNC think_budget_forcing

think_budget_forcing(backend: OllamaModelBackend, action: CBlock | Component) -> ModelOutputThunk
Generate with budget forcing using the completions APIs. This relies on raw autocompletion and assumes the model’s output is structured in the following form: ‘<think> … </think> summary answer’.

The budget forcing method is proposed in the paper https://arxiv.org/abs/2501.19393. This implementation follows the key outlines of the paper while ensuring stable, fail-safe operation. Generation is multi-step: the model is called repeatedly until the requirements are met, so the response is assembled conditionally.

Args:
  • backend: OllamaModelBackend
  • action: The last item of the context should be passed in as an action instead of as part of the ctx. See docs/dev/generate_signature_decisions.md.
  • think_max_tokens: Budget in number of tokens allocated for the think block
  • answer_max_tokens: Budget in number of tokens allocated for the summary and answer block. None means the answer is unbounded and generation continues until EoS
  • start_think_token: String indicating start of think block, default <think>
  • end_think_token: String indicating end of think block, default </think>
  • begin_response_token: Used by certain models, string indicating start of response block, e.g. “<response>”, default None
  • think_more_suffix: String appended to force continued thinking, e.g. “\nWait”. If set to None, no additional thinking is forced; use None for the upper-bound budget case
  • answer_suffix: String to append to force a final answer
  • model_options: Any model options to upsert into the defaults for this call.
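
The multi-step assembly described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: `budget_force`, `count_tokens`, `fake_complete`, and `max_rounds` are hypothetical names introduced here, whitespace-separated words stand in for real tokens, and a canned function stands in for the backend's raw completion call. It only shows how the think budget, `think_more_suffix`, and `answer_suffix` might interact.

```python
from typing import Callable, Optional, Tuple

# Hypothetical completion callable: (prompt, max_tokens) -> generated text.
Complete = Callable[[str, Optional[int]], str]


def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: whitespace-separated words.
    return len(text.split())


def budget_force(
    complete: Complete,
    prompt: str,
    think_max_tokens: int = 12,
    answer_max_tokens: Optional[int] = None,
    start_think_token: str = "<think>",
    end_think_token: str = "</think>",
    think_more_suffix: Optional[str] = "\nWait",
    answer_suffix: str = "\nFinal answer:",
    max_rounds: int = 8,  # assumption: hard cap so the loop always terminates
) -> Tuple[str, str]:
    prefix = prompt + start_think_token
    thinking = ""
    for _ in range(max_rounds):
        remaining = think_max_tokens - count_tokens(thinking)
        if remaining <= 0:
            break  # think budget exhausted
        chunk = complete(prefix + thinking, remaining)
        ended = end_think_token in chunk
        # Keep only the text before the end-of-think marker.
        thinking += chunk.split(end_think_token)[0]
        if not ended or think_more_suffix is None:
            break  # hit the token limit, or extra thinking is not forced
        if count_tokens(thinking + think_more_suffix) >= think_max_tokens:
            break  # no room left to think after the suffix
        # The model stopped early: suppress the end marker and nudge it on.
        thinking += think_more_suffix
    # Close the think block ourselves and force a final answer.
    final_prompt = prefix + thinking + end_think_token + answer_suffix
    answer = complete(final_prompt, answer_max_tokens)
    return thinking, answer.strip()


def fake_complete(prompt: str, max_tokens: Optional[int]) -> str:
    # Canned "model": answers after the answer suffix, otherwise thinks briefly.
    reply = "4" if prompt.endswith("Final answer:") else "2 plus 2 is 4 </think>"
    return " " + " ".join(reply.split()[:max_tokens])


thinking, answer = budget_force(fake_complete, "Q: What is 2 + 2?\n")
print(repr(thinking))
print(answer)
```

Passing `think_more_suffix=None` in this sketch gives the upper-bound case: the think block is capped at `think_max_tokens` but never extended when the model stops early.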