The purpose of the Hugging Face backend is to provide a setting for implementing experimental features. If you want a performant local backend and do not need experimental features such as Span-based context or ALoras, consider using the Ollama backend instead.

Classes

CLASS HFAloraCacheInfo

A dataclass for holding some KV cache and associated information.
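The fields of HFAloraCacheInfo are not listed on this page; as a rough sketch, a dataclass pairing a KV cache with its associated metadata might look like the following (all field names here are hypothetical, not the actual API):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class AloraCacheInfoSketch:
    """Hypothetical stand-in for HFAloraCacheInfo: pairs a KV cache
    with the metadata needed to reuse it. Field names are illustrative."""
    kv_cache: Any            # e.g. a transformers past_key_values structure
    prompt_text: str = ""    # the prompt the cache was built from
    token_count: int = 0     # number of tokens covered by the cache

info = AloraCacheInfoSketch(kv_cache=None, prompt_text="Hello", token_count=2)
```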

CLASS LocalHFBackend

The LocalHFBackend uses Hugging Face's transformers library for inference and uses a Formatter to convert Components into prompts. This backend also supports Activated LoRAs (ALoras) (https://arxiv.org/pdf/2504.12397). It is designed for running an HF model for small-scale inference locally on your machine; it is NOT designed for inference scaling on CUDA-enabled hardware.
Methods:

FUNC generate_from_context

generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context]
Generate using the huggingface model.

FUNC processing

processing(self, mot: ModelOutputThunk, chunk: str | GenerateDecoderOnlyOutput, input_ids)
Process the returned chunks or the complete response.

FUNC post_processing

post_processing(self, mot: ModelOutputThunk, conversation: list[dict], _format: type[BaseModelSubclass] | None, tool_calls: bool, tools: dict[str, Callable], seed, input_ids)
Called when generation is done.

FUNC generate_from_raw

generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]]
generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]]
generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk]
Generate using the completions API. Passes the provided input to the model without applying any prompt templating.

FUNC cache_get

cache_get(self, id: str) -> HFAloraCacheInfo | None
Retrieve from cache.

FUNC cache_put

cache_put(self, id: str, v: HFAloraCacheInfo)
Put into cache.
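Taken together, cache_get and cache_put behave like a string-keyed mapping in which a miss returns None rather than raising. A minimal sketch of that contract (with a plain dict standing in for the backend's actual storage, and arbitrary objects in place of HFAloraCacheInfo):

```python
class CacheSketch:
    """Dict-backed sketch of the cache_get/cache_put contract:
    cache_get returns None on a miss instead of raising."""

    def __init__(self) -> None:
        self._cache: dict[str, object] = {}

    def cache_get(self, id: str):
        # Returns the cached value, or None if id was never put.
        return self._cache.get(id)

    def cache_put(self, id: str, v) -> None:
        self._cache[id] = v

c = CacheSketch()
c.cache_put("req-1", {"tokens": 5})
```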

FUNC base_model_name

base_model_name(self)
Returns the base_model_id of the model used by the backend. For example, granite-3.3-8b-instruct for ibm-granite/granite-3.3-8b-instruct.
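The mapping from the full model id to the base name in the example above can be reproduced with a suffix split on "/" (a sketch of the documented behavior, not necessarily the actual implementation):

```python
def base_name(model_id: str) -> str:
    # Keep only the part after the last "/", matching the documented example:
    # "ibm-granite/granite-3.3-8b-instruct" -> "granite-3.3-8b-instruct".
    return model_id.rsplit("/", 1)[-1]

print(base_name("ibm-granite/granite-3.3-8b-instruct"))  # granite-3.3-8b-instruct
```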

FUNC add_adapter

add_adapter(self, adapter: LocalHFAdapter)
Adds the given adapter to the backend. Must not have been added to a different backend.

FUNC load_adapter

load_adapter(self, adapter_qualified_name: str)
Loads the given adapter for the backend. Must have previously been added. Do not call when generation requests are happening.

FUNC unload_adapter

unload_adapter(self, adapter_qualified_name: str)
Unloads the given adapter from the backend.

FUNC list_adapters

list_adapters(self) -> list[str]
Lists the adapters added via add_adapter(). Returns a list of adapter names currently registered with this backend.
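The adapter methods above describe a two-step lifecycle: an adapter is first registered with exactly one backend (add_adapter), and only then can it be loaded or unloaded; unloading does not unregister it. A minimal sketch of that state machine, with plain strings standing in for LocalHFAdapter objects:

```python
class AdapterRegistrySketch:
    """Illustrates the documented lifecycle: add once, then load/unload.
    All names here are hypothetical stand-ins for the real backend."""

    def __init__(self) -> None:
        self._added: set[str] = set()
        self._loaded: set[str] = set()

    def add_adapter(self, name: str) -> None:
        # An adapter may be registered with only one backend.
        if name in self._added:
            raise ValueError(f"{name} already added")
        self._added.add(name)

    def load_adapter(self, name: str) -> None:
        # Loading requires a prior add_adapter call.
        if name not in self._added:
            raise ValueError(f"{name} must be added before loading")
        self._loaded.add(name)

    def unload_adapter(self, name: str) -> None:
        # Unloading stops using the adapter but keeps it registered.
        self._loaded.discard(name)

    def list_adapters(self) -> list[str]:
        return sorted(self._added)

reg = AdapterRegistrySketch()
reg.add_adapter("alora/example")
reg.load_adapter("alora/example")
reg.unload_adapter("alora/example")
```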