The purpose of the Hugging Face backend is to provide a setting for implementing experimental features. If you want a performant local backend and do not need experimental features such as Span-based context or ALoras, consider using the Ollama backend instead.

Classes

CLASS HFAloraCacheInfo

A dataclass for holding some KV cache and associated information.
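The fields of HFAloraCacheInfo are not listed on this page; as a rough sketch, a dataclass pairing a KV cache with its associated metadata might look like the following (all field names here are hypothetical, not the actual API):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class AloraCacheInfoSketch:
    """Hypothetical stand-in for HFAloraCacheInfo: pairs a KV cache
    with the metadata needed to reuse it. Field names are illustrative."""
    kv_cache: Any            # e.g. a transformers past_key_values structure
    prompt_text: str = ""    # the prompt the cache was built from
    token_count: int = 0     # number of tokens covered by the cache

info = AloraCacheInfoSketch(kv_cache=None, prompt_text="Hello", token_count=2)
```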

CLASS LocalHFBackend

The LocalHFBackend uses Hugging Face's transformers library for inference and uses a Formatter to convert Components into prompts. This backend also supports Activated LoRAs (ALoras) (https://arxiv.org/pdf/2504.12397). It is designed for running an HF model for small-scale inference locally on your machine; it is NOT designed for inference scaling on CUDA-enabled hardware.
Methods:

FUNC generate_from_context

generate_from_context(self, action: Component[C] | CBlock, ctx: Context) -> tuple[ModelOutputThunk[C], Context]
Generate using the huggingface model.

FUNC processing

processing(self, mot: ModelOutputThunk, chunk: str | GenerateDecoderOnlyOutput, input_ids)
Process the returned chunks or the complete response.

FUNC post_processing

post_processing(self, mot: ModelOutputThunk, conversation: list[dict], _format: type[BaseModelSubclass] | None, tool_calls: bool, tools: dict[str, Callable], seed, input_ids)
Called when generation is done.

FUNC generate_from_raw

generate_from_raw(self, actions: list[Component[C]], ctx: Context) -> list[ModelOutputThunk[C]]
generate_from_raw(self, actions: list[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk[C | str]]
generate_from_raw(self, actions: Sequence[Component[C] | CBlock], ctx: Context) -> list[ModelOutputThunk]
Generate using the completions API. Passes the provided input to the model without applying any prompt templating.

FUNC cache_get

cache_get(self, id: str) -> HFAloraCacheInfo | None
Retrieve from cache.

FUNC cache_put

cache_put(self, id: str, v: HFAloraCacheInfo)
Put into cache.
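Taken together, cache_get and cache_put behave like a string-keyed mapping in which a miss returns None rather than raising. A minimal sketch of that contract (with a plain dict standing in for the backend's actual storage, and arbitrary objects in place of HFAloraCacheInfo):

```python
class CacheSketch:
    """Dict-backed sketch of the cache_get/cache_put contract:
    cache_get returns None on a miss instead of raising."""

    def __init__(self) -> None:
        self._cache: dict[str, object] = {}

    def cache_get(self, id: str):
        # Returns the cached value, or None if id was never put.
        return self._cache.get(id)

    def cache_put(self, id: str, v) -> None:
        self._cache[id] = v

c = CacheSketch()
c.cache_put("req-1", {"tokens": 5})
```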

FUNC base_model_name

base_model_name(self)
Returns the base_model_id of the model used by the backend. For example, granite-3.3-8b-instruct for ibm-granite/granite-3.3-8b-instruct.
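The mapping from the full model id to the base name in the example above can be reproduced with a suffix split on "/" (a sketch of the documented behavior, not necessarily the actual implementation):

```python
def base_name(model_id: str) -> str:
    # Keep only the part after the last "/", matching the documented example:
    # "ibm-granite/granite-3.3-8b-instruct" -> "granite-3.3-8b-instruct".
    return model_id.rsplit("/", 1)[-1]

print(base_name("ibm-granite/granite-3.3-8b-instruct"))  # granite-3.3-8b-instruct
```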

FUNC add_adapter

add_adapter(self, adapter: LocalHFAdapter)
Adds the given adapter to the backend. Must not have been added to a different backend.

FUNC load_adapter

load_adapter(self, adapter_qualified_name: str)
Loads the given adapter for the backend. Must have previously been added. Do not call when generation requests are happening.

FUNC unload_adapter

unload_adapter(self, adapter_qualified_name: str)
Unloads the given adapter from the backend.

FUNC list_adapters

list_adapters(self) -> list[str]
Lists the adapters added via add_adapter(). Returns a list of adapter names currently registered with this backend.
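The adapter methods above describe a two-step lifecycle: an adapter is first registered with exactly one backend (add_adapter), and only then can it be loaded or unloaded; unloading does not unregister it. A minimal sketch of that state machine, with plain strings standing in for LocalHFAdapter objects:

```python
class AdapterRegistrySketch:
    """Illustrates the documented lifecycle: add once, then load/unload.
    All names here are hypothetical stand-ins for the real backend."""

    def __init__(self) -> None:
        self._added: set[str] = set()
        self._loaded: set[str] = set()

    def add_adapter(self, name: str) -> None:
        # An adapter may be registered with only one backend.
        if name in self._added:
            raise ValueError(f"{name} already added")
        self._added.add(name)

    def load_adapter(self, name: str) -> None:
        # Loading requires a prior add_adapter call.
        if name not in self._added:
            raise ValueError(f"{name} must be added before loading")
        self._loaded.add(name)

    def unload_adapter(self, name: str) -> None:
        # Unloading stops using the adapter but keeps it registered.
        self._loaded.discard(name)

    def list_adapters(self) -> list[str]:
        return sorted(self._added)

reg = AdapterRegistrySketch()
reg.add_adapter("alora/example")
reg.load_adapter("alora/example")
reg.unload_adapter("alora/example")
```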