Classes
CLASS HFAloraCacheInfo
A dataclass that holds a KV cache and its associated information.
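The entry above does not list the dataclass's fields. The sketch below shows one plausible shape for such a container, assuming it pairs a cache object with the token ids that cache was computed over so that a later prompt sharing the same prefix can reuse it. The class name `CacheInfoSketch` and both field names are hypothetical and are not the actual `HFAloraCacheInfo` definition.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class CacheInfoSketch:
    """Hypothetical stand-in for HFAloraCacheInfo; field names are assumptions."""

    # The key/value cache produced by a forward pass (for example a transformers
    # cache object); typed as Any so this sketch needs no third-party packages.
    kv_cache: Optional[Any]
    # Token ids the cache was computed over, so a later call can check whether
    # its prompt starts with the same tokens before reusing the cache.
    token_ids: List[int] = field(default_factory=list)

    def covers_prefix_of(self, new_ids: List[int]) -> bool:
        """Return True if the cached tokens are a prefix of new_ids."""
        n = len(self.token_ids)
        return n <= len(new_ids) and new_ids[:n] == self.token_ids
```

For example, `CacheInfoSketch(kv_cache=None, token_ids=[1, 2, 3]).covers_prefix_of([1, 2, 3, 4])` is `True`, because the new prompt extends the cached one.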
CLASS LocalHFBackend
The LocalHFBackend uses Hugging Face's transformers library for inference and uses a Formatter to convert Components into prompts. This backend also supports Activated LoRAs (aLoRAs; https://arxiv.org/pdf/2504.12397).
This backend is designed for small-scale inference with a Hugging Face model running locally on your machine.
It is NOT designed for scaled-up inference on CUDA-enabled hardware.
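In an Activated LoRA, the low-rank adapter update is applied only to tokens from an activation point onward. Earlier tokens are processed exactly as by the base model, which is what allows a base-model KV cache for the shared prefix to be reused. The plain-Python sketch below illustrates that masking for a single linear layer, with one row of `x` per token. The function names and the `activation_start` parameter are illustrative assumptions, not this backend's API.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of a times columns of b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def alora_linear(x: Matrix, w: Matrix, lora_a: Matrix, lora_b: Matrix,
                 activation_start: int, scale: float = 1.0) -> Matrix:
    """Hypothetical aLoRA-style linear layer: the low-rank update is added
    only for token positions >= activation_start."""
    base = matmul(x, w)                          # frozen base weights, every token
    delta = matmul(matmul(x, lora_a), lora_b)    # low-rank update (x A) B
    out = []
    for pos, (b_row, d_row) in enumerate(zip(base, delta)):
        if pos < activation_start:
            out.append(list(b_row))              # prefix tokens: same as base model
        else:
            out.append([b + scale * d for b, d in zip(b_row, d_row)])
    return out
```

With `x = [[1, 0], [0, 1], [1, 1]]`, an identity `w`, `lora_a = [[1], [1]]`, `lora_b = [[1, 1]]` and `activation_start=2`, the first two rows equal the base output `[[1, 0], [0, 1]]` and only the last row changes, to `[3, 3]`.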
Methods:
FUNC generate_from_context
FUNC processing
FUNC post_processing
FUNC generate_from_raw
FUNC cache_get
FUNC cache_put
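The reference does not describe `cache_get` or `cache_put`. Assuming they read and write entries in a bounded key/value store (for example, of KV-cache information objects), a minimal least-recently-used store could look like the sketch below. `LRUCacheSketch`, its `get`/`put` methods, and the default capacity of 3 are all hypothetical choices made for illustration.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class LRUCacheSketch:
    """Hypothetical fixed-capacity LRU store; the backend's real cache may differ."""

    def __init__(self, capacity: int = 3) -> None:  # capacity is an arbitrary choice
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, marking it most recently used, or None."""
        if key not in self._data:
            return None
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        """Insert or update a value, evicting the least recently used entry."""
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the oldest entry
```

An LRU policy keeps memory bounded, which matters when each cached value may be a large tensor-backed KV cache.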
FUNC base_model_name
The short name of the base model, e.g. granite-3.3-8b-instruct for ibm-granite/granite-3.3-8b-instruct.
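The example above is consistent with dropping the organization prefix from a Hugging Face model id. A minimal sketch of that derivation follows, assuming the name is simply the part after the last `/`. The helper name `short_model_name` is hypothetical, and the backend may derive the name differently.

```python
def short_model_name(model_id: str) -> str:
    """Hypothetical helper: return the part of a model id after the last '/'."""
    # rsplit with maxsplit=1 leaves ids without an organization prefix unchanged.
    return model_id.rsplit("/", 1)[-1]
```

For example, `short_model_name("ibm-granite/granite-3.3-8b-instruct")` returns `"granite-3.3-8b-instruct"`.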