Classes
CLASS LocalVLLMBackend
The LocalVLLMBackend uses vLLM's Python interface for inference and a Formatter to convert Components into prompts.
Support for [Activated LoRAs (ALoras)](https://arxiv.org/pdf/2504.12397) is planned.
This backend is designed for running a Hugging Face model locally on your machine for small-scale inference.
Its throughput is generally higher than that of LocalHFBackend.
However, loading the model weights at instantiation takes longer, and submitting requests one at a time can be slower than batching them, since vLLM's throughput advantage comes from batched scheduling.
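The batching trade-off above can be sketched directly against vLLM's own Python interface (`vllm.LLM` and `vllm.SamplingParams` are vLLM's public API; the model name below is a placeholder, and the helper function names are illustrative, not part of this backend):

```python
# Sketch: batched vs. one-by-one submission through vLLM's Python interface.
# `vllm.LLM` and `vllm.SamplingParams` are vLLM's public API; the model
# name used under __main__ is a placeholder.

def generate_one_by_one(llm, prompts, params):
    # Each call is a separate engine pass, so vLLM cannot batch the
    # requests together -- this is the slow pattern.
    return [out for p in prompts for out in llm.generate([p], params)]

def generate_batched(llm, prompts, params):
    # A single call lets vLLM schedule all prompts together (continuous
    # batching), which is where the throughput advantage comes from.
    return llm.generate(prompts, params)

if __name__ == "__main__":
    from vllm import LLM, SamplingParams  # requires a GPU-capable vLLM install

    llm = LLM(model="ibm-granite/granite-3.3-8b-instruct")  # weights load here (the slow step)
    params = SamplingParams(max_tokens=32)
    outputs = generate_batched(llm, ["Hello", "World"], params)
```

The same trade-off applies through this backend: prefer issuing many requests at once over a one-at-a-time loop.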
Methods: