Images can be passed to any `instruct()` or `chat()` call using the `images` parameter.
Prerequisites: `pip install mellea pillow`, plus a vision-capable model downloaded and running.
Backend note: The default Ollama model (`granite4:micro`) does not support image input. You must switch to a vision-capable model such as `granite3.2-vision` or `llava`. Not all backends support vision; see the backend support table below.
Basic usage with Ollama
Start a session with a vision-capable model, then pass a Pillow `Image` object in the `images` list. Other vision-capable models available through Ollama include `llava`, `llava-phi3`, `moondream`, and `qwen2.5vl:7b`.
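A minimal sketch of such a call. The `start_session` entry point, the `model_id` keyword, and the `chat` signature are assumptions based on the description above, and a running Ollama server with the model pulled is required, so the call is wrapped in a helper:

```python
from PIL import Image


def describe_image(path: str) -> str:
    """Sketch: send a PIL image to a vision model through Mellea.
    Requires mellea installed and an Ollama server with the model pulled."""
    from mellea import start_session  # deferred so the module imports cleanly

    m = start_session(model_id="granite3.2-vision")
    img = Image.open(path)
    # PIL Image objects are accepted directly in the images list.
    result = m.chat("What is shown in this image?", images=[img])
    return str(result)
```

Any of the vision-capable models listed above can be substituted for `granite3.2-vision`.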
Using ImageBlock for explicit control
For the OpenAI backend (and compatible endpoints), convert the PIL image to an `ImageBlock` first.
ImageBlock objects are accepted in the images list. Use
ImageBlock when you need to work with an already-encoded representation or when
the PIL image is not directly available.
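A sketch of the conversion. The base64 helper is plain Pillow; the Mellea names (`MelleaSession`, `OpenAIBackend`, `ImageBlock.from_pil_image`, and their import paths) are assumptions that may differ from your installed version:

```python
import base64
import io

from PIL import Image


def to_png_base64(img: Image.Image) -> str:
    """Encode a PIL image as a base64 PNG string -- the kind of
    already-encoded representation an ImageBlock wraps."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")


def ask_with_image_block(img: Image.Image, prompt: str) -> str:
    """Sketch: requires mellea and a reachable OpenAI-compatible endpoint.
    The import paths and from_pil_image constructor are assumptions."""
    from mellea import MelleaSession
    from mellea.backends.openai import OpenAIBackend
    from mellea.stdlib.base import ImageBlock

    m = MelleaSession(backend=OpenAIBackend(model_id="gpt-4o"))
    block = ImageBlock.from_pil_image(img)  # explicit, already-encoded form
    return str(m.chat(prompt, images=[block]))
```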
Multi-turn vision with ChatContext
Images passed to `instruct()` or `chat()` are stored in the `ChatContext`
turn history. Subsequent calls in the same session can reference the image without
passing it again; later calls can simply omit the `images` parameter or pass
`images=[]` explicitly.
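A sketch of a multi-turn exchange, under the same assumptions as the Ollama example above (`start_session` and the `chat` signature; a running server is required, so the calls are wrapped in a helper):

```python
def multi_turn_demo(path: str) -> list[str]:
    """Sketch: the image sent in turn 1 is kept in the ChatContext turn
    history, so turn 2 can refer to it without re-sending it."""
    from PIL import Image             # deferred so the module imports cleanly
    from mellea import start_session

    m = start_session(model_id="granite3.2-vision")
    img = Image.open(path)
    first = m.chat("Describe this image.", images=[img])  # image attached here
    second = m.chat("What colors dominate it?")           # no images needed
    return [str(first), str(second)]
```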
Backend support
| Backend | Vision support | Notes |
|---|---|---|
| `OllamaModelBackend` | ✓ | Requires a vision model (e.g., `granite3.2-vision`, `llava`) |
| `OpenAIBackend` | ✓ | Use with `gpt-4o`, or a local vision model via an OpenAI-compatible endpoint |
| `LiteLLMBackend` | ✓ | Depends on the underlying provider |
| `LocalHFBackend` | Partial | Model-dependent; experimental |
| `LocalVLLMBackend` | Partial | Model-dependent |
| `WatsonxAIBackend` | ✗ | Not currently supported |
Full example (Ollama): docs/examples/image_text_models/vision_ollama_chat.py
Full example (OpenAI backend): docs/examples/image_text_models/vision_openai_examples.py
See also: Working with Data | The Instruction Model