Get started
Install Mellea and run your first generative program in minutes.
Tutorial
Build a complete program with generation, validation, and repair.
Code examples
Runnable examples: RAG, agents, sampling, MObjects, and more.
API reference
Full public API — backends, session, components, requirements, sampling.
How Mellea works
Mellea’s design rests on three interlocking ideas.
Python, not prose
@generative turns a typed function signature into an LLM-backed implementation.
Docstrings become prompts. Type hints become output schemas. No DSL required.
Requirements driven
Declare what good output looks like with req(). Mellea checks every response
before it leaves the session, using LLM verifiers, programmatic checks, or
domain-trained adapters.
Instruct · Validate · Repair
When a requirement fails, Mellea feeds the failure back and tries again.
Rejection sampling, majority voting, and SOFAI are built in.
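The instruct → validate → repair loop reduces to a simple control pattern. Here it is in plain Python, with toy stand-ins for the generator and the requirement check; this shows the pattern only, not Mellea's API.

```python
# The instruct -> validate -> repair loop as a pattern sketch.
# `generate` and `requirement` are toy stand-ins, not Mellea calls.
def generate(prompt: str, feedback: str = "") -> str:
    # Toy generator: "repairs" its output once it sees feedback.
    return "a short answer" if feedback else "an answer that is far too long to pass"

def requirement(output: str) -> tuple[bool, str]:
    ok = len(output.split()) <= 4
    return ok, "" if ok else "too long: keep it under 5 words"

def instruct_validate_repair(prompt: str, max_tries: int = 3) -> str:
    feedback = ""
    for _ in range(max_tries):
        output = generate(prompt, feedback)
        ok, feedback = requirement(output)
        if ok:
            return output  # requirement satisfied
    raise RuntimeError("no sample met the requirement")

print(instruct_validate_repair("Summarize X"))  # -> "a short answer"
```

Rejection sampling is this loop without feedback; feeding the failure reason back, as above, is the repair variant.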
Key patterns
MObjects and mify
Add @mify to any class to make it LLM-queryable and tool-accessible
without rewriting your data model.
Context and sessions
Explicit context threading with push/pop state keeps multi-turn
workflows reproducible and debuggable.
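Push/pop state is an ordinary stack discipline. A minimal sketch of the idea, not Mellea's ChatContext API:

```python
# Minimal sketch of explicit context threading: a stack of message lists,
# so a speculative branch can be pushed, tried, and popped away.
class Context:
    def __init__(self):
        self._stack = [[]]  # stack of message lists

    def add(self, role: str, content: str):
        self._stack[-1].append((role, content))

    def push(self):
        # Branch: copy the current messages so the branch is discardable.
        self._stack.append(list(self._stack[-1]))

    def pop(self):
        self._stack.pop()

    @property
    def messages(self):
        return list(self._stack[-1])

ctx = Context()
ctx.add("user", "hello")
ctx.push()
ctx.add("user", "speculative turn")
ctx.pop()                  # discard the speculative branch
print(len(ctx.messages))   # -> 1
```

Because state changes are explicit, any turn can be replayed from a known context, which is what makes multi-turn workflows reproducible.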
Async and streaming
ainstruct(), aact(), and token-by-token streaming for production
throughput and responsive UIs.
Safety checks
GuardianCheck detects harmful, off-topic, or hallucinated outputs
before they reach downstream code.
Inference-time scaling
Best-of-n, SOFAI, majority voting — swap strategies in one line.
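Majority voting, for instance, is just sampling n candidates and keeping the most common answer. A plain-Python sketch of the strategy, not Mellea's strategy classes:

```python
# Majority voting sketched in plain Python: draw n samples,
# return the most frequent answer.
from collections import Counter

def majority_vote(sample_fn, n: int = 5) -> str:
    votes = Counter(sample_fn() for _ in range(n))
    return votes.most_common(1)[0][0]

# Deterministic stand-in for repeated LLM sampling.
samples = iter(["42", "41", "42", "42", "40"])
answer = majority_vote(lambda: next(samples), n=5)
print(answer)  # -> "42"
```

Best-of-n differs only in the selection rule: score each candidate and keep the best instead of the most common.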
Tools and agents
@tool, MelleaTool, and the ReACT loop for goal-driven multi-step agents.
Backends
Mellea is backend-agnostic. The same program runs on any inference engine.
Ollama
Local inference, zero cloud costs.
OpenAI
GPT-4o, o3-mini, any OpenAI-compatible API.
AWS Bedrock
AWS Bedrock via Bedrock Mantle or LiteLLM.
IBM WatsonX
IBM WatsonX managed AI platform.
HuggingFace
Local inference with Transformers — aLoRA and constrained decoding.
vLLM
High-throughput batched local inference on Linux + CUDA.
LiteLLM / Vertex AI
Google Vertex AI, Anthropic, and 100+ providers via LiteLLM.
LangChain
Use LangChain tools in Mellea sessions or call Mellea from LangChain chains.
How-to guides
Enforce structured output
Pydantic models, Literal types, and @generative for guaranteed schemas.
Write custom verifiers
Python functions, ValidationResult, and multi-field validation logic.
Async and streaming
aact(), ainstruct(), and token-by-token streaming output.
Use context and sessions
ChatContext, explicit context threading, and multi-session workflows.
Configure model options
Temperature, seed, max tokens, system prompts — cross-backend with ModelOption.
Use images and vision
Pass images to instruct() and chat() with any vision-capable backend.
Build a RAG pipeline
Vector search, LLM relevance filtering, and grounded generation end-to-end.
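The three stages compose as plain functions. A toy sketch of the pipeline shape, not the guide's actual code: keyword overlap stands in for vector search, and the relevance filter and generation step are trivial placeholders.

```python
# Toy RAG pipeline: retrieve -> filter -> grounded generation.
# Keyword overlap stands in for embedding search; the filter and
# generator are placeholders, not Mellea calls.
DOCS = [
    "Mellea programs declare requirements on model output.",
    "The capital of France is Paris.",
    "Rejection sampling retries until a requirement passes.",
]

def retrieve(query: str, k: int = 2):
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(DOCS, key=overlap, reverse=True)[:k]

def filter_relevant(query: str, docs):
    # Stand-in for an LLM relevance check: keep docs sharing a query term.
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.lower().split())]

def generate_grounded(query: str, docs) -> str:
    return f"Answer to {query!r}, grounded in {len(docs)} document(s)."

query = "what is rejection sampling"
context = filter_relevant(query, retrieve(query))
print(generate_grounded(query, context))
```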
GitHub · PyPI · Discussions
