Prerequisites: pip install "mellea[hf]" for LocalHFBackend (GPU or Apple Silicon Mac recommended), or pip install mellea for OpenAIBackend with a Granite Switch model served via vLLM.

Intrinsics are adapter-accelerated operations for RAG quality checks. They use LoRA/aLoRA adapters loaded directly into the HuggingFace backend, which is faster and more reliable than prompting a general-purpose model for these specialized micro-tasks.
Backend note: Intrinsics work with two backends:
  • LocalHFBackend: loads LoRA/aLoRA adapters from the catalog at runtime. All intrinsics are available. Requires a GPU or Apple Silicon Mac.
  • OpenAIBackend: uses a Granite Switch model served via vLLM with load_embedded_adapters=True. Only intrinsics embedded in the model are available; check the model’s adapter_index.json for the list. See docs/docs/examples/granite-switch/README.md.
Intrinsics do not work with Ollama or other remote backends.
Set up the backend once and reuse it across intrinsic calls:
# Requires: mellea[hf]
# Returns: LocalHFBackend
from mellea.backends.huggingface import LocalHFBackend

backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
Or, with a Granite Switch model via the OpenAI backend:
# Requires: mellea (plus a running vLLM server)
# Returns: OpenAIBackend
from mellea.backends.openai import OpenAIBackend
from mellea.backends.model_ids import IBM_GRANITE_SWITCH_4_1_3B_PREVIEW
from mellea.formatters import TemplateFormatter

backend = OpenAIBackend(
    model_id=IBM_GRANITE_SWITCH_4_1_3B_PREVIEW.hf_model_name,
    formatter=TemplateFormatter(model_id=IBM_GRANITE_SWITCH_4_1_3B_PREVIEW.hf_model_name),
    base_url="http://localhost:8000/v1",  # vLLM server
    api_key="EMPTY",
    load_embedded_adapters=True,
)
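
One way to follow the "set up once and reuse" advice is to memoize the constructor so that repeated lookups return the same backend object. The sketch below uses only the standard library; build_backend is a hypothetical stand-in for an expensive constructor such as LocalHFBackend(model_id=...), not part of mellea.

```python
# Requires: standard library only
# Returns: the same object on repeated calls
from functools import lru_cache

build_calls: list[str] = []  # records how often the constructor really runs


def build_backend(model_id: str) -> dict:
    """Hypothetical stand-in for an expensive backend constructor."""
    build_calls.append(model_id)
    return {"model_id": model_id}


@lru_cache(maxsize=None)
def get_backend(model_id: str) -> dict:
    """Build the backend for model_id at most once per process."""
    return build_backend(model_id)


first = get_backend("ibm-granite/granite-4.1-3b")
second = get_backend("ibm-granite/granite-4.1-3b")
print(first is second)   # True
print(len(build_calls))  # 1
```

In real code the body of get_backend would construct the actual backend; the caching wrapper stays the same.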

Answerability

Check whether a set of retrieved documents can answer a given question:
# Requires: mellea[hf]
# Returns: bool
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Document, Message
from mellea.stdlib.components.intrinsic import rag
from mellea.stdlib.context import ChatContext

backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
context = ChatContext().add(Message("assistant", "Hello! How can I help you?"))
question = "What is the square root of 4?"

docs_answerable = [Document("The square root of 4 is 2.")]
docs_not_answerable = [Document("The square root of 8 is approximately 2.83.")]

print(rag.check_answerability(question, docs_answerable, context, backend))   # True
print(rag.check_answerability(question, docs_not_answerable, context, backend))  # False
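
A common use of the boolean result is to gate generation: answer only when the documents support it, and fall back to a fixed message otherwise. The following is a minimal sketch of that pattern. The function answer_if_supported, the FALLBACK text, and both stubs are hypothetical; in practice is_answerable would wrap rag.check_answerability and generate would call your model.

```python
# Requires: standard library only
# Returns: str
from typing import Callable, Sequence

FALLBACK = "I could not find that in the provided documents."


def answer_if_supported(
    question: str,
    documents: Sequence[str],
    is_answerable: Callable[[str, Sequence[str]], bool],
    generate: Callable[[str, Sequence[str]], str],
) -> str:
    """Call generate only when the answerability check passes."""
    if not is_answerable(question, documents):
        return FALLBACK
    return generate(question, documents)


# Stubs standing in for the real check and the real generator (hypothetical).
def stub_check(question: str, documents: Sequence[str]) -> bool:
    return any("square root of 4" in doc for doc in documents)


def stub_generate(question: str, documents: Sequence[str]) -> str:
    return "The square root of 4 is 2."


question = "What is the square root of 4?"
supported = answer_if_supported(
    question, ["The square root of 4 is 2."], stub_check, stub_generate
)
unsupported = answer_if_supported(
    question, ["The square root of 8 is approximately 2.83."], stub_check, stub_generate
)
print(supported)
print(unsupported)
```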

Context relevance

Assess whether a document is relevant to a question:
# Requires: mellea[hf]
# Returns: str
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Document
from mellea.stdlib.components.intrinsic import rag
from mellea.stdlib.context import ChatContext

# NOTE: no context_relevance adapter for Granite 4.1 — use granite-4.0-micro
backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro")
context = ChatContext()
question = "Who is the CEO of Microsoft?"
document = Document(
    "Microsoft Corporation is an American multinational corporation "
    "headquartered in Redmond, Washington."
)

result = rag.check_context_relevance(question, document, context, backend)
print(result)  # 'partially relevant' — doc is about Microsoft but not its CEO
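
Because the check returns a label string, it can be used to filter retrieved documents before generation. The sketch below assumes a label vocabulary: only 'partially relevant' appears in the example above, so 'relevant' and 'irrelevant' are guesses that should be checked against the adapter's actual output. filter_by_relevance and stub_label are hypothetical helpers; label_fn would wrap rag.check_context_relevance in real code.

```python
# Requires: standard library only
# Returns: list[str]
from typing import Callable, Iterable

# Assumed labels to keep; verify against the adapter's real output vocabulary.
KEEP_LABELS = frozenset({"relevant", "partially relevant"})


def filter_by_relevance(
    question: str,
    documents: Iterable[str],
    label_fn: Callable[[str, str], str],
    keep: frozenset = KEEP_LABELS,
) -> list[str]:
    """Keep documents whose relevance label is in the keep set."""
    return [
        doc for doc in documents
        if label_fn(question, doc).strip().lower() in keep
    ]


# Stub standing in for the real relevance check (hypothetical).
def stub_label(question: str, document: str) -> str:
    return "partially relevant" if "Microsoft" in document else "irrelevant"


docs = [
    "Microsoft Corporation is headquartered in Redmond, Washington.",
    "Bananas are rich in potassium.",
]
kept = filter_by_relevance("Who is the CEO of Microsoft?", docs, stub_label)
print(kept)
```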

Hallucination detection

Flag sentences in an assistant response that are not grounded in the source documents:
# Requires: mellea[hf]
# Returns: list[str]
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Document, Message
from mellea.stdlib.components.intrinsic import rag
from mellea.stdlib.context import ChatContext

backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
context = (
    ChatContext()
    .add(Message("assistant", "Hello! How can I help you?"))
    .add(Message("user", "Tell me about yellow fish."))
)

response = "Purple bumble fish are yellow. Green bumble fish are also yellow."
documents = [
    Document(doc_id="1", text="The only type of fish that is yellow is the purple bumble fish.")
]

result = rag.flag_hallucinated_content(response, documents, context, backend)
print(result)
# Flags "Green bumble fish are also yellow." as hallucinated
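
One possible follow-up is to drop the flagged sentences from the response before showing it to the user. This sketch assumes, per the Returns comment above, that the result is a list of flagged sentence strings that match the response text verbatim; if the backend returns richer records, extract the sentence text first. drop_flagged is a hypothetical helper, and the regex sentence splitter is deliberately simple.

```python
# Requires: standard library only
# Returns: str
import re


def drop_flagged(response: str, flagged: list[str]) -> str:
    """Remove sentences that appear verbatim in the flagged list."""
    # Naive split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    flagged_set = {sentence.strip() for sentence in flagged}
    return " ".join(s for s in sentences if s.strip() not in flagged_set)


response = "Purple bumble fish are yellow. Green bumble fish are also yellow."
flagged = ["Green bumble fish are also yellow."]  # assumed result shape
cleaned = drop_flagged(response, flagged)
print(cleaned)  # Purple bumble fish are yellow.
```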

Answer relevance rewriting

Rewrite a vague or incomplete answer to be more grounded in the source documents:
# Requires: mellea[hf]
# Returns: str
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Document, Message
from mellea.stdlib.components.intrinsic import rag
from mellea.stdlib.context import ChatContext

backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
context = ChatContext().add(Message("user", "Who attended the meeting?"))
documents = [
    Document("Meeting attendees: Alice, Bob, Carol."),
    Document("Meeting time: 9:00 am to 11:00 am."),
]
original = "Many people attended the meeting."

result = rag.rewrite_answer_for_relevance(original, documents, context, backend)
print(result)
# A more specific, grounded answer — output will vary

Query rewriting

Rewrite an ambiguous user query using conversation history to improve retrieval:
# Requires: mellea[hf]
# Returns: str
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Message
from mellea.stdlib.components.intrinsic import rag
from mellea.stdlib.context import ChatContext

backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
context = (
    ChatContext()
    .add(Message("assistant", "Welcome to pet questions!"))
    .add(Message("user", "I have two pets: a dog named Rex and a cat named Lucy."))
    .add(Message("assistant", "Rex spends a lot of time outdoors, and Lucy is always inside."))
    .add(Message("user", "Sounds good! Rex must love exploring outside."))
)
next_turn = "But is he more likely to get fleas because of that?"

result = rag.rewrite_question(next_turn, context, backend)
print(result)
# Resolves "he" to "Rex" and incorporates context about outdoor exposure

Citations

Find supporting sentences in source documents for a given assistant response:
# Requires: mellea[hf]
# Returns: dict
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Document, Message
from mellea.stdlib.components.intrinsic import rag
from mellea.stdlib.context import ChatContext

backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
context = ChatContext().add(
    Message("user", "How did Murdoch expand in Australia versus New Zealand?")
)
response = (
    "Murdoch expanded in Australia and New Zealand by acquiring local newspapers. "
    "I do not have information about his expansion in New Zealand after purchasing "
    "The Dominion."
)
documents = [
    Document(doc_id="1", text="Keith Rupert Murdoch was born on 11 March 1931 in Melbourne..."),
    Document(doc_id="2", text="This document has nothing to do with Rupert Murdoch."),
]

result = rag.find_citations(response, documents, context, backend)
print(result)
# Maps each response sentence to supporting document sentences

Direct intrinsic usage

Advanced: For custom adapter tasks, use the Intrinsic component and CustomIntrinsicAdapter directly.
# Requires: mellea[hf]
# Returns: dict
import mellea.stdlib.functional as mfuncs
from mellea.backends.adapters.adapter import CustomIntrinsicAdapter
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components import Intrinsic, Message
from mellea.stdlib.context import ChatContext

backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")

# Register an adapter by task name
req_adapter = CustomIntrinsicAdapter(
    "requirement-check",
    base_model_name=backend.base_model_name,
)
backend.add_adapter(req_adapter)

ctx = ChatContext()
ctx = ctx.add(Message("user", "Hi, can you help me?"))
ctx = ctx.add(Message("assistant", "Yes! What can I help with?"))

out, _ = mfuncs.act(
    Intrinsic(
        "requirement-check",
        intrinsic_kwargs={"requirement": "The assistant is helpful."},
    ),
    ctx,
    backend,
)
print(out)  # {"requirement_likelihood": 1.0}
The Intrinsic component loads aLoRA adapters (falling back to LoRA) by task name. For OpenAI backends with Granite Switch, adapters are loaded from the model’s HuggingFace repository configuration instead of the intrinsic catalog. Output format is task-specific — requirement-check returns a likelihood score.
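
Since requirement-check returns a likelihood score, callers usually need to turn it into a pass/fail decision. The sketch below assumes the output can be obtained as a JSON string shaped like the printed example; passes_requirement and the 0.5 threshold are hypothetical choices, not part of mellea.

```python
# Requires: standard library only
# Returns: bool
import json


def passes_requirement(raw_output: str, threshold: float = 0.5) -> bool:
    """Parse the JSON output and compare the likelihood to a threshold."""
    payload = json.loads(raw_output)
    return float(payload["requirement_likelihood"]) >= threshold


high = passes_requirement('{"requirement_likelihood": 1.0}')
low = passes_requirement('{"requirement_likelihood": 0.2}')
print(high, low)  # True False
```

A stricter threshold trades more false rejections for fewer false passes; the right value depends on the requirement being checked.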

Guardian intrinsics

Safety and factuality checks use a separate set of Guardian-specific intrinsics: guardian_check(), policy_guardrails(), factuality_detection(), and factuality_correction(). These are documented in the Safety Guardrails how-to guide.