Functions
FUNC download_mtrag_corpus
target_dir: Location where the file should be written if not already present.
corpus_name: Should be one of "cloud", "clapnq", "fiqa", or "govt".
- Path to the downloaded (or cached) file.
ValueError: If corpus_name is not one of the supported corpus names.
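The validate-then-reuse behaviour described for download_mtrag_corpus can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper resolve_corpus_file, the ".jsonl" file name pattern, and the injected fetch callable are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable

# The four corpus names listed in the documentation above.
SUPPORTED_CORPORA = ("cloud", "clapnq", "fiqa", "govt")


def resolve_corpus_file(
    target_dir: str,
    corpus_name: str,
    fetch: Callable[[Path], None],
) -> Path:
    """Hypothetical helper: validate the name, then download only if needed."""
    if corpus_name not in SUPPORTED_CORPORA:
        raise ValueError(
            f"Unknown corpus {corpus_name!r}; expected one of {SUPPORTED_CORPORA}"
        )
    # Assumed file name; the real function may name the file differently.
    path = Path(target_dir) / f"{corpus_name}.jsonl"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        fetch(path)  # stand-in for the actual download step
    return path
```

Passing the download step in as a callable keeps the sketch free of network access. A second call with the same arguments returns the cached path without invoking fetch again.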
FUNC read_mtrag_corpus
corpus_file: Location of the corpus data file.
- Documents from the corpus as a PyArrow table, with schema
["id", "url", "title", "text"].
TypeError: If the ID column cannot be identified or if no "text" column is present in the corpus file.
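To make the output schema and the two TypeError conditions concrete, here is a rough sketch that uses plain Python dictionaries instead of PyArrow. The helper normalize_record and the candidate ID field names are assumptions for illustration only; read_mtrag_corpus may identify the ID column differently.

```python
from typing import Any, Dict

# Column order documented for the returned table.
OUTPUT_COLUMNS = ["id", "url", "title", "text"]

# Assumed candidate names for the ID field; not taken from the library.
ID_CANDIDATES = ("id", "_id", "document_id")


def normalize_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map one raw record onto the four output columns."""
    id_key = next((k for k in ID_CANDIDATES if k in record), None)
    if id_key is None:
        raise TypeError("Could not identify an ID column")
    if "text" not in record:
        raise TypeError("No 'text' column present")
    return {
        "id": record[id_key],
        # Assumption: a missing url or title becomes None rather than an error.
        "url": record.get("url"),
        "title": record.get("title"),
        "text": record["text"],
    }
```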
FUNC download_mtrag_embeddings
embedding_name: Name of the SentenceTransformers embedding model used to create the embeddings.
corpus_name: Should be one of "cloud", "clapnq", "fiqa", or "govt".
target_dir: Location where Parquet files named "part_001.parquet", "part_002.parquet", etc. will be written.
ValueError: If corpus_name is not one of the supported corpus names, or if no precomputed embeddings are found for the given corpus and embedding model combination.
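The part file naming used by download_mtrag_embeddings can be illustrated with a short sketch. The helper part_file_paths is hypothetical and only reproduces the documented pattern; how many parts a given corpus and model produce is not specified here.

```python
from pathlib import Path
from typing import List


def part_file_paths(target_dir: str, num_parts: int) -> List[Path]:
    """Hypothetical helper: build part_001.parquet, part_002.parquet, ..."""
    # Parts are numbered from 1 and zero-padded to three digits.
    return [
        Path(target_dir) / f"part_{i:03d}.parquet"
        for i in range(1, num_parts + 1)
    ]
```

Because the numbers are zero-padded, sorting the file names alphabetically keeps the parts in numeric order for up to 999 parts.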