### GPU Acceleration Setup

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Install the GPU-enabled package and configure the model to use CUDA.

```bash
pip install fastembed-gpu
```

```python
from fastembed import TextEmbedding

embedding_model = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    providers=["CUDAExecutionProvider"]
)
print("The model BAAI/bge-small-en-v1.5 is ready to use on a GPU.")
```

--------------------------------

### Install Qdrant Client with FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Usage_With_Qdrant.ipynb

Install the necessary Qdrant client library with FastEmbed support. Use --quiet for silent installation.

```python
!pip install 'qdrant-client[fastembed]' --quiet --upgrade
```

--------------------------------

### Setup CUDA 12.x on Ubuntu 22.04

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Commands to install the CUDA toolkit on Ubuntu 22.04 using the NVIDIA repository.

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
```

--------------------------------

### Install Dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Install the necessary libraries for Qdrant, FastEmbed, datasets, and transformers.

```python
!pip install -qU qdrant-client fastembed datasets transformers
```

--------------------------------

### Install Dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Install the necessary Python packages for Qdrant and data handling.
```python
!pip install qdrant-client pandas datasets --quiet --upgrade
```

--------------------------------

### Install necessary libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_vs_HF_Comparison.ipynb

Installs matplotlib, transformers, and torch. Use -qq for quieter output.

```python
!pip install matplotlib transformers torch -qq
```

--------------------------------

### Install Dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/experimental/Binary Quantization from Scratch.ipynb

Install the required libraries for data processing and visualization.

```python
!pip install matplotlib tqdm pandas numpy datasets --quiet --upgrade
```

--------------------------------

### Setup CuDNN 9.x on Ubuntu 22.04

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Commands to install the CuDNN library on Ubuntu 22.04 using the NVIDIA repository.

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cudnn
```

--------------------------------

### Install FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Install the FastEmbed library using pip. Use 'fastembed-gpu' for GPU support.

```bash
pip install fastembed

# or with GPU support

pip install fastembed-gpu
```

--------------------------------

### Install Dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Install the necessary libraries for the RAG pipeline.

```bash
!pip install -U fastembed datasets qdrant-client peft transformers accelerate bitsandbytes -qq
```

--------------------------------

### Install FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/index.md

Install the FastEmbed library using pip. This command fetches and installs the latest version of the package.
```bash
pip install fastembed
```

--------------------------------

### Install FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/Getting Started.ipynb

Install the fastembed package using pip.

```bash
!pip install -Uqq fastembed
```

--------------------------------

### Qdrant Integration

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Install the Qdrant client with FastEmbed support and use it to index and search documents.

```bash
pip install qdrant-client[fastembed]
```

```bash
pip install qdrant-client[fastembed-gpu]
```

```python
from qdrant_client import QdrantClient, models

# Initialize the client
client = QdrantClient("localhost", port=6333)  # For production
# client = QdrantClient(":memory:")  # For experimentation

model_name = "sentence-transformers/all-MiniLM-L6-v2"
payload = [
    {"document": "Qdrant has Langchain integrations", "source": "Langchain-docs", },
    {"document": "Qdrant also has Llama Index integrations", "source": "LlamaIndex-docs"},
]
docs = [models.Document(text=data["document"], model=model_name) for data in payload]
ids = [42, 2]

client.create_collection(
    "demo_collection",
    vectors_config=models.VectorParams(
        size=client.get_embedding_size(model_name), distance=models.Distance.COSINE)
)

client.upload_collection(
    collection_name="demo_collection",
    vectors=docs,
    ids=ids,
    payload=payload,
)

search_result = client.query_points(
    collection_name="demo_collection",
    query=models.Document(text="This is a query document", model=model_name)
).points
print(search_result)
```

--------------------------------

### Install Qdrant Client with FastEmbed Support

Source: https://github.com/qdrant/fastembed/blob/main/docs/index.md

Install the Qdrant client library with FastEmbed integration. This enables seamless use of FastEmbed within Qdrant operations. Note that on zsh, you might need to use quotes around the package name.
```bash
pip install qdrant-client[fastembed]
```

```bash
pip install 'qdrant-client[fastembed]'
```

--------------------------------

### Install FastEmbed dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Installs the necessary library for sparse embedding generation.

```python
# !pip install -q fastembed
```

--------------------------------

### Install FastEmbed dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Retrieval_with_FastEmbed.ipynb

Install the necessary package to create embeddings and perform retrieval.

```python
# !pip install fastembed --quiet --upgrade
```

--------------------------------

### Initialize CPU embedding model

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Setup for the TextEmbedding model using the default CPU provider.

```python
embedding_model_cpu = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embedding_model_cpu.model.model.get_providers()
```

--------------------------------

### Install Pre-Commit Hooks

Source: https://github.com/qdrant/fastembed/blob/main/CONTRIBUTING.md

Install pre-commit hooks to ensure code is linted before committing. Run this command in the project's root directory.

```bash
pre-commit install
```

--------------------------------

### Install FastEmbed GPU for CUDA 11.x

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Install onnxruntime-gpu configured for CUDA 11.x environments.

```python
!pip install onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ -qq
```

--------------------------------

### Install FastEmbed GPU package

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Install the GPU-enabled version of FastEmbed. Ensure that standard fastembed is uninstalled first to avoid conflicts.
```python
!pip install fastembed-gpu
```

--------------------------------

### Quickstart Text Embedding Generation

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Initialize the default TextEmbedding model and generate embeddings for a list of documents. The model is downloaded and initialized on first use. The embed method returns a generator.

```python
from fastembed import TextEmbedding

# Example list of documents
documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

# This will trigger the model download and initialization
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

embeddings_generator = embedding_model.embed(documents)  # reminder this is a generator
embeddings_list = list(embedding_model.embed(documents))
# you can also convert the generator to a list, and that to a numpy array
len(embeddings_list[0])  # Vector of 384 dimensions
```

--------------------------------

### Initialize GPU embedding model

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Setup for the TextEmbedding model using the CUDA execution provider.

```python
import numpy as np

from fastembed import TextEmbedding

embedding_model_gpu = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5", providers=["CUDAExecutionProvider"]
)
embedding_model_gpu.model.model.get_providers()
```

--------------------------------

### Get FastEmbed Version

Source: https://github.com/qdrant/fastembed/blob/main/CONTRIBUTING.md

Run this command to get the exact version of FastEmbed you are using. This is useful for bug reports.
```python
import fastembed

print(fastembed.__version__)
```

--------------------------------

### Check FastEmbed Version

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Verify the installed version of the FastEmbed library.

```python
fastembed.__version__  # 0.2.5
```

--------------------------------

### Prepare ONNX Inputs and Run Inference

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Prepares the input data in NumPy format for the ONNX model and runs the inference to get the logits.

```python
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {key: val.to(device) for key, val in inputs.items()}

input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
token_type_ids = inputs["token_type_ids"]

onnx_input = {
    "input_ids": input_ids.cpu().numpy(),
    "attention_mask": attention_mask.cpu().numpy(),
    "token_type_ids": token_type_ids.cpu().numpy(),
}

logits = model(**onnx_input).logits
```

--------------------------------

### Install FastEmbed GPU with CuDNN 9.x

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Install CuDNN 9.x and the GPU-enabled FastEmbed package for environments requiring the latest dependencies.

```python
!sudo apt install cudnn9
!pip install fastembed-gpu -qqq
```

--------------------------------

### Display FastEmbed version

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_vs_HF_Comparison.ipynb

Checks and displays the installed version of the FastEmbed library.

```python
import fastembed

fastembed.__version__
```

--------------------------------

### List Supported ColBERT Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/ColBERT_with_FastEmbed.ipynb

Use this method to see all available ColBERT models supported by FastEmbed. Ensure you have the necessary libraries installed.
```python
from fastembed import LateInteractionTextEmbedding

LateInteractionTextEmbedding.list_supported_models()
```

--------------------------------

### Load Tokenizer for SPLADE

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Loads the tokenizer from Hugging Face for use with SPLADE models. Ensure you have the 'transformers' library installed.

```python
import json

from transformers import AutoTokenizer

from fastembed import SparseTextEmbedding  # needed for list_supported_models() below

tokenizer = AutoTokenizer.from_pretrained(
    SparseTextEmbedding.list_supported_models()[0]["sources"]["hf"]
)
```

--------------------------------

### Install FastEmbed GPU with CuDNN 8.x

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Lock onnxruntime-gpu to version 1.18.0 to maintain compatibility with CuDNN 8.x and CUDA 12.x.

```python
!pip install onnxruntime-gpu==1.18.0 -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -qq
!pip install fastembed-gpu -qqq
```

--------------------------------

### Initialize Environment

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Import required libraries and set random seeds for reproducibility.

```python
import os
import random
import time

import datasets
import numpy as np
import pandas as pd
from qdrant_client import QdrantClient, models

random.seed(37)
np.random.seed(37)
```

--------------------------------

### Initialize Qdrant Client

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Creates an in-memory instance of the Qdrant client.

```python
client = QdrantClient(":memory:")
```

--------------------------------

### Initialize Qdrant Client and Add Documents

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Usage_With_Qdrant.ipynb

Initialize an in-memory Qdrant client and add the prepared documents to a specified collection.
The 'add' method creates the collection if it doesn't exist.

```python
client = QdrantClient(":memory:")
client.add(collection_name="test_collection", documents=documents)
```

--------------------------------

### Usage with Qdrant Client

Source: https://github.com/qdrant/fastembed/blob/main/docs/index.md

Initialize an in-memory Qdrant client, add documents with metadata and IDs, and perform a query. This demonstrates basic data management and retrieval using Qdrant and FastEmbed.

```python
from qdrant_client import QdrantClient

# Initialize the client
client = QdrantClient(":memory:")  # Using an in-process Qdrant

# Prepare your documents, metadata, and IDs
docs = ["Qdrant has Langchain integrations", "Qdrant also has Llama Index integrations"]
metadata = [
    {"source": "Langchain-docs"},
    {"source": "Llama-index-docs"},
]
ids = [42, 2]

client.add(
    collection_name="demo_collection",
    documents=docs,
    metadata=metadata,
    ids=ids
)

search_result = client.query(
    collection_name="demo_collection",
    query_text="This is a query document"
)
print(search_result)
```

--------------------------------

### Prepare benchmark documents

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Create a list of strings for performance testing.

```python
documents: list[str] = list(np.repeat("Demonstrating GPU acceleration in fastembed", 500))
```

--------------------------------

### Import Qdrant Client

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Usage_With_Qdrant.ipynb

Import the QdrantClient class for interacting with the Qdrant database.

```python
from qdrant_client import QdrantClient
```

--------------------------------

### Create Qdrant Collection with Binary Quantization

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Initialize a Qdrant collection configured for binary quantization.
```python
client = QdrantClient(
    # assumes Qdrant is launched at localhost:6333
    prefer_grpc=True,
)

collection_name = "binary-quantization"
client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=n_dim,
        distance=models.Distance.DOT,
        on_disk=True,
    ),
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
)
```

--------------------------------

### Import Libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Import required modules for data handling, Qdrant client, and FastEmbed models.

```python
import json

import numpy as np
import pandas as pd
from datasets import load_dataset
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    NamedSparseVector,
    NamedVector,
    SparseVector,
    PointStruct,
    SearchRequest,
    SparseIndexParams,
    SparseVectorParams,
    VectorParams,
    ScoredPoint,
)
from transformers import AutoTokenizer

import fastembed
from fastembed import SparseEmbedding, SparseTextEmbedding, TextEmbedding
```

--------------------------------

### Initialize Model and Prepare Input Data

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Load the Hugging Face model and tokenizer, and define sample multilingual and English input lists.

```python
hf_model = AutoModel.from_pretrained(model_id)
hf_tokenizer = AutoTokenizer.from_pretrained(model_id)

# The input texts can be in any language, not just English.
# Each input text should start with "query: " or "passage: ", even for non-English texts.
# For tasks other than retrieval, you can simply use the "query: " prefix.
input_texts = [
    "query: how much protein should a female eat",
    "query: 南瓜的家常做法",
    "query: भारत का राष्ट्रीय खेल कौन-सा है?",  # Hindi text
    "query: భారత్ దేశంలో రాష్ట్రపతి ఎవరు?",  # Telugu text
    "query: இந்தியாவின் தேசிய கோப்பை எது?",  # Tamil text
    "query: ಭಾರತದಲ್ಲಿ ರಾಷ್ಟ್ರಪತಿ ಯಾರು?",  # Kannada text
    "query: ഇന്ത്യയുടെ രാഷ്ട്രീയ ഗാനം എന്താണ്?",  # Malayalam text
]

english_texts = [
    "India: Where the Taj Mahal meets spicy curry.",
    "Machine Learning: Turning data into knowledge, one algorithm at a time.",
    "Python: The language that makes programming a piece of cake.",
    "fastembed: Accelerating embeddings for lightning-fast similarity search.",
    "Qdrant: The ultimate tool for high-dimensional indexing and search.",
]
```

--------------------------------

### Import Required Libraries

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Load necessary dependencies for model handling, ONNX optimization, and numerical computation.

```python
from pathlib import Path
from typing import Any

import numpy as np
import time
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
from optimum.onnxruntime import AutoOptimizationConfig, ORTModelForFeatureExtraction, ORTOptimizer
from optimum.pipelines import pipeline
import torch.nn.functional as F
```

--------------------------------

### Initialize ONNX Pipeline

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Create an inference pipeline using the optimized ONNX model.
```python
onnx_quant_embed = pipeline(
    "feature-extraction", model=model, accelerator="ort", tokenizer=tokenizer, return_tensors=True
)
```

--------------------------------

### Get Tokens and Weights from Sparse Embedding

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

This function decodes sparse embedding indices into tokens using a provided tokenizer and returns a dictionary of tokens mapped to their weights, sorted by weight in descending order. It's helpful for analyzing the importance of each token in a document.

```python
def get_tokens_and_weights(sparse_embedding, tokenizer):
    token_weight_dict = {}
    for i in range(len(sparse_embedding.indices)):
        token = tokenizer.decode([sparse_embedding.indices[i]])
        weight = sparse_embedding.values[i]
        token_weight_dict[token] = weight

    # Sort the dictionary by weights
    token_weight_dict = dict(
        sorted(token_weight_dict.items(), key=lambda item: item[1], reverse=True)
    )
    return token_weight_dict


# Test the function with the first SparseEmbedding
print(json.dumps(get_tokens_and_weights(sparse_embeddings_list[index], tokenizer), indent=4))
```

--------------------------------

### Run Inference

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Format the prompt and generate an answer using the LLM.

```python
input_prompt = """
Answer the following question based on the context given after it in the same language as the question:
### Question:
{}

### Context:
{}

### Answer:
{}"""

input_text = input_prompt.format(
    questions[idx],  # question
    search_context_text[:2000],  # context
    "",  # output - leave this blank for generation!
)

inputs = tokenizer([input_text], return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=50, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]
```

--------------------------------

### Model download progress logs

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Console output showing progress for model file downloads.

```text
Fetching 5 files: 0%| | 0/5 [00:00" # Get your token from https://huggingface.co/settings/token, needed for Gemma weights
```

--------------------------------

### Sample query dataset

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Selects a random subset of indices from an existing dataset to create a query set.

```python
query_indices = random.sample(range(len(dataset)), 100)
query_dataset = dataset[query_indices]
query_indices
```

--------------------------------

### Initialize Embedding Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Initialize the sparse and dense embedding models. This process triggers the automatic download of the specified models.

```python
sparse_model = SparseTextEmbedding(model_name=sparse_model_name, batch_size=32)
dense_model = TextEmbedding(model_name=dense_model_name, batch_size=32)
```

--------------------------------

### Import Libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/experimental/Binary Quantization from Scratch.ipynb

Import necessary modules for numerical operations and dataset handling.

```python
import numpy as np
import pandas as pd
from datasets import load_dataset
from tqdm import tqdm
```

--------------------------------

### Load Dataset

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Load the Hindi and Tamil QA dataset from Hugging Face.
```python
ds = load_dataset("nirantk/chaii-hindi-and-tamil-question-answering", split="train")
ds
```

--------------------------------

### Combine Text Fields

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Create a new column by concatenating product title, text, and bullet points.

```python
df["combined_text"] = (
    df["product_title"] + "\n" + df["product_text"] + "\n" + df["product_bullet_point"]
)
```

--------------------------------

### Importing Required Libraries

Source: https://github.com/qdrant/fastembed/blob/main/experiments/Throughput_Across_Models.ipynb

Imports necessary modules for embedding generation, timing, and data visualization.

```python
import time
from typing import Callable

import torch.nn.functional as F
from fastembed import TextEmbedding
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer
```

--------------------------------

### Print First 5 Features and Weights

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

This snippet prints the first 5 features and their corresponding weights from a sparse embedding list. Useful for a quick look at the most prominent terms.

```python
for i in range(5):
    print(
        f"Token at index {sparse_embeddings_list[0].indices[i]} has weight {sparse_embeddings_list[0].values[i]}"
    )
```

--------------------------------

### Initialize FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Initialize the embedding model instance.

```python
embedding_model = TextEmbedding(model_name=embedding_model)
```

--------------------------------

### Troubleshoot CUDA library missing

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Error message indicating the libcublasLt library is missing.
```bash
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.x: cannot open shared object file: No such file or directory
```

--------------------------------

### Retrieve Collection Information

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Fetch and display the configuration and status of the created collection.

```python
collection_info = client.get_collection(collection_name=f"{collection_name}")
collection_info.dict()
```

--------------------------------

### Initialize the SPLADE model

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Sets the model name and initializes the SparseTextEmbedding instance, which triggers the model download.

```python
model_name = "prithvida/Splade_PP_en_v1"
# This triggers the model download
model = SparseTextEmbedding(model_name=model_name)
```

--------------------------------

### Optimize Model for ONNX Runtime

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Export the model to ONNX format and apply optimization configurations.
```python
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)

# Remove all existing files in the save_dir using Path.unlink()
save_dir = Path(save_dir)
save_dir.mkdir(parents=True, exist_ok=True)
for p in save_dir.iterdir():
    p.unlink()

# Load the optimization configuration detailing the optimization we wish to apply
optimization_config = AutoOptimizationConfig.O4()
optimizer = ORTOptimizer.from_pretrained(model)

optimizer.optimize(
    save_dir=save_dir, optimization_config=optimization_config, use_external_data_format=True
)
model = ORTModelForFeatureExtraction.from_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
# model.save_pretrained(save_dir)
# model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
```

--------------------------------

### Load and Convert Embeddings

Source: https://github.com/qdrant/fastembed/blob/main/docs/experimental/Binary Quantization from Scratch.ipynb

Download dataset from Hugging Face, convert continuous vectors to binary, and determine dimensionality.

```python
# Download from Huggingface Hub
ds = load_dataset(
    "Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-100K", split="train"
)
openai_vectors = np.array(ds["text-embedding-3-large-3072-embedding"])
del ds
```

```python
openai_bin = np.zeros_like(openai_vectors, dtype=np.int8)
openai_bin[openai_vectors > 0] = 1
```

```python
n_dim = openai_vectors.shape[1]
n_dim
```

--------------------------------

### Tag GPU branch release

Source: https://github.com/qdrant/fastembed/blob/main/RELEASE.md

Creates a version tag on the gpu branch.

```bash
git checkout gpu
git tag -a v0.1.0-gpu -m "Release v0.1.0"
```

--------------------------------

### Run SPLADE Model with ONNX Runtime

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Loads and runs the SPLADE_PP_en_v1 model using ONNX Runtime.
This involves loading the ORTModelForMaskedLM and tokenizer, preparing inputs in NumPy format, and performing inference.

```python
from optimum.onnxruntime import ORTModelForMaskedLM

model = ORTModelForMaskedLM.from_pretrained("nirantk/SPLADE_PP_en_v1")
tokenizer = AutoTokenizer.from_pretrained("nirantk/SPLADE_PP_en_v1")
```

--------------------------------

### Benchmark Hugging Face Pipeline

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Measure and print the performance of the standard Hugging Face pipeline.

```python
_, _, chars_per_sec = measure_pipeline_time(hf_embed, input_texts=input_texts, model_id=model_id)
print(f"Multilingual Speed: {chars_per_sec:.2f} chars/sec")
_, _, chars_per_sec = measure_pipeline_time(hf_embed, input_texts=english_texts, model_id=model_id)
print(f"English Speed: {chars_per_sec:.2f} chars/sec")
```

--------------------------------

### Benchmark ONNX Pipeline

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Measure and print the performance of the optimized ONNX pipeline.

```python
_, _, chars_per_sec = measure_pipeline_time(onnx_quant_embed, input_texts)
print(f"Multilingual Speed: {chars_per_sec:.2f} chars/sec")
_, _, chars_per_sec = measure_pipeline_time(onnx_quant_embed, english_texts)
print(f"English Speed: {chars_per_sec:.2f} chars/sec")
```

--------------------------------

### Import Libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Import required modules for data handling, embedding, and model inference.
```python
import numpy as np
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance
from transformers import AutoTokenizer

from fastembed import TextEmbedding
```

--------------------------------

### Troubleshoot CuDNN library missing

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Error message indicating the libcudnn library is missing.

```bash
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.x: cannot open shared object file: No such file or directory
```

--------------------------------

### Tag main branch release

Source: https://github.com/qdrant/fastembed/blob/main/RELEASE.md

Creates a version tag on the main branch.

```bash
git checkout main
git tag -a v0.1.0 -m "Release v0.1.0"
```

--------------------------------

### CPU provider result

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Output showing active CPU execution provider.

```text
['CPUExecutionProvider']
```

--------------------------------

### Store Vectors in Qdrant

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Prepare points and upsert them into an in-memory Qdrant collection.
```python
context_points = [
    PointStruct(id=idx, vector=emb, payload={"text": text})
    for idx, (emb, text) in enumerate(zip(context_embeddings, contexts))
]
len(context_points[0].vector)

search_client = QdrantClient(":memory:")

search_client.create_collection(
    collection_name="hindi_tamil_contexts",
    vectors_config=VectorParams(size=len(context_points[0].vector), distance=Distance.COSINE),
)
search_client.upsert(collection_name="hindi_tamil_contexts", points=context_points)
```

--------------------------------

### Run Accuracy Benchmarks

Source: https://github.com/qdrant/fastembed/blob/main/docs/experimental/Binary Quantization from Scratch.ipynb

Execute accuracy tests across different sampling rates and limits.

```python
number_of_samples = 10
limits = [3, 10]
sampling_rate = [1, 2, 3, 5]
results = []


def mean_accuracy(number_of_samples, limit, sampling_rate):
    return np.mean(
        [accuracy(i, limit=limit, oversampling=sampling_rate) for i in range(number_of_samples)]
    )


for i in tqdm(sampling_rate):
    for j in tqdm(limits):
        result = {
            "sampling_rate": i,
            "limit": j,
            "mean_acc": mean_accuracy(number_of_samples, j, i),
        }
        print(result)
        results.append(result)
```

--------------------------------

### Hugging Face Token Initialization

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Initializes a Hugging Face token variable. Replace '' with your actual token.

```python
hf_token = ""
```

--------------------------------

### Import FastEmbed libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Retrieval_with_FastEmbed.ipynb

Import the required classes for embedding generation.

```python
import numpy as np

from fastembed import TextEmbedding
```

--------------------------------

### Import required libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_vs_HF_Comparison.ipynb

Imports libraries for time tracking, type hinting, plotting, PyTorch functions, Huggingface models, and FastEmbed.
```python
import time
from typing import Callable

import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoModel, AutoTokenizer

from fastembed import TextEmbedding
```

--------------------------------

### Run SPLADE Model with PyTorch

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Loads and runs the SPLADE_PP_en_v1 model using PyTorch for generating sparse embeddings. It handles device placement (CUDA or CPU), tokenization, model inference, and transforms the output logits into a sparse vector representation.

```python
# Imports the snippet relies on; `sentences` and `hf_token` are defined elsewhere in the notebook
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Download the model and tokenizer
device = "cuda:0" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("prithivida/Splade_PP_en_v1", token=hf_token)
reverse_voc = {v: k for k, v in tokenizer.vocab.items()}
model = AutoModelForMaskedLM.from_pretrained("prithivida/Splade_PP_en_v1", token=hf_token)
model.to(device)

# Tokenize the input
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {key: val.to(device) for key, val in inputs.items()}

input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
token_type_ids = inputs["token_type_ids"]

# Run model and prepare sparse vector
outputs = model(**inputs)
logits = outputs.logits
print("Output Logits shape: ", logits.shape)
print("Output Attention mask shape: ", attention_mask.shape)
relu_log = torch.log(1 + torch.relu(logits))
weighted_log = relu_log * attention_mask.unsqueeze(-1)
max_val, _ = torch.max(weighted_log, dim=1)
vector = max_val.squeeze()
print("Sparse Vector shape: ", vector.shape)
# print("Number of Actual Dimensions: ", len(cols))
cols = [vec.nonzero().squeeze().cpu().tolist() for vec in vector]
weights = [vec[col].cpu().tolist() for vec, col in zip(vector, cols)]

idx = 1
cols, weights = cols[idx], weights[idx]
# Print the BOW representation
d = {k: v for k, v in zip(cols, weights)}
sorted_d = {k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse=True)}
bow_rep = []
for k, v in sorted_d.items():
    bow_rep.append((reverse_voc[k], round(v, 2)))
print(f"SPLADE BOW rep for sentence:\t{sentences[idx]}\n{bow_rep}")
```

--------------------------------

### Define Benchmarking Utility

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Create a function to measure the performance of embedding pipelines in characters per second.

```python
def measure_pipeline_time(
    pipeline, input_texts: list[str], num_runs=10, **kwargs: Any
) -> tuple[float, float, float]:
    """Measures the time it takes to run the pipeline on the input texts."""
    times = []
    total_chars = sum(len(text) for text in input_texts)

    for _ in range(num_runs):
        start_time = time.time()
        _ = pipeline(inputs=input_texts, **kwargs)
        end_time = time.time()

        times.append(end_time - start_time)

    mean_time = np.mean(times)
    std_dev = np.std(times)
    chars_per_second = total_chars / mean_time

    # Returns three values: mean time, standard deviation, throughput
    return mean_time, std_dev, chars_per_second
```

--------------------------------

### Generate Image Embeddings

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Image_Embedding.ipynb

Initializes the ImageEmbedding model and processes a list of image file paths to generate embeddings.

```python
from fastembed import ImageEmbedding

model = ImageEmbedding("Qdrant/resnet50-onnx")

embeddings_generator = model.embed(
    ["../../tests/misc/image.jpeg", "../../tests/misc/small_image.jpeg"]
)
embeddings_list = list(embeddings_generator)
embeddings_list
```

--------------------------------

### Sparse Text Embedding with SPLADE++

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Initialize and use `SparseTextEmbedding` with the 'prithivida/Splade_PP_en_v1' model to generate sparse embeddings. The output is a list of SparseEmbedding objects.
```python
from fastembed import SparseTextEmbedding

model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
embeddings = list(model.embed(documents))

# [
#   SparseEmbedding(indices=[ 17, 123, 919, ... ], values=[0.71, 0.22, 0.39, ...]),
#   SparseEmbedding(indices=[ 38,  12,  91, ... ], values=[0.11, 0.22, 0.39, ...])
# ]
```

--------------------------------

### Configure CUDA library path

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Add the CUDA library directory to the LD_LIBRARY_PATH environment variable and export it, so that processes started from the shell can locate the CUDA libraries.

```bash
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

--------------------------------

### Define Documents and Queries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/ColBERT_with_FastEmbed.ipynb

Prepare lists of documents and queries for embedding. These are standard Python lists of strings.

```python
documents = [
    "ColBERT is a late interaction text embedding model, however, there are also other models such as TwinBERT.",
    "On the contrary to the late interaction models, the early interaction models contains interaction steps at embedding generation process",
]
queries = [
    "Are there any other late interaction text embedding models except ColBERT?",
    "What is the difference between late interaction and early interaction text embedding models?",
]
```

--------------------------------

### Push tags to remote

Source: https://github.com/qdrant/fastembed/blob/main/RELEASE.md

Pushes all local tags to the remote repository.

```bash
git push --tags
```

--------------------------------

### Rebase GPU branch on main

Source: https://github.com/qdrant/fastembed/blob/main/RELEASE.md

Updates the gpu branch with the latest changes from main.
```bash
git checkout gpu
git rebase main
git push -f origin gpu
```

--------------------------------

### Import FastEmbed classes

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Imports the required classes for working with sparse text embeddings.

```python
from fastembed import SparseTextEmbedding, SparseEmbedding
```

--------------------------------

### Initialize FastEmbed TextEmbedding

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_vs_HF_Comparison.ipynb

Initializes the FastEmbed TextEmbedding model using a specified model name. This uses the default Flag Embedding model.

```python
embedding_model = TextEmbedding(model_name=model_id)
```

--------------------------------

### Hugging Face authentication warning

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Warning output regarding missing HF_TOKEN.

```text
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
```

--------------------------------

### Load Dataset

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Load the dbpedia-entities dataset from HuggingFace.
```python
import datasets

dataset = datasets.load_dataset(
    "Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K", split="train"
)
len(dataset)
```

--------------------------------

### List Supported Image Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Image_Embedding.ipynb

Retrieves a list of all available image embedding models supported by the library.

```python
ImageEmbedding.list_supported_models()
```

--------------------------------

### List supported sparse embedding models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Retrieves and displays all available sparse embedding models supported by the library.

```python
SparseTextEmbedding.list_supported_models()
```

--------------------------------

### CPU benchmark output

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Performance result for CPU embedding.

```text
4.33 s ± 591 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

--------------------------------

### Display Ranked Results

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/ColBERT_with_FastEmbed.ipynb

Prints the query and the corresponding ranked documents.

```python
print(f"Query: {queries[0]}")
for index in sorted_indices:
    print(f"Document: {documents[index]}")
```

--------------------------------

### Load Custom Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/Getting Started.ipynb

Specify a different model identifier to use models other than the default.

```python
multilingual_large_model = TextEmbedding("intfloat/multilingual-e5-large")
```

--------------------------------

### Search Context

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Retrieve relevant context for a selected question.
```python
idx = 997
question = questions[idx]
print(question)
search_context = search_client.search(
    query_vector=embed_text(question), collection_name="hindi_tamil_contexts", limit=2
)
search_context_text = search_context[0].payload["text"]
len(search_context_text)
```

--------------------------------

### Import Libraries for FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Imports necessary libraries including numpy, torch, and components from the transformers library for model operations.

```python
import numpy as np
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
```

--------------------------------

### Upload Data to Collection

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Iterate through the dataset and upload vectors and payloads to the Qdrant collection.

```python
import os


def iter_dataset(dataset):
    # Yield one (vector, payload) pair per dataset row
    for point in dataset:
        yield point["openai"], {"text": point["text"]}


vectors, payload = zip(*iter_dataset(dataset))
client.upload_collection(
    collection_name=collection_name,
    vectors=vectors,
    payload=payload,
    parallel=max(1, (os.cpu_count() // 2)),
)
```

--------------------------------

### Compare query embeddings

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Retrieval_with_FastEmbed.ipynb

Inspect the generated embeddings from query_embed and embed methods.

```python
query_embedding[:5], plain_query_embedding[:5]
```

--------------------------------

### Benchmark CPU performance

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Measure execution time for CPU-based embedding.

```python
%%timeit
list(embedding_model_cpu.embed(documents))
```

--------------------------------

### Define Documents for Embedding

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Usage_With_Qdrant.ipynb

Prepare a list of string documents that will be embedded and stored.
Each string represents a document.

```python
# Example list of documents
documents: list[str] = [
    "Maharana Pratap was a Rajput warrior king from Mewar",
    "He fought against the Mughal Empire led by Akbar",
    "The Battle of Haldighati in 1576 was his most famous battle",
    "He refused to submit to Akbar and continued guerrilla warfare",
    "His capital was Chittorgarh, which he lost to the Mughals",
    "He died in 1597 at the age of 57",
    "Maharana Pratap is considered a symbol of Rajput resistance against foreign rule",
    "His legacy is celebrated in Rajasthan through festivals and monuments",
    "He had 11 wives and 17 sons, including Amar Singh I who succeeded him as ruler of Mewar",
    "His life has been depicted in various films, TV shows, and books",
]
```

--------------------------------

### GPU benchmark output

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Performance result for GPU embedding.

```text
43.4 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

--------------------------------

### Multi-GPU Text Embedding with Fastembed

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_Multi_GPU.ipynb

Initialize a TextEmbedding model to utilize multiple GPUs. Ensure `parallel` matches the number of `device_ids` and `lazy_load=True` to prevent redundant memory usage.
```python
from fastembed import TextEmbedding

# define the documents to embed
docs = ["hello world", "flag embedding"] * 100

# define gpu ids
device_ids = [0, 1]

if __name__ == "__main__":
    # initialize a TextEmbedding model using CUDA
    text_model = TextEmbedding(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        cuda=True,
        device_ids=device_ids,
        lazy_load=True,
    )
    # generate embeddings
    text_embeddings = list(text_model.embed(docs, batch_size=2, parallel=len(device_ids)))
    print(text_embeddings)
```

--------------------------------

### List Supported Image Embedding Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Supported_Models.ipynb

Retrieves and displays a sorted list of supported image embedding models. The output is formatted as a pandas DataFrame, sorted by size in ascending order.

```python
(
    pd.DataFrame(ImageEmbedding.list_supported_models())
    .sort_values("size_in_GB")
    .drop(columns=["sources", "model_file"])
    .reset_index(drop=True)
)
```

--------------------------------

### Inspect Sparse Embedding Tokens and Weights

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Displays the token-weight mapping for a specific sparse embedding.

```python
print(json.dumps(get_tokens_and_weights(sparse_embedding[0], sparse_model_name), indent=4))
```

--------------------------------

### Troubleshoot CUDA path configuration

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Error message indicating a failure to map segments from the shared object.

```text
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcufft.so.x: failed to map segment from shared object
```

--------------------------------

### Compress Directory to Tar.gz

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Compresses a given directory into a .tar.gz file. It checks if an output file already exists and prevents overwriting.
Ensure the directory exists before calling.

```python
import os
from pathlib import Path
import tarfile

save_dir = Path("../local_cache/fast-bge-small-en-v1.5")


def compress(directory_path):
    directory_path = Path(directory_path)
    assert directory_path.exists(), f"{directory_path} does not exist"
    output_filename = directory_path.name + ".tar.gz"
    if Path(output_filename).exists():
        print("We've an output file already? Manually delete that first")
        return output_filename
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(directory_path, arcname=os.path.basename(directory_path))
    return output_filename


compressed_file_name = compress(save_dir)
```

--------------------------------

### Dense Text Embedding with Specific Model

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Initialize TextEmbedding with a specific model name, such as 'BAAI/bge-small-en-v1.5', and generate embeddings. The output is a list of numpy arrays.

```python
from fastembed import TextEmbedding

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = list(model.embed(documents))

# [
#   array([-0.1115,  0.0097,  0.0052,  0.0195, ...], dtype=float32),
#   array([-0.1019,  0.0635, -0.0332,  0.0522, ...], dtype=float32)
# ]
```

--------------------------------

### Execution provider result

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Output showing active execution providers.

```text
['CUDAExecutionProvider', 'CPUExecutionProvider']
```

--------------------------------

### Export PyTorch Model to ONNX

Source: https://github.com/qdrant/fastembed/blob/main/experiments/Example. Convert Resnet50 to ONNX.ipynb

Exports a pre-trained ResNet-50 model (without the final classification layer) to ONNX format. Ensure the 'example.jpg' file exists for input preprocessing. The model is configured for dynamic batch sizes.
```python
import torch
import torch.onnx
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import numpy as np
from tests.config import TEST_MISC_DIR

# Load pre-trained ResNet-50 model
# Note: `pretrained=True` is deprecated in newer torchvision releases in favor of `weights=`
resnet = models.resnet50(pretrained=True)
resnet = torch.nn.Sequential(*(list(resnet.children())[:-1]))  # Remove the last fully connected layer
resnet.eval()

# Define preprocessing transform
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


# Load and preprocess the image
def preprocess_image(image_path):
    input_image = Image.open(image_path)
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0)  # Add batch dimension
    return input_batch


# Example input for exporting
input_image = preprocess_image('example.jpg')

# Export the model to ONNX with dynamic axes
torch.onnx.export(
    resnet,
    input_image,
    "model.onnx",
    export_params=True,
    opset_version=9,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
)
```

--------------------------------

### List Supported Sparse Text Embedding Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Supported_Models.ipynb

Retrieves and displays a sorted list of supported sparse text embedding models. The output is formatted as a pandas DataFrame, sorted by size in ascending order.

```python
(
    pd.DataFrame(SparseTextEmbedding.list_supported_models())
    .sort_values("size_in_GB")
    .drop(columns=["sources", "model_file", "additional_files"])
    .reset_index(drop=True)
)
```

--------------------------------

### Generate Text Embeddings with FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/index.md

Import the TextEmbedding class and use it to generate embeddings for a list of documents. Ensure numpy is imported as np.
```python
import numpy as np
from fastembed import TextEmbedding

documents: list[str] = [
    "passage: Hello, World!",
    "query: Hello, World!",
    "passage: This is an example passage.",
    "fastembed is supported by and maintained by Qdrant."
]
embedding_model = TextEmbedding()
# embed() returns a generator, so materialize it to match the list annotation
embeddings: list[np.ndarray] = list(embedding_model.embed(documents))
```

--------------------------------

### Display Results

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Extract the generated answer and compare it with the ground truth.

```python
response.split(sep="### Answer:")[-1].strip("").strip()
ds[idx]["answer_text"]
```
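The answer-extraction step above can be sketched as a small standalone helper. This is only an illustration: `extract_answer`, its `marker` parameter, and the sample `response` string are hypothetical and do not come from the notebook. The sketch assumes the generated text contains the `### Answer:` marker used in the snippet above.

```python
def extract_answer(response: str, marker: str = "### Answer:") -> str:
    """Return the text after the last occurrence of `marker`, with whitespace stripped."""
    # Hypothetical helper mirroring response.split(sep="### Answer:")[-1].strip()
    return response.split(marker)[-1].strip()


# Made-up response text, used only to illustrate the call
response = "### Question: What is FastEmbed?\n### Answer: An embedding library. \n"
print(extract_answer(response))
```

If the marker is missing, `split` returns the whole string as a single element, so the helper falls back to returning the stripped input rather than raising.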