### GPU Acceleration Setup

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Install the GPU-enabled package and configure the model to use CUDA.

```bash
pip install fastembed-gpu
```

```python
from fastembed import TextEmbedding

embedding_model = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5", 
    providers=["CUDAExecutionProvider"]
)
print("The model BAAI/bge-small-en-v1.5 is ready to use on a GPU.")
```

--------------------------------

### Install Qdrant Client with FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Usage_With_Qdrant.ipynb

Install the necessary Qdrant client library with FastEmbed support. Use --quiet for silent installation.

```python
!pip install 'qdrant-client[fastembed]' --quiet --upgrade
```

--------------------------------

### Setup CUDA 12.x on Ubuntu 22.04

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Commands to install the CUDA toolkit on Ubuntu 22.04 using the NVIDIA repository.

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda
```

--------------------------------

### Install Dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Install the necessary libraries for Qdrant, FastEmbed, datasets, and transformers.

```python
!pip install -qU qdrant-client fastembed datasets transformers
```

--------------------------------

### Install Dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Install the necessary Python packages for Qdrant and data handling.

```python
!pip install qdrant-client pandas datasets --quiet --upgrade
```

--------------------------------

### Install necessary libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_vs_HF_Comparison.ipynb

Installs matplotlib, transformers, and torch. Use -qq for quieter output.

```python
!pip install matplotlib transformers torch -qq
```

--------------------------------

### Install Dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/experimental/Binary Quantization from Scratch.ipynb

Install the required libraries for data processing and visualization.

```python
!pip install matplotlib tqdm pandas numpy datasets --quiet --upgrade
```

--------------------------------

### Setup CuDNN 9.x on Ubuntu 22.04

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Commands to install the CuDNN library on Ubuntu 22.04 using the NVIDIA repository.

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cudnn
```

--------------------------------

### Install FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Install the FastEmbed library using pip. Use 'fastembed-gpu' for GPU support.

```bash
pip install fastembed

# or with GPU support

pip install fastembed-gpu
```

--------------------------------

### Install Dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Install the necessary libraries for the RAG pipeline.

```python
!pip install -U fastembed datasets qdrant-client peft transformers accelerate bitsandbytes -qq
```

--------------------------------

### Install FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/index.md

Install the FastEmbed library using pip. This command fetches and installs the latest version of the package.

```bash
pip install fastembed
```

--------------------------------

### Install FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/Getting Started.ipynb

Install the fastembed package using pip.

```python
!pip install -Uqq fastembed
```

--------------------------------

### Qdrant Integration

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Install the Qdrant client with FastEmbed support and use it to index and search documents.

```bash
pip install qdrant-client[fastembed]
```

```bash
pip install qdrant-client[fastembed-gpu]
```

```python
from qdrant_client import QdrantClient, models

# Initialize the client
client = QdrantClient("localhost", port=6333) # For production
# client = QdrantClient(":memory:") # For experimentation

model_name = "sentence-transformers/all-MiniLM-L6-v2"
payload = [
    {"document": "Qdrant has Langchain integrations", "source": "Langchain-docs", },
    {"document": "Qdrant also has Llama Index integrations", "source": "LlamaIndex-docs"},
]
docs = [models.Document(text=data["document"], model=model_name) for data in payload]
ids = [42, 2]

client.create_collection(
    "demo_collection",
    vectors_config=models.VectorParams(
        size=client.get_embedding_size(model_name), distance=models.Distance.COSINE)
)

client.upload_collection(
    collection_name="demo_collection",
    vectors=docs,
    ids=ids,
    payload=payload,
)

search_result = client.query_points(
    collection_name="demo_collection",
    query=models.Document(text="This is a query document", model=model_name)
).points
print(search_result)
```

--------------------------------

### Install Qdrant Client with FastEmbed Support

Source: https://github.com/qdrant/fastembed/blob/main/docs/index.md

Install the Qdrant client library with FastEmbed integration. This enables seamless use of FastEmbed within Qdrant operations. Note that on zsh, you might need to use quotes around the package name.

```bash
pip install qdrant-client[fastembed]
```

```bash
pip install 'qdrant-client[fastembed]'
```

--------------------------------

### Install FastEmbed dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Installs the necessary library for sparse embedding generation.

```python
# !pip install -q fastembed
```

--------------------------------

### Install FastEmbed dependencies

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Retrieval_with_FastEmbed.ipynb

Install the necessary package to create embeddings and perform retrieval.

```python
# !pip install fastembed --quiet --upgrade
```

--------------------------------

### Initialize CPU embedding model

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Setup for the TextEmbedding model using the default CPU provider.

```python
from fastembed import TextEmbedding

embedding_model_cpu = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embedding_model_cpu.model.model.get_providers()
```

--------------------------------

### Install Pre-Commit Hooks

Source: https://github.com/qdrant/fastembed/blob/main/CONTRIBUTING.md

Install pre-commit hooks to ensure code is linted before committing. Run this command in the project's root directory.

```bash
pre-commit install
```

--------------------------------

### Install FastEmbed GPU for CUDA 11.x

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Install onnxruntime-gpu configured for CUDA 11.x environments.

```python
!pip install onnxruntime-gpu -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/ -qq
```

--------------------------------

### Install FastEmbed GPU package

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Install the GPU-enabled version of FastEmbed. Ensure that standard fastembed is uninstalled first to avoid conflicts.

```python
!pip install fastembed-gpu
```

--------------------------------

### Quickstart Text Embedding Generation

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Initialize the default TextEmbedding model and generate embeddings for a list of documents. The model is downloaded and initialized on first use. The embed method returns a generator.

```python
from fastembed import TextEmbedding


# Example list of documents
documents: list[str] = [
    "This is built to be faster and lighter than other embedding libraries e.g. Transformers, Sentence-Transformers, etc.",
    "fastembed is supported by and maintained by Qdrant.",
]

# This will trigger the model download and initialization
embedding_model = TextEmbedding()
print("The model BAAI/bge-small-en-v1.5 is ready to use.")

embeddings_generator = embedding_model.embed(documents)  # reminder: this is a generator
# You can also convert the generator to a list, and that to a numpy array
embeddings_list = list(embedding_model.embed(documents))
len(embeddings_list[0])  # Vector of 384 dimensions
```
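
Because `embed` returns a generator, no work happens until the result is iterated. The sketch below illustrates that behavior with a hypothetical stand-in function, `fake_embed`, rather than FastEmbed itself; the two-number "vectors" it yields are placeholders, not real embeddings.

```python
from typing import Iterable, Iterator

calls: list[str] = []  # records which documents have actually been processed


def fake_embed(documents: Iterable[str]) -> Iterator[list[float]]:
    """Hypothetical stand-in for an embed method: yields one vector per document."""
    for doc in documents:
        calls.append(doc)
        # Placeholder "vector": character count and word count, not a real embedding.
        yield [float(len(doc)), float(len(doc.split()))]


docs = ["first document", "second, longer document"]

gen = fake_embed(docs)
processed_before = len(calls)  # still 0: nothing runs until iteration starts

vectors = list(gen)            # consuming the generator does the work
processed_after = len(calls)   # now equals len(docs)

leftover = list(gen)           # a generator is exhausted after one pass
```

This is why the quickstart calls `embed` a second time before building the list: a generator that has already been consumed yields nothing more.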

--------------------------------

### Initialize GPU embedding model

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Setup for the TextEmbedding model using the CUDA execution provider.

```python
import numpy as np

from fastembed import TextEmbedding

embedding_model_gpu = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5", providers=["CUDAExecutionProvider"]
)
embedding_model_gpu.model.model.get_providers()
```
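
A GPU provider can only be used when ONNX Runtime reports it as available. The following is a minimal sketch of a fallback choice; `choose_providers` is a hypothetical helper, not part of FastEmbed, and the list passed to it is assumed to come from something like `onnxruntime.get_available_providers()`.

```python
def choose_providers(available: list[str]) -> list[str]:
    """Prefer CUDA when it is reported as available, otherwise fall back to CPU.

    Hypothetical helper for illustration; not part of FastEmbed.
    """
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [name for name in preferred if name in available]
    # Always return at least the CPU provider so model creation can proceed.
    return chosen or ["CPUExecutionProvider"]


gpu_machine = choose_providers(
    ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
)
cpu_machine = choose_providers(["CPUExecutionProvider"])
```

The returned list has the same shape as the `providers` argument shown above, so it could be passed there directly.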

--------------------------------

### Get FastEmbed Version

Source: https://github.com/qdrant/fastembed/blob/main/CONTRIBUTING.md

Run this command to get the exact version of FastEmbed you are using. This is useful for bug reports.

```python
import fastembed
print(fastembed.__version__)
```
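
For a bug report it can also help to read the version from the installed package metadata, which works without importing the library. A small sketch using the standard library follows; `installed_version` is a hypothetical helper name, and checking both `fastembed` and `fastembed-gpu` is an assumption about which distributions might be present.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("fastembed", "fastembed-gpu"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```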

--------------------------------

### Check FastEmbed Version

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Verify the installed version of the FastEmbed library.

```python
fastembed.__version__  # 0.2.5
```

--------------------------------

### Prepare ONNX Inputs and Run Inference

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Prepares the input data in NumPy format for the ONNX model and runs the inference to get the logits.

```python
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {key: val.to(device) for key, val in inputs.items()}
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
token_type_ids = inputs["token_type_ids"]

onnx_input = {
    "input_ids": input_ids.cpu().numpy(),
    "attention_mask": attention_mask.cpu().numpy(),
    "token_type_ids": token_type_ids.cpu().numpy(),
}

logits = model(**onnx_input).logits
```

--------------------------------

### Install FastEmbed GPU with CuDNN 9.x

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Install CuDNN 9.x and the GPU-enabled FastEmbed package for environments requiring the latest dependencies.

```python
!sudo apt install cudnn9
!pip install fastembed-gpu -qqq
```

--------------------------------

### Display FastEmbed version

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_vs_HF_Comparison.ipynb

Checks and displays the installed version of the FastEmbed library.

```python
import fastembed

fastembed.__version__
```

--------------------------------

### List Supported ColBERT Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/ColBERT_with_FastEmbed.ipynb

Use this method to see all available ColBERT models supported by FastEmbed. Ensure you have the necessary libraries installed.

```python
from fastembed import LateInteractionTextEmbedding

LateInteractionTextEmbedding.list_supported_models()
```

--------------------------------

### Load Tokenizer for SPLADE

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Loads the tokenizer from Hugging Face for use with SPLADE models. Ensure you have the 'transformers' library installed.

```python
import json

from fastembed import SparseTextEmbedding
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    SparseTextEmbedding.list_supported_models()[0]["sources"]["hf"]
)
```

--------------------------------

### Install FastEmbed GPU with CuDNN 8.x

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Lock onnxruntime-gpu to version 1.18.0 to maintain compatibility with CuDNN 8.x and CUDA 12.x.

```python
!pip install onnxruntime-gpu==1.18.0 -i https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/ -qq
!pip install fastembed-gpu -qqq
```

--------------------------------

### Initialize Environment

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Import required libraries and set random seeds for reproducibility.

```python
import os
import random
import time

import datasets
import numpy as np
import pandas as pd
from qdrant_client import QdrantClient, models

random.seed(37)
np.random.seed(37)
```

--------------------------------

### Initialize Qdrant Client

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Creates an in-memory instance of the Qdrant client.

```python
client = QdrantClient(":memory:")
```

--------------------------------

### Initialize Qdrant Client and Add Documents

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Usage_With_Qdrant.ipynb

Initialize an in-memory Qdrant client and add the prepared documents to a specified collection. The 'add' method creates the collection if it doesn't exist.

```python
client = QdrantClient(":memory:")
client.add(collection_name="test_collection", documents=documents)
```

--------------------------------

### Usage with Qdrant Client

Source: https://github.com/qdrant/fastembed/blob/main/docs/index.md

Initialize an in-memory Qdrant client, add documents with metadata and IDs, and perform a query. This demonstrates basic data management and retrieval using Qdrant and FastEmbed.

```python
from qdrant_client import QdrantClient

# Initialize the client
client = QdrantClient(":memory:")  # Using an in-process Qdrant

# Prepare your documents, metadata, and IDs
docs = ["Qdrant has Langchain integrations", "Qdrant also has Llama Index integrations"]
metadata = [
    {"source": "Langchain-docs"},
    {"source": "Llama-index-docs"},
]
ids = [42, 2]

client.add(
    collection_name="demo_collection",
    documents=docs,
    metadata=metadata,
    ids=ids
)

search_result = client.query(
    collection_name="demo_collection",
    query_text="This is a query document"
)
print(search_result)
```

--------------------------------

### Prepare benchmark documents

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Create a list of strings for performance testing.

```python
documents: list[str] = list(np.repeat("Demonstrating GPU acceleration in fastembed", 500))
```

--------------------------------

### Import Qdrant Client

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Usage_With_Qdrant.ipynb

Import the QdrantClient class for interacting with the Qdrant database.

```python
from qdrant_client import QdrantClient
```

--------------------------------

### Create Qdrant Collection with Binary Quantization

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Initialize a Qdrant collection configured for binary quantization.

```python
client = QdrantClient(  # assumes Qdrant is launched at localhost:6333
    prefer_grpc=True,
)

collection_name = "binary-quantization"

client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=n_dim,
        distance=models.Distance.DOT,
        on_disk=True,
    ),
    quantization_config=models.BinaryQuantization(
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
)
```

--------------------------------

### Import Libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Import required modules for data handling, Qdrant client, and FastEmbed models.

```python
import json

import numpy as np
import pandas as pd
from datasets import load_dataset
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    NamedSparseVector,
    NamedVector,
    SparseVector,
    PointStruct,
    SearchRequest,
    SparseIndexParams,
    SparseVectorParams,
    VectorParams,
    ScoredPoint,
)
from transformers import AutoTokenizer

import fastembed
from fastembed import SparseEmbedding, SparseTextEmbedding, TextEmbedding
```

--------------------------------

### Initialize Model and Prepare Input Data

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Load the Hugging Face model and tokenizer, and define sample multilingual and English input lists.

```python
hf_model = AutoModel.from_pretrained(model_id)
hf_tokenizer = AutoTokenizer.from_pretrained(model_id)

# The input texts can be in any language, not just English.
# Each input text should start with "query: " or "passage: ", even for non-English texts.
# For tasks other than retrieval, you can simply use the "query: " prefix.
input_texts = [
    "query: how much protein should a female eat",
    "query: 南瓜的家常做法",
    "query: भारत का राष्ट्रीय खेल कौन-सा है?",  # Hindi text
    "query: భారత్ దేశంలో రాష్ట్రపతి ఎవరు?",  # Telugu text
    "query: இந்தியாவின் தேசிய கோப்பை எது?",  # Tamil text
    "query: ಭಾರತದಲ್ಲಿ ರಾಷ್ಟ್ರಪತಿ ಯಾರು?",  # Kannada text
    "query: ഇന്ത്യയുടെ രാഷ്ട്രീയ ഗാനം എന്താണ്?",  # Malayalam text
]

english_texts = [
    "India: Where the Taj Mahal meets spicy curry.",
    "Machine Learning: Turning data into knowledge, one algorithm at a time.",
    "Python: The language that makes programming a piece of cake.",
    "fastembed: Accelerating embeddings for lightning-fast similarity search.",
    "Qdrant: The ultimate tool for high-dimensional indexing and search.",
]
```
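
The prefix rule in the comments above can be applied programmatically. The helper below is a hypothetical sketch (`with_prefix` is not part of any library) that adds the prefix only when it is missing.

```python
def with_prefix(texts: list[str], kind: str = "query") -> list[str]:
    """Prepend 'query: ' or 'passage: ' unless the text already carries that prefix."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    prefix = f"{kind}: "
    return [t if t.startswith(prefix) else prefix + t for t in texts]


queries = with_prefix(["how much protein should a female eat", "query: 南瓜的家常做法"])
passages = with_prefix(["Qdrant is a vector database."], kind="passage")
```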

--------------------------------

### Import Required Libraries

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Load necessary dependencies for model handling, ONNX optimization, and numerical computation.

```python
from pathlib import Path
from typing import Any

import numpy as np
import time
from torch import Tensor
from transformers import AutoTokenizer, AutoModel

from optimum.onnxruntime import AutoOptimizationConfig, ORTModelForFeatureExtraction, ORTOptimizer
from optimum.pipelines import pipeline
import torch.nn.functional as F
```

--------------------------------

### Initialize ONNX Pipeline

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Create an inference pipeline using the optimized ONNX model.

```python
onnx_quant_embed = pipeline(
    "feature-extraction", model=model, accelerator="ort", tokenizer=tokenizer, return_tensors=True
)
```

--------------------------------

### Get Tokens and Weights from Sparse Embedding

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

This function decodes sparse embedding indices into tokens using a provided tokenizer and returns a dictionary of tokens mapped to their weights, sorted by weight in descending order. It's helpful for analyzing the importance of each token in a document.

```python
def get_tokens_and_weights(sparse_embedding, tokenizer):
    token_weight_dict = {}
    for i in range(len(sparse_embedding.indices)):
        token = tokenizer.decode([sparse_embedding.indices[i]])
        weight = sparse_embedding.values[i]
        token_weight_dict[token] = weight

    # Sort the dictionary by weights
    token_weight_dict = dict(
        sorted(token_weight_dict.items(), key=lambda item: item[1], reverse=True)
    )
    return token_weight_dict


# Test the function with the first SparseEmbedding
print(json.dumps(get_tokens_and_weights(sparse_embeddings_list[index], tokenizer), indent=4))
```
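
To try the same idea without downloading a model, the sketch below uses hypothetical stand-ins: `ToySparseEmbedding` mimics the `.indices` and `.values` attributes, and `ToyTokenizer` has a made-up three-word vocabulary. The function is a `zip`-based variant of the one above, not a replacement for it.

```python
import json
from dataclasses import dataclass


@dataclass
class ToySparseEmbedding:
    """Hypothetical stand-in exposing .indices and .values."""
    indices: list[int]
    values: list[float]


class ToyTokenizer:
    """Hypothetical tokenizer with a three-word vocabulary."""
    vocab = {0: "vector", 1: "search", 2: "engine"}

    def decode(self, ids: list[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


def tokens_by_weight(sparse_embedding, tokenizer) -> dict[str, float]:
    """Map each decoded token to its weight, highest weight first."""
    pairs = (
        (tokenizer.decode([idx]), float(weight))
        for idx, weight in zip(sparse_embedding.indices, sparse_embedding.values)
    )
    return dict(sorted(pairs, key=lambda item: item[1], reverse=True))


toy = ToySparseEmbedding(indices=[2, 0, 1], values=[0.4, 1.7, 0.9])
ranked = tokens_by_weight(toy, ToyTokenizer())
print(json.dumps(ranked, indent=4))
```

Casting each weight to `float` keeps the result JSON-serializable when the weights are NumPy scalars, which `json.dumps` does not accept.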

--------------------------------

### Run Inference

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Format the prompt and generate an answer using the LLM.

```python
input_prompt = """
Answer the following question based on the context given after it in the same language as the question:
### Question:
{}

### Context:
{}

### Answer:
{}"""

input_text = input_prompt.format(
    questions[idx],  # question
    search_context_text[:2000],  # context
    "",  # output - leave this blank for generation!
)

inputs = tokenizer([input_text], return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=50, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]
```
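
The prompt-building step can be isolated and checked without loading the LLM. The sketch below uses named placeholders and leaves the answer slot empty; `build_prompt`, `PROMPT_TEMPLATE`, and the sample question are hypothetical, and the 2000-character budget mirrors the slice used above.

```python
PROMPT_TEMPLATE = """
Answer the following question based on the context given after it in the same language as the question:
### Question:
{question}

### Context:
{context}

### Answer:
"""


def build_prompt(question: str, context: str, max_context_chars: int = 2000) -> str:
    """Fill the template, truncating the context to a fixed character budget."""
    return PROMPT_TEMPLATE.format(question=question, context=context[:max_context_chars])


prompt = build_prompt("What is Qdrant?", "Qdrant is a vector search engine. " * 200)
```

The truncation here counts characters, not tokens, so the number of tokens the model receives still depends on the tokenizer and the language of the text.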

--------------------------------

### Model download progress logs

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Console output showing progress for model file downloads.

```text
Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]
```

```text
tokenizer_config.json:   0%|          | 0.00/1.24k [00:00<?, ?B/s]
```

```text
config.json:   0%|          | 0.00/706 [00:00<?, ?B/s]
```

```text
special_tokens_map.json:   0%|          | 0.00/695 [00:00<?, ?B/s]
```

```text
tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]
```

```text
model_optimized.onnx:   0%|          | 0.00/66.5M [00:00<?, ?B/s]
```

--------------------------------

### Create Qdrant Collection with Dense and Sparse Vectors

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Use this to create a Qdrant collection supporting both dense and sparse vector types. Configure vector sizes and distance metrics as needed.

```python
collection_name = "esci"
client.create_collection(
    collection_name,
    vectors_config={
        "text-dense": VectorParams(
            size=1024,  # must match the output dimension of the dense embedding model
            distance=Distance.COSINE,
        )
    },
    sparse_vectors_config={
        "text-sparse": SparseVectorParams(
            index=SparseIndexParams(
                on_disk=False,
            )
        )
    },
)
```

--------------------------------

### Load Navarasa LLM

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Download and load the Navarasa LLM using PEFT.

```python
model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=False,
    token=hf_token,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

--------------------------------

### Configure Authentication

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Set the Hugging Face token required for downloading model weights.

```python
hf_token = "<YOUR_HF_TOKEN_HERE>"  # Get your token from https://huggingface.co/settings/token, needed for Gemma weights
```

--------------------------------

### Sample query dataset

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Selects a random subset of indices from an existing dataset to create a query set.

```python
query_indices = random.sample(range(len(dataset)), 100)
query_dataset = dataset[query_indices]
query_indices
```
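
The sample above is only repeatable if the random state is fixed beforehand. A minimal sketch of a self-contained alternative follows; `sample_indices` is a hypothetical helper, the dataset size of 100,000 is made up, and the default seed reuses the value 37 from the environment setup.

```python
import random


def sample_indices(n_items: int, k: int, seed: int = 37) -> list[int]:
    """Draw k distinct indices from range(n_items) using a private, seeded RNG."""
    rng = random.Random(seed)  # local generator: leaves the global random state untouched
    return rng.sample(range(n_items), k)


first = sample_indices(100_000, 100)
second = sample_indices(100_000, 100)  # identical to `first` because the seed is the same
```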

--------------------------------

### Initialize Embedding Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Initialize the sparse and dense embedding models. This process triggers the automatic download of the specified models.

```python
sparse_model = SparseTextEmbedding(model_name=sparse_model_name, batch_size=32)
dense_model = TextEmbedding(model_name=dense_model_name, batch_size=32)
```

--------------------------------

### Import Libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/experimental/Binary Quantization from Scratch.ipynb

Import necessary modules for numerical operations and dataset handling.

```python
import numpy as np
import pandas as pd
from datasets import load_dataset
from tqdm import tqdm
```

--------------------------------

### Load Dataset

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Load the Hindi and Tamil QA dataset from Hugging Face.

```python
ds = load_dataset("nirantk/chaii-hindi-and-tamil-question-answering", split="train")
ds
```

--------------------------------

### Combine Text Fields

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Create a new column by concatenating product title, text, and bullet points.

```python
df["combined_text"] = (
    df["product_title"] + "\n" + df["product_text"] + "\n" + df["product_bullet_point"]
)
```
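
One caveat with `+` on pandas string columns is that a missing value in any one column makes the whole combined value missing. The plain-Python sketch below shows one way to join fields while skipping missing ones; `combine_fields` and the sample row are hypothetical.

```python
import math
from typing import Optional, Sequence


def combine_fields(fields: Sequence[Optional[object]], sep: str = "\n") -> str:
    """Join text fields, skipping None, NaN, and empty strings."""
    kept = []
    for value in fields:
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue  # pandas represents missing text as NaN
        text = str(value).strip()
        if text:
            kept.append(text)
    return sep.join(kept)


row = {
    "product_title": "Steel bottle",
    "product_text": float("nan"),
    "product_bullet_point": "Keeps drinks cold",
}
combined = combine_fields(
    [row["product_title"], row["product_text"], row["product_bullet_point"]]
)
```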

--------------------------------

### Importing Required Libraries

Source: https://github.com/qdrant/fastembed/blob/main/experiments/Throughput_Across_Models.ipynb

Imports necessary modules for embedding generation, timing, and data visualization.

```python
import time
from typing import Callable

import torch.nn.functional as F
from fastembed import TextEmbedding
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer
```

--------------------------------

### Print First 5 Features and Weights

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

This snippet prints the first 5 features and their corresponding weights from a sparse embedding list. Useful for a quick look at the most prominent terms.

```python
for i in range(5):
    print(
        f"Token at index {sparse_embeddings_list[0].indices[i]} has weight {sparse_embeddings_list[0].values[i]}"
    )
```

--------------------------------

### Initialize FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Initialize the embedding model instance.

```python
embedding_model = TextEmbedding(model_name=embedding_model)
```

--------------------------------

### Troubleshoot CUDA library missing

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Error message indicating the libcublasLt library is missing.

```bash
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.x: cannot open shared object file: No such file or directory
```
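
Before reinstalling anything, it can be useful to check whether the system can locate the library at all. The sketch below uses `ctypes.util.find_library` from the standard library; `locate_libraries` is a hypothetical helper, and including `cudnn` alongside `cublasLt` is an assumption about which libraries are worth checking.

```python
from ctypes.util import find_library
from typing import Optional


def locate_libraries(
    names: tuple[str, ...] = ("cublasLt", "cudnn"),
) -> dict[str, Optional[str]]:
    """Map each library name to what the system library search finds, or None."""
    return {name: find_library(name) for name in names}


for name, found in locate_libraries().items():
    print(f"lib{name}: {found or 'not found on the default search path'}")
```

A `None` result is a hint, not proof: search behavior differs between platforms, so a library installed in a non-standard location may still be missed.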

--------------------------------

### Retrieve Collection Information

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Fetch and display the configuration and status of the created collection.

```python
collection_info = client.get_collection(collection_name=f"{collection_name}")
collection_info.dict()
```

--------------------------------

### Initialize the SPLADE model

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Sets the model name and initializes the SparseTextEmbedding instance, which triggers the model download.

```python
model_name = "prithvida/Splade_PP_en_v1"
# This triggers the model download
model = SparseTextEmbedding(model_name=model_name)
```

--------------------------------

### Optimize Model for ONNX Runtime

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Export the model to ONNX format and apply optimization configurations.

```python
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)

# Remove all existing files in the save_dir using Path.unlink()
save_dir = Path(save_dir)
save_dir.mkdir(parents=True, exist_ok=True)
for p in save_dir.iterdir():
    p.unlink()

# Load the optimization configuration detailing the optimization we wish to apply
optimization_config = AutoOptimizationConfig.O4()
optimizer = ORTOptimizer.from_pretrained(model)

optimizer.optimize(
    save_dir=save_dir, optimization_config=optimization_config, use_external_data_format=True
)
model = ORTModelForFeatureExtraction.from_pretrained(save_dir)

tokenizer.save_pretrained(save_dir)
# model.save_pretrained(save_dir)
# model.push_to_hub("new_path_for_directory", repository_id="my-onnx-repo", use_auth_token=True)
```
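
The cleanup loop above relies on `Path.unlink()`, which removes files but raises an error on a subdirectory. A sketch of a variant that handles both follows; `clear_directory` is a hypothetical helper, and the file names are placeholders created in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def clear_directory(directory: Path) -> None:
    """Create the directory if needed, then delete everything inside it."""
    directory.mkdir(parents=True, exist_ok=True)
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # Path.unlink() cannot remove directories
        else:
            entry.unlink()


# Demonstration in a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "onnx_out"
    (target / "nested").mkdir(parents=True)
    (target / "model.onnx").write_bytes(b"placeholder")
    (target / "nested" / "weights.bin").write_bytes(b"placeholder")

    clear_directory(target)
    remaining = list(target.iterdir())  # empty after cleanup
```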

--------------------------------

### Load and Convert Embeddings

Source: https://github.com/qdrant/fastembed/blob/main/docs/experimental/Binary Quantization from Scratch.ipynb

Download dataset from Hugging Face, convert continuous vectors to binary, and determine dimensionality.

```python
# Download from Huggingface Hub
ds = load_dataset(
    "Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-100K", split="train"
)
openai_vectors = np.array(ds["text-embedding-3-large-3072-embedding"])
del ds
```

```python
openai_bin = np.zeros_like(openai_vectors, dtype=np.int8)
openai_bin[openai_vectors > 0] = 1
```

```python
n_dim = openai_vectors.shape[1]
n_dim
```
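
The thresholding rule used above (a component becomes 1 when it is greater than 0) can be checked by hand on tiny vectors. The sketch below is plain Python rather than NumPy; comparing the results with Hamming distance is an assumption, since this section does not show the comparison step.

```python
def binarize(vector: list[float]) -> list[int]:
    """Map each component to 1 if it is strictly positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def hamming_distance(a: list[int], b: list[int]) -> int:
    """Count the positions where two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


v1 = [0.12, -0.40, 0.00, 0.93]
v2 = [0.08, 0.35, -0.21, 0.77]

b1 = binarize(v1)  # [1, 0, 0, 1]
b2 = binarize(v2)  # [1, 1, 0, 1]
distance = hamming_distance(b1, b2)  # the vectors differ only in the second position
```

Note that an exact zero maps to 0 under this rule, because the comparison is strict.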

--------------------------------

### Tag GPU branch release

Source: https://github.com/qdrant/fastembed/blob/main/RELEASE.md

Creates a version tag on the gpu branch.

```bash
git checkout gpu
git tag -a v0.1.0-gpu -m "Release v0.1.0"
```

--------------------------------

### Run SPLADE Model with ONNX Runtime

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Loads the SPLADE_PP_en_v1 model as an ORTModelForMaskedLM, together with its tokenizer, so that it can be run with ONNX Runtime.

```python
from optimum.onnxruntime import ORTModelForMaskedLM
from transformers import AutoTokenizer

model = ORTModelForMaskedLM.from_pretrained("nirantk/SPLADE_PP_en_v1")
tokenizer = AutoTokenizer.from_pretrained("nirantk/SPLADE_PP_en_v1")
```

--------------------------------

### Benchmark Hugging Face Pipeline

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Measure and print the performance of the standard Hugging Face pipeline.

```python
_, _, chars_per_sec = measure_pipeline_time(hf_embed, input_texts=input_texts, model_id=model_id)
print(f"Multilingual Speed: {chars_per_sec:.2f} chars/sec")
_, _, chars_per_sec = measure_pipeline_time(hf_embed, input_texts=english_texts, model_id=model_id)
print(f"English Speed: {chars_per_sec:.2f} chars/sec")
```

--------------------------------

### Benchmark ONNX Pipeline

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Measure and print the performance of the optimized ONNX pipeline.

```python
_, _, chars_per_sec = measure_pipeline_time(onnx_quant_embed, input_texts)
print(f"Multilingual Speed: {chars_per_sec:.2f} chars/sec")
_, _, chars_per_sec = measure_pipeline_time(onnx_quant_embed, english_texts)
print(f"English Speed: {chars_per_sec:.2f} chars/sec")
```
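
The `measure_pipeline_time` helper is not defined in this section. As a rough illustration of how a characters-per-second figure might be computed, here is a sketch under stated assumptions: `measure_chars_per_sec` and `dummy_embed` are hypothetical names, and the returned tuple layout is a guess, not the notebook's actual return value.

```python
import time
from typing import Callable, Sequence


def measure_chars_per_sec(
    embed_fn: Callable[[Sequence[str]], object], texts: Sequence[str]
) -> tuple[float, int, float]:
    """Time one call to embed_fn and return (seconds, total_chars, chars_per_sec)."""
    total_chars = sum(len(t) for t in texts)
    start = time.perf_counter()
    embed_fn(texts)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from very fast calls on coarse clocks.
    chars_per_sec = total_chars / elapsed if elapsed > 0 else float("inf")
    return elapsed, total_chars, chars_per_sec


def dummy_embed(texts: Sequence[str]) -> list[list[float]]:
    """Hypothetical stand-in for a real embedding pipeline."""
    return [[float(len(t))] for t in texts]


elapsed, total_chars, chars_per_sec = measure_chars_per_sec(dummy_embed, ["fast", "embed"])
print(f"Speed: {chars_per_sec:.2f} chars/sec")
```

If the function being timed returns a lazy generator, it should be wrapped so the results are fully consumed; otherwise the measurement covers only the creation of the generator.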

--------------------------------

### Import Libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Import required modules for data handling, embedding, and model inference.

```python
import numpy as np
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance
from transformers import AutoTokenizer

from fastembed import TextEmbedding
```

--------------------------------

### Troubleshoot CuDNN library missing

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Error message indicating the libcudnn library is missing.

```bash
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.x: cannot open shared object file: No such file or directory
```

--------------------------------

### Tag main branch release

Source: https://github.com/qdrant/fastembed/blob/main/RELEASE.md

Creates a version tag on the main branch.

```bash
git checkout main
git tag -a v0.1.0 -m "Release v0.1.0"
```

--------------------------------

### CPU provider result

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Output showing active CPU execution provider.

```text
['CPUExecutionProvider']
```

--------------------------------

### Store Vectors in Qdrant

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Prepare points and upsert them into an in-memory Qdrant collection.

```python
context_points = [
    PointStruct(id=idx, vector=emb, payload={"text": text})
    for idx, (emb, text) in enumerate(zip(context_embeddings, contexts))
]
len(context_points[0].vector)
search_client = QdrantClient(":memory:")

search_client.create_collection(
    collection_name="hindi_tamil_contexts",
    vectors_config=VectorParams(size=len(context_points[0].vector), distance=Distance.COSINE),
)
search_client.upsert(collection_name="hindi_tamil_contexts", points=context_points)
```

--------------------------------

### Run Accuracy Benchmarks

Source: https://github.com/qdrant/fastembed/blob/main/docs/experimental/Binary Quantization from Scratch.ipynb

Execute accuracy tests across different sampling rates and limits.

```python
number_of_samples = 10
limits = [3, 10]
sampling_rate = [1, 2, 3, 5]
results = []


def mean_accuracy(number_of_samples, limit, sampling_rate):
    return np.mean(
        [accuracy(i, limit=limit, oversampling=sampling_rate) for i in range(number_of_samples)]
    )


for i in tqdm(sampling_rate):
    for j in tqdm(limits):
        result = {
            "sampling_rate": i,
            "limit": j,
            "mean_acc": mean_accuracy(number_of_samples, j, i),
        }
        print(result)
        results.append(result)
```
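
The `accuracy` helper called above is not shown in this snippet. As a rough sketch of the kind of metric such a benchmark could use, the hypothetical function below measures how many of the exact-search hits also appear in an approximate result. The function name and the toy ids are assumptions, not the notebook's implementation.

```python
def overlap_accuracy(exact_ids: list[int], approx_ids: list[int]) -> float:
    """Fraction of exact-search hits that also appear in the approximate result."""
    if not exact_ids:
        return 0.0
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


# Toy example: 2 of the 3 exact neighbours were recovered.
toy_accuracy = overlap_accuracy([1, 2, 3], [3, 1, 7])
print(f"{toy_accuracy:.2f}")  # 0.67
```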

--------------------------------

### Hugging Face Token Initialization

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Initializes a Hugging Face token variable. Replace '<your_hf_token_here>' with your actual token.

```python
hf_token = "<your_hf_token_here>"
```

--------------------------------

### Import FastEmbed libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Retrieval_with_FastEmbed.ipynb

Import the required classes for embedding generation.

```python
import numpy as np
from fastembed import TextEmbedding
```

--------------------------------

### Import required libraries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_vs_HF_Comparison.ipynb

Imports libraries for time tracking, type hinting, plotting, PyTorch functions, Hugging Face models, and FastEmbed.

```python
import time
from typing import Callable

import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoModel, AutoTokenizer

from fastembed import TextEmbedding
```

--------------------------------

### Run SPLADE Model with PyTorch

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Loads and runs the SPLADE_PP_en_v1 model using PyTorch for generating sparse embeddings. It handles device placement (CUDA or CPU), tokenization, model inference, and transforms the output logits into a sparse vector representation.

```python
# Download the model and tokenizer
device = "cuda:0" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("prithivida/Splade_PP_en_v1", token=hf_token)
reverse_voc = {v: k for k, v in tokenizer.vocab.items()}
model = AutoModelForMaskedLM.from_pretrained("prithivida/Splade_PP_en_v1", token=hf_token)
model.to(device)

# Tokenize the input
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {key: val.to(device) for key, val in inputs.items()}
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
token_type_ids = inputs["token_type_ids"]

# Run model and prepare sparse vector
outputs = model(**inputs)
logits = outputs.logits
print("Output Logits shape: ", logits.shape)
print("Output Attention mask shape: ", attention_mask.shape)
relu_log = torch.log(1 + torch.relu(logits))
weighted_log = relu_log * attention_mask.unsqueeze(-1)
max_val, _ = torch.max(weighted_log, dim=1)
vector = max_val.squeeze()
print("Sparse Vector shape: ", vector.shape)
# print("Number of Actual Dimensions: ", len(cols))
cols = [vec.nonzero().squeeze().cpu().tolist() for vec in vector]
weights = [vec[col].cpu().tolist() for vec, col in zip(vector, cols)]

idx = 1
cols, weights = cols[idx], weights[idx]
# Print the BOW representation
d = {k: v for k, v in zip(cols, weights)}
sorted_d = {k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse=True)}
bow_rep = []
for k, v in sorted_d.items():
    bow_rep.append((reverse_voc[k], round(v, 2)))
print(f"SPLADE BOW rep for sentence:\t{sentences[idx]}\n{bow_rep}")
```
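
The pooling step in the block above (`log(1 + relu(logits))`, masking, then a max over the sequence axis) can be reproduced on a toy array without downloading a model. The sketch below uses NumPy in place of PyTorch; the function name `splade_pool` and the input values are made up for illustration.

```python
import numpy as np


def splade_pool(logits: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Collapse (batch, seq_len, vocab) logits into (batch, vocab) term weights."""
    # log(1 + ReLU(x)) zeroes out negative logits and dampens large ones.
    saturated = np.log1p(np.maximum(logits, 0.0))
    # Zero out padding positions so they cannot win the max.
    masked = saturated * attention_mask[..., np.newaxis]
    # Max over the sequence axis keeps the strongest activation per vocabulary term.
    return masked.max(axis=1)


# Toy input: 1 sentence, 2 token positions (the second is padding), 3 vocabulary terms.
logits = np.array([[[1.0, -2.0, 0.0], [5.0, 5.0, 5.0]]])
mask = np.array([[1, 0]])
weights = splade_pool(logits, mask)
print(weights.shape)  # (1, 3)
print(np.nonzero(weights[0])[0])  # [0]
```

Only vocabulary index 0 survives: the negative logit is clipped by the ReLU, and the large values at the padded position are removed by the mask.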

--------------------------------

### Define Benchmarking Utility

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Create a function to measure the performance of embedding pipelines in characters per second.

```python
import time
from typing import Any

import numpy as np


def measure_pipeline_time(
    pipeline, input_texts: list[str], num_runs=10, **kwargs: Any
) -> tuple[float, float, float]:
    """Measures the time it takes to run the pipeline on the input texts."""
    times = []
    total_chars = sum(len(text) for text in input_texts)
    for _ in range(num_runs):
        start_time = time.time()
        _ = pipeline(inputs=input_texts, **kwargs)
        end_time = time.time()
        times.append(end_time - start_time)

    mean_time = np.mean(times)
    std_dev = np.std(times)
    chars_per_second = total_chars / mean_time
    return mean_time, std_dev, chars_per_second
```
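
To try the same timing pattern without a real model, a stand-in pipeline is enough. The sketch below is self-contained: `fake_pipeline` and `chars_per_second` are hypothetical names, and it uses `time.perf_counter` instead of `time.time` because the former is intended for measuring short intervals.

```python
import time


def fake_pipeline(inputs: list[str]) -> list[int]:
    """Hypothetical stand-in for an embedding pipeline; returns text lengths."""
    time.sleep(0.001)  # simulate a small, fixed amount of work
    return [len(text) for text in inputs]


def chars_per_second(pipeline, input_texts: list[str], num_runs: int = 3) -> float:
    """Average throughput of `pipeline` in characters per second."""
    total_chars = sum(len(text) for text in input_texts)
    durations = []
    for _ in range(num_runs):
        start = time.perf_counter()
        pipeline(inputs=input_texts)
        durations.append(time.perf_counter() - start)
    return total_chars / (sum(durations) / len(durations))


throughput = chars_per_second(fake_pipeline, ["hello world", "flag embedding"])
print(f"{throughput:.2f} chars/sec")
```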

--------------------------------

### Generate Image Embeddings

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Image_Embedding.ipynb

Initializes the ImageEmbedding model and processes a list of image file paths to generate embeddings.

```python
from fastembed import ImageEmbedding

model = ImageEmbedding("Qdrant/resnet50-onnx")

embeddings_generator = model.embed(
    ["../../tests/misc/image.jpeg", "../../tests/misc/small_image.jpeg"]
)
embeddings_list = list(embeddings_generator)
embeddings_list
```

--------------------------------

### Sparse Text Embedding with SPLADE++

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Initialize and use `SparseTextEmbedding` with the 'prithivida/Splade_PP_en_v1' model to generate sparse embeddings. The output is a list of SparseEmbedding objects.

```python
from fastembed import SparseTextEmbedding

model = SparseTextEmbedding(model_name="prithivida/Splade_PP_en_v1")
embeddings = list(model.embed(documents))

# [
#   SparseEmbedding(indices=[ 17, 123, 919, ... ], values=[0.71, 0.22, 0.39, ...]),
#   SparseEmbedding(indices=[ 38,  12,  91, ... ], values=[0.11, 0.22, 0.39, ...])
# ]
```
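
Because each sparse embedding is a pair of parallel `indices` and `values` arrays, two of them can be compared with a dot product over the indices they share. The sketch below works on plain Python lists with made-up numbers; `sparse_dot` is a hypothetical helper, not a FastEmbed function.

```python
def sparse_dot(indices_a, values_a, indices_b, values_b) -> float:
    """Dot product of two sparse vectors given as parallel index/value lists."""
    weights_b = dict(zip(indices_b, values_b))
    # Only indices present in both vectors contribute to the score.
    return sum(value * weights_b.get(index, 0.0) for index, value in zip(indices_a, values_a))


# Toy vectors: they share only index 123.
score = sparse_dot([17, 123, 919], [0.5, 0.25, 0.75], [123, 38], [2.0, 4.0])
print(score)  # 0.5
```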

--------------------------------

### Configure CUDA library path

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Add the CUDA library directory to the LD_LIBRARY_PATH environment variable to ensure proper linking.

```bash
# `export` makes the variable visible to processes started from this shell, such as Python
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```
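
Whether the directory is actually visible to a Python process can be checked from inside that process. The following sketch parses the variable with the standard library only; `is_on_library_path` is a hypothetical helper, and `/usr/local/cuda/lib64` is simply the path used above.

```python
import os


def is_on_library_path(directory: str, library_path: str) -> bool:
    """Check whether `directory` is an entry in an LD_LIBRARY_PATH-style string."""
    entries = [entry.rstrip("/") for entry in library_path.split(":") if entry]
    return directory.rstrip("/") in entries


current = os.environ.get("LD_LIBRARY_PATH", "")
print(is_on_library_path("/usr/local/cuda/lib64", current))
```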

--------------------------------

### Define Documents and Queries

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/ColBERT_with_FastEmbed.ipynb

Prepare lists of documents and queries for embedding. These are standard Python lists of strings.

```python
documents = [
    "ColBERT is a late interaction text embedding model, however, there are also other models such as TwinBERT.",
    "On the contrary to the late interaction models, the early interaction models contains interaction steps at embedding generation process",
]
queries = [
    "Are there any other late interaction text embedding models except ColBERT?",
    "What is the difference between late interaction and early interaction text embedding models?",
]
```

--------------------------------

### Push tags to remote

Source: https://github.com/qdrant/fastembed/blob/main/RELEASE.md

Pushes all local tags to the remote repository.

```bash
git push --tags
```

--------------------------------

### Rebase GPU branch on main

Source: https://github.com/qdrant/fastembed/blob/main/RELEASE.md

Updates the gpu branch with the latest changes from main.

```bash
git checkout gpu
git rebase main
git push -f origin gpu
```

--------------------------------

### Import FastEmbed classes

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Imports the required classes for working with sparse text embeddings.

```python
from fastembed import SparseTextEmbedding, SparseEmbedding
```

--------------------------------

### Initialize FastEmbed TextEmbedding

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_vs_HF_Comparison.ipynb

Initializes the FastEmbed TextEmbedding model using a specified model name. This uses the default Flag Embedding model.

```python
embedding_model = TextEmbedding(model_name=model_id)
```

--------------------------------

### Hugging Face authentication warning

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Warning output regarding missing HF_TOKEN.

```text
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
```

--------------------------------

### Load Dataset

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Load the dbpedia-entities dataset from Hugging Face.

```python
import datasets

dataset = datasets.load_dataset(
    "Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K", split="train"
)
len(dataset)
```

--------------------------------

### List Supported Image Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Image_Embedding.ipynb

Retrieves a list of all available image embedding models supported by the library.

```python
ImageEmbedding.list_supported_models()
```

--------------------------------

### List supported sparse embedding models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/SPLADE_with_FastEmbed.ipynb

Retrieves and displays all available sparse embedding models supported by the library.

```python
SparseTextEmbedding.list_supported_models()
```

--------------------------------

### CPU benchmark output

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Performance result for CPU embedding.

```text
4.33 s ± 591 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

--------------------------------

### Display Ranked Results

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/ColBERT_with_FastEmbed.ipynb

Prints the query and the corresponding ranked documents.

```python
print(f"Query: {queries[0]}")
for index in sorted_indices:
    print(f"Document: {documents[index]}")
```
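
The loop above assumes `sorted_indices` already holds document positions ordered by relevance. One way such a list could be derived from per-document scores is sketched below; the score values are invented for illustration.

```python
# Hypothetical relevance scores, one per document (higher is better).
scores = [0.42, 0.87, 0.15]

# Indices of the documents ordered from most to least relevant.
sorted_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(sorted_indices)  # [1, 0, 2]
```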

--------------------------------

### Load Custom Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/Getting Started.ipynb

Specify a different model identifier to use models other than the default.

```python
multilingual_large_model = TextEmbedding("intfloat/multilingual-e5-large")
```

--------------------------------

### Search Context

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Retrieve relevant context for a selected question.

```python
idx = 997

question = questions[idx]
print(question)
search_context = search_client.search(
    query_vector=embed_text(question), collection_name="hindi_tamil_contexts", limit=2
)
search_context_text = search_context[0].payload["text"]
len(search_context_text)
```

--------------------------------

### Import Libraries for FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/experiments/02_SPLADE_to_ONNX.ipynb

Imports necessary libraries including numpy, torch, and components from the transformers library for model operations.

```python
import numpy as np
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
```

--------------------------------

### Upload Data to Collection

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Binary_Quantization_with_Qdrant.ipynb

Iterate through the dataset and upload vectors and payloads to the Qdrant collection.

```python
import os


def iter_dataset(dataset):
    for point in dataset:
        yield point["openai"], {"text": point["text"]}


vectors, payload = zip(*iter_dataset(dataset))
client.upload_collection(
    collection_name=collection_name,
    vectors=vectors,
    payload=payload,
    parallel=max(1, (os.cpu_count() // 2)),
)
```
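
The `zip(*iter_dataset(dataset))` idiom turns a stream of `(vector, payload)` pairs into two parallel tuples. The toy example below shows the resulting shapes on a hypothetical two-row dataset; the row keys and values are made up. Note that this idiom consumes the whole generator, so every row is held in memory at once.

```python
def iter_rows(rows):
    """Yield (vector, payload) pairs, mirroring the shape used above."""
    for row in rows:
        yield row["vector"], {"text": row["text"]}


# Hypothetical two-row dataset with tiny vectors.
rows = [
    {"vector": [0.1, 0.2], "text": "first"},
    {"vector": [0.3, 0.4], "text": "second"},
]

vectors, payload = zip(*iter_rows(rows))
print(vectors)  # ([0.1, 0.2], [0.3, 0.4])
print(payload)  # ({'text': 'first'}, {'text': 'second'})
```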

--------------------------------

### Compare query embeddings

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Retrieval_with_FastEmbed.ipynb

Inspect the generated embeddings from query_embed and embed methods.

```python
query_embedding[:5], plain_query_embedding[:5]
```

--------------------------------

### Benchmark CPU performance

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Measure execution time for CPU-based embedding.

```python
%%timeit
list(embedding_model_cpu.embed(documents))
```

--------------------------------

### Define Documents for Embedding

Source: https://github.com/qdrant/fastembed/blob/main/docs/qdrant/Usage_With_Qdrant.ipynb

Prepare a list of string documents that will be embedded and stored. Each string represents a document.

```python
# Example list of documents
documents: list[str] = [
    "Maharana Pratap was a Rajput warrior king from Mewar",
    "He fought against the Mughal Empire led by Akbar",
    "The Battle of Haldighati in 1576 was his most famous battle",
    "He refused to submit to Akbar and continued guerrilla warfare",
    "His capital was Chittorgarh, which he lost to the Mughals",
    "He died in 1597 at the age of 57",
    "Maharana Pratap is considered a symbol of Rajput resistance against foreign rule",
    "His legacy is celebrated in Rajasthan through festivals and monuments",
    "He had 11 wives and 17 sons, including Amar Singh I who succeeded him as ruler of Mewar",
    "His life has been depicted in various films, TV shows, and books",
]
```

--------------------------------

### GPU benchmark output

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Performance result for GPU embedding.

```text
43.4 ms ± 2.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
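
Put next to the CPU benchmark output (4.33 s per loop), this timing implies a speedup of roughly two orders of magnitude. The arithmetic is worked out below using only the two reported means; it ignores the reported standard deviations.

```python
cpu_seconds = 4.33     # mean from the CPU benchmark output
gpu_seconds = 43.4e-3  # mean from the GPU benchmark output (43.4 ms)

speedup = cpu_seconds / gpu_seconds
print(f"GPU speedup: {speedup:.1f}x")  # GPU speedup: 99.8x
```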

--------------------------------

### Multi-GPU Text Embedding with FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_Multi_GPU.ipynb

Initialize a `TextEmbedding` model that runs on multiple GPUs. Set `parallel` to the number of entries in `device_ids`, and pass `lazy_load=True` to avoid redundant memory usage.

```python
from fastembed import TextEmbedding

# define the documents to embed
docs = ["hello world", "flag embedding"] * 100

# define gpu ids
device_ids = [0, 1]

if __name__ == "__main__":
    # initialize a TextEmbedding model using CUDA
    text_model = TextEmbedding(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        cuda=True,
        device_ids=device_ids,
        lazy_load=True,
    )

    # generate embeddings
    text_embeddings = list(text_model.embed(docs, batch_size=2, parallel=len(device_ids)))
    print(text_embeddings)
```

--------------------------------

### List Supported Image Embedding Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Supported_Models.ipynb

Retrieves and displays a sorted list of supported image embedding models. The output is formatted as a pandas DataFrame, sorted by size in ascending order.

```python
(
    pd.DataFrame(ImageEmbedding.list_supported_models())
    .sort_values("size_in_GB")
    .drop(columns=["sources", "model_file"])
    .reset_index(drop=True)
)
```

--------------------------------

### Inspect Sparse Embedding Tokens and Weights

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hybrid_Search.ipynb

Displays the token-weight mapping for a specific sparse embedding.

```python
print(json.dumps(get_tokens_and_weights(sparse_embedding[0], sparse_model_name), indent=4))
```

--------------------------------

### Troubleshoot CUDA path configuration

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Error message indicating a failure to map segments from the shared object.

```text
FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcufft.so.x: failed to map segment from shared object
```

--------------------------------

### Compress Directory to Tar.gz

Source: https://github.com/qdrant/fastembed/blob/main/experiments/01_ONNX_Port.ipynb

Compresses a given directory into a .tar.gz file. It checks if an output file already exists and prevents overwriting. Ensure the directory exists before calling.

```python
import os
from pathlib import Path
import tarfile

save_dir = Path("../local_cache/fast-bge-small-en-v1.5")


def compress(directory_path):
    directory_path = Path(directory_path)
    assert directory_path.exists(), f"{directory_path} does not exist"
    output_filename = directory_path.name + ".tar.gz"
    if Path(output_filename).exists():
        print("We've an output file already? Manually delete that first")
        return output_filename

    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(directory_path, arcname=os.path.basename(directory_path))
    return output_filename


compressed_file_name = compress(save_dir)
```
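
To confirm that an archive written this way has the expected layout, it can be read back with `tarfile`. The sketch below does a full round trip inside a temporary directory, so it does not depend on `../local_cache`; the directory and file names are placeholders.

```python
import tarfile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway directory with one file in it.
    source_dir = Path(tmp) / "model_dir"
    source_dir.mkdir()
    (source_dir / "config.json").write_text("{}")

    # Compress it the same way: "w:gz" mode, archive rooted at the directory name.
    archive_path = Path(tmp) / "model_dir.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)

    # Read the archive back and list its members.
    with tarfile.open(archive_path, "r:gz") as tar:
        members = sorted(tar.getnames())

print(members)  # ['model_dir', 'model_dir/config.json']
```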

--------------------------------

### Dense Text Embedding with Specific Model

Source: https://github.com/qdrant/fastembed/blob/main/README.md

Initialize TextEmbedding with a specific model name, such as 'BAAI/bge-small-en-v1.5', and generate embeddings. The output is a list of numpy arrays.

```python
from fastembed import TextEmbedding

model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = list(model.embed(documents))

# [
#   array([-0.1115,  0.0097,  0.0052,  0.0195, ...], dtype=float32),
#   array([-0.1019,  0.0635, -0.0332,  0.0522, ...], dtype=float32)
# ]
```

--------------------------------

### Execution provider result

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/FastEmbed_GPU.ipynb

Output showing active execution providers.

```text
['CUDAExecutionProvider', 'CPUExecutionProvider']
```

--------------------------------

### Export PyTorch Model to ONNX

Source: https://github.com/qdrant/fastembed/blob/main/experiments/Example. Convert Resnet50 to ONNX.ipynb

Exports a pre-trained ResNet-50 model (without the final classification layer) to ONNX format. Ensure the 'example.jpg' file exists for input preprocessing. The model is configured for dynamic batch sizes.

```python
import torch
import torch.onnx
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import numpy as np

# Load pre-trained ResNet-50 model
resnet = models.resnet50(pretrained=True)
resnet = torch.nn.Sequential(*(list(resnet.children())[:-1]))  # Remove the last fully connected layer
resnet.eval()

# Define preprocessing transform
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load and preprocess the image
def preprocess_image(image_path):
    input_image = Image.open(image_path)
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0)  # Add batch dimension
    return input_batch

# Example input for exporting
input_image = preprocess_image('example.jpg')

# Export the model to ONNX with dynamic axes
torch.onnx.export(
    resnet, 
    input_image, 
    "model.onnx", 
    export_params=True, 
    opset_version=9, 
    input_names=['input'], 
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
)
```

--------------------------------

### List Supported Sparse Text Embedding Models

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Supported_Models.ipynb

Retrieves and displays a sorted list of supported sparse text embedding models. The output is formatted as a pandas DataFrame, sorted by size in ascending order.

```python
(
    pd.DataFrame(SparseTextEmbedding.list_supported_models())
    .sort_values("size_in_GB")
    .drop(columns=["sources", "model_file", "additional_files"])
    .reset_index(drop=True)
)
```

--------------------------------

### Generate Text Embeddings with FastEmbed

Source: https://github.com/qdrant/fastembed/blob/main/docs/index.md

Import the TextEmbedding class and use it to generate embeddings for a list of documents. Ensure numpy is imported as np.

```python
import numpy as np

from fastembed import TextEmbedding

documents: list[str] = [
    "passage: Hello, World!",
    "query: Hello, World!",
    "passage: This is an example passage.",
    "fastembed is supported by and maintained by Qdrant."
]
embedding_model = TextEmbedding()
# embed() returns a generator, so materialize it into a list
embeddings: list[np.ndarray] = list(embedding_model.embed(documents))
```

--------------------------------

### Display Results

Source: https://github.com/qdrant/fastembed/blob/main/docs/examples/Hindi_Tamil_RAG_with_Navarasa7B.ipynb

Extract the generated answer and compare it with the ground truth.

```python
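# removesuffix drops the literal "<eos>" marker; str.strip("<eos>") would strip any of those characters
response.split(sep="### Answer:")[-1].strip().removesuffix("<eos>").strip()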
ds[idx]["answer_text"]
```