### ContrastiveTrainer Setup

Source: https://context7.com/illuin-tech/colpali/llms.txt

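Prepares the components that ContrastiveTrainer requires for custom training loops, including multi-GPU setups: the model, processor, collator, and loss function. The trainer itself is instantiated in the "Create and Train ContrastiveTrainer" snippet.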

```python
import torch
from colpali_engine.trainer.contrastive_trainer import ContrastiveTrainer
from colpali_engine.collators import VisualRetrieverCollator
from colpali_engine.loss.late_interaction_losses import ColbertLoss
from colpali_engine.models import ColQwen2, ColQwen2Processor
from transformers import TrainingArguments

# Setup
model = ColQwen2.from_pretrained("vidore/colqwen2-v1.0", torch_dtype=torch.bfloat16)
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

# Collator and loss
collator = VisualRetrieverCollator(processor=processor)
loss_func = ColbertLoss(temperature=0.02, normalize_scores=True)

```

--------------------------------

### SLURM Cluster Training Examples

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

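Submits training jobs to a SLURM cluster. The first example requests a single GPU on the `gpua100` partition with 16 CPUs and a 20-hour limit, and launches `train_colbert.py` with a YAML config. The second requests 8 GPUs with the `MI250` constraint and a 5-hour limit, and launches a Python config script in multi-GPU mode.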

```bash
sbatch --nodes=1 --cpus-per-task=16 --mem-per-cpu=32GB --time=20:00:00 --gres=gpu:1  -p gpua100 --job-name=colidefics --output=colidefics.out --error=colidefics.err --wrap="accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"
```

```bash
sbatch --nodes=1  --time=5:00:00 -A cad15443 --gres=gpu:8  --constraint=MI250 --job-name=colpali --wrap="accelerate launch --multi-gpu scripts/configs/qwen2/train_colqwen25_model.py"
```

--------------------------------

### Install ColPali Engine

Source: https://context7.com/illuin-tech/colpali/llms.txt

Install the ColPali engine from PyPI or source. Additional dependencies for training or interpretability tools can be included.

```bash
pip install colpali-engine
```

```bash
pip install git+https://github.com/illuin-tech/colpali
```

```bash
pip install "colpali-engine[train]"
```

```bash
pip install "colpali-engine[interpretability]"
```

```bash
pip install "colpali-engine[all]"
```

--------------------------------

### Local Training Example

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Launches the ColPali training script for local execution, potentially utilizing multiple GPUs. Ensure 'accelerate' is configured correctly for your environment.

```bash
accelerate launch --multi-gpu scripts/configs/qwen2/train_colqwen25_model.py
```

--------------------------------

### Install ColPali Engine with Interpretability

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Install the ColPali engine with the interpretability extra. This is required for generating similarity maps. The quotes keep shells such as zsh from treating the square brackets as a glob pattern.

```bash
pip install "colpali-engine[interpretability]"
```

--------------------------------

### Install ColPali Training Dependencies

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Install the essential packages for using the ColPali training script. This command ensures all necessary dependencies for training are available.

```bash
pip install "colpali-engine[train]"
```

--------------------------------

### Quick Start with ColQwen2

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Load the ColQwen2 model and processor, prepare image and query inputs, generate embeddings, and score them. FlashAttention 2 is used when it is installed; otherwise the model falls back to the default attention implementation.

```python
import torch
from PIL import Image
from transformers.utils.import_utils import is_flash_attn_2_available

from colpali_engine.models import ColQwen2, ColQwen2Processor

model_name = "vidore/colqwen2-v1.0"

model = ColQwen2.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",  # or "mps" if on Apple Silicon
    attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None,
).eval()

processor = ColQwen2Processor.from_pretrained(model_name)

# Your inputs
images = [
    Image.new("RGB", (128, 128), color="white"),
    Image.new("RGB", (64, 32), color="black"),
]
queries = [
    "What is the organizational structure for our R&D department?",
    "Can you provide a breakdown of last year’s financial performance?",
]

# Process the inputs
batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

# Forward pass
with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

scores = processor.score_multi_vector(query_embeddings, image_embeddings)
```

--------------------------------

### Install ColPali Engine

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Install a specific version of the colpali-engine package. This is useful for reproducing results from a particular release.

```bash
pip install colpali-engine==0.1.1
```

--------------------------------

### Install All ColPali Optional Dependencies for Testing

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Installs all optional dependencies for ColPali to ensure comprehensive test discovery and execution. This is necessary to avoid errors during test runs.

```bash
pip install "colpali-engine[all]"
```

--------------------------------

### Install ColPali Development Dependencies

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Installs development dependencies for ColPali, enabling proper testing and linting. This is required for contributing to the project.

```bash
pip install "colpali-engine[dev]"
```

--------------------------------

### Install ColPali Engine

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Install the ColPali engine package from PyPI or directly from source. For ColPali model versions above v1.0, use a colpali-engine version above v0.2.0.

```bash
pip install colpali-engine # from PyPI
```

```bash
pip install git+https://github.com/illuin-tech/colpali # from source
```

--------------------------------

### Create a Corpus and Dataset

Source: https://context7.com/illuin-tech/colpali/llms.txt

Demonstrates how to create a Corpus from data and then initialize a ColPaliEngineDataset using this corpus.

```python
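from colpali_engine.data.dataset import ColPaliEngineDataset, Corpus

# Corpus of 100 placeholder documents
corpus_data = [{"doc": f"document_{i}"} for i in range(100)]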
corpus = Corpus(
    corpus_data=corpus_data,
    doc_column_name="doc",
)

# Dataset with external corpus
train_data = [
    {"query": "query 1", "pos_target": 0, "neg_target": [1, 2, 3]},
    {"query": "query 2", "pos_target": 5, "neg_target": [6, 7, 8]},
]
dataset_with_corpus = ColPaliEngineDataset(
    data=train_data,
    corpus=corpus,
    query_column_name="query",
    pos_target_column_name="pos_target",
    neg_target_column_name="neg_target",
)
```
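The relationship between integer targets and the corpus can be illustrated in plain Python. This is a minimal sketch under the assumption that `pos_target` and `neg_target` values are row indices into the corpus; `resolve_targets` is a hypothetical helper, not part of `colpali_engine`, and the library's actual lookup may differ.

```python
from typing import Any, Dict, List


def resolve_targets(
    sample: Dict[str, Any],
    corpus_rows: List[Dict[str, Any]],
    doc_column: str,
) -> Dict[str, Any]:
    """Replace integer document IDs with the documents they point to (assumed behavior)."""

    def lookup(doc_id: int) -> Any:
        return corpus_rows[doc_id][doc_column]

    return {
        "query": sample["query"],
        "pos_target": [lookup(sample["pos_target"])],
        "neg_target": [lookup(doc_id) for doc_id in sample["neg_target"]],
    }


corpus_rows = [{"doc": f"document_{i}"} for i in range(100)]
sample = {"query": "query 1", "pos_target": 0, "neg_target": [1, 2, 3]}
resolved = resolve_targets(sample, corpus_rows, doc_column="doc")
print(resolved)
```

Storing IDs rather than documents in the training rows keeps each document in one place even when many queries reference it.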

--------------------------------

### Full Training Pipeline with ColModelTraining

Source: https://context7.com/illuin-tech/colpali/llms.txt

Sets up and runs a complete training pipeline using HuggingFace Trainer for contrastive learning. Configure datasets, loss functions, and optional LoRA for fine-tuning.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments

from colpali_engine.data.dataset import ColPaliEngineDataset
from colpali_engine.loss.late_interaction_losses import ColbertLoss
from colpali_engine.models import ColQwen2, ColQwen2Processor
from colpali_engine.trainer.colmodel_training import (
    ColModelTraining,
    ColModelTrainingConfig,
)

# Load model and processor
processor = ColQwen2Processor.from_pretrained(
    "vidore/colqwen2-v1.0",
    max_num_visual_tokens=768,
)
model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0",
    torch_dtype=torch.bfloat16,
    use_cache=False,
    attn_implementation="flash_attention_2",
)

# Prepare datasets
train_hf = load_dataset("your-dataset", split="train")
eval_hf = load_dataset("your-dataset", split="validation")

train_dataset = ColPaliEngineDataset(
    data=train_hf,
    query_column_name="query",
    pos_target_column_name="image",
)
eval_dataset = ColPaliEngineDataset(
    data=eval_hf,
    query_column_name="query",
    pos_target_column_name="image",
)

# Configure training
config = ColModelTrainingConfig(
    output_dir="./models/my-colqwen2",
    processor=processor,
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    run_eval=True,
    loss_func=ColbertLoss(
        temperature=0.02,
        normalize_scores=True,
    ),
    tr_args=TrainingArguments(
        output_dir=None,  # Will use config.output_dir
        num_train_epochs=3,
        per_device_train_batch_size=32,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": False},
        learning_rate=2e-4,
        warmup_steps=100,
        logging_steps=10,
        save_steps=500,
        eval_strategy="steps",
        eval_steps=100,
    ),
    # Optional: Use LoRA for efficient fine-tuning
    peft_config=LoraConfig(
        r=32,
        lora_alpha=32,
        lora_dropout=0.1,
        bias="none",
        task_type="FEATURE_EXTRACTION",
        target_modules=r"(.*(model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)",
    ),
)

# Train and save
trainer = ColModelTraining(config)
trainer.train()
trainer.save()
```

--------------------------------

### Create and Train ContrastiveTrainer

Source: https://context7.com/illuin-tech/colpali/llms.txt

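Instantiate and train a ContrastiveTrainer for visual document retrieval tasks. The snippet reuses the imports, `model`, `collator`, and `loss_func` from the "ContrastiveTrainer Setup" snippet, and expects `train_dataset` and `eval_dataset` to be ColPaliEngineDataset instances.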

```python
trainer = ContrastiveTrainer(
    model=model,
    train_dataset=train_dataset,  # ColPaliEngineDataset
    eval_dataset=eval_dataset,
    data_collator=collator,
    loss_func=loss_func,
    is_vision_model=True,
    compute_symetric_loss=False,  # Optional: bidirectional loss
    args=TrainingArguments(
        output_dir="./output",
        per_device_train_batch_size=16,
        num_train_epochs=3,
        learning_rate=2e-4,
    ),
)

# Train
trainer.train()
```

--------------------------------

### Generate Similarity Maps for Interpretability

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Load the ColPali model and processor, then preprocess an image and a query. The forward pass and the similarity map generation that visualizes model focus zones continue in the "Generate Similarity Maps with ColPali" snippet.

```python
import torch
from PIL import Image

from colpali_engine.interpretability import (
    get_similarity_maps_from_embeddings,
    plot_all_similarity_maps,
)
from colpali_engine.models import ColPali, ColPaliProcessor
from colpali_engine.utils.torch_utils import get_torch_device

model_name = "vidore/colpali-v1.3"
device = get_torch_device("auto")

# Load the model
model = ColPali.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map=device,
).eval()

# Load the processor
processor = ColPaliProcessor.from_pretrained(model_name)

# Load the image and query
image = Image.open("shift_kazakhstan.jpg")
query = "Quelle partie de la production pétrolière du Kazakhstan provient de champs en mer ?"

# Preprocess inputs
batch_images = processor.process_images([image]).to(device)
batch_queries = processor.process_queries([query]).to(device)
```

--------------------------------

### Fast-Plaid Index Creation and Querying

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Utilize fast-plaid for quicker matching with larger corpus sizes. Process images in batches and create a plaid index for efficient similarity scoring.

```python
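# !pip install --no-deps fast-plaid fastkmeans
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm

# `model`, `processor`, `images`, and `query_embeddings` are assumed to be
# defined as in the quick start example.

# Process the inputs in batches of 4
dataloader = DataLoader(
    dataset=images,
    batch_size=4,
    shuffle=False,
    collate_fn=lambda x: processor.process_images(x),
)

ds = []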
for batch_doc in tqdm(dataloader):
    with torch.no_grad():
        batch_doc = {k: v.to(model.device) for k, v in batch_doc.items()}
        embeddings_doc = model(**batch_doc)
    ds.extend(list(torch.unbind(embeddings_doc.to("cpu"))))

plaid_index = processor.create_plaid_index(ds)

scores = processor.get_topk_plaid(query_embeddings, plaid_index, k=10)
```

--------------------------------

### Load ColQwen2 Model and Processor

Source: https://context7.com/illuin-tech/colpali/llms.txt

Load the ColQwen2 vision retriever model and its corresponding processor. Supports optional flash attention for performance. Ensure CUDA or MPS is available for GPU acceleration.

```python
import torch
from PIL import Image
from transformers.utils.import_utils import is_flash_attn_2_available

from colpali_engine.models import ColQwen2, ColQwen2Processor

model_name = "vidore/colqwen2-v1.0"

# Load the model with optional flash attention
model = ColQwen2.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",  # or "mps" for Apple Silicon
    attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None,
).eval()

# Load the processor
processor = ColQwen2Processor.from_pretrained(model_name)

# Example: get embedding dimension
print(f"Embedding dimension: {model.dim}")  # Output: 128
print(f"Patch size: {model.patch_size}")
```

--------------------------------

### VisualRetrieverCollator for Batching

Source: https://context7.com/illuin-tech/colpali/llms.txt

Prepares batches of queries and images for training vision retrieval models. Ensure the processor is loaded correctly and samples contain image data for positive targets.

```python
from colpali_engine.collators import VisualRetrieverCollator
from colpali_engine.models import ColQwen2Processor
from PIL import Image

processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

# Create the collator
collator = VisualRetrieverCollator(
    processor=processor,
    max_length=2048,
)

# Example batch of training samples
samples = [
    {
        "query": "What is the revenue?",
        "pos_target": [Image.new("RGB", (800, 600), "white")],
        "neg_target": None,
    },
    {
        "query": "Show the organizational chart",
        "pos_target": [Image.new("RGB", (800, 600), "lightgray")],
        "neg_target": None,
    },
]

# Collate into model-ready batch
batch = collator(samples)

print("Batch keys:", list(batch.keys()))
# Output: ['query_input_ids', 'query_attention_mask', 'doc_input_ids',
#          'doc_attention_mask', 'doc_pixel_values', ...]

print(f"Query input shape: {batch['query_input_ids'].shape}")
print(f"Doc input shape: {batch['doc_input_ids'].shape}")
```

--------------------------------

### ViDoRe Benchmark Citation (arXiv)

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

BibTeX entry for the ViDoRe Benchmark V2 paper. Use this to cite the benchmark in academic contexts.

```latex
@misc{macé2025vidorebenchmarkv2raising,
      title={ViDoRe Benchmark V2: Raising the Bar for Visual Retrieval},
      author={Quentin Macé and António Loison and Manuel Faysse},
      year={2025},
      eprint={2505.17166},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2505.17166},
}

```

--------------------------------

### Load and Use ColIdefics3 (SmolVLM) Model

Source: https://context7.com/illuin-tech/colpali/llms.txt

Load a ColIdefics3 (SmolVLM) model for resource-constrained environments. Process images and queries to generate embeddings and compute similarity scores using multi-vector scoring.

```python
import torch
from PIL import Image
from colpali_engine.models import ColIdefics3, ColIdefics3Processor

# ColSmol variants: 256M or 500M parameters
model_name = "vidore/colSmol-500M"

model = ColIdefics3.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColIdefics3Processor.from_pretrained(model_name)

# Process inputs
images = [Image.new("RGB", (800, 600), "white")]
queries = ["Find the quarterly results"]

batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

scores = processor.score_multi_vector(query_embeddings, image_embeddings)
print(f"Score: {scores[0, 0].item():.4f}")
```

--------------------------------

### Load and Use BiQwen2 Bi-Encoder Model

Source: https://context7.com/illuin-tech/colpali/llms.txt

Load a BiQwen2 bi-encoder model for faster retrieval. Process images and queries to generate single-vector embeddings and compute similarity scores using dot product.

```python
import torch
from PIL import Image
from colpali_engine.models import BiQwen2, BiQwen2Processor

# Load bi-encoder model
model = BiQwen2.from_pretrained(
    "vidore/biqwen2-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = BiQwen2Processor.from_pretrained("vidore/biqwen2-v1.0")

# Process inputs
images = [Image.new("RGB", (800, 600), "white")]
queries = ["What is the revenue?"]

batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

# Generate single-vector embeddings
with torch.no_grad():
    image_embeddings = model(**batch_images)  # (batch, dim)
    query_embeddings = model(**batch_queries)  # (batch, dim)

# Simple dot product scoring
scores = processor.score_single_vector(query_embeddings, image_embeddings)
print(f"Similarity score: {scores[0, 0].item():.4f}")
```
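To make the "simple dot product scoring" step concrete, here is a minimal pure-Python sketch. `dot_product_scores` is a hypothetical helper operating on plain lists, not the library's `score_single_vector`; it only illustrates that each query-document score is a single dot product between two vectors.

```python
from typing import List

Vector = List[float]


def dot_product_scores(queries: List[Vector], docs: List[Vector]) -> List[List[float]]:
    """Return a (n_queries, n_docs) matrix of dot products."""
    return [
        [sum(q_i * d_i for q_i, d_i in zip(q, d)) for d in docs]
        for q in queries
    ]


queries = [[1.0, 2.0]]
docs = [[3.0, 4.0], [0.0, 1.0]]
single_vector_scores = dot_product_scores(queries, docs)
print(single_vector_scores)
```

Because each document is one vector, scoring a query against a corpus is a single matrix product, which is why bi-encoders are faster at retrieval time than multi-vector models.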

--------------------------------

### Generate Similarity Maps for Interpretability

Source: https://context7.com/illuin-tech/colpali/llms.txt

Visualize model focus for each query token on document images using similarity maps. This involves generating embeddings, calculating similarity maps, and plotting them.

```python
import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor
from colpali_engine.interpretability import (
    get_similarity_maps_from_embeddings,
    plot_all_similarity_maps,
)

model_name = "vidore/colpali-v1.3"
model = ColPali.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColPaliProcessor.from_pretrained(model_name)

# Load a document image
image = Image.open("document.jpg")
query = "What is the total revenue?"

# Process inputs
batch_images = processor.process_images([image]).to(model.device)
batch_queries = processor.process_queries([query]).to(model.device)

# Generate embeddings
with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

# Get the number of patches
n_patches = processor.get_n_patches(
    image_size=image.size,
    patch_size=model.patch_size,
)

# Get image mask
image_mask = processor.get_image_mask(batch_images)

# Generate similarity maps
similarity_maps_batch = get_similarity_maps_from_embeddings(
    image_embeddings=image_embeddings,
    query_embeddings=query_embeddings,
    n_patches=n_patches,
    image_mask=image_mask,
)

# Get maps for our image (first in batch)
similarity_maps = similarity_maps_batch[0]  # (query_length, n_patches_x, n_patches_y)

# Tokenize query for labels
query_tokens = processor.tokenizer.tokenize(query)

# Plot similarity maps for each token
plots = plot_all_similarity_maps(
    image=image,
    query_tokens=query_tokens,
    similarity_maps=similarity_maps,
    figsize=(8, 8),
    show_colorbar=True,
)

# Save each plot
for idx, (fig, ax) in enumerate(plots):
    fig.savefig(f"similarity_map_{idx}.png")
```
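The shape `(query_length, n_patches_x, n_patches_y)` noted above can be illustrated with a small pure-Python sketch. It assumes that a similarity map is the dot product between one query token and every image patch embedding, laid out on the patch grid in row-major order; the ordering and the helper name `similarity_maps_sketch` are assumptions, not the behavior of `get_similarity_maps_from_embeddings`.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def similarity_maps_sketch(
    query_tokens: List[Vector],
    patch_embeddings: List[Vector],
    n_x: int,
    n_y: int,
) -> List[List[List[float]]]:
    """One (n_x, n_y) grid of patch similarities per query token."""
    if len(patch_embeddings) != n_x * n_y:
        raise ValueError("number of patches must equal n_x * n_y")
    maps = []
    for token in query_tokens:
        flat = [dot(token, patch) for patch in patch_embeddings]
        # Reshape the flat list into n_x rows of n_y values (assumed row-major).
        maps.append([flat[row * n_y:(row + 1) * n_y] for row in range(n_x)])
    return maps


patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # a 2 x 2 grid
tokens = [[1.0, 0.0], [0.0, 2.0]]
maps = similarity_maps_sketch(tokens, patches, n_x=2, n_y=2)
print(maps)
```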

--------------------------------

### Process Text Queries with ColQwen2

Source: https://context7.com/illuin-tech/colpali/llms.txt

Prepare text queries using the `process_queries` method from ColQwen2Processor. This method automatically adds augmentation tokens for improved retrieval performance. Embeddings are then generated from the processed queries.

```python
import torch
from colpali_engine.models import ColQwen2, ColQwen2Processor

model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

# Define queries
queries = [
    "What is the organizational structure for our R&D department?",
    "Can you provide a breakdown of last year's financial performance?",
    "Show me the quarterly revenue chart",
]

# Process queries (adds augmentation tokens automatically)
batch_queries = processor.process_queries(queries).to(model.device)

# Generate query embeddings
with torch.no_grad():
    query_embeddings = model(**batch_queries)

print(f"Query embeddings shape: {query_embeddings.shape}")
# Output: torch.Size([3, query_seq_len, 128])
```

--------------------------------

### ColPaliEngineDataset for Training Data

Source: https://context7.com/illuin-tech/colpali/llms.txt

A PyTorch Dataset class for loading query-document pairs, optionally including hard negatives. Requires `datasets` and `colpali_engine`.

```python
from datasets import load_dataset
from colpali_engine.data.dataset import ColPaliEngineDataset, Corpus

# Load a HuggingFace dataset
hf_dataset = load_dataset("vidore/docvqa_test_subsampled", split="test")

# Create training dataset directly from HF dataset
train_dataset = ColPaliEngineDataset(
    data=hf_dataset,
    query_column_name="query",       # Column containing query text
    pos_target_column_name="image",  # Column containing document images
    neg_target_column_name=None,     # Optional: column with hard negative IDs
    num_negatives=3,                 # Max negatives to sample per query
)

print(f"Dataset size: {len(train_dataset)}")

# Access a sample
sample = train_dataset[0]
print(f"Query: {sample['query']}")
print(f"Positive target type: {type(sample['pos_target'])}")
```

--------------------------------

### Process Document Images with ColQwen2

Source: https://context7.com/illuin-tech/colpali/llms.txt

Use the `process_images` method from the ColQwen2Processor to convert PIL images into model-ready tensors. These tensors are then used to generate multi-vector embeddings.

```python
import torch
from PIL import Image
from colpali_engine.models import ColQwen2, ColQwen2Processor

model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

# Create sample images (or load real document images)
images = [
    Image.new("RGB", (800, 600), color="white"),
    Image.new("RGB", (1024, 768), color="lightgray"),
]

# Process images into batched tensors
batch_images = processor.process_images(images).to(model.device)

# Generate multi-vector embeddings
with torch.no_grad():
    image_embeddings = model(**batch_images)

print(f"Image embeddings shape: {image_embeddings.shape}")
# Output: torch.Size([2, seq_len, 128])
```

--------------------------------

### Load ColPali Model and Processor

Source: https://context7.com/illuin-tech/colpali/llms.txt

Load the original ColPali model based on PaliGemma and its processor. This model is designed for document retrieval using vision language models.

```python
import torch
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.3"

# Load model
model = ColPali.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()

# Load processor
processor = ColPaliProcessor.from_pretrained(model_name)

# Model properties
print(f"Embedding dimension: {model.dim}")  # Output: 128
print(f"Patch size: {model.patch_size}")
```

--------------------------------

### Generate Similarity Maps with ColPali

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

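Use this code to generate similarity maps by performing forward passes on image and query batches, then processing embeddings to visualize query-token relevance. It expects `model`, `processor`, `image`, `query`, `batch_images`, and `batch_queries` from the README's "Generate Similarity Maps for Interpretability" loading snippet.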

```python
with torch.no_grad():
    image_embeddings = model.forward(**batch_images)
    query_embeddings = model.forward(**batch_queries)

n_patches = processor.get_n_patches(image_size=image.size, patch_size=model.patch_size)

image_mask = processor.get_image_mask(batch_images)

batched_similarity_maps = get_similarity_maps_from_embeddings(
    image_embeddings=image_embeddings,
    query_embeddings=query_embeddings,
    n_patches=n_patches,
    image_mask=image_mask,
)

similarity_maps = batched_similarity_maps[0]  # (query_length, n_patches_x, n_patches_y)

query_tokens = processor.tokenizer.tokenize(query)

plots = plot_all_similarity_maps(
    image=image,
    query_tokens=query_tokens,
    similarity_maps=similarity_maps,
)
for idx, (fig, ax) in enumerate(plots):
    fig.savefig(f"similarity_map_{idx}.png")
```

--------------------------------

### ColbertLoss for Training Retrieval Models

Source: https://context7.com/illuin-tech/colpali/llms.txt

An InfoNCE-style loss function for training late interaction retrieval models using in-batch negatives. Requires `torch` and `colpali_engine`.

```python
import torch
from colpali_engine.loss.late_interaction_losses import ColbertLoss

# Initialize the loss function
loss_func = ColbertLoss(
    temperature=0.02,          # Scaling factor for logits
    normalize_scores=True,     # Normalize by query length
    use_smooth_max=False,      # Use amax instead of log-sum-exp
    pos_aware_negative_filtering=False,  # Filter false negatives
)

# Simulated batch of embeddings
batch_size = 8
query_length = 32
doc_length = 1024
dim = 128

query_embeddings = torch.randn(batch_size, query_length, dim)
doc_embeddings = torch.randn(batch_size, doc_length, dim)

# L2 normalize embeddings (as done in the model)
query_embeddings = query_embeddings / query_embeddings.norm(dim=-1, keepdim=True)
doc_embeddings = doc_embeddings / doc_embeddings.norm(dim=-1, keepdim=True)

# Compute loss (diagonal elements are positive pairs)
loss = loss_func(
    query_embeddings=query_embeddings,
    doc_embeddings=doc_embeddings,
    offset=0,  # For multi-GPU training offset
)

print(f"Training loss: {loss.item():.4f}")
```
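The "InfoNCE-style loss with in-batch negatives" can be sketched on a precomputed score matrix. This is a minimal illustration, assuming that row `i` holds the scores of query `i` against every document in the batch, that the diagonal entry is the positive pair, and that scores are divided by the temperature. `in_batch_infonce` is a hypothetical helper; the options shown above (`normalize_scores`, `use_smooth_max`, `pos_aware_negative_filtering`, `offset`) are not modelled.

```python
import math
from typing import List


def in_batch_infonce(scores: List[List[float]], temperature: float = 0.02) -> float:
    """Mean cross-entropy over rows, treating the diagonal entry as the positive."""
    total = 0.0
    for i, row in enumerate(scores):
        logits = [s / temperature for s in row]
        # Numerically stable log-sum-exp over the row.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(scores)


well_separated = [[1.0, 0.0], [0.0, 1.0]]
confused = [[0.0, 1.0], [1.0, 0.0]]
print(in_batch_infonce(well_separated), in_batch_infonce(confused))
```

A small temperature sharpens the softmax, so a positive that is only slightly ahead of the in-batch negatives already yields a loss close to zero.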

--------------------------------

### Token Pooling with HierarchicalTokenPooler (Padded Tensor Input)

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Pool embeddings from padded 3D tensor inputs using HierarchicalTokenPooler. Set padding=True and provide the tokenizer's padding_side for correct padding removal before pooling.

```python
import torch
from PIL import Image
from transformers.utils.import_utils import is_flash_attn_2_available

from colpali_engine.compression.token_pooling import HierarchicalTokenPooler
from colpali_engine.models import ColQwen2, ColQwen2Processor

model_name = "vidore/colqwen2-v1.0"
model = ColQwen2.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",  # or "mps" if on Apple Silicon
    attn_implementation="flash_attention_2" if is_flash_attn_2_available() else None,
).eval()
processor = ColQwen2Processor.from_pretrained(model_name)

token_pooler = HierarchicalTokenPooler()

# Your page images
images = [
    Image.new("RGB", (128, 128), color="white"),
    Image.new("RGB", (32, 32), color="black"),
]

# Process the inputs
batch_images = processor.process_images(images).to(model.device)

# Forward pass
with torch.no_grad():
    image_embeddings = model(**batch_images)

# Apply token pooling (reduces the sequence length of the multi-vector embeddings)
image_embeddings = token_pooler.pool_embeddings(
    image_embeddings,
    pool_factor=2,
    padding=True,
    padding_side=processor.tokenizer.padding_side,
)
```

--------------------------------

### ColPali Paper Citation (arXiv)

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

BibTeX entry for the ColPali paper. Use this to cite the work in academic contexts.

```latex
@misc{faysse2024colpaliefficientdocumentretrieval,
      title={ColPali: Efficient Document Retrieval with Vision Language Models},
      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2407.01449},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.01449},
}

```

--------------------------------

### Token Pooling for Embedding Compression

Source: https://context7.com/illuin-tech/colpali/llms.txt

Reduces multi-vector embedding size using hierarchical token pooling. Requires `torch`, `PIL`, and `colpali_engine`.

```python
import torch
from PIL import Image
from colpali_engine.models import ColQwen2, ColQwen2Processor
from colpali_engine.compression.token_pooling import HierarchicalTokenPooler

model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

# Create the token pooler
token_pooler = HierarchicalTokenPooler()

# Process images
images = [
    Image.new("RGB", (800, 600), color="white"),
    Image.new("RGB", (1024, 768), color="lightgray"),
]
batch_images = processor.process_images(images).to(model.device)

# Generate embeddings
with torch.no_grad():
    image_embeddings = model(**batch_images)

print(f"Original shape: {image_embeddings.shape}")

# Apply token pooling with pool_factor=2 (reduces by ~50%)
pooled_embeddings = token_pooler.pool_embeddings(
    image_embeddings,
    pool_factor=2,
    padding=True,
    padding_side=processor.tokenizer.padding_side,
)

print(f"Pooled embeddings: {len(pooled_embeddings)} tensors")
for i, emb in enumerate(pooled_embeddings):
    print(f"  Document {i}: {emb.shape}")

# Example with pool_factor=3 (reduces by ~66.7%, retains ~97.8% performance)
pooled_3x = token_pooler.pool_embeddings(
    image_embeddings,
    pool_factor=3,
    padding=True,
    padding_side=processor.tokenizer.padding_side,
)
```
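The percentages in the comments above follow from simple arithmetic: keeping roughly one vector out of every `pool_factor` removes a fraction `1 - 1/pool_factor` of them. The sketch below shows that arithmetic together with the mean-pooling step applied once tokens have been grouped. The cluster labels are supplied by hand here; the clustering itself is omitted, and both helper names are hypothetical rather than part of `HierarchicalTokenPooler`.

```python
from typing import Dict, List

Vector = List[float]


def approx_reduction(pool_factor: int) -> float:
    """Fraction of vectors removed when about 1 in `pool_factor` is kept."""
    return 1.0 - 1.0 / pool_factor


def mean_pool_by_cluster(vectors: List[Vector], labels: List[int]) -> List[Vector]:
    """Average the vectors that share a cluster label, ordered by label."""
    groups: Dict[int, List[Vector]] = {}
    for vector, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vector)
    pooled = []
    for label in sorted(groups):
        members = groups[label]
        dim = len(members[0])
        pooled.append([sum(v[j] for v in members) / len(members) for j in range(dim)])
    return pooled


vectors = [[1.0, 1.0], [3.0, 3.0], [10.0, 0.0], [12.0, 2.0]]
labels = [0, 0, 1, 1]  # hand-assigned clusters, 4 vectors -> 2 vectors
pooled = mean_pool_by_cluster(vectors, labels)
print(pooled, approx_reduction(2), round(approx_reduction(3) * 100, 1))
```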

--------------------------------

### Token Pooling with HierarchicalTokenPooler (List Input)

Source: https://github.com/illuin-tech/colpali/blob/main/README.md

Pool embeddings from a list of 2D tensors using HierarchicalTokenPooler. Specify the pool_factor to control the compression level.

```python
import torch

from colpali_engine.compression.token_pooling import HierarchicalTokenPooler

# Dummy multivector embeddings
list_embeddings = [
    torch.rand(10, 768),
    torch.rand(20, 768),
]

# Define the pooler with the desired level of compression
pooler = HierarchicalTokenPooler()

# Pool the embeddings
outputs = pooler.pool_embeddings(list_embeddings, pool_factor=2)
```

--------------------------------

### ColbertNegativeCELoss with Hard Negatives

Source: https://context7.com/illuin-tech/colpali/llms.txt

A loss function that incorporates explicit hard negatives for improved training of retrieval models. Requires `torch` and `colpali_engine`.

```python
import torch
from colpali_engine.loss.late_interaction_losses import ColbertNegativeCELoss

# Initialize loss with hard negative support
loss_func = ColbertNegativeCELoss(
    temperature=0.02,
    normalize_scores=True,
    in_batch_term_weight=0.5,  # Weight for in-batch negatives (0-1)
)

# Simulated embeddings
batch_size = 8
query_length = 32
doc_length = 1024
num_negatives = 3
dim = 128

query_embeddings = torch.randn(batch_size, query_length, dim)
doc_embeddings = torch.randn(batch_size, doc_length, dim)
neg_doc_embeddings = torch.randn(batch_size, num_negatives, doc_length, dim)

# Normalize
query_embeddings = query_embeddings / query_embeddings.norm(dim=-1, keepdim=True)
doc_embeddings = doc_embeddings / doc_embeddings.norm(dim=-1, keepdim=True)
neg_doc_embeddings = neg_doc_embeddings / neg_doc_embeddings.norm(dim=-1, keepdim=True)

# Compute loss with explicit negatives
loss = loss_func(
    query_embeddings=query_embeddings,
    doc_embeddings=doc_embeddings,
    neg_doc_embeddings=neg_doc_embeddings,
    offset=0,
)

print(f"Loss with hard negatives: {loss.item():.4f}")
```
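One plausible reading of `in_batch_term_weight` is a convex blend between a hard-negative term and an in-batch term. The sketch below illustrates that reading only. The softplus form of the hard-negative term and the blending formula are assumptions, not confirmed details of `ColbertNegativeCELoss`, and all function names are hypothetical.

```python
import math
from typing import List


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def hard_negative_term(pos_score: float, neg_scores: List[float], temperature: float = 0.02) -> float:
    """Assumed form: penalize each hard negative that scores close to or above the positive."""
    return sum(softplus((neg - pos_score) / temperature) for neg in neg_scores) / len(neg_scores)


def blended_loss(hard_term: float, in_batch_term: float, in_batch_term_weight: float = 0.5) -> float:
    """Assumed form: weight w goes to the in-batch term, 1 - w to the hard-negative term."""
    w = in_batch_term_weight
    return (1.0 - w) * hard_term + w * in_batch_term


easy = hard_negative_term(pos_score=1.0, neg_scores=[0.0, 0.1, 0.2])
hard = hard_negative_term(pos_score=0.2, neg_scores=[0.9, 1.0, 0.8])
print(easy, hard, blended_loss(hard, in_batch_term=0.7))
```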

--------------------------------

### Large-Scale Retrieval with FastPlaid

Source: https://context7.com/illuin-tech/colpali/llms.txt

Utilize FastPlaid for efficient approximate search over multi-vector embeddings in large document collections. This involves creating a FastPlaid index from document embeddings and then querying it.

```python
import torch
from PIL import Image
from torch.utils.data import DataLoader
from tqdm import tqdm
from colpali_engine.models import ColQwen2, ColQwen2Processor

# pip install --no-deps fast-plaid fastkmeans

model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

# Create a large document corpus
images = [Image.new("RGB", (800, 600), color="white") for _ in range(100)]

# Process documents in batches
dataloader = DataLoader(
    dataset=images,
    batch_size=4,
    shuffle=False,
    collate_fn=lambda x: processor.process_images(x),
)

doc_embeddings = []
for batch_doc in tqdm(dataloader, desc="Embedding documents"):
    with torch.no_grad():
        batch_doc = {k: v.to(model.device) for k, v in batch_doc.items()}
        embeddings = model(**batch_doc)
    doc_embeddings.extend(list(torch.unbind(embeddings.to("cpu"))))

# Create FastPlaid index
plaid_index = processor.create_plaid_index(doc_embeddings)

# Process queries
queries = ["Find revenue information", "Show organizational structure"]
batch_queries = processor.process_queries(queries).to(model.device)

with torch.no_grad():
    query_embeddings = model(**batch_queries)

# Search using the index
top_k_results = processor.get_topk_plaid(
    query_embeddings.cpu(),
    plaid_index,
    k=10
)

print(f"Top-10 results for each query: {top_k_results}")
```
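For small corpora, an exact brute-force top-k over a full score matrix is a useful baseline for checking an approximate index. The sketch below uses only the standard library. `top_k` is a hypothetical helper, and its return format is a choice made for this example; the format returned by `get_topk_plaid` is not specified above.

```python
import heapq
from typing import List, Tuple


def top_k(score_row: List[float], k: int) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the k highest scores, best first."""
    return heapq.nlargest(k, enumerate(score_row), key=lambda pair: pair[1])


# One row of scores per query, one column per document.
score_matrix = [
    [0.1, 0.9, 0.4, 0.7],
    [0.8, 0.2, 0.6, 0.3],
]
exact_results = [top_k(row, k=2) for row in score_matrix]
print(exact_results)
```

Comparing the exact document indices with those from the approximate index on a sample of queries gives a quick recall estimate before scaling up.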

--------------------------------

### Compute ColBERT-style Similarity Scores

Source: https://context7.com/illuin-tech/colpali/llms.txt

Use `score_multi_vector` to compute late interaction scores between query and document embeddings. Ensure model and processor are loaded and inputs are processed before calling this method.

```python
import torch
from PIL import Image
from colpali_engine.models import ColQwen2, ColQwen2Processor

model = ColQwen2.from_pretrained(
    "vidore/colqwen2-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
).eval()
processor = ColQwen2Processor.from_pretrained("vidore/colqwen2-v1.0")

# Sample data
images = [
    Image.new("RGB", (800, 600), color="white"),
    Image.new("RGB", (1024, 768), color="lightgray"),
]
queries = [
    "What is the revenue breakdown?",
    "Show organizational chart",
]

# Process inputs
batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

# Generate embeddings
with torch.no_grad():
    image_embeddings = model(**batch_images)
    query_embeddings = model(**batch_queries)

# Compute MaxSim scores (late interaction)
# Returns a (n_queries, n_documents) score matrix
scores = processor.score_multi_vector(query_embeddings, image_embeddings)

print(f"Scores shape: {scores.shape}")  # Output: torch.Size([2, 2])
print(f"Scores:\n{scores}")
# Higher scores indicate better query-document matches
```
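The MaxSim (late interaction) score can be illustrated without a model. The sketch below is a pure-Python illustration rather than the library's implementation: `maxsim_score` and `maxsim_matrix` are hypothetical helpers, and embeddings are plain lists of vectors. For each query token it takes the maximum dot product over all document tokens, then sums those maxima.

```python
from typing import List

Vector = List[float]
MultiVector = List[Vector]  # one vector per query token or image patch


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query: MultiVector, doc: MultiVector) -> float:
    """Sum over query tokens of the best-matching document token similarity."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def maxsim_matrix(queries: List[MultiVector], docs: List[MultiVector]) -> List[List[float]]:
    """Return a (n_queries, n_documents) score matrix."""
    return [[maxsim_score(q, d) for d in docs] for q in queries]


queries = [
    [[1.0, 0.0], [0.0, 1.0]],  # query with two tokens
    [[1.0, 0.0]],              # query with one token
]
docs = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],  # document with three patches
    [[0.0, 1.0]],                          # document with one patch
]
maxsim = maxsim_matrix(queries, docs)
print(maxsim)
```

Queries and documents may have different numbers of vectors, which is why the score is a sum of per-token maxima rather than a single dot product.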
