# Scirpy: Single-Cell Immune Receptor Analysis in Python

Scirpy is a Python package for analyzing T cell receptor (TCR) and B cell receptor (BCR) repertoires from single-cell RNA sequencing data. It integrates with the scverse ecosystem, particularly scanpy and MuData, and provides modules for data import, clonotype analysis, diversity metrics, and visualization. Supported input formats include 10x Genomics CellRanger, TraCeR, BraCeR, BD Rhapsody, and the AIRR rearrangement standard.

Scirpy stores immune receptor data as awkward arrays in `adata.obsm["airr"]`, enabling lossless representation of AIRR rearrangement data while maintaining compatibility with AnnData's cell-centric structure. The package follows the scanpy API conventions with modules divided into preprocessing (`pp`), tools (`tl`), and plotting (`pl`), allowing users to build analysis pipelines for clonotype definition, clonal expansion analysis, repertoire comparison, and integration with gene expression data.

## Data Import and I/O Functions

### read_10x_vdj - Load 10x Genomics VDJ Data

Reads immune receptor data from 10x Genomics CellRanger output files, supporting both CSV and JSON formats. Returns an AnnData object with AIRR data stored in obsm.

```python
import scirpy as ir
import scanpy as sc
from mudata import MuData

# Load TCR data from 10x CellRanger output
adata_tcr = ir.io.read_10x_vdj("filtered_contig_annotations.csv")

# Load gene expression data
adata_gex = sc.read_10x_h5("filtered_feature_bc_matrix.h5")
adata_gex.var_names_make_unique()

# Combine into MuData object
mdata = MuData({"gex": adata_gex, "airr": adata_tcr})

# Create chain indices and run quality control
ir.pp.index_chains(mdata)
ir.tl.chain_qc(mdata)

print(f"Loaded {mdata['airr'].n_obs} cells with TCR data")
# Output: Loaded 1523 cells with TCR data
```

### read_airr - Load AIRR Rearrangement Format

Reads data from AIRR-compliant rearrangement TSV files. Supports loading multiple files at once for datasets split by chain type.

```python
import scirpy as ir

# Load multiple AIRR rearrangement tables (e.g., separate TRA and TRB files)
adata = ir.io.read_airr([
    "immunesim_tra.tsv",
    "immunesim_trb.tsv",
])

# Process chain indices
ir.pp.index_chains(adata)
ir.tl.chain_qc(adata)

print(f"Receptor types: {adata.obs['receptor_type'].value_counts().to_dict()}")
# Output: Receptor types: {'TCR': 98, 'no IR': 2}
```

### from_airr_cells - Create AnnData from Custom Data

Converts a list of AirrCell objects to an AnnData object, useful for importing custom data formats not directly supported by scirpy.

```python
import scirpy as ir
import pandas as pd

# Example: Convert a custom TCR table to scirpy format
tcr_table = pd.DataFrame({
    "cell_id": ["cell1", "cell2", "cell3"],
    "cdr3_alpha": ["CAVRDNDYKLSF", "CAENTGNQFYF", "CAVMDSNYQLIW"],
    "cdr3_beta": ["CASSLAPGATNEKLFF", "CASSLEETQYF", "CASSFSTCSANYGYTF"],
    "v_alpha": ["TRAV12-1", "TRAV8-3", "TRAV12-2"],
    "v_beta": ["TRBV6-5", "TRBV19", "TRBV7-9"],
})

tcr_cells = []
for _, row in tcr_table.iterrows():
    cell = ir.io.AirrCell(cell_id=row["cell_id"])

    # Create alpha chain
    alpha_chain = ir.io.AirrCell.empty_chain_dict()
    alpha_chain.update({
        "locus": "TRA",
        "junction_aa": row["cdr3_alpha"],
        "v_call": row["v_alpha"],
        "productive": True,
    })

    # Create beta chain
    beta_chain = ir.io.AirrCell.empty_chain_dict()
    beta_chain.update({
        "locus": "TRB",
        "junction_aa": row["cdr3_beta"],
        "v_call": row["v_beta"],
        "productive": True,
    })

    cell.add_chain(alpha_chain)
    cell.add_chain(beta_chain)
    tcr_cells.append(cell)

# Convert to AnnData
adata_tcr = ir.io.from_airr_cells(tcr_cells)
ir.pp.index_chains(adata_tcr)

print(f"Created AnnData with {adata_tcr.n_obs} cells")
# Output: Created AnnData with 3 cells
```

### write_airr - Export to AIRR Format

Exports immune receptor data from an AnnData object to the AIRR rearrangement TSV format.

```python
import scirpy as ir

# Load example data
mdata = ir.datasets.wu2020_3k()

# Export AIRR data to TSV format
ir.io.write_airr(mdata, "exported_airr_data.tsv")

# The exported file follows AIRR rearrangement schema
# with columns: cell_id, locus, junction_aa, junction, v_call, d_call, j_call, etc.
```

## Preprocessing Functions

### index_chains - Create Chain Indices

Creates indices that map cells to their primary and secondary VJ/VDJ chains according to scirpy's receptor model. This is required before running most analysis functions.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()

# Create chain indices (required for downstream analysis)
ir.pp.index_chains(mdata)

# Chain indices are stored in adata.obsm["chain_indices"]
# Structure: {"VJ": [primary_idx, secondary_idx], "VDJ": [primary_idx, secondary_idx], "multichain": bool}
```
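
The idea behind the chain indices can be illustrated without scirpy. The following is a minimal sketch, not scirpy's implementation: it assumes chains are ranked by UMI count within each receptor arm, that at most two chains per arm are indexed, and that a cell with more than two chains on an arm is flagged as multichain. The function name `index_cell_chains` and the `umi_count` key are hypothetical.

```python
# Illustrative sketch only; names and input layout are assumptions.
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def index_cell_chains(chains):
    """Return primary/secondary chain positions for one cell.

    `chains` is a list of dicts with the keys "locus" and "umi_count".
    """

    def ranked(loci):
        idx = [i for i, c in enumerate(chains) if c["locus"] in loci]
        # Highest UMI count first; ties keep their original order (stable sort).
        idx.sort(key=lambda i: chains[i]["umi_count"], reverse=True)
        return idx

    vj, vdj = ranked(VJ_LOCI), ranked(VDJ_LOCI)
    return {
        "VJ": vj[:2],    # [primary_idx, secondary_idx]
        "VDJ": vdj[:2],  # [primary_idx, secondary_idx]
        # More than two chains on either arm do not fit the model.
        "multichain": len(vj) > 2 or len(vdj) > 2,
    }


cell = [
    {"locus": "TRB", "umi_count": 7},
    {"locus": "TRA", "umi_count": 3},
    {"locus": "TRA", "umi_count": 12},
]
print(index_cell_chains(cell))
# {'VJ': [2, 1], 'VDJ': [0], 'multichain': False}
```

The indices point back into the cell's list of chains, so the chain records themselves are never reordered or copied.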

### ir_dist - Compute Sequence Distances

Computes pairwise distance matrices between CDR3 sequences. Supports multiple metrics including identity, Levenshtein, Hamming, alignment, and TCRdist.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Compute nucleotide sequence identity distances (for exact clonotype definition)
ir.pp.ir_dist(mdata, metric="identity", sequence="nt")

# Compute amino acid TCRdist distances with cutoff (for clonotype clustering)
ir.pp.ir_dist(
    mdata,
    metric="tcrdist",
    sequence="aa",
    cutoff=15,  # Maximum distance to consider sequences as similar
)

# Distance matrices are stored in adata.uns
# Keys follow pattern: "ir_dist_{sequence}_{metric}"
```
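
To make the role of `cutoff` concrete, here is a small pure-Python sketch of a Levenshtein distance computed between unique sequences, keeping only pairs within the cutoff. It is not how scirpy computes or stores distances; the helper names `levenshtein` and `pairwise_within_cutoff` are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]


def pairwise_within_cutoff(seqs, cutoff):
    """Map (seq_a, seq_b) -> distance for unique pairs with distance <= cutoff."""
    unique = sorted(set(seqs))  # distances are computed once per unique sequence
    return {
        (a, b): d
        for a, b in combinations(unique, 2)
        if (d := levenshtein(a, b)) <= cutoff
    }


cdr3 = ["CASSLEETQYF", "CASSLEETQYF", "CASSLEDTQYF", "CAVRDNDYKLSF"]
close = pairwise_within_cutoff(cdr3, cutoff=1)
print(close)
# {('CASSLEDTQYF', 'CASSLEETQYF'): 1}
```

Dropping pairs beyond the cutoff is what keeps the result sparse: most sequence pairs in a repertoire are dissimilar and never need to be stored.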

### merge_airr - Merge Multiple AIRR Datasets

Merges two AnnData objects with immune receptor information, useful for combining BCR and TCR data from the same cells.

```python
import scirpy as ir

# Example: Merge TCR and BCR data from the same experiment
adata_tcr = ir.io.read_10x_vdj("tcr_contig_annotations.csv")
adata_bcr = ir.io.read_10x_vdj("bcr_contig_annotations.csv")

# Merge the two objects (cells are matched by barcode); a new AnnData is returned
adata = ir.pp.merge_airr(adata_tcr, adata_bcr)

ir.pp.index_chains(adata)
ir.tl.chain_qc(adata)
# Cells with both TCR and BCR will be flagged as "ambiguous" receptor type
```

## Data Retrieval Functions

### get.airr - Retrieve AIRR Variables

Retrieves AIRR rearrangement fields for each cell as a pandas Series or DataFrame, useful for accessing CDR3 sequences, V/J genes, and other chain-level data.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Get CDR3 amino acid sequence for primary VJ chain
cdr3_vj = ir.get.airr(mdata, "junction_aa", "VJ_1")
print(cdr3_vj.head())
# Output:
# AAACCTGAGAGTGAGA-1    CAVRDSNYQLIW
# AAACCTGAGGCATTGG-1    CAENTGNQFYF
# ...

# Get multiple fields for multiple chains as DataFrame
df = ir.get.airr(
    mdata,
    airr_variable=["junction_aa", "v_call"],
    chain=["VJ_1", "VDJ_1"],
)
print(df.columns.tolist())
# Output: ['VJ_1_junction_aa', 'VJ_1_v_call', 'VDJ_1_junction_aa', 'VDJ_1_v_call']
```

### get.airr_context - Temporarily Add AIRR Data to obs

Context manager that temporarily adds AIRR information to adata.obs for use with plotting functions or other tools that require data in obs.

```python
import scirpy as ir
import scanpy as sc

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Temporarily add V gene information to obs for plotting
with ir.get.airr_context(mdata, "v_call", chain=["VJ_1", "VDJ_1"]) as m:
    # V gene columns are now available in m.obs
    print(m.obs[["VJ_1_v_call", "VDJ_1_v_call"]].head())

    # Can use with plotting functions
    # sc.pl.umap(m["gex"], color="VJ_1_v_call")

# Columns are automatically removed after the context
```

## Analysis Tools

### chain_qc - Receptor Chain Quality Control

Performs quality control on immune receptor chains, identifying receptor types, subtypes, and chain pairing configurations.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Run chain quality control
ir.tl.chain_qc(mdata)

# Results are stored in obs:
# - receptor_type: TCR, BCR, ambiguous, multichain, or no IR
# - receptor_subtype: TRA+TRB, TRG+TRD, IGH+IGK, IGH+IGL, etc.
# - chain_pairing: single pair, extra VJ, extra VDJ, two full chains, multichain, orphan VJ/VDJ

print(mdata.obs["airr:chain_pairing"].value_counts())
# Output:
# single pair        2156
# extra VJ            142
# extra VDJ            98
# two full chains      54
# multichain           23
# orphan VJ            15
# orphan VDJ            8
```

### define_clonotypes - Define Clonotypes by Sequence Identity

Defines clonotypes based on identical CDR3 nucleotide sequences. Cells with the same clonotype share identical receptor sequences.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Compute nucleotide identity distances
ir.pp.ir_dist(mdata, metric="identity", sequence="nt")

# Define clonotypes based on nucleotide sequence identity
ir.tl.define_clonotypes(
    mdata,
    receptor_arms="all",      # Require both VJ and VDJ to match
    dual_ir="primary_only",   # Only consider primary chains
)

# Results stored in obs: clone_id, clone_id_size
print(f"Number of unique clonotypes: {mdata.obs['airr:clone_id'].nunique()}")
print(f"Largest clonotype size: {mdata.obs['airr:clone_id_size'].max()}")
# Output:
# Number of unique clonotypes: 2341
# Largest clonotype size: 45
```
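
Conceptually, defining clonotypes by identity is a group-by on the receptor sequences. The sketch below assumes the setting used above (both arms must match, primary chains only) and a hypothetical input layout in which each cell maps to a `(VJ junction, VDJ junction)` tuple of nucleotide sequences. It is an illustration, not scirpy's implementation.

```python
from collections import Counter


def define_clonotypes_by_identity(cells):
    """Return {cell_id: (clone_id, clone_size)} for identical sequence pairs.

    `cells` maps a cell id to a (vj_junction, vdj_junction) tuple.
    """
    key_to_id = {}
    clone_ids = {}
    for cell_id, key in cells.items():
        # The first time a sequence pair is seen, it gets the next integer id.
        clone_ids[cell_id] = key_to_id.setdefault(key, str(len(key_to_id)))
    sizes = Counter(clone_ids.values())
    return {cell: (cid, sizes[cid]) for cell, cid in clone_ids.items()}


cells = {
    "cell1": ("TGTGCTGTG", "TGTGCCAGC"),
    "cell2": ("TGTGCTGTG", "TGTGCCAGC"),  # same pair as cell1
    "cell3": ("TGTGCAGAG", "TGTGCCAGC"),  # same VDJ, different VJ
}
print(define_clonotypes_by_identity(cells))
# {'cell1': ('0', 2), 'cell2': ('0', 2), 'cell3': ('1', 1)}
```

Note that `cell3` ends up in its own clonotype even though it shares the VDJ sequence with the others, because both arms are required to match.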

### define_clonotype_clusters - Cluster Similar Receptors

Clusters cells based on CDR3 amino acid sequence similarity using metrics like TCRdist. Creates broader clonotype clusters than strict sequence identity.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Compute TCRdist amino acid distances
ir.pp.ir_dist(mdata, metric="tcrdist", sequence="aa", cutoff=15)

# Define clonotype clusters based on sequence similarity
ir.tl.define_clonotype_clusters(
    mdata,
    sequence="aa",
    metric="tcrdist",
    receptor_arms="all",
    dual_ir="any",           # Match if any chain pair matches
    same_v_gene=False,       # Don't require same V gene
)

# Results stored in obs: cc_aa_tcrdist, cc_aa_tcrdist_size
print(f"Number of clonotype clusters: {mdata.obs['airr:cc_aa_tcrdist'].nunique()}")
# Output: Number of clonotype clusters: 1893
```
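
One common way to turn a thresholded distance graph into clusters is to take its connected components. The sketch below shows that idea with a union-find structure; it uses Hamming distance as a simple stand-in for TCRdist, and the function names are hypothetical rather than part of scirpy.

```python
def hamming(a, b):
    """Number of mismatches, or None if the lengths differ."""
    return sum(x != y for x, y in zip(a, b)) if len(a) == len(b) else None


def connected_components(n, edges):
    """Label n nodes so that nodes joined by a path share a label."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


seqs = ["CASSLEETQYF", "CASSLEDTQYF", "CASSLEDTQFF", "CASSFSTCSANYGYTF"]
cutoff = 1
edges = [
    (i, j)
    for i in range(len(seqs))
    for j in range(i + 1, len(seqs))
    if (d := hamming(seqs[i], seqs[j])) is not None and d <= cutoff
]
clusters = connected_components(len(seqs), edges)
print(clusters)
# [0, 0, 0, 1]
```

The first and third sequences differ at two positions, yet they land in the same cluster because each is within the cutoff of the second one. This chaining effect is worth keeping in mind when choosing a cutoff.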

### clonal_expansion - Categorize Clonal Expansion

Adds clonal expansion categories to each cell based on clonotype size, useful for identifying expanded vs singleton clonotypes.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Categorize clonal expansion with custom breakpoints
ir.tl.clonal_expansion(
    mdata,
    breakpoints=(1, 2, 5),  # Categories: 1, 2, 3-5, >5
)

# Results stored in obs: clonal_expansion
print(mdata.obs["airr:clonal_expansion"].value_counts())
# Output:
# 1           1850
# 2            234
# 3-5          156
# >5           108
```
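
The mapping from clonotype size to category can be sketched as follows. The labels mirror the ones in the comment above (`1`, `2`, `3-5`, `>5`); the exact label strings produced by scirpy may differ, and `expansion_category` is a hypothetical helper.

```python
def expansion_category(size, breakpoints=(1, 2, 5)):
    """Return the category label for a clonotype of the given size."""
    lower = 1
    for b in sorted(breakpoints):
        if size <= b:
            # A bin containing a single size is labelled with that size.
            return str(b) if lower == b else f"{lower}-{b}"
        lower = b + 1
    return f">{max(breakpoints)}"


for size in (1, 2, 4, 5, 9):
    print(size, "->", expansion_category(size))
# 1 -> 1
# 2 -> 2
# 4 -> 3-5
# 5 -> 3-5
# 9 -> >5
```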

### alpha_diversity - Compute Clonotype Diversity

Computes alpha diversity metrics for clonotype distributions within groups of cells.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Compute Shannon entropy diversity per cluster
diversity = ir.tl.alpha_diversity(
    mdata,
    groupby="gex:cluster",
    metric="normalized_shannon_entropy",
    inplace=False,
)

print(diversity)
# Output (illustrative): one diversity value per cluster
# CD8_Teff     0.76
# CD8_Trm      0.89
# CD4_Treg     0.92
# ...
```
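
For reference, normalized Shannon entropy divides the entropy of the clonotype frequency distribution by its maximum possible value, the logarithm of the number of distinct clonotypes. A minimal sketch, assuming a plain list of clonotype IDs for one group and returning 0 for a group with a single clonotype (a convention chosen for this sketch):

```python
import math
from collections import Counter


def normalized_shannon_entropy(clone_ids):
    """Shannon entropy of clonotype frequencies, scaled to the range [0, 1]."""
    counts = Counter(clone_ids)
    if len(counts) < 2:
        return 0.0  # the normalization is undefined for a single clonotype
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(len(counts))


print(round(normalized_shannon_entropy(["a", "b", "c", "d"]), 3))  # 1.0
print(round(normalized_shannon_entropy(["a", "a", "a", "b"]), 3))  # 0.811
print(round(normalized_shannon_entropy(["a", "a", "a", "a"]), 3))  # 0.0
```

Values close to 1 indicate that cells are spread evenly across clonotypes; values close to 0 indicate that a few expanded clonotypes dominate the group.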

### repertoire_overlap - Compare Repertoire Similarity

Computes clonotype overlap and similarity between groups of cells, returning distance matrices for clustering.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Compute repertoire overlap between samples
abundance_df, distance_matrix, linkage = ir.tl.repertoire_overlap(
    mdata,
    groupby="gex:sample",
    inplace=False,
)

print("Clonotype abundance matrix shape:", abundance_df.shape)
print("Distance matrix:\n", distance_matrix)
# Output: Distance matrix showing Jaccard distances between samples
```
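
The Jaccard distance between two samples can be illustrated on plain sets of clonotype IDs. This sketch treats a clonotype as present in a sample if it occurs at least once, which is an assumption of the example; the sample names and IDs are made up.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of clonotype IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty repertoires are treated as identical
    return 1 - len(a & b) / len(union)


# Hypothetical clonotype IDs observed in two samples
samples = {
    "S1": ["0", "1", "2", "2"],
    "S2": ["1", "2", "3"],
}
print(jaccard_distance(samples["S1"], samples["S2"]))
# 0.5
```

A distance of 0 means both samples contain exactly the same clonotypes, and 1 means they share none.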

### clonotype_network - Compute Network Layout

Computes the layout for visualizing the clonotype network, where nodes represent cells with identical receptors.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata, metric="tcrdist", sequence="aa", cutoff=15)
ir.tl.define_clonotype_clusters(mdata, sequence="aa", metric="tcrdist")

# Compute network layout (only for clonotypes with at least 2 cells)
ir.tl.clonotype_network(
    mdata,
    min_cells=2,
    sequence="aa",
    metric="tcrdist",
)

# Layout coordinates are stored in obsm for plotting
```

### clonotype_modularity - Find Transcriptionally Related Clonotypes

Identifies clonotypes consisting of cells that are transcriptionally more similar than expected by random chance.

```python
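import scanpy as sc
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata, metric="tcrdist", sequence="aa", cutoff=15)
ir.tl.define_clonotype_clusters(mdata, sequence="aa", metric="tcrdist")

# Clonotype modularity needs a neighbors graph on the gene expression data.
# Compute one if it is missing (in practice, normalize the data and run PCA first).
if "connectivities" not in mdata["gex"].obsp:
    sc.pp.neighbors(mdata["gex"])

# Compute clonotype modularity
ir.tl.clonotype_modularity(
    mdata,
    target_col="airr:cc_aa_tcrdist",
)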

# Results stored in obs: clonotype_modularity, clonotype_modularity_fdr
# High modularity = clonotype cells are transcriptionally similar
top_modular = mdata.obs.nlargest(5, "airr:clonotype_modularity")[
    ["airr:cc_aa_tcrdist", "airr:clonotype_modularity", "airr:clonotype_modularity_fdr"]
]
print(top_modular)
```

### ir_query - Query Reference Databases

Queries a reference database (e.g., VDJdb, IEDB) for matching immune receptors based on sequence similarity.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Load reference database
vdjdb = ir.datasets.vdjdb()

# Compute distances to reference
ir.pp.ir_dist(mdata, vdjdb, metric="identity", sequence="aa")

# Query the database
ir.tl.ir_query(
    mdata,
    vdjdb,
    metric="identity",
    sequence="aa",
    receptor_arms="any",  # Match if any chain matches
    dual_ir="any",
)

# Annotate cells with matching epitope information
ir.tl.ir_query_annotate(
    mdata,
    vdjdb,
    metric="identity",
    sequence="aa",
    include_ref_cols=["antigen.species", "antigen.gene"],
    strategy="most-frequent",  # Use most frequent annotation if multiple matches
)

print(mdata.obs["airr:antigen.species"].value_counts().head())
# Output:
# CMV              45
# InfluenzaA       23
# EBV              18
# SARS-CoV-2       12
# ...
```

### ir_query_annotate_df - Get All Database Matches

Returns a DataFrame with all matching entries between cells and the reference database, useful for detailed analysis of matches.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

vdjdb = ir.datasets.vdjdb()
ir.pp.ir_dist(mdata, vdjdb, metric="identity", sequence="aa")
ir.tl.ir_query(mdata, vdjdb, metric="identity", sequence="aa")

# Get all matches as DataFrame (cells with multiple matches have multiple rows)
matches_df = ir.tl.ir_query_annotate_df(
    mdata,
    vdjdb,
    metric="identity",
    sequence="aa",
    include_ref_cols=["antigen.species", "antigen.gene", "antigen.epitope"],
)

print(f"Total matches: {len(matches_df)}")
print(matches_df[["antigen.species", "antigen.gene", "antigen.epitope"]].head())
# Output: DataFrame with all cell-reference matches and their annotations
```

## Plotting Functions

### pl.clonal_expansion - Visualize Expansion

Creates bar plots showing the distribution of clonal expansion categories across groups.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Plot clonal expansion by cluster (normalized)
ir.pl.clonal_expansion(
    mdata,
    groupby="gex:cluster",
    target_col="clone_id",
    breakpoints=(1, 2, 5),
    normalize=True,
)

# Plot absolute counts
ir.pl.clonal_expansion(
    mdata,
    groupby="gex:cluster",
    target_col="clone_id",
    normalize=False,
)
```

### pl.group_abundance - Plot Category Distributions

Creates bar plots showing the distribution of one categorical variable across another, useful for showing clonotype distributions.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.tl.chain_qc(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Plot chain pairing configuration by source
ir.pl.group_abundance(
    mdata,
    groupby="airr:chain_pairing",
    target_col="gex:source",
)

# Plot top 10 largest clonotypes by cluster
ir.pl.group_abundance(
    mdata,
    groupby="airr:clone_id",
    target_col="gex:cluster",
    max_cols=10,
    normalize="gex:sample",  # Normalize by sample to reduce bias
)
```

### pl.alpha_diversity - Plot Diversity Metrics

Visualizes alpha diversity metrics across groups as a bar plot.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Plot Shannon entropy diversity by cluster
ir.pl.alpha_diversity(
    mdata,
    groupby="gex:cluster",
    metric="normalized_shannon_entropy",
)
```

### pl.clonotype_network - Visualize Clonotype Relationships

Plots the clonotype network where nodes represent cells with identical receptors and edges connect similar receptors.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata, metric="tcrdist", sequence="aa", cutoff=15)
ir.tl.define_clonotype_clusters(mdata, sequence="aa", metric="tcrdist")
ir.tl.clonotype_network(mdata, min_cells=2, sequence="aa", metric="tcrdist")

# Plot network colored by patient (categorical shown as pie charts)
ir.pl.clonotype_network(
    mdata,
    color="gex:patient",
    base_size=20,
    panel_size=(7, 7),
    label_fontsize=9,
)

# Plot network colored by clonotype modularity (continuous);
# requires ir.tl.clonotype_modularity() to have been run beforehand
ir.pl.clonotype_network(
    mdata,
    color="clonotype_modularity",
    base_size=20,
)
```

### pl.repertoire_overlap - Compare Repertoires

Visualizes repertoire overlap between samples as a heatmap or scatter plot.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Plot repertoire overlap heatmap with hierarchical clustering
ir.pl.repertoire_overlap(
    mdata,
    groupby="gex:sample",
    heatmap_cats=["gex:patient", "gex:source"],
    xticklabels=True,
    yticklabels=True,
)

# Compare specific pair of samples on scatter plot
ir.pl.repertoire_overlap(
    mdata,
    groupby="gex:sample",
    pair_to_plot=["LN2", "LT2"],
)
```

### pl.vdj_usage - Visualize V(D)J Gene Usage

Creates Sankey/ribbon plots showing the combinations of V, D, and J genes in the dataset.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Plot most common VDJ combinations
ir.pl.vdj_usage(
    mdata,
    full_combination=False,
    max_segments=None,
    max_ribbons=30,
    fig_kws={"figsize": (8, 5)},
)

# Plot VDJ usage for specific clonotypes (clone IDs must be defined first)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)
ir.pl.vdj_usage(
    mdata[mdata.obs["airr:clone_id"].isin(["68", "101", "127"])],
    max_ribbons=None,
    max_segments=100,
)
```

### pl.spectratype - CDR3 Length Distribution

Plots the distribution of CDR3 region lengths, optionally grouped by categories.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Bar plot of CDR3 length distribution by cluster
ir.pl.spectratype(
    mdata,
    color="gex:cluster",
    viztype="bar",
    chain="VDJ_1",
)

# Ridge plot (KDE curves)
ir.pl.spectratype(
    mdata,
    color="gex:cluster",
    viztype="curve",
    curve_layout="shifted",
    kde_kws={"kde_norm": False},
)
```
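
The quantity behind a spectratype plot is a count of sequences per CDR3 length. A minimal sketch on a plain list of junction sequences (the helper name `cdr3_length_counts` is hypothetical, and cells without a chain are represented here as `None`):

```python
from collections import Counter


def cdr3_length_counts(junctions):
    """Count sequences per CDR3 length, skipping missing values."""
    counts = Counter(len(seq) for seq in junctions if seq)
    return dict(sorted(counts.items()))


junctions = ["CAVRDNDYKLSF", "CAENTGNQFYF", "CAVMDSNYQLIW", None]
print(cdr3_length_counts(junctions))
# {11: 1, 12: 2}
```

Grouping by a categorical variable such as a cluster label amounts to computing these counts once per group.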

### pl.logoplot_cdr3_motif - Sequence Logo Plots

Generates sequence logo plots showing amino acid composition patterns in CDR3 sequences.

```python
import scirpy as ir
import matplotlib.pyplot as plt

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Filter for sequences of specific length (logos require same length)
mask = (
    (ir.get.airr(mdata, "junction_aa", "VDJ_1").str.len() == 14)
    & (mdata.obs["gex:cluster"] == "CD8_Teff")
)

# Create logo plot for VDJ chain
fig, ax = plt.subplots(figsize=(10, 2))
ir.pl.logoplot_cdr3_motif(
    mdata[mask],
    chains="VDJ_1",
    to_type="information",  # Information content or probability
    ax=ax,
)
plt.title("CDR3 motif - CD8 Effector T cells (length 14)")
plt.show()
```
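
The "information" scale of a sequence logo can be approximated by the textbook definition: the maximum entropy for the alphabet minus the observed entropy at each position. The sketch below uses that definition with a uniform background over 20 amino acids and no small-sample correction, so values may differ from what the plotting backend computes. It also shows why all sequences need the same length.

```python
import math
from collections import Counter


def information_content(seqs, alphabet_size=20):
    """Per-position information content in bits for equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    result = []
    for column in zip(*seqs):  # iterate over alignment positions
        n = len(column)
        entropy = -sum(
            (c / n) * math.log2(c / n) for c in Counter(column).values()
        )
        result.append(math.log2(alphabet_size) - entropy)
    return result


motif = ["CASF", "CASF", "CAVF", "CATF"]
print([round(x, 2) for x in information_content(motif)])
# [4.32, 4.32, 2.82, 4.32]
```

Fully conserved positions reach the maximum of log2(20) ≈ 4.32 bits, while variable positions score lower.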

## Example Datasets

### datasets.wu2020_3k - TCR Example Dataset

Loads a downsampled dataset of 3000 T cells from cancer patients with paired TCR and gene expression data.

```python
import scirpy as ir

# Load example dataset (downloads on first use, cached afterwards)
mdata = ir.datasets.wu2020_3k()

print(f"Modalities: {list(mdata.mod.keys())}")
print(f"Gene expression: {mdata['gex'].shape}")
print(f"AIRR data: {mdata['airr'].shape}")
# Output:
# Modalities: ['gex', 'airr']
# Gene expression: (3000, 36601)
# AIRR data: (3000, 0)
```

### datasets.vdjdb - VDJdb Reference Database

Downloads and processes the VDJdb database of TCR sequences with known antigen specificities.

```python
import scirpy as ir

# Download VDJdb (cached after first download)
vdjdb = ir.datasets.vdjdb(cached=True, cache_path="data/vdjdb.h5ad")

print(f"VDJdb entries: {vdjdb.n_obs}")
print(f"Metadata fields: {list(vdjdb.obs.columns)}")
# Contains: antigen.species, antigen.gene, antigen.epitope, mhc.class, etc.
```

### datasets.iedb - IEDB Reference Database

Downloads and processes the Immune Epitope Database (IEDB) for TCR/BCR epitope annotation.

```python
import scirpy as ir

# Download IEDB (cached after first download)
iedb = ir.datasets.iedb(cached=True, cache_path="data/iedb.h5ad")

print(f"IEDB entries: {iedb.n_obs}")
# Can be used with ir.tl.ir_query() for epitope annotation
```

## Utility Functions

### util.DataHandler - Unified Data Access

Provides transparent access to AIRR modality in both AnnData and MuData objects, handling the complexity of multimodal data access.

```python
import scirpy as ir
from scirpy.util import DataHandler

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# DataHandler abstracts whether input is AnnData or MuData
params = DataHandler(
    mdata, airr_mod="airr", airr_key="airr", chain_idx_key="chain_indices"
)

# Access AIRR data uniformly
print(f"Data type: {type(params.data)}")
print(f"Number of cells in AIRR array: {len(params.airr)}")
print(f"Chain indices available: {params.chain_indices is not None}")
```

## Summary

Scirpy provides a toolkit for single-cell immune receptor analysis that integrates with the scverse ecosystem. The primary use cases are: (1) importing and preprocessing TCR/BCR data from various sequencing platforms, (2) defining clonotypes based on sequence identity or similarity, (3) analyzing clonal expansion and repertoire diversity across conditions or cell types, (4) querying reference databases like VDJdb to annotate receptor specificities, and (5) integrating immune receptor analysis with gene expression data to identify transcriptionally related clonotypes.

The package follows a consistent workflow: load data with `ir.io.read_*` functions, create chain indices with `ir.pp.index_chains`, compute distances with `ir.pp.ir_dist`, define clonotypes with `ir.tl.define_clonotypes` or `ir.tl.define_clonotype_clusters`, and visualize results with `ir.pl.*` functions. For multimodal analysis, a MuData container holds gene expression and AIRR data side by side, so analyses can draw on both transcriptomic and receptor sequence information. Storing AIRR data as awkward arrays keeps the representation lossless while remaining compatible with the AnnData/scanpy ecosystem.