# Scirpy: Single-Cell Immune Receptor Analysis in Python

Scirpy is a Python package for analyzing T cell receptor (TCR) and B cell receptor (BCR) repertoires from single-cell RNA sequencing data. It integrates with the scverse ecosystem, particularly scanpy and MuData, and provides modules for data import, clonotype analysis, diversity metrics, and visualization. The package supports multiple input formats, including 10x Genomics CellRanger, TraCeR, BraCeR, BD Rhapsody, and the AIRR rearrangement standard.

Scirpy stores immune receptor data as awkward arrays in `adata.obsm["airr"]`. This gives a lossless representation of AIRR rearrangement data while keeping compatibility with AnnData's cell-centric structure. The package follows the scanpy API conventions, with modules divided into preprocessing (`pp`), tools (`tl`), and plotting (`pl`). Users combine these into pipelines for clonotype definition, clonal expansion analysis, repertoire comparison, and integration with gene expression data.

## Data Import and I/O Functions

### read_10x_vdj - Load 10x Genomics VDJ Data

Reads immune receptor data from 10x Genomics CellRanger output files, supporting both CSV and JSON formats. Returns an AnnData object with the AIRR data stored in `obsm`.

```python
import scirpy as ir
import scanpy as sc
from mudata import MuData

# Load TCR data from 10x CellRanger output
adata_tcr = ir.io.read_10x_vdj("filtered_contig_annotations.csv")

# Load gene expression data
adata_gex = sc.read_10x_h5("filtered_feature_bc_matrix.h5")
adata_gex.var_names_make_unique()

# Combine both modalities into a MuData object
mdata = MuData({"gex": adata_gex, "airr": adata_tcr})

# Create chain indices and run quality control
ir.pp.index_chains(mdata)
ir.tl.chain_qc(mdata)

print(f"Loaded {mdata['airr'].n_obs} cells with TCR data")
# Example output (the count depends on your data):
# Loaded 1523 cells with TCR data
```

### read_airr - Load AIRR Rearrangement Format

Reads data from AIRR-compliant rearrangement TSV files.
Supports loading multiple files at once for datasets split by chain type.

```python
import scirpy as ir

# Load multiple AIRR rearrangement tables (e.g., separate TRA and TRB files)
adata = ir.io.read_airr(
    [
        "immunesim_tra.tsv",
        "immunesim_trb.tsv",
    ]
)

# Create chain indices and run quality control
ir.pp.index_chains(adata)
ir.tl.chain_qc(adata)

print(f"Receptor types: {adata.obs['receptor_type'].value_counts().to_dict()}")
# Example output:
# Receptor types: {'TCR': 98, 'no IR': 2}
```

### from_airr_cells - Create AnnData from Custom Data

Converts a list of `AirrCell` objects to an AnnData object. This is useful for importing custom data formats that scirpy does not support directly.

```python
import scirpy as ir
import pandas as pd

# Example: convert a custom TCR table to scirpy format
tcr_table = pd.DataFrame(
    {
        "cell_id": ["cell1", "cell2", "cell3"],
        "cdr3_alpha": ["CAVRDNDYKLSF", "CAENTGNQFYF", "CAVMDSNYQLIW"],
        "cdr3_beta": ["CASSLAPGATNEKLFF", "CASSLEETQYF", "CASSFSTCSANYGYTF"],
        "v_alpha": ["TRAV12-1", "TRAV8-3", "TRAV12-2"],
        "v_beta": ["TRBV6-5", "TRBV19", "TRBV7-9"],
    }
)

tcr_cells = []
for _, row in tcr_table.iterrows():
    cell = ir.io.AirrCell(cell_id=row["cell_id"])

    # Create the alpha (VJ) chain
    alpha_chain = ir.io.AirrCell.empty_chain_dict()
    alpha_chain.update(
        {
            "locus": "TRA",
            "junction_aa": row["cdr3_alpha"],
            "v_call": row["v_alpha"],
            "productive": True,
        }
    )

    # Create the beta (VDJ) chain
    beta_chain = ir.io.AirrCell.empty_chain_dict()
    beta_chain.update(
        {
            "locus": "TRB",
            "junction_aa": row["cdr3_beta"],
            "v_call": row["v_beta"],
            "productive": True,
        }
    )

    cell.add_chain(alpha_chain)
    cell.add_chain(beta_chain)
    tcr_cells.append(cell)

# Convert to AnnData
adata_tcr = ir.io.from_airr_cells(tcr_cells)
ir.pp.index_chains(adata_tcr)

print(f"Created AnnData with {adata_tcr.n_obs} cells")
# Output: Created AnnData with 3 cells
```

### write_airr - Export to AIRR Format

Exports immune receptor data from an AnnData (or MuData) object to the AIRR rearrangement TSV format.
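
The AIRR rearrangement format is a tab-separated table with one row per chain, linked back to its cell through the `cell_id` column. As a rough, stdlib-only illustration of that layout (not how `write_airr` is implemented), the sketch below flattens a hypothetical `cell_id -> chains` mapping into rows. `FIELDS` is a deliberately tiny, assumed subset of the AIRR columns.

```python
import csv
import io

# Hypothetical in-memory representation: cell_id -> list of chain dicts
cells = {
    "cell1": [
        {"locus": "TRA", "junction_aa": "CAVRDNDYKLSF", "v_call": "TRAV12-1"},
        {"locus": "TRB", "junction_aa": "CASSLAPGATNEKLFF", "v_call": "TRBV6-5"},
    ],
    "cell2": [
        {"locus": "TRB", "junction_aa": "CASSLEETQYF", "v_call": "TRBV19"},
    ],
}

# A small, assumed subset of AIRR rearrangement columns; the real schema has many more
FIELDS = ["cell_id", "locus", "junction_aa", "v_call"]


def to_rearrangement_tsv(cells: dict) -> str:
    """Flatten cells into one TSV row per chain, each row carrying its cell_id."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for cell_id, chains in cells.items():
        for chain in chains:
            writer.writerow({"cell_id": cell_id, **chain})
    return buffer.getvalue()


tsv = to_rearrangement_tsv(cells)
print(tsv)
```

Conceptually, reading such a file back means grouping the rows by `cell_id` again, which is why a cell with two chains occupies two rows.
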

```python
import scirpy as ir

# Load example data
mdata = ir.datasets.wu2020_3k()

# Export AIRR data to TSV format
ir.io.write_airr(mdata, "exported_airr_data.tsv")

# The exported file follows the AIRR rearrangement schema (one row per chain),
# with columns such as cell_id, locus, junction_aa, junction, v_call, d_call, j_call
```

## Preprocessing Functions

### index_chains - Create Chain Indices

Creates indices that map each cell to its primary and secondary VJ and VDJ chains according to scirpy's receptor model. This step is required before running most analysis functions.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()

# Create chain indices (required for downstream analysis)
ir.pp.index_chains(mdata)

# Chain indices are stored in obsm["chain_indices"] of the AIRR data
# (mdata["airr"].obsm["chain_indices"] for a MuData object).
# Structure per cell:
# {"VJ": [primary_idx, secondary_idx], "VDJ": [primary_idx, secondary_idx], "multichain": bool}
```

### ir_dist - Compute Sequence Distances

Computes pairwise distance matrices between CDR3 sequences. Supported metrics include identity, Levenshtein, Hamming, alignment, and TCRdist.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Compute nucleotide sequence identity distances (for exact clonotype definition)
ir.pp.ir_dist(mdata, metric="identity", sequence="nt")

# Compute amino acid TCRdist distances with a cutoff (for clonotype clustering)
ir.pp.ir_dist(
    mdata,
    metric="tcrdist",
    sequence="aa",
    cutoff=15,  # Maximum distance at which two sequences are considered similar
)

# Distance matrices are stored in .uns of the AIRR data,
# under keys of the form "ir_dist_{sequence}_{metric}"
```

### merge_airr - Merge Multiple AIRR Datasets

Merges two AnnData objects with immune receptor information, which is useful for combining BCR and TCR data from the same cells.
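
Conceptually, merging works per cell barcode: the chains recorded for the same barcode in both objects are combined. The sketch below illustrates that idea on plain dictionaries. The inputs are hypothetical toy sequences, the duplicate removal is a simplification, and the `receptor_type` helper is a crude stand-in for the more detailed rules applied by `ir.tl.chain_qc`.

```python
# Hypothetical inputs: barcode -> list of (locus, junction_aa) chains
tcr = {
    "AAAC-1": [("TRA", "CAVRDNDYKLSF"), ("TRB", "CASSLEETQYF")],
    "AAAG-1": [("TRB", "CASSFSTCSANYGYTF")],
}
bcr = {
    "AAAG-1": [("IGH", "CARDYW")],
    "AAAT-1": [("IGH", "CARGGW"), ("IGK", "CQQYNSYPLTF")],
}


def merge_chains(a: dict, b: dict) -> dict:
    """Union of barcodes; chains of shared barcodes are concatenated, exact duplicates dropped."""
    merged = {}
    for barcode in sorted(set(a) | set(b)):
        chains = []
        for chain in a.get(barcode, []) + b.get(barcode, []):
            if chain not in chains:
                chains.append(chain)
        merged[barcode] = chains
    return merged


def receptor_type(chains: list) -> str:
    """Simplified label: TCR-only, BCR-only, 'ambiguous' if both, 'no IR' if none."""
    has_tcr = any(locus.startswith("TR") for locus, _ in chains)
    has_bcr = any(locus.startswith("IG") for locus, _ in chains)
    if has_tcr and has_bcr:
        return "ambiguous"
    return "TCR" if has_tcr else "BCR" if has_bcr else "no IR"


merged = merge_chains(tcr, bcr)
for barcode, chains in merged.items():
    print(barcode, receptor_type(chains), chains)
```
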

```python
import scirpy as ir

# Example: merge TCR and BCR data from the same experiment
adata_tcr = ir.io.read_10x_vdj("tcr_contig_annotations.csv")
adata_bcr = ir.io.read_10x_vdj("bcr_contig_annotations.csv")

# Merge the two objects; cells are matched by barcode.
# merge_airr returns a new AnnData object with the merged receptor data.
adata = ir.pp.merge_airr(adata_tcr, adata_bcr)

ir.pp.index_chains(adata)
ir.tl.chain_qc(adata)
# Cells with both TCR and BCR chains are flagged with the "ambiguous" receptor type
```

## Data Retrieval Functions

### get.airr - Retrieve AIRR Variables

Retrieves AIRR rearrangement fields for each cell as a pandas Series or DataFrame. This is useful for accessing CDR3 sequences, V/J genes, and other chain-level data.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Get the CDR3 amino acid sequence of the primary VJ chain (returns a Series)
cdr3_vj = ir.get.airr(mdata, "junction_aa", "VJ_1")
print(cdr3_vj.head())
# Example output (cell barcode -> sequence):
# AAACCTGAGAGTGAGA-1    CAVRDSNYQLIW
# AAACCTGAGGCATTGG-1    CAENTGNQFYF
# ...

# Get multiple fields for multiple chains (returns a DataFrame)
df = ir.get.airr(
    mdata,
    airr_variable=["junction_aa", "v_call"],
    chain=["VJ_1", "VDJ_1"],
)
print(df.columns.tolist())
# Columns are named "{chain}_{variable}", i.e. VJ_1_junction_aa, VJ_1_v_call,
# VDJ_1_junction_aa and VDJ_1_v_call
```

### get.airr_context - Temporarily Add AIRR Data to obs

Context manager that temporarily adds AIRR information to `obs`, for use with plotting functions or other tools that read their input from `obs`.
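
The add-then-remove pattern behind such a context manager can be sketched with `contextlib.contextmanager`. This is a generic illustration on a plain dict standing in for `obs`, not scirpy's code, and the function name `temporary_columns` is hypothetical. The `finally` clause is what guarantees cleanup even when the block raises.

```python
from contextlib import contextmanager


@contextmanager
def temporary_columns(obs: dict, new_columns: dict):
    """Add columns to a dict-of-lists table, then remove them on exit (even on error)."""
    clashes = set(obs) & set(new_columns)
    if clashes:
        raise ValueError(f"columns already exist: {sorted(clashes)}")
    obs.update(new_columns)
    try:
        yield obs
    finally:
        for name in new_columns:
            obs.pop(name, None)


# Hypothetical obs table and per-cell V genes of the primary VJ chain
obs = {"cluster": ["CD8_Teff", "CD4_Treg"]}
with temporary_columns(obs, {"VJ_1_v_call": ["TRAV12-1", "TRAV8-3"]}) as table:
    print(sorted(table))  # ['VJ_1_v_call', 'cluster']
print(sorted(obs))  # ['cluster']
```

Refusing to overwrite existing columns keeps the cleanup step from deleting data that was there before the block started.
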

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Temporarily add V gene information to obs
with ir.get.airr_context(mdata, "v_call", chain=["VJ_1", "VDJ_1"]):
    # The V gene columns are available in obs inside this block
    print(mdata.obs[["VJ_1_v_call", "VDJ_1_v_call"]].head())
    # Plotting functions called here can color or group by these columns

# The columns are removed automatically when the block exits
```

## Analysis Tools

### chain_qc - Receptor Chain Quality Control

Performs quality control on immune receptor chains, assigning each cell a receptor type, a receptor subtype, and a chain pairing configuration.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Run chain quality control
ir.tl.chain_qc(mdata)

# Results are stored in obs:
# - receptor_type: TCR, BCR, ambiguous, multichain, or no IR
# - receptor_subtype: TRA+TRB, TRG+TRD, IGH+IGK, IGH+IGL, etc.
# - chain_pairing: single pair, extra VJ, extra VDJ, two full chains,
#   orphan VJ, orphan VDJ, multichain, ambiguous, or no IR
print(mdata.obs["airr:chain_pairing"].value_counts())
# Example output (counts are illustrative):
# single pair        2156
# extra VJ            142
# extra VDJ            98
# two full chains      54
# multichain           23
# orphan VJ            15
# orphan VDJ            8
```

### define_clonotypes - Define Clonotypes by Sequence Identity

Defines clonotypes based on identical CDR3 nucleotide sequences, so cells in the same clonotype share identical receptor sequences.
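
Identity-based clonotype definition boils down to grouping cells by an exact receptor key. A minimal sketch, assuming each cell is reduced to the nucleotide CDR3 sequences of its primary VJ and VDJ chains (roughly the situation of `receptor_arms="all"` with `dual_ir="primary_only"`). The sequences and the ID numbering scheme are made up for illustration.

```python
from collections import Counter

# Hypothetical primary CDR3 nucleotide sequences per cell: (VJ_1, VDJ_1)
cells = {
    "c1": ("TGTGCTGTGAGA", "TGTGCCAGCAGC"),
    "c2": ("TGTGCTGTGAGA", "TGTGCCAGCAGC"),
    "c3": ("TGTGCAGAGAAT", "TGTGCCAGCAGC"),
    "c4": ("TGTGCTGTGAGA", "TGTGCCAGCAGC"),
}


def assign_clonotypes(cells: dict) -> tuple[dict, dict]:
    """Give cells with identical (VJ, VDJ) sequence pairs the same clone_id."""
    ids = {}  # receptor tuple -> clone_id, numbered in order of first appearance
    clone_id = {}
    for cell, receptor in cells.items():
        clone_id[cell] = ids.setdefault(receptor, str(len(ids)))
    sizes = Counter(clone_id.values())
    clone_id_size = {cell: sizes[cid] for cell, cid in clone_id.items()}
    return clone_id, clone_id_size


clone_id, clone_id_size = assign_clonotypes(cells)
print(clone_id)  # {'c1': '0', 'c2': '0', 'c3': '1', 'c4': '0'}
print(clone_id_size)  # {'c1': 3, 'c2': 3, 'c3': 1, 'c4': 3}
```

A single mismatching VJ sequence (cell `c3`) is enough to start a new clonotype, which is what makes identity-based definitions strict.
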

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Compute nucleotide identity distances
ir.pp.ir_dist(mdata, metric="identity", sequence="nt")

# Define clonotypes based on nucleotide sequence identity
ir.tl.define_clonotypes(
    mdata,
    receptor_arms="all",  # Require both the VJ and the VDJ arm to match
    dual_ir="primary_only",  # Only consider primary chains
)

# Results are stored in obs: clone_id, clone_id_size
print(f"Number of unique clonotypes: {mdata.obs['airr:clone_id'].nunique()}")
print(f"Largest clonotype size: {mdata.obs['airr:clone_id_size'].max()}")
# Example output (values are illustrative):
# Number of unique clonotypes: 2341
# Largest clonotype size: 45
```

### define_clonotype_clusters - Cluster Similar Receptors

Clusters cells based on CDR3 amino acid sequence similarity, using metrics such as TCRdist. This creates broader clonotype clusters than strict sequence identity.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Compute TCRdist amino acid distances
ir.pp.ir_dist(mdata, metric="tcrdist", sequence="aa", cutoff=15)

# Define clonotype clusters based on sequence similarity
ir.tl.define_clonotype_clusters(
    mdata,
    sequence="aa",
    metric="tcrdist",
    receptor_arms="all",
    dual_ir="any",  # Match if any of the receptor chain pairs match
    same_v_gene=False,  # Do not require the same V gene
)

# Results are stored in obs: cc_aa_tcrdist, cc_aa_tcrdist_size
print(f"Number of clonotype clusters: {mdata.obs['airr:cc_aa_tcrdist'].nunique()}")
# Example output (value is illustrative):
# Number of clonotype clusters: 1893
```

### clonal_expansion - Categorize Clonal Expansion

Assigns each cell a clonal expansion category based on the size of its clonotype, which is useful for distinguishing expanded from singleton clonotypes.
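
The binning rule for breakpoints `(1, 2, 5)` (singletons, doublets, 3 to 5 cells, more than 5 cells) can be sketched as a small function. The `<= n` / `> n` label strings are an assumption about how the categories are named; check the actual values in `obs` for your scirpy version.

```python
def expansion_category(size: int, breakpoints=(1, 2, 5)) -> str:
    """Return '<= b' for the first breakpoint b with size <= b, else '> last breakpoint'."""
    for b in breakpoints:
        if size <= b:
            return f"<= {b}"
    return f"> {breakpoints[-1]}"


sizes = [1, 2, 3, 5, 6, 45]
print([expansion_category(s) for s in sizes])
# ['<= 1', '<= 2', '<= 5', '<= 5', '> 5', '> 5']
```

Because breakpoints are checked in ascending order, each size lands in exactly one bin.
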

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Categorize clonal expansion with custom breakpoints
ir.tl.clonal_expansion(
    mdata,
    breakpoints=(1, 2, 5),  # Clonotype sizes: 1, 2, 3-5, >5
)

# Results are stored in obs: clonal_expansion
print(mdata.obs["airr:clonal_expansion"].value_counts())
# The four categories are labelled "<= 1", "<= 2", "<= 5" and "> 5"
```

### alpha_diversity - Compute Clonotype Diversity

Computes alpha diversity metrics of the clonotype distribution within each group of cells.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Compute normalized Shannon entropy per cluster
diversity = ir.tl.alpha_diversity(
    mdata,
    groupby="gex:cluster",
    metric="normalized_shannon_entropy",
    inplace=False,
)
print(diversity)
# Output: a DataFrame with one diversity value per cluster, e.g. (illustrative values):
# CD8_Teff    0.76
# CD8_Trm     0.89
# CD4_Treg    0.92
# ...
```

### repertoire_overlap - Compare Repertoire Similarity

Computes clonotype overlap and similarity between groups of cells. With `inplace=False`, it returns the clonotype abundance table, a distance matrix, and a linkage for hierarchical clustering.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Compute repertoire overlap between samples
abundance_df, distance_matrix, linkage = ir.tl.repertoire_overlap(
    mdata,
    groupby="gex:sample",
    inplace=False,
)

print("Clonotype abundance matrix shape:", abundance_df.shape)
print("Distance matrix:\n", distance_matrix)
# The distances are Jaccard distances between samples (the default overlap measure)
```

### clonotype_network - Compute Network Layout

Computes the layout for visualizing the clonotype network, in which nodes represent cells with identical receptors.
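
Behind the network, nodes are unique receptor sequences and edges connect pairs whose distance is within the cutoff; connected groups correspond to clonotype clusters. The stdlib sketch below shows only that graph construction, using plain Levenshtein distance (one of the metrics `ir_dist` supports) and a union-find to collect components. The sequences are hypothetical, and the sketch neither computes a layout nor reproduces TCRdist.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (ca != cb),  # substitution (free if equal)
                )
            )
        prev = curr
    return prev[-1]


def connected_components(nodes: list, cutoff: int) -> list:
    """Link nodes whose distance is <= cutoff and return the resulting groups."""
    parent = list(range(len(nodes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if levenshtein(nodes[i], nodes[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i, node in enumerate(nodes):
        groups.setdefault(find(i), []).append(node)
    return list(groups.values())


# Hypothetical unique CDR3 amino acid sequences (one network node each)
nodes = ["CASSLEETQYF", "CASSLEETQFF", "CASSLDETQFF", "CAVRDNDYKLSF"]
print(connected_components(nodes, cutoff=1))
# [['CASSLEETQYF', 'CASSLEETQFF', 'CASSLDETQFF'], ['CAVRDNDYKLSF']]
```

Note that the first and third sequences differ by two edits, yet end up in the same component through the middle sequence: clusters are formed transitively.
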

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata, metric="tcrdist", sequence="aa", cutoff=15)
ir.tl.define_clonotype_clusters(mdata, sequence="aa", metric="tcrdist")

# Compute the network layout, keeping only clonotype clusters with at least 2 cells
ir.tl.clonotype_network(
    mdata,
    min_cells=2,
    sequence="aa",
    metric="tcrdist",
)

# Layout coordinates are stored in obsm for plotting
```

### clonotype_modularity - Find Transcriptionally Related Clonotypes

Identifies clonotypes whose cells are transcriptionally more similar to each other than expected by random chance.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata, metric="tcrdist", sequence="aa", cutoff=15)
ir.tl.define_clonotype_clusters(mdata, sequence="aa", metric="tcrdist")

# Compute clonotype modularity. This requires a gene expression neighborhood
# graph, e.g. computed with sc.pp.neighbors on the "gex" modality.
ir.tl.clonotype_modularity(
    mdata,
    target_col="airr:cc_aa_tcrdist",
)

# Results are stored in obs: clonotype_modularity, clonotype_modularity_fdr
# High modularity means the cells of a clonotype are transcriptionally similar
top_modular = mdata.obs.nlargest(5, "airr:clonotype_modularity")[
    ["airr:cc_aa_tcrdist", "airr:clonotype_modularity", "airr:clonotype_modularity_fdr"]
]
print(top_modular)
```

### ir_query - Query Reference Databases

Queries a reference database (e.g., VDJdb or IEDB) for receptors that match those in the dataset, based on sequence identity or similarity.
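
With `metric="identity"`, querying reduces to exact lookups of a cell's CDR3 in the reference, and `strategy="most-frequent"` keeps the most common annotation among a cell's hits. Below is a simplified, single-chain sketch of that logic with hypothetical reference rows; ties are broken by first occurrence here, which may differ from scirpy's behavior.

```python
from collections import Counter

# Hypothetical reference entries: (CDR3 beta amino acid sequence, antigen species)
reference = [
    ("CASSLAPGATNEKLFF", "CMV"),
    ("CASSLAPGATNEKLFF", "CMV"),
    ("CASSLAPGATNEKLFF", "EBV"),
    ("CASSLEETQYF", "InfluenzaA"),
]

# Hypothetical query cells: cell_id -> primary VDJ CDR3
query = {"cell1": "CASSLAPGATNEKLFF", "cell2": "CASSLEETQYF", "cell3": "CASSFSTCSANYGYTF"}


def annotate_most_frequent(query: dict, reference: list) -> dict:
    """Exact matching; keep the most frequent annotation among a cell's hits (None if no hit)."""
    by_cdr3 = {}
    for cdr3, antigen in reference:
        by_cdr3.setdefault(cdr3, []).append(antigen)
    result = {}
    for cell, cdr3 in query.items():
        hits = by_cdr3.get(cdr3, [])
        result[cell] = Counter(hits).most_common(1)[0][0] if hits else None
    return result


print(annotate_most_frequent(query, reference))
# {'cell1': 'CMV', 'cell2': 'InfluenzaA', 'cell3': None}
```

Indexing the reference by sequence first keeps each lookup constant-time instead of scanning the whole reference per cell.
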

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Load the reference database
vdjdb = ir.datasets.vdjdb()

# Compute distances between the dataset and the reference
ir.pp.ir_dist(mdata, vdjdb, metric="identity", sequence="aa")

# Query the database
ir.tl.ir_query(
    mdata,
    vdjdb,
    metric="identity",
    sequence="aa",
    receptor_arms="any",  # Match if either the VJ or the VDJ arm matches
    dual_ir="any",
)

# Annotate cells with matching epitope information
ir.tl.ir_query_annotate(
    mdata,
    vdjdb,
    metric="identity",
    sequence="aa",
    include_ref_cols=["antigen.species", "antigen.gene"],
    strategy="most-frequent",  # Use the most frequent annotation if there are multiple matches
)

print(mdata.obs["airr:antigen.species"].value_counts().head())
# Example output (counts are illustrative):
# CMV           45
# InfluenzaA    23
# EBV           18
# SARS-CoV-2    12
# ...
```

### ir_query_annotate_df - Get All Database Matches

Returns a DataFrame with all matching entries between cells and the reference database, which is useful for a detailed analysis of the matches.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
vdjdb = ir.datasets.vdjdb()
ir.pp.ir_dist(mdata, vdjdb, metric="identity", sequence="aa")
ir.tl.ir_query(mdata, vdjdb, metric="identity", sequence="aa")

# Get all matches as a DataFrame (cells with multiple matches have multiple rows)
matches_df = ir.tl.ir_query_annotate_df(
    mdata,
    vdjdb,
    metric="identity",
    sequence="aa",
    include_ref_cols=["antigen.species", "antigen.gene", "antigen.epitope"],
)

print(f"Total matches: {len(matches_df)}")
print(matches_df[["antigen.species", "antigen.gene", "antigen.epitope"]].head())
# Output: a DataFrame with all cell-reference matches and their annotations
```

## Plotting Functions

### pl.clonal_expansion - Visualize Expansion

Creates bar plots showing the distribution of clonal expansion categories across groups.
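
With `normalize=True`, each bar shows the fraction of a group's cells that fall into each expansion category. The arithmetic behind such a bar can be sketched as follows; the group names and category labels are hypothetical inputs, and this sketch counts cells rather than clonotypes.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell (group, expansion category) pairs
cells = [
    ("CD8_Teff", "> 2"), ("CD8_Teff", "> 2"), ("CD8_Teff", "<= 1"),
    ("CD4_Treg", "<= 1"), ("CD4_Treg", "<= 1"), ("CD4_Treg", "<= 2"), ("CD4_Treg", "> 2"),
]


def category_fractions(cells: list) -> dict:
    """Per group, the fraction of cells in each expansion category."""
    counts = defaultdict(Counter)
    for group, category in cells:
        counts[group][category] += 1
    return {
        group: {cat: n / sum(c.values()) for cat, n in sorted(c.items())}
        for group, c in counts.items()
    }


fractions = category_fractions(cells)
print(fractions["CD4_Treg"])  # {'<= 1': 0.5, '<= 2': 0.25, '> 2': 0.25}
```

Normalizing per group makes groups of very different sizes comparable; the absolute-count plot (`normalize=False`) answers a different question.
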

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Plot clonal expansion by cluster (normalized)
ir.pl.clonal_expansion(
    mdata,
    groupby="gex:cluster",
    target_col="clone_id",
    breakpoints=(1, 2, 5),
    normalize=True,
)

# Plot absolute counts
ir.pl.clonal_expansion(
    mdata,
    groupby="gex:cluster",
    target_col="clone_id",
    normalize=False,
)
```

### pl.group_abundance - Plot Category Distributions

Creates bar plots showing how one categorical variable is distributed across the categories of another, for example how clonotypes are distributed across clusters.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.tl.chain_qc(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Plot chain pairing configurations by tissue source
ir.pl.group_abundance(
    mdata,
    groupby="airr:chain_pairing",
    target_col="gex:source",
)

# Plot the 10 largest clonotypes, broken down by cluster
ir.pl.group_abundance(
    mdata,
    groupby="airr:clone_id",
    target_col="gex:cluster",
    max_cols=10,
    normalize="gex:sample",  # Normalize by sample to reduce bias from differing sample sizes
)
```

### pl.alpha_diversity - Plot Diversity Metrics

Visualizes alpha diversity metrics across groups (as a bar plot by default).

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Plot normalized Shannon entropy by cluster
ir.pl.alpha_diversity(
    mdata,
    groupby="gex:cluster",
    metric="normalized_shannon_entropy",
)
```

### pl.clonotype_network - Visualize Clonotype Relationships

Plots the clonotype network, in which nodes represent cells with identical receptors and edges connect similar receptors.
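
In the network plot, a node stands for the cells sharing the same receptor, and when coloring by a categorical variable each node is drawn as a pie chart. A minimal sketch of the numbers such a node summarizes (its size and its category composition), using hypothetical receptor keys and patient labels:

```python
from collections import Counter, defaultdict

# Hypothetical cells: (receptor key, patient); cells sharing a receptor key form one node
cells = [
    ("CASSLEETQYF", "P1"), ("CASSLEETQYF", "P1"), ("CASSLEETQYF", "P2"),
    ("CASSLEETQFF", "P2"),
]


def node_summary(cells: list) -> dict:
    """Node size = number of cells; composition = per-category fractions (the pie slices)."""
    members = defaultdict(Counter)
    for receptor, category in cells:
        members[receptor][category] += 1
    summary = {}
    for receptor, counts in members.items():
        size = sum(counts.values())
        summary[receptor] = {
            "size": size,
            "composition": {cat: n / size for cat, n in sorted(counts.items())},
        }
    return summary


for receptor, info in node_summary(cells).items():
    print(receptor, info)
```
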

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata, metric="tcrdist", sequence="aa", cutoff=15)
ir.tl.define_clonotype_clusters(mdata, sequence="aa", metric="tcrdist")
ir.tl.clonotype_network(mdata, min_cells=2, sequence="aa", metric="tcrdist")

# Plot the network colored by patient (categorical variables are shown as pie charts)
ir.pl.clonotype_network(
    mdata,
    color="gex:patient",
    base_size=20,
    panel_size=(7, 7),
    label_fontsize=9,
)

# Coloring by clonotype modularity (continuous) requires computing it first,
# which in turn needs a gene expression neighborhood graph
ir.tl.clonotype_modularity(mdata, target_col="airr:cc_aa_tcrdist")
ir.pl.clonotype_network(
    mdata,
    color="airr:clonotype_modularity",
    base_size=20,
)
```

### pl.repertoire_overlap - Compare Repertoires

Visualizes repertoire overlap between samples as a heatmap or a scatter plot.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)
ir.pp.ir_dist(mdata)
ir.tl.define_clonotypes(mdata)

# Plot a repertoire overlap heatmap with hierarchical clustering
ir.pl.repertoire_overlap(
    mdata,
    groupby="gex:sample",
    heatmap_cats=["gex:patient", "gex:source"],
    xticklabels=True,
    yticklabels=True,
)

# Compare a specific pair of samples in a scatter plot
ir.pl.repertoire_overlap(
    mdata,
    groupby="gex:sample",
    pair_to_plot=["LN2", "LT2"],
)
```

### pl.vdj_usage - Visualize V(D)J Gene Usage

Creates Sankey-style ribbon plots showing the combinations of V, D, and J genes in the dataset.

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Plot the most common V(D)J combinations
ir.pl.vdj_usage(
    mdata,
    full_combination=False,
    max_segments=None,
    max_ribbons=30,
    fig_kws={"figsize": (8, 5)},
)

# Plot V(D)J usage for specific clonotypes
# (clone IDs are example values; pick IDs from your own clone_id column)
ir.pl.vdj_usage(
    mdata[mdata.obs["airr:clone_id"].isin(["68", "101", "127"])],
    max_ribbons=None,
    max_segments=100,
)
```

### pl.spectratype - CDR3 Length Distribution

Plots the distribution of CDR3 region lengths, optionally grouped by categories.
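
A spectratype is essentially a histogram of CDR3 lengths per group. A stdlib sketch of the underlying counts, assuming hypothetical `(cluster, CDR3)` pairs in which `None` marks a cell without that chain:

```python
from collections import Counter, defaultdict

# Hypothetical (cluster, VDJ_1 CDR3 amino acid sequence) pairs; None = no chain
cells = [
    ("CD8_Teff", "CASSLEETQYF"),
    ("CD8_Teff", "CASSLAPGATNEKLFF"),
    ("CD8_Teff", "CASSLDETQFF"),
    ("CD4_Treg", "CASSFSTCSANYGYTF"),
    ("CD4_Treg", None),
]


def spectratype(cells: list) -> dict:
    """Count CDR3 lengths per group, skipping cells without a sequence."""
    counts = defaultdict(Counter)
    for group, cdr3 in cells:
        if cdr3:
            counts[group][len(cdr3)] += 1
    return {group: dict(sorted(c.items())) for group, c in counts.items()}


print(spectratype(cells))
# {'CD8_Teff': {11: 2, 16: 1}, 'CD4_Treg': {16: 1}}
```
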

```python
import scirpy as ir

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Bar plot of the CDR3 length distribution by cluster
ir.pl.spectratype(
    mdata,
    color="gex:cluster",
    viztype="bar",
    chain="VDJ_1",
)

# Ridge plot (KDE curves)
ir.pl.spectratype(
    mdata,
    color="gex:cluster",
    viztype="curve",
    curve_layout="shifted",
    kde_kws={"kde_norm": False},
)
```

### pl.logoplot_cdr3_motif - Sequence Logo Plots

Generates sequence logo plots showing amino acid composition patterns in CDR3 sequences.

```python
import scirpy as ir
import matplotlib.pyplot as plt

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# Select sequences of a single length (a sequence logo requires equal-length sequences)
mask = (ir.get.airr(mdata, "junction_aa", "VDJ_1").str.len() == 14) & (
    mdata.obs["gex:cluster"] == "CD8_Teff"
)

# Create a logo plot for the primary VDJ chain
fig, ax = plt.subplots(figsize=(10, 2))
ir.pl.logoplot_cdr3_motif(
    mdata[mask],
    chains="VDJ_1",
    to_type="information",  # e.g. "information" (information content) or "probability"
    ax=ax,
)
ax.set_title("CDR3 motif - CD8 effector T cells (length 14)")
plt.show()
```

## Example Datasets

### datasets.wu2020_3k - TCR Example Dataset

Loads a downsampled dataset of 3,000 T cells from cancer patients, with paired TCR and gene expression data.

```python
import scirpy as ir

# Load the example dataset (downloaded on first use, cached afterwards)
mdata = ir.datasets.wu2020_3k()

print(f"Modalities: {list(mdata.mod.keys())}")
print(f"Gene expression: {mdata['gex'].shape}")
print(f"AIRR data: {mdata['airr'].shape}")
# Example output (exact shapes depend on the dataset version):
# Modalities: ['gex', 'airr']
# Gene expression: (3000, 36601)
# AIRR data: (3000, 0)
```

### datasets.vdjdb - VDJdb Reference Database

Downloads and processes the VDJdb database of TCR sequences with known antigen specificities.

```python
import scirpy as ir

# Download VDJdb (cached after the first download)
vdjdb = ir.datasets.vdjdb(cached=True, cache_path="data/vdjdb.h5ad")

print(f"VDJdb entries: {vdjdb.n_obs}")
print(f"Metadata fields: {list(vdjdb.obs.columns)}")
# Fields include antigen.species, antigen.gene, antigen.epitope, mhc.class, etc.
```

### datasets.iedb - IEDB Reference Database

Downloads and processes the Immune Epitope Database (IEDB) for TCR/BCR epitope annotation.

```python
import scirpy as ir

# Download IEDB (cached after the first download)
iedb = ir.datasets.iedb(cached=True, cache_path="data/iedb.h5ad")

print(f"IEDB entries: {iedb.n_obs}")
# Can be used with ir.tl.ir_query() for epitope annotation
```

## Utility Functions

### util.DataHandler - Unified Data Access

Provides uniform access to the AIRR modality of both AnnData and MuData objects, hiding the details of multimodal data access.

```python
import scirpy as ir
from scirpy.util import DataHandler

mdata = ir.datasets.wu2020_3k()
ir.pp.index_chains(mdata)

# DataHandler abstracts over whether the input is an AnnData or a MuData object
params = DataHandler(mdata, airr_mod="airr", airr_key="airr")

# Access the AIRR data uniformly
print(f"Data type: {type(params.data)}")
print(f"Number of cells in the AIRR array: {len(params.airr)}")
print(f"Chain indices available: {params.chain_indices is not None}")
```

## Summary

Scirpy provides a toolkit for single-cell immune receptor analysis that integrates with the scverse ecosystem. Its primary use cases are: (1) importing and preprocessing TCR/BCR data from various sequencing platforms, (2) defining clonotypes based on sequence identity or similarity, (3) analyzing clonal expansion and repertoire diversity across conditions or cell types, (4) querying reference databases such as VDJdb to annotate receptor specificities, and (5) integrating immune receptor analysis with gene expression data to identify transcriptionally related clonotypes.

The package follows a consistent workflow pattern: load data with `ir.io.read_*` functions, create chain indices with `ir.pp.index_chains`, compute distances with `ir.pp.ir_dist`, define clonotypes with `ir.tl.define_clonotypes` or `ir.tl.define_clonotype_clusters`, and visualize results with `ir.pl.*` functions. For multimodal analysis, MuData containers allow seamless integration of gene expression and AIRR data, enabling analyses that leverage both transcriptomics and receptor sequence information. The AIRR data structure based on awkward arrays ensures lossless representation while maintaining compatibility with the AnnData/scanpy ecosystem.