### Install and Run Pre-commit Hooks (Bash)

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/contributing.md

Installs the pre-commit framework and sets up the git pre-commit hooks for the project. It also provides a command to manually run all pre-commit checks on all files in the repository.

```bash
pip install pre-commit
pre-commit install

pre-commit run --all-files
```

--------------------------------

### Perform GPU-accelerated Preprocessing

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/usage_principles.md

Example of running highly variable gene selection, regression, and scaling on GPU-backed AnnData objects.

```python
rsc.pp.highly_variable_genes(
    adata, n_top_genes=5000, flavor="seurat_v3", batch_key="PatientNumber", layer="counts"
)
adata = adata[:, adata.var["highly_variable"]]
rsc.pp.regress_out(adata, keys=["n_counts", "percent_MT"])
rsc.pp.scale(adata, max_value=10)
```

--------------------------------

### Install rapids-singlecell with pip

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Installs the rapids-singlecell library using pip. Prebuilt wheels are available for specific CUDA versions. Additional RAPIDS dependencies can be installed using the '[rapids]' extra.

```bash
# For CUDA 13
pip install rapids-singlecell-cu13

# For CUDA 12
pip install rapids-singlecell-cu12

# With RAPIDS dependencies
pip install 'rapids-singlecell-cu13[rapids]' --extra-index-url=https://pypi.nvidia.com
```

--------------------------------

### Importing CUDA Modules with Lazy Loading

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/contributing.md

This Python example illustrates how to import CUDA modules from the `_cuda` package. The `_cuda` package handles lazy loading, returning `None` if the compiled extension is unavailable, thus avoiding `ImportError`.

```python
from rapids_singlecell._cuda import _my_module_cuda as _my

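def my_function(adata):
    # _my is either the real module or None, so check before calling into it
    if _my is None:
        raise RuntimeError("CUDA extension `_my_module_cuda` is not available")
    _my.kernel(...)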
```

--------------------------------

### Start Dask CUDA Cluster (Capacity/Robustness Preset)

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/out_of_core.md

Initializes a Dask CUDA cluster using the Capacity/Robustness preset, prioritizing VRAM stretching and robustness over raw P2P speed. It uses TCP for transport and enables RMM managed memory, which allows for oversubscription and paging, reducing the likelihood of Out-of-Memory errors.

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
    CUDA_VISIBLE_DEVICES="0,1",     # scale as needed
    protocol="tcp",                 # TCP is often more predictable with UVM
    threads_per_worker=1,
    rmm_managed_memory=True,        # allow oversubscription (paging)
    rmm_allocator_external_lib="cupy",
)
client = Client(cluster)
```

--------------------------------

### Clone and Install rapids_singlecell (Bash)

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/contributing.md

Clones the rapids_singlecell repository and installs it in editable mode with test dependencies. This process compiles CUDA kernels for the local GPU architecture, placing compiled modules and type stubs in `src/rapids_singlecell/_cuda/`.

```bash
git clone https://github.com/scverse/rapids_singlecell.git
cd rapids_singlecell
pip install -e ".[test]"  # with uv: uv pip install -e ".[test]"
```

--------------------------------

### Install rapids-singlecell with Conda

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Installs rapids-singlecell using Conda environments. This method requires specifying a YAML file that defines the environment, including the CUDA version.

```bash
conda env create -f conda/rsc_rapids_26.02_cuda13.yml
# or
mamba env create -f conda/rsc_rapids_26.02_cuda12.yml
```

--------------------------------

### Start Dask CUDA Cluster (NVLink/Performance Preset)

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/out_of_core.md

Initializes a Dask CUDA cluster using the NVLink/Performance preset, optimized for fast peer-to-peer communication between GPUs. It utilizes UCX for transport and an RMM pool with managed memory disabled for maximum P2P performance. This is suitable when the dataset fits across available GPUs.

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Example: use 8 local GPUs
cluster = LocalCUDACluster(
    CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7",
    protocol="ucx",
    threads_per_worker=1,           # GPU-safe default
    rmm_pool_size="80%",            # per-worker pool; % of free VRAM at start
    rmm_managed_memory=False,       # avoid UM to maximize P2P
    rmm_allocator_external_lib="cupy",  # auto-patch CuPy to use RMM
)
client = Client(cluster)
```
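The two Dask presets in this document differ in only a few keyword arguments. The following is a minimal sketch that collects them in one place. The helper `cluster_kwargs` and the preset names `"performance"` and `"capacity"` are hypothetical; the values are copied from the preset snippets. The helper only builds a dictionary, so it runs without a GPU.

```python
def cluster_kwargs(preset: str, devices: str) -> dict:
    """Return keyword arguments for LocalCUDACluster (hypothetical helper)."""
    common = {
        "CUDA_VISIBLE_DEVICES": devices,
        "threads_per_worker": 1,
        "rmm_allocator_external_lib": "cupy",
    }
    if preset == "performance":
        # UCX transport, RMM pool, managed memory off
        return {
            **common,
            "protocol": "ucx",
            "rmm_pool_size": "80%",
            "rmm_managed_memory": False,
        }
    if preset == "capacity":
        # TCP transport, managed memory on to allow oversubscription
        return {**common, "protocol": "tcp", "rmm_managed_memory": True}
    raise ValueError(f"unknown preset: {preset!r}")


perf = cluster_kwargs("performance", "0,1,2,3,4,5,6,7")
cap = cluster_kwargs("capacity", "0,1")
```

On a machine with `dask-cuda` installed, the result could be passed on as `LocalCUDACluster(**perf)`.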

--------------------------------

### Declare nanobind CUDA Module (CMake)

Source: https://github.com/scverse/rapids_singlecell/blob/main/CMakeLists.txt

The `add_nb_cuda_module` CMake function declares and configures a nanobind CUDA module. When `RSC_BUILD_EXTENSIONS` is enabled, it creates the build target, links `CUDA::cudart`, enables separable compilation, and installs the compiled module together with its type stubs. Stubs are generated at install time for wheel installs and at build time for editable installs, where the module and its stub are also copied into the source tree. The `DEPENDS` entries ensure that stubs are generated only after the module has been built.

```cmake
function(add_nb_cuda_module target src)
  if (RSC_BUILD_EXTENSIONS)
    nanobind_add_module(${target} STABLE_ABI LTO
        ${src}
    )
    target_link_libraries(${target} PRIVATE CUDA::cudart)
    set_target_properties(${target} PROPERTIES
        CUDA_SEPARABLE_COMPILATION ON
    )
    install(TARGETS ${target} LIBRARY DESTINATION rapids_singlecell/_cuda)
    # Generate type stubs at install time (for wheel installs)
    nanobind_add_stub(${target}_stub
        MODULE ${target}
        OUTPUT rapids_singlecell/_cuda/${target}.pyi
        PYTHON_PATH $<TARGET_FILE_DIR:${target}>
        DEPENDS ${target}
        INSTALL_TIME
        MARKER_FILE rapids_singlecell/_cuda/py.typed
    )
    # Generate type stubs at build time (for editable installs)
    nanobind_add_stub(${target}_stub_dev
        MODULE ${target}
        OUTPUT ${target}.pyi
        PYTHON_PATH $<TARGET_FILE_DIR:${target}>
        DEPENDS ${target}
    )
    # Copy built module + stub into source tree for editable installs
    add_custom_command(TARGET ${target}_stub_dev POST_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy
            ${CMAKE_CURRENT_BINARY_DIR}/${target}.pyi
            ${PROJECT_SOURCE_DIR}/src/rapids_singlecell/_cuda/${target}.pyi
        COMMAND ${CMAKE_COMMAND} -E touch
            ${PROJECT_SOURCE_DIR}/src/rapids_singlecell/_cuda/py.typed
    )
    add_custom_command(TARGET ${target} POST_BUILD
        COMMAND ${CMAKE_COMMAND} -E copy
            $<TARGET_FILE:${target}>
            ${PROJECT_SOURCE_DIR}/src/rapids_singlecell/_cuda/$<TARGET_FILE_NAME:${target}>)
  endif()
endfunction()
```

--------------------------------

### Running Hatch Test Environments

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/contributing.md

These bash commands demonstrate how to execute the project's test suite using hatch. Different combinations of CUDA versions and dependency sources can be selected to test various build configurations.

```bash
# Prefix any command with `uvx` (for example `uvx hatch run ...`) to run hatch without installing it

# Run stable tests with CUDA 13
hatch run hatch-test.stable-13:run

# Run stable tests with CUDA 12
hatch run hatch-test.stable-12:run

# Run dev tests (upstream anndata/scanpy) with CUDA 13
hatch run hatch-test.dev-13:run
```

--------------------------------

### Building Project Documentation

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/contributing.md

This command builds the project's documentation using hatch. An optional environment variable can be set to build documentation without compiling CUDA extensions, useful on systems without a GPU.

```bash
hatch run docs:build  # or: uvx hatch run docs:build

# Build without compiling CUDA extensions
CMAKE_ARGS="-DRSC_BUILD_EXTENSIONS=OFF" hatch run docs:build
```

--------------------------------

### Running Individual Tests with Hatch

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/contributing.md

This snippet shows how to run specific test files or individual test cases using hatch. This is useful for quick iteration during development, allowing targeted testing of changes.

```bash
# Run a specific test file (prefix with `uvx` to run hatch without installing it)
hatch run hatch-test.stable-13:run tests/path/to/test.py -v

# Run a specific test
hatch run hatch-test.stable-13:run tests/path/to/test.py::test_name -v
```

--------------------------------

### Preprocessing API

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/scanpy_gpu.md

Provides accelerated functions for basic preprocessing tasks such as calculating QC metrics, filtering cells and genes, normalization, log transformation, identifying highly variable genes, regression, scaling, PCA, and normalizing Pearson residuals.

```APIDOC
## Preprocessing Functions

### Description
Accelerated functions for common preprocessing steps in single-cell analysis.

### Methods
- `pp.calculate_qc_metrics`
- `pp.filter_cells`
- `pp.filter_genes`
- `pp.normalize_total`
- `pp.log1p`
- `pp.highly_variable_genes`
- `pp.regress_out`
- `pp.scale`
- `pp.pca`
- `pp.normalize_pearson_residuals`
- `pp.flag_gene_family`
- `pp.filter_highly_variable`

### Endpoint
`/scverse/rapids_singlecell/pp`

### Parameters
(Specific parameters depend on each function, refer to individual function documentation for details.)

### Request Example
(Not applicable for these preprocessing functions as they operate on AnnData objects in memory.)

### Response
(These functions typically modify the AnnData object in-place or return modified AnnData objects.)
```

--------------------------------

### Execute GPU-accelerated Analysis Tools

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/usage_principles.md

Demonstrates running analysis tools such as t-SNE. These are drop-in replacements for their Scanpy counterparts, so the results can be passed directly to Scanpy's plotting API.

```python
rsc.tl.tsne(adata)
sc.pl.tsne(adata, color="leiden")
```

--------------------------------

### Doublet Detection API

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/scanpy_gpu.md

Includes accelerated functions for doublet detection and simulation.

```APIDOC
## Doublet Detection

### Description
Accelerated functions for detecting and simulating doublets in single-cell data.

### Methods
- `pp.scrublet`
- `pp.scrublet_simulate_doublets`

### Endpoint
`/scverse/rapids_singlecell/pp`

### Parameters
(Specific parameters depend on each function, refer to individual function documentation for details.)

### Request Example
(Not applicable for these functions as they operate on AnnData objects in memory.)

### Response
(These functions typically modify the AnnData object in-place or return modified AnnData objects.)
```

--------------------------------

### Distance Class Methods

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/pertpy_gpu.md

API documentation for the Distance class, which provides GPU-accelerated methods for calculating pairwise distances, one-sided distances, and bootstrapping.

````APIDOC
## Distance Class

### Description
The `Distance` class provides methods to compute various distance metrics on GPU, accelerating perturbation analysis workflows.

### Methods
- **pairwise**: Computes pairwise distances between samples.
- **onesided_distances**: Calculates distances relative to a reference group.
- **bootstrap**: Performs statistical bootstrapping for distance metrics.

### Usage Example
```python
from rapids_singlecell import ptg

dist = ptg.Distance(metric="edistance", obsm_key="X_pca")
# Calculate pairwise distances between the groups in adata.obs["perturbation"]
result = dist.pairwise(adata, groupby="perturbation")
```

### Parameters
- **adata** (AnnData) - Required - The annotated data matrix used for distance calculations.
- **groupby** (str) - Required - The `adata.obs` column that defines the groups to compare.

### Response
- **result** (ndarray/cupy.ndarray) - The computed distance matrix or statistical result.
````

--------------------------------

### Run Squidpy GPU Helpers

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/usage_principles.md

Executes spatial analysis workflows using the squidpy_gpu module for accelerated spatial statistics.

```python
from rapids_singlecell import squidpy_gpu as sqg

sqg.spatial_autocorr(
    adata,
    connectivity_key="spatial_connectivities",
    mode="moran",
    n_perms=500,
)
sqg.co_occurrence(adata, cluster_key="labels", interval=50)
sqg.ligrec(adata, cluster_key="labels", n_perms=1000)
```
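For readers unfamiliar with the statistic behind `mode="moran"`, the following is a conceptual sketch of Moran's I in plain Python. It is not the GPU implementation and omits the permutation test. The function name `morans_i` and the toy chain graph are assumptions made for illustration.

```python
def morans_i(values, weights):
    """Moran's I for one feature; weights[i][j] is the spatial weight between i and j."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted sum of products of deviations over all pairs of observations
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * cross / variance_sum


# Four spots on a line, each connected to its direct neighbors
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
score = morans_i([1.0, 2.0, 3.0, 4.0], chain)
```

A smooth gradient along the chain gives a positive score, which indicates that neighboring spots carry similar values.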

--------------------------------

### Clustering API

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/scanpy_gpu.md

Provides accelerated tools for clustering cells based on their similarity.

```APIDOC
## Clustering Tools

### Description
Accelerated tools for performing clustering on single-cell data.

### Methods
- `tl.louvain`
- `tl.leiden`
- `tl.kmeans`

### Endpoint
`/scverse/rapids_singlecell/tl`

### Parameters
(Specific parameters depend on each function, refer to individual function documentation for details.)

### Request Example
(Not applicable for these functions as they operate on AnnData objects in memory.)

### Response
(These functions typically add cluster labels to the AnnData object.)
```

--------------------------------

### Move AnnData to GPU and CPU

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/usage_principles.md

Methods for transferring AnnData matrices between host memory and GPU device memory using cupy or built-in helper functions.

```python
import cupyx.scipy.sparse as cpx_sparse

# Manual transfer
adata.X = cpx_sparse.csr_matrix(adata.X)  # move `.X` to the GPU
adata.X = adata.X.get()  # move `.X` back to the CPU

# Using built-in helpers
rsc.get.anndata_to_GPU(adata)
rsc.get.anndata_to_CPU(adata)
```

--------------------------------

### Transfer Data Between CPU and GPU with AnnData

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Demonstrates transferring AnnData objects between CPU and GPU memory using the `rapids_singlecell.get` module. This is crucial for integrating GPU-accelerated analysis with CPU-based tools like Scanpy for visualization.

```python
import rapids_singlecell as rsc
import scanpy as sc

# Load data on CPU
adata = sc.read_h5ad("pbmc3k.h5ad")

# Transfer to GPU for accelerated analysis
rsc.get.anndata_to_GPU(adata)

# Perform GPU-accelerated analysis
rsc.pp.normalize_total(adata, target_sum=1e4)
rsc.pp.log1p(adata)
rsc.pp.highly_variable_genes(adata, n_top_genes=2000)
rsc.pp.pca(adata)
rsc.pp.neighbors(adata)
rsc.tl.umap(adata)
rsc.tl.louvain(adata)  # adds adata.obs["louvain"], used for coloring below

# Transfer back to CPU for visualization with Scanpy
rsc.get.anndata_to_CPU(adata)
sc.pl.umap(adata, color="louvain")
```
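As a reference for what the `normalize_total` and `log1p` steps compute, here is a conceptual sketch in plain Python. It operates on nested lists rather than on an AnnData object and is not the library's implementation. The function names mirror the API only for readability.

```python
import math


def normalize_total(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts sum to target_sum."""
    out = []
    for row in counts:
        total = sum(row)
        factor = target_sum / total if total > 0 else 0.0
        out.append([c * factor for c in row])
    return out


def log1p(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(v) for v in row] for row in matrix]


counts = [[1, 3, 0], [10, 0, 10]]  # two cells, three genes
normalized = normalize_total(counts)
logged = log1p(normalized)
```

After normalization every cell has the same total count, so differences in sequencing depth no longer dominate downstream comparisons.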

--------------------------------

### Neighborhood Graph Construction

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Builds a k-nearest neighbors graph using GPU-accelerated algorithms. Offers multiple metrics and algorithms like CAGRA or brute-force to balance speed and accuracy.

```python
import rapids_singlecell as rsc

# Standard neighbors computation
rsc.pp.neighbors(adata, n_neighbors=15, n_pcs=50)

# Approximate algorithms for large datasets
rsc.pp.neighbors(adata, n_neighbors=15, algorithm="cagra", metric="euclidean")

# Batch-balanced KNN
rsc.pp.bbknn(adata, batch_key="batch", neighbors_within_batch=3, n_pcs=50)
```
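The brute-force option can be pictured as the exhaustive search below. This is a plain Python sketch for intuition only: it uses the Euclidean metric, excludes each point from its own neighbor list, and makes no claim about how the GPU kernels are organized. The name `knn_brute_force` is hypothetical.

```python
import math


def knn_brute_force(points, n_neighbors):
    """Return, for each point, the indices of its nearest neighbors (self excluded)."""
    result = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, sorted ascending
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:n_neighbors]])
    return result


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbors = knn_brute_force(points, n_neighbors=2)
```

The cost grows quadratically with the number of cells, which is why approximate algorithms become attractive for large datasets.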

--------------------------------

### Gene Scoring and Cell Cycle API

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/scanpy_gpu.md

Offers accelerated tools for scoring genes and analyzing cell cycle progression.

```APIDOC
## Gene Scoring and Cell Cycle

### Description
Accelerated tools for scoring genes and analyzing cell cycle effects.

### Methods
- `tl.score_genes`
- `tl.score_genes_cell_cycle`

### Endpoint
`/scverse/rapids_singlecell/tl`

### Parameters
(Specific parameters depend on each function, refer to individual function documentation for details.)

### Request Example
(Not applicable for these functions as they operate on AnnData objects in memory.)

### Response
(These functions typically add gene scores to the AnnData object.)
```

--------------------------------

### Marker Genes API

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/scanpy_gpu.md

Provides accelerated tools for identifying marker genes.

```APIDOC
## Marker Genes Identification

### Description
Accelerated tool for identifying marker genes, typically between groups of cells.

### Method
`tl.rank_genes_groups`

### Endpoint
`/scverse/rapids_singlecell/tl/rank_genes_groups`

### Parameters
(Specific parameters depend on the `rank_genes_groups` function, refer to its documentation for details.)

### Request Example
(Not applicable for this function as it operates on AnnData objects in memory.)

### Response
(This function typically returns a DataFrame containing ranked genes.)
```

--------------------------------

### Use GPU-accelerated Decoupler

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/usage_principles.md

Utilizes the dcg module for accelerated pathway activity inference, compatible with the decoupler library.

```python
import decoupler as dc

model = dc.op.resource("PanglaoDB", organism="human")
rsc.dcg.ulm(adata, model, tmin=3)
acts_ulm = dc.pp.get_obsm(adata, key="score_ulm")
sc.pl.umap(acts_ulm, color=["NK cells"], cmap="coolwarm", vcenter=0)
```

--------------------------------

### Perform Pertpy-compatible Analysis

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/usage_principles.md

Accesses the ptg module to perform GPU-accelerated perturbation analysis, such as pairwise distance calculations.

```python
from rapids_singlecell import ptg

distance = ptg.Distance(metric="edistance", obsm_key="X_pca")
result = distance.pairwise(adata, groupby="perturbation")
res, res_var = distance.pairwise(
    adata, groupby="perturbation", bootstrap=True, n_bootstrap=100, multi_gpu=None
)
```
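To make the `"edistance"` metric more concrete, the sketch below computes one common formulation of the energy distance between two groups of embedded cells, using squared Euclidean distances. It is written in plain Python for illustration; the estimator used by the GPU implementation may differ in detail, and the helper names are hypothetical.

```python
def mean_sq_dist(a, b):
    """Mean squared Euclidean distance over all pairs (x in a, y in b)."""
    total = sum(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) for x in a for y in b)
    return total / (len(a) * len(b))


def energy_distance(a, b):
    """Twice the between-group distance minus the two within-group distances."""
    return 2 * mean_sq_dist(a, b) - mean_sq_dist(a, a) - mean_sq_dist(b, b)


control = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shifted = [(x + 3.0, y) for x, y in control]  # same shape, moved along the first axis

same = energy_distance(control, control)
apart = energy_distance(control, shifted)
```

Comparing a group with itself gives zero, while shifting the group away from the control increases the distance.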

--------------------------------

### Harmony Batch Integration

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Integrates data from multiple batches to correct technical effects while preserving biological signal. Requires a precomputed PCA representation.

```python
import rapids_singlecell as rsc

rsc.pp.pca(adata, n_comps=50)
rsc.pp.harmony_integrate(adata, key="batch", basis="X_pca", adjusted_basis="X_pca_harmony")
rsc.pp.neighbors(adata, use_rep="X_pca_harmony")
```

--------------------------------

### Import rapids-singlecell

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/usage_principles.md

The standard import statement for accessing the rapids-singlecell library functionality.

```python
import rapids_singlecell as rsc
```

--------------------------------

### Neighbors API

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/scanpy_gpu.md

Provides accelerated functions for calculating cell-to-cell nearest neighbors.

```APIDOC
## Neighbors Calculation

### Description
Accelerated functions for computing nearest neighbor graphs for single-cell data.

### Methods
- `pp.neighbors`
- `pp.bbknn`

### Endpoint
`/scverse/rapids_singlecell/pp`

### Parameters
(Specific parameters depend on each function, refer to individual function documentation for details.)

### Request Example
(Not applicable for these functions as they operate on AnnData objects in memory.)

### Response
(These functions typically modify the AnnData object in-place or return modified AnnData objects.)
```

--------------------------------

### Batch Effect Correction API

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/scanpy_gpu.md

Offers accelerated functions for batch effect correction, including integration methods.

```APIDOC
## Batch Effect Correction

### Description
Provides accelerated methods for correcting batch effects in single-cell data.

### Method
`pp.harmony_integrate`

### Endpoint
`/scverse/rapids_singlecell/pp/harmony_integrate`

### Parameters
(Specific parameters depend on the `harmony_integrate` function, refer to its documentation for details.)

### Request Example
(Not applicable for this function as it operates on AnnData objects in memory.)

### Response
(This function typically modifies the AnnData object in-place or returns a modified AnnData object.)
```

--------------------------------

### Clustering Algorithms (Leiden and Louvain)

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Performs graph-based clustering to identify cell populations. Leiden is recommended for better community detection, while Louvain is provided as an alternative.

```python
import rapids_singlecell as rsc

# Leiden clustering
rsc.pp.neighbors(adata, n_neighbors=15)
rsc.tl.leiden(adata, resolution=1.0)

# Louvain clustering
rsc.tl.louvain(adata, resolution=1.0, n_iterations=100)
```

--------------------------------

### Add CUDA Module to Python Package Initialization

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/contributing.md

Add the name of a newly compiled CUDA module to the `__all__` list in the package's `__init__.py`. Modules listed there are loaded lazily, so importing the package does not raise an error when the compiled extension is unavailable.

```python
__all__ = [
    ...,
    "_your_module_cuda",
]
```
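The general idea of an optional import that yields `None` instead of raising can be sketched as follows. This is an illustration of the pattern only, not the package's actual loader; `load_optional` and `_hypothetical_cuda_module` are made-up names.

```python
import importlib


def load_optional(module_name):
    """Import module_name, returning None instead of raising if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


json_mod = load_optional("json")  # standard-library module, always present
missing = load_optional("_hypothetical_cuda_module")  # not installed, so None
```

Callers then need to check for `None` before using the module, for example to raise a clearer error message.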

--------------------------------

### Execute complete single-cell analysis workflow

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Demonstrates a standard end-to-end analysis pipeline including data loading, GPU transfer, QC, normalization, dimensionality reduction, clustering, and marker gene identification.

```python
import rapids_singlecell as rsc
import scanpy as sc

adata = sc.read_h5ad("raw_counts.h5ad")
rsc.get.anndata_to_GPU(adata)

rsc.pp.flag_gene_family(adata, gene_family_name="MT", gene_family_prefix="MT-")
rsc.pp.calculate_qc_metrics(adata, qc_vars=["MT"])
rsc.pp.filter_cells(adata, min_genes=200)
rsc.pp.filter_genes(adata, min_cells=3)

rsc.pp.normalize_total(adata, target_sum=1e4)
rsc.pp.log1p(adata)
rsc.pp.highly_variable_genes(adata, n_top_genes=2000, flavor="seurat_v3")

rsc.pp.pca(adata, n_comps=50, use_highly_variable=True)
rsc.pp.neighbors(adata, n_neighbors=15)
rsc.tl.umap(adata)
rsc.tl.leiden(adata, resolution=1.0)
rsc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")

rsc.get.anndata_to_CPU(adata)
sc.pl.umap(adata, color=["leiden", "total_counts", "pct_counts_MT"])
sc.pl.rank_genes_groups(adata, n_genes=10)
```
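The `pct_counts_MT` column plotted at the end of the workflow is, conceptually, the share of each cell's counts that falls on the flagged gene family. A plain Python sketch of that calculation for a single cell follows; the function name and the toy counts are assumptions, and the library computes this on the GPU for all cells at once.

```python
def pct_counts_in_family(gene_names, cell_counts, prefix="MT-"):
    """Percentage of one cell's counts that fall on genes starting with prefix."""
    flagged = [name.startswith(prefix) for name in gene_names]
    total = sum(cell_counts)
    in_family = sum(c for c, is_member in zip(cell_counts, flagged) if is_member)
    return 100.0 * in_family / total if total else 0.0


genes = ["MT-CO1", "ACTB", "MT-ND1", "GAPDH"]
pct = pct_counts_in_family(genes, [5, 60, 15, 20])
```

Cells with an unusually high mitochondrial percentage are often treated as low quality and filtered out.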

--------------------------------

### CMake Configuration for C++ and CUDA

Source: https://github.com/scverse/rapids_singlecell/blob/main/CMakeLists.txt

This snippet configures the CMake build system for C++ and CUDA projects. It sets the minimum CMake version, project name, C++ standard, and position-independent code. It conditionally enables CUDA and finds necessary packages like Python, nanobind, and CUDAToolkit based on the RSC_BUILD_EXTENSIONS option.

```cmake
cmake_minimum_required(VERSION 3.24)

project(rapids_singlecell_cuda LANGUAGES CXX)

# Option to disable building compiled extensions (for docs/RTD)
option(RSC_BUILD_EXTENSIONS "Build CUDA/C++ extensions" ON)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

if (RSC_BUILD_EXTENSIONS)
  enable_language(CUDA)
  find_package(Python REQUIRED COMPONENTS Interpreter Development.Module ${SKBUILD_SABI_COMPONENT})
  find_package(nanobind CONFIG REQUIRED)
  find_package(CUDAToolkit REQUIRED)
  message(STATUS "Building for CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")
else()
  message(STATUS "RSC_BUILD_EXTENSIONS=OFF -> skipping compiled extensions for docs")
endif()
```

--------------------------------

### Embedding API

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/api/scanpy_gpu.md

Offers accelerated tools for dimensionality reduction and embedding single-cell data.

```APIDOC
## Embedding Tools

### Description
Accelerated tools for generating low-dimensional embeddings of single-cell data.

### Methods
- `tl.umap`
- `tl.tsne`
- `tl.diffmap`
- `tl.draw_graph`
- `tl.embedding_density`

### Endpoint
`/scverse/rapids_singlecell/tl`

### Parameters
(Specific parameters depend on each function, refer to individual function documentation for details.)

### Request Example
(Not applicable for these functions as they operate on AnnData objects in memory.)

### Response
(These functions typically add embedding information to the AnnData object.)
```

--------------------------------

### Doublet Detection: Scrublet with GPU Acceleration

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Detects potential doublets (two cells captured together) using the Scrublet algorithm with GPU acceleration. It's recommended to run this on raw counts before normalization. Results include doublet probability and a boolean doublet call.

```python
import rapids_singlecell as rsc

# Run on raw counts before normalization
rsc.pp.scrublet(
    adata,
    batch_key="batch",  # Run separately per batch
    expected_doublet_rate=0.06,
    n_prin_comps=30,
)

# Results in:
# - adata.obs["doublet_score"]: Doublet probability
# - adata.obs["predicted_doublet"]: Boolean doublet call

# Filter doublets
adata = adata[~adata.obs["predicted_doublet"]]
```
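Scrublet scores cells by comparing them with synthetic doublets built from the observed data. The sketch below shows the core simulation step in plain Python, assuming doublets are formed by summing the raw counts of two randomly chosen cells. The function name, the fixed seed, and the toy matrix are illustrative choices, not the library's API.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Create synthetic doublets by summing the counts of two randomly chosen cells."""
    rng = random.Random(seed)
    doublets, parents = [], []
    for _ in range(n_doublets):
        i, j = rng.sample(range(len(counts)), 2)  # two distinct parent cells
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
        parents.append((i, j))
    return doublets, parents


cells = [[1, 0, 2], [0, 3, 1], [4, 4, 0]]  # three cells, three genes
doublets, parents = simulate_doublets(cells, n_doublets=5)
```

This is also why raw counts are the natural input: summing normalized or log-transformed values would not resemble a real doublet.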

--------------------------------

### Register CUDA Module in CMakeLists.txt

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/contributing.md

This snippet shows how to register a new CUDA module within the project's CMake build system. It uses the `add_nb_cuda_module` helper function to compile and link the CUDA source files.

```cmake
add_nb_cuda_module(_your_module_cuda src/rapids_singlecell/_cuda/your_module/your_module.cu)
```

--------------------------------

### Ligand-Receptor Interaction Analysis

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Analyzes ligand-receptor interactions between cell types in spatial data using Squidpy. Requires pre-computed spatial neighbors and a DataFrame of ligand-receptor pairs. Supports permutation tests and alpha-value filtering.

```python
import rapids_singlecell as rsc

# Requires spatial neighbors computed
rsc.gr.ligrec(
    adata,
    cluster_key="cell_type",
    interactions=interactions_df,  # DataFrame with ligand-receptor pairs
    n_perms=1000,
    alpha=0.05,
)

# Results in adata.uns["cell_type_ligrec"]
```

--------------------------------

### Visualization Embeddings (UMAP and t-SNE)

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Generates low-dimensional embeddings for data visualization. UMAP requires a precomputed neighborhood graph, while t-SNE can be computed directly from PCA components.

```python
import rapids_singlecell as rsc

# UMAP
rsc.pp.neighbors(adata, n_neighbors=15)
rsc.tl.umap(adata, min_dist=0.5, spread=1.0)

# t-SNE
rsc.tl.tsne(adata, n_pcs=50, perplexity=30, learning_rate=200)
```

--------------------------------

### Out-of-core Preprocessing Pipeline with rapids_singlecell

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/out_of_core.md

Executes a standard preprocessing pipeline for single-cell data using the rapids_singlecell library. This includes normalization, log transformation, highly variable gene selection, scaling, and PCA. Most operations are lazy, deferring computation until explicitly requested.

```python
import rapids_singlecell as rsc
rsc.get.anndata_to_GPU(adata)
# Normalize and transform
rsc.pp.normalize_total(adata)
rsc.pp.log1p(adata)

# HVG selection
rsc.pp.highly_variable_genes(adata)
adata = adata[:, adata.var["highly_variable"]].copy()

# Scale and PCA
rsc.pp.scale(adata, zero_center=True, max_value=10)
rsc.pp.pca(adata, n_comps=50)
```
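The effect of `zero_center=True` together with `max_value=10` can be illustrated on a single gene in plain Python. The sketch assumes the sample standard deviation and clipping to the symmetric range `[-max_value, max_value]`; both are assumptions about the exact convention, and `scale_gene` is a hypothetical name.

```python
import statistics


def scale_gene(values, max_value=10.0):
    """Zero-center to unit variance, then clip to [-max_value, max_value]."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (an assumption)
    scaled = [(v - mean) / std if std > 0 else 0.0 for v in values]
    return [max(-max_value, min(max_value, s)) for s in scaled]


# A gene that is silent in 199 cells and highly expressed in one
gene = [0.0] * 199 + [50.0]
scaled = scale_gene(gene)
```

Without clipping, the single expressing cell would have a scaled value above 14; the cap keeps such outliers from dominating the PCA.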

--------------------------------

### Principal Component Analysis (PCA) for Dimensionality Reduction

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Computes principal components to reduce feature space dimensionality. Supports various SVD solvers optimized for dense, sparse, or chunked out-of-core data.

```python
import rapids_singlecell as rsc

# Standard PCA
rsc.pp.pca(adata, n_comps=50, use_highly_variable=True)

# PCA with specific SVD solver for large sparse matrices
rsc.pp.pca(adata, n_comps=50, svd_solver="lanczos", use_highly_variable=True)

# Incremental PCA for very large datasets
rsc.pp.pca(adata, n_comps=50, chunked=True, chunk_size=10000)
```

--------------------------------

### Compute perturbation distances and bootstrap intervals with rapids-singlecell

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Calculates one-sided distances and bootstrap confidence intervals for perturbation analysis. These functions utilize multi-GPU support to handle large datasets efficiently.

```python
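from rapids_singlecell import ptg

# Same Distance configuration as in the pairwise example
distance = ptg.Distance(metric="edistance", obsm_key="X_pca")

ctrl_distances = distance.onesided_distances(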
    adata,
    groupby="perturbation",
    control_group="control",
    multi_gpu=True,
)

bootstrap_result = distance.bootstrap(
    adata,
    groupby="perturbation",
    control_group="control",
    n_bootstrap=100,
    multi_gpu=True,
)
```
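For intuition about what a bootstrap interval is, here is a generic percentile bootstrap for the mean in plain Python. It is a textbook sketch and not necessarily the procedure `distance.bootstrap` follows; the function name, the `alpha` parameter, and the sample values are assumptions.

```python
import random
import statistics


def bootstrap_interval(values, n_bootstrap=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of values."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_bootstrap)
    )
    lower = means[int((alpha / 2) * n_bootstrap)]
    upper = means[int((1 - alpha / 2) * n_bootstrap) - 1]
    return lower, upper


distances = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.05]
low, high = bootstrap_interval(distances)
```

More resamples give a more stable interval at the cost of runtime, which is where GPU acceleration helps.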

--------------------------------

### Pathway Activity Inference: Decoupler Methods

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Infers pathway or transcription factor activities using GPU-accelerated decoupler methods. Supports various models including Univariate Linear Model (ULM), Multivariate Linear Model (MLM), AUCell, Z-score, and WAGGR.

```python
import rapids_singlecell as rsc
import decoupler as dc

# Get PROGENy pathway signatures
progeny = dc.get_progeny(organism="human", top=500)

# Univariate Linear Model (ULM) for pathway activity
rsc.dcg.ulm(
    adata,
    net=progeny,
    source="source",
    target="target",
    weight="weight",
    verbose=True,
)

# Results in:
# - adata.obsm["ulm_estimate"]: Activity scores
# - adata.obsm["ulm_pvals"]: P-values

# Alternative methods
rsc.dcg.mlm(adata, net=progeny)  # Multivariate Linear Model
rsc.dcg.aucell(adata, net=progeny)  # AUCell enrichment
rsc.dcg.zscore(adata, net=progeny)  # Z-score method
rsc.dcg.waggr(adata, net=progeny)  # Weighted aggregation
```

--------------------------------

### Load AnnData Lazily from Zarr

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/out_of_core.md

Loads an AnnData object from a Zarr store, with the `X` matrix read as a Dask array for out-of-core processing, while `obs` and `var` are read eagerly. For compatibility, the snippet imports `read_elem_as_dask` on AnnData versions before 0.12.0rc1 and `read_elem_lazy` on later versions.

```python
import anndata as ad
from packaging.version import parse as parse_version

if parse_version(ad.__version__) < parse_version("0.12.0rc1"):
    from anndata.experimental import read_elem_as_dask as read_dask
else:
    from anndata.experimental import read_elem_lazy as read_dask

import zarr

SPARSE_CHUNK_SIZE = 20_000
data_pth = "zarr/cell_atlas.zarr"  # example zarr path

f = zarr.open(data_pth)
X = f["X"]
shape = X.attrs["shape"]

adata = ad.AnnData(
    X=read_dask(X, (SPARSE_CHUNK_SIZE, shape[1])),
    obs=ad.io.read_elem(f["obs"]),
    var=ad.io.read_elem(f["var"]),
)
```
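The chunk shape `(SPARSE_CHUNK_SIZE, shape[1])` means each Dask block holds up to 20,000 complete rows (cells). The sketch below shows how such row chunking partitions a matrix; `row_chunks` is a hypothetical helper and the row count of 50,000 is made up for the example.

```python
def row_chunks(n_rows, chunk_size):
    """Yield (start, stop) row ranges that partition n_rows into chunks."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


chunks = list(row_chunks(n_rows=50_000, chunk_size=20_000))
```

Only the last chunk is smaller than the chunk size, and each chunk can be processed independently on a worker.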

--------------------------------

### Enable Pool Allocator with RMM and CuPy

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/memory_management.md

This code snippet shows how to enable the pool allocator in rapids-singlecell using RMM. It reinitializes RMM with the pool allocator enabled and managed memory disabled, then sets the CuPy allocator to use RMM. This mode is recommended for speed when the dataset fits within GPU VRAM.

```python
import rmm
import cupy as cp
from rmm.allocators.cupy import rmm_cupy_allocator

rmm.reinitialize(
    managed_memory=False,
    pool_allocator=True,
)
cp.cuda.set_allocator(rmm_cupy_allocator)
```

--------------------------------

### Configure RMM memory management for GPU

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Configures the RAPIDS Memory Manager (RMM) to optimize GPU memory usage. Users can toggle between pool allocators for speed or managed memory for datasets exceeding VRAM capacity.

```python
import rmm
import cupy as cp
from rmm.allocators.cupy import rmm_cupy_allocator

# Option 1: datasets that fit in GPU VRAM (pool allocator, faster)
rmm.reinitialize(managed_memory=False, pool_allocator=True)
cp.cuda.set_allocator(rmm_cupy_allocator)

# Option 2: datasets larger than VRAM (managed memory); use instead of option 1
rmm.reinitialize(managed_memory=True, pool_allocator=False)
cp.cuda.set_allocator(rmm_cupy_allocator)
```
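
The choice between the two modes can be sketched as a small decision function. `choose_rmm_mode` and its `headroom` factor are assumptions made for illustration; neither is part of RMM or rapids-singlecell, and a sensible threshold depends on the workload.

```python
def choose_rmm_mode(dataset_bytes: int, vram_bytes: int, headroom: float = 0.5) -> dict:
    """Pick keyword arguments for rmm.reinitialize (hypothetical helper).

    headroom is an assumed safety factor: the dataset counts as "fitting"
    only if it uses at most this fraction of VRAM, leaving room for
    intermediate results.
    """
    fits = dataset_bytes <= headroom * vram_bytes
    return {"managed_memory": not fits, "pool_allocator": fits}


GIB = 1024**3
small = choose_rmm_mode(dataset_bytes=4 * GIB, vram_bytes=24 * GIB)
large = choose_rmm_mode(dataset_bytes=40 * GIB, vram_bytes=24 * GIB)
print(small)  # pool allocator on, managed memory off
print(large)  # managed memory on, pool allocator off

# On a machine with RMM installed, the result would be passed on as:
# rmm.reinitialize(**small)
# cp.cuda.set_allocator(rmm_cupy_allocator)
```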

--------------------------------

### Define CUDA Modules for Single-Cell Analysis (CMake)

Source: https://github.com/scverse/rapids_singlecell/blob/main/CMakeLists.txt

This section of the CMake script uses the `add_nb_cuda_module` helper function to define the CUDA-accelerated modules for single-cell data analysis. Each `add_nb_cuda_module` call compiles one CUDA source file (`.cu`) into a nanobind module, giving Python access to the GPU-accelerated computation. Three of the Harmony modules (clustering, correction, and batched correction) additionally link against `CUDA::cublas`.

```cmake
if (RSC_BUILD_EXTENSIONS)
  # CUDA modules
  add_nb_cuda_module(_mean_var_cuda     src/rapids_singlecell/_cuda/mean_var/mean_var.cu)
  add_nb_cuda_module(_sparse2dense_cuda src/rapids_singlecell/_cuda/sparse2dense/sparse2dense.cu)
  add_nb_cuda_module(_scale_cuda        src/rapids_singlecell/_cuda/scale/scale.cu)
  add_nb_cuda_module(_qc_cuda           src/rapids_singlecell/_cuda/qc/qc.cu)
  add_nb_cuda_module(_qc_dask_cuda      src/rapids_singlecell/_cuda/qc_dask/qc_kernels_dask.cu)
  add_nb_cuda_module(_bbknn_cuda        src/rapids_singlecell/_cuda/bbknn/bbknn.cu)
  add_nb_cuda_module(_norm_cuda         src/rapids_singlecell/_cuda/norm/norm.cu)
  add_nb_cuda_module(_pr_cuda           src/rapids_singlecell/_cuda/pr/pr.cu)
  add_nb_cuda_module(_nn_descent_cuda   src/rapids_singlecell/_cuda/nn_descent/nn_descent.cu)
  add_nb_cuda_module(_aucell_cuda       src/rapids_singlecell/_cuda/aucell/aucell.cu)
  add_nb_cuda_module(_nanmean_cuda      src/rapids_singlecell/_cuda/nanmean/nanmean.cu)
  add_nb_cuda_module(_autocorr_cuda     src/rapids_singlecell/_cuda/autocorr/autocorr.cu)
  add_nb_cuda_module(_cooc_cuda         src/rapids_singlecell/_cuda/cooc/cooc.cu)
  add_nb_cuda_module(_aggr_cuda         src/rapids_singlecell/_cuda/aggr/aggr.cu)
  add_nb_cuda_module(_spca_cuda         src/rapids_singlecell/_cuda/spca/spca.cu)
  add_nb_cuda_module(_ligrec_cuda       src/rapids_singlecell/_cuda/ligrec/ligrec.cu)
  add_nb_cuda_module(_pv_cuda           src/rapids_singlecell/_cuda/pv/pv.cu)
  add_nb_cuda_module(_edistance_cuda    src/rapids_singlecell/_cuda/edistance/edistance.cu)
  add_nb_cuda_module(_hvg_cuda          src/rapids_singlecell/_cuda/hvg/hvg.cu)
  add_nb_cuda_module(_kde_cuda          src/rapids_singlecell/_cuda/kde/kde.cu)
  add_nb_cuda_module(_wilcoxon_cuda     src/rapids_singlecell/_cuda/wilcoxon/wilcoxon.cu)
  # Harmony CUDA modules
  add_nb_cuda_module(_harmony_scatter_cuda   src/rapids_singlecell/_cuda/harmony/scatter/scatter.cu)
  add_nb_cuda_module(_harmony_outer_cuda     src/rapids_singlecell/_cuda/harmony/outer/outer.cu)
  add_nb_cuda_module(_harmony_colsum_cuda    src/rapids_singlecell/_cuda/harmony/colsum/colsum.cu)
  add_nb_cuda_module(_harmony_kmeans_cuda    src/rapids_singlecell/_cuda/harmony/kmeans/kmeans.cu)
  add_nb_cuda_module(_harmony_normalize_cuda src/rapids_singlecell/_cuda/harmony/normalize/normalize.cu)
  add_nb_cuda_module(_harmony_pen_cuda       src/rapids_singlecell/_cuda/harmony/pen/pen.cu)
  add_nb_cuda_module(_harmony_clustering_cuda src/rapids_singlecell/_cuda/harmony/clustering/clustering.cu)
  target_link_libraries(_harmony_clustering_cuda PRIVATE CUDA::cublas)
  add_nb_cuda_module(_harmony_correction_cuda src/rapids_singlecell/_cuda/harmony/correction/correction_fast.cu)
  target_link_libraries(_harmony_correction_cuda PRIVATE CUDA::cublas)
  add_nb_cuda_module(_harmony_correction_batched_cuda src/rapids_singlecell/_cuda/harmony/correction/correction_batched.cu)
  target_link_libraries(_harmony_correction_batched_cuda PRIVATE CUDA::cublas)
  # Wilcoxon binned histogram CUDA module
  add_nb_cuda_module(_wilcoxon_binned_cuda   src/rapids_singlecell/_cuda/wilcoxon_binned/wilcoxon_binned.cu)
endif()
```

--------------------------------

### Preprocessing and Data Normalization with rapids_singlecell

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Scales gene expression data and regresses out unwanted sources of variation, such as total counts, mitochondrial fraction, or cell cycle phase, before downstream analysis.

```python
import rapids_singlecell as rsc

# Standard scaling
rsc.pp.scale(adata, max_value=10)

# Regress out unwanted variation
rsc.pp.regress_out(adata, keys=["total_counts", "pct_counts_MT"])
rsc.pp.scale(adata, max_value=10)

# Cell cycle regression
# s_genes and g2m_genes are user-supplied lists of S-phase and G2/M-phase marker genes
rsc.tl.score_genes_cell_cycle(adata, s_genes=s_genes, g2m_genes=g2m_genes)
rsc.pp.regress_out(adata, keys=["S_score", "G2M_score"])
```
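
What scaling with `max_value` does to a single gene can be sketched in plain Python. This is a minimal illustration, not the library's GPU implementation: `scale_gene` is a hypothetical helper, and the use of the sample standard deviation and upper-only clipping are assumptions.

```python
from statistics import mean, stdev
from typing import Optional


def scale_gene(values: list[float], max_value: Optional[float] = None) -> list[float]:
    """Z-score one gene across cells, then clip values above max_value."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (assumed convention)
    if sigma == 0:
        # Constant genes carry no variation; return zeros
        return [0.0 for _ in values]
    scaled = [(v - mu) / sigma for v in values]
    if max_value is not None:
        scaled = [min(s, max_value) for s in scaled]
    return scaled


expression = [0.0, 0.0, 0.0, 0.0, 10.0]
unclipped = scale_gene(expression)
clipped = scale_gene(expression, max_value=1.0)
print(unclipped)
print(clipped)
```

Clipping limits the influence of a few extreme cells on later steps such as PCA.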

--------------------------------

### Spatial Autocorrelation: Moran's I and Geary's C

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Computes spatial autocorrelation statistics (Moran's I, Geary's C) for spatial transcriptomics data with `rsc.gr`, the GPU-accelerated counterpart of Squidpy's graph functions. A spatial neighbor graph must be computed first. Permutation tests for statistical significance are supported.

```python
import rapids_singlecell as rsc

# Compute spatial neighbors first
rsc.pp.neighbors(adata, use_rep="spatial", key_added="spatial_neighbors")

# Compute Moran's I for the first 100 genes
rsc.gr.spatial_autocorr(
    adata,
    mode="moran",
    connectivity_key="spatial_neighbors_connectivities",  # matches key_added above
    genes=adata.var_names[:100],  # Subset of genes
    n_perms=100,  # Permutation test
    n_jobs=1,
)

# Results in adata.uns["moranI"]
```
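
For reference, the statistic itself is $I = \frac{N}{W}\,\frac{\sum_{ij} w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$, where $W$ is the sum of all weights. The sketch below evaluates it on a toy graph in plain Python. It is an illustration only: `morans_i` is a hypothetical helper that uses raw binary weights and omits the permutation test, so it does not reproduce any library preprocessing of the weights.

```python
def morans_i(values, weights):
    """Moran's I for values x_i and a dense weight matrix w_ij."""
    n = len(values)
    mu = sum(values) / n
    dev = [v - mu for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_total) * (cross / denom)


# Four spots on a line; each spot neighbors the adjacent ones
line_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], line_graph)
alternating = morans_i([1.0, -1.0, 1.0, -1.0], line_graph)
print(smooth, alternating)
```

A smooth gradient gives a positive value (neighbors are similar), while an alternating pattern gives a negative one.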

--------------------------------

### Perturbation Distance Analysis: Multi-GPU Support

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Computes distances between cell populations for perturbation analysis with multi-GPU support. It utilizes a Distance calculator initialized with a specified metric and observation key. Supports pairwise distance calculations grouped by perturbation.

```python
import rapids_singlecell as rsc

# Initialize distance calculator
distance = rsc.ptg.Distance(metric="edistance", obsm_key="X_pca")

# Pairwise distances between all perturbations
dist_matrix = distance.pairwise(
    adata,
    groupby="perturbation",
    multi_gpu=True,  # Use all GPUs
)
```
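
As a rough illustration of what an energy-style distance between two groups measures, the sketch below computes the classical energy distance, $2\,\mathbb{E}\lVert X-Y\rVert - \mathbb{E}\lVert X-X'\rVert - \mathbb{E}\lVert Y-Y'\rVert$, in plain Python. The helper names are hypothetical, and the choice of plain Euclidean distance with means over all ordered pairs is an assumption; the library's `edistance` metric may use a different convention.

```python
from math import dist


def mean_pairwise(a, b):
    """Mean Euclidean distance over all pairs (x in a, y in b)."""
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))


def energy_distance(a, b):
    """Energy distance: 2*E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return 2 * mean_pairwise(a, b) - mean_pairwise(a, a) - mean_pairwise(b, b)


# Toy 2D embeddings (stand-ins for PCA coordinates of two perturbation groups)
control = [(0.0, 0.0), (1.0, 0.0)]
treated = [(3.0, 4.0), (4.0, 4.0)]
print(energy_distance(control, treated))
print(energy_distance(control, control))
```

The value is zero when a group is compared with itself and grows as the groups separate.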

--------------------------------

### Calculate QC Metrics with rapids-singlecell

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Calculates essential quality control metrics for single-cell data, including gene counts per cell, total counts, and mitochondrial gene percentages. This function is typically the first preprocessing step.

```python
import rapids_singlecell as rsc
import scanpy as sc

adata = sc.read_h5ad("raw_counts.h5ad")
rsc.get.anndata_to_GPU(adata)

# Flag mitochondrial genes
rsc.pp.flag_gene_family(adata, gene_family_name="MT", gene_family_prefix="MT-")

# Calculate QC metrics including mitochondrial percentage
rsc.pp.calculate_qc_metrics(adata, qc_vars=["MT"])

# Results are stored in adata.obs:
# - n_genes_by_counts: number of genes detected per cell
# - total_counts: total UMI counts per cell
# - pct_counts_MT: percentage of mitochondrial counts
# - log1p_total_counts: log-transformed total counts

# Filter cells based on QC metrics
rsc.pp.filter_cells(adata, min_genes=200)
rsc.pp.filter_cells(adata, max_genes=5000)
rsc.pp.filter_genes(adata, min_cells=3)
```
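
The metrics listed in the comments above can be reproduced by hand on a tiny dense matrix. The sketch below is a plain-Python illustration under simple assumptions (dense rows, mitochondrial genes identified by the `MT-` prefix); `qc_metrics` is a hypothetical helper, not the library function.

```python
def qc_metrics(counts, gene_names, prefix="MT-"):
    """Per-cell QC metrics from a dense cells x genes count matrix."""
    is_mt = [name.startswith(prefix) for name in gene_names]
    metrics = []
    for row in counts:
        total = sum(row)
        mt_total = sum(c for c, mt in zip(row, is_mt) if mt)
        metrics.append(
            {
                "n_genes_by_counts": sum(1 for c in row if c > 0),
                "total_counts": total,
                "pct_counts_MT": 100.0 * mt_total / total if total else 0.0,
            }
        )
    return metrics


genes = ["MT-CO1", "ACTB", "GAPDH", "MT-ND1"]
counts = [
    [5, 10, 0, 5],  # 20 counts, 10 of them mitochondrial
    [0, 3, 1, 0],  # 4 counts, none mitochondrial
]
per_cell = qc_metrics(counts, genes)
for cell in per_cell:
    print(cell)
```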

--------------------------------

### Co-occurrence Analysis: Cell Type Interactions

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Computes the co-occurrence probability of cell types across spatial distances with `rsc.gr`, the GPU-accelerated counterpart of Squidpy's graph functions. Requires spatial coordinates and a cluster key. `interval` sets the number of distance intervals, and `multi_gpu=True` spreads the computation across the available GPUs.

```python
import rapids_singlecell as rsc

# Requires spatial coordinates in adata.obsm["spatial"]
rsc.gr.co_occurrence(
    adata,
    cluster_key="cell_type",
    spatial_key="spatial",
    interval=50,  # Number of distance intervals
    multi_gpu=True,  # Use all available GPUs
)

# Results in adata.uns["cell_type_co_occurrence"]["occ"]
```

--------------------------------

### Enable Managed Memory with RMM and CuPy

Source: https://github.com/scverse/rapids_singlecell/blob/main/docs/memory_management.md

Enables managed memory for rapids-singlecell. The snippet reinitializes RMM with managed memory on and the pool allocator off, then points CuPy at the RMM allocator. This mode suits datasets larger than GPU VRAM.

```python
import rmm
import cupy as cp
from rmm.allocators.cupy import rmm_cupy_allocator

rmm.reinitialize(managed_memory=True, pool_allocator=False)
cp.cuda.set_allocator(rmm_cupy_allocator)
```

--------------------------------

### Diffusion Maps: Trajectory Analysis

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Computes a diffusion map embedding for trajectory analysis and pseudotime inference. Neighbors must be computed first; the result is stored in `adata.obsm['X_diffmap']`. Diffusion pseudotime can be derived from the first component.

```python
import rapids_singlecell as rsc

rsc.pp.neighbors(adata, n_neighbors=15)

rsc.tl.diffmap(adata, n_comps=15)

# Results stored in adata.obsm["X_diffmap"]
# Diffusion pseudotime can be computed from first component
```

--------------------------------

### Normalize and Log-Transform Data with rapids-singlecell

Source: https://context7.com/scverse/rapids_singlecell/llms.txt

Normalizes gene counts to a target sum per cell and applies a log1p transformation, a standard preprocessing step that makes expression values comparable across cells. The snippet also shows Pearson residual normalization as an alternative.

```python
import rapids_singlecell as rsc

# Normalize to 10,000 counts per cell
rsc.pp.normalize_total(adata, target_sum=1e4)

# Log-transform the data
rsc.pp.log1p(adata)

# Alternative to the two steps above: Pearson residual normalization
# Apply it to raw counts; it combines normalization and variance stabilization
rsc.pp.normalize_pearson_residuals(adata, theta=100, clip=None)
```
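
The arithmetic behind the first two steps can be checked on a single cell in plain Python. This is a minimal sketch: `normalize_total` and `log1p_row` below are local illustrative functions operating on one dense row, not the library implementations.

```python
from math import log1p


def normalize_total(row, target_sum=1e4):
    """Rescale one cell's counts so they sum to target_sum."""
    total = sum(row)
    if total == 0:
        # Empty cells stay all-zero
        return [0.0 for _ in row]
    factor = target_sum / total
    return [c * factor for c in row]


def log1p_row(row):
    """Apply log(1 + x) to every value."""
    return [log1p(v) for v in row]


cell = [1, 3, 0]
normalized = normalize_total(cell, target_sum=100)
logged = log1p_row(normalized)
print(normalized)
print(logged)
```

Zeros stay zero under both steps, which is why the transformation preserves sparsity.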