### Run CopyKAT Analysis with Example Data

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

A minimal example of running CopyKAT after installation and data preparation. It calls the core function on an example raw UMI matrix named 'exp.rawdata' and sets the sample name to 'test', which starts the copy number profiling and subclonal structure inference.

```r
copykat.test <- copykat(rawmat=exp.rawdata, sam.name="test")
```
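Before calling `copykat()`, it can help to confirm that the input looks like a genes-by-cells count matrix. The sketch below uses a made-up toy matrix in place of `exp.rawdata`, and `check_rawmat()` is a hypothetical helper written for this example; it is not part of copykat.

```r
# Toy stand-in for exp.rawdata: genes in rows, cells in columns (values are made up)
set.seed(1)
toy_counts <- matrix(
  rpois(5 * 4, lambda = 2),
  nrow = 5,
  dimnames = list(
    c("TP53", "MYC", "EGFR", "GAPDH", "ACTB"),
    paste0("cell_", 1:4)
  )
)

# Hypothetical helper (not part of copykat): basic sanity checks on the input
check_rawmat <- function(mat) {
  stopifnot(
    is.matrix(mat),
    is.numeric(mat),
    !is.null(rownames(mat)), !anyDuplicated(rownames(mat)),
    !is.null(colnames(mat)), !anyDuplicated(colnames(mat)),
    all(mat >= 0)
  )
  invisible(TRUE)
}

check_rawmat(toy_counts)
dim(toy_counts)  # genes x cells
```

A data frame or a matrix without gene names fails these checks, which is usually easier to debug than an error raised deep inside the analysis.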

--------------------------------

### Install CopyKAT R Package from GitHub

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Installs the CopyKAT R package directly from its GitHub repository with devtools, which gives you the latest version. To update an existing installation, remove the old version first (second block), then reinstall.

```r
library(devtools)
install_github("navinlabcode/copykat")
```

```r
detach("package:copykat", unload = TRUE)  # only needed if copykat is currently attached
remove.packages("copykat")
```

--------------------------------

### Complete Single-Cell Tumor Analysis Workflow (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Start of an end-to-end workflow for single-cell tumor analysis from 10X Genomics data. This snippet covers loading the data, quality filtering, and extracting the raw count matrix; the CopyKAT run, subclone identification, and Seurat integration follow in the related snippets. Requires the Seurat and copykat packages.

```r
# Step 1: Load 10X Genomics CellRanger output
library(Seurat)
library(copykat)

raw_data <- Read10X(data.dir = "path/to/filtered_feature_bc_matrix")
seurat_obj <- CreateSeuratObject(
  counts = raw_data,
  project = "tumor_analysis",
  min.cells = 3,
  min.features = 200
)

# Step 2: Extract raw count matrix for CopyKAT
exp_matrix <- as.matrix(seurat_obj@assays$RNA@counts)
dim(exp_matrix)
# [1] 20000  5000  (genes x cells)
```

--------------------------------

### Run CopyKAT and Extract Predictions (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Runs `copykat()` on an expression matrix ('exp_matrix') and extracts the per-cell predictions. The call sets the gene ID type, window size, KS cutoff, sample name, distance metric, and number of cores. Cells labelled "not.defined" are dropped before the aneuploid/diploid calls are tabulated.

```r
ck_result <- copykat(
  rawmat = exp_matrix,
  id.type = "S",
  ngene.chr = 5,
  win.size = 25,
  KS.cut = 0.1,
  sam.name = "sample_01",
  distance = "euclidean",
  n.cores = 8
)

predictions <- ck_result$prediction
predictions_filtered <- predictions[predictions$copykat.pred != "not.defined", ]
table(predictions_filtered$copykat.pred)
```

--------------------------------

### CopyKAT: Primary Copy Number Analysis Pipeline (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Performs end-to-end analysis for copy number variations, cell classification (aneuploid/diploid), and visualization from scRNA-seq data. It includes data preprocessing, segmentation, and clustering, requiring gene expression data and genome build information.

```r
# Load library and example data
library(copykat)
data(exp.rawdata)

# Basic tumor/normal analysis with human genome
copykat.result <- copykat(
  rawmat = exp.rawdata,
  id.type = "S",                    # "S" for gene Symbol, "E" for Ensembl
  ngene.chr = 5,                    # min genes per chromosome for cell filtering
  win.size = 25,                    # min genes per segment (15-150)
  KS.cut = 0.1,                     # segmentation sensitivity (0.05-0.15)
  sam.name = "breast_tumor",
  distance = "euclidean",           # or "pearson", "spearman"
  norm.cell.names = "",             # vector of known normal cell names
  output.seg = "FALSE",             # set "TRUE" for IGV .seg files
  plot.genes = "TRUE",              # plot gene-level heatmap
  genome = "hg20",                  # or "mm10" for mouse
  n.cores = 4
)

# Extract prediction results (aneuploid/diploid classification)
predictions <- data.frame(copykat.result$prediction)
predictions <- predictions[predictions$copykat.pred %in% c("aneuploid", "diploid"), ]
head(predictions)
#   cell.names copykat.pred
# 1 cell_001   aneuploid
# 2 cell_002   diploid
# 3 cell_003   aneuploid

# Extract copy number matrix (220kb genomic bins by cells)
cna_matrix <- data.frame(copykat.result$CNAmat)
head(cna_matrix[, 1:5])
#   chrom chrompos abspos   cell_001   cell_002
# 1     1   850000 850000  0.023145 -0.012456
# 2     1  1070000 1070000  0.045678  0.001234

# Access hierarchical clustering object
hclust_obj <- copykat.result$hclustering

# Files automatically saved:
# - breast_tumor_copykat_prediction.txt
# - breast_tumor_copykat_CNA_results.txt
# - breast_tumor_copykat_heatmap.jpeg
# - breast_tumor_copykat_with_genes_heatmap.pdf
# - breast_tumor_copykat_clustering_results.rds

```
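The filtering step above can be tried without running the full pipeline. This is a small sketch on a made-up data frame that mimics the two columns shown in the example output (`cell.names`, `copykat.pred`); the values are invented for illustration.

```r
# Toy stand-in for copykat.result$prediction (values are made up)
toy_pred <- data.frame(
  cell.names = paste0("cell_", 1:6),
  copykat.pred = c("aneuploid", "diploid", "aneuploid",
                   "not.defined", "diploid", "aneuploid")
)

# Keep only cells with a definite call
called <- toy_pred[toy_pred$copykat.pred %in% c("aneuploid", "diploid"), ]

# Counts and fractions per class
class_counts <- table(called$copykat.pred)
class_counts
prop.table(class_counts)
```

Reporting the fraction of cells that were dropped as "not.defined" alongside these counts gives a quick sense of how much of the sample the classification covers.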

--------------------------------

### Prepare Input Matrix from 10X Genomics Output using Seurat

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

This code snippet shows how to read raw count data from 10X Genomics cellranger output and prepare it as a matrix suitable for CopyKAT. It utilizes the Seurat R package to read the data, create a Seurat object, and extract the raw counts into a matrix format. The resulting matrix can then be saved for future use.

```r
library(Seurat)
raw <- Read10X(data.dir = data.path.to.cellranger.outs)
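raw <- CreateSeuratObject(counts = raw, project = "copykat.test", min.cells = 0, min.features = 0)
# Slot access below works with Seurat v4-style assays; with Seurat v5 assays use
# GetAssayData(raw, assay = "RNA", layer = "counts") instead
exp.rawdata <- as.matrix(raw@assays$RNA@counts)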
```

```r
write.table(exp.rawdata, file="exp.rawdata.txt", sep="\t", quote = FALSE, row.names = TRUE)
```

--------------------------------

### Define and Visualize Tumor Cell Subpopulations (R)

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Identifies and visualizes two subpopulations of tumor cells based on their copy number profiles. It keeps the aneuploid cells, clusters them hierarchically, and draws a heatmap with the two subpopulations as row side colors. Requires the 'heatmap.3' function plus the 'parallelDist' and 'RColorBrewer' packages, and reuses 'chr1', 'my_palette', and 'col_breaks' from the copy number heatmap snippet.

```r
tumor.cells <- pred.test$cell.names[which(pred.test$copykat.pred=="aneuploid")]
tumor.mat <- CNA.test[, which(colnames(CNA.test) %in% tumor.cells)]
hcc <- hclust(parallelDist::parDist(t(tumor.mat),threads =4, method = "euclidean"), method = "ward.D2")
hc.umap <- cutree(hcc,2)

rbPal6 <- colorRampPalette(RColorBrewer::brewer.pal(n = 8, name = "Dark2")[3:4])
subpop <- rbPal6(2)[as.numeric(factor(hc.umap))]
cells <- rbind(subpop,subpop)

heatmap.3(t(tumor.mat),dendrogram="r", distfun = function(x) parallelDist::parDist(x,threads =4, method = "euclidean"), hclustfun = function(x) hclust(x, method="ward.D2"),
            ColSideColors=chr1,RowSideColors=cells,Colv=NA, Rowv=TRUE,
            notecol="black",col=my_palette,breaks=col_breaks, key=TRUE,
            keysize=1, density.info="none", trace="none",
            cexRow=0.1,cexCol=0.1,cex.main=1,cex.lab=0.1,
            symm=F,symkey=F,symbreaks=T,cex=1, cex.main=4, margins=c(10,10))

legend("topright", c("c1","c2"), pch=15,col=RColorBrewer::brewer.pal(n = 8, name = "Dark2")[3:4], cex=0.9, bty='n')
```

--------------------------------

### Standard Seurat Workflow and Visualization (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This section outlines a standard Seurat workflow for data normalization, feature identification, scaling, dimensionality reduction (PCA and UMAP), and visualization of CopyKAT predictions. It includes plotting the UMAP colored by CopyKAT predictions and tumor subclones.

```r
seurat_obj <- NormalizeData(seurat_obj)
seurat_obj <- FindVariableFeatures(seurat_obj)
seurat_obj <- ScaleData(seurat_obj)
seurat_obj <- RunPCA(seurat_obj)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)

DimPlot(seurat_obj, group.by = "copykat_pred", cols = c("red", "blue", "grey"))
DimPlot(seurat_obj, group.by = "subclone", cells = tumor_cells)
```

--------------------------------

### Run copykat Analysis with Default and Custom Parameters

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

This R code snippet demonstrates how to execute the copykat function with specified parameters. It takes raw expression data, sets gene ID type, filters cells and genes, defines segmentation parameters, enables parallel processing, assigns a sample name, specifies a distance metric for clustering, and controls output formats for segmentation files and gene plots. The genome version is also specified.

```r
library(copykat)
copykat.test <- copykat(
  rawmat = exp.rawdata, id.type = "S", ngene.chr = 5, win.size = 25, KS.cut = 0.1,
  sam.name = "test", distance = "euclidean", norm.cell.names = "",
  output.seg = "FALSE", plot.genes = "TRUE", genome = "hg20", n.cores = 1
)
```

--------------------------------

### Identify subclones using hierarchical clustering and heatmap visualization

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet outlines the process of identifying subclones from aneuploid tumor cells identified by copykat. It involves extracting tumor cells and their corresponding CNV data, performing hierarchical clustering using `parDist` and `hclust`, and then cutting the tree into subclone groups. Finally, it visualizes these subclones using a heatmap with specific color schemes for chromosomes and subclones.

```r
# Extract tumor cells only
tumor_cells <- predictions$cell.names[predictions$copykat.pred == "aneuploid"]
tumor_cna <- cna_matrix[, which(colnames(cna_matrix) %in% tumor_cells)]

# Hierarchical clustering to identify subclones
library(parallelDist)
library(RColorBrewer)
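library(gplots)
library(copykat)   # heatmap.3() is not a gplots function (gplots provides heatmap.2); the README calls it with copykat loaded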

hcc <- hclust(
  parDist(t(tumor_cna), threads = 4, method = "euclidean"),
  method = "ward.D2"
)

# Cut into 2 subclones (adjust k for more subpopulations)
subclones <- cutree(hcc, k = 2)
table(subclones)
# subclones
#   1   2
# 456 234

# Visualize subclone heatmap
my_palette <- colorRampPalette(rev(brewer.pal(n = 3, name = "RdBu")))(n = 999)
chr_colors <- as.numeric(cna_matrix$chrom) %% 2 + 1
CHR <- colorRampPalette(c('black', 'grey'))(2)[chr_colors]
chr_sidebar <- cbind(CHR, CHR)

subclone_colors <- colorRampPalette(brewer.pal(n = 8, name = "Dark2")[3:4])(2)[subclones]
cell_sidebar <- rbind(subclone_colors, subclone_colors)

col_breaks <- c(
  seq(-1, -0.4, length = 50),
  seq(-0.4, -0.2, length = 150),
  seq(-0.2, 0.2, length = 600),
  seq(0.2, 0.4, length = 150),
  seq(0.4, 1, length = 50)
)

heatmap.3(
  t(tumor_cna),
  dendrogram = "r",
  distfun = function(x) parDist(x, threads = 4, method = "euclidean"),
  hclustfun = function(x) hclust(x, method = "ward.D2"),
  ColSideColors = chr_sidebar,
  RowSideColors = cell_sidebar,
  Colv = NA,
  Rowv = TRUE,
  col = my_palette,
  breaks = col_breaks,
  key = TRUE,
  trace = "none",
  margins = c(10, 10)
)
```
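The clustering step can be reproduced on simulated data with base R alone. In this sketch `stats::dist()` stands in for `parallelDist::parDist()`, and the toy matrix (two made-up subclones that differ by a gain in the first 20 bins) is an assumption for illustration only.

```r
set.seed(42)
# Toy CNA matrix: 60 bins x 10 tumor cells forming two made-up subclones
bins <- 60
clone_a <- matrix(rnorm(bins * 5, mean = 0, sd = 0.05), nrow = bins)
clone_b <- matrix(rnorm(bins * 5, mean = 0, sd = 0.05), nrow = bins)
clone_b[1:20, ] <- clone_b[1:20, ] + 0.4   # extra gain in the first 20 bins
toy_cna <- cbind(clone_a, clone_b)
colnames(toy_cna) <- c(paste0("a_", 1:5), paste0("b_", 1:5))

# stats::dist() is used here in place of parallelDist::parDist()
hc <- hclust(dist(t(toy_cna), method = "euclidean"), method = "ward.D2")
clones <- cutree(hc, k = 2)

# Cross-tabulate cluster labels against the simulated clone of origin
table(cluster = clones, simulated = substr(names(clones), 1, 1))
```

On real data the number of subclones is not known in advance, so `k` is a choice to revisit after looking at the dendrogram and heatmap.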

--------------------------------

### Display First Rows of Predicted Copy Number Data (R)

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Displays the first few rows of the 'pred.test' data frame, which holds the aneuploid/diploid prediction for each cell. This is useful for a quick inspection of the prediction results.

```r
head(pred.test)
```

--------------------------------

### Generate Copy Number Heatmap with Cell Subpopulations (R)

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Generates a heatmap of copy number variation across genomic bins for single cells, with the aneuploid/diploid prediction of each cell and the chromosome of each bin shown as side colors. Requires the 'heatmap.3' function plus the 'parallelDist' and 'RColorBrewer' packages.

```r
my_palette <- colorRampPalette(rev(RColorBrewer::brewer.pal(n = 3, name = "RdBu")))(n = 999)

chr <- as.numeric(CNA.test$chrom) %% 2+1
rbPal1 <- colorRampPalette(c('black','grey'))
CHR <- rbPal1(2)[as.numeric(chr)]
chr1 <- cbind(CHR,CHR)

rbPal5 <- colorRampPalette(RColorBrewer::brewer.pal(n = 8, name = "Dark2")[2:1])
com.preN <- pred.test$copykat.pred
pred <- rbPal5(2)[as.numeric(factor(com.preN))]

cells <- rbind(pred,pred)
col_breaks = c(seq(-1,-0.4,length=50),seq(-0.4,-0.2,length=150),seq(-0.2,0.2,length=600),seq(0.2,0.4,length=150),seq(0.4, 1,length=50))

heatmap.3(t(CNA.test[,4:ncol(CNA.test)]),dendrogram="r", distfun = function(x) parallelDist::parDist(x,threads =4, method = "euclidean"), hclustfun = function(x) hclust(x, method="ward.D2"),
            ColSideColors=chr1,RowSideColors=cells,Colv=NA, Rowv=TRUE,
            notecol="black",col=my_palette,breaks=col_breaks, key=TRUE,
            keysize=1, density.info="none", trace="none",
            cexRow=0.1,cexCol=0.1,cex.main=1,cex.lab=0.1,
            symm=F,symkey=F,symbreaks=T,cex=1, cex.main=4, margins=c(10,10))

legend("topright", paste("pred.",names(table(com.preN)),sep=""), pch=15,col=RColorBrewer::brewer.pal(n = 8, name = "Dark2")[2:1], cex=0.6, bty="n")
```
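Two details of the heatmap setup are easy to check in isolation: the break vector has one more element than the colour palette, and the chromosome stripe alternates between odd and even chromosomes. The sketch below uses a base R blue-white-red ramp as a stand-in for the RColorBrewer "RdBu" palette, and a made-up bin-to-chromosome vector.

```r
# Same break layout as above: 50 + 150 + 600 + 150 + 50 = 1000 break points
col_breaks <- c(
  seq(-1, -0.4, length.out = 50),
  seq(-0.4, -0.2, length.out = 150),
  seq(-0.2, 0.2, length.out = 600),
  seq(0.2, 0.4, length.out = 150),
  seq(0.4, 1, length.out = 50)
)

# Base R ramp standing in for the RColorBrewer "RdBu" palette (assumption)
toy_palette <- colorRampPalette(c("blue", "white", "red"))(999)

# image()-based heatmaps such as gplots::heatmap.2 need one more break than colours
length(col_breaks) == length(toy_palette) + 1

# Alternating chromosome stripe: odd and even chromosomes get different colours
toy_chrom <- c(1, 1, 2, 2, 3, 3)   # made-up bin-to-chromosome map
stripe <- colorRampPalette(c("black", "grey"))(2)[toy_chrom %% 2 + 1]
stripe
```

Most of the breaks sit between -0.2 and 0.2, so small deviations from the neutral state get most of the colour resolution.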

--------------------------------

### Export CopyKAT Predictions (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet shows how to export the filtered CopyKAT predictions to a tab-separated text file. This output can be used for further analysis or visualization in external tools.

```r
write.table(
  predictions_filtered,
  "sample_01_predictions.txt",
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```

--------------------------------

### Display First Columns of CNV Matrix (R)

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Shows the first five columns of the 'CNA.test' matrix, which holds the copy number alteration data. This helps in understanding the structure of the genomic data, including coordinates and bin information.

```r
head(CNA.test[ , 1:5])
```

--------------------------------

### Genome Segmentation using MCMC (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Performs MCMC-based genome segmentation to identify copy number breakpoints in smoothed data, using Poisson-Gamma priors and Kolmogorov-Smirnov (KS) tests. Inputs are the cell cluster assignments, the smoothed expression matrix, the minimum number of genes per segment, and a KS-test cutoff (passed as `cut.cor`). Outputs are segmented log-ratio values and breakpoint positions.

```r
# Internal segmentation function (called by copykat)
# Requires preprocessed, smoothed, baseline-adjusted expression data

# Example usage (simplified)
clusters <- c(rep(1, 50), rep(2, 30), rep(3, 20))
names(clusters) <- colnames(smoothed_matrix)

segments <- CNA.MCMC(
  clu = clusters,
  fttmat = smoothed_matrix,      # genes x cells, baseline-adjusted
  bins = 25,                      # minimum genes per segment
  cut.cor = 0.1,                  # KS test threshold
  n.cores = 4
)

# Extract results
segmented_matrix <- segments$logCNA       # segmented copy numbers
breakpoints <- segments$breaks            # genomic breakpoint positions
length(breakpoints)                       # number of breakpoints
# [1] 142
```
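The role of the KS test in breakpoint detection can be illustrated with base R. This is a simplified sketch, not the `CNA.MCMC` implementation: it compares raw values in adjacent fixed windows rather than posterior samples, and the profile, window size, and cutoff of 0.5 are all assumptions chosen for the toy data.

```r
set.seed(7)
# Toy smoothed profile for one cluster: 100 genes with a copy number step at gene 51
profile <- c(rnorm(50, mean = 0, sd = 0.1), rnorm(50, mean = 1, sd = 0.1))

win <- 25        # genes per window (plays the role of `bins`)
ks_cut <- 0.5    # illustrative cutoff chosen for this toy data
starts <- seq(1, length(profile) - win + 1, by = win)

# Candidate boundaries: the last gene of each left-hand window
boundaries <- starts[-1] - 1

# KS statistic between each window and the next one
d_stat <- vapply(seq_along(boundaries), function(i) {
  left <- profile[starts[i]:(starts[i] + win - 1)]
  right <- profile[starts[i + 1]:(starts[i + 1] + win - 1)]
  unname(ks.test(left, right)$statistic)
}, numeric(1))

# Keep boundaries where the two neighbouring windows differ clearly
breaks_found <- boundaries[d_stat > ks_cut]
breaks_found
```

Lowering the cutoff keeps more boundaries and raising it keeps fewer, which is the same direction in which the `KS.cut` comments elsewhere in this document describe sensitivity.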

--------------------------------

### Run copykat with known normal T cells (Human Genome)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet demonstrates how to run the copykat function with known normal T cell names for a human genome analysis. It assumes the raw expression matrix and T cell names are pre-defined. The function outputs segmentation files for visualization.

```r
# Assume you've identified T cells through CD3D, CD3E expression
tcell_names <- c("AAACCTGAGCAGCGTA-1", "AAACCTGAGCGATATA-1", "AAACCTGAGCTAACGG-1")

# Run with known normal cells
copykat.with_normal <- copykat(
  rawmat = exp_matrix,
  id.type = "S",
  ngene.chr = 5,
  LOW.DR = 0.05,                    # min gene detection rate for filtering
  UP.DR = 0.1,                      # min detection rate for segmentation
  win.size = 25,
  norm.cell.names = tcell_names,    # provide known normal cells
  KS.cut = 0.1,
  sam.name = "tumor_with_tcells",
  distance = "euclidean",
  output.seg = "TRUE",              # output IGV segment file
  genome = "hg20",
  n.cores = 4
)

# IGV segment file for visualization
# tumor_with_tcells_copykat_CNA_results.seg created
```
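The comment above assumes T cells were identified from CD3D and CD3E expression. One simple way to derive such a name vector from a count matrix is sketched below; the toy matrix, the shortened barcodes, and the rule "both markers detected" are assumptions for illustration, and a real analysis would more likely use cluster-level annotation.

```r
# Toy count matrix (made-up values): genes x cells
toy_counts <- matrix(
  c(3, 0, 2, 0,    # CD3D
    1, 0, 4, 0,    # CD3E
    0, 5, 0, 7),   # EPCAM
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD3D", "CD3E", "EPCAM"),
                  c("AAAC-1", "AAAG-1", "AACT-1", "AAGT-1"))
)

# Assumed rule for this sketch: a cell is a T cell if both markers are detected
markers <- c("CD3D", "CD3E")
is_tcell <- colSums(toy_counts[markers, , drop = FALSE] > 0) == length(markers)
tcell_names <- colnames(toy_counts)[is_tcell]
tcell_names
# [1] "AAAC-1" "AACT-1"
```

The resulting vector has the form expected by `norm.cell.names`: cell names that match the column names of the input matrix.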

--------------------------------

### CopyKAT: Analysis with Known Normal Cells (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Improves baseline accuracy and prediction reliability by providing identities of known normal cells, such as immune cells identified by external markers. This bypasses automatic normal cell detection, using provided cells as a diploid reference.

```r
# Prepare data from Seurat object
library(Seurat)
raw_counts <- Read10X(data.dir = "path/to/cellranger/outs/filtered_feature_bc_matrix")
seurat_obj <- CreateSeuratObject(counts = raw_counts, min.cells = 0, min.features = 0)
exp_matrix <- as.matrix(seurat_obj@assays$RNA@counts)

# Example usage (assuming exp_matrix is loaded and normal_cells is a vector of normal cell names)
# copykat.result.known_normal <- copykat(
#   rawmat = exp_matrix,
#   id.type = "S",
#   norm.cell.names = normal_cells,
#   sam.name = "tumor_with_known_normal",
#   genome = "hg20",
#   n.cores = 4
# )

```

--------------------------------

### Fine-tune copykat parameters for challenging datasets

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet details advanced parameter tuning for copykat on challenging datasets, such as low-quality samples or tumors with few CNAs. Parameters like LOW.DR, UP.DR, win.size, KS.cut, and distance are adjusted to improve gene filtering, segmentation resolution, breakpoint sensitivity, and clustering performance. It also shows manual inspection of clustering results if automatic prediction fails.

```r
# Low quality data or pediatric/liquid tumors with few CNAs
copykat.tuned <- copykat(
  rawmat = challenging.data,
  id.type = "S",
  ngene.chr = 3,                    # relax chromosome gene requirement
  min.gene.per.cell = 150,          # lower threshold for sparse data
  LOW.DR = 0.03,                    # keep more lowly expressed genes
  UP.DR = 0.08,                     # lower segmentation threshold
  win.size = 15,                    # smaller bins for finer resolution
  KS.cut = 0.05,                    # higher sensitivity for breakpoints
  sam.name = "pediatric_tumor",
  distance = "spearman",            # correlation distance for noisy data
  norm.cell.names = known_normals,
  output.seg = "TRUE",
  plot.genes = "TRUE",
  genome = "hg20",
  n.cores = 8,
  timeout = 3600
)

# If automatic prediction fails, manually inspect clusters
hc_result <- copykat.tuned$hclustering
manual_clusters <- cutree(hc_result, k = 3)
```

--------------------------------

### Run copykat for mouse genome analysis (mm10)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet shows how to perform copykat analysis on mouse single-cell RNA-seq data using the mm10 genome annotation. Results are output in gene space, and aneuploid/diploid prediction requires manual validation. It reads mouse data from a file and specifies the 'mm10' genome.

```r
# Mouse tumor sample
# read.table() returns a data frame; convert it to a matrix before passing it as rawmat
mouse.data <- as.matrix(read.table("mouse_scrna_counts.txt", header = TRUE, row.names = 1))

copykat.mouse <- copykat(
  rawmat = mouse.data,
  id.type = "S",
  ngene.chr = 5,
  win.size = 25,
  KS.cut = 0.1,
  sam.name = "mouse_tumor",
  distance = "euclidean",
  genome = "mm10",                  # mouse genome
  n.cores = 4
)

# Output in gene-centric format
pred_mouse <- copykat.mouse$prediction
cna_mouse <- copykat.mouse$CNAmat
# Columns include: abspos, chromosome_name, start_position, mgi_symbol, then cells
head(cna_mouse[, 1:7])
```

--------------------------------

### Identify Diploid Cells by Clustering (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Automatically identifies a cluster of diploid normal cells using integrative clustering and Gaussian mixture modeling. It calculates cluster variances, silhouette widths, and selects the cluster with the minimum variance as the diploid reference. This function is used when normal cell names are not provided.

```r
# Internal baseline detection (called by copykat when norm.cell.names not provided)

# Identify normal cells from smoothed expression data
baseline_result <- baseline.norm.cl(
  norm.mat.smooth = smoothed_data,
  min.cells = 5,
  n.cores = 4
)

# Extract components
diploid_baseline <- baseline_result$basel      # median expression of diploid cells
diploid_cells <- baseline_result$preN          # cell names predicted as diploid
cluster_assignment <- baseline_result$cl       # cluster IDs for all cells
confidence <- baseline_result$WNS              # warning if "unclassified.prediction"

# Low confidence scenarios trigger GMM fallback
if (confidence == "unclassified.prediction") {
  # Falls back to baseline.GMM for additional validation
  message("Low clustering confidence - GMM validation recommended")
}
```
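The core idea described above, picking the cluster with the smallest variance as the diploid reference, can be sketched in base R. This is a deliberately simplified illustration on simulated data and does not reproduce the internals of `baseline.norm.cl` (no silhouette widths, no mixture modeling); all values are made up.

```r
set.seed(123)
genes <- 200
# Made-up data: 20 near-flat "diploid" cells and 20 cells with a gain on half the genes
dip <- matrix(rnorm(genes * 20, sd = 0.05), nrow = genes)
ane <- matrix(rnorm(genes * 20, sd = 0.05), nrow = genes)
ane[1:100, ] <- ane[1:100, ] + 0.5
toy <- cbind(dip, ane)
colnames(toy) <- c(paste0("dip_", 1:20), paste0("ane_", 1:20))

# Cluster cells, then score each cluster by the variance of its median profile
cl <- cutree(hclust(dist(t(toy)), method = "ward.D2"), k = 2)
cluster_var <- sapply(split(colnames(toy), cl), function(cells) {
  var(apply(toy[, cells, drop = FALSE], 1, median))
})

# The flattest cluster is taken as the diploid reference
baseline_cluster <- as.integer(names(which.min(cluster_var)))
baseline_cells <- names(cl)[cl == baseline_cluster]
baseline_profile <- apply(toy[, baseline_cells, drop = FALSE], 1, median)
head(baseline_cells)
```

A cluster of diploid cells has a flat profile across the genome, so its median profile varies little, whereas gains and losses in aneuploid clusters inflate the variance.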

--------------------------------

### CopyKAT: Cell Line Mode Analysis (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Analyzes pure cell line samples or data lacking normal reference cells. This mode uses synthetic baselines derived from data variation and suits samples that contain only aneuploid or only diploid cells. Accuracy should be independently validated.

```r
# Cell line with only tumor cells (no normal reference)
copykat.cellline <- copykat(
  rawmat = cellline.data,
  id.type = "S",
  cell.line = "yes",                # enables cell line mode
  ngene.chr = 5,
  win.size = 25,
  KS.cut = 0.1,
  sam.name = "K562_cellline",
  distance = "euclidean",
  genome = "hg20",
  n.cores = 8
)

# Results contain CNAmat and clustering only (no tumor/normal prediction)
cna_results <- copykat.cellline$CNAmat
clustering <- copykat.cellline$hclustering

```

--------------------------------

### Gaussian Mixture Model Validation for Diploid Cells (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Serves as a fallback method for identifying diploid cells when clustering-based detection has low confidence. It fits a three-component normal mixture model to each cell's copy number profile and classifies cells based on the proportion of the neutral component. Requires smoothed copy number data and previous analysis results.

```r
# Fallback method for challenging datasets

gmm_baseline <- baseline.GMM(
  CNA.mat = smoothed_data,
  max.normal = 5,                # stop after finding 5 diploid cells
  mu.cut = 0.05,                 # neutral component threshold
  Nfraq.cut = 0.99,              # minimum neutral fraction
  RE.before = previous_result,   # previous clustering result
  n.cores = 4
)

# Returns same structure as baseline.norm.cl
diploid_baseline <- gmm_baseline$basel
diploid_cells <- gmm_baseline$preN
cluster_ids <- gmm_baseline$cl
confidence <- gmm_baseline$WNS
```

--------------------------------

### Add CopyKAT Predictions to Seurat Metadata (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This code adds the CopyKAT predictions to the metadata of a Seurat object. It matches cell names from the CopyKAT predictions with the column names of the Seurat object and assigns the 'copykat_pred' column to the Seurat object's metadata.

```r
seurat_obj$copykat_pred <- predictions$copykat.pred[
  match(colnames(seurat_obj), predictions$cell.names)
]
```
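The behaviour of `match()` in this assignment is worth seeing on a small case, because cells that CopyKAT filtered out receive `NA`. The cell names and predictions below are made up, and a plain character vector stands in for `colnames(seurat_obj)`.

```r
# Made-up cell names: the Seurat object has a cell that copykat filtered out
seurat_cells <- c("cell_1", "cell_2", "cell_3", "cell_4")
toy_pred <- data.frame(
  cell.names = c("cell_3", "cell_1", "cell_4"),
  copykat.pred = c("diploid", "aneuploid", "aneuploid")
)

# match() returns, for each Seurat cell, its row in the prediction table (or NA)
idx <- match(seurat_cells, toy_pred$cell.names)
aligned <- toy_pred$copykat.pred[idx]

# cell_2 has no prediction and becomes NA
setNames(aligned, seurat_cells)
```

Because the lookup is by name, the order of rows in the prediction table does not matter.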

--------------------------------

### Identify and Cluster Tumor Subclones (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet identifies tumor cells (aneuploid) and then extracts their copy number alteration (CNA) matrix. Hierarchical clustering is performed on the transpose of the tumor CNA matrix using Euclidean distance and the 'ward.D2' method to identify distinct tumor subclones. The results are then added to the Seurat object's metadata.

```r
tumor_cells <- predictions_filtered$cell.names[
  predictions_filtered$copykat.pred == "aneuploid"
]
cna_matrix <- ck_result$CNAmat
tumor_cna <- cna_matrix[, colnames(cna_matrix) %in% tumor_cells]

hcc_tumor <- hclust(
  parallelDist::parDist(t(tumor_cna), threads = 8, method = "euclidean"),
  method = "ward.D2"
)
tumor_subclones <- cutree(hcc_tumor, k = 2)

seurat_obj$subclone <- NA
seurat_obj$subclone[match(names(tumor_subclones), colnames(seurat_obj))] <-
  paste0("clone_", tumor_subclones)
```

--------------------------------

### Annotate Genes with Genomic Coordinates (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Maps gene symbols or Ensembl IDs to genomic coordinates (chromosome, start/end positions, cytoband) using a built-in annotation database. This function is useful for integrating expression data with genomic location information. It takes a gene expression matrix and an ID type as input.

```r
# Internal function called by copykat(), can be used independently
data(full.anno)  # built-in annotation for 56,051 genes

# Annotate with gene symbols
raw_matrix <- matrix(rnorm(1000 * 100), nrow = 1000)
rownames(raw_matrix) <- full.anno$hgnc_symbol[1:1000]
colnames(raw_matrix) <- paste0("cell_", 1:100)

annotated <- annotateGenes.hg20(mat = raw_matrix, ID.type = "S")
head(annotated[, 1:10])
# The leading columns hold the annotation (abspos, chrom, start_position, end_position,
# hgnc_symbol, ensembl_gene_id, band); the remaining columns are cells (cell_1, cell_2, ...)
```
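Conceptually, the annotation step is a join between gene identifiers and a coordinate table, followed by sorting by genomic position. The sketch below shows that idea with base R `merge()`; the annotation table, gene names, and coordinates are hypothetical placeholders and do not come from `full.anno`.

```r
# Hypothetical mini annotation table (coordinates are placeholders, not real positions)
toy_anno <- data.frame(
  hgnc_symbol = c("GENE_A", "GENE_B", "GENE_C"),
  chromosome_name = c(1, 1, 2),
  start_position = c(1000, 5000, 2000)
)

# Toy expression matrix; GENE_X has no annotation and is dropped by the join
toy_expr <- matrix(
  1:8, nrow = 4,
  dimnames = list(c("GENE_C", "GENE_X", "GENE_A", "GENE_B"), c("cell_1", "cell_2"))
)

expr_df <- data.frame(hgnc_symbol = rownames(toy_expr), toy_expr)
annotated_toy <- merge(toy_anno, expr_df, by = "hgnc_symbol")

# Order by genomic position so neighbouring rows are neighbouring genes
annotated_toy <- annotated_toy[
  order(annotated_toy$chromosome_name, annotated_toy$start_position), ]
annotated_toy
```

Ordering by position matters for the later smoothing and segmentation steps, which operate on runs of adjacent genes.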

--------------------------------

### Extract copykat Prediction and Copy Number Matrix

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

This R code snippet shows how to extract the prediction results (tumor/normal classification) and the copy number alteration matrix from the output object of the copykat function. It filters the predictions to include only cells classified as 'aneuploid' or 'diploid' and then converts both the filtered predictions and the copy number matrix into data frames for further analysis.

```r
pred.test <- data.frame(copykat.test$prediction)
pred.test <- pred.test[which(pred.test$copykat.pred %in% c("aneuploid","diploid")), ]
CNA.test <- data.frame(copykat.test$CNAmat)
```
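One thing to watch when converting a matrix with `data.frame()`: by default R rewrites column names that are not syntactically valid, so 10X-style barcodes ending in "-1" become ".1" and no longer match the cell names in the prediction table. The sketch below shows this on a made-up matrix; whether the names in your own `CNAmat` are affected is something to check rather than assume.

```r
# Toy CNA-like matrix with 10X-style barcodes (made-up values)
toy_cna <- matrix(
  c(0.1, -0.2, 0.0, 0.3), nrow = 2,
  dimnames = list(NULL, c("AAACCTGAGCAGCGTA-1", "AAACCTGAGCGATATA-1"))
)

# Default conversion rewrites syntactically invalid names ("-" becomes ".")
changed <- data.frame(toy_cna)
colnames(changed)

# check.names = FALSE keeps the barcodes unchanged
kept <- data.frame(toy_cna, check.names = FALSE)
colnames(kept)

# Matching against prediction cell names only works with the unchanged names
tumor_cells <- "AAACCTGAGCAGCGTA-1"
sum(colnames(changed) %in% tumor_cells)  # 0 matches
sum(colnames(kept) %in% tumor_cells)     # 1 match
```

If a later `%in%` filter on column names returns no cells, comparing `colnames()` before and after the conversion is a quick first diagnostic.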
