### Run CopyKAT Analysis with Example Data

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

This is a basic example of how to run the CopyKAT analysis after installation and data preparation. It shows the core function call using an example raw UMI matrix named 'exp.rawdata' and assigning a sample name 'test'. This initiates the copy number profiling and subclonal structure inference.

```r
copykat.test <- copykat(rawmat=exp.rawdata, sam.name="test")
```

--------------------------------

### Install CopyKAT R Package from GitHub

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

This code snippet demonstrates how to install the CopyKAT R package directly from its GitHub repository using the devtools package. Installing from GitHub provides the latest version of the tool. To update, first remove the old version and then reinstall.

```r
library(devtools)
install_github("navinlabcode/copykat")
```

```r
# Detach the package first (only needed if it is currently attached),
# then remove the old installation before reinstalling
detach("package:copykat", unload = TRUE)
remove.packages("copykat")
```

--------------------------------

### Complete Single-Cell Tumor Analysis Workflow (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Demonstrates an end-to-end workflow for single-cell tumor analysis, starting from 10X Genomics data. It covers data loading, quality filtering, CopyKAT analysis, subclone identification, and integration with Seurat for downstream analysis. Requires the Seurat and copykat libraries.
```r
# Step 1: Load 10X Genomics CellRanger output
library(Seurat)
library(copykat)

raw_data <- Read10X(data.dir = "path/to/filtered_feature_bc_matrix")
seurat_obj <- CreateSeuratObject(
  counts = raw_data,
  project = "tumor_analysis",
  min.cells = 3,
  min.features = 200
)

# Step 2: Extract raw count matrix for CopyKAT
exp_matrix <- as.matrix(seurat_obj@assays$RNA@counts)
dim(exp_matrix)
# [1] 20000 5000 (genes x cells)
```

--------------------------------

### Run CopyKAT and Extract Predictions (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet demonstrates how to run the copykat function on an expression matrix and extract the cell predictions. It requires an expression matrix ('exp_matrix') and sets various parameters for the analysis, including gene identification, window size, significance cutoffs, sample name, distance metric, and number of cores. The output includes predictions for each cell.

```r
ck_result <- copykat(
  rawmat = exp_matrix,
  id.type = "S",
  ngene.chr = 5,
  win.size = 25,
  KS.cut = 0.1,
  sam.name = "sample_01",
  distance = "euclidean",
  n.cores = 8
)

predictions <- ck_result$prediction
predictions_filtered <- predictions[predictions$copykat.pred != "not.defined", ]
table(predictions_filtered$copykat.pred)
```

--------------------------------

### CopyKAT: Primary Copy Number Analysis Pipeline (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Performs end-to-end analysis for copy number variations, cell classification (aneuploid/diploid), and visualization from scRNA-seq data. It includes data preprocessing, segmentation, and clustering, requiring gene expression data and genome build information.
```r
# Load library and example data
library(copykat)
data(exp.rawdata)

# Basic tumor/normal analysis with human genome
copykat.result <- copykat(
  rawmat = exp.rawdata,
  id.type = "S",            # "S" for gene Symbol, "E" for Ensembl
  ngene.chr = 5,            # min genes per chromosome for cell filtering
  win.size = 25,            # min genes per segment (15-150)
  KS.cut = 0.1,             # segmentation sensitivity (0.05-0.15)
  sam.name = "breast_tumor",
  distance = "euclidean",   # or "pearson", "spearman"
  norm.cell.names = "",     # vector of known normal cell names
  output.seg = "FALSE",     # set "TRUE" for IGV .seg files
  plot.genes = "TRUE",      # plot gene-level heatmap
  genome = "hg20",          # or "mm10" for mouse
  n.cores = 4
)

# Extract prediction results (aneuploid/diploid classification)
predictions <- data.frame(copykat.result$prediction)
predictions <- predictions[predictions$copykat.pred %in% c("aneuploid", "diploid"), ]
head(predictions)
#   cell.names copykat.pred
# 1   cell_001    aneuploid
# 2   cell_002      diploid
# 3   cell_003    aneuploid

# Extract copy number matrix (220kb genomic bins by cells)
cna_matrix <- data.frame(copykat.result$CNAmat)
head(cna_matrix[, 1:5])
#   chrom chrompos  abspos cell_001  cell_002
# 1     1   850000  850000 0.023145 -0.012456
# 2     1  1070000 1070000 0.045678  0.001234

# Access hierarchical clustering object
hclust_obj <- copykat.result$hclustering

# Files automatically saved:
# - breast_tumor_copykat_prediction.txt
# - breast_tumor_copykat_CNA_results.txt
# - breast_tumor_copykat_heatmap.jpeg
# - breast_tumor_copykat_with_genes_heatmap.pdf
# - breast_tumor_copykat_clustering_results.rds
```

--------------------------------

### Prepare Input Matrix from 10X Genomics Output using Seurat

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

This code snippet shows how to read raw count data from 10X Genomics cellranger output and prepare it as a matrix suitable for CopyKAT. It utilizes the Seurat R package to read the data, create a Seurat object, and extract the raw counts into a matrix format.
The resulting matrix can then be saved for future use.

```r
library(Seurat)
raw <- Read10X(data.dir = data.path.to.cellranger.outs)
raw <- CreateSeuratObject(counts = raw, project = "copycat.test", min.cells = 0, min.features = 0)
exp.rawdata <- as.matrix(raw@assays$RNA@counts)
```

```r
write.table(exp.rawdata, file="exp.rawdata.txt", sep="\t", quote = FALSE, row.names = TRUE)
```

--------------------------------

### Define and Visualize Tumor Cell Subpopulations (R)

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Identifies and visualizes two subpopulations of tumor cells based on their copy number profiles. It filters for aneuploid cells, performs hierarchical clustering, and generates a heatmap highlighting these defined subpopulations. Requires 'heatmap.3', 'parallelDist', and 'RColorBrewer' libraries.

```r
# Keep only the cells predicted as aneuploid
tumor.cells <- pred.test$cell.names[which(pred.test$copykat.pred=="aneuploid")]
tumor.mat <- CNA.test[, which(colnames(CNA.test) %in% tumor.cells)]

# Cluster tumor cells and cut the tree into two subpopulations
hcc <- hclust(parallelDist::parDist(t(tumor.mat),threads =4, method = "euclidean"), method = "ward.D2")
hc.umap <- cutree(hcc,2)

rbPal6 <- colorRampPalette(RColorBrewer::brewer.pal(n = 8, name = "Dark2")[3:4])
subpop <- rbPal6(2)[as.numeric(factor(hc.umap))]
cells <- rbind(subpop,subpop)

# chr1, my_palette, and col_breaks are defined in the copy number heatmap snippet
heatmap.3(t(tumor.mat),dendrogram="r", distfun = function(x) parallelDist::parDist(x,threads =4, method = "euclidean"), hclustfun = function(x) hclust(x, method="ward.D2"),
            ColSideColors=chr1,RowSideColors=cells,Colv=NA, Rowv=TRUE,
            notecol="black",col=my_palette,breaks=col_breaks, key=TRUE,
            keysize=1, density.info="none", trace="none",
            cexRow=0.1,cexCol=0.1,cex.main=1,cex.lab=0.1,
            symm=F,symkey=F,symbreaks=T,cex=1, cex.main=4, margins=c(10,10))

legend("topright", c("c1","c2"), pch=15,col=RColorBrewer::brewer.pal(n = 8, name = "Dark2")[3:4], cex=0.9, bty='n')
```

--------------------------------

### Standard Seurat Workflow and Visualization (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This
section outlines a standard Seurat workflow for data normalization, feature identification, scaling, dimensionality reduction (PCA and UMAP), and visualization of CopyKAT predictions. It includes plotting the UMAP colored by CopyKAT predictions and tumor subclones.

```r
seurat_obj <- NormalizeData(seurat_obj)
seurat_obj <- FindVariableFeatures(seurat_obj)
seurat_obj <- ScaleData(seurat_obj)
seurat_obj <- RunPCA(seurat_obj)
seurat_obj <- RunUMAP(seurat_obj, dims = 1:30)

DimPlot(seurat_obj, group.by = "copykat_pred", cols = c("red", "blue", "grey"))
DimPlot(seurat_obj, group.by = "subclone", cells = tumor_cells)
```

--------------------------------

### Run copykat Analysis with Default and Custom Parameters

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

This R code snippet demonstrates how to execute the copykat function with specified parameters. It takes raw expression data, sets gene ID type, filters cells and genes, defines segmentation parameters, enables parallel processing, assigns a sample name, specifies a distance metric for clustering, and controls output formats for segmentation files and gene plots. The genome version is also specified.

```r
library(copykat)
copykat.test <- copykat(rawmat=exp.rawdata, id.type="S", ngene.chr=5, win.size=25, KS.cut=0.1, sam.name="test", distance="euclidean", norm.cell.names="",output.seg="FALSE", plot.genes="TRUE", genome="hg20",n.cores=1)
```

--------------------------------

### Identify subclones using hierarchical clustering and heatmap visualization

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet outlines the process of identifying subclones from aneuploid tumor cells identified by copykat. It involves extracting tumor cells and their corresponding CNV data, performing hierarchical clustering using `parDist` and `hclust`, and then cutting the tree into subclone groups.
Finally, it visualizes these subclones using a heatmap with specific color schemes for chromosomes and subclones.

```r
# Extract tumor cells only
tumor_cells <- predictions$cell.names[predictions$copykat.pred == "aneuploid"]
tumor_cna <- cna_matrix[, which(colnames(cna_matrix) %in% tumor_cells)]

# Hierarchical clustering to identify subclones
library(parallelDist)
library(RColorBrewer)
library(gplots)

hcc <- hclust(
  parDist(t(tumor_cna), threads = 4, method = "euclidean"),
  method = "ward.D2"
)

# Cut into 2 subclones (adjust k for more subpopulations)
subclones <- cutree(hcc, k = 2)
table(subclones)
# subclones
#   1   2
# 456 234

# Visualize subclone heatmap
my_palette <- colorRampPalette(rev(brewer.pal(n = 3, name = "RdBu")))(n = 999)

chr_colors <- as.numeric(cna_matrix$chrom) %% 2 + 1
CHR <- colorRampPalette(c('black', 'grey'))(2)[chr_colors]
chr_sidebar <- cbind(CHR, CHR)

subclone_colors <- colorRampPalette(brewer.pal(n = 8, name = "Dark2")[3:4])(2)[subclones]
cell_sidebar <- rbind(subclone_colors, subclone_colors)

col_breaks <- c(
  seq(-1, -0.4, length = 50),
  seq(-0.4, -0.2, length = 150),
  seq(-0.2, 0.2, length = 600),
  seq(0.2, 0.4, length = 150),
  seq(0.4, 1, length = 50)
)

heatmap.3(
  t(tumor_cna),
  dendrogram = "r",
  distfun = function(x) parDist(x, threads = 4, method = "euclidean"),
  hclustfun = function(x) hclust(x, method = "ward.D2"),
  ColSideColors = chr_sidebar,
  RowSideColors = cell_sidebar,
  Colv = NA,
  Rowv = TRUE,
  col = my_palette,
  breaks = col_breaks,
  key = TRUE,
  trace = "none",
  margins = c(10, 10)
)
```

--------------------------------

### Display First Rows of Predicted Copy Number Data (R)

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Displays the first few rows of the 'pred.test' data frame, which contains the predicted copy number status for cells. This is useful for a quick inspection of the prediction results.
```r
head(pred.test)
```

--------------------------------

### Generate Copy Number Heatmap with Cell Subpopulations (R)

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Generates a heatmap visualizing copy number variations across genomic bins for single cells. It incorporates cell type predictions (aneuploid/diploid) and chromosomal information as side colors. This function requires the 'heatmap.3' function and associated libraries.

```r
my_palette <- colorRampPalette(rev(RColorBrewer::brewer.pal(n = 3, name = "RdBu")))(n = 999)

chr <- as.numeric(CNA.test$chrom) %% 2+1
rbPal1 <- colorRampPalette(c('black','grey'))
CHR <- rbPal1(2)[as.numeric(chr)]
chr1 <- cbind(CHR,CHR)

rbPal5 <- colorRampPalette(RColorBrewer::brewer.pal(n = 8, name = "Dark2")[2:1])
com.preN <- pred.test$copykat.pred
pred <- rbPal5(2)[as.numeric(factor(com.preN))]

cells <- rbind(pred,pred)
col_breaks = c(seq(-1,-0.4,length=50),seq(-0.4,-0.2,length=150),seq(-0.2,0.2,length=600),seq(0.2,0.4,length=150),seq(0.4, 1,length=50))

heatmap.3(t(CNA.test[,4:ncol(CNA.test)]),dendrogram="r", distfun = function(x) parallelDist::parDist(x,threads =4, method = "euclidean"), hclustfun = function(x) hclust(x, method="ward.D2"),
            ColSideColors=chr1,RowSideColors=cells,Colv=NA, Rowv=TRUE,
            notecol="black",col=my_palette,breaks=col_breaks, key=TRUE,
            keysize=1, density.info="none", trace="none",
            cexRow=0.1,cexCol=0.1,cex.main=1,cex.lab=0.1,
            symm=F,symkey=F,symbreaks=T,cex=1, cex.main=4, margins=c(10,10))

legend("topright", paste("pred.",names(table(com.preN)),sep=""), pch=15,col=RColorBrewer::brewer.pal(n = 8, name = "Dark2")[2:1], cex=0.6, bty="n")
```

--------------------------------

### Export CopyKAT Predictions (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet shows how to export the filtered CopyKAT predictions to a tab-separated text file. This output can be used for further analysis or visualization in external tools.
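The file format itself can be checked without running CopyKAT. A minimal sketch, assuming a mock prediction table (`pred_mock` and its values are hypothetical, not CopyKAT output) and a temporary file, that writes with tab separation, no quoting, and no row names, then reads the result back:

```r
# Mock prediction table (hypothetical values for illustration only)
pred_mock <- data.frame(
  cell.names   = c("cell_1", "cell_2"),
  copykat.pred = c("aneuploid", "diploid"),
  stringsAsFactors = FALSE
)

# Write to a temporary tab-separated file without quotes or row names
out_file <- tempfile(fileext = ".txt")
write.table(pred_mock, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and compare with the original table
pred_back <- read.delim(out_file, stringsAsFactors = FALSE)
round_trip_ok <- isTRUE(all.equal(pred_mock, pred_back))

# Clean up the temporary file
unlink(out_file)
```

Because `row.names = FALSE` is used, the cell identifiers survive only through the `cell.names` column, so that column should be kept in any table exported this way.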
```r
write.table(
  predictions_filtered,
  "sample_01_predictions.txt",
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```

--------------------------------

### Display First Columns of CNV Matrix (R)

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

Shows the first five columns of the 'CNA.test' matrix, which holds the copy number alteration data. This helps in understanding the structure of the genomic data, including coordinates and bin information.

```r
head(CNA.test[ , 1:5])
```

--------------------------------

### Genome Segmentation using MCMC (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Performs MCMC-based genome segmentation to identify copy number breakpoints in smoothed data. It utilizes Poisson-Gamma priors and Kolmogorov-Smirnov tests for segmentation. Inputs include cell cluster assignments, smoothed expression matrix, minimum genes per segment, and a correlation threshold. Outputs are segmented log-ratio values and breakpoint positions.

```r
# Internal segmentation function (called by copykat)
# Requires preprocessed, smoothed, baseline-adjusted expression data

# Example usage (simplified)
clusters <- c(rep(1, 50), rep(2, 30), rep(3, 20))
names(clusters) <- colnames(smoothed_matrix)

segments <- CNA.MCMC(
  clu = clusters,
  fttmat = smoothed_matrix,  # genes x cells, baseline-adjusted
  bins = 25,                 # minimum genes per segment
  cut.cor = 0.1,             # KS test threshold
  n.cores = 4
)

# Extract results
segmented_matrix <- segments$logCNA  # segmented copy numbers
breakpoints <- segments$breaks       # genomic breakpoint positions
length(breakpoints)                  # number of segments
# [1] 142
```

--------------------------------

### Run copykat with known normal T cells (Human Genome)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet demonstrates how to run the copykat function with known normal T cell names for a human genome analysis. It assumes the raw expression matrix and T cell names are pre-defined.
The function outputs segmentation files for visualization.

```r
# Assume you've identified T cells through CD3D, CD3E expression
tcell_names <- c("AAACCTGAGCAGCGTA-1", "AAACCTGAGCGATATA-1", "AAACCTGAGCTAACGG-1")

# Run with known normal cells
copykat.with_normal <- copykat(
  rawmat = exp_matrix,
  id.type = "S",
  ngene.chr = 5,
  LOW.DR = 0.05,                   # min gene detection rate for filtering
  UP.DR = 0.1,                     # min detection rate for segmentation
  win.size = 25,
  norm.cell.names = tcell_names,   # provide known normal cells
  KS.cut = 0.1,
  sam.name = "tumor_with_tcells",
  distance = "euclidean",
  output.seg = "TRUE",             # output IGV segment file
  genome = "hg20",
  n.cores = 4
)

# IGV segment file for visualization
# tumor_with_tcells_copykat_CNA_results.seg created
```

--------------------------------

### CopyKAT: Analysis with Known Normal Cells (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Improves baseline accuracy and prediction reliability by providing identities of known normal cells, such as immune cells identified by external markers. This bypasses automatic normal cell detection, using provided cells as a diploid reference.

```r
# Prepare data from Seurat object
library(Seurat)
seurat_obj <- Read10X(data.dir = "path/to/cellranger/outs/filtered_feature_bc_matrix")
seurat_obj <- CreateSeuratObject(counts = seurat_obj, min.cells = 0, min.features = 0)
exp_matrix <- as.matrix(seurat_obj@assays$RNA@counts)

# Example usage (assuming exp_matrix is loaded and normal_cells is a vector of normal cell names)
# copykat.result.known_normal <- copykat(
#   rawmat = exp_matrix,
#   id.type = "S",
#   norm.cell.names = normal_cells,
#   sam.name = "tumor_with_known_normal",
#   genome = "hg20",
#   n.cores = 4
# )
```

--------------------------------

### Fine-tune copykat parameters for challenging datasets

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet details advanced parameter tuning for copykat on challenging datasets, such as low-quality samples or tumors with few CNAs.
Parameters like LOW.DR, UP.DR, win.size, KS.cut, and distance are adjusted to improve gene filtering, segmentation resolution, breakpoint sensitivity, and clustering performance. It also shows manual inspection of clustering results if automatic prediction fails.

```r
# Low quality data or pediatric/liquid tumors with few CNAs
copykat.tuned <- copykat(
  rawmat = challenging.data,
  id.type = "S",
  ngene.chr = 3,                  # relax chromosome gene requirement
  min.gene.per.cell = 150,        # lower threshold for sparse data
  LOW.DR = 0.03,                  # keep more lowly expressed genes
  UP.DR = 0.08,                   # lower segmentation threshold
  win.size = 15,                  # smaller bins for finer resolution
  KS.cut = 0.05,                  # higher sensitivity for breakpoints
  sam.name = "pediatric_tumor",
  distance = "spearman",          # correlation distance for noisy data
  norm.cell.names = known_normals,
  output.seg = "TRUE",
  plot.genes = "TRUE",
  genome = "hg20",
  n.cores = 8,
  timeout = 3600
)

# If automatic prediction fails, manually inspect clusters
hc_result <- copykat.tuned$hclustering
manual_clusters <- cutree(hc_result, k = 3)
```

--------------------------------

### Run copykat for mouse genome analysis (mm10)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet shows how to perform copykat analysis on mouse single-cell RNA-seq data using the mm10 genome annotation. Results are output in gene space, and aneuploid/diploid prediction requires manual validation. It reads mouse data from a file and specifies the 'mm10' genome.
```r
# Mouse tumor sample
mouse.data <- read.table("mouse_scrna_counts.txt", header = TRUE, row.names = 1)

copykat.mouse <- copykat(
  rawmat = mouse.data,
  id.type = "S",
  ngene.chr = 5,
  win.size = 25,
  KS.cut = 0.1,
  sam.name = "mouse_tumor",
  distance = "euclidean",
  genome = "mm10",   # mouse genome
  n.cores = 4
)

# Output in gene-centric format
pred_mouse <- copykat.mouse$prediction
cna_mouse <- copykat.mouse$CNAmat

# Columns include: abspos, chromosome_name, start_position, mgi_symbol, then cells
head(cna_mouse[, 1:7])
```

--------------------------------

### Identify Diploid Cells by Clustering (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Automatically identifies a cluster of diploid normal cells using integrative clustering and Gaussian mixture modeling. It calculates cluster variances, silhouette widths, and selects the cluster with the minimum variance as the diploid reference. This function is used when normal cell names are not provided.

```r
# Internal baseline detection (called by copykat when norm.cell.names not provided)
# Identify normal cells from smoothed expression data
baseline_result <- baseline.norm.cl(
  norm.mat.smooth = smoothed_data,
  min.cells = 5,
  n.cores = 4
)

# Extract components
diploid_baseline <- baseline_result$basel  # median expression of diploid cells
diploid_cells <- baseline_result$preN      # cell names predicted as diploid
cluster_assignment <- baseline_result$cl   # cluster IDs for all cells
confidence <- baseline_result$WNS          # warning if "unclassified.prediction"

# Low confidence scenarios trigger GMM fallback
if (confidence == "unclassified.prediction") {
  # Falls back to baseline.GMM for additional validation
  message("Low clustering confidence - GMM validation recommended")
}
```

--------------------------------

### CopyKAT: Cell Line Mode Analysis (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Analyzes pure cell line samples or data lacking normal reference cells.
This mode uses synthetic baselines derived from data variation, suitable for samples with only aneuploid/diploid populations. Accuracy should be independently validated.

```r
# Cell line with only tumor cells (no normal reference)
copykat.cellline <- copykat(
  rawmat = cellline.data,
  id.type = "S",
  cell.line = "yes",   # enables cell line mode
  ngene.chr = 5,
  win.size = 25,
  KS.cut = 0.1,
  sam.name = "K562_cellline",
  distance = "euclidean",
  genome = "hg20",
  n.cores = 8
)

# Results contain CNAmat and clustering only (no tumor/normal prediction)
cna_results <- copykat.cellline$CNAmat
clustering <- copykat.cellline$hclustering
```

--------------------------------

### Gaussian Mixture Model Validation for Diploid Cells (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Serves as a fallback method for identifying diploid cells when clustering-based detection has low confidence. It fits a three-component normal mixture model to each cell's copy number profile and classifies cells based on the proportion of the neutral component. Requires smoothed copy number data and previous analysis results.

```r
# Fallback method for challenging datasets
gmm_baseline <- baseline.GMM(
  CNA.mat = smoothed_data,
  max.normal = 5,               # stop after finding 5 diploid cells
  mu.cut = 0.05,                # neutral component threshold
  Nfraq.cut = 0.99,             # minimum neutral fraction
  RE.before = previous_result,  # previous clustering result
  n.cores = 4
)

# Returns same structure as baseline.norm.cl
diploid_baseline <- gmm_baseline$basel
diploid_cells <- gmm_baseline$preN
cluster_ids <- gmm_baseline$cl
confidence <- gmm_baseline$WNS
```

--------------------------------

### Add CopyKAT Predictions to Seurat Metadata (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This code adds the CopyKAT predictions to the metadata of a Seurat object. It matches cell names from the CopyKAT predictions with the column names of the Seurat object and assigns the 'copykat_pred' column to the Seurat object's metadata.
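The alignment relies on base R's `match()`, which returns, for each Seurat cell name, the row position of that name in the prediction table, or `NA` when the cell has no prediction. A toy sketch of that mechanism, using made-up barcodes and a mock prediction table (`seurat_cells` and `pred_mock` are hypothetical stand-ins, not CopyKAT or Seurat output):

```r
# Hypothetical cell names in the order a Seurat object might hold them
seurat_cells <- c("cell_1", "cell_2", "cell_3", "cell_4")

# Mock prediction table: different order, and cell_2 is missing
pred_mock <- data.frame(
  cell.names   = c("cell_3", "cell_1", "cell_4"),
  copykat.pred = c("diploid", "aneuploid", "aneuploid"),
  stringsAsFactors = FALSE
)

# For each Seurat cell, find its row in the prediction table (NA if absent)
idx <- match(seurat_cells, pred_mock$cell.names)

# Indexing with idx reorders the labels to the Seurat cell order
aligned <- pred_mock$copykat.pred[idx]
names(aligned) <- seurat_cells
```

Cells without a prediction end up as `NA` in the aligned vector, which is worth keeping in mind when filtered predictions are added to a full object.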
```r
seurat_obj$copykat_pred <- predictions$copykat.pred[
  match(colnames(seurat_obj), predictions$cell.names)
]
```

--------------------------------

### Identify and Cluster Tumor Subclones (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

This snippet identifies tumor cells (aneuploid) and then extracts their copy number alteration (CNA) matrix. Hierarchical clustering is performed on the transpose of the tumor CNA matrix using Euclidean distance and the 'ward.D2' method to identify distinct tumor subclones. The results are then added to the Seurat object's metadata.

```r
tumor_cells <- predictions_filtered$cell.names[
  predictions_filtered$copykat.pred == "aneuploid"
]

cna_matrix <- ck_result$CNAmat
tumor_cna <- cna_matrix[, colnames(cna_matrix) %in% tumor_cells]

# parDist comes from the parallelDist package
hcc_tumor <- hclust(
  parallelDist::parDist(t(tumor_cna), threads = 8, method = "euclidean"),
  method = "ward.D2"
)
tumor_subclones <- cutree(hcc_tumor, k = 2)

seurat_obj$subclone <- NA
seurat_obj$subclone[match(names(tumor_subclones), colnames(seurat_obj))] <-
  paste0("clone_", tumor_subclones)
```

--------------------------------

### Annotate Genes with Genomic Coordinates (R)

Source: https://context7.com/navinlabcode/copykat/llms.txt

Maps gene symbols or Ensembl IDs to genomic coordinates (chromosome, start/end positions, cytoband) using a built-in annotation database. This function is useful for integrating expression data with genomic location information. It takes a gene expression matrix and an ID type as input.
```r
# Internal function called by copykat(), can be used independently
data(full.anno)  # built-in annotation for 56,051 genes

# Annotate with gene symbols
raw_matrix <- matrix(rnorm(1000 * 100), nrow = 1000)
rownames(raw_matrix) <- full.anno$hgnc_symbol[1:1000]
colnames(raw_matrix) <- paste0("cell_", 1:100)

annotated <- annotateGenes.hg20(mat = raw_matrix, ID.type = "S")
head(annotated[, 1:10])
#   abspos chrom start_position end_position hgnc_symbol ensembl_gene_id band cell_1 cell_2
# 1 67072 1 67072 67072 OR4F5 ENSG00000000003 1p36.33 0.12 -0.45
```

--------------------------------

### Extract copykat Prediction and Copy Number Matrix

Source: https://github.com/navinlabcode/copykat/blob/master/README.md

This R code snippet shows how to extract the prediction results (tumor/normal classification) and the copy number alteration matrix from the output object of the copykat function. It filters the predictions to include only cells classified as 'aneuploid' or 'diploid' and then converts both the filtered predictions and the copy number matrix into data frames for further analysis.

```r
pred.test <- data.frame(copykat.test$prediction)
# Keep only cells with a definite aneuploid/diploid call
pred.test <- pred.test[which(pred.test$copykat.pred %in% c("aneuploid","diploid")),]
CNA.test <- data.frame(copykat.test$CNAmat)
```
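One detail worth checking after this conversion: base R's `data.frame()` rewrites column names into syntactic names by default (`check.names = TRUE`), so a barcode containing a hyphen, such as `AAACCTGAGCAGCGTA-1`, becomes `AAACCTGAGCAGCGTA.1`. Whether this affects a given CopyKAT run depends on how the cell names appear in its output, which is not shown here. The sketch below uses small mock objects (`pred_mock`, `cna_mock`, and every value in them are hypothetical stand-ins, not CopyKAT output) to show how such a mismatch could arise and how `check.names = FALSE` avoids it.

```r
# Mock prediction table with hyphenated barcodes (hypothetical values)
pred_mock <- data.frame(
  cell.names   = c("cellA-1", "cellB-1", "cellC-1", "cellD-1"),
  copykat.pred = c("aneuploid", "diploid", "not.defined", "aneuploid"),
  stringsAsFactors = FALSE
)

# Mock CNA matrix: three coordinate columns followed by one column per cell
cna_mock <- cbind(
  chrom = c(1, 1, 2), chrompos = c(100, 200, 100), abspos = c(100, 200, 300),
  matrix(0, nrow = 3, ncol = 4, dimnames = list(NULL, pred_mock$cell.names))
)

# Keep only cells with a definite call and tally the classes
pred_keep <- pred_mock[pred_mock$copykat.pred %in% c("aneuploid", "diploid"), ]
class_counts <- table(pred_keep$copykat.pred)

# Default conversion sanitizes column names; check.names = FALSE keeps them
cna_default <- data.frame(cna_mock)
cna_asis <- data.frame(cna_mock, check.names = FALSE)

# Count how many CNA columns can be matched back to the kept cell names
n_match_default <- sum(colnames(cna_default) %in% pred_keep$cell.names)
n_match_asis <- sum(colnames(cna_asis) %in% pred_keep$cell.names)
```

If the two counts differ on real data, either convert with `check.names = FALSE` or apply `make.names()` to the prediction cell names before subsetting the matrix by cell.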