scTenifoldKnk
https://github.com/cailab-tamu/sctenifoldknk
scTenifoldKnk is an R package for performing in-silico knockout experiments on single-cell gene
...
# scTenifoldKnk

scTenifoldKnk is a machine learning workflow for performing virtual (in-silico) knockout experiments using single-cell RNA sequencing (scRNA-seq) data. The package constructs single-cell gene regulatory networks (scGRNs) from wild-type control samples, simulates gene knockouts by zeroing outgoing edges from target genes in the network adjacency matrix, and then identifies differentially regulated genes through manifold alignment comparison between the original and knocked-out networks.

Built on top of the scTenifoldNet framework, scTenifoldKnk enables researchers to assess the functional impact of gene knockouts without requiring actual knockout experiments. The workflow applies CANDECOMP/PARAFAC (CP) tensor decomposition to denoise network ensembles, uses non-linear manifold alignment to compare network states, and employs statistical testing (Box-Cox transformation with chi-square distribution) to identify genes significantly perturbed by the simulated knockout. The package is available in R (primary implementation), Python, and MATLAB versions.

## scTenifoldKnk - Main Virtual Knockout Function

The primary function that performs the complete virtual knockout experiment workflow. It takes a gene expression count matrix and a target gene, then constructs gene regulatory networks, simulates the knockout, aligns manifolds, and computes differential regulation statistics.
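
To make the knockout step concrete before running the full function, here is a minimal base-R sketch on a hypothetical 4-gene adjacency matrix. The gene names, edge weights, and the convention that entry `[i, j]` is the edge from regulator `i` to target `j` are assumptions made for illustration; the sketch shows only the idea of zeroing a gene's outgoing edges, not the package's internal code.

```r
# Hypothetical weighted adjacency matrix (all values made up).
# Assumed convention: row = regulator, column = target.
genes <- c("G1", "G2", "G3", "G4")
WT <- matrix(c(0.0, 0.8, 0.5, 0.3,
               0.2, 0.0, 0.4, 0.0,
               0.0, 0.1, 0.0, 0.0,
               0.6, 0.0, 0.2, 0.0),
             nrow = 4, byrow = TRUE,
             dimnames = list(genes, genes))

gKO <- "G1"        # gene to knock out (illustrative)

# Virtual knockout: copy the wild-type network and zero the
# knocked-out gene's row, i.e. all of its outgoing edges.
KO <- WT
KO[gKO, ] <- 0

# Incoming edges (the column) are left untouched.
incoming_unchanged <- identical(KO[, gKO], WT[, gKO])

# Number of edges removed by the knockout.
edges_removed <- sum(WT != 0) - sum(KO != 0)
print(KO)
```

The pair `WT` and `KO` is the kind of input that the manifold alignment step then compares.
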
```r
# Install the package
library(remotes)
install_github('cailab-tamu/scTenifoldKnk')
library(scTenifoldKnk)

# Load single-cell RNA-seq count matrix (genes x cells)
# Rows: gene names, Columns: cell barcodes
# read.csv returns a data.frame, so convert it to a numeric matrix
scRNAseq <- as.matrix(read.csv("expression_matrix.csv", row.names = 1))

# Run virtual knockout experiment for gene "Trem2"
# With quality control enabled (default)
result <- scTenifoldKnk(
  countMatrix = scRNAseq,
  gKO = "Trem2",                    # Gene to knock out
  qc = TRUE,                        # Enable quality control filtering
  qc_mtThreshold = 0.1,             # Max 10% mitochondrial reads per cell
  qc_minLSize = 1000,               # Minimum library size per cell
  qc_minCells = 25,                 # Gene must be expressed in >= 25 cells
  nc_nNet = 10,                     # Number of networks to generate
  nc_nCells = 500,                  # Cells to subsample per network
  nc_nComp = 3,                     # PCA components for network construction
  nc_q = 0.9,                       # Quantile cutoff: keep edges above the 90th percentile (top 10%)
  nc_lambda = 0,                    # Directionality enforcement (0-1)
  td_K = 3,                         # Tensor decomposition rank
  td_maxIter = 1000,                # Max iterations for tensor decomposition
  td_maxError = 1e-05,              # Error tolerance
  ma_nDim = 2,                      # Manifold alignment dimensions
  nCores = parallel::detectCores()  # Use all available cores
)

# Access results
# 1. Reconstructed gene regulatory networks (sparse matrices)
wt_network <- result$tensorNetworks$WT  # Wild-type network
ko_network <- result$tensorNetworks$KO  # Knocked-out network

# 2. Manifold alignment coordinates
manifold <- result$manifoldAlignment

# 3. Differential regulation results (data.frame)
diff_reg <- result$diffRegulation
# Columns: gene, distance, Z, FC, p.value, p.adj

# Get significantly perturbed genes (FDR < 0.05)
perturbed_genes <- diff_reg$gene[diff_reg$p.adj < 0.05]
print(head(diff_reg[diff_reg$p.adj < 0.05, ]))
#>    gene distance        Z       FC  p.value    p.adj
#> 1 Trem2 2.451623 5.234512 15.23456 9.45e-08 3.21e-05
#> 2  Apoe 1.823451 4.123456 10.12345 1.23e-06 2.15e-04
#> 3   Lpl 1.654321 3.876543  8.76543 3.45e-06 4.56e-04

# Save results to CSV
write.csv(diff_reg, "knockout_results.csv", row.names = FALSE)
```

## scQC - Single-Cell Quality Control

Internal function that filters cells based on library size, mitochondrial read ratio, and gene detection. Supports both raw matrices and Seurat objects.

```r
library(scTenifoldKnk)
library(Matrix)

# Load sparse matrix from 10X Genomics format
countMatrix <- readMM("matrix.mtx")
rownames(countMatrix) <- readLines("genes.txt")
colnames(countMatrix) <- readLines("barcodes.txt")

# Apply quality control filtering
# Removes cells with:
# - Library size < 1000 reads
# - Mitochondrial ratio > 10%
# - Outlier gene detection patterns
# scQC is described as internal, so it is accessed with ':::' here;
# this also works if your installed version exports it
filtered_matrix <- scTenifoldKnk:::scQC(
  X = countMatrix,
  mtThreshold = 0.1,  # Max 10% mitochondrial reads
  minLSize = 1000     # Minimum 1000 reads per cell
)

# Check dimensions before/after filtering
cat("Before QC:", ncol(countMatrix), "cells\n")
cat("After QC:", ncol(filtered_matrix), "cells\n")

# Works with Seurat objects too
library(Seurat)
seurat_obj <- CreateSeuratObject(countMatrix)
filtered_seurat <- scTenifoldKnk:::scQC(seurat_obj, mtThreshold = 0.1, minLSize = 1000)
```

## plotKO - Knockout Network Visualization

Generates an interactive network plot centered on the knocked-out gene, showing differentially regulated genes and their interactions. Optionally performs pathway enrichment analysis and overlays functional annotations.
```r
library(scTenifoldKnk)

# Run knockout experiment first
result <- scTenifoldKnk(countMatrix = scRNAseq, gKO = "Cftr", qc_minLSize = 0)

# Plot the knockout-centered subnetwork with enrichment annotation
png("knockout_network.png", width = 3000, height = 3000, res = 300)
plotKO(
  X = result,          # Output from scTenifoldKnk
  gKO = "Cftr",        # Knocked out gene (center of network)
  q = 0.99,            # Keep top 1% strongest edges
  annotate = TRUE,     # Perform enrichment analysis
  nCategories = 20,    # Max enrichment categories in legend
  fdrThreshold = 0.05  # FDR threshold for enrichment terms
)
dev.off()

# Plot without enrichment annotation (faster)
png("knockout_network_simple.png", width = 2000, height = 2000, res = 300)
plotKO(result, gKO = "Cftr", annotate = FALSE)
dev.off()

# Enrichment queries these databases:
# - KEGG_2019_Human
# - GO_Biological_Process_2018
# - GO_Cellular_Component_2018
# - GO_Molecular_Function_2018
# - BioPlanet_2019
# - WikiPathways_2019_Human
# - Reactome_2016
```

## dRegulation - Differential Regulation Analysis

Internal function that computes differential regulation statistics from manifold alignment output. Calculates Euclidean distances between gene positions in WT vs KO manifold spaces and applies Box-Cox transformation for statistical testing.
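
As a rough illustration of the statistic described above, the following base-R sketch computes per-gene distances between toy WT and KO coordinates and turns them into fold changes, chi-square p-values, and FDR-adjusted p-values. The coordinates and gene names are made up. The fold-change definition (squared distance over the mean squared distance of the non-knocked-out genes) and the single degree of freedom are assumptions of this sketch, and the Box-Cox Z-score step is omitted, so this should not be read as the package's implementation.

```r
# Toy 2-D manifold coordinates for five hypothetical genes.
genes <- c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE")
gKO <- "GeneA"   # illustrative knocked-out gene

wt <- matrix(c(0.10, 0.20,
               0.30, 0.10,
               0.50, 0.40,
               0.20, 0.60,
               0.70, 0.30),
             ncol = 2, byrow = TRUE,
             dimnames = list(genes, c("d1", "d2")))

# Made-up displacement of each gene after the virtual knockout.
shift <- matrix(c(0.90, -0.80,
                  0.05,  0.02,
                  0.01,  0.03,
                  0.02,  0.02,
                  0.03,  0.01),
                ncol = 2, byrow = TRUE)
ko <- wt + shift

# Euclidean distance between each gene's WT and KO position.
distance <- sqrt(rowSums((ko - wt)^2))

# Assumed fold change: squared distance relative to the mean squared
# distance of all genes except the knocked-out one.
baseline <- mean(distance[genes != gKO]^2)
FC <- distance^2 / baseline

# Upper-tail chi-square p-value (1 df assumed) and FDR adjustment.
p.value <- pchisq(FC, df = 1, lower.tail = FALSE)
p.adj <- p.adjust(p.value, method = "fdr")

dr <- data.frame(gene = genes, distance = distance, FC = FC,
                 p.value = p.value, p.adj = p.adj, row.names = NULL)
dr <- dr[order(dr$distance, decreasing = TRUE), ]
print(dr)
```

Genes that move furthest between the two manifolds end up at the top of the table, which is the ordering used when interpreting the real `diffRegulation` output.
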
```r
# This function is called internally by scTenifoldKnk
# but can be used separately for custom analyses

# After obtaining manifold alignment from scTenifoldNet
library(scTenifoldNet)

# Build networks manually
WT_networks <- makeNetworks(countMatrix, q = 0.9, nNet = 10)
WT_tensor <- tensorDecomposition(WT_networks, K = 3)

# Create knockout by zeroing gene's outgoing edges
WT_matrix <- as.matrix(WT_tensor$X)
KO_matrix <- WT_matrix
KO_matrix["TargetGene", ] <- 0

# Perform manifold alignment
MA <- manifoldAlignment(WT_matrix, KO_matrix, d = 2)

# Compute differential regulation
# Returns data.frame with columns:
# - gene: Gene identifier
# - distance: Euclidean distance in manifold space
# - Z: Z-score after Box-Cox transformation
# - FC: Fold change relative to expectation
# - p.value: Chi-square p-value
# - p.adj: FDR-adjusted p-value
# The gKO-aware dRegulation is the internal scTenifoldKnk function,
# so it is called with ':::' (scTenifoldKnk must be installed)
DR <- scTenifoldKnk:::dRegulation(MA, gKO = "TargetGene")

# Interpret results
significant_genes <- DR[DR$p.adj < 0.05, ]
print(significant_genes[order(significant_genes$distance, decreasing = TRUE), ])
```

## Complete Workflow with Downstream Analysis

Full analysis pipeline including knockout, result extraction, enrichment analysis with fgsea, and visualization.
```r
library(scTenifoldKnk)
library(Matrix)
library(fgsea)
library(ggplot2)

# Load and preprocess data
countMatrix <- readMM("matrix.mtx")
rownames(countMatrix) <- readLines("genes.txt")
colnames(countMatrix) <- readLines("barcodes.txt")

# Filter ribosomal and mitochondrial genes (optional preprocessing)
countMatrix <- countMatrix[!grepl('^Rpl|^Rps|^Mt-', rownames(countMatrix), ignore.case = TRUE), ]

# Keep genes expressed in >5% of cells
countMatrix <- countMatrix[rowMeans(countMatrix != 0) > 0.05, ]

# Run virtual knockout
result <- scTenifoldKnk(
  countMatrix = countMatrix,
  gKO = "Trem2",
  qc = TRUE,
  qc_minLSize = 1000,
  nc_nNet = 10,
  nc_nCells = 500
)

# Extract Z-scores for enrichment analysis
Z_scores <- result$diffRegulation$Z
names(Z_scores) <- toupper(result$diffRegulation$gene)

# Load pathway databases from Enrichr
GO_BP <- gmtPathways('https://amp.pharm.mssm.edu/Enrichr/geneSetLibrary?mode=text&libraryName=GO_Biological_Process_2018')
REACTOME <- gmtPathways('https://amp.pharm.mssm.edu/Enrichr/geneSetLibrary?mode=text&libraryName=Reactome_2016')

# Run gene set enrichment analysis
set.seed(1)
enrichment <- fgseaMultilevel(GO_BP, Z_scores)

# Filter significant pathways (positive enrichment, FDR < 0.05)
enrichment <- enrichment[enrichment$NES > 0 & enrichment$padj < 0.05, ]
enrichment <- enrichment[order(enrichment$padj), ]

# Format leading edge genes
enrichment$leadingEdge <- sapply(enrichment$leadingEdge, function(x) paste0(x, collapse = ";"))

# Save enrichment results
write.csv(enrichment, "pathway_enrichment.csv", row.names = FALSE)

# Plot enrichment for specific pathway
png("enrichment_plot.png", width = 1000, height = 1000, res = 300)
pathway_name <- enrichment$pathway[1]  # Most significant pathway
plotEnrichment(GO_BP[[pathway_name]], Z_scores) +
  labs(
    title = pathway_name,
    subtitle = paste0("FDR = ", formatC(enrichment$padj[1], format = "e", digits = 2))
  ) +
  xlab("Gene rank") +
  ylab("Enrichment Score") +
  theme_bw()
dev.off()

# Generate knockout network visualization
png("knockout_network_annotated.png", width = 3500, height = 3500, res = 300)
plotKO(result, gKO = "Trem2", nCategories = 10, fdrThreshold = 0.05)
dev.off()

# Export significant perturbed genes
sig_genes <- result$diffRegulation[result$diffRegulation$p.adj < 0.05, ]
write.csv(sig_genes, "significant_perturbed_genes.csv", row.names = FALSE)
cat("Found", nrow(sig_genes), "significantly perturbed genes\n")
```

## Summary

scTenifoldKnk is designed for researchers studying gene function in single-cell contexts. It is particularly useful for validating knockout candidates before wet-lab experiments, understanding gene regulatory relationships, and identifying potential therapeutic targets. The main use cases include: (1) predicting the downstream effects of knocking out transcription factors, disease genes, or drug targets; (2) comparing virtual knockout perturbations with actual experimental knockout data; and (3) identifying genes and pathways functionally connected to genes of interest.

Integration patterns typically involve preprocessing scRNA-seq data (filtering low-quality cells and lowly-expressed genes), running the scTenifoldKnk workflow, and performing downstream enrichment analysis with packages like fgsea, enrichR, or clusterProfiler. The output can be passed to standard bioinformatics visualization tools (ggplot2, igraph) and combined with differential expression results from Seurat or other single-cell analysis frameworks to validate findings across computational and experimental approaches.
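
For use case (2), one simple way to compare virtual and experimental results is an overlap test between the two gene lists. The sketch below uses base R only; the gene universe, the two gene vectors, and the choice of a one-sided Fisher's exact test are all assumptions made for illustration, and in practice the vectors would come from `result$diffRegulation` and from a differential expression analysis of a real knockout.

```r
# Hypothetical gene universe and hit lists (all names are placeholders).
universe        <- paste0("Gene", 1:100)
virtual_hits    <- paste0("Gene", 1:10)             # e.g. p.adj < 0.05 in diffRegulation
experimental_de <- paste0("Gene", c(1:6, 50:57))    # e.g. DE genes from a real knockout

# Genes found by both approaches.
overlap <- intersect(virtual_hits, experimental_de)

# 2 x 2 contingency table over the shared gene universe.
tab <- table(
  virtual      = factor(universe %in% virtual_hits,    levels = c(FALSE, TRUE)),
  experimental = factor(universe %in% experimental_de, levels = c(FALSE, TRUE))
)

# One-sided Fisher's exact test: is the overlap larger than expected by chance?
ft <- fisher.test(tab, alternative = "greater")

cat("Overlap:", length(overlap), "genes\n")
cat("Fisher p-value:", format(ft$p.value, digits = 3), "\n")
```

The universe should be restricted to genes that were testable in both analyses; otherwise the test overstates the enrichment.
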