### Install Cell2net from PyPI

Source: https://github.com/pinellolab/cell2net/blob/main/docs/install.md

Use this command to install the latest stable version of cell2net from the Python Package Index. This is the recommended method for most users.

```bash
pip install cell2net
```

--------------------------------

### Install Cell2net from GitHub

Source: https://github.com/pinellolab/cell2net/blob/main/docs/install.md

Install the latest development version of cell2net directly from its GitHub repository. This is useful for accessing the newest features or bug fixes before they are released on PyPI.

```bash
pip install git+https://github.com/pinellolab/cell2net.git
```

--------------------------------

### Start Training for ZHX3

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of model training for the ZHX3 gene.

```text
Output: Training model for gene: ZHX3
```

--------------------------------

### Setup Directories and Paths

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/03_train.ipynb

Defines the input path for multiome data and the output directory for training artifacts. Ensures the output directory exists.

```python
input_data = "./02_prepare_data/mdata.h5mu"
out_dir = './03_train'
os.makedirs(out_dir, exist_ok=True)
```

--------------------------------

### Start Training for AL035458.2

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of model training for the AL035458.2 gene.

```text
Output: Training model for gene: AL035458.2
```

--------------------------------

### Setup File Paths and Output Directory

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/05_tf_to_gene.ipynb

Configure input and output directories for TF attribution analysis. Ensure the output directory exists.
```python
data_dir = "./02_prepare_data/mdata.h5mu"
in_dir = "./03_train_cell2net"
out_dir = "./05_to_gene"
os.makedirs(out_dir, exist_ok=True)
```

--------------------------------

### Start Training for GPCPD1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of model training for the GPCPD1 gene.

```text
Output: Training model for gene: GPCPD1
```

--------------------------------

### Setup Output Directories

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/03_train.ipynb

Creates necessary directories for organizing training outputs, including model checkpoints, plots, and predictions. The `exist_ok=True` argument prevents errors if directories already exist.

```python
os.makedirs(f"{out_dir}/model", exist_ok=True)
os.makedirs(f"{out_dir}/plot", exist_ok=True)
os.makedirs(f"{out_dir}/prediction", exist_ok=True)
```

--------------------------------

### Start Training for SULF2

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Logs the initiation of the training process for the SULF2 gene.

```text
Output: Training model for gene: SULF2
```

--------------------------------

### Setup Output Directory

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/05_process_p2g.ipynb

Creates a directory to store processed and standardized results for evaluation. The `exist_ok=True` argument prevents an error if the directory already exists.

```python
out_dir = "./05_process_p2g"
os.makedirs(out_dir, exist_ok=True)
```

--------------------------------

### Setup Output Directory

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Creates the necessary output directories for storing trained models, plots, and predictions. The `exist_ok=True` argument prevents errors if directories already exist.
```python
out_dir = './03_train_cell2net'
os.makedirs(out_dir, exist_ok=True)
os.makedirs(f"{out_dir}/model", exist_ok=True)
os.makedirs(f"{out_dir}/plot", exist_ok=True)
os.makedirs(f"{out_dir}/prediction", exist_ok=True)
```

--------------------------------

### Start Training for AL110114.1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Initiates the training process for a specific gene. This output indicates the start of model training.

```text
Output: Training model for gene: AL110114.1
```

--------------------------------

### Import Libraries and Set Options

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/05_process_p2g.ipynb

Imports necessary libraries for data processing and sets mudata options. Ensure these libraries are installed before running.

```python
import os
import pandas as pd
import mudata as md
import numpy as np

md.set_options(pull_on_update=False)
```

--------------------------------

### Start Training for TMX4

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Logs the initiation of the training process for the TMX4 gene.

```text
Output: Training model for gene: TMX4
```

--------------------------------

### Start Training for MAFB

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Logs the initiation of the training process for the MAFB gene.

```text
Output: Training model for gene: MAFB
```

--------------------------------

### Setup File Paths for Correlation Analysis

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/04_get_correlation.ipynb

Defines the input paths for the multiome dataset and trained Cell2Net models, and specifies the output directory for correlation analysis results. Creates the output directory if it does not exist.
```python
input_data = "./02_prepare_data/mdata.h5mu"
in_dir = './03_train_cell2net'
out_dir = './04_get_correlation'
os.makedirs(out_dir, exist_ok=True)
```

--------------------------------

### Import Libraries and Set Options

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Imports necessary libraries for data manipulation, modeling, and visualization, and sets global options for the mudata library. Ensure these libraries are installed before running.

```python
import warnings
warnings.filterwarnings("ignore")

import os
import mudata as md
import cell2net as cn
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

md.set_options(pull_on_update=False)

import torch
```

--------------------------------

### Cell2Net Training Start for BACH1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, BACH1.

```text
Output: Training model for gene: BACH1
```

--------------------------------

### Start Training for B4GALT5

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of model training for the B4GALT5 gene. This output signifies the beginning of model training.

```text
Output: Training model for gene: B4GALT5
```

--------------------------------

### Cell2Net Training Start for EVA1C

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, EVA1C.
```text
Output: Training model for gene: EVA1C
```

--------------------------------

### Start Training for CST3

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of model training for the CST3 gene, including initial data split information.

```text
Output: Training model for gene: CST3
2025-11-21 02:43:01 INFO Training size is provided: 0.8
2025-11-21 02:43:01 INFO Split the data for training and validation
```

--------------------------------

### Cell2Net Training Start for AP000317.1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, AP000317.1.

```text
Output: Training model for gene: AP000317.1
```

--------------------------------

### Cell2Net Training Start for CTSZ

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, CTSZ.

```text
Output: Training model for gene: CTSZ
```

--------------------------------

### Start Training for RNF24

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Logs the initiation of the training process for the RNF24 gene.

```text
Output: Training model for gene: RNF24
```

--------------------------------

### Import Libraries and Set Options

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/misc/01_create_mudata.ipynb

Imports necessary libraries and sets mudata options (`pull_on_update=False`) for efficient data handling. This is a common setup for MuData workflows.
```python
import warnings
warnings.filterwarnings("ignore")

import mudata as md
import muon as mu
import scanpy as sc

md.set_options(pull_on_update=False)
```

--------------------------------

### Start Training for ZNF341-AS1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Logs the initiation of the training process for the ZNF341-AS1 gene.

```text
Output: Training model for gene: ZNF341-AS1
```

--------------------------------

### Install Cell2net in Development Mode

Source: https://github.com/pinellolab/cell2net/blob/main/docs/install.md

Install cell2net in editable mode for development. This method is suitable for contributors or users who plan to modify the source code, as changes will be reflected immediately without reinstallation.

```bash
git clone https://github.com/pinellolab/cell2net.git
cd cell2net
pip install -e .
```

--------------------------------

### Cell2Net Training Start for CHODL

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, CHODL.

```text
Output: Training model for gene: CHODL
```

--------------------------------

### Import Libraries and Warnings

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Imports necessary Python libraries and configures warning filters for the project. This setup is standard for data analysis and machine learning tasks.
```python
import warnings
warnings.filterwarnings("ignore")

import numpy as np
from scipy import stats
import torch
import cell2net as cn
import mudata as md
import scanpy as sc
import matplotlib.pyplot as plt
import seaborn as sns

md.set_options(pull_on_update=False)
```

--------------------------------

### Start Training for SIGLEC1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of model training for the SIGLEC1 gene.

```text
Output: Training model for gene: SIGLEC1
```

--------------------------------

### Cell2Net Training Start for PMEPA1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, PMEPA1.

```text
Output: Training model for gene: PMEPA1
```

--------------------------------

### Start Training for SIRPB2

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Initiates the training process for the SIRPB2 gene. This output signifies the beginning of model training.

```text
Output: Training model for gene: SIRPB2
```

--------------------------------

### Start Training for C20orf194

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Logs the initiation of the training process for the C20orf194 gene.

```text
Output: Training model for gene: C20orf194
```

--------------------------------

### Cell2Net Training Start for ITSN1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, ITSN1.

```text
Output: Training model for gene: ITSN1
```

--------------------------------

### Check Cell2Net Version

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/03_train.ipynb

Retrieves and displays the installed version of the cell2net library.
```python
cn.__version__
```

--------------------------------

### Cell2Net Training Start for ZBTB46

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, ZBTB46.

```text
Output: Training model for gene: ZBTB46
```

--------------------------------

### Cell2Net Training Start for AP001347.1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, AP001347.1.

```text
Output: Training model for gene: AP001347.1
```

--------------------------------

### Cell2Net Training Start for BCAS1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, BCAS1.

```text
Output: Training model for gene: BCAS1
```

--------------------------------

### Load Required R Libraries

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/06_evaluate_p2g.ipynb

Loads necessary libraries for data manipulation, visualization, and analysis in R. Ensure these packages are installed before running.

```r
suppressMessages(library(cicero))
suppressMessages(library(ggplot2))
suppressMessages(library(cowplot))
suppressMessages(library(gridExtra))
suppressMessages(library(dplyr))
```

--------------------------------

### Cell2Net Training Start for AL109930.1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, AL109930.1.

```text
Output: Training model for gene: AL109930.1
```

--------------------------------

### Cell2Net Training Output

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

This output indicates the start of training for a specific gene.
It is part of the iterative training process for each gene in the dataset.

```text
Output: Training model for gene: KCNJ15
```

```text
Output: Training model for gene: BACE2
```

```text
Output: Training model for gene: MX2
```

```text
Output: Training model for gene: MX1
```

```text
Output: Training model for gene: TRPM2
```

```text
Output: Training model for gene: PCBP3
```

```text
Output: Training model for gene: COL6A2
```

```text
Output: Training model for gene: SAMSN1
```

```text
Output: Training model for gene: NRIP1
```

```text
Output: Training model for gene: AF130417.1
```

```text
Output: Training model for gene: APP
```

```text
Output: Training model for gene: ADAMTS5
```

```text
Output: Training model for gene: AF165147.1
```

```text
Output: Training model for gene: TIAM1
```

```text
Output: Training model for gene: LINC00159
```

--------------------------------

### Start Training for SIRPD

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the commencement of model training for the SIRPD gene.

```text
Output: Training model for gene: SIRPD
```

--------------------------------

### Cell2Net Training Start for MIR155HG

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, MIR155HG.

```text
Output: Training model for gene: MIR155HG
```

--------------------------------

### Cell2Net Training Start for LINC01684

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, LINC01684.

```text
Output: Training model for gene: LINC01684
```

--------------------------------

### Create Output Directory

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/02_prepare_data.ipynb

Create a directory to store the processed dataset and other output files.
The 'exist_ok=True' argument prevents an error if the directory already exists.

```python
out_dir = "./02_prepare_data"
os.makedirs(out_dir, exist_ok=True)
```

--------------------------------

### Cell2Net Training Start for MIR99AHG

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Indicates the start of the training process for a specific gene, MIR99AHG.

```text
Output: Training model for gene: MIR99AHG
```

--------------------------------

### Load Pre-trained Sequence Encoder

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Initializes a pre-trained sequence encoder from a specified file path. Loads the state dictionary, mapping it to the CPU.

```python
pretrained_model_path = "./pretrained_seq2acc.pth"
pretrained_state_dict = torch.load(pretrained_model_path, map_location="cpu")
```

--------------------------------

### Define Data Directories

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/04_peak_to_gene.ipynb

Configures input and output directory paths for data preprocessing, model training, and peak-to-gene analysis. Ensures the output directory exists.

```python
data_dir = "./02_prepare_data/mdata.h5mu"
in_dir = "./03_train"
out_dir = "./04_peak_to_gene"
os.makedirs(out_dir, exist_ok=True)
```

--------------------------------

### Display Prepared Training Data Head

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Displays the first few rows of the prepared training data DataFrame, showing sequence and accessibility information.

```python
df_seq.head()
```

--------------------------------

### Select Essential Columns for Training

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Prepares the training and validation dataframes by selecting only the 'sequence' and 'acc' columns and resetting the index.
```python
df_train = df_train[['sequence', 'acc']].reset_index(drop=True)
df_valid = df_valid[['sequence', 'acc']].reset_index(drop=True)
```

--------------------------------

### Import Libraries and Set Options

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/03_train.ipynb

Imports necessary libraries for data manipulation, modeling, and visualization. Sets Mudata options to disable pull on update for efficiency.

```python
import warnings
warnings.filterwarnings("ignore")

import os
import mudata as md
import cell2net as cn
import torch
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

md.set_options(pull_on_update=False)
```

--------------------------------

### Import Libraries and Set Options

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/04_peak_to_gene.ipynb

Imports necessary Python libraries for data manipulation, modeling, and Cell2Net. Sets global options for mudata and filters warnings.

```python
import os
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import mudata as md
import pandas as pd
from tqdm import tqdm
import cell2net as cn

md.set_options(pull_on_update=False)
```

--------------------------------

### Read MuData Object

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/misc/01_create_mudata.ipynb

Demonstrates the basic command to read a MuData object. Ensure the MuData object is correctly formatted before reading.

```python
mu.read
```

--------------------------------

### Training Completion for AL035458.2

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Reports the successful training of the AL035458.2 model, including the best epoch, validation correlation, and confirmation of the training data split.
```text
Output:
2025-11-21 02:45:22 INFO Training finished
2025-11-21 02:45:22 INFO Find best model at epoch 22 with valid correation 0.452
2025-11-21 02:45:23 INFO Training size is provided: 0.8
2025-11-21 02:45:23 INFO Split the data for training and validation
```

--------------------------------

### Display SCENT DataFrame Head

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/05_process_p2g.ipynb

Shows the first two rows of the SCENT peak-to-gene prediction DataFrame to inspect its structure and content.

```python
df_scent.head(n=2)
```

--------------------------------

### Prepare Training Data for Sequence Encoder

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Adds peak regions with specified window size and extracts DNA sequences from a reference genome. Requires 'cell2net' imported as 'cn' and 'numpy' as 'np'. Ensure the reference FASTA file path is correct.

```python
cn.pp.add_peaks(mdata, mod_name='atac', peak_len=256)
cn.pp.add_dna_sequence(mdata, ref_fasta='../../../../data/refdata-gex-GRCh38-2020-A/fasta/genome.fa')
```

--------------------------------

### Training Completion for SIRPB2

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Logs the completion of training for SIRPB2, detailing the best epoch and validation correlation, and confirming the data split for training and validation.

```text
Output:
2025-11-21 02:37:53 INFO Training finished
2025-11-21 02:37:53 INFO Find best model at epoch 0 with valid correation 0.843
2025-11-21 02:37:55 INFO Training size is provided: 0.8
2025-11-21 02:37:55 INFO Split the data for training and validation
```

--------------------------------

### Load and Preprocess Multiome Data

Source: https://github.com/pinellolab/cell2net/blob/main/README.md

Load multiome data using MuData and perform basic preprocessing steps including adding peak information and DNA sequences.
Ensure you have the 'genome.fa' file available.

```python
import cell2net as cn
import mudata as md

# Load example data
mdata = md.read_h5mu("path/to/multiome_data.h5mu")

# Basic preprocessing
cn.pp.add_peaks(mdata, mod_name='atac')
cn.pp.add_dna_sequence(mdata, ref_fasta='genome.fa')

# Create and train a model
model = cn.tl.Cell2Net(mdata, gene='GENE_OF_INTEREST')
model.train()
```

--------------------------------

### Import Libraries for Cell2Net Analysis

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/05_tf_to_gene.ipynb

Imports necessary Python libraries for data manipulation, Cell2Net analysis, and visualization. Sets options for mudata. Warnings are filtered for cleaner output.

```python
import warnings
warnings.filterwarnings("ignore")

import os
import numpy as np
import mudata as md
import cell2net as cn
from tqdm import tqdm
import pandas as pd

md.set_options(pull_on_update=False)
```

--------------------------------

### Import Libraries and Filter Warnings

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/06_tf_activity.ipynb

Imports necessary libraries for data manipulation, analysis, and Cell2Net. Filters out warnings to keep the output clean.

```python
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import scanpy as sc
import mudata as md
import cell2net as cn

md.set_options(pull_on_update=False)
```

--------------------------------

### Define Input Directory

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/06_evaluate_p2g.ipynb

Set the path to the directory containing processed peak-to-gene predictions from a previous step.

```r
in_dir <- "./05_process_p2g"
```

--------------------------------

### Load Pretrained Sequence Encoder Weights

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/03_train.ipynb

Loads pretrained weights for the sequence encoder from a specified file.
This is useful for transfer learning to improve model convergence and performance.

```python
pretrained_model_path = "./pretrained_seq2acc.pth"
pretrained_state_dict = torch.load(pretrained_model_path, map_location="cpu")
```

--------------------------------

### Training Completion for SULF2

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Confirms the completion of training for SULF2, providing the best epoch, validation correlation, and details on the training data split.

```text
Output:
2025-11-21 02:47:38 INFO Training finished
2025-11-21 02:47:38 INFO Find best model at epoch 1 with valid correation 0.872
2025-11-21 02:47:39 INFO Training size is provided: 0.8
2025-11-21 02:47:39 INFO Split the data for training and validation
```

--------------------------------

### Training Completion for ZNF341-AS1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Confirms the completion of training for ZNF341-AS1, providing the best epoch, validation correlation, and details on the training data split.

```text
Output:
2025-11-21 02:44:33 INFO Training finished
2025-11-21 02:44:33 INFO Find best model at epoch 30 with valid correation 0.392
2025-11-21 02:44:34 INFO Training size is provided: 0.8
2025-11-21 02:44:34 INFO Split the data for training and validation
```

--------------------------------

### Training Completion for ZHX3

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Reports the successful training of the ZHX3 model, including the best epoch, validation correlation, and confirmation of the training data split.
```text
Output:
2025-11-21 02:46:53 INFO Training finished
2025-11-21 02:46:53 INFO Find best model at epoch 7 with valid correation 0.569
2025-11-21 02:46:54 INFO Training size is provided: 0.8
2025-11-21 02:46:54 INFO Split the data for training and validation
```

--------------------------------

### Initialize Seq2Acc Model

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Initializes a Seq2Acc model with a specified input sequence length (peak_len) and dropout rate for regularization. This model uses CNNs to learn sequence patterns for predicting chromatin accessibility.

```python
model = cn.pd.model.Seq2Acc(peak_len=256, dropout_rate=0.25)
```

--------------------------------

### Display Signac DataFrame Head

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/05_process_p2g.ipynb

Shows the first two rows of the Signac peak-to-gene prediction DataFrame to inspect its structure and content.

```python
df_signac.head(n=2)
```

--------------------------------

### Visualize RNA UMAP

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Generates a UMAP visualization for RNA data. Ensure 'scanpy' is imported as 'sc'.

```python
sc.pl.umap(mdata["rna"])
```

--------------------------------

### Training Completion for TMX4

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Confirms the completion of training for TMX4, providing the best epoch and validation correlation. Note: Data split information is not logged in this instance.
```text
Output:
2025-11-21 02:42:59 INFO Training finished
2025-11-21 02:42:59 INFO Find best model at epoch 29 with valid correation 0.743
```

--------------------------------

### Training Completion for CST3

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Reports the successful training of the CST3 model, including the best epoch, validation correlation, and confirmation of the training data split.

```text
Output:
2025-11-21 02:43:44 INFO Training finished
2025-11-21 02:43:44 INFO Find best model at epoch 8 with valid correation 0.903
2025-11-21 02:43:46 INFO Training size is provided: 0.8
2025-11-21 02:43:46 INFO Split the data for training and validation
```

--------------------------------

### Training Completion for GPCPD1

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Reports the successful training of the GPCPD1 model, including the best epoch, validation correlation, and confirmation of the training data split.

```text
Output:
2025-11-21 02:42:17 INFO Training finished
2025-11-21 02:42:17 INFO Find best model at epoch 24 with valid correation 0.956
2025-11-21 02:42:19 INFO Training size is provided: 0.8
2025-11-21 02:42:19 INFO Split the data for training and validation
```

--------------------------------

### Train Cell2Net Models for Genes

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/03_train.ipynb

Iterates through a list of genes to train Cell2Net models. It initializes the model, loads pretrained weights, trains for a specified number of epochs with early stopping, evaluates performance, and saves results. This snippet is designed for training individual gene models and includes evaluation and saving steps.
```python
# We only train Cell2net for one gene to save time
for gene in genes[:1]:
    print('Training model for gene:', gene)
    cn.utils.set_random_seed(42)

    model = cn.pd.model.Cell2Net(
        mdata=mdata,
        gene=gene,
        covariates=['total_counts_rna_log', 'total_counts_atac_log'],
    )

    # load pretrained weights for the sequence encoder
    model.module.seq_encoder.load_state_dict(pretrained_state_dict)

    model.train(
        max_epochs=40,
        device_name='cuda:1',
        batch_size=16,
        train_idx=train_idx,
        valid_idx=valid_idx,
        num_workers=4,
        verbose=False,
    )
    model.save(dir_path=f"{out_dir}/model")

    # set the model with the best checkpoint
    model.module.load_state_dict(model.check_point)

    # Evaluate the model for training and validation dataset
    train_pred = model.predict(model.mdata[train_idx])
    train_true = model.mdata[train_idx]["rna"].layers["counts"].todense().A1
    valid_pred = model.predict(model.mdata[valid_idx])
    valid_true = model.mdata[valid_idx]["rna"].layers["counts"].todense().A1

    df_train = pd.DataFrame({'true': train_true, 'pred': train_pred, 'data': 'train'})
    df_valid = pd.DataFrame({'true': valid_true, 'pred': valid_pred, 'data': 'valid'})

    df_train['true'] = np.log1p(df_train['true'])
    df_valid['true'] = np.log1p(df_valid['true'])
    df_train['pred'] = np.log1p(df_train['pred'])
    df_valid['pred'] = np.log1p(df_valid['pred'])

    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    sns.scatterplot(data=df_train, x='true', y='pred', ax=axes[0], label='train')
    sns.scatterplot(data=df_valid, x='true', y='pred', ax=axes[1], label='valid')
    axes[0].set_xlabel("Observation (log1p)")
    axes[0].set_ylabel("Prediction (log1p)")
    axes[1].set_xlabel("Observation (log1p)")
    axes[1].set_ylabel("Prediction (log1p)")

    # save figure
    plt.savefig(f'{out_dir}/plot/{gene}.png', dpi=300)
    plt.close()

    np.savez(
        f'{out_dir}/prediction/{gene}.npz',
        best_valid_corr=model.best_valid_corr,
        train_pred=train_pred,
        train_true=train_true,
        valid_pred=valid_pred,
        valid_true=valid_true,
    )
```

--------------------------------

### Load Peak-to-Gene DataFrames
Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/05_process_p2g.ipynb

Reads CSV files containing peak-to-gene links from Signac, SCENT, SCARlink, and cell2net into pandas DataFrames. Note that cell2net uses a different index column.

```python
# read peak-to-gene links from Signac, SCENT, SCARlink, and cell2net
df_signac = pd.read_csv("./signac.csv")
df_scent = pd.read_csv("./scent.csv")
df_scarlink = pd.read_csv("./scarlink.csv")
df_cell2net = pd.read_csv("./cell2net.csv", index_col=0)
```

--------------------------------

### Cell2Net Training Output

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

This snippet shows the typical output logs during Cell2Net model training, including information about data splitting, training progress, and the best validation correlation found.

```log
Training model for gene: ATRN
```

```log
2025-11-21 02:19:49 INFO Training size is provided: 0.8
2025-11-21 02:19:49 INFO Split the data for training and validation
2025-11-21 02:20:40 INFO Training finished
2025-11-21 02:20:40 INFO Find best model at epoch 37 with valid correation 0.812
2025-11-21 02:20:41 INFO Training size is provided: 0.8
2025-11-21 02:20:41 INFO Split the data for training and validation
```

```log
Training model for gene: PLCB1
```

```log
2025-11-21 02:21:20 INFO Training finished
2025-11-21 02:21:20 INFO Find best model at epoch 32 with valid correation 0.932
2025-11-21 02:21:21 INFO Training size is provided: 0.8
2025-11-21 02:21:21 INFO Split the data for training and validation
```

```log
Training model for gene: LAMP5
```

```log
2025-11-21 02:21:56 INFO Training finished
2025-11-21 02:21:56 INFO Find best model at epoch 25 with valid correation 0.441
2025-11-21 02:21:57 INFO Training size is provided: 0.8
2025-11-21 02:21:57 INFO Split the data for training and validation
```

```log
Training model for gene: AL050403.2
```

```log
2025-11-21 02:22:33 INFO Training finished
2025-11-21 02:22:33 INFO Find best model at epoch 36 with valid correation 0.693
2025-11-21 02:22:34 INFO Training size is provided: 0.8
2025-11-21 02:22:34 INFO Split the data for training and validation
```

```log
Training model for gene: ISM1
```

```log
2025-11-21 02:23:08 INFO Training finished
2025-11-21 02:23:08 INFO Find best model at epoch 12 with valid correation 0.382
2025-11-21 02:23:10 INFO Training size is provided: 0.8
2025-11-21 02:23:10 INFO Split the data for training and validation
```

```log
Training model for gene: MACROD2
```

```log
2025-11-21 02:23:46 INFO Training finished
2025-11-21 02:23:46 INFO Find best model at epoch 15 with valid correation 0.514
2025-11-21 02:23:47 INFO Training size is provided: 0.8
2025-11-21 02:23:47 INFO Split the data for training and validation
```

```log
Training model for gene: SLC24A3
```

```log
2025-11-21 02:24:23 INFO Training finished
2025-11-21 02:24:23 INFO Find best model at epoch 17 with valid correation 0.167
2025-11-21 02:24:25 INFO Training size is provided: 0.8
2025-11-21 02:24:25 INFO Split the data for training and validation
```

```log
Training model for gene: RIN2
```

--------------------------------

### Process Metacells for Visualization

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/02_prepare_data.ipynb

Applies normalization, PCA, neighbor graph construction, and UMAP embedding to the RNA modality of the metacells for visualization and quality assessment. Requires scanpy imported as `sc`.

```python
sc.pp.normalize_total(mdata_bulk['rna'])
sc.tl.pca(mdata_bulk['rna'], n_comps=30, use_highly_variable=True)
sc.pp.neighbors(mdata_bulk['rna'])
sc.tl.umap(mdata_bulk['rna'])
```

--------------------------------

### Load Pretrained Sequence Encoder for Evaluation

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Loads the saved sequence encoder weights from a file.
This is used to reload the best model for final evaluation and prediction visualization.

```python
# load best model weights
pretrained_state_dict = torch.load('./pretrained_seq2acc.pth')
model.module.seq_encoder.load_state_dict(pretrained_state_dict)
```

--------------------------------

### Examine MuData Object

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Displays the structure and summary of the loaded MuData object. This helps in understanding the dimensions and modalities of the dataset.

```python
mdata
```

--------------------------------

### Display SCARlink DataFrame Head

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/05_process_p2g.ipynb

Shows the first two rows of the SCARlink peak-to-gene prediction DataFrame to inspect its structure and content.

```python
df_scarlink.head(n=2)
```

--------------------------------

### Training Completion for MAFB

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Confirms the completion of training for MAFB, providing the best epoch, validation correlation, and details on the training data split.

```text
Output:
2025-11-21 02:46:08 INFO Training finished
2025-11-21 02:46:08 INFO Find best model at epoch 3 with valid correation 0.838
2025-11-21 02:46:10 INFO Training size is provided: 0.8
2025-11-21 02:46:10 INFO Split the data for training and validation
```

--------------------------------

### Training Completion for RNF24

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Confirms the completion of training for RNF24, providing the best epoch, validation correlation, and details on the training data split.
```text
Output:
2025-11-21 02:41:27 INFO Training finished
2025-11-21 02:41:27 INFO Find best model at epoch 14 with valid correation 0.873
2025-11-21 02:41:28 INFO Training size is provided: 0.8
2025-11-21 02:41:28 INFO Split the data for training and validation
```

--------------------------------

### Create Performance Comparison Plot

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/06_evaluate_p2g.ipynb

Generates a line-and-point plot of odds ratio against number of links for each method, faceted by dataset. This cell is R code and requires the ggplot2 and cowplot packages. Plot dimensions are set through `options(repr.plot.height, repr.plot.width)`.

```r
options(repr.plot.height = 4, repr.plot.width = 10)

p <- ggplot(data = df_final, aes(x = n_links, y = odds_ratio)) +
    geom_line(aes(color = method)) +
    geom_point(aes(color = method)) +
    facet_wrap(~data, scales = "free") +
    xlab("") + ylab("") +
    theme_cowplot() +
    theme(legend.title = element_blank())

print(p)
```

--------------------------------

### Cell2Net Training Output for AP000317.1 (Second Instance)

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/03_train_cell2net.ipynb

Shows the training completion, best model epoch, and validation correlation for gene AP000317.1, followed by the training size and data split messages.

```text
Output:
2025-11-21 02:59:51 INFO Training finished
2025-11-21 02:59:51 INFO Find best model at epoch 8 with valid correation 0.299
2025-11-21 02:59:53 INFO Training size is provided: 0.8
2025-11-21 02:59:53 INFO Split the data for training and validation
```

--------------------------------

### Load Data and Extract Target Genes

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/03_train.ipynb

Loads preprocessed multiome data from the file given by `input_data` and extracts the unique genes listed in the `peak_to_gene` table stored in `mdata.uns`. These genes serve as the targets for model training.
```python
mdata = md.read_h5mu(input_data)
genes = mdata.uns['peak_to_gene']['gene'].unique().tolist()
```

--------------------------------

### Create Train/Validation Split

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/03_train.ipynb

Splits the cell observations into training (80%) and validation (20%) sets using a fixed random state for reproducibility. The held-out validation set is used to evaluate model performance and detect overfitting.

```python
from sklearn.model_selection import train_test_split

train_idx, valid_idx = train_test_split(
    mdata.obs_names.values.tolist(), train_size=0.8, random_state=42)
```

--------------------------------

### Display cell2net DataFrame Head

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/05_process_p2g.ipynb

Shows the first five rows of the cell2net peak-to-gene prediction DataFrame to inspect its structure and content.

```python
df_cell2net.head(n=5)
```

--------------------------------

### RNA Data Preprocessing

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/02_prepare_data.ipynb

Applies total count normalization, log transformation, and PCA to RNA data. Assumes `mdata['rna']` is already loaded and has highly variable genes annotated, since PCA is run with `use_highly_variable=True`.

```python
sc.pp.normalize_total(mdata['rna'])
sc.pp.log1p(mdata['rna'])
sc.tl.pca(mdata['rna'], n_comps=30, use_highly_variable=True)
```

--------------------------------

### Import Libraries for Cell2Net Correlation Analysis

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/pbmc/04_get_correlation.ipynb

Imports the Python libraries used for data manipulation, numerical operations, statistical analysis, and plotting. Suppresses warnings and sets the MuData option `pull_on_update=False`.
```python
import warnings
warnings.filterwarnings("ignore")

import os
import mudata as md
import pandas as pd
import numpy as np
import seaborn as sns
from scipy.stats import spearmanr
import matplotlib.pyplot as plt

md.set_options(pull_on_update=False)
```

--------------------------------

### Extract Essential Columns

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/05_process_p2g.ipynb

Selects the 'Peak1', 'Peak2', and 'score' columns from the Signac, SCENT, SCARlink, and cell2net DataFrames for evaluation.

```python
df_signac = df_signac[['Peak1', 'Peak2', 'score']]
df_scent = df_scent[['Peak1', 'Peak2', 'score']]
df_scarlink = df_scarlink[['Peak1', 'Peak2', 'score']]
df_cell2net = df_cell2net[['Peak1', 'Peak2', 'score']]
```

--------------------------------

### Train Sequence Encoder

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Starts training the sequence encoder model with the given training and validation DataFrames and hyperparameters. `df_train` and `df_valid` must already be defined.

```python
model.train(df_train=df_train, df_valid=df_valid,
            max_epochs=100, batch_size=512,
            weight_decay=1e-04, lr=3e-04)
```

--------------------------------

### Visualize ATAC UMAP

Source: https://github.com/pinellolab/cell2net/blob/main/docs/tutorials/k562/01_pretrain_seq_encoder.ipynb

Plots the UMAP embedding of the ATAC modality. Requires scanpy imported as `sc` and a UMAP embedding already computed for `mdata["atac"]`.

```python
sc.pl.umap(mdata["atac"])
```
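
--------------------------------

### Sketch: Score Saved Predictions with Spearman Correlation

The training loop in this section writes per-gene predictions with `np.savez` under the keys `train_pred`, `train_true`, `valid_pred`, and `valid_true`, and the correlation-analysis imports include `scipy.stats.spearmanr`. The following is a minimal sketch of how such a file might be read back and scored. It is not taken from the cell2net tutorials: the helper `score_predictions`, the gene name `GENE_X`, the temporary directory, and the synthetic arrays are all illustrative assumptions.

```python
import os
import tempfile

import numpy as np
from scipy.stats import spearmanr


def score_predictions(npz_path):
    """Hypothetical helper: Spearman correlation for each data split."""
    with np.load(npz_path) as data:
        train_corr, _ = spearmanr(data["train_true"], data["train_pred"])
        valid_corr, _ = spearmanr(data["valid_true"], data["valid_pred"])
    return {"train": float(train_corr), "valid": float(valid_corr)}


# Synthetic stand-in for a file written by the training loop (assumption):
# Poisson counts as observations, noisy copies of them as predictions.
rng = np.random.default_rng(0)
train_true = rng.poisson(5.0, size=80).astype(float)
valid_true = rng.poisson(5.0, size=20).astype(float)

out_dir = tempfile.mkdtemp()
npz_path = os.path.join(out_dir, "GENE_X.npz")
np.savez(npz_path,
         train_pred=train_true + rng.normal(0.0, 0.5, size=80),
         train_true=train_true,
         valid_pred=valid_true + rng.normal(0.0, 0.5, size=20),
         valid_true=valid_true)

scores = score_predictions(npz_path)
print(scores)
```

Spearman correlation depends only on ranks, so it gives the same value whether or not the arrays are `log1p`-transformed first, which makes it a convenient companion to the log-scale scatter plots saved during training.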