### Install Custom ALLCools and Bismark

Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial

Clones and installs custom versions of ALLCools and Bismark from GitHub repositories. These modified tools are essential for UMI deduplication, methylation calculation, and adding specific tags to BAM files.

```bash
# Clone and install custom ALLCools
conda activate seeksoulmethyl
git clone https://github.com/seekgene/ALLCools.git && \
pip install ./ALLCools && \
rm -rf ./ALLCools

# Clone and install custom Bismark into the environment's bin directory
git clone https://github.com/seekgene/Bismark.git && \
bin_path=$(dirname "$(which python)") && \
cp -r ./Bismark/* "$bin_path"/ && \
chmod +x "$bin_path"/bismark* && \
chmod +x "$bin_path"/deduplicate_bismark && \
rm -rf ./Bismark
```

--------------------------------

### Environment Installation (Bash)

Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt

Sets up the conda environment and installs the SeekSoulMethyl package and its dependencies. The steps are: clone the repository, create the conda environment from one of two YAML files (one for international users, one for users in China), and install the bundled Python packages with pip. Dependencies include git, conda, and pip.

```bash
# Clone the repository
git clone https://github.com/seekgene/SeekSoulMethyl.git
cd SeekSoulMethyl

# Create conda environment (international users)
conda env create -n seeksoulmethyl -f conda_dependencies.yml
conda activate seeksoulmethyl

# Create conda environment (China users; run this instead of the commands above)
conda env create -n seeksoulmethyl -f conda_dependencies.zh.yml
conda activate seeksoulmethyl

# Install SeekSoulMethyl package
cd dependence
pip install . \
  simpleqc/target/wheels/simpleqc-0.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl \
  search-pattern/target/wheels/search_pattern-0.1.0-py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl
cd ..
# Install custom ALLCools with UMI support git clone https://github.com/seekgene/ALLCools.git pip install ./ALLCools rm -rf ./ALLCools ``` -------------------------------- ### Reference Database Setup (Bash) Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt Downloads and prepares reference genome databases for human (GRCh38) and mouse (GRCm39) using wget. It then extracts the compressed tar archives to create the necessary directory structure for genome analysis. Dependencies include wget and tar. ```bash # Download human reference genome wget -dc -O human-reference-GRCh38.tar.gz \ "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/v1.1/human-reference-GRCh38.tar.gz" wget -dc -O human-reference-GRCh38.tar.gz.md5 \ "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/v1.1/human-reference-GRCh38.tar.gz.md5" # Download mouse reference genome wget -dc -O mouse-reference-GRCm39.tar.gz \ "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/v1.1/mouse-reference-GRCm39.tar.gz" # Extract reference genomes tar -xzf human-reference-GRCh38.tar.gz tar -xzf mouse-reference-GRCm39.tar.gz # Reference directory structure: # database_dir/ # fasta/genome.fa # Reference genome FASTA # genes/genes.gtf # Gene annotation # star/ # STAR index for RNA-seq # bed/chr_len.bed # Chromosome lengths # bed/chr_nochrM.bed # Chromosome lengths without mitochondria ``` -------------------------------- ### Run Dual-omics Analysis with Multiple FASTQ Files (Shell Script) Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/README.md This example demonstrates running the `sc_methy_workflow.sh` script when a sample has multiple FASTQ files for expression and methylation data. File paths for each type should be comma-separated and listed in the correct order. 
```bash bash sc_methy_workflow.sh \ /path/to/WTJW969_E_L003_R1.fq.gz,/path/to/WTJW969_E_L004_R1.fq.gz \ /path/to/WTJW969_E_L003_R2.fq.gz,/path/to/WTJW969_E_L004_R2.fq.gz \ /path/to/WTJW969_Met_L000_R1.fq.gz,/path/to/WTJW969_Met_L001_R1.fq.gz,/path/to/WTJW969_Met_L002_R1.fq.gz,/path/to/WTJW969_Met_L003_R1.fq.gz,/path/to/WTJW969_Met_L004_R1.fq.gz \ /path/to/WTJW969_Met_L000_R2.fq.gz,/path/to/WTJW969_Met_L001_R2.fq.gz,/path/to/WTJW969_Met_L002_R2.fq.gz,/path/to/WTJW969_Met_L003_R2.fq.gz,/path/to/WTJW969_Met_L004_R2.fq.gz \ --sample WTJW969 \ --outdir /path/to/results \ --database_dir /path/to/human-reference-GRCh38 \ --chemistry DD-MET5 \ --core 64 \ --filter_ch 2 ``` -------------------------------- ### Install SeekSoulMethyl Dependencies Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Installs specific Python packages for the SeekSoulMethyl pipeline from local wheel files. These packages are likely custom-built or modified versions required by the pipeline. ```bash cd dependence pip install . \ simpleqc/target/wheels/simpleqc-0.1.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl \ search-pattern/target/wheels/search_pattern-0.1.0-py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl cd .. ``` -------------------------------- ### Install umi_tools using pip Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/docs/How_to_deduplicate_single_cell_bam.md Installs the umi_tools Python package, which is a dependency for UMI deduplication. Ensure Python 3 is installed in your environment. ```shell pip install umi_tools ``` -------------------------------- ### Install Nextflow for SeekSoulMethyl Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Installs the Nextflow workflow management system within the 'seeksoulmethyl' Conda environment. This is a prerequisite for running the SeekSoulMethyl pipeline. 
```bash
conda activate seeksoulmethyl
conda install -n seeksoulmethyl -c bioconda nextflow
```

--------------------------------

### Install Custom Bismark with Barcode/UMI Tagging (Shell)

Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt

This script clones a custom Bismark repository, copies its contents to the Python bin directory, and makes the executables runnable. It then cleans up the cloned repository. This is useful for setting up specialized Bismark versions for UMI tagging.

```shell
git clone https://github.com/seekgene/Bismark.git
bin_path=$(dirname "$(which python)")
cp -r ./Bismark/* "$bin_path"/
chmod +x "$bin_path"/bismark* "$bin_path"/deduplicate_bismark
rm -rf ./Bismark
```

--------------------------------

### Nextflow Configuration for Slurm Cluster Execution

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/README.md

Sets up Nextflow to run on a Slurm cluster. It defines the executor, work directory, resources, queue, and cluster-specific options. It also includes an example of fine-grained resource allocation for processes labeled 'high_mem'.

```groovy
slurm {
    workDir = '/lustre/work/your_user'
    process {
        executor = 'slurm'
        cpus = 8
        memory = '32 GB'
        queue = 'normal'
        clusterOptions = '-A your_account --qos=normal'
        // Label selectors are only valid inside the process scope
        withLabel: 'high_mem' {
            cpus = 16
            memory = '64 GB'
        }
    }
}
```

--------------------------------

### Clone SeekSoulMethyl Repository

Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial

Clones the SeekSoulMethyl repository from GitHub and navigates into the project directory. This is the initial step for setting up the analysis pipeline.

```bash
git clone https://github.com/seekgene/SeekSoulMethyl.git
cd SeekSoulMethyl
```

--------------------------------

### Prepare SeekSoulMethyl Sample Sheet

Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial

Creates a sample sheet file in CSV format for the SeekSoulMethyl pipeline.
The file must contain columns for sample ID and read file paths for both expression and methylation data. This file is crucial for defining the input data for the pipeline. ```bash cat > samplelist.csv << EOF sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2 XYRD-WTJW880,/path/to/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz,/path/to/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz,/path/to/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz EOF ``` -------------------------------- ### Download Test Data using wget Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/README.md This snippet demonstrates how to download small test datasets for transcriptome and methylation analysis using the `wget` command. It includes downloading both the FASTQ files and their corresponding MD5 checksum files for verification. ```bash # Download transcriptome test data wget -dc -O XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/fastq/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz" wget -dc -O XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/fastq/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz.md5" wget -dc -O XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/fastq/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz" wget -dc -O XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/fastq/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz.md5" # Download methylation test data wget -dc -O XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/tiny_fastq/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz" wget -dc -O XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz.md5 
"https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/tiny_fastq/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz.md5" wget -dc -O XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/tiny_fastq/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz" wget -dc -O XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/tiny_fastq/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz.md5" ``` -------------------------------- ### Activate Conda Environment (Shell) Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Activates the 'seeksoulmethyl' Conda environment. This command is necessary to use the tools and dependencies installed within that environment. ```shell conda activate seeksoulmethyl ``` -------------------------------- ### Run SeekSoulMethyl Pipeline with Nextflow Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Executes the SeekSoulMethyl pipeline using Nextflow. This command requires a prepared sample sheet, output directory, working directory, configuration file, profile, and database directory. Optional parameters control fastq splitting, CH filtering, and chemistry type. ```bash nextflow run SeekSoulMethyl/nf/main.nf \ --outdir /path/to/results \ --samplesheet samplelist.csv \ -w /path/to/work \ -c SeekSoulMethyl/nf/nextflow.config \ -profile aliyun_k8s \ --database_dir /path/to/reference \ --split_fastq 4 \ --filter_ch 2 \ --chemistry DD-MET3 > methy.log ``` -------------------------------- ### FASTQ File Pairing (Shell Command) Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Scans directories for forward and reverse FASTQ files and creates pairing lists. This step ensures correct association of read pairs for downstream processing. 
```shell
# PARSE_FASTQ_FILES (script/SeekSoulMethyl/nf/modules/step1.nf:255)
# Input: forward/reverse FASTQ sets
# Action: scan step1/ and pair files by identifiers
# Output: forward_pairs.txt, reverse_pairs.txt
```

--------------------------------

### Nextflow Samplesheet with Multiple Datasets

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/README.md

Illustrates how to handle samples with multiple FASTQ datasets by adding multiple rows for the same sample ID in the samplesheet. This is useful when a single sample is split across several files. Rows that carry only methylation data leave the two expression columns empty.

```csv
sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2
WTJW969,/path/to/WTJW969_E_L003_R1.fq.gz,/path/to/WTJW969_E_L003_R2.fq.gz,/path/to/WTJW969_Met_L000_R1.fq.gz,/path/to/WTJW969_Met_L000_R2.fq.gz
WTJW969,/path/to/WTJW969_E_L004_R1.fq.gz,/path/to/WTJW969_E_L004_R2.fq.gz,/path/to/WTJW969_Met_L001_R1.fq.gz,/path/to/WTJW969_Met_L001_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L002_R1.fq.gz,/path/to/WTJW969_Met_L002_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L003_R1.fq.gz,/path/to/WTJW969_Met_L003_R2.fq.gz
WTJW969,,,/path/to/WTJW969_Met_L004_R1.fq.gz,/path/to/WTJW969_Met_L004_R2.fq.gz
```

--------------------------------

### Run Methylation-only Workflow (Bash)

Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial

Executes the methylation-only Nextflow pipeline using specified parameters. Requires Nextflow, a samplesheet, and a reference database. Outputs results to a specified directory.
```bash
nextflow run SeekSoulMethyl/nf/methy_only.nf \
  --outdir /path/to/results \
  --samplesheet samplelist.csv \
  -w /path/to/work \
  -c SeekSoulMethyl/nf/nextflow.config \
  -profile aliyun_k8s \
  --database_dir /path/to/reference \
  --split_fastq 4 \
  --filter_ch 2 \
  --chemistry DD-MET3
```

--------------------------------

### Nextflow Configuration Profiles (Groovy)

Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial

Defines execution environments for Nextflow, including local, Slurm, and Kubernetes. Tailor executor, resources, work directory, and containerization (conda, Docker, Singularity) to your infrastructure.

```groovy
profiles {
    // Local machine
    local {
        process.executor = 'local'
        workDir = '/path/to/work'
        process.cpus = 8
        process.memory = '32 GB'
        conda.enabled = true
        // Or containers: singularity.enabled = true / docker.enabled = true
    }

    // Slurm cluster (HPC)
    slurm {
        workDir = '/lustre/work/your_user'
        process {
            executor = 'slurm'
            cpus = 8
            memory = '32 GB'
            queue = 'normal'
            clusterOptions = '-A your_account --qos=normal'
            // Label selectors are only valid inside the process scope
            withLabel: 'high_mem' {
                cpus = 16
                memory = '64 GB'
            }
        }
    }

    // Kubernetes (e.g., Alibaba Cloud ACK)
    aliyun_k8s {
        process.executor = 'k8s'
        workDir = '/mnt/nf-work' // persistent volume path
        k8s {
            namespace = 'your-namespace'
            storageClaimName = 'your-pvc'
            cpu = 4
            memory = '16 GB'
        }
        // If using container images globally: docker.enabled = true
    }
}
```

--------------------------------

### Iterate Sample Data with Jinja2

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/dependence/seeksoultools/fast/template/sample_part.html

This snippet demonstrates how to iterate over a dictionary of sample data using Jinja2 templating. It accesses both the key and value for each item in the sample_table.
```jinja2
{% for key, value in sample_table.items() %}
{{key}} {{value}}
{% endfor %}
```

--------------------------------

### Include CSS Stylesheets for SeekSoulMethyl

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/dependence/seeksoultools/fast/template/base.html

Pulls the CSS files that style the report into the page with Jinja `include` statements. This covers Bootstrap for responsive design and DataTables for enhancing HTML tables with sorting, filtering, and pagination.

```html
{% include 'css/jquery.dataTables.min.css' %}
{% include 'css/bootstrap.min.css' %}
```

--------------------------------

### Download Methylation Test Data (Shell)

Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial

Downloads FASTQ files and their MD5 checksums for methylation data using wget. In `-dc`, the `-c` flag resumes an interrupted download and the `-d` flag turns on debug output; `-O` specifies the output filename.

```shell
wget -dc -O XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/tiny_fastq/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz"
wget -dc -O XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/tiny_fastq/XYRD-WTJW880-MET_S01_L001_R1_001.fastq.gz.md5"
wget -dc -O XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/tiny_fastq/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz"
wget -dc -O XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/tiny_fastq/XYRD-WTJW880-MET_S01_L001_R2_001.fastq.gz.md5"
```

--------------------------------

### Download Transcriptome Test Data (Shell)

Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial

Downloads FASTQ files and their MD5 checksums for transcriptome data using wget. In `-dc`, the `-c` flag resumes an interrupted download and the `-d` flag turns on debug output; `-O` specifies the output filename.
```shell wget -dc -O XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/fastq/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz" wget -dc -O XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/fastq/XYRD-WTJW880-E_S1_L005_R1_001.fastq.gz.md5" wget -dc -O XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/fastq/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz" wget -dc -O XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/fastq/XYRD-WTJW880-E_S1_L005_R2_001.fastq.gz.md5" ``` -------------------------------- ### Nextflow Configuration for Aliyun Kubernetes Execution Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/README.md Configures Nextflow to execute on an Alibaba Cloud Kubernetes cluster. It specifies the 'k8s' executor, a persistent work directory, and Kubernetes-specific settings like namespace, storage claim, CPU, and memory. Container image usage can be enabled globally. ```groovy aliyun_k8s { process.executor = 'k8s' workDir = '/mnt/nf-work' // persistent volume path k8s { namespace = 'your-namespace' storageClaimName = 'your-pvc' cpu = 4 memory = '16 GB' } // If using container images globally: docker.enabled = true } ``` -------------------------------- ### Run SeekSoulMethyl Pipeline (Background Mode) Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Executes the SeekSoulMethyl pipeline in background mode using Nextflow. This command is similar to the foreground execution but allows the process to run independently. It includes parameters for output, working directories, configuration, profile, database, and specific pipeline options. 
```bash nextflow run -bg SeekSoulMethyl/nf/main.nf \ --outdir /path/to/tiny_demo/results/ \ --samplesheet samplelist.csv \ -w /path/to/tiny_demo/results/work \ -c SeekSoulMethyl/nf/nextflow.config \ -profile aliyun_k8s \ --database_dir /path/to/human-reference-GRCh38/ \ --split_fastq 1 \ --filter_ch 2 \ --chemistry DD-MET3 > methy.log ``` -------------------------------- ### Run Dual-omics Analysis with Shell Script (Shell) Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Executes the dual-omics analysis pipeline using the sc_methy_workflow.sh script. It requires paths to transcriptome and methylation FASTQ files, sample name, output directory, reference database, chemistry type, and computational resources. ```shell bash sc_methy_workflow.sh \ /path/to/expression_R1.fastq.gz \ /path/to/expression_R2.fastq.gz \ /path/to/methy_R1.fastq.gz \ /path/to/methy_R2.fastq.gz \ --sample WTJW880 \ --outdir /path/to/results \ --database_dir /path/to/human-reference-GRCh38 \ --chemistry DD-MET3 \ --core 64 \ --filter_ch 2 ``` -------------------------------- ### Create and Activate Conda Environment Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Creates a new conda environment named 'seeksoulmethyl' using provided YAML files and activates it. This isolates project dependencies for reproducibility. Separate files are provided for users in China and international users. ```bash # For users in China: conda env create -n seeksoulmethyl -f conda_dependencies.zh.yml conda activate seeksoulmethyl # For international users: conda env create -n seeksoulmethyl -f conda_dependencies.yml conda activate seeksoulmethyl ``` -------------------------------- ### Download and Extract Reference Genomes Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Downloads and extracts reference genome data for human (GRCh38) and mouse (GRCm39) using wget. MD5 checksum files are also downloaded for integrity verification. 
These genomes are required for aligning sequencing reads. ```bash # Download human reference genome (GRCh38) wget -dc -O human-reference-GRCh38.tar.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/v1.1/human-reference-GRCh38.tar.gz" wget -dc -O human-reference-GRCh38.tar.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/v1.1/human-reference-GRCh38.tar.gz.md5" # Download mouse reference genome (GRCm39) wget -dc -O mouse-reference-GRCm39.tar.gz "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/v1.1/mouse-reference-GRCm39.tar.gz" wget -dc -O mouse-reference-GRCm39.tar.gz.md5 "https://seekgene-public.oss-cn-beijing.aliyuncs.com/methy_demo/methy_exp/v1.1/mouse-reference-GRCm39.tar.gz.md5" # Extract reference genomes tar -xzf human-reference-GRCh38.tar.gz tar -xzf mouse-reference-GRCm39.tar.gz ``` -------------------------------- ### Nextflow Configuration (Groovy) Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt Configures Nextflow for different execution environments, including local, Slurm, and Kubernetes (Alibaba Cloud ACK). This allows users to adapt the pipeline to their specific computing infrastructure. Dependencies include Nextflow. 
```groovy
// Example nextflow.config with local, Slurm, and Kubernetes profiles
profiles {
    // Local execution
    local {
        process.executor = 'local'
        workDir = '/path/to/work'
        process.cpus = 8
        process.memory = '32 GB'
        conda.enabled = true
    }

    // Slurm cluster configuration
    slurm {
        workDir = '/lustre/work/your_user'
        process {
            executor = 'slurm'
            cpus = 8
            memory = '32 GB'
            queue = 'normal'
            clusterOptions = '-A your_account --qos=normal'
            // Label selectors are only valid inside the process scope
            withLabel: 'high_mem' {
                cpus = 16
                memory = '64 GB'
            }
        }
    }

    // Kubernetes (Alibaba Cloud ACK)
    aliyun_k8s {
        process.executor = 'k8s'
        workDir = '/mnt/nf-work'
        k8s {
            namespace = 'your-namespace'
            storageClaimName = 'your-pvc'
            cpu = 4
            memory = '16 GB'
        }
    }
}
```

--------------------------------

### Run Batch Processing with Nextflow Pipeline

Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt

Launches the Nextflow pipeline for parallelized batch processing of multiple samples. Requires a sample sheet (CSV), output directory, working directory, configuration file, profile, reference database, and chemistry type.
```bash conda activate seeksoulmethyl conda install -n seeksoulmethyl -c bioconda nextflow cat > samplelist.csv << EOF sample_id,expression_r1,expression_r2,methylation_r1,methylation_r2 SAMPLE001,/data/exp_R1.fastq.gz,/data/exp_R2.fastq.gz,/data/met_R1.fastq.gz,/data/met_R2.fastq.gz SAMPLE002,/data/exp2_R1.fastq.gz,/data/exp2_R2.fastq.gz,/data/met2_R1.fastq.gz,/data/met2_R2.fastq.gz EOF nextflow run SeekSoulMethyl/nf/main.nf \ --outdir /path/to/results \ --samplesheet samplelist.csv \ -w /path/to/work \ -c SeekSoulMethyl/nf/nextflow.config \ -profile local \ --database_dir /path/to/human-reference-GRCh38 \ --split_fastq 4 \ --filter_ch 2 \ --chemistry DD-MET3 # Run methylation-only workflow (without transcriptome) nextflow run SeekSoulMethyl/nf/methy_only.nf \ --outdir /results \ --samplesheet methy_samples.csv \ -w /work \ -c nf/nextflow.config \ --database_dir /ref/mouse-reference-GRCm39 \ --chemistry DD-MET5 \ --split_fastq 4 ``` -------------------------------- ### Run Single-Sample Dual-Omics Analysis with Shell Script Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt Executes the main shell script for single-sample transcriptome and methylation data processing. Requires FASTQ files, sample name, output directory, reference database, chemistry type, and core count. 
```bash bash sc_methy_workflow.sh \ /path/to/expression_R1.fastq.gz \ /path/to/expression_R2.fastq.gz \ /path/to/methylation_R1.fastq.gz \ /path/to/methylation_R2.fastq.gz \ --sample SAMPLE001 \ --outdir /path/to/results \ --database_dir /path/to/human-reference-GRCh38 \ --chemistry DD-MET3 \ --core 64 \ --filter_ch 2 bash sc_methy_workflow.sh \ /data/L003_R1.fq.gz,/data/L004_R1.fq.gz \ /data/L003_R2.fq.gz,/data/L004_R2.fq.gz \ /data/Met_L001_R1.fq.gz,/data/Met_L002_R1.fq.gz \ /data/Met_L001_R2.fq.gz,/data/Met_L002_R2.fq.gz \ --sample SAMPLE002 \ --outdir /results \ --database_dir /ref/human-reference-GRCh38 \ --chemistry DD-MET5 \ --core 64 \ --filter_ch 0 ``` -------------------------------- ### Include Jinja HTML Parts Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/dependence/seeksoultools/fast/template/base.html Includes various HTML template parts using Jinja's include syntax. These parts likely represent different sections or components of the web page, such as summary data, sequencing information, and mapping details. ```html {% include 'summary_part.html' %} {% include 'sequencing_part.html' %} {% include 'mapping_part.html' %} {% include 'cells_part.html' %} {% include 'sample_part.html' %} {% include 'biotype_pie.html' %} {% include 'median_part.html' %} {% include 'genebody.html' %} {% include 'saturation_part.html' %} {% include 'umi_part.html' %} {% include 'cluster_part.html' %} {% include 'marker_part.html' %} ``` -------------------------------- ### FASTP for FASTQ QC (Shell Command) Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial Utilizes the fastp tool for quality control and trimming of FASTQ files. Supports multi-group processing for both expression and methylation data. ```shell # FASTP_EXPRESSION_MULTI (script/SeekSoulMethyl/nf/modules/step1.nf:22) # Input: paired FASTQs per sample (groups G1/G2/...) 
# Action: fastp trimming and QC
# Output: cleaned *_expression_clean_R1/2.fastq.gz, *.html, *.json

# FASTP_METHYLATION_MULTI (script/SeekSoulMethyl/nf/modules/step1.nf:88)
# Input: paired FASTQs per sample (groups G1/G2/...)
# Action: fastp QC (adapter detection disabled, trimming as per pipeline)
# Output: cleaned *_methylation_clean_R1/2.fastq.gz, *.html, *.json

# FASTP_METHYLATION_BARCODE_EXTRACT (script/SeekSoulMethyl/nf/modules/step1.nf:368)
# Input: paired sub-FASTQs
# Action: fastp QC
# Output: per-pair *.html, *.json
```

--------------------------------

### Include JavaScript Libraries for SeekSoulMethyl

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/dependence/seeksoultools/fast/template/base.html

Includes essential JavaScript libraries such as LZ-String for compression, Plotly for charting, and jQuery for DOM manipulation and AJAX. These libraries are foundational for interactive data display and processing within the project.

```html
{% include 'js/lz-string.min.js' %}
{% include 'js/plotly-latest.min.js' %}
{% include 'js/jquery-3.5.1.min.js' %}
{% include 'js/jquery.dataTables.min.js' %}
```

--------------------------------

### Convert BAM to ALLC Format using ALLCools

Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt

Converts per-cell BAM files to ALLC format using ALLCools with UMI-based deduplication. This step is crucial for accurate methylation quantification in single-cell experiments. It requires input BAM files, sample names, output directories, and reference genome information.
```bash
python nf/bin/step3_bam_to_allc.py \
  --indir /results/step3/split_bams/merged/merged_fr_bam \
  --samplename SAMPLE001 \
  --outdir /results/step3 \
  --genomefa /ref/human-reference-GRCh38/fasta/genome.fa \
  --chrom_size_path /ref/human-reference-GRCh38/bed/chr_nochrM.bed \
  --filtered_barcode /results/step3/split_bams/merged/merged_filtered_barcode \
  --core 32 \
  --tag UR
```

--------------------------------

### Card and Summary Data Styling

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/dependence/seeksoultools/fast/template/base.html

Styles for card elements and summary data displays. This includes centered card titles, a large font size for summary data, a close mark drawn by the `::before` pseudo-element of `.summary_description`, and a round `?` help badge drawn by the `::after` pseudo-element of `.has_desc`.

```css
.card h1 {
  text-align: center;
  color: #555;
  font-size: 24px;
  font-weight: 500;
  line-height: normal;
  margin: 0;
}

.summary_data {
  text-align: center;
  color: #0096F0;
  font-size: 40px;
}

.card {
  margin-bottom: 20px; /* a non-zero length needs a unit */
}

.summary_description::before {
  float: right;
  margin-top: 5px;
  margin-right: 5px;
  position: relative;
  right: 5px;
  content: "\00d7";
  cursor: pointer;
  font-size: 18px;
}

.has_desc::after {
  float: right;
  margin-top: 5px;
  margin-right: 5px;
  position: relative;
  background: rgba(0,0,0,0.1);
  color: white;
  width: 18px;
  height: 18px;
  border-radius: 18px;
  cursor: pointer;
  text-align: center;
  line-height: 20px;
  content: '?';
}
```

--------------------------------

### ALLCools: Split BAMs and Generate ALLC Files

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/README.md

Splits name-sorted BAM files into per-cell BAMs based on cell barcodes. It then converts these per-cell BAMs to ALLC format using `bam-to-allc`, incorporating UR-tag-based UMI correction and deduplication per C site.
```bash
allcools split-bam --input sorted_by_name.bam --output per_cell_bam_dir --cell_barcode_tag CB

# Convert every per-cell BAM; quoting keeps paths with spaces intact
for cell_bam in per_cell_bam_dir/*.bam; do
  allcools bam-to-allc --input "$cell_bam" --output "${cell_bam%.bam}.allc" --genome /path/to/genome --umi_tag UR
done
```

--------------------------------

### Generate ALLCools Multi-Scale Datasets

Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt

Creates multi-scale methylation datasets (MCDS format) for downstream analysis and visualization. This involves first creating a table of ALLC file paths and then using the 'allcools generate-dataset' command with specified region sizes and quantifiers.

```bash
# Create allc file path table (column 1: cell name, column 2: file path)
ls /results/step3/allcools/*_allc.gz | \
  awk '{n=split($0, p, "/"); f=p[n]; gsub(/_allc\.gz$/, "", f); print f "\t" $0}' \
  > allc_file_path.txt

# Generate multi-scale datasets
allcools generate-dataset \
  --allc_table allc_file_path.txt \
  --output_path /results/step3/SAMPLE001.mcds \
  --chrom_size_path /ref/bed/chr_nochrM.bed \
  --obs_dim cell \
  --cpu 32 \
  --regions chrom10k 10000 \
  --regions chrom20k 20000 \
  --regions chrom50k 50000 \
  --regions chrom100k 100000 \
  --regions chrom500k 500000 \
  --regions chrom1M 1000000 \
  --quantifiers chrom20k count CGN \
  --quantifiers chrom20k hypo-score CGN cutoff=0.9 \
  --quantifiers chrom100k count CGN \
  --quantifiers chrom100k hypo-score CGN cutoff=0.9
```

--------------------------------

### Project Logo Link

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/dependence/seeksoultools/fast/template/base.html

An HTML anchor tag that displays the project logo, embedded as a base64-encoded PNG. The link target is `#`, so clicking the logo stays on the current report page.
```html
[![](data:image/png;base64,{{ logobase64 }})](#)
```

--------------------------------

### Inject Data for Bundle Consumption (JavaScript)

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/nf/bin/utils/report_rna_met/base.html

Injects parsed JSON data into the window object for consumption by the application's JavaScript bundle. It relies on server-side templating to insert the data. Ensure the 'websummary_json_data' variable is correctly populated and escaped.

```javascript
window.data = JSON.parse('{{ websummary_json_data | safe }}')
```

--------------------------------

### Generate Methylation Summary Report

Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt

Generates a comprehensive methylation quality-control summary, including CpG coverage, conversion rates, and per-cell statistics. The script requires an output directory, a sample name, and paths to the summary and genome information JSON files.

```bash
# Generate methylation summary report
python nf/bin/step4_wgs_summary.py \
  --outdir /results/SAMPLE001_methy \
  --samplename SAMPLE001 \
  --summary_json /results/SAMPLE001_methy/SAMPLE001_summary.json \
  --genome_info_json /results/genome_cpg_sites.json
```

--------------------------------

### Language Selection Buttons

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/dependence/seeksoultools/fast/template/base.html

Provides buttons or links for selecting the display language, specifically English and Chinese. This suggests the application supports multiple languages.

```html
English Chinese
```

--------------------------------

### Nextflow Configuration for Local Execution

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/README.md

Configures Nextflow to run on a local machine. It specifies the executor, work directory, resources (CPU, memory), and enables Conda for environment management. This is suitable for testing and development on a single machine.
```groovy
local {
  process.executor = 'local'
  workDir = '/path/to/work'
  process.cpus = 8
  process.memory = '32 GB'
  conda.enabled = true
  // Or containers: singularity.enabled = true / docker.enabled = true
}
```

--------------------------------

### Include JavaScript Template (Jinja)

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/nf/bin/utils/report_rna_met/base.html

Includes external JavaScript content from a template file named 'template.js'. This is a common pattern for organizing and reusing JavaScript code within a project. The exact content of 'template.js' is not provided here.

```jinja
{% include 'template.js' %}
```

--------------------------------

### RNA Alignment and Quantification (Shell Command)

Source: https://github.com/seekgene/seeksoulmethyl/wiki/Tutorial

Performs RNA sequencing alignment and quantification using seeksoultools. It maps reads, counts transcripts, filters, clusters, and identifies differentially expressed genes.

```shell
# SEEKSOULTOOLS_RNA (script/SeekSoulMethyl/nf/modules/step1.nf:156)
# Input: cleaned expression R1/R2 lists
# Action: seeksoultools rna run (STAR mapping, counting, filtering, clustering, DE)
# Output: Analysis/step3/filtered_feature_bc_matrix/ etc.
```

--------------------------------

### Extract Methylation Barcodes and Process Reads with Python Script

Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt

Processes methylation FASTQ files to extract cell barcodes and UMIs, determine strand direction, calculate C-T conversion rates, and filter reads based on CH methylation patterns. Requires input FASTQ files, sample name, output directory, barcode whitelist, chemistry type, and core count.

```bash
python nf/bin/barcode_cs_multi.py \
  --fq1 /data/methy_R1_clean.fq.gz \
  --fq2 /data/methy_R2_clean.fq.gz \
  --samplename SAMPLE001 \
  --outdir /results/SAMPLE001_methy \
  --barcode nf/bin/barcodes/U3CB_methylation.txt \
  --chemistry DD-MET3 \
  --split_fastq 4 \
  --filter_ch 2 \
  --core 16
```

--------------------------------

### Run SeekSoulTools RNA Analysis

Source: https://context7.com/seekgene/seeksoulmethyl/llms.txt

Runs transcriptome analysis using SeekSoulTools to generate cell barcodes for methylation cell identification. This command requires paired-end FASTQ files, sample name, reference genome directory, and GTF file.

```bash
# Run SeekSoulTools RNA analysis
seeksoultools rna run \
  --fq1 /data/exp_R1_clean.fq.gz \
  --fq2 /data/exp_R2_clean.fq.gz \
  --samplename SAMPLE001 \
  --genomeDir /ref/human-reference-GRCh38/star \
  --gtf /ref/human-reference-GRCh38/genes/genes.gtf \
  --chemistry DDV2 \
  --core 32 \
  --include-introns \
  --outdir /results/SAMPLE001_exp
```

--------------------------------

### ALLCools: Generate Methylation Dataset (MCDS)

Source: https://github.com/seekgene/seeksoulmethyl/blob/nf_rna_methy/README.md

Generates per-cell methylation matrices (MCDS) by binning the genome into specified sizes (e.g., chrom10k, 20k, 50k, 100k, 500k, 1M, geneslop2k). Geneslop2k bins are defined as 2k bp flanking each gene.

```bash
allcools generate-dataset --input per_cell_allc_dir --output dataset.h5 --binning methy --genome_file /path/to/genome.bed --chrom_size /path/to/chrom.sizes --bin_size 20000
```
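
To make the fixed-size binning concrete, the following is a minimal sketch of how a chromosome-size table can be cut into equal-width bins such as the 20 kb windows used above. It is not how ALLCools implements binning internally. The helper name `make_bins`, the two-column `chrom<TAB>length` input layout, and the toy chromosome lengths are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# make_bins SIZE_FILE BIN_SIZE
# Hypothetical helper (not part of ALLCools or SeekSoulMethyl).
# Reads a tab-separated "chrom<TAB>length" table and prints BED-style
# bins (chrom, start, end) using 0-based, half-open coordinates.
# The last bin of each chromosome is truncated at the chromosome end.
make_bins() {
    local size_file=$1 bin_size=$2
    awk -v OFS='\t' -v bin="$bin_size" '
        {
            for (start = 0; start < $2; start += bin) {
                end = start + bin
                if (end > $2) end = $2
                print $1, start, end
            }
        }' "$size_file"
}

# Toy chromosome sizes (assumed values, for demonstration only)
sizes=$(mktemp)
printf 'chr1\t45000\nchr2\t20000\n' > "$sizes"

# Emit 20 kb bins for the toy genome
make_bins "$sizes" 20000

rm -f "$sizes"
```

With the toy input, `chr1` yields two full 20 kb bins plus one truncated 5 kb bin, and `chr2` yields a single bin. The same helper could be pointed at a real chromosome-length file, provided it follows the two-column layout assumed here.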