### Install pytximport with Pip

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Install pytximport using pip, a common Python package installer.

```bash
pip install pytximport
```

--------------------------------

### Install pyarrow with Pip

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Install pyarrow using pip, as an alternative to the conda-forge package. pyarrow enables faster import of quantification files.

```bash
pip install pyarrow
```

--------------------------------

### Install pytximport with Mamba

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Install pytximport from the Bioconda channel using mamba. This is the recommended installation method.

```bash
mamba install -c bioconda pytximport
```

--------------------------------

### Install pytximport from Source

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/installation.md

For developers: clone the repository, set a local Python version, create a virtual environment, and install the development dependencies. This method includes additional dependencies required for development.

```bash
git clone --depth 1 -b dev https://github.com/complextissue/pytximport.git
cd pytximport
pyenv local 3.12
make create-venv
source .venv/bin/activate
make install-dev
```

--------------------------------

### Set Up pytximport for Development

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Follow these steps to clone the repository, set up a virtual environment, and install development dependencies. This is recommended for contributing to the project.
```bash
git clone --depth 1 -b dev https://github.com/complextissue/pytximport.git
cd pytximport
uv venv --python 3.13
source .venv/bin/activate
make install-dev
```

--------------------------------

### Install pytximport via GitHub

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/installation.md

Install the latest development version of pytximport directly from its GitHub repository using pip.

```bash
python3 -m pip install git+https://github.com/complextissue/pytximport.git
```

--------------------------------

### Install pytximport via PyPI

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/installation.md

Install pytximport and pyarrow using pip. This method is suitable for users who prefer using pip for package management.

```bash
python3 -m pip install pytximport pyarrow
```

--------------------------------

### Build Documentation Locally

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Build the project's documentation locally. This requires the development requirements and the package to be installed in the same virtual environment, along with `pandoc`.

```bash
make html
```

--------------------------------

### Install pytximport via Bioconda

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/installation.md

Use mamba or conda to install pytximport from the Bioconda channel and pyarrow-core from the conda-forge channel. This is the recommended installation method.

```bash
mamba install -c bioconda pytximport
mamba install -c conda-forge pyarrow-core
```

--------------------------------

### Install pytximport with Mamba

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Install pytximport from the Bioconda channel for streamlined dependency management.
```bash
mamba install -c bioconda pytximport
```

--------------------------------

### Install R package 'matrixStats'

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Installs the 'matrixStats' R package. This command prompts the user to select a CRAN mirror for the download.

```R
install.packages("matrixStats")
```

--------------------------------

### Install rpy2

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Installs the rpy2 package, which allows Python to interface with R. Ensure the version matches requirements.

```python
!pip install rpy2==3.4.5 -q
```

--------------------------------

### Install pyarrow with Mamba

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Install pyarrow-core from the conda-forge channel using mamba. This is recommended for faster import of tab-separated value-based quantification files.

```bash
mamba install -c conda-forge pyarrow-core
```

--------------------------------

### Import necessary libraries for PyDESeq2

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Import the required libraries for using PyDESeq2 and decoupler. Ensure these are installed in your environment.

```python
import decoupler as dc
from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats
```

--------------------------------

### Import data with inferential replicates

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

This Python snippet shows how to use the `tximport` function to import quantification data, including handling inferential replicates. Use this when your data includes bootstrap replicates and you need to process them, for example, by calculating the median.
```python
result = tximport(
    [
        "../../test/data/fabry_disease/SRR16504309_wt/",
        "../../test/data/fabry_disease/SRR16504310_wt/",
        "../../test/data/fabry_disease/SRR16504311_ko/",
        "../../test/data/fabry_disease/SRR16504312_ko/",
    ],
    "salmon",
    transcript_gene_map_human,
    inferential_replicates=True,
    inferential_replicate_variance=True,  # whether to calculate the variance of the inferential replicates
    inferential_replicate_transformer=lambda x: np.median(x, axis=1),
    counts_from_abundance="length_scaled_tpm",
)
result
```

--------------------------------

### Get R version string

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Executes R code to retrieve and display the R version string. This confirms R is accessible via rpy2.

```R
R.version.string
```

--------------------------------

### Check rpy2 and R environment

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Verifies the installed rpy2 version, Python version, and R environment details. This helps in diagnosing compatibility issues.

```python
!python3 -m rpy2.situation
```

--------------------------------

### R: Import Salmon Quantifications with tximport

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This R script uses tximport to import Salmon quantification files. It iterates through different abundance options and saves the resulting transcript counts to CSV files. Ensure the 'readr' and 'tximport' packages are installed.
```R
%%R
dir <- "./data/salmon"
files_protein_coding <- c(
    file.path(dir, "quant.sf")
)
tx2gene <- read_tsv(file.path("./data/fabry_disease", "transcript_gene_mapping_human.tsv"))

countsFromAbundanceOptions <- c("scaledTPM", "dtuScaledTPM")

for (idx in seq_along(countsFromAbundanceOptions)) {
    txi <- tximport(
        files_protein_coding,
        type = "salmon",
        tx2gene = tx2gene,
        txOut = TRUE,
        countsFromAbundance = countsFromAbundanceOptions[idx],
        ignoreTxVersion = TRUE,
        ignoreAfterBar = TRUE
    )

    writePath <- file.path(dir, "counts_tximport.csv")

    if (!is.null(countsFromAbundanceOptions[idx])) {
        writePath <- gsub(".csv", paste0("_", countsFromAbundanceOptions[idx], ".csv"), writePath)
    }

    write.csv(txi$counts, writePath)
}
```

--------------------------------

### Explore pytximport CLI Options

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Use this command to view all available options for the pytximport command-line interface. This is useful for understanding the full range of functionalities accessible directly from the terminal.

```bash
pytximport --help
```

--------------------------------

### Run pytximport from the command line

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

This snippet shows how to run pytximport from the command line to import salmon quantification data into an h5ad file. Use this for direct command-line processing of quantification files.

```bash
!pytximport -i ../../test/data/salmon/quant.sf -t "salmon" -m ../../test/data/gencode.v46.metadata.HGNC.tsv -of "h5ad" -ow -o ../../test/data/salmon/quant.h5ad
```

--------------------------------

### Python: Import Salmon Quantifications with pytximport

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This command-line execution uses pytximport to process Salmon quantification data. It specifies input files, output path, and the desired abundance scaling method ('dtu_scaled_tpm').
```bash
!pytximport -i ./data/salmon/quant.sf -m ./data/fabry_disease/transcript_gene_mapping_human.tsv -ow -o ./data/salmon/counts_pytximport_dtuScaledTPM.csv -t salmon -tx -c dtu_scaled_tpm
```

--------------------------------

### Run pytximport from Command Line

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Execute pytximport directly from the command line to process quantification files and generate output counts. Specify input files, data type, transcript-to-gene map, and output path.

```bash
pytximport -i ./sample_1.sf -i ./sample_2.sf -t salmon -m ./tx2gene_map.tsv -o ./output_counts.csv
```

--------------------------------

### Run Unit Tests Locally

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Execute the unit tests for the project locally. Ensure your development environment is set up according to the contributing guidelines.

```bash
make coverage-report
```

--------------------------------

### Add pytximport with Uv

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Add pytximport to your project dependencies using the uv package manager.

```bash
uv add pytximport
```

--------------------------------

### Initialize DeseqDataSet

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Create a DeseqDataSet object, which is the primary object for running DESeq2 analysis. Specify the design formula and inference method.

```python
dds = DeseqDataSet(
    adata=result,
    design="~condition",
    refit_cooks=True,
    inference=DefaultInference(n_cpus=8),
    quiet=True,
)
```

--------------------------------

### Export Data as SummarizedExperiment Object

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Export quantification data as a SummarizedExperiment object by setting `output_type` to 'summarizedexperiment'. This is an experimental feature useful for interoperability with R packages.
```python
txi_se = tximport(
    ["../../test/data/salmon/quant.sf"],
    "salmon",
    transcript_gene_map_human,
    output_type="summarizedexperiment",
    # the output can optionally be saved to disk by uncommenting the following lines
    # output_format="summarizedexperiment",
    # output_path="txi_se",
)
txi_se.assay_names, txi_se.get_row_names(), txi_se.get_column_names()
```

--------------------------------

### Import Salmon quantification files

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use the tximport function to import quantification files from Salmon. Specify the paths to the files, the quantification tool, a transcript-to-gene mapping, and desired output settings. This function can output data as xarray or anndata objects.

```python
txi = tximport(
    [
        "../../test/data/salmon/multiple/Sample_1.sf",
        "../../test/data/salmon/multiple/Sample_2.sf",
    ],
    "salmon",
    transcript_gene_map_mouse,
    counts_from_abundance="length_scaled_tpm",
    output_type="xarray",  # or "anndata"
)
txi
```

--------------------------------

### Import necessary libraries

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Import numpy, pandas, and the tximport function from the pytximport library. These are standard imports for data manipulation and using the library's core functionality.

```python
import numpy as np
import pandas as pd
from pytximport import tximport
```

--------------------------------

### Import and Use tximport in Python

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Import the tximport function and utility functions to process quantification data within a Python script. Ensure file_paths and transcript_gene_map are defined.
```python
from pytximport import tximport
from pytximport.utils import create_transcript_gene_map

transcript_gene_map = create_transcript_gene_map(species="human")

results = tximport(
    file_paths,
    data_type="salmon",
    transcript_gene_map=transcript_gene_map,
)
```

--------------------------------

### Comparing tximport and pytximport outputs

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This code reads CSV files generated by both tximport (R) and pytximport (Python) for different quantification methods. It then asserts that the dataframes are equal, ensuring consistency between the two implementations.

```python
import pandas as pd

counts_tximport_no = pd.read_csv("./data/fabry_disease/counts_tximport_no.csv")
counts_tximport_scaledTPM = pd.read_csv("./data/fabry_disease/counts_tximport_scaledTPM.csv")
counts_tximport_lengthScaledTPM = pd.read_csv("./data/fabry_disease/counts_tximport_lengthScaledTPM.csv")

counts_pytximport_no = pd.read_csv("./data/fabry_disease/counts_pytximport_no.csv")
counts_pytximport_scaledTPM = pd.read_csv("./data/fabry_disease/counts_pytximport_scaledTPM.csv")
counts_pytximport_lengthScaledTPM = pd.read_csv("./data/fabry_disease/counts_pytximport_lengthScaledTPM.csv")

counts_pytximport_no.columns = counts_tximport_no.columns
counts_pytximport_scaledTPM.columns = counts_tximport_scaledTPM.columns
counts_pytximport_lengthScaledTPM.columns = counts_tximport_lengthScaledTPM.columns

pd.testing.assert_frame_equal(counts_tximport_no, counts_pytximport_no)
pd.testing.assert_frame_equal(counts_tximport_scaledTPM, counts_pytximport_scaledTPM)
pd.testing.assert_frame_equal(counts_tximport_lengthScaledTPM, counts_pytximport_lengthScaledTPM)
```

--------------------------------

### Export Data as AnnData Object

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Export quantification data directly as an AnnData object by setting `output_type` to 'anndata'.
This facilitates integration with the scverse ecosystem.

```python
txi_ad = tximport(
    ["../../test/data/salmon/quant.sf"],
    "salmon",
    transcript_gene_map_human,
    output_type="anndata",
    # the output can optionally be saved to a file by uncommenting the following lines
    # output_format="h5ad",
    # output_path="txi_ad.h5ad",
)
txi_ad
```

--------------------------------

### Python: Compare tximport and pytximport Outputs

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This Python script reads and compares the CSV outputs generated by R's tximport and Python's pytximport. It ensures that the dataframes are identical after sorting and removing transcript versions from the index. Requires pandas and pandas.testing.

```python
counts_tximport_dtuScaledTPM = pd.read_csv("./data/salmon/counts_tximport_dtuScaledTPM.csv", index_col=0).sort_index()
counts_pytximport_dtuScaledTPM = pd.read_csv(
    "./data/salmon/counts_pytximport_dtuScaledTPM.csv", index_col=0
).sort_index()

# cut the transcript version from the index
counts_tximport_dtuScaledTPM.index = counts_tximport_dtuScaledTPM.index.str.split(".").str[0]
counts_pytximport_dtuScaledTPM.columns = counts_tximport_dtuScaledTPM.columns

pd.testing.assert_frame_equal(counts_tximport_dtuScaledTPM, counts_pytximport_dtuScaledTPM)
```

--------------------------------

### Prepare AnnData object for PyDESeq2

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Round count estimates to integers and add metadata, such as experimental conditions, to the AnnData object. This step is crucial for PyDESeq2 compatibility.
```python
result.X = result.X.round().astype(int)
result.obs["condition"] = ["Control", "Control", "Disease", "Disease"]
result
```

--------------------------------

### Load rpy2 IPython extension

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Loads the rpy2 extension for IPython, enabling the use of R magic commands within Jupyter notebooks or IPython environments.

```python
%load_ext rpy2.ipython
```

--------------------------------

### Create Transcript-to-Gene Map from GTF

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Generate a transcript-to-gene mapping file from a GTF annotation file using the command line interface. Specify input GTF path and output CSV path.

```bash
pytximport create-map -i ./data/annotation.gtf -o tx2gene.csv -ow
```

--------------------------------

### pytximport Generate Counts (Scaled TPM)

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This command uses pytximport to generate gene-level counts from Salmon quantification files, applying scaled TPM normalization. It requires input files, a transcript-to-gene mapping, and specifies the output file.

```bash
!pytximport -i ./data/fabry_disease/SRR16504309_wt/quant.sf -i ./data/fabry_disease/SRR16504310_wt/quant.sf -i ./data/fabry_disease/SRR16504311_ko/quant.sf -i ./data/fabry_disease/SRR16504312_ko/quant.sf -t salmon -m ./data/fabry_disease/transcript_gene_mapping_human.tsv -ow -o ./data/fabry_disease/counts_pytximport_scaledTPM.csv -c scaled_tpm
```

--------------------------------

### pytximport Generate Counts (Length Scaled TPM)

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This command uses pytximport to generate gene-level counts from Salmon quantification files, applying length-scaled TPM normalization. It requires input files, a transcript-to-gene mapping, and specifies the output file.
```bash
!pytximport -i ./data/fabry_disease/SRR16504309_wt/quant.sf -i ./data/fabry_disease/SRR16504310_wt/quant.sf -i ./data/fabry_disease/SRR16504311_ko/quant.sf -i ./data/fabry_disease/SRR16504312_ko/quant.sf -t salmon -m ./data/fabry_disease/transcript_gene_mapping_human.tsv -ow -o ./data/fabry_disease/counts_pytximport_lengthScaledTPM.csv -c length_scaled_tpm
```

--------------------------------

### Export Transcript Counts with pytximport

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use `tximport` to export transformed transcript counts. Set `counts_from_abundance` to 'scaled_tpm' and `return_transcript_data` to True for transcript-level analysis.

```python
txi = tximport(
    ["../../test/data/salmon/quant.sf"],
    "salmon",
    counts_from_abundance="scaled_tpm",
    return_transcript_data=True,
)
txi
```

--------------------------------

### pytximport Generate Counts (No Scaling)

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This command uses pytximport to generate gene-level counts from Salmon quantification files without any scaling. It requires input files, a transcript-to-gene mapping, and specifies the output file.

```bash
!pytximport -i ./data/fabry_disease/SRR16504309_wt/quant.sf -i ./data/fabry_disease/SRR16504310_wt/quant.sf -i ./data/fabry_disease/SRR16504311_ko/quant.sf -i ./data/fabry_disease/SRR16504312_ko/quant.sf -t salmon -m ./data/fabry_disease/transcript_gene_mapping_human.tsv -ow -o ./data/fabry_disease/counts_pytximport_no.csv
```

--------------------------------

### Create Mouse Transcript-to-Gene Map (Gene ID)

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generates a transcript-to-gene map for mouse species, mapping transcript IDs to Ensembl gene IDs by default. This function may take a few seconds on the first run to download data.
```python
transcript_gene_map_mouse = create_transcript_gene_map(species="mouse")
transcript_gene_map_mouse.head(5)
```

--------------------------------

### Create Transcript-to-Gene Map

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Create a map between transcript IDs and transcript names using `create_transcript_gene_map`. This is useful for replacing transcript IDs with more readable names in your analysis.

```python
from pytximport.utils import replace_transcript_ids_with_names
```

```python
transcript_name_map_human = create_transcript_gene_map("human", target_field="external_transcript_name")
transcript_name_map_human.head(5)
```

--------------------------------

### Create Transcript-to-Gene Map from GTF Annotation

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generates a transcript-to-gene map from a GTF annotation file, mapping transcript IDs to gene names. Using an annotation file is recommended for consistency with alignment references.

```python
from pytximport.utils import create_transcript_gene_map_from_annotation

transcript_gene_map_from_gtf = create_transcript_gene_map_from_annotation(
    "../../test/data/annotation.gtf",
    target_field="gene_name",
)
transcript_gene_map_from_gtf.head(5)
```

--------------------------------

### Create transcript-to-gene map with biotype information

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

This Python snippet demonstrates how to create a transcript-to-gene mapping from a GTF file, specifically extracting gene name and gene biotype. This is a prerequisite for filtering results by gene biotype.
```python
transcript_gene_map_from_gtf_with_biotype = create_transcript_gene_map_from_annotation(
    "../../test/data/annotation.gtf",
    target_field=["gene_name", "gene_biotype"],
)
transcript_gene_map_from_gtf_with_biotype.head(5)
```

--------------------------------

### Import and filter by gene biotype

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

This Python snippet shows how to import quantification data and then filter the results to include only 'protein_coding' genes. Use this when you need to focus your analysis on specific gene types.

```python
from pytximport.utils import filter_by_biotype

result_small = tximport(
    ["../../test/data/fabry_disease/SRR16504309_wt/"],
    "salmon",
    transcript_gene_map_from_gtf_with_biotype,
    counts_from_abundance="length_scaled_tpm",
)
result_small_filtered = filter_by_biotype(
    result_small,
    transcript_gene_map_from_gtf_with_biotype,
    biotype_filter=["protein_coding"],
    # Since the data is already at the gene level, we have to use the gene_id from the transcript_gene_map
    id_column="gene_id",
)
len(result_small.var_names), len(result_small_filtered.var_names)
```

--------------------------------

### Run DESeq2 analysis

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Execute the DESeq2 differential expression analysis. This function computes normalized counts and performs statistical tests.

```python
dds.deseq2()
```

--------------------------------

### Import RSEM data with tximport

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Imports gene expression counts from RSEM files. It iterates through different `countsFromAbundance` options to generate count matrices, saving each to a CSV file. Ensure the RSEM results files and the `transcript_gene_mapping_human.tsv` are in the specified directories.
```R
%%R
dir <- "./data/rsem"
files_protein_coding <- c(
    file.path(dir, "test.genes.results.gz")
)
tx2gene <- read_tsv(file.path("./data/fabry_disease", "transcript_gene_mapping_human.tsv"))

countsFromAbundanceOptions <- c("no")

for (idx in seq_along(countsFromAbundanceOptions)) {
    txi <- tximport(
        files_protein_coding,
        type = "rsem",
        tx2gene = tx2gene,
        txIn = FALSE,
        countsFromAbundance = countsFromAbundanceOptions[idx],
        ignoreTxVersion = TRUE,
        ignoreAfterBar = TRUE
    )

    writePath <- file.path(dir, "counts_tximport.csv")

    if (!is.null(countsFromAbundanceOptions[idx])) {
        writePath <- gsub(".csv", paste0("_", countsFromAbundanceOptions[idx], ".csv"), writePath)
    }

    write.csv(txi$counts, writePath)
}
```

--------------------------------

### R tximport Output Summary

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This output shows the summary of data read by R's tximport, including column specifications and messages about transcripts missing from the tx2gene mapping.

```R
Output:
Rows: 244191 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): transcript_id, gene_id

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```R
Output:
R[write to console]: reading in files with read_tsv
R[write to console]: 1
R[write to console]: 2
R[write to console]: 3
R[write to console]: 4
R[write to console]:
R[write to console]: transcripts missing from tx2gene: 31380
R[write to console]: summarizing abundance
R[write to console]: summarizing counts
R[write to console]: summarizing length
R[write to console]: reading in files with read_tsv
R[write to console]: 1
R[write to console]: 2
R[write to console]: 3
R[write to console]: 4
R[write to console]:
R[write to console]: transcripts missing from tx2gene: 31380
R[write to console]: summarizing abundance
R[write to console]: summarizing counts
R[write to console]: summarizing length
R[write to console]: reading in files with read_tsv
R[write to console]: 1
R[write to console]: 2
R[write to console]: 3
R[write to console]: 4
R[write to console]:
R[write to console]: transcripts missing from tx2gene: 31380
R[write to console]: summarizing abundance
R[write to console]: summarizing counts
R[write to console]: summarizing length
R[write to console]: 1
R[write to console]: 2
R[write to console]: 3
R[write to console]: 4
R[write to console]:
R[write to console]: transcripts missing from tx2gene: 31498
R[write to console]: summarizing abundance
R[write to console]: summarizing counts
R[write to console]: summarizing length
R[write to console]: summarizing inferential replicates
R[write to console]: reading in files with read_tsv
R[write to console]: 1
R[write to console]: 2
R[write to console]: 3
R[write to console]: 4
R[write to console]:
R[write to console]: transcripts missing from tx2gene: 31380
R[write to console]: summarizing abundance
R[write to console]: summarizing counts
R[write to console]: summarizing length
R[write to console]: summarizing inferential replicates
R[write to console]: 1
R[write to console]: 2
R[write to console]: 3
R[write to console]: 4
R[write to console]:
R[write to console]: reading in files with read_tsv
R[write to console]: 1
R[write to console]: 2
R[write to console]: 3
R[write to console]: 4
R[write to console]:
```

--------------------------------

### Create Human Transcript-to-Gene Map (Gene Name and Biotype)

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generates a transcript-to-gene map for human species, mapping transcript IDs to both external gene names and gene biotypes. The resulting map is then filtered to include only 'lncRNA' biotypes.

```python
transcript_gene_map_human_biotype = create_transcript_gene_map(
    species="human",
    target_field=["external_gene_name", "gene_biotype"],
)
transcript_gene_map_human_biotype = transcript_gene_map_human_biotype[
    transcript_gene_map_human_biotype["gene_biotype"] == "lncRNA"
]
transcript_gene_map_human_biotype.head(5)
```

--------------------------------

### Create Human Transcript-to-Gene Map (Gene Name)

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generates a transcript-to-gene map for human species, mapping transcript IDs to external gene names. This function may take a few seconds on the first run to download data.

```python
from pytximport.utils import create_transcript_gene_map

transcript_gene_map_human = create_transcript_gene_map(
    species="human",
    target_field="external_gene_name",
)
transcript_gene_map_human.head(5)
```

--------------------------------

### Perform statistical analysis and shrinkage

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generate statistical results for differential expression, including log2 fold changes and adjusted p-values. LFC shrinkage is applied for more reliable estimates.
```python
stat_result = DeseqStats(dds, contrast=["condition", "Disease", "Control"], quiet=True)
stat_result.summary()
stat_result.lfc_shrink(coeff="condition[T.Disease]")
```

--------------------------------

### Infer pathway activities

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Utilize the Progeny network to infer pathway activities from the differential expression data. This provides insights into affected biological pathways. The `data` variable is the single-row frame built from `stat_result.results_df` in the transcription factor activity step.

```python
progeny = dc.op.progeny(organism="human", top=500)
pathway_acts, pathway_pvals = dc.mt.ulm(data=data, net=progeny, tmin=5)

dc.pl.barplot(
    data=pathway_acts,
    name="disease.vs.control",
    top=40,
    vertical=True,
    figsize=(5, 3),
)
```

--------------------------------

### Summarize Transcript Counts to Genes

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Summarize transcript counts to the gene level by providing a transcript-gene map to `tximport` and setting `return_transcript_data` to False. Setting `output_type` to 'xarray' returns the gene-level counts as an xarray object.

```python
txi = tximport(
    ["../../test/data/salmon/quant.sf"],
    "salmon",
    transcript_gene_map_human,
    counts_from_abundance="scaled_tpm",
    output_type="xarray",
    return_transcript_data=False,
)
pd.DataFrame(txi["counts"], index=txi.coords["gene_id"], columns=txi.coords["file_path"]).sort_values(
    by=txi.coords["file_path"].data[0],
    ascending=False,
).head(5)
```

--------------------------------

### Visualize differential expression with volcano plot

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Create a volcano plot to visualize the results of the differential expression analysis. This plot helps identify significantly up- and down-regulated genes.
```python
dc.pl.volcano(
    stat_result.results_df,
    x="log2FoldChange",
    y="padj",
    thr_sign=0.01,
    top=20,
    figsize=(10, 5),
)
```

--------------------------------

### Infer transcription factor activities

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use the Collectri network to infer transcription factor activities from the differential expression data. This helps understand regulatory mechanisms.

```python
collectri = dc.op.collectri(organism="human", remove_complexes=False)

data = stat_result.results_df[["stat"]].T.rename(index={"stat": "disease.vs.control"})
tf_acts, tf_pvals = dc.mt.ulm(data=data, net=collectri, tmin=5)

dc.pl.barplot(data=tf_acts, name="disease.vs.control", top=10, vertical=True, figsize=(5, 3))
```

--------------------------------

### Map Transcript Names to Gene Names

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Creates a map from transcript names to gene names for human species, using 'external_transcript_name' as the source field and 'external_gene_name' as the target field. This is useful when transcript identifiers are not standard Ensembl transcript IDs.

```python
transcript_name_gene_map_human = create_transcript_gene_map(
    "human",
    source_field="external_transcript_name",
    target_field="external_gene_name",
)
transcript_name_gene_map_human.head(5)
```

--------------------------------

### Replace Transcript IDs with Names

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use `replace_transcript_ids_with_names` to substitute transcript IDs with their corresponding names from a provided map. This function modifies the transcript data in place.
```python
txi = replace_transcript_ids_with_names(txi, transcript_name_map_human)
pd.DataFrame(txi.X.T, index=txi.var.index, columns=txi.obs.index).sort_values(
    by=txi.obs.index[0],
    ascending=False,
).head(5)
```

--------------------------------

### Filter genes with low counts

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Remove genes with very low expression counts across all samples to improve the accuracy of differential expression analysis. Genes with a maximum count of 10 or less are filtered out.

```python
result = result[:, result.X.max(axis=0) > 10].copy()
result
```
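
To make the low-count filter easier to reason about, here is a minimal sketch of the same masking logic on a plain NumPy array instead of an AnnData object. The count matrix and the gene names are hypothetical toy values, not taken from the Fabry disease data set. Rows are samples and columns are genes, mirroring the layout of `result.X`.

```python
import numpy as np

# Hypothetical toy count matrix: 4 samples (rows) x 4 genes (columns).
counts = np.array(
    [
        [120, 3, 0, 45],
        [98, 10, 1, 60],
        [130, 7, 0, 11],
        [110, 2, 2, 9],
    ]
)
gene_names = np.array(["GENE_A", "GENE_B", "GENE_C", "GENE_D"])  # illustrative names

# Per-gene maximum across all samples (axis=0 collapses the sample axis).
max_per_gene = counts.max(axis=0)

# Keep only genes whose maximum count is strictly greater than 10.
keep = max_per_gene > 10

filtered_counts = counts[:, keep]
kept_genes = gene_names[keep]

print(kept_genes)
print(filtered_counts.shape)
```

Because the comparison is strict, a gene whose highest count is exactly 10 (`GENE_B` in this toy matrix) is dropped along with genes that never reach 10.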