### Install pytximport with Pip

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Install pytximport using pip, a common Python package installer.

```bash
pip install pytximport
```

--------------------------------

### Install pyarrow with Pip

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Install pyarrow using pip. pyarrow is an optional dependency that speeds up the import of quantification files.

```bash
pip install pyarrow
```

--------------------------------

### Install pytximport with Mamba

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Install pytximport from the Bioconda channel using mamba. This is the recommended installation method.

```bash
mamba install -c bioconda pytximport
```

--------------------------------

### Install pytximport from Source

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/installation.md

For developers, clone the repository, set up a local Python version, create a virtual environment, and install development dependencies. This method includes additional dependencies required for development.

```bash
git clone --depth 1 -b dev https://github.com/complextissue/pytximport.git
cd pytximport
pyenv local 3.12
make create-venv
source .venv/bin/activate
make install-dev
```

--------------------------------

### Set Up pytximport for Development

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Follow these steps to clone the repository, set up a virtual environment, and install development dependencies. This is recommended for contributing to the project.

```bash
git clone --depth 1 -b dev https://github.com/complextissue/pytximport.git
cd pytximport
uv venv --python 3.13
source .venv/bin/activate
make install-dev
```

--------------------------------

### Install pytximport via GitHub

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/installation.md

Install the latest development version of pytximport directly from its GitHub repository using pip.

```bash
python3 -m pip install git+https://github.com/complextissue/pytximport.git
```

--------------------------------

### Install pytximport via PyPI

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/installation.md

Install pytximport and pyarrow using pip. This method is suitable for users who prefer using pip for package management.

```bash
python3 -m pip install pytximport pyarrow
```

--------------------------------

### Build Documentation Locally

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Build the project's documentation locally. This requires the development requirements and the package to be installed in the same virtual environment, along with `pandoc`.

```bash
make html
```

--------------------------------

### Install pytximport via Bioconda

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/installation.md

Use mamba or conda to install pytximport from the Bioconda channel and pyarrow-core from conda-forge. This is the recommended installation method.

```bash
mamba install -c bioconda pytximport
mamba install -c conda-forge pyarrow-core
```

--------------------------------

### Install pytximport with Mamba

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Install pytximport from the Bioconda channel for streamlined dependency management.

```bash
mamba install -c bioconda pytximport
```

--------------------------------

### Install R package 'matrixStats'

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Installs the 'matrixStats' R package from CRAN. If no mirror is configured, an interactive R session prompts you to select a CRAN mirror for the download.

```R
install.packages("matrixStats")
```

--------------------------------

### Install rpy2

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Installs the rpy2 package, which allows Python to interface with R. The notebook pins version 3.4.5; make sure the version you install is compatible with your Python and R installations.

```python
!pip install rpy2==3.4.5 -q
```

--------------------------------

### Install pyarrow with Mamba

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Install pyarrow-core from the conda-forge channel using mamba. This is recommended for faster import of tab-separated quantification files.

```bash
mamba install -c conda-forge pyarrow-core
```

--------------------------------

### Import necessary libraries for PyDESeq2

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Import the required libraries for using PyDESeq2 and decoupler. Ensure these are installed in your environment.

```python
import decoupler as dc
from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats
```

--------------------------------

### Import data with inferential replicates

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

This Python snippet shows how to use the `tximport` function to import quantification data, including handling inferential replicates. Use this when your data includes bootstrap replicates and you need to process them, for example, by calculating the median.

```python
result = tximport(
    [
        "../../test/data/fabry_disease/SRR16504309_wt/",
        "../../test/data/fabry_disease/SRR16504310_wt/",
        "../../test/data/fabry_disease/SRR16504311_ko/",
        "../../test/data/fabry_disease/SRR16504312_ko/",
    ],
    "salmon",
    transcript_gene_map_human,
    inferential_replicates=True,
    inferential_replicate_variance=True,  # whether to calculate the variance of the inferential replicates
    inferential_replicate_transformer=lambda x: np.median(x, axis=1),
    counts_from_abundance="length_scaled_tpm",
)
result
```
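To make the role of `inferential_replicate_transformer` more concrete, here is a minimal pure-Python sketch of what a median reduction over bootstrap replicates amounts to. The transcript names and values are made up, and the layout (one list of replicate values per transcript) is an assumption for illustration, not pytximport's internal representation.

```python
from statistics import median

# Hypothetical bootstrap count estimates for one sample:
# each transcript maps to the values from its inferential replicates.
bootstrap_counts = {
    "tx1": [10.0, 12.0, 11.0, 30.0],
    "tx2": [0.0, 1.0, 0.0, 2.0],
}

# Collapse the replicates of each transcript to a single point estimate,
# analogous to np.median(x, axis=1) on a transcripts-by-replicates array.
point_estimates = {tx: median(values) for tx, values in bootstrap_counts.items()}

print(point_estimates)
```

The median is less sensitive than the mean to a single outlying replicate, such as the value 30.0 for `tx1` in this toy input.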

--------------------------------

### Get R version string

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Executes R code to retrieve and display the R version string. This confirms R is accessible via rpy2.

```R
R.version.string
```

--------------------------------

### Check rpy2 and R environment

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Verifies the installed rpy2 version, Python version, and R environment details. This helps in diagnosing compatibility issues.

```python
!python3 -m rpy2.situation
```

--------------------------------

### R: Import Salmon Quantifications with tximport

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

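This R script uses tximport to import Salmon quantification files. It iterates through different `countsFromAbundance` options and saves the resulting transcript counts to CSV files. The `%%R` cell magic requires the rpy2 IPython extension to be loaded, and the 'readr' and 'tximport' packages must be installed and attached in the R session, since `read_tsv` and `tximport` are called without a namespace prefix.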

```R
%%R
dir <- "./data/salmon"
files_protein_coding <- c(
  file.path(dir, "quant.sf")
)
tx2gene <- read_tsv(file.path("./data/fabry_disease", "transcript_gene_mapping_human.tsv"))
countsFromAbundanceOptions <- c("scaledTPM", "dtuScaledTPM")
for (idx in seq_along(countsFromAbundanceOptions)) {
    txi <- tximport(
        files_protein_coding,
        type = "salmon",
        tx2gene = tx2gene,
        txOut = TRUE,
        countsFromAbundance = countsFromAbundanceOptions[idx],
        ignoreTxVersion = TRUE,
        ignoreAfterBar = TRUE
    )
    writePath <- file.path(dir, "counts_tximport.csv")
    if (!is.null(countsFromAbundanceOptions[idx])) {
        writePath <- gsub(".csv", paste0("_", countsFromAbundanceOptions[idx], ".csv"), writePath)
    }
    write.csv(txi$counts, writePath)
}
```

--------------------------------

### Explore pytximport CLI Options

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Use this command to view all available options for the pytximport command-line interface. This is useful for understanding the full range of functionalities accessible directly from the terminal.

```bash
pytximport --help
```

--------------------------------

### Run pytximport from the command line

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

This snippet shows how to run pytximport from the command line to import Salmon quantification data and write it to an h5ad file. The leading `!` is the Jupyter shell escape used in the source notebook; omit it when running the command in a terminal.

```bash
!pytximport -i ../../test/data/salmon/quant.sf -t "salmon" -m ../../test/data/gencode.v46.metadata.HGNC.tsv -of "h5ad" -ow -o ../../test/data/salmon/quant.h5ad
```

--------------------------------

### Python: Import Salmon Quantifications with pytximport

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

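This command runs pytximport on Salmon quantification data. It specifies the input file, the transcript-to-gene map, the output path, and the desired abundance scaling method ('dtu_scaled_tpm'). The leading `!` is the Jupyter shell escape used in the source notebook; omit it in a terminal.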

```bash
!pytximport -i ./data/salmon/quant.sf -m ./data/fabry_disease/transcript_gene_mapping_human.tsv  -ow  -o ./data/salmon/counts_pytximport_dtuScaledTPM.csv -t salmon -tx -c dtu_scaled_tpm
```

--------------------------------

### Run pytximport from Command Line

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Execute pytximport directly from the command line to process quantification files and generate output counts. Specify input files, data type, transcript-to-gene map, and output path.

```bash
pytximport -i ./sample_1.sf -i ./sample_2.sf -t salmon -m ./tx2gene_map.tsv -o ./output_counts.csv
```
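When the same command has to be issued for many samples, it can help to assemble the argument list programmatically. The sketch below only builds and prints the command and uses only the flags shown above (`-i`, `-t`, `-m`, `-o`); the helper name `build_pytximport_command` is hypothetical and not part of pytximport.

```python
import shlex


def build_pytximport_command(inputs, data_type, mapping, output):
    """Assemble a pytximport argument list with one -i flag per input file."""
    command = ["pytximport"]
    for path in inputs:
        command += ["-i", path]
    command += ["-t", data_type, "-m", mapping, "-o", output]
    return command


command = build_pytximport_command(
    ["./sample_1.sf", "./sample_2.sf"],
    "salmon",
    "./tx2gene_map.tsv",
    "./output_counts.csv",
)
command_line = shlex.join(command)
print(command_line)
```

The resulting list can be passed to `subprocess.run(command, check=True)` once pytximport is installed and the input files exist.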

--------------------------------

### Run Unit Tests Locally

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Execute the unit tests for the project locally. Ensure your development environment is set up according to the contributing guidelines.

```bash
make coverage-report
```

--------------------------------

### Add pytximport with Uv

Source: https://github.com/complextissue/pytximport/blob/main/README.md

Add pytximport to your project dependencies using the uv package manager.

```bash
uv add pytximport
```

--------------------------------

### Initialize DeseqDataSet

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Create a DeseqDataSet object, the primary object for running a PyDESeq2 analysis, from the prepared AnnData object `result`. Specify the design formula and inference method.

```python
dds = DeseqDataSet(
    adata=result,
    design="~condition",
    refit_cooks=True,
    inference=DefaultInference(n_cpus=8),
    quiet=True,
)
```

--------------------------------

### Export Data as SummarizedExperiment Object

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Export quantification data as a SummarizedExperiment object by setting `output_type` to 'summarizedexperiment'. This is an experimental feature useful for interoperability with R packages.

```python
txi_se = tximport(
    ["../../test/data/salmon/quant.sf"],
    "salmon",
    transcript_gene_map_human,
    output_type="summarizedexperiment",
    # the output can optionally be saved to disk by uncommenting the following lines
    # output_format="summarizedexperiment",
    # output_path="txi_se",
)
txi_se.assay_names, txi_se.get_row_names(), txi_se.get_column_names()
```

--------------------------------

### Import Salmon quantification files

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use the tximport function to import quantification files from Salmon. Specify the paths to the files, the quantification tool, a transcript-to-gene mapping, and desired output settings. This function can output data as xarray or anndata objects.

```python
txi = tximport(
    [
        "../../test/data/salmon/multiple/Sample_1.sf",
        "../../test/data/salmon/multiple/Sample_2.sf",
    ],
    "salmon",
    transcript_gene_map_mouse,
    counts_from_abundance="length_scaled_tpm",
    output_type="xarray",  # or "anndata"
)
txi
```
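For orientation, a Salmon `quant.sf` file is a tab-separated table with the columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`. The following sketch parses such a table with the standard library only; the two rows are invented values, and `read_quant_sf` is a hypothetical helper, not a pytximport function.

```python
import csv
import io

# Hypothetical two-transcript excerpt in Salmon's quant.sf layout.
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "ENST00000456328.2\t1657\t1490.0\t2.5\t12.0\n"
    "ENST00000450305.2\t632\t465.0\t0.0\t0.0\n"
)


def read_quant_sf(handle):
    """Return {transcript_id: (effective_length, tpm, num_reads)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: (
            float(row["EffectiveLength"]),
            float(row["TPM"]),
            float(row["NumReads"]),
        )
        for row in reader
    }


quant = read_quant_sf(io.StringIO(QUANT_SF))
print(quant["ENST00000456328.2"])
```

In practice `tximport` reads these files itself; the sketch is only meant to show which columns carry the abundance, count, and length information.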

--------------------------------

### Import necessary libraries

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Import numpy, pandas, and the tximport function from the pytximport library. These are standard imports for data manipulation and using the library's core functionality.

```python
import numpy as np
import pandas as pd

from pytximport import tximport
```

--------------------------------

### Import and Use tximport in Python

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

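Import the tximport function and utility functions to process quantification data within a Python script. Ensure `file_paths`, a list of paths to your quantification files, is defined before calling `tximport`; `transcript_gene_map` is created in the snippet itself.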

```python
from pytximport import tximport
from pytximport.utils import create_transcript_gene_map

transcript_gene_map = create_transcript_gene_map(species="human")

results = tximport(
    file_paths,
    data_type="salmon",
    transcript_gene_map=transcript_gene_map,
)
```
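One way to obtain `file_paths` is to collect the `quant.sf` files from a directory that holds one subdirectory per sample. This is a sketch under that assumed layout; `collect_quant_files` is a hypothetical helper, and the temporary directory only stands in for real Salmon output.

```python
import tempfile
from pathlib import Path


def collect_quant_files(root):
    """Return the sorted paths of all <root>/<sample>/quant.sf files."""
    return sorted(str(path) for path in Path(root).glob("*/quant.sf"))


# Build a throwaway directory tree that mimics per-sample Salmon output.
with tempfile.TemporaryDirectory() as tmp:
    for sample in ("sample_b", "sample_a"):
        sample_dir = Path(tmp) / sample
        sample_dir.mkdir()
        (sample_dir / "quant.sf").touch()

    file_paths = collect_quant_files(tmp)
    sample_names = [Path(path).parent.name for path in file_paths]

print(sample_names)
```

Sorting the paths keeps the sample order reproducible, which matters when metadata such as conditions is later assigned by position.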

--------------------------------

### Comparing tximport and pytximport outputs

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This code reads the CSV files generated by both tximport (R) and pytximport (Python) for three count settings: no scaling, scaled TPM, and length-scaled TPM. After aligning the column names, it asserts that the dataframes are equal, ensuring consistency between the two implementations.

```python
import pandas as pd

counts_tximport_no = pd.read_csv("./data/fabry_disease/counts_tximport_no.csv")
counts_tximport_scaledTPM = pd.read_csv("./data/fabry_disease/counts_tximport_scaledTPM.csv")
counts_tximport_lengthScaledTPM = pd.read_csv("./data/fabry_disease/counts_tximport_lengthScaledTPM.csv")

counts_pytximport_no = pd.read_csv("./data/fabry_disease/counts_pytximport_no.csv")
counts_pytximport_scaledTPM = pd.read_csv("./data/fabry_disease/counts_pytximport_scaledTPM.csv")
counts_pytximport_lengthScaledTPM = pd.read_csv("./data/fabry_disease/counts_pytximport_lengthScaledTPM.csv")
counts_pytximport_no.columns = counts_tximport_no.columns
counts_pytximport_scaledTPM.columns = counts_tximport_scaledTPM.columns
counts_pytximport_lengthScaledTPM.columns = counts_tximport_lengthScaledTPM.columns

pd.testing.assert_frame_equal(counts_tximport_no, counts_pytximport_no)
pd.testing.assert_frame_equal(counts_tximport_scaledTPM, counts_pytximport_scaledTPM)
pd.testing.assert_frame_equal(counts_tximport_lengthScaledTPM, counts_pytximport_lengthScaledTPM)
```

--------------------------------

### Export Data as AnnData Object

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Export quantification data directly as an AnnData object by setting `output_type` to 'anndata'. This facilitates integration with the scverse ecosystem.

```python
txi_ad = tximport(
    ["../../test/data/salmon/quant.sf"],
    "salmon",
    transcript_gene_map_human,
    output_type="anndata",
    # the output can optionally be saved to a file by uncommenting the following lines
    # output_format="h5ad",
    # output_path="txi_ad.h5ad",
)
txi_ad
```

--------------------------------

### Python: Compare tximport and pytximport Outputs

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This Python script reads and compares the CSV outputs generated by R's tximport and Python's pytximport for the 'dtuScaledTPM' setting. It sorts both tables, removes the transcript versions from the tximport index, aligns the column names, and asserts that the dataframes are identical. Requires pandas.

```python
import pandas as pd

counts_tximport_dtuScaledTPM = pd.read_csv("./data/salmon/counts_tximport_dtuScaledTPM.csv", index_col=0).sort_index()
counts_pytximport_dtuScaledTPM = pd.read_csv(
    "./data/salmon/counts_pytximport_dtuScaledTPM.csv", index_col=0
).sort_index()
# cut the transcript version from the index
counts_tximport_dtuScaledTPM.index = counts_tximport_dtuScaledTPM.index.str.split(".").str[0]
counts_pytximport_dtuScaledTPM.columns = counts_tximport_dtuScaledTPM.columns

pd.testing.assert_frame_equal(counts_tximport_dtuScaledTPM, counts_pytximport_dtuScaledTPM)
```
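The version-stripping step in the comparison above can be illustrated on plain strings. This sketch mirrors the `.str.split(".").str[0]` call; the function name is hypothetical.

```python
def strip_transcript_version(transcript_id):
    """Drop everything after the first dot, e.g. the '.2' version suffix."""
    return transcript_id.split(".")[0]


versioned = ["ENST00000456328.2", "ENST00000450305.2", "ENST00000488147"]
unversioned = [strip_transcript_version(tx) for tx in versioned]
print(unversioned)
```

Identifiers without a version suffix pass through unchanged, so the function is safe to apply to mixed input.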

--------------------------------

### Prepare AnnData object for PyDESeq2

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Round count estimates to integers and add metadata, such as experimental conditions, to the AnnData object. This step is crucial for PyDESeq2 compatibility.

```python
result.X = result.X.round().astype(int)
result.obs["condition"] = ["Control", "Control", "Disease", "Disease"]
result
```

--------------------------------

### Load rpy2 IPython extension

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Loads the rpy2 extension for IPython, enabling the use of R magic commands within Jupyter notebooks or IPython environments.

```python
%load_ext rpy2.ipython
```

--------------------------------

### Create Transcript-to-Gene Map from GTF

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/start.md

Generate a transcript-to-gene mapping file from a GTF annotation file using the command line interface. Specify input GTF path and output CSV path.

```bash
pytximport create-map -i ./data/annotation.gtf -o tx2gene.csv -ow
```
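To show what such a mapping contains, the following standard-library sketch extracts transcript-to-gene pairs from GTF lines by parsing the attribute column of `transcript` features. It is an illustration of the file format, not pytximport's implementation; the function name and the example lines are assumptions.

```python
import re

# GTF attributes look like: key "value"; key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def transcript_gene_pairs(lines):
    """Map transcript_id to gene_id for every 'transcript' feature."""
    pairs = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue  # skip malformed lines and other feature types
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        if "transcript_id" in attributes and "gene_id" in attributes:
            pairs[attributes["transcript_id"]] = attributes["gene_id"]
    return pairs


# Illustrative GTF excerpt: one header line, one gene, one transcript.
gtf_lines = [
    "#!genome-build GRCh38\n",
    'chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1";\n',
    'chr1\tHAVANA\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1";\n',
]
pairs = transcript_gene_pairs(gtf_lines)
print(pairs)
```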

--------------------------------

### pytximport Generate Counts (Scaled TPM)

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This command uses pytximport to generate gene-level counts from Salmon quantification files, applying scaled TPM normalization. It requires input files, a transcript-to-gene mapping, and specifies the output file. The leading `!` is the Jupyter shell escape used in the source notebook; omit it in a terminal.

```bash
!pytximport -i ./data/fabry_disease/SRR16504309_wt/quant.sf -i ./data/fabry_disease/SRR16504310_wt/quant.sf -i ./data/fabry_disease/SRR16504311_ko/quant.sf -i ./data/fabry_disease/SRR16504312_ko/quant.sf -t salmon -m ./data/fabry_disease/transcript_gene_mapping_human.tsv -ow -o ./data/fabry_disease/counts_pytximport_scaledTPM.csv -c scaled_tpm
```
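As a rough sketch of the idea behind scaled TPM, assuming the definition used by the R tximport package (TPM values rescaled so that they sum to the sample's library size), the calculation for a single sample could look like this. The numbers are invented and the function is hypothetical; pytximport's actual implementation may differ in detail.

```python
def scaled_tpm_counts(tpm, estimated_counts):
    """Rescale TPM values so they sum to the sample's total estimated counts."""
    library_size = sum(estimated_counts)
    total_tpm = sum(tpm)
    return [value / total_tpm * library_size for value in tpm]


# Hypothetical single sample with three transcripts.
tpm = [500_000.0, 250_000.0, 250_000.0]
estimated_counts = [10.0, 60.0, 30.0]

scaled = scaled_tpm_counts(tpm, estimated_counts)
print(scaled)
```

The rescaled values keep the relative proportions of the TPM column while matching the total of the original estimated counts.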

--------------------------------

### pytximport Generate Counts (Length Scaled TPM)

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This command uses pytximport to generate gene-level counts from Salmon quantification files, applying length-scaled TPM normalization. It requires input files, a transcript-to-gene mapping, and specifies the output file. The leading `!` is the Jupyter shell escape used in the source notebook; omit it in a terminal.

```bash
!pytximport -i ./data/fabry_disease/SRR16504309_wt/quant.sf -i ./data/fabry_disease/SRR16504310_wt/quant.sf -i ./data/fabry_disease/SRR16504311_ko/quant.sf -i ./data/fabry_disease/SRR16504312_ko/quant.sf -t salmon -m ./data/fabry_disease/transcript_gene_mapping_human.tsv -ow -o ./data/fabry_disease/counts_pytximport_lengthScaledTPM.csv -c length_scaled_tpm
```

--------------------------------

### Export Transcript Counts with pytximport

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use `tximport` to export transformed transcript counts. Set `counts_from_abundance` to 'scaled_tpm' and `return_transcript_data` to True for transcript-level analysis.

```python
txi = tximport(
    ["../../test/data/salmon/quant.sf"],
    "salmon",
    counts_from_abundance="scaled_tpm",
    return_transcript_data=True,
)
txi
```

--------------------------------

### pytximport Generate Counts (No Scaling)

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This command uses pytximport to generate gene-level counts from Salmon quantification files without any scaling. It requires input files, a transcript-to-gene mapping, and specifies the output file. The leading `!` is the Jupyter shell escape used in the source notebook; omit it in a terminal.

```bash
!pytximport -i ./data/fabry_disease/SRR16504309_wt/quant.sf -i ./data/fabry_disease/SRR16504310_wt/quant.sf -i ./data/fabry_disease/SRR16504311_ko/quant.sf -i ./data/fabry_disease/SRR16504312_ko/quant.sf -t salmon -m ./data/fabry_disease/transcript_gene_mapping_human.tsv -ow -o ./data/fabry_disease/counts_pytximport_no.csv
```

--------------------------------

### Create Mouse Transcript-to-Gene Map (Gene ID)

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generates a transcript-to-gene map for mouse species, mapping transcript IDs to Ensembl gene IDs by default. This function may take a few seconds on the first run to download data.

```python
transcript_gene_map_mouse = create_transcript_gene_map(species="mouse")
transcript_gene_map_mouse.head(5)
```

--------------------------------

### Create Transcript-to-Gene Map

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Create a map between transcript IDs and transcript names using `create_transcript_gene_map`. This is useful for replacing transcript IDs with more readable names in your analysis.

```python
from pytximport.utils import create_transcript_gene_map, replace_transcript_ids_with_names
```

```python
transcript_name_map_human = create_transcript_gene_map("human", target_field="external_transcript_name")
transcript_name_map_human.head(5)
```

--------------------------------

### Create Transcript-to-Gene Map from GTF Annotation

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generates a transcript-to-gene map from a GTF annotation file, mapping transcript IDs to gene names. Using an annotation file is recommended for consistency with alignment references.

```python
from pytximport.utils import create_transcript_gene_map_from_annotation

transcript_gene_map_from_gtf = create_transcript_gene_map_from_annotation(
    "../../test/data/annotation.gtf",
    target_field="gene_name",
)
transcript_gene_map_from_gtf.head(5)
```

--------------------------------

### Create transcript-to-gene map with biotype information

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

This Python snippet demonstrates how to create a transcript-to-gene mapping from a GTF file, specifically extracting gene name and gene biotype. This is a prerequisite for filtering results by gene biotype.

```python
transcript_gene_map_from_gtf_with_biotype = create_transcript_gene_map_from_annotation(
    "../../test/data/annotation.gtf",
    target_field=["gene_name", "gene_biotype"],
)
transcript_gene_map_from_gtf_with_biotype.head(5)
```

--------------------------------

### Import and filter by gene biotype

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

This Python snippet shows how to import quantification data and then filter the results to include only 'protein_coding' genes. Use this when you need to focus your analysis on specific gene types.

```python
from pytximport.utils import filter_by_biotype

result_small = tximport(
    ["../../test/data/fabry_disease/SRR16504309_wt/"],
    "salmon",
    transcript_gene_map_from_gtf_with_biotype,
    counts_from_abundance="length_scaled_tpm",
)

result_small_filtered = filter_by_biotype(
    result_small,
    transcript_gene_map_from_gtf_with_biotype,
    biotype_filter=["protein_coding"],
    # Since the data is already at the gene level, we have to use the gene_id from the transcript_gene_map
    id_column="gene_id",
)

len(result_small.var_names), len(result_small_filtered.var_names)
```
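Conceptually, the biotype filter keeps only those genes whose annotated biotype is in the allowed list. The sketch below shows that logic on plain dictionaries; the gene names, counts, and the function `filter_by_biotype_sketch` are illustrative and are not the pytximport API.

```python
# Hypothetical gene-level counts and biotype annotations.
gene_counts = {"GLA": 120.0, "XIST": 45.0, "GAPDH": 980.0}
gene_biotypes = {"GLA": "protein_coding", "XIST": "lncRNA", "GAPDH": "protein_coding"}


def filter_by_biotype_sketch(counts, biotypes, keep):
    """Keep only genes whose biotype is listed in `keep`."""
    allowed = set(keep)
    return {gene: value for gene, value in counts.items() if biotypes.get(gene) in allowed}


protein_coding_counts = filter_by_biotype_sketch(gene_counts, gene_biotypes, ["protein_coding"])
print(len(gene_counts), len(protein_coding_counts))
```

Genes without a biotype annotation are dropped in this sketch because `biotypes.get(gene)` returns `None` for them.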

--------------------------------

### Run DESeq2 analysis

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Run the PyDESeq2 pipeline on the dataset. `deseq2()` fits size factors, dispersions, and log fold changes; the statistical tests are computed afterwards with `DeseqStats`.

```python
dds.deseq2()
```

--------------------------------

### Import RSEM data with tximport

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

Imports gene-level expression counts from RSEM result files with R's tximport (`txIn = FALSE` because the input is already summarized to genes). The loop over `countsFromAbundance` options contains only `"no"` here and saves the resulting count matrix to a CSV file. Ensure the RSEM results files and the `transcript_gene_mapping_human.tsv` are in the specified directories.

```R
%%R
dir <- "./data/rsem"
files_protein_coding <- c(
  file.path(dir, "test.genes.results.gz")
)
tx2gene <- read_tsv(file.path("./data/fabry_disease", "transcript_gene_mapping_human.tsv"))
countsFromAbundanceOptions <- c("no")
for (idx in seq_along(countsFromAbundanceOptions)) {
    txi <- tximport(
        files_protein_coding,
        type = "rsem",
        tx2gene = tx2gene,
        txIn = FALSE,
        countsFromAbundance = countsFromAbundanceOptions[idx],
        ignoreTxVersion = TRUE,
        ignoreAfterBar = TRUE
    )
    writePath <- file.path(dir, "counts_tximport.csv")
    if (!is.null(countsFromAbundanceOptions[idx])) {
        writePath <- gsub(".csv", paste0("_", countsFromAbundanceOptions[idx], ".csv"), writePath)
    }
    write.csv(txi$counts, writePath)
}
```

--------------------------------

### R tximport Output Summary

Source: https://github.com/complextissue/pytximport/blob/main/test/test_comparison.ipynb

This output shows the summary of data read by R's tximport, including column specifications and messages about transcripts missing from the tx2gene mapping.

```R
Output:
Rows: 244191 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): transcript_id, gene_id

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


```

```R
Output:
R[write to console]: reading in files with read_tsv

R[write to console]: 1 
R[write to console]: 2 
R[write to console]: 3 
R[write to console]: 4 
R[write to console]: 

R[write to console]: transcripts missing from tx2gene: 31380

R[write to console]: summarizing abundance

R[write to console]: summarizing counts

R[write to console]: summarizing length

R[write to console]: reading in files with read_tsv

R[write to console]: 1 
R[write to console]: 2 
R[write to console]: 3 
R[write to console]: 4 
R[write to console]: 

R[write to console]: transcripts missing from tx2gene: 31380

R[write to console]: summarizing abundance

R[write to console]: summarizing counts

R[write to console]: summarizing length

R[write to console]: reading in files with read_tsv

R[write to console]: 1 
R[write to console]: 2 
R[write to console]: 3 
R[write to console]: 4 
R[write to console]: 

R[write to console]: transcripts missing from tx2gene: 31380

R[write to console]: summarizing abundance

R[write to console]: summarizing counts

R[write to console]: summarizing length

R[write to console]: 1 
R[write to console]: 2 
R[write to console]: 3 
R[write to console]: 4 
R[write to console]: 

R[write to console]: transcripts missing from tx2gene: 31498

R[write to console]: summarizing abundance

R[write to console]: summarizing counts

R[write to console]: summarizing length

R[write to console]: summarizing inferential replicates

R[write to console]: reading in files with read_tsv

R[write to console]: 1 
R[write to console]: 2 
R[write to console]: 3 
R[write to console]: 4 
R[write to console]: 

R[write to console]: transcripts missing from tx2gene: 31380

R[write to console]: summarizing abundance

R[write to console]: summarizing counts

R[write to console]: summarizing length

R[write to console]: summarizing inferential replicates

R[write to console]: 1 
R[write to console]: 2 
R[write to console]: 3 
R[write to console]: 4 
R[write to console]: 

R[write to console]: reading in files with read_tsv

R[write to console]: 1 
R[write to console]: 2 
R[write to console]: 3 
R[write to console]: 4 
R[write to console]: 


```

--------------------------------

### Create Human Transcript-to-Gene Map (Gene Name and Biotype)

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generates a transcript-to-gene map for human species, mapping transcript IDs to both external gene names and gene biotypes. The resulting map is then filtered to include only 'lncRNA' biotypes.

```python
transcript_gene_map_human_biotype = create_transcript_gene_map(
    species="human",
    target_field=["external_gene_name", "gene_biotype"],
)
transcript_gene_map_human_biotype = transcript_gene_map_human_biotype[
    transcript_gene_map_human_biotype["gene_biotype"] == "lncRNA"
]
transcript_gene_map_human_biotype.head(5)
```

--------------------------------

### Create Human Transcript-to-Gene Map (Gene Name)

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generates a transcript-to-gene map for human species, mapping transcript IDs to external gene names. This function may take a few seconds on the first run to download data.

```python
from pytximport.utils import create_transcript_gene_map

transcript_gene_map_human = create_transcript_gene_map(
    species="human",
    target_field="external_gene_name",
)
transcript_gene_map_human.head(5)
```

--------------------------------

### Perform statistical analysis and shrinkage

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Generate statistical results for differential expression, including log2 fold changes and adjusted p-values. LFC shrinkage is applied for more reliable estimates.

```python
stat_result = DeseqStats(dds, contrast=["condition", "Disease", "Control"], quiet=True)
stat_result.summary()
stat_result.lfc_shrink(coeff="condition[T.Disease]")
```

--------------------------------

### Infer pathway activities

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use the PROGENy network to infer pathway activities from the differential expression statistics. This provides insights into affected biological pathways.

```python
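progeny = dc.op.progeny(organism="human", top=500)
# Wald statistics as a one-row frame (the same input used for the transcription factor inference)
data = stat_result.results_df[["stat"]].T.rename(index={"stat": "disease.vs.control"})
pathway_acts, pathway_pvals = dc.mt.ulm(data=data, net=progeny, tmin=5)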
dc.pl.barplot(
    data=pathway_acts,
    name="disease.vs.control",
    top=40,
    vertical=True,
    figsize=(5, 3),
)
```

--------------------------------

### Summarize Transcript Counts to Genes

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

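Summarize transcript counts to the gene level by providing a transcript-gene map to `tximport` and setting `return_transcript_data` to False. With `output_type` set to 'xarray', the gene-level counts are read from the returned dataset via `txi["counts"]`.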

```python
txi = tximport(
    ["../../test/data/salmon/quant.sf"],
    "salmon",
    transcript_gene_map_human,
    counts_from_abundance="scaled_tpm",
    output_type="xarray",
    return_transcript_data=False,
)
pd.DataFrame(txi["counts"], index=txi.coords["gene_id"], columns=txi.coords["file_path"]).sort_values(
    by=txi.coords["file_path"].data[0],
    ascending=False,
).head(5)
```
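The core of gene-level summarization is a grouped sum over the transcripts of each gene. The following sketch shows only that step on invented data; it ignores abundance and length handling, and `summarize_to_gene` is a hypothetical name rather than a pytximport function.

```python
from collections import defaultdict


def summarize_to_gene(transcript_counts, transcript_to_gene):
    """Sum transcript counts per gene, skipping transcripts missing from the map."""
    gene_counts = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        gene_id = transcript_to_gene.get(transcript_id)
        if gene_id is None:
            continue  # transcript missing from the transcript-to-gene map
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Hypothetical inputs: three mapped transcripts and one unmapped transcript.
transcript_counts = {"tx1": 10.0, "tx2": 5.0, "tx3": 7.0, "tx4": 2.0}
transcript_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_level = summarize_to_gene(transcript_counts, transcript_to_gene)
print(gene_level)
```

Transcripts absent from the map are skipped here, which corresponds to the "transcripts missing from tx2gene" message that R's tximport prints.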

--------------------------------

### Visualize differential expression with volcano plot

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Create a volcano plot to visualize the results of the differential expression analysis. This plot helps identify significantly up- and down-regulated genes.

```python
dc.pl.volcano(
    stat_result.results_df,
    x="log2FoldChange",
    y="padj",
    thr_sign=0.01,
    top=20,
    figsize=(10, 5),
)
```

--------------------------------

### Infer transcription factor activities

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use the CollecTRI network to infer transcription factor activities from the differential expression statistics. This helps understand regulatory mechanisms.

```python
collectri = dc.op.collectri(organism="human", remove_complexes=False)
data = stat_result.results_df[["stat"]].T.rename(index={"stat": "disease.vs.control"})
tf_acts, tf_pvals = dc.mt.ulm(data=data, net=collectri, tmin=5)
dc.pl.barplot(data=tf_acts, name="disease.vs.control", top=10, vertical=True, figsize=(5, 3))
```

--------------------------------

### Map Transcript Names to Gene Names

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Creates a map from transcript names to gene names for human species, using 'external_transcript_name' as the source field and 'external_gene_name' as the target field. This is useful when transcript identifiers are not standard Ensembl transcript IDs.

```python
transcript_name_gene_map_human = create_transcript_gene_map(
    "human",
    source_field="external_transcript_name",
    target_field="external_gene_name",
)
transcript_name_gene_map_human.head(5)
```

--------------------------------

### Replace Transcript IDs with Names

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Use `replace_transcript_ids_with_names` to substitute transcript IDs with their corresponding names from a provided map. The snippet assigns the returned object back to `txi` before displaying the highest counts.

```python
txi = replace_transcript_ids_with_names(txi, transcript_name_map_human)
pd.DataFrame(txi.X.T, index=txi.var.index, columns=txi.obs.index).sort_values(
    by=txi.obs.index[0],
    ascending=False,
).head(5)
```

--------------------------------

### Filter genes with low counts

Source: https://github.com/complextissue/pytximport/blob/main/docs/source/example.ipynb

Remove genes with very low expression counts across all samples to improve the accuracy of differential expression analysis. Genes with a maximum count of 10 or less are filtered out.

```python
result = result[:, result.X.max(axis=0) > 10].copy()
result
```
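The boolean mask in the filter above can be spelled out in plain Python. This sketch uses an invented samples-by-genes matrix and keeps a gene only if its maximum count across samples exceeds 10, matching the `> 10` comparison in the snippet.

```python
# Hypothetical count matrix: rows are samples, columns are genes.
gene_names = ["geneA", "geneB", "geneC"]
counts = [
    [0, 25, 3],
    [4, 40, 10],
]

# Keep a gene only if its maximum count across samples exceeds 10.
keep = [
    column
    for column in range(len(gene_names))
    if max(row[column] for row in counts) > 10
]
filtered_names = [gene_names[column] for column in keep]
filtered_counts = [[row[column] for column in keep] for row in counts]

print(filtered_names, filtered_counts)
```

Note that `geneC`, whose maximum is exactly 10, is removed because the comparison is strict.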
