### Install fastQpick via PyPI

Source: https://github.com/pachterlab/fastqpick/blob/main/README.md

Install the fastQpick library using pip from the Python Package Index.

```bash
pip install fastQpick
```

--------------------------------

### Install fastQpick from Source

Source: https://github.com/pachterlab/fastqpick/blob/main/README.md

Install the fastQpick library directly from its source code repository using pip.

```bash
pip install git+https://github.com/pachterlab/fastQpick.git
```

--------------------------------

### Setup Reference, Reads, and Index for Yeast Data

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/yeast_example.ipynb

Downloads and prepares the necessary reference genome, FASTQ files, and builds a kallisto index. Ensures paired-end reads are of equal length, a requirement for fastQpick.

```python
def sh(cmd):
    subprocess.run(cmd, shell=True, check=True)

# kallisto binary
if not os.path.exists(KALLISTO):
    sh(f"cd {WORK} && wget -q https://github.com/pachterlab/kallisto/releases/download/" 
       f
```

--------------------------------

### Import Libraries and Setup Paths for fastQpick Analysis

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/yeast_example.ipynb

Imports necessary Python libraries and defines working directories and file paths for the fastQpick analysis. Ensures output directories exist.

```python
import os
import glob
import json
import shutil
import subprocess

import numpy as np
import matplotlib.pyplot as plt
import pyfastx

from fastQpick import fastQpick

WORK = os.path.abspath("realdata")
FIG_DIR = os.path.abspath("figures")
os.makedirs(WORK, exist_ok=True)
os.makedirs(FIG_DIR, exist_ok=True)

KALLISTO = os.path.join(WORK, "kallisto", "kallisto")
IDX = os.path.join(WORK, "yeast.idx")
M1 = os.path.join(WORK, "data", "SRR453566_1.fastq.gz")
M2 = os.path.join(WORK, "data", "SRR453566_2.fastq.gz")
ACCESSION = "SRR453566"
```

--------------------------------

### fastQpick Python API Usage

Source: https://github.com/pachterlab/fastqpick/blob/main/README.md

Example of how to import and use the fastQpick function within a Python script.

```python
from fastQpick import fastQpick

fastQpick(
    input_files=['FASTQ_FILE1', 'FASTQ_FILE2', ...],
    ...
)
```

--------------------------------

### fastQpick Command-line Help

Source: https://github.com/pachterlab/fastqpick/blob/main/README.md

Command to display all available options and arguments for the fastQpick command-line interface.

```bash
fastQpick --help
```

--------------------------------

### fastQpick Command-line Usage

Source: https://github.com/pachterlab/fastqpick/blob/main/README.md

Basic command-line syntax for running fastQpick. Specify options and input FASTQ files.

```bash
fastQpick [OPTIONS] FASTQ_FILE1 FASTQ_FILE2 ...
```

--------------------------------

### Bootstrap Replicates with Replacement

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/yeast_example.ipynb

This snippet sets up parameters for bootstrapping, defines a function to quantify each replicate by resampling FASTQ files with replacement and running kallisto, and then collects the results into a count matrix. It reuses a length dictionary and quantifies reads only once per replicate.

```python
B = 200  # reduce for a quick run
length_dict = {M1: N_READS, M2: N_READS}
counts_dir = os.path.join(WORK, "counts_nb")
os.makedirs(counts_dir, exist_ok=True)

def quantify_replicate(seed):
    tmp = os.path.join(WORK, f"tmp_nb/seed{seed}")
    rep, kq = os.path.join(tmp, "rep"), os.path.join(tmp, "kq")
    fastQpick(input_files=[M1, M2], fraction=1.0, seeds=seed, output_dir=rep,
              file_group_size=2, replacement=True, overwrite=True, verbose=False,
              fastq_to_length_dict=length_dict)
    r1 = os.path.join(rep, "SRR453566_1.fastq")
    r2 = os.path.join(rep, "SRR453566_2.fastq")
    subprocess.run([KALLISTO, "quant", "-i", IDX, "-o", kq, "-t", "4", r1, r2],
                   check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    est = np.loadtxt(os.path.join(kq, "abundance.tsv"), skiprows=1, usecols=3)
    shutil.rmtree(tmp, ignore_errors=True)
    return est

R = np.vstack([quantify_replicate(b) for b in range(1, B + 1)])
print("bootstrap count matrix (replicates x transcripts):", R.shape)
```

--------------------------------

### fastQpick Python API Help

Source: https://github.com/pachterlab/fastqpick/blob/main/README.md

Python code to access the help documentation for the fastQpick function.

```python
help(fastQpick)
```

--------------------------------

### Generate Bootstrap Replicates with fastQpick

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/intro.ipynb

Generates bootstrap replicates by resampling FASTQ files with replacement. The `num_samples` argument draws all replicates in a single call, with each replicate written to its own seed-suffixed file. The `fastq_to_length_dict` is reused across replicates for efficiency.

```python
B = 300
length_dict = {input_fastq: N_READS}
rep_dir = os.path.join(WORKDIR, "replicates")
fastQpick(
    input_files=input_fastq,
    fraction=1.0,
    seed=1,
    num_samples=B,
    output_dir=rep_dir,
    without_replacement=False,
    overwrite=True,
    verbose=False,
    fastq_to_length_dict=length_dict,
)
# One seed-suffixed output file per replicate is written into a single directory.
rep_files = sorted(
    os.path.join(rep_dir, f) for f in os.listdir(rep_dir) if f.endswith(".fastq")
)
print("Generated", len(rep_files), "bootstrap replicates with replacement")
```

--------------------------------

### Import necessary libraries and set up directories

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/intro.ipynb

Imports essential Python libraries for data manipulation, file handling, and plotting. It also defines and creates working and figure directories, removing existing ones if they are present.

```python
import os
import shutil
from collections import Counter

import numpy as np
import pyfastx
import matplotlib.pyplot as plt

from fastQpick import fastQpick

rng = np.random.default_rng(0)

WORKDIR = os.path.abspath("_bootstrap_demo")
FIG_DIR = os.path.abspath("figures")
if os.path.exists(WORKDIR):
    shutil.rmtree(WORKDIR)
os.makedirs(WORKDIR, exist_ok=True)
os.makedirs(FIG_DIR, exist_ok=True)
```

--------------------------------

### Simulate RNA-seq experiment and write FASTQ file

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/intro.ipynb

Simulates RNA-seq reads by assigning them to transcripts based on predefined proportions. Each read is generated with a transcript-specific barcode and random padding. The simulated reads are written to a FASTQ file. This section also prints the true and observed proportions for verification.

```python
TRANSCRIPTS = ["TXA", "TXB", "TXC", "TXD"]
TRUE_PROPS = np.array([0.50, 0.30, 0.15, 0.05])
BARCODES = {"TXA": "AAAACCCC", "TXB": "GGGGTTTT", "TXC": "ACACACAC", "TXD": "TGTGTGTG"}
BARCODE_LEN = 8
N_READS = 50_000
READ_PAD = 42

assignments = rng.choice(len(TRANSCRIPTS), size=N_READS, p=TRUE_PROPS)
bases = np.array(list("ACGT"))

input_fastq = os.path.join(WORKDIR, "reads.fastq")
with open(input_fastq, "w") as fh:
    for i, t in enumerate(assignments):
        barcode = BARCODES[TRANSCRIPTS[t]]
        pad = "".join(rng.choice(bases, size=READ_PAD))
        seq = barcode + pad
        qual = "I" * len(seq)
        fh.write(f"@read{i}\n{seq}\n+\n{qual}\n")

observed_counts = Counter(TRANSCRIPTS[t] for t in assignments)
observed_props = np.array([observed_counts[t] / N_READS for t in TRANSCRIPTS])
print("Wrote", N_READS, "reads to", input_fastq)
for t, p_true, p_obs in zip(TRANSCRIPTS, TRUE_PROPS, observed_props):
    print(f"  {t}: true={p_true:.3f}  observed={p_obs:.4f}")
```

--------------------------------

### Baseline Quantification with Kallisto

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/yeast_example.ipynb

Quantifies the original library using kallisto to obtain transcript proportions and pseudoaligned read counts. This sets the basis for analyzing sampling error.

```python
base_dir = os.path.join(WORK, "kout_baseline")
sh(f"{KALLISTO} quant -i {IDX} -o {base_dir} -t 4 {M1} {M2}")

targets = np.loadtxt(os.path.join(base_dir, "abundance.tsv"), skiprows=1, usecols=0, dtype=str)
baseline = np.loadtxt(os.path.join(base_dir, "abundance.tsv"), skiprows=1, usecols=3)
info = json.load(open(os.path.join(base_dir, "run_info.json")))
print("transcripts:", len(targets), " pseudoaligned:", info["n_pseudoaligned"],
      f"({info['p_pseudoaligned']}%) ")
```

--------------------------------

### Analyze Bootstrap Detection Probability and Plot Results

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/yeast_example.ipynb

Calculates bootstrap detection probabilities, identifies persistent and fragile transcripts, and generates a three-panel figure comparing standardized deviations, bootstrap vs. analytic SE, and detection probability vs. point estimate. Saves the figure to a file.

```python
THETA = 5.0  # detection threshold (estimated counts)
detect_prob = (R >= THETA).mean(axis=0)              # bootstrap detection prob. pi_t
persist = detect_prob >= 0.95
fragile = (detect_prob > 0.05) & (detect_prob < 0.95)
print(f"fragile calls (0.05<pi<0.95)     : {int(fragile.sum())}")
print(f"  called at baseline, may vanish : {int((fragile & (baseline >= THETA)).sum())}")
print(f"  absent at baseline, may appear : {int((fragile & (baseline < THETA)).sum())}")

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16.5, 4.3))

# Panel A: standardized bootstrap deviations pooled across expressed transcripts.
z = ((P[:, expressed] - p_hat[expressed]) / analytic_se[expressed]).ravel()
ax1.hist(z, bins=80, density=True, color="#4C72B0", alpha=0.6,
         label=f"pooled $z$  (std={z.std(ddof=1):.2f})")
xs = np.linspace(-4, 4, 400)
ax1.plot(xs, np.exp(-0.5 * xs**2) / np.sqrt(2*np.pi), color="#C44E52", lw=2,
         label=r"$\mathcal{{N}}(0,1)$  ")
ax1.set_xlim(-4, 4)
ax1.set_xlabel(r"standardized deviation  $(\hat{{p}}^{(b)}-\hat{{p}})/SE_{\mathrm{analytic}}$")
ax1.set_ylabel("density")
ax1.set_title("A. Standardized bootstrap deviations vs. $\mathcal{{N}}(0,1)$  ")
ax1.legend(frameon=False, fontsize=9)

# Panel B: bootstrap SE vs analytic SE across all expressed transcripts.
ase, bse = analytic_se[expressed], boot_se[expressed]
lo, hi = min(ase.min(), bse.min())*0.8, max(ase.max(), bse.max())*1.2
ax2.plot([lo, hi], [lo, hi], color="#888888", ls="--", lw=1, label="y = x")
sc = ax2.scatter(ase, bse, c=np.log10(baseline[expressed]), cmap="viridis", s=8, alpha=0.6)
ax2.set_xscale("log"); ax2.set_yscale("log")
ax2.set_xlim(lo, hi); ax2.set_ylim(lo, hi)
ax2.set_xlabel(r"analytic SE  $\sqrt{\hat{{p}}(1-\hat{{p}})/N}$  ")
ax2.set_ylabel("bootstrap SE")
ax2.set_title("B. Bootstrap SE recovers analytic SE")
fig.colorbar(sc, ax=ax2).set_label(r"$\\log_{10}$ est. counts", fontsize=8)
ax2.legend(frameon=False, fontsize=9, loc="upper left")

# Panel C: bootstrap detection probability vs. point estimate for low-abundance
# transcripts. Detection is a discrete presence/absence event, so the Gaussian SE
# of panel B does not describe it; the bootstrap instead gives a per-transcript
# detection probability that flags which low-abundance calls persist and which
# are driven by a handful of reads and go away under resampling.
# Persist and fragile transcripts are plotted in full so the legend counts match
# the totals reported in the text; rarely-called transcripts are restricted to the
# low-count regime, and zero counts are floored to the left edge of the log axis.
XFLOOR = 0.3
cat = np.where(persist, 0, np.where(fragile, 1, 2))
lowab = baseline > XFLOOR
colors = ["#55A868", "#DD8452", "#C44E52"]           # persist / fragile / rarely called
labels = ["persists ($\pi\geq0.95$)", "fragile ($0.05<\pi<0.95$)",
          "rarely called ($\pi\leq0.05$)"]
for k in (0, 1, 2):                                  # persists, fragile, rarely called
    m = (cat == k) if k != 2 else ((cat == k) & lowab)
    xk = np.clip(baseline[m], XFLOOR, None)
    ax3.scatter(xk, detect_prob[m], s=10, alpha=0.6, color=colors[k],
                label=f"{labels[k]}  (n={int(m.sum())})")
ax3.set_xlim(left=XFLOOR * 0.9)
ax3.axvline(THETA, color="#888888", ls="--", lw=1)
ax3.text(THETA*1.1, 0.06, f"threshold $\theta={THETA:g}$", rotation=90,
         va="bottom", ha="left", fontsize=8, color="#888888")
ax3.set_xscale("log")
ax3.set_ylim(-0.03, 1.03)
ax3.set_xlabel("point-estimate count $\hat{{c}}_t$ (baseline)")
ax3.set_ylabel(r"bootstrap detection probability $\pi_t$")
ax3.set_title("C. Detection robustness of low-abundance calls")
ax3.legend(frameon=False, fontsize=8, loc="center right")

fig.tight_layout()
out = os.path.join(FIG_DIR, "bootstrap_realdata.png")
fig.savefig(out, dpi=200)
print("Saved", out)
plt.show()
```

--------------------------------

### Visualize Bootstrap Distributions and SE Comparison

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/intro.ipynb

Generates a two-panel figure. Panel A shows the distribution of bootstrap estimates for transcript proportions, along with true values and 95% confidence intervals. Panel B directly compares the bootstrap standard error against the analytic standard error. This visualization helps to assess the accuracy and reliability of the bootstrap method.

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4.2))

# Panel A: bootstrap distributions with true values and 95% CIs
positions = np.arange(len(TRANSCRIPTS))
parts = ax1.violinplot([boot_props[:, i] for i in range(len(TRANSCRIPTS))],
                       positions=positions, showextrema=False)
for pc in parts['bodies']:
    pc.set_facecolor("#4C72B0")
    pc.set_alpha(0.6)
ax1.vlines(positions, ci_lo, ci_hi, color="#333333", lw=2, label="95% bootstrap CI")
ax1.scatter(positions, TRUE_PROPS, color="#C44E52", zorder=3, label="true proportion")
ax1.set_xticks(positions)
ax1.set_xticklabels(TRANSCRIPTS)
ax1.set_ylabel("estimated proportion")
ax1.set_title("A. Bootstrap distribution of abundance estimates")
ax1.legend(frameon=False, fontsize=9)

# Panel B: bootstrap SE vs analytic SE
lim = max(boot_se.max(), analytic_se.max()) * 1.15
ax2.plot([0, lim], [0, lim], color="#888888", ls="--", lw=1, label="y = x")
ax2.scatter(analytic_se, boot_se, color="#4C72B0", zorder=3)
for i, t in enumerate(TRANSCRIPTS):
    ax2.annotate(t, (analytic_se[i], boot_se[i]), textcoords="offset points", xytext=(6, -2), fontsize=9)
ax2.set_xlim(0, lim)
ax2.set_ylim(0, lim)
ax2.set_xlabel(r"analytic SE  $\\sqrt{p(1-p)/N}$")
ax2.set_ylabel("bootstrap SE")
ax2.set_title("B. Bootstrap SE recovers analytic SE")
ax2.legend(frameon=False, fontsize=9)

fig.tight_layout()
out_path = os.path.join(FIG_DIR, "bootstrap_demo.png")
fig.savefig(out_path, dpi=200)
print("Saved figure to", out_path)
plt.show()
```

--------------------------------

### Checkout Manuscript Branch

Source: https://github.com/pachterlab/fastqpick/blob/main/README.md

Command to switch to the 'manuscript' git branch, likely for reproducing figures from the paper.

```bash
git checkout manuscript
```

--------------------------------

### Re-quantify Bootstrap Replicates

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/intro.ipynb

Quantifies each bootstrap replicate by recovering transcript barcodes and computing the proportion of reads assigned to each transcript. This function is used to generate a bootstrap distribution of abundance estimates for every transcript. Ensure `BARCODES`, `BARCODE_LEN`, and `TRANSCRIPTS` are defined.

```python
barcode_to_tx = {bc: tx for tx, bc in BARCODES.items()}

def quantify(fastq_path):
    counts = Counter()
    total = 0
    for _, seq, _ in pyfastx.Fastx(fastq_path):
        counts[barcode_to_tx[seq[:BARCODE_LEN]]] += 1
        total += 1
    return np.array([counts[t] / total for t in TRANSCRIPTS])

boot_props = np.vstack([quantify(f) for f in rep_files])
print("Bootstrap proportion matrix shape (replicates x transcripts):", boot_props.shape)
```

--------------------------------

### Calculate and Print Bootstrap and Analytic SE and CI

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/intro.ipynb

Calculates bootstrap standard errors, percentile confidence intervals, and analytic standard errors for transcript proportions. It then prints a formatted table comparing these values and reports the maximum relative difference between the bootstrap and analytic standard errors. This is useful for validating the bootstrap method against theoretical expectations.

```python
boot_se = boot_props.std(axis=0, ddof=1)
ci_lo = np.percentile(boot_props, 2.5, axis=0)
ci_hi = np.percentile(boot_props, 97.5, axis=0)
analytic_se = np.sqrt(observed_props * (1 - observed_props) / N_READS)

print(f"{ 'tx':>4} {'observed':>9} {'boot_SE':>9} {'analytic_SE':>12} {'95% CI':>22}")
for i, t in enumerate(TRANSCRIPTS):
    ci = f"[{ci_lo[i]:.4f}, {ci_hi[i]:.4f}]"
    print(f"{t:>4} {observed_props[i]:>9.4f} {boot_se[i]:>9.5f} {analytic_se[i]:>12.5f} {ci:>22}")

rel_err = np.abs(boot_se - analytic_se) / analytic_se
print(f"\nMax relative difference between bootstrap and analytic SE: {rel_err.max():.1%}")
```

--------------------------------

### Calculate Bootstrap and Analytic Standard Errors

Source: https://github.com/pachterlab/fastqpick/blob/main/notebooks/yeast_example.ipynb

Calculates and compares bootstrap and analytic standard errors for transcript proportions. Filters for expressed transcripts (>= 100 counts) and prints summary statistics of the comparison.

```python
P = R / R.sum(axis=1, keepdims=True)
N = baseline.sum()
p_hat = baseline / N
boot_se = P.std(axis=0, ddof=1)
analytic_se = np.sqrt(p_hat * (1 - p_hat) / N)

expressed = baseline >= 100
rel = np.abs(boot_se[expressed] - analytic_se[expressed]) / analytic_se[expressed]
ratio = (boot_se[expressed] / analytic_se[expressed]).mean()
print(f"expressed transcripts            : {expressed.sum()}")
print(f"median |boot-analytic|/analytic  : {np.median(rel):.1%}")
print(f"mean boot/analytic ratio         : {ratio:.4f}")
```

=== COMPLETE CONTENT === This response contains all available snippets from this library. No additional content exists. Do not make further requests.