### Install DIAMOND on FreeBSD

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Install DIAMOND on FreeBSD systems using the pkg package manager.

```bash
pkg install diamond
```

--------------------------------

### Install DIAMOND to Home Directory

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Compiles and installs DIAMOND to the 'bin/' folder within your home directory, avoiding the need for sudo rights.

```bash
cmake -DCMAKE_INSTALL_PREFIX=$HOME ..
make -j $(nproc --all)
make install
```
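
After `make install`, the binary is in `$HOME/bin`, which is not on the `PATH` of every shell. A minimal sketch, assuming a POSIX-style shell, that prepends the directory only when it is missing:

```bash
# Prepend $HOME/bin to PATH unless it is already listed.
case ":$PATH:" in
  *":$HOME/bin:"*) ;;                 # already present, nothing to do
  *) PATH="$HOME/bin:$PATH" ;;
esac
export PATH
```

Placing these lines in a shell startup file such as `~/.profile` (an assumption about your setup) would make the change persistent.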

--------------------------------

### Example Realign Output

Source: https://context7.com/bbuchfink/diamond/llms.txt

This example demonstrates the output format for the `realign` command, detailing the alignment statistics between cluster centroids and members.

```text
# Example realign output:
# cseqid    mseqid    approx_pident    cstart    cend    mstart    mend    evalue    bitscore
# d4i0va_   d1dlya_   47.1             2         122     1         117     9.61e-31  105
```

--------------------------------

### Print DIAMOND version

Source: https://context7.com/bbuchfink/diamond/llms.txt

Display the installed version of DIAMOND.

```bash
diamond version
```

--------------------------------

### Install DIAMOND via Homebrew

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Install DIAMOND on macOS using the Homebrew package manager.

```bash
brew install diamond
```

--------------------------------

### DIAMOND Clustering Command Examples

Source: https://github.com/bbuchfink/diamond/wiki/Clustering

Examples of using diamond linclust, cluster, and deepclust for different clustering sensitivities and configurations. Use these to perform fast, sensitive, or deep clustering of protein sequences.

```bash
# fast clustering with linear scaling
diamond linclust -d INPUT_FILE -o OUTPUT_FILE --approx-id 30 -M 64G
```

```bash
# sensitive clustering using all-vs-all alignment
diamond cluster -d INPUT_FILE -o OUTPUT_FILE --approx-id 30 -M 64G
```

```bash
# deep clustering (no identity cutoff)
diamond deepclust -d INPUT_FILE -o OUTPUT_FILE -M 64G
```

--------------------------------

### Install DIAMOND from Source (Debian-based)

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Installs the build dependencies, then downloads the v2.1.24 source archive, extracts it, and compiles DIAMOND in a `bin` subdirectory of the extracted `diamond-2.1.24` directory.

```bash
sudo apt install g++ automake cmake zlib1g-dev libzstd-dev libsqlite3-dev
wget http://github.com/bbuchfink/diamond/archive/v2.1.24.tar.gz
tar xzf v2.1.24.tar.gz
cd diamond-2.1.24
mkdir bin
cd bin
cmake ..
make -j $(nproc --all)
sudo make install
```

--------------------------------

### Frameshift Alignment Example

Source: https://context7.com/bbuchfink/diamond/llms.txt

This example illustrates the output format for frameshift alignments, indicating insertions or deletions that shift the reading frame. Note the use of '/' for +1 and '\' for -1 frameshifts.

```text
# Example frameshift alignment output (pairwise format):
# Query= read_001
# Length=1500
#
# >protein_xyz
# Score = 245 bits (626),  Expect = 1e-65
# Frame = +1
#
# Query  1    MKFLILLFNILCLFPVLAADRVRVVTGAVIGTAVS/K  108
#             MKFLILLFNILCLFPVLAADRVRVVTGAVIG AVS K
# Sbjct  1    MKFLILLFNILCLFPVLAADRVRVVTGAVIGLAVSVK  37
# (Note: '/' indicates +1 frameshift, '\' indicates -1 frameshift)
```

--------------------------------

### Get database information

Source: https://context7.com/bbuchfink/diamond/llms.txt

Display information about a DIAMOND database using the 'dbinfo' command.

```bash
diamond dbinfo -d reference.dmnd
```

--------------------------------

### Download SCOPe Database

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Use wget to download the SCOPe database in FASTA format. Ensure you have wget installed.

```bash
wget https://scop.berkeley.edu/downloads/scopeseq-2.07/astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa
```

--------------------------------

### Install DIAMOND via Bioconda

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Install DIAMOND using the Bioconda channel. It is crucial to include '-c conda-forge' and specify the version to avoid installing outdated versions.

```bash
conda install -c bioconda -c conda-forge diamond=2.1.24
```

--------------------------------

### Example Cluster Output

Source: https://context7.com/bbuchfink/diamond/llms.txt

This is an example of the output format for protein clustering, showing the centroid (representative) and its associated member sequences.

```text
# Example cluster output:
# centroid    member
# d1dlwa_     d1dlwa_
# d2gkma_     d2gkma_
# d4i0va_     d1dlya_
# d4i0va_     d1s69a_
# d4i0va_     d4i0va_
```
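
Cluster sizes can be tallied from this two-column layout with standard tools. The sketch below assumes a tab-separated file without a header line; `clusters_demo.tsv` is a placeholder name and its rows are copied from the example above.

```bash
TAB=$(printf '\t')

# Example rows from above, written as a tab-separated file without a header.
printf 'd1dlwa_\td1dlwa_\nd2gkma_\td2gkma_\nd4i0va_\td1dlya_\nd4i0va_\td1s69a_\nd4i0va_\td4i0va_\n' > clusters_demo.tsv

# Count members per centroid, then sort by size (largest first) and by name.
cluster_sizes=$(awk -F"$TAB" -v OFS="$TAB" \
  '{ n[$1]++ } END { for (c in n) print c, n[c] }' clusters_demo.tsv |
  sort -t "$TAB" -k2,2nr -k1,1)
printf '%s\n' "$cluster_sizes"
```

For real data, point the `awk` command at the file written by `-o`, for example `clusters.tsv`.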

--------------------------------

### Run DIAMOND Fast Clustering (linclust)

Source: https://github.com/bbuchfink/diamond/wiki/Home

Perform fast clustering with linear scaling using DIAMOND's linclust command. This example uses a 30% identity threshold and allocates 64GB of memory.

```bash
# running fast clustering with linear scaling (30% identity threshold)
diamond linclust -d reference.fasta -o clusters.tsv --approx-id 30 -M 64G
```

--------------------------------

### Run DIAMOND Sensitive Clustering (cluster)

Source: https://github.com/bbuchfink/diamond/wiki/Home

Perform sensitive clustering using all-vs-all alignment with DIAMOND's cluster command. This example uses a 30% identity threshold and allocates 64GB of memory.

```bash
# running sensitive clustering using all-vs-all alignment (30% identity threshold)
diamond cluster -d reference.fasta -o clusters.tsv --approx-id 30 -M 64G
```

--------------------------------

### Compile DIAMOND with Zstd Support

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Enables Zstd compression support during compilation. Requires the zstd development library to be installed on the system.

```bash
cmake -DWITH_ZSTD=ON ..
make -j $(nproc --all)
make install
```

--------------------------------

### SLURM Batch Script for Parallel Diamond

Source: https://github.com/bbuchfink/diamond/wiki/6.-Distributed-computing

Example SLURM batch script for launching a massively parallel Diamond run on a supercomputer. Adjust SLURM parameters based on your specific machine and job requirements. `FLAGS` should be replaced with the appropriate parallel flags for Diamond.

```bash
#!/bin/bash -l
#SBATCH -D ./
#SBATCH -J DIAMOND
#SBATCH --mem=185000
#SBATCH --nodes=520
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks-per-core=2
#SBATCH --cpus-per-task=80
#SBATCH --mail-type=none
#SBATCH --time=24:00:00

module purge
module load gcc impi
export SLURM_HINT=multithread

srun diamond FLAGS
```

--------------------------------

### Example TSV Output Lines

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

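These lines are sample output from the DIAMOND search, showing pairwise alignments in TSV format. The 12 columns follow the BLAST tabular layout: query, target, percent identity, alignment length, mismatches, gap openings, query start, query end, target start, target end, E-value, and bit score.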

```plaintext
d1dlwa_ d1dlwa_ 100     116     0       0       1       116     1       116     6.42e-77        220
d1dlwa_ d2gkma_ 35.4    113     73      0       1       113     13      125     1.43e-21        80.9
d1dlwa_ d4i0va_ 31.9    119     75      2       1       113     2       120     9.11e-13        58.2
d2gkma_ d2gkma_ 100     127     0       0       1       127     1       127     1.51e-87        248
d2gkma_ d1dlwa_ 34.8    115     75      0       13      127     1       115     6.90e-23        84.3
d2gkma_ d4i0va_ 33.6    110     69      1       13      118     2       111     1.35e-18        73.6
d2gkma_ d6bmea_ 35.5    110     67      1       13      118     2       111     1.32e-16        68.6
d2gkma_ d2bkma_ 37.3    67      38      2       13      76      5       70      5.18e-06        40.8
d1ngka_ d1ngka_ 100     126     0       0       1       126     1       126     4.34e-91        257
d1ngka_ d2bkma_ 38.4    125     73      2       1       125     4       124     1.42e-24        89.0
```
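
Tabular output like this is easy to post-filter with `awk`. The following is a minimal sketch, not part of the tutorial: it assumes the 12-column layout shown above, uses a placeholder file `hits_demo.tsv` filled with three of the sample rows, and applies an arbitrary E-value cutoff of `1e-20` chosen only for illustration.

```bash
TAB=$(printf '\t')

# Three rows copied from the sample output above (tab-separated, 12 columns).
printf 'd1dlwa_\td1dlwa_\t100\t116\t0\t0\t1\t116\t1\t116\t6.42e-77\t220\n' >  hits_demo.tsv
printf 'd1dlwa_\td2gkma_\t35.4\t113\t73\t0\t1\t113\t13\t125\t1.43e-21\t80.9\n' >> hits_demo.tsv
printf 'd1dlwa_\td4i0va_\t31.9\t119\t75\t2\t1\t113\t2\t120\t9.11e-13\t58.2\n' >> hits_demo.tsv

# Keep non-self hits with E-value <= 1e-20 (example cutoff);
# print query, target, identity, E-value, and bit score.
strong_hits=$(awk -F"$TAB" -v OFS="$TAB" \
  '$1 != $2 && ($11 + 0) <= 1e-20 { print $1, $2, $3, $11, $12 }' hits_demo.tsv)
printf '%s\n' "$strong_hits"
```

Of the three demo rows, only the `d1dlwa_` to `d2gkma_` alignment passes both conditions.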

--------------------------------

### SLURM batch script for parallel execution

Source: https://context7.com/bbuchfink/diamond/llms.txt

Example SLURM batch script to launch parallel DIAMOND jobs across multiple nodes. Ensure --parallel-tmpdir is a shared file system path.

```bash
#!/bin/bash -l
#SBATCH -J DIAMOND
#SBATCH --nodes=100
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=180G
#SBATCH --time=24:00:00

srun diamond blastp --db database.dmnd --query queries.fasta \
  -o results \
  --multiprocessing \
  --tmpdir /local/tmp \
  --parallel-tmpdir /shared/diamond_work
```

--------------------------------

### Print general help

Source: https://context7.com/bbuchfink/diamond/llms.txt

Show general help information for DIAMOND.

```bash
diamond help
```

--------------------------------

### Create DIAMOND Binary Database

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Set up a binary DIAMOND database file from a FASTA file. This prepares the database for subsequent searches.

```bash
diamond makedb --in astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa -d astral40
```

--------------------------------

### Create index for small database optimization

Source: https://context7.com/bbuchfink/diamond/llms.txt

For small databases, create an index first using 'diamond makeidx'. This is a prerequisite for using the --target-indexed option.

```bash
diamond makeidx -d small_db.dmnd --sensitive
```

--------------------------------

### Create DIAMOND Database

Source: https://github.com/bbuchfink/diamond/wiki/Home

Create a DIAMOND-formatted database from a reference FASTA file.

```bash
# creating a diamond-formatted database file
./diamond makedb --in reference.fasta -d reference
```

--------------------------------

### Compile GCC

Source: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics

Steps to download, configure, and compile GCC from source. Ensure you have sufficient resources for compilation.

```bash
cd
wget ftp://ftp.gnu.org/gnu/gcc/gcc-10.2.0/gcc-10.2.0.tar.gz
tar xzf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./contrib/download_prerequisites
cd ..
mkdir objdir
cd objdir
$PWD/../gcc-10.2.0/configure --prefix=$HOME/GCC-10.2.0 --disable-multilib --disable-bootstrap --enable-languages=c++
make -j $(nproc --all)
make install
```

--------------------------------

### Print blastp command help

Source: https://context7.com/bbuchfink/diamond/llms.txt

Display detailed help information specifically for the 'blastp' command.

```bash
diamond blastp --help
```

--------------------------------

### Initialize parallel run

Source: https://context7.com/bbuchfink/diamond/llms.txt

Step 1 of distributed computing: Initialize the parallel run on a login node. Requires --multiprocessing and --mp-init.

```bash
diamond blastp --db database.dmnd --query queries.fasta \
  --multiprocessing --mp-init \
  --tmpdir /local/tmp \
  --parallel-tmpdir /shared/diamond_work \
  -b 2.0
```

--------------------------------

### Create DIAMOND Database

Source: https://context7.com/bbuchfink/diamond/llms.txt

Use `diamond makedb` to create a binary database from FASTA files. Supports optional taxonomy mapping for classification. Input can be gzipped or piped via stdin.

```bash
# Basic database creation from a FASTA file
diamond makedb --in proteins.fasta -d proteins
```

```bash
# Database creation with taxonomy support
# First download NCBI taxonomy files:
wget http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip
unzip taxdmp.zip

diamond makedb --in proteins.fasta -d proteins \
  --taxonmap prot.accession2taxid.FULL.gz \
  --taxonnodes nodes.dmp \
  --taxonnames names.dmp
```

```bash
# Database creation from gzip-compressed input (auto-detected)
diamond makedb --in proteins.fasta.gz -d proteins
```

```bash
# Database creation from stdin
cat proteins.fasta | diamond makedb -d proteins
```

--------------------------------

### Download FASTA file

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Use wget to download the Astral-Scopedom FASTA file for clustering. This is the input for the DIAMOND clustering command.

```bash
wget https://scop.berkeley.edu/downloads/scopeseq-2.07/astral-scopedom-seqres-gd-sel-gs-bib-95-2.07.fa
```

--------------------------------

### Pull Specific Docker Container Version

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Specify a version number when pulling the Docker container to get a particular release of DIAMOND.

```bash
docker pull buchfink/diamond:version2.1.24
```

--------------------------------

### Algorithm selection for small query files

Source: https://context7.com/bbuchfink/diamond/llms.txt

Select the contiguous-seed algorithm with `--algo ctg`, which is intended for small query files.

```bash
diamond blastp -d reference -q small_queries.fasta -o results.tsv --algo ctg
```

--------------------------------

### Get Alignments for Cluster Members

Source: https://context7.com/bbuchfink/diamond/llms.txt

Retrieve detailed alignments between cluster members and their representatives. This requires specifying the database, cluster file, output file, memory, and optionally a header.

```bash
diamond realign -d proteins.fasta --clusters clusters.tsv \
  -o alignments.tsv -M 64G --header
```

--------------------------------

### Database Creation with makedb

Source: https://context7.com/bbuchfink/diamond/llms.txt

The `makedb` command creates a DIAMOND-formatted binary database from FASTA input files. This database is required for alignment workflows and can optionally include taxonomy mapping.

````APIDOC
## Database Creation with makedb

### Description
The `makedb` command creates a DIAMOND-formatted binary database from FASTA input files. This database format enables efficient indexed searching and is required before running alignment workflows. The command supports optional taxonomy mapping for taxonomic classification features.

### Method
`diamond makedb`

### Endpoint
N/A (Command-line tool)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```bash
# Basic database creation from FASTA file
diamond makedb --in proteins.fasta -d proteins

# Database creation with taxonomy support
# First download NCBI taxonomy files:
wget http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip
unzip taxdmp.zip

diamond makedb --in proteins.fasta -d proteins \
  --taxonmap prot.accession2taxid.FULL.gz \
  --taxonnodes nodes.dmp \
  --taxonnames names.dmp

# Database creation from gzip-compressed input (auto-detected)
diamond makedb --in proteins.fasta.gz -d proteins

# Database creation from stdin
cat proteins.fasta | diamond makedb -d proteins
```

### Response
#### Success Response (Output)
- **Database sequences** (integer) - Number of sequences in the database
- **Database letters** (integer) - Total number of letters (bases/amino acids) in the database
- **Database hash** (string) - Hash of the database content
- **Total time** (string) - Time taken for database creation

#### Response Example
```
Database sequences  14323
Database letters    2847635
Database hash       a1b2c3d4e5f6
Total time          0.532s
```
````

--------------------------------

### Create Small Database Index

Source: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics

Optimize small databases (<10 MB) by creating a specific seed index file. The sensitivity setting used here must match the setting for subsequent alignment runs.

```bash
diamond makeidx -d <database file>
```
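
Because the index and the later search must agree on sensitivity, it can help to keep the flag in one variable. The following is a sketch only: `run` is a hypothetical helper that prints each command unless `DRY_RUN=0` is set, `small_db` and the file names are placeholders, and `--sensitive` is just an example setting.

```bash
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

SENSITIVITY="--sensitive"   # example setting; used for both steps

# Build the seed index for the small database.
run diamond makeidx -d small_db.dmnd "$SENSITIVITY"

# Search against the indexed database with the same sensitivity.
run diamond blastp -d small_db -q queries.fasta -o results.tsv \
  --target-indexed "$SENSITIVITY"
```

With the default dry-run behaviour the two command lines are only printed, so they can be reviewed before running them with `DRY_RUN=0`.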

--------------------------------

### DIAMOND Iterative Search

Source: https://context7.com/bbuchfink/diamond/llms.txt

Perform an iterative search that runs several rounds of increasing sensitivity, up to `--ultra-sensitive` in this example. Queries are passed on to the next, more sensitive round only if no hits were found for them, which makes this mode useful when a single best hit per query is enough.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --iterate --ultra-sensitive
```

--------------------------------

### Download and Extract Linux Binary

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Use wget to download the precompiled Linux binary and tar to extract it. This binary includes multiple code paths for different CPU instruction set supports.

```bash
wget http://github.com/bbuchfink/diamond/releases/download/v2.1.24/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
```

--------------------------------

### Build Database from Multiple FASTA Files

Source: https://github.com/bbuchfink/diamond/wiki/4.-Support-&-FAQ

Use this command to build a DIAMOND database from multiple gzipped FASTA files in the current directory. It utilizes a pipe to stream the concatenated files into the `diamond makedb` command.

```bash
zcat *.fasta.gz | diamond makedb -d diamond_db
```
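
Before building the database, the same kind of pipe can be used to count the sequences that will be streamed in. This is a small sanity-check sketch, not part of the FAQ: it uses `gzip -dc` as a portable stand-in for `zcat`, and the temporary directory and two tiny FASTA files are made up for the demonstration.

```bash
# Create two small gzipped FASTA files in a scratch directory (demo data only).
demo_dir=$(mktemp -d)
printf '>seq1\nMKV\n' | gzip -c > "$demo_dir/a.fasta.gz"
printf '>seq2\nMLA\n>seq3\nMTT\n' | gzip -c > "$demo_dir/b.fasta.gz"

# Decompress all files as one stream and count FASTA header lines.
n_seqs=$(gzip -dc "$demo_dir"/*.fasta.gz | grep -c '^>')
printf 'sequences: %s\n' "$n_seqs"
```

The count can then be compared with the number of database sequences that `diamond makedb` reports.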

--------------------------------

### Use indexed search for small databases

Source: https://context7.com/bbuchfink/diamond/llms.txt

After creating an index, use the --target-indexed option for efficient searching against small databases.

```bash
diamond blastp -d small_db -q queries.fasta -o results.tsv \
  --target-indexed --sensitive
```

--------------------------------

### Download and Extract DIAMOND

Source: https://github.com/bbuchfink/diamond/wiki/Home

Use these commands to download the DIAMOND executable for Linux 64-bit and extract it.

```bash
# downloading the tool
wget http://github.com/bbuchfink/diamond/releases/download/v2.1.24/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
```

--------------------------------

### Initialize Parallel Diamond Run

Source: https://github.com/bbuchfink/diamond/wiki/6.-Distributed-computing

Use this command to initialize a parallel run by scanning query and database, and writing work chunks to a file-based stack. Requires `--multiprocessing` and `--mp-init` flags.

```bash
diamond blastp --db DATABASE.dmnd --query QUERY.fasta --multiprocessing --mp-init --tmpdir $TMPDIR --parallel-tmpdir $PTMPDIR
```

--------------------------------

### Compile Diamond with Custom GCC

Source: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics

Instructions to clone the Diamond repository, set up environment variables for the custom GCC, and compile Diamond. This ensures Diamond is built with the specified GCC version.

```bash
cd
git clone https://github.com/bbuchfink/diamond.git
cd diamond
mkdir bin
cd bin
export CC=$HOME/GCC-10.2.0/bin/gcc
export CXX=$HOME/GCC-10.2.0/bin/g++
cmake -DSTATIC_LIBGCC=ON -DSTATIC_LIBSTDC++=ON ..
make -j $(nproc --all)
```

--------------------------------

### Create a Portable DIAMOND Binary

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Compiles a portable binary that includes multiple code paths for different CPU instruction set support (AVX2, SSE4.1, SSE2).

```bash
cmake ..
make -j $(nproc --all)
make install
```

--------------------------------

### Use local temporary directory for I/O performance

Source: https://context7.com/bbuchfink/diamond/llms.txt

Improve I/O performance by specifying a local temporary directory with --tmpdir.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --tmpdir /local/tmp
```

--------------------------------

### Initialize Multi-Node Blastp Alignment

Source: https://github.com/bbuchfink/diamond/wiki/How-to-cluster-huge-datasets

Initialize a multi-node `blastp` alignment run on a head node using `--mp-init`. The `--parallel-tmpdir` must be accessible by all compute nodes. This prepares for subsequent alignment runs on worker nodes.

```bash
diamond blastp -q reps.faa -d reps -o out -f 6 qseqid sseqid corrected_bitscore --approx-id 30 --query-cover 90 -k1000 -c1 --fast --multiprocessing --mp-init --parallel-tmpdir $PTMP
```

--------------------------------

### Run built-in tests

Source: https://context7.com/bbuchfink/diamond/llms.txt

Execute the built-in test suite for DIAMOND using the 'test' command.

```bash
diamond test
```

--------------------------------

### Create a Statically Linked DIAMOND Binary

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Compiles a more easily portable binary by statically linking the GCC and C++ standard libraries.

```bash
cmake -DSTATIC_LIBGCC=ON -DSTATIC_LIBSTDC++=ON ..
make -j $(nproc --all)
make install
```

--------------------------------

### Run DIAMOND Search (blastx)

Source: https://github.com/bbuchfink/diamond/wiki/Home

Perform a sequence search in blastx mode against a DIAMOND database.

```bash
# running a search in blastx mode
./diamond blastx -d reference -q reads.fasta -o matches.tsv
```

--------------------------------

### Compile DIAMOND from Source

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Commands to compile DIAMOND from source. Requires GCC 4.8.1+, CMake 2.6+, and development headers for libpthread, sqlite3, and zlib.

```bash
cmake -DCMAKE_BUILD_MARCH=native ..
make
```

--------------------------------

### Iterative Search with Custom Sensitivity Steps

Source: https://context7.com/bbuchfink/diamond/llms.txt

Use the --iterate option to perform searches with progressively increasing sensitivity. This is useful for balancing speed and sensitivity.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --iterate "fast default sensitive very-sensitive"
```

--------------------------------

### Run DIAMOND Search (blastp)

Source: https://github.com/bbuchfink/diamond/wiki/Home

Perform a sequence search in blastp mode against a DIAMOND database.

```bash
# running a search in blastp mode
./diamond blastp -d reference -q queries.fasta -o matches.tsv
```

--------------------------------

### View DAA with custom format

Source: https://context7.com/bbuchfink/diamond/llms.txt

Specify custom output fields when viewing a DAA file using --outfmt 6 and listing desired fields.

```bash
diamond view -a results.daa --outfmt 6 qseqid sseqid pident evalue
```

--------------------------------

### Download and Use BLAST Database with DIAMOND

Source: https://github.com/bbuchfink/diamond/wiki/Home

Download and decompress a BLAST database (e.g., SwissProt) for use with DIAMOND version 2.1.14 or later. Then, perform a blastp search against it.

```bash
# downloading and using a BLAST database (use DIAMOND >= v2.1.14)
update_blastdb.pl --decompress --blastdb_version 5 swissprot
./diamond blastp -d swissprot -q queries.fasta -o matches.tsv
```

--------------------------------

### Verbose logging

Source: https://context7.com/bbuchfink/diamond/llms.txt

Enable verbose logging output with the --log option.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv --log
```

--------------------------------

### Compile DIAMOND with Native Optimizations

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Performs a native compile of DIAMOND, optimizing for the specific architecture of the build machine.

```bash
cmake -DCMAKE_BUILD_MARCH=native ..
make -j $(nproc --all)
make install
```

--------------------------------

### Extract sequences from database

Source: https://context7.com/bbuchfink/diamond/llms.txt

Retrieve all sequences from a DIAMOND database in FASTA format using the 'getseq' command.

```bash
diamond getseq -d reference.dmnd > sequences.fasta
```

--------------------------------

### Inspect realignment output

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Use the `head` command to view the first few lines of the `aln.tsv` file, which contains detailed alignment statistics between cluster representatives and their members.

```bash
head aln.tsv
```

--------------------------------

### Full Tabular Output with Taxonomy

Source: https://context7.com/bbuchfink/diamond/llms.txt

Generate comprehensive tabular output including taxonomic information. Requires a taxonomy-enabled database. Specify fields like 'staxids', 'sscinames', and 'sphylums'.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --outfmt 6 qseqid sseqid pident evalue staxids sscinames sphylums
```

--------------------------------

### Inspect cluster output

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Use the `head` command to view the first few lines of the `clusters.tsv` file, which shows cluster representatives and their member sequences.

```bash
head clusters.tsv
```

--------------------------------

### Use Small Database Index for Alignment

Source: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics

Align sequences using a pre-generated small database index. Ensure the `--target-indexed` option is used and the sensitivity setting matches the index creation.

```bash
diamond --target-indexed ...
```

--------------------------------

### Configure AVX2 Compile Options (GCC/Clang)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for AVX2 using GCC or Clang, enabling AVX and related instruction sets.

```cmake
target_compile_options(arch_avx2 PUBLIC -DDISPATCH_ARCH=ARCH_AVX2 -DARCH_ID=2 -mssse3 -mpopcnt -msse4.1 -msse4.2 -mavx -mavx2)
```

--------------------------------

### Compressed Output

Source: https://context7.com/bbuchfink/diamond/llms.txt

Enable compression for output files using the --compress 1 option. This reduces disk space usage for large result files.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv.gz --compress 1
```
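
Compressed results can be inspected without writing an uncompressed copy to disk. The sketch below assumes the output is gzip-compressed, as the `.gz` file name in the example suggests; `results_demo.tsv.gz` is a stand-in file created here for illustration.

```bash
# Stand-in for a DIAMOND result file: one tabular row, gzip-compressed.
printf 'd1dlwa_\td2gkma_\t35.4\n' | gzip -c > results_demo.tsv.gz

# Stream-decompress and look at the first row.
first_row=$(gzip -dc results_demo.tsv.gz | head -n 1)
printf '%s\n' "$first_row"

# Count the alignments in the compressed file.
n_rows=$(gzip -dc results_demo.tsv.gz | wc -l | tr -d ' ')
printf 'rows: %s\n' "$n_rows"
```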

--------------------------------

### Execute Parallel Diamond Run

Source: https://github.com/bbuchfink/diamond/wiki/6.-Distributed-computing

Run Diamond in parallel after initialization. Ensure `--parallel-tmpdir` points to the same location as used during initialization. This command distributes work across multiple compute nodes.

```bash
diamond blastp --db DATABASE.dmnd --query QUERY.fasta -o OUTPUT_FILE --multiprocessing --tmpdir $TMPDIR --parallel-tmpdir $PTMPDIR
```

--------------------------------

### Run parallel workers

Source: https://context7.com/bbuchfink/diamond/llms.txt

Step 2 of distributed computing: Launch parallel workers on compute nodes. Requires --multiprocessing.

```bash
diamond blastp --db database.dmnd --query queries.fasta \
  -o results.tsv \
  --multiprocessing \
  --tmpdir /local/tmp \
  --parallel-tmpdir /shared/diamond_work
```

--------------------------------

### Configure AVX512 Compile Options (GCC/Clang)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for AVX512 using GCC or Clang, including AVX512F and AVX512BW.

```cmake
target_compile_options(arch_avx512 PUBLIC -DDISPATCH_ARCH=ARCH_AVX512 -DARCH_ID=3 -mssse3 -mpopcnt -msse4.1 -msse4.2 -mavx -mavx2 -mavx512f -mavx512bw)
```

--------------------------------

### Configure SSE4.1 Compile Options (MSVC)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for SSE4.1 on MSVC, defining dispatch architecture and ID.

```cmake
target_compile_options(arch_sse4_1 PUBLIC -DDISPATCH_ARCH=ARCH_SSE4_1 -DARCH_ID=1 -D__SSSE3__ -D__SSE4_1__ -D__POPCNT__ /Zc:__cplusplus)
```

--------------------------------

### DAA Format Output

Source: https://context7.com/bbuchfink/diamond/llms.txt

Produce DIAMOND's proprietary alignment format (DAA) using --outfmt 100. This format is optimized for later conversion or analysis with tools like MEGAN.

```bash
diamond blastp -d reference -q queries.fasta -o results.daa --outfmt 100
```

--------------------------------

### Configure SSE4.1 Compile Options (GCC/Clang)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for SSE4.1 using GCC or Clang, specifying instruction set extensions.

```cmake
target_compile_options(arch_sse4_1 PUBLIC -DDISPATCH_ARCH=ARCH_SSE4_1 -DARCH_ID=1 -mssse3 -mpopcnt -msse4.1)
```

--------------------------------

### Combine Alignment Outputs and Perform Clustering

Source: https://github.com/bbuchfink/diamond/wiki/How-to-cluster-huge-datasets

Combine the `blastp` alignment output files into a single TSV and index the representative FASTA file. Then use `greedy-vertex-cover` to cluster based on the alignment results. The `--edge-format triplet` option assumes that each line of the edge file (`out.tsv`) contains a source, a target, and a score.

```bash
cat out_* > out.tsv
```

```bash
samtools faidx reps.faa
```

```bash
diamond greedy-vertex-cover --edges out.tsv -d reps.faa.fai --edge-format triplet -o clusters_round_2.tsv --connected-component-depth 0
```

--------------------------------

### Convert DIAMOND Output to DuckDB Database via Pipe

Source: https://github.com/bbuchfink/diamond/wiki/File-formats

Pipe DIAMOND's output directly into DuckDB to create a table named 'alignments' in the specified database. This is an efficient way to load data without intermediate files. Adjust memory and thread settings as needed.

```bash
diamond PARAMETERS | duckdb DATABASE_NAME -c "SET memory_limit='16GB'; SET threads=16; create table alignments as select * from read_csv_auto('/dev/stdin', delim='\t', header=true, parallel=true)"
```

--------------------------------

### Custom Cascaded Clustering Steps

Source: https://context7.com/bbuchfink/diamond/llms.txt

Define a custom sequence of clustering steps, combining different algorithms like 'faster_lin', 'fast', and 'sensitive'. This allows for fine-tuning the clustering process.

```bash
diamond cluster -d proteins.fasta -o clusters.tsv \
  --approx-id 30 --cluster-steps "faster_lin fast default sensitive" -M 64G
```

--------------------------------

### Inspect Search Output

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

View the beginning of the output file to inspect the pairwise alignments. The output is in TSV format.

```bash
head out.tsv
```

--------------------------------

### Add galaxy_7 blastx Test

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets up a blastx test named 'galaxy_7' with specific parameters for database, query, output format, and various BLAST options. This test is configured for detailed output and specific alignment parameters.

```cmake
add_test(NAME galaxy_7 COMMAND ${CMAKE_COMMAND} -DNAME=galaxy_7 "-DARGS=blastx --threads 2 --db ${TD}/galaxy/db.dmnd --query ${TD}/galaxy/nucleotide.fasta --query-gencode 1 --strand both --min-orf 1 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore --header 0 --compress 0 --matrix BLOSUM62 --comp-based-stats 1 --masking tantan --max-target-seqs 25 --evalue 0.001 --id 0.0 --approx-id 0.0 --query-cover 0.0 --subject-cover 0.0 --block-size 2.0 --motif-masking 0 --soft-masking 0  --swipe --algo 0 --index-chunks 4 --file-buffer-size 67108864" ${SP})
```

--------------------------------

### Protein Alignment with `diamond blastp`

Source: https://context7.com/bbuchfink/diamond/llms.txt

Align protein queries against a reference database using `diamond blastp`. Supports various sensitivity modes, custom output fields, iterative search, taxonomy filtering, and direct use of BLAST databases. Parallel processing is available.

```bash
# Basic protein alignment
diamond blastp -d reference -q queries.fasta -o results.tsv
```

```bash
# High-sensitivity alignment for detecting remote homologs (<40% identity)
diamond blastp -d reference -q queries.fasta -o results.tsv --very-sensitive
```

```bash
# Fast mode for high-identity hits (>90% identity)
diamond blastp -d reference -q queries.fasta -o results.tsv --fast
```

```bash
# Alignment with custom output fields
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --outfmt 6 qseqid sseqid pident length evalue bitscore qcovhsp stitle
```

```bash
# Using iterative search for better performance when only best hit needed
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --iterate --top 1
```

```bash
# Alignment with taxonomy filtering (requires taxonomy-enabled database)
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --taxonlist 9606,10090  # Only search against human and mouse sequences
```

```bash
# Using BLAST database directly (no makedb needed)
diamond blastp -d swissprot -q queries.fasta -o results.tsv
```

```bash
# Parallel processing with specified threads and memory control
diamond blastp -d reference -q queries.fasta -o results.tsv \
  -p 16 -b 8.0 -c 1
```

```bash
# Output in JSON format
diamond blastp -d reference -q queries.fasta -o results.json \
  --outfmt 104 qseqid sseqid pident evalue
```
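
Once a search has written tabular output, a common follow-up is to keep only the strongest hit per query. The sketch below assumes the eight-column layout from the custom-fields example above (`qseqid sseqid pident length evalue bitscore qcovhsp stitle`). The sample rows are invented for illustration, and `best_hits` is a hypothetical helper, not part of DIAMOND.

```bash
# Hypothetical helper: keep the highest-bitscore row for each query.
# Assumes tab-separated columns in this order:
#   qseqid sseqid pident length evalue bitscore qcovhsp stitle
best_hits() {
  awk -F'\t' '
    !($1 in best) || $6 + 0 > score[$1] { best[$1] = $0; score[$1] = $6 + 0 }
    !seen[$1]++ { order[++n] = $1 }   # remember the order queries first appear in
    END { for (i = 1; i <= n; i++) print best[order[i]] }
  ' "$1"
}

# Invented sample data standing in for results.tsv.
sample=$(mktemp)
printf 'q1\ts1\t98.0\t120\t1e-60\t230\t100\tprotein A\n'  > "$sample"
printf 'q1\ts2\t75.5\t118\t1e-40\t160\t97\tprotein B\n'  >> "$sample"
printf 'q2\ts3\t60.2\t200\t1e-30\t140\t88\tprotein C\n'  >> "$sample"
printf 'q2\ts4\t91.0\t210\t1e-80\t310\t95\tprotein D\n'  >> "$sample"

# Show query, subject, and bitscore of each retained hit.
best_hits "$sample" | cut -f1,2,6
```

Because `stitle` can contain spaces, the helper splits on tabs only. If the output was written with a header line, skip it first (for example with `tail -n +2`).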

--------------------------------

### DIAMOND Faster Sensitivity Mode

Source: https://context7.com/bbuchfink/diamond/llms.txt

The `--faster` mode trades sensitivity for speed relative to the default mode. In DIAMOND's ordering of sensitivity modes it sits below `--fast`, making it the fastest and least sensitive setting.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv --faster
```

--------------------------------

### Fast Linear-Time Clustering

Source: https://context7.com/bbuchfink/diamond/llms.txt

Run `diamond linclust` for fast, linear-time clustering; it is recommended for large datasets at high identity thresholds (e.g., >50%). Specify the input database, the output file, the approximate identity threshold (`--approx-id`), and the memory limit (`-M`).

```bash
diamond linclust -d proteins.fasta -o clusters.tsv --approx-id 50 -M 64G
```
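
The clustering output is a two-column, tab-separated table with the representative accession in the first column and the member accession in the second. Assuming that layout, and assuming each representative is also listed as a member of its own cluster, the sketch below tallies cluster sizes with `awk`. The accessions are invented sample data and `cluster_sizes` is a hypothetical helper name.

```bash
# Hypothetical helper: print "<representative><TAB><member count>" per cluster,
# largest cluster first. Assumes column 1 = representative, column 2 = member.
cluster_sizes() {
  awk -F'\t' '{ n[$1]++ } END { for (c in n) printf "%s\t%d\n", c, n[c] }' "$1" |
    sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Invented sample data standing in for clusters.tsv.
clusters=$(mktemp)
printf 'repA\trepA\nrepA\tm1\nrepA\tm2\nrepB\trepB\nrepC\trepC\nrepC\tm3\n' > "$clusters"

cluster_sizes "$clusters"
# prints (tab-separated):
# repA  3
# repC  2
# repB  1
```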

--------------------------------

### Configure AVX2 Compile Options (MSVC)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for AVX2 on MSVC, including architecture-specific flags.

```cmake
target_compile_options(arch_avx2 PUBLIC -DDISPATCH_ARCH=ARCH_AVX2 -DARCH_ID=2 /arch:AVX2 -D__SSSE3__ -D__SSE4_1__ -D__POPCNT__ /Zc:__cplusplus)
```

--------------------------------

### Convert DAA to Tabular Format

Source: https://context7.com/bbuchfink/diamond/llms.txt

Convert DAA files to tabular format using the 'diamond view' command. Specify the output format with --outfmt 6.

```bash
diamond view -a results.daa -o results.tsv --outfmt 6
```

--------------------------------

### Global Ranking for Memory-Efficient Best-Hit Searches

Source: https://context7.com/bbuchfink/diamond/llms.txt

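Use `--global-ranking` for memory-efficient best-hit searches on large datasets. The value (25 in this example) sets how many top-ranked targets per query are passed on to the extension stage; raise or lower it to trade sensitivity against runtime.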

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --global-ranking 25 --very-sensitive
```

--------------------------------

### Convert DIAMOND Output to Parquet via Pipe

Source: https://github.com/bbuchfink/diamond/wiki/File-formats

Pipe DIAMOND's tabular output directly into DuckDB to create a Parquet file without writing an intermediate TSV file. DuckDB reads the pipe through '/dev/stdin'; because the query uses `header=true`, make sure DIAMOND writes a header line.

```bash
diamond PARAMETERS | duckdb -c "SET memory_limit='16GB'; SET threads=16; COPY(select * from read_csv_auto('/dev/stdin', delim='\t', header=true, parallel=true)) TO 'output.parquet' WITH (FORMAT 'PARQUET')"
```

--------------------------------

### Convert DAA to XML Format

Source: https://context7.com/bbuchfink/diamond/llms.txt

Convert DAA files to BLAST XML format using the 'diamond view' command. Specify the output format with --outfmt 5.

```bash
diamond view -a results.daa -o results.xml --outfmt 5
```

--------------------------------

### Add DIAMOND Test Command

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Registers a test named `diamond` that runs `diamond test`, DIAMOND's built-in self-test, to verify that the compiled binary executes correctly.

```cmake
add_test(NAME diamond COMMAND diamond test)
```

--------------------------------

### Add NEON Library with Compiler Flag

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Adds a library for NEON support on ARM when the compiler supports '-mfpu=neon'.

```cmake
add_library(arch_neon OBJECT ${DISPATCH_OBJECTS})
target_compile_options(arch_neon PUBLIC -DDISPATCH_ARCH=ARCH_NEON -DARCH_ID=4 -D__ARM_NEON -mfpu=neon)
```

--------------------------------

### Convert TSV to Parquet using DuckDB CLI

Source: https://github.com/bbuchfink/diamond/wiki/File-formats

Use this command to convert a local TSV file to a Parquet file. Ensure the TSV file has a header and is tab-delimited. Adjust memory and thread settings as needed.

```bash
duckdb -c "SET memory_limit='16GB'; SET threads=16; COPY(select * from read_csv_auto('input.tsv', delim='\t', header=true, parallel=true)) TO 'output.parquet' WITH (FORMAT 'PARQUET')"
```
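
The DuckDB command above reads with `header=true`, so a TSV written without a header line would lose its first row to the column names. As a sketch, assuming the file uses the default 12 columns of `--outfmt 6`, a header can be prepended before conversion. `add_header` is a hypothetical helper and the sample row is invented.

```bash
# Default column order of BLAST-style tabular output (--outfmt 6 with no field list).
HEADER='qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore'

# Hypothetical helper: emit a tab-separated header line, then the file contents.
add_header() {
  printf '%s\n' "$HEADER" | tr ' ' '\t'
  cat "$1"
}

# Invented one-row sample standing in for a headerless DIAMOND result file.
raw=$(mktemp)
printf 'q1\ts1\t98.0\t120\t2\t0\t1\t120\t5\t124\t1e-60\t230\n' > "$raw"

# Write the headed copy; this is the file to pass to read_csv_auto.
with_header=$(mktemp)
add_header "$raw" > "$with_header"

# Sanity check: count the columns in the header line.
head -n 1 "$with_header" | awk -F'\t' '{ print NF }'
```

If the search used a custom field list, set `HEADER` to that same list so the names match the columns.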

--------------------------------

### Cluster protein sequences

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Run DIAMOND to cluster protein sequences from a FASTA file. Specify approximate identity, memory, and output file. The `--header` option adds a header line to the output.

```bash
diamond cluster -d astral-scopedom-seqres-gd-sel-gs-bib-95-2.07.fa -o clusters.tsv \
  --approx-id 40 -M 64G --header
```

--------------------------------

### Configure AVX512 Compile Options (MSVC)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for AVX512 on MSVC, enabling advanced vector extensions.

```cmake
target_compile_options(arch_avx512 PUBLIC -DDISPATCH_ARCH=ARCH_AVX512 -DARCH_ID=3 /arch:AVX512 -D__SSSE3__ -D__SSE4_1__ -D__POPCNT__)
```

--------------------------------

### Custom Tabular Fields for Performance

Source: https://context7.com/bbuchfink/diamond/llms.txt

Select only required fields using --outfmt 6 to improve performance by avoiding unnecessary computation. Specify fields after the format code.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
  --outfmt 6 qseqid sseqid evalue bitscore
```
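
Because the field list determines the column order, downstream scripts need to know which column holds which value. The sketch below resolves a column index from the same field list passed to `--outfmt 6`. `FIELDS`, `col_index`, and the sample rows are illustrative assumptions, not part of DIAMOND.

```bash
# The same field list that was given to --outfmt 6 in the command above.
FIELDS='qseqid sseqid evalue bitscore'

# Hypothetical helper: print the 1-based position of a field name in $FIELDS,
# or return non-zero if the name is not in the list.
col_index() {
  i=1
  for f in $FIELDS; do
    [ "$f" = "$1" ] && { echo "$i"; return 0; }
    i=$((i + 1))
  done
  return 1
}

# Invented sample data standing in for results.tsv.
results=$(mktemp)
printf 'q1\ts1\t1e-20\t95.5\nq1\ts2\t1e-50\t180\nq2\ts3\t1e-5\t42.1\n' > "$results"

# Keep rows whose bitscore is at least 90, using the looked-up column index.
awk -F'\t' -v c="$(col_index bitscore)" '$c >= 90' "$results"
```

Looking the index up by name keeps the filter correct if the field list is later reordered or extended.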