### Install DIAMOND on FreeBSD

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Install DIAMOND on FreeBSD systems using the pkg package manager.

```bash
pkg install diamond
```

--------------------------------

### Install DIAMOND to Home Directory

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Compiles and installs DIAMOND to the 'bin/' folder within your home directory, avoiding the need for sudo rights.

```bash
cmake -DCMAKE_INSTALL_PREFIX=$HOME ..
make -j $(nproc --all)
make install
```

--------------------------------

### Example Realign Output

Source: https://context7.com/bbuchfink/diamond/llms.txt

This example demonstrates the output format for the `realign` command, detailing the alignment statistics between cluster centroids and members.

```text
# Example realign output:
# cseqid mseqid approx_pident cstart cend mstart mend evalue bitscore
# d4i0va_ d1dlya_ 47.1 2 122 1 117 9.61e-31 105
```

--------------------------------

### Print DIAMOND version

Source: https://context7.com/bbuchfink/diamond/llms.txt

Display the installed version of DIAMOND.

```bash
diamond version
```

--------------------------------

### Install DIAMOND via Homebrew

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Install DIAMOND on macOS using the Homebrew package manager.

```bash
brew install diamond
```

--------------------------------

### DIAMOND Clustering Command Examples

Source: https://github.com/bbuchfink/diamond/wiki/Clustering

Examples of using diamond linclust, cluster, and deepclust for different clustering sensitivities and configurations. Use these to perform fast, sensitive, or deep clustering of protein sequences.

```bash
# fast clustering with linear scaling
diamond linclust -d INPUT_FILE -o OUTPUT_FILE --approx-id 30 -M 64G
```

```bash
# sensitive clustering using all-vs-all alignment
diamond cluster -d INPUT_FILE -o OUTPUT_FILE --approx-id 30 -M 64G
```

```bash
# deep clustering (no identity cutoff)
diamond deepclust -d INPUT_FILE -o OUTPUT_FILE -M 64G
```

--------------------------------

### Install DIAMOND from Source (Debian-based)

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Installs the necessary dependencies, downloads and extracts the v2.1.24 source, and compiles DIAMOND in a `bin` build directory inside the extracted diamond-2.1.24 directory.

```bash
sudo apt install g++ automake cmake zlib1g-dev libzstd-dev libsqlite3-dev
wget http://github.com/bbuchfink/diamond/archive/v2.1.24.tar.gz
tar xzf v2.1.24.tar.gz
cd diamond-2.1.24
mkdir bin
cd bin
cmake ..
make -j $(nproc --all)
sudo make install
```

--------------------------------

### Frameshift Alignment Example

Source: https://context7.com/bbuchfink/diamond/llms.txt

This example illustrates the output format for frameshift alignments, indicating insertions or deletions that shift the reading frame. Note the use of '\' for +1 and '/' for -1 frameshifts in the direction of translation.

```text
# Example frameshift alignment output (pairwise format):
# Query= read_001
# Length=1500
#
# >protein_xyz
# Score = 245 bits (626), Expect = 1e-65
# Frame = +1
#
# Query  1  MKFLILLFNILCLFPVLAADRVRVVTGAVIGTAVS/K  108
#           MKFLILLFNILCLFPVLAADRVRVVTGAVIG AVS K
# Sbjct  1  MKFLILLFNILCLFPVLAADRVRVVTGAVIGLAVSVK  37
# (Note: '\' indicates +1 frameshift, '/' indicates -1 frameshift)
```

--------------------------------

### Get database information

Source: https://context7.com/bbuchfink/diamond/llms.txt

Display information about a DIAMOND database using the 'dbinfo' command.

```bash
diamond dbinfo -d reference.dmnd
```

--------------------------------

### Download SCOPe Database

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Use wget to download the SCOPe database in FASTA format. Ensure you have wget installed.

```bash
wget https://scop.berkeley.edu/downloads/scopeseq-2.07/astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa
```

--------------------------------

### Install DIAMOND via Bioconda

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Install DIAMOND using the Bioconda channel. It is crucial to include '-c conda-forge' and specify the version to avoid installing outdated versions.

```bash
conda install -c bioconda -c conda-forge diamond=2.1.24
```

--------------------------------

### Example Cluster Output

Source: https://context7.com/bbuchfink/diamond/llms.txt

This is an example of the output format for protein clustering, showing the centroid (representative) and its associated member sequences.

```text
# Example cluster output:
# centroid member
# d1dlwa_ d1dlwa_
# d2gkma_ d2gkma_
# d4i0va_ d1dlya_
# d4i0va_ d1s69a_
# d4i0va_ d4i0va_
```

--------------------------------

### Run DIAMOND Fast Clustering (linclust)

Source: https://github.com/bbuchfink/diamond/wiki/Home

Perform fast clustering with linear scaling using DIAMOND's linclust command. This example uses a 30% identity threshold and allocates 64GB of memory.

```bash
# running fast clustering with linear scaling (30% identity threshold)
diamond linclust -d reference.fasta -o clusters.tsv --approx-id 30 -M 64G
```

--------------------------------

### Run DIAMOND Sensitive Clustering (cluster)

Source: https://github.com/bbuchfink/diamond/wiki/Home

Perform sensitive clustering using all-vs-all alignment with DIAMOND's cluster command. This example uses a 30% identity threshold and allocates 64GB of memory.

```bash
# running sensitive clustering using all-vs-all alignment (30% identity threshold)
diamond cluster -d reference.fasta -o clusters.tsv --approx-id 30 -M 64G
```

--------------------------------

### Compile DIAMOND with Zstd Support

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Enables Zstd compression support during compilation. Requires the zstd development library to be installed on the system.

```bash
cmake -DWITH_ZSTD=ON ..
make -j $(nproc --all)
make install
```

--------------------------------

### SLURM Batch Script for Parallel Diamond

Source: https://github.com/bbuchfink/diamond/wiki/6.-Distributed-computing

Example SLURM batch script for launching a massively parallel DIAMOND run on a supercomputer. Adjust the SLURM parameters to your machine and job requirements, and replace `FLAGS` with the appropriate parallel flags for DIAMOND.

```bash
#!/bin/bash -l
#SBATCH -D ./
#SBATCH -J DIAMOND
#SBATCH --mem=185000
#SBATCH --nodes=520
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks-per-core=2
#SBATCH --cpus-per-task=80
#SBATCH --mail-type=none
#SBATCH --time=24:00:00

module purge
module load gcc impi
export SLURM_HINT=multithread

srun diamond FLAGS
```

--------------------------------

### Example TSV Output Lines

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

These lines are sample output from the DIAMOND search, showing pairwise alignments in TSV format. The twelve columns follow the default BLAST tabular layout: query ID, target ID, percent identity, alignment length, mismatches, gap openings, query start and end, target start and end, E-value, and bit score.

```plaintext
d1dlwa_	d1dlwa_	100	116	0	0	1	116	1	116	6.42e-77	220
d1dlwa_	d2gkma_	35.4	113	73	0	1	113	13	125	1.43e-21	80.9
d1dlwa_	d4i0va_	31.9	119	75	2	1	113	2	120	9.11e-13	58.2
d2gkma_	d2gkma_	100	127	0	0	1	127	1	127	1.51e-87	248
d2gkma_	d1dlwa_	34.8	115	75	0	13	127	1	115	6.90e-23	84.3
d2gkma_	d4i0va_	33.6	110	69	1	13	118	2	111	1.35e-18	73.6
d2gkma_	d6bmea_	35.5	110	67	1	13	118	2	111	1.32e-16	68.6
d2gkma_	d2bkma_	37.3	67	38	2	13	76	5	70	5.18e-06	40.8
d1ngka_	d1ngka_	100	126	0	0	1	126	1	126	4.34e-91	257
d1ngka_	d2bkma_	38.4	125	73	2	1	125	4	124	1.42e-24	89.0
```

--------------------------------

### SLURM batch script for parallel execution

Source: https://context7.com/bbuchfink/diamond/llms.txt

Example SLURM batch script to launch parallel DIAMOND jobs across multiple nodes. Ensure --parallel-tmpdir is a shared file system path.

```bash
#!/bin/bash -l
#SBATCH -J DIAMOND
#SBATCH --nodes=100
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40
#SBATCH --mem=180G
#SBATCH --time=24:00:00

srun diamond blastp --db database.dmnd --query queries.fasta \
    -o results \
    --multiprocessing \
    --tmpdir /local/tmp \
    --parallel-tmpdir /shared/diamond_work
```

--------------------------------

### Print general help

Source: https://context7.com/bbuchfink/diamond/llms.txt

Show general help information for DIAMOND.

```bash
diamond help
```

--------------------------------

### Create DIAMOND Binary Database

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Set up a binary DIAMOND database file from a FASTA file. This prepares the database for subsequent searches.

```bash
diamond makedb --in astral-scopedom-seqres-gd-sel-gs-bib-40-2.07.fa -d astral40
```

--------------------------------

### Create index for small database optimization

Source: https://context7.com/bbuchfink/diamond/llms.txt

For small databases, create an index first using 'diamond makeidx'. This is a prerequisite for using the --target-indexed option.

```bash
diamond makeidx -d small_db.dmnd --sensitive
```

--------------------------------

### Create DIAMOND Database

Source: https://github.com/bbuchfink/diamond/wiki/Home

Create a DIAMOND-formatted database from a reference FASTA file.

```bash
# creating a diamond-formatted database file
./diamond makedb --in reference.fasta -d reference
```

--------------------------------

### Compile GCC

Source: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics

Steps to download, configure, and compile GCC from source. Ensure you have sufficient resources for compilation.

```bash
cd
wget ftp://ftp.gnu.org/gnu/gcc/gcc-10.2.0/gcc-10.2.0.tar.gz
tar xzf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./contrib/download_prerequisites
cd ..
mkdir objdir
cd objdir
$PWD/../gcc-10.2.0/configure --prefix=$HOME/GCC-10.2.0 --disable-multilib --disable-bootstrap --enable-languages=c++
make -j $(nproc --all)
make install
```

--------------------------------

### Print blastp command help

Source: https://context7.com/bbuchfink/diamond/llms.txt

Display detailed help information specifically for the 'blastp' command.

```bash
diamond blastp --help
```

--------------------------------

### Initialize parallel run

Source: https://context7.com/bbuchfink/diamond/llms.txt

Step 1 of distributed computing: Initialize the parallel run on a login node. Requires --multiprocessing and --mp-init.

```bash
diamond blastp --db database.dmnd --query queries.fasta \
    --multiprocessing --mp-init \
    --tmpdir /local/tmp \
    --parallel-tmpdir /shared/diamond_work \
    -b 2.0
```

--------------------------------

### Create DIAMOND Database

Source: https://context7.com/bbuchfink/diamond/llms.txt

Use `diamond makedb` to create a binary database from FASTA files. Supports optional taxonomy mapping for classification. Input can be gzipped or piped via stdin.

```bash
diamond makedb --in proteins.fasta -d proteins
```

```bash
# Database creation with taxonomy support
# First download NCBI taxonomy files:
wget http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip
unzip taxdmp.zip

diamond makedb --in proteins.fasta -d proteins \
    --taxonmap prot.accession2taxid.FULL.gz \
    --taxonnodes nodes.dmp \
    --taxonnames names.dmp
```

```bash
diamond makedb --in proteins.fasta.gz -d proteins
```

```bash
cat proteins.fasta | diamond makedb -d proteins
```

--------------------------------

### Download FASTA file

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Use wget to download the Astral-Scopedom FASTA file for clustering. This is the input for the DIAMOND clustering command.

```bash
wget https://scop.berkeley.edu/downloads/scopeseq-2.07/astral-scopedom-seqres-gd-sel-gs-bib-95-2.07.fa
```

--------------------------------

### Pull Specific Docker Container Version

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Specify a version number when pulling the Docker container to get a particular release of DIAMOND.

```bash
docker pull buchfink/diamond:version2.1.24
```

--------------------------------

### Algorithm selection for small query files

Source: https://context7.com/bbuchfink/diamond/llms.txt

Choose the algorithm with --algo ctg for processing small query files.

```bash
diamond blastp -d reference -q small_queries.fasta -o results.tsv --algo ctg
```

--------------------------------

### Get Alignments for Cluster Members

Source: https://context7.com/bbuchfink/diamond/llms.txt

Retrieve detailed alignments between cluster members and their representatives. This requires specifying the database, cluster file, output file, memory, and optionally a header.

```bash
diamond realign -d proteins.fasta --clusters clusters.tsv \
    -o alignments.tsv -M 64G --header
```

--------------------------------

### Database Creation with makedb

Source: https://context7.com/bbuchfink/diamond/llms.txt

The `makedb` command creates a DIAMOND-formatted binary database from FASTA input files. This database is required for alignment workflows and can optionally include taxonomy mapping.

```APIDOC
## Database Creation with makedb

### Description
The `makedb` command creates a DIAMOND-formatted binary database from FASTA input files. This database format enables efficient indexed searching and is required before running alignment workflows. The command supports optional taxonomy mapping for taxonomic classification features.

### Method
`diamond makedb`

### Endpoint
N/A (Command-line tool)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```bash
# Basic database creation from FASTA file
diamond makedb --in proteins.fasta -d proteins

# Database creation with taxonomy support
# First download NCBI taxonomy files:
wget http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget http://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip
unzip taxdmp.zip

diamond makedb --in proteins.fasta -d proteins \
    --taxonmap prot.accession2taxid.FULL.gz \
    --taxonnodes nodes.dmp \
    --taxonnames names.dmp

# Database creation from gzip-compressed input (auto-detected)
diamond makedb --in proteins.fasta.gz -d proteins

# Database creation from stdin
cat proteins.fasta | diamond makedb -d proteins
```

### Response
#### Success Response (Output)
- **Database sequences** (integer) - Number of sequences in the database
- **Database letters** (integer) - Total number of letters (bases/amino acids) in the database
- **Database hash** (string) - Hash of the database content
- **Total time** (string) - Time taken for database creation

#### Response Example
```
Database sequences  14323
Database letters  2847635
Database hash  a1b2c3d4e5f6
Total time  0.532s
```
```

--------------------------------

### Create Small Database Index

Source: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics

Optimize small databases (<10 MB) by creating a specific seed index file. The sensitivity setting used here must match the setting for subsequent alignment runs.

```bash
diamond makeidx -d
```

--------------------------------

### DIAMOND Iterative Search

Source: https://context7.com/bbuchfink/diamond/llms.txt

Perform an iterative search that runs multiple rounds with increasing sensitivity, up to the requested level (here `--ultra-sensitive`). A query is searched again in a more sensitive round only if the previous round found no significant hit, which is useful for finding the single best hit across varying similarity levels.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --iterate --ultra-sensitive
```

--------------------------------

### Download and Extract Linux Binary

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Use wget to download the precompiled Linux binary and tar to extract it. This binary includes multiple code paths for different CPU instruction set supports.

```bash
wget http://github.com/bbuchfink/diamond/releases/download/v2.1.24/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
```

--------------------------------

### Build Database from Multiple FASTA Files

Source: https://github.com/bbuchfink/diamond/wiki/4.-Support-&-FAQ

Use this command to build a DIAMOND database from multiple gzipped FASTA files in the current directory. It utilizes a pipe to stream the concatenated files into the `diamond makedb` command.

```bash
zcat *.fasta.gz | diamond makedb -d diamond_db
```

--------------------------------

### Use indexed search for small databases

Source: https://context7.com/bbuchfink/diamond/llms.txt

After creating an index, use the --target-indexed option for efficient searching against small databases.

```bash
diamond blastp -d small_db -q queries.fasta -o results.tsv \
    --target-indexed --sensitive
```

--------------------------------

### Download and Extract DIAMOND

Source: https://github.com/bbuchfink/diamond/wiki/Home

Use these commands to download the DIAMOND executable for Linux 64-bit and extract it.

```bash
# downloading the tool
wget http://github.com/bbuchfink/diamond/releases/download/v2.1.24/diamond-linux64.tar.gz
tar xzf diamond-linux64.tar.gz
```

--------------------------------

### Initialize Parallel Diamond Run

Source: https://github.com/bbuchfink/diamond/wiki/6.-Distributed-computing

Use this command to initialize a parallel run by scanning query and database, and writing work chunks to a file-based stack. Requires `--multiprocessing` and `--mp-init` flags.

```bash
diamond blastp --db DATABASE.dmnd --query QUERY.fasta --multiprocessing --mp-init --tmpdir $TMPDIR --parallel-tmpdir $PTMPDIR
```

--------------------------------

### Compile Diamond with Custom GCC

Source: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics

Instructions to clone the Diamond repository, set up environment variables for the custom GCC, and compile Diamond. This ensures Diamond is built with the specified GCC version.

```bash
cd
git clone https://github.com/bbuchfink/diamond.git
cd diamond
mkdir bin
cd bin
export CC=$HOME/GCC-10.2.0/bin/gcc
export CXX=$HOME/GCC-10.2.0/bin/g++
cmake -DSTATIC_LIBGCC=ON -DSTATIC_LIBSTDC++=ON ..
make -j $(nproc --all)
```

--------------------------------

### Create a Portable DIAMOND Binary

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Compiles a portable binary that includes multiple code paths for different CPU instruction set support (AVX2, SSE4.1, SSE2).

```bash
cmake ..
make -j $(nproc --all)
make install
```

--------------------------------

### Use local temporary directory for I/O performance

Source: https://context7.com/bbuchfink/diamond/llms.txt

Improve I/O performance by specifying a local temporary directory with --tmpdir.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --tmpdir /local/tmp
```

--------------------------------

### Initialize Multi-Node Blastp Alignment

Source: https://github.com/bbuchfink/diamond/wiki/How-to-cluster-huge-datasets

Initialize a multi-node `blastp` alignment run on a head node using `--mp-init`. The `--parallel-tmpdir` must be accessible by all compute nodes. This prepares for subsequent alignment runs on worker nodes.

```bash
diamond blastp -q reps.faa -d reps -o out -f 6 qseqid sseqid corrected_bitscore --approx-id 30 --query-cover 90 -k1000 -c1 --fast --multiprocessing --mp-init --parallel-tmpdir $PTMP
```

--------------------------------

### Run built-in tests

Source: https://context7.com/bbuchfink/diamond/llms.txt

Execute the built-in test suite for DIAMOND using the 'test' command.

```bash
diamond test
```

--------------------------------

### Create a Statically Linked DIAMOND Binary

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Compiles a more easily portable binary by statically linking the GCC and C++ standard libraries.

```bash
cmake -DSTATIC_LIBGCC=ON -DSTATIC_LIBSTDC++=ON ..
make -j $(nproc --all)
make install
```

--------------------------------

### Run DIAMOND Search (blastx)

Source: https://github.com/bbuchfink/diamond/wiki/Home

Perform a sequence search in blastx mode against a DIAMOND database.

```bash
# running a search in blastx mode
./diamond blastx -d reference -q reads.fasta -o matches.tsv
```

--------------------------------

### Compile DIAMOND from Source

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Commands to compile DIAMOND from source.
Requires GCC 4.8.1+, CMake 2.6+, and development headers for libpthread, sqlite3, and zlib.

```bash
cmake -DCMAKE_BUILD_MARCH=native ..
make
```

--------------------------------

### Iterative Search with Custom Sensitivity Steps

Source: https://context7.com/bbuchfink/diamond/llms.txt

Use the --iterate option to perform searches with progressively increasing sensitivity. This is useful for balancing speed and sensitivity.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --iterate "fast default sensitive very-sensitive"
```

--------------------------------

### Run DIAMOND Search (blastp)

Source: https://github.com/bbuchfink/diamond/wiki/Home

Perform a sequence search in blastp mode against a DIAMOND database.

```bash
# running a search in blastp mode
./diamond blastp -d reference -q queries.fasta -o matches.tsv
```

--------------------------------

### View DAA with custom format

Source: https://context7.com/bbuchfink/diamond/llms.txt

Specify custom output fields when viewing a DAA file using --outfmt 6 and listing desired fields.

```bash
diamond view -a results.daa --outfmt 6 qseqid sseqid pident evalue
```

--------------------------------

### Download and Use BLAST Database with DIAMOND

Source: https://github.com/bbuchfink/diamond/wiki/Home

Download and decompress a BLAST database (e.g., SwissProt) for use with DIAMOND version 2.1.14 or later. Then, perform a blastp search against it.

```bash
# downloading and using a BLAST database (use DIAMOND >= v2.1.14)
update_blastdb.pl --decompress --blastdb_version 5 swissprot
./diamond blastp -d swissprot -q queries.fasta -o matches.tsv
```

--------------------------------

### Verbose logging

Source: https://context7.com/bbuchfink/diamond/llms.txt

Enable verbose logging output with the --log option.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv --log
```

--------------------------------

### Compile DIAMOND with Native Optimizations

Source: https://github.com/bbuchfink/diamond/wiki/2.-Installation

Performs a native compile of DIAMOND, optimizing for the specific architecture of the build machine.

```bash
cmake -DCMAKE_BUILD_MARCH=native ..
make -j $(nproc --all)
make install
```

--------------------------------

### Extract sequences from database

Source: https://context7.com/bbuchfink/diamond/llms.txt

Retrieve all sequences from a DIAMOND database in FASTA format using the 'getseq' command.

```bash
diamond getseq -d reference.dmnd > sequences.fasta
```

--------------------------------

### Inspect realignment output

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Use the `head` command to view the first few lines of the `aln.tsv` file, which contains detailed alignment statistics between cluster representatives and their members.

```bash
head aln.tsv
```

--------------------------------

### Full Tabular Output with Taxonomy

Source: https://context7.com/bbuchfink/diamond/llms.txt

Generate comprehensive tabular output including taxonomic information. Requires a taxonomy-enabled database. Specify fields like 'staxids', 'sscinames', and 'sphylums'.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --outfmt 6 qseqid sseqid pident evalue staxids sscinames sphylums
```

--------------------------------

### Inspect cluster output

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Use the `head` command to view the first few lines of the `clusters.tsv` file, which shows cluster representatives and their member sequences.

```bash
head clusters.tsv
```

--------------------------------

### Use Small Database Index for Alignment

Source: https://github.com/bbuchfink/diamond/wiki/5.-Advanced-topics

Align sequences using a pre-generated small database index.
Ensure the `--target-indexed` option is used and the sensitivity setting matches the index creation.

```bash
diamond --target-indexed ...
```

--------------------------------

### Configure AVX2 Compile Options (GCC/Clang)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for AVX2 using GCC or Clang, enabling AVX and related instruction sets.

```cmake
target_compile_options(arch_avx2 PUBLIC -DDISPATCH_ARCH=ARCH_AVX2 -DARCH_ID=2 -mssse3 -mpopcnt -msse4.1 -msse4.2 -mavx -mavx2)
```

--------------------------------

### Compressed Output

Source: https://context7.com/bbuchfink/diamond/llms.txt

Enable compression for output files using the --compress 1 option. This reduces disk space usage for large result files.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv.gz --compress 1
```

--------------------------------

### Execute Parallel Diamond Run

Source: https://github.com/bbuchfink/diamond/wiki/6.-Distributed-computing

Run Diamond in parallel after initialization. Ensure `--parallel-tmpdir` points to the same location as used during initialization. This command distributes work across multiple compute nodes.

```bash
diamond blastp --db DATABASE.dmnd --query QUERY.fasta -o OUTPUT_FILE --multiprocessing --tmpdir $TMPDIR --parallel-tmpdir $PTMPDIR
```

--------------------------------

### Run parallel workers

Source: https://context7.com/bbuchfink/diamond/llms.txt

Step 2 of distributed computing: Launch parallel workers on compute nodes. Requires --multiprocessing.

```bash
diamond blastp --db database.dmnd --query queries.fasta \
    -o results.tsv \
    --multiprocessing \
    --tmpdir /local/tmp \
    --parallel-tmpdir /shared/diamond_work
```

--------------------------------

### Configure AVX512 Compile Options (GCC/Clang)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for AVX512 using GCC or Clang, including AVX512F and AVX512BW.

```cmake
target_compile_options(arch_avx512 PUBLIC -DDISPATCH_ARCH=ARCH_AVX512 -DARCH_ID=3 -mssse3 -mpopcnt -msse4.1 -msse4.2 -mavx -mavx2 -mavx512f -mavx512bw)
```

--------------------------------

### Configure SSE4.1 Compile Options (MSVC)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for SSE4.1 on MSVC, defining dispatch architecture and ID.

```cmake
target_compile_options(arch_sse4_1 PUBLIC -DDISPATCH_ARCH=ARCH_SSE4_1 -DARCH_ID=1 -D__SSSE3__ -D__SSE4_1__ -D__POPCNT__ /Zc:__cplusplus)
```

--------------------------------

### DAA Format Output

Source: https://context7.com/bbuchfink/diamond/llms.txt

Produce DIAMOND's proprietary alignment format (DAA) using --outfmt 100. This format is optimized for later conversion or analysis with tools like MEGAN.

```bash
diamond blastp -d reference -q queries.fasta -o results.daa --outfmt 100
```

--------------------------------

### Configure SSE4.1 Compile Options (GCC/Clang)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for SSE4.1 using GCC or Clang, specifying instruction set extensions.

```cmake
target_compile_options(arch_sse4_1 PUBLIC -DDISPATCH_ARCH=ARCH_SSE4_1 -DARCH_ID=1 -mssse3 -mpopcnt -msse4.1)
```

--------------------------------

### Combine Alignment Outputs and Perform Clustering

Source: https://github.com/bbuchfink/diamond/wiki/How-to-cluster-huge-datasets

Combine the `blastp` alignment output files into a single TSV and index the representative FASTA file. Then, use `greedy-vertex-cover` to perform clustering based on the alignment results. The `--edge-format triplet` assumes the output file contains source, target, and score.

```bash
cat out_* > out.tsv
```

```bash
samtools faidx reps.faa
```

```bash
diamond greedy-vertex-cover --edges out.tsv -d reps.faa.fai --edge-format triplet -o clusters_round_2.tsv --connected-component-depth 0
```

--------------------------------

### Convert DIAMOND Output to DuckDB Database via Pipe

Source: https://github.com/bbuchfink/diamond/wiki/File-formats

Pipe DIAMOND's output directly into DuckDB to create a table named 'alignments' in the specified database. This is an efficient way to load data without intermediate files. Adjust memory and thread settings as needed.

```bash
diamond PARAMETERS | duckdb DATABASE_NAME -c "SET memory_limit='16GB'; SET threads=16; create table alignments as select * from read_csv_auto('/dev/stdin', delim='\t', header=true, parallel=true)"
```

--------------------------------

### Custom Cascaded Clustering Steps

Source: https://context7.com/bbuchfink/diamond/llms.txt

Define a custom sequence of clustering steps, combining different algorithms like 'faster_lin', 'fast', and 'sensitive'. This allows for fine-tuning the clustering process.

```bash
diamond cluster -d proteins.fasta -o clusters.tsv \
    --approx-id 30 --cluster-steps "faster_lin fast default sensitive" -M 64G
```

--------------------------------

### Inspect Search Output

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

View the beginning of the output file to inspect the pairwise alignments. The output is in TSV format.

```bash
head out.tsv
```

--------------------------------

### Add galaxy_7 blastx Test

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets up a blastx test named 'galaxy_7' with specific parameters for database, query, output format, and various BLAST options. This test is configured for detailed output and specific alignment parameters.

```cmake
add_test(NAME galaxy_7 COMMAND ${CMAKE_COMMAND} -DNAME=galaxy_7 "-DARGS=blastx --threads 2 --db ${TD}/galaxy/db.dmnd --query ${TD}/galaxy/nucleotide.fasta --query-gencode 1 --strand both --min-orf 1 --outfmt 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore --header 0 --compress 0 --matrix BLOSUM62 --comp-based-stats 1 --masking tantan --max-target-seqs 25 --evalue 0.001 --id 0.0 --approx-id 0.0 --query-cover 0.0 --subject-cover 0.0 --block-size 2.0 --motif-masking 0 --soft-masking 0 --swipe --algo 0 --index-chunks 4 --file-buffer-size 67108864" ${SP})
```

--------------------------------

### Protein Alignment with `diamond blastp`

Source: https://context7.com/bbuchfink/diamond/llms.txt

Align protein queries against a reference database using `diamond blastp`. Supports various sensitivity modes, custom output fields, iterative search, taxonomy filtering, and direct use of BLAST databases. Parallel processing is available.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv
```

```bash
# High-sensitivity alignment for detecting remote homologs (<40% identity)
diamond blastp -d reference -q queries.fasta -o results.tsv --very-sensitive
```

```bash
# Fast mode for high-identity hits (>90% identity)
diamond blastp -d reference -q queries.fasta -o results.tsv --fast
```

```bash
# Alignment with custom output fields
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --outfmt 6 qseqid sseqid pident length evalue bitscore qcovhsp stitle
```

```bash
# Using iterative search for better performance when only best hit needed
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --iterate --top 1
```

```bash
# Alignment with taxonomy filtering (requires taxonomy-enabled database)
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --taxonlist 9606,10090  # Only search against human and mouse sequences
```

```bash
# Using BLAST database directly (no makedb needed)
diamond blastp -d swissprot -q queries.fasta -o results.tsv
```

```bash
# Parallel processing with specified threads and memory control
diamond blastp -d reference -q queries.fasta -o results.tsv \
    -p 16 -b 8.0 -c 1
```

```bash
# Output in JSON format
diamond blastp -d reference -q queries.fasta -o results.json \
    --outfmt 104 qseqid sseqid pident evalue
```

--------------------------------

### DIAMOND Faster Sensitivity Mode

Source: https://context7.com/bbuchfink/diamond/llms.txt

The `--faster` mode provides a speed improvement over the default mode while maintaining reasonable sensitivity, positioned between 'fast' and 'default'.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv --faster
```

--------------------------------

### Fast Linear-Time Clustering

Source: https://context7.com/bbuchfink/diamond/llms.txt

Perform fast linear-time clustering recommended for large datasets with high identity thresholds (e.g., >50%). Specify the database, output file, approximate identity, and memory allocation.

```bash
diamond linclust -d proteins.fasta -o clusters.tsv --approx-id 50 -M 64G
```

--------------------------------

### Configure AVX2 Compile Options (MSVC)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for AVX2 on MSVC, including architecture-specific flags.

```cmake
target_compile_options(arch_avx2 PUBLIC -DDISPATCH_ARCH=ARCH_AVX2 -DARCH_ID=2 /arch:AVX2 -D__SSSE3__ -D__SSE4_1__ -D__POPCNT__ /Zc:__cplusplus)
```

--------------------------------

### Convert DAA to Tabular Format

Source: https://context7.com/bbuchfink/diamond/llms.txt

Convert DAA files to tabular format using the 'diamond view' command. Specify the output format with --outfmt 6.

```bash
diamond view -a results.daa -o results.tsv --outfmt 6
```

--------------------------------

### Global Ranking for Memory-Efficient Best-Hit Searches

Source: https://context7.com/bbuchfink/diamond/llms.txt

Use `--global-ranking` for memory-efficient best-hit searches, especially on large datasets. Adjust the ranking value as needed.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --global-ranking 25 --very-sensitive
```

--------------------------------

### Convert DIAMOND Output to Parquet via Pipe

Source: https://github.com/bbuchfink/diamond/wiki/File-formats

Pipe the output of the 'diamond' command directly into DuckDB to create a Parquet file. This avoids writing an intermediate TSV file. Read from the pipe via '/dev/stdin' and make sure DIAMOND writes a header line.

```bash
diamond PARAMETERS | duckdb -c "SET memory_limit='16GB'; SET threads=16; COPY(select * from read_csv_auto('/dev/stdin', delim='\t', header=true, parallel=true)) TO 'output.parquet' WITH (FORMAT 'PARQUET')"
```

--------------------------------

### Convert DAA to XML Format

Source: https://context7.com/bbuchfink/diamond/llms.txt

Convert DAA files to BLAST XML format using the 'diamond view' command. Specify the output format with --outfmt 5.

```bash
diamond view -a results.daa -o results.xml --outfmt 5
```

--------------------------------

### Add DIAMOND Test Command

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Registers a CTest test named `diamond` that runs `diamond test`, a quick check that the built binary executes.

```cmake
add_test(NAME diamond COMMAND diamond test)
```

--------------------------------

### Add NEON Library with Compiler Flag

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Adds a library for NEON support on ARM when the compiler supports '-mfpu=neon'.
```cmake
add_library(arch_neon OBJECT ${DISPATCH_OBJECTS})
target_compile_options(arch_neon PUBLIC -DDISPATCH_ARCH=ARCH_NEON -DARCH_ID=4 -D__ARM_NEON -mfpu=neon)
```

--------------------------------

### Convert TSV to Parquet using DuckDB CLI

Source: https://github.com/bbuchfink/diamond/wiki/File-formats

Use this command to convert a local TSV file to a Parquet file. Ensure the TSV file has a header and is tab-delimited. Adjust memory and thread settings as needed.

```bash
duckdb -c "SET memory_limit='16GB'; SET threads=16; COPY(select * from read_csv_auto('input.tsv', delim='\t', header=true, parallel=true)) TO 'output.parquet' WITH (FORMAT 'PARQUET')"
```

--------------------------------

### Cluster protein sequences

Source: https://github.com/bbuchfink/diamond/wiki/1.-Tutorial

Run DIAMOND to cluster protein sequences from a FASTA file. Specify approximate identity, memory, and output file. The `--header` option adds a header line to the output.

```bash
diamond cluster -d astral-scopedom-seqres-gd-sel-gs-bib-95-2.07.fa -o clusters.tsv \
    --approx-id 40 -M 64G --header
```

--------------------------------

### Configure AVX512 Compile Options (MSVC)

Source: https://github.com/bbuchfink/diamond/blob/master/CMakeLists.txt

Sets compile options for AVX512 on MSVC, enabling advanced vector extensions.

```cmake
target_compile_options(arch_avx512 PUBLIC -DDISPATCH_ARCH=ARCH_AVX512 -DARCH_ID=3 /arch:AVX512 -D__SSSE3__ -D__SSE4_1__ -D__POPCNT__)
```

--------------------------------

### Custom Tabular Fields for Performance

Source: https://context7.com/bbuchfink/diamond/llms.txt

Select only required fields using --outfmt 6 to improve performance by avoiding unnecessary computation. Specify fields after the format code.

```bash
diamond blastp -d reference -q queries.fasta -o results.tsv \
    --outfmt 6 qseqid sseqid evalue bitscore
```
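
A four-column table like the one requested above is easy to post-process with standard tools. The following is a minimal sketch, not part of DIAMOND: it assumes tab-separated rows in the order `qseqid sseqid evalue bitscore` with no header line, and keeps the highest-bitscore hit per query. The helper name `best_hits` and the sample rows are made up for illustration.

```bash
# Keep the highest-bitscore hit per query from a 4-column DIAMOND table.
# Assumed column order (hypothetical input, matching the field list above):
#   1=qseqid  2=sseqid  3=evalue  4=bitscore
best_hits() {
    tab=$(printf '\t')
    # Sort by query ID, then by bitscore (numeric, descending),
    # and let awk print only the first row seen for each query.
    LC_ALL=C sort -t "$tab" -k1,1 -k4,4nr "$1" | awk -F'\t' '!seen[$1]++'
}

# Made-up sample rows, for illustration only
printf 'q1\ts2\t1e-10\t60.5\nq1\ts1\t1e-50\t180.0\nq2\ts3\t1e-5\t45.1\n' > sample.tsv

best_hits sample.tsv
```

On real data, pass `results.tsv` in place of `sample.tsv`. If the search was run with a header line enabled, strip that line first, since the sketch treats every row as a hit.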