### Log Installation Paths

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/build/cmake/CMakeLists.txt

Logs the installation prefix and library directory paths. This is useful for debugging installation issues.

```cmake
MESSAGE(STATUS "CMAKE_INSTALL_PREFIX: ${CMAKE_INSTALL_PREFIX}")
MESSAGE(STATUS "CMAKE_INSTALL_LIBDIR: ${CMAKE_INSTALL_LIBDIR}")
```

--------------------------------

### Build and Install zstd-adaptive

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/contrib/adaptive-compression/README.md

Build the adaptive compression tool by running 'make adapt' in the specified directory. For system-wide installation, use 'make install' to create the 'zstd-adaptive' command.

```bash
make adapt
make install
```

--------------------------------

### Install Zstandard to Staging Directory

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/build/meson/README.md

Install the zstandard library into a staging directory using DESTDIR with ninja.

```sh
DESTDIR=./staging ninja install
```

--------------------------------

### Global Alignment with KSW2 Library

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/ksw2/README.md

This example demonstrates how to perform a global alignment using the ksw_extz function from the KSW2 library. It includes sequence encoding, scoring matrix setup, and CIGAR string generation. Ensure ksw2.h is included and the relevant ksw2_*.c file is compiled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "ksw2.h"

void align(const char *tseq, const char *qseq, int sc_mch, int sc_mis, int gapo, int gape)
{
    int i, a = sc_mch, b = sc_mis < 0?
        sc_mis : -sc_mis; // a>0 and b<0
    int8_t mat[25] = { a,b,b,b,0, b,a,b,b,0, b,b,a,b,0, b,b,b,a,0, 0,0,0,0,0 };
    int tl = strlen(tseq), ql = strlen(qseq);
    uint8_t *ts, *qs, c[256];
    ksw_extz_t ez;

    memset(&ez, 0, sizeof(ksw_extz_t));
    memset(c, 4, 256);
    c['A'] = c['a'] = 0; c['C'] = c['c'] = 1;
    c['G'] = c['g'] = 2; c['T'] = c['t'] = 3; // build the encoding table
    ts = (uint8_t*)malloc(tl);
    qs = (uint8_t*)malloc(ql);
    for (i = 0; i < tl; ++i) ts[i] = c[(uint8_t)tseq[i]]; // encode to 0/1/2/3
    for (i = 0; i < ql; ++i) qs[i] = c[(uint8_t)qseq[i]];
    ksw_extz(0, ql, qs, tl, ts, 5, mat, gapo, gape, -1, -1, 0, &ez);
    for (i = 0; i < ez.n_cigar; ++i) // print CIGAR
        printf("%d%c", ez.cigar[i]>>4, "MID"[ez.cigar[i]&0xf]);
    putchar('\n');
    free(ez.cigar); free(ts); free(qs);
}

int main(int argc, char *argv[])
{
    align("ATAGCTAGCTAGCAT", "AGCTAcCGCAT", 1, -2, 2, 1);
    return 0;
}
```

--------------------------------

### Install Corrosion from Source

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/corrosion/doc/src/setup_corrosion.md

Build and install Corrosion from its source code. This process includes pre-building native tooling for older CMake versions. Ensure the installation directory is in your PATH or CMAKE_PREFIX_PATH.

```bash
git clone https://github.com/corrosion-rs/corrosion.git
# Optionally, specify -DCMAKE_INSTALL_PREFIX= to specify a
# custom installation directory
cmake -Scorrosion -Bbuild -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
# This next step may require sudo or admin privileges if you're installing to a system location,
# which is the default.
cmake --install build --config Release
```

--------------------------------

### Install cross-compiler for PowerPC on Ubuntu

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/corrosion/doc/src/usage.md

Install the necessary g++ cross-compiler for 64-bit Little-Endian PowerPC on Ubuntu using apt.
```bash
sudo apt install g++-powerpc64le-linux-gnu
```

--------------------------------

### Get Module Help

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Call any MMseqs2 module without arguments to get a short description, usage text, and a list of important options.

```bash
mmseqs createdb
```

--------------------------------

### PZstandard Piping Example

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/contrib/pzstd/README.md

Demonstrates using PZstandard with standard input/output streams for compression, piping data to /dev/null.

```bash
cat input-file | pzstd -p num-threads -# -c > /dev/null
```

--------------------------------

### Print Hello World to Stdout

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/fmt/README.md

Basic example of printing a simple string to standard output using fmt::print.

```c++
#include <fmt/core.h>

int main() {
  fmt::print("Hello, world!\n");
}
```

--------------------------------

### Example compilation with zstd wrapper

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/zlibWrapper/README.md

Compile the example.c file with the zstd wrapper enabled and link with the zstd library.

```bash
gcc example.c zstd_zlibwrapper.o gz*.c -DZWRAP_USE_ZSTD=1 -lzstd
```

--------------------------------

### Basic CMake Project Setup

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/corrosion/test/cpp2rust/cpp2rust/CMakeLists.txt

Sets the minimum CMake version and defines the project name and version.

```cmake
cmake_minimum_required(VERSION 3.15)
project(test_project VERSION 0.1.0)
```

--------------------------------

### Install Rust Target Standard Library for Cross-Compiling

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/corrosion/doc/src/usage.md

When cross-compiling Rust code, you must install the standard library for the target triple. Use `rustup target add` to install the necessary components for the specified Rust target.
```bash
rustup target add
```

--------------------------------

### Run Nanopore Bench Global Example

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/block-aligner/vis/block_aligner_bench_vis.ipynb

Executes the nanopore_bench_global example using cargo, enabling AVX2 features for performance. The output captures the benchmark results.

```python
output = !cd .. && cargo run --example nanopore_bench_global --release --features simd_avx2 --quiet
output
```

--------------------------------

### Build makedb

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/libmarv/Readme.md

Compiles the 'makedb' component of the software. Ensure you have the necessary software requirements installed.

```bash
make makedb
```

--------------------------------

### Printf Formatting Example

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/fmt/README.md

Shows the conciseness of printf for formatting floating-point numbers.

```c++
printf("%.2f\n", 1.23456);
```

--------------------------------

### Install Static Zstd Library

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/build/cmake/lib/CMakeLists.txt

Installs the static Zstd library to the archive destination.

```cmake
IF (ZSTD_BUILD_STATIC)
    INSTALL(TARGETS libzstd_static ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}")
ENDIF (ZSTD_BUILD_STATIC)
```

--------------------------------

### Install MMseqs2 via Homebrew

Source: https://github.com/soedinglab/mmseqs2/blob/master/README.md

Use this command to install MMseqs2 if you are using Homebrew as your package manager.

```bash
brew install mmseqs2
```

--------------------------------

### Install Zstd Headers

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/build/cmake/lib/CMakeLists.txt

Installs the Zstd header files to the include directory.
```cmake
INSTALL(FILES
    ${LIBRARY_DIR}/zstd.h
    ${LIBRARY_DIR}/deprecated/zbuff.h
    ${LIBRARY_DIR}/dictBuilder/zdict.h
    ${LIBRARY_DIR}/dictBuilder/cover.h
    ${LIBRARY_DIR}/common/zstd_errors.h
    DESTINATION "include")
```

--------------------------------

### Include GNU Install Directories

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/build/cmake/CMakeLists.txt

Includes the standard CMake module `GNUInstallDirs`, which defines installation directories according to GNU standards.

```cmake
INCLUDE(GNUInstallDirs)
```

--------------------------------

### Run Nanopore Accuracy Example

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/block-aligner/vis/block_aligner_accuracy_vis.ipynb

Executes the nanopore_accuracy example using cargo, enabling AVX2 features for performance. The output is captured for further processing.

```python
output = !cd .. && cargo run --example nanopore_accuracy --release --features simd_avx2 --quiet
output
```

--------------------------------

### Install MMseqs2 via Docker

Source: https://github.com/soedinglab/mmseqs2/blob/master/README.md

Pull the official MMseqs2 Docker image from GitHub Container Registry.

```bash
docker pull ghcr.io/soedinglab/mmseqs2
```

--------------------------------

### Download and Install MMseqs2 Static SSE4.1 Binary

Source: https://github.com/soedinglab/mmseqs2/blob/master/README.md

Download the static build of MMseqs2 with SSE4.1 support, extract it, and add its bin directory to your PATH.

```bash
wget https://mmseqs.com/latest/mmseqs-linux-sse41.tar.gz; tar xvfz mmseqs-linux-sse41.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH
```

--------------------------------

### Example Usage of Fake Prefiltering

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Demonstrates how to use the `fake_pref` function to set up and perform an all-vs-all alignment, followed by converting the results.
```bash
fake_pref qdb tdb allvsallpref
mmseqs align qdb tdb allvsallpref allvsallaln
mmseqs convertalis qdb tdb allvsallaln allvsall.m8
```

--------------------------------

### Create and Index Database

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Sets up a database by creating it, indexing it, and then loading the index into memory for faster access.

```bash
mmseqs createdb targetDB.fasta targetDB
mmseqs createindex targetDB tmp
mmseqs touchdb targetDB
# alternative using vmtouch
vmtouch -l -d -t targetDB.idx
```

--------------------------------

### Download Tutorial Data

Source: https://github.com/soedinglab/mmseqs2/wiki/Tutorials

Download the necessary FASTA file for the tutorial reads and the Swiss-Prot protein database with its mapping file.

```bash
wget http://wwwuser.gwdg.de/~compbiol/mmseqs2/tutorials/mystery_reads.fasta
```

```bash
wget http://wwwuser.gwdg.de/~compbiol/mmseqs2/tutorials/uniprot_sprot_2018_03.fasta.gz
```

```bash
wget http://wwwuser.gwdg.de/~compbiol/mmseqs2/tutorials/uniprot_sprot_2018_03_mapping.tsv.gz
```

```bash
gunzip uniprot_sprot_2018_03_mapping.tsv.gz
```

--------------------------------

### Download and Setup Swiss-Prot Database

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Use the 'databases' command to download and prepare specific databases like UniProtKB/Swiss-Prot. Specify the database name, output path, and a temporary directory. This prepares the database for use as a seqTaxDB if taxonomy information is available.

```bash
mmseqs databases UniProtKB/Swiss-Prot outpath/swissprot tmp
```

--------------------------------

### Install Corrosion for Testing

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/corrosion/test/CMakeLists.txt

Configures tests to install Corrosion to a specific test directory and then use the installed version.
```cmake
option(CORROSION_TESTS_INSTALL_CORROSION
    "Install Corrosion to a test directory and let tests use the installed Corrosion"
    OFF)
if(CORROSION_TESTS_INSTALL_CORROSION)
    add_test(NAME "install_corrosion_configure"
        COMMAND
            ${CMAKE_COMMAND}
            -S "${CMAKE_CURRENT_SOURCE_DIR}/.."
            -B "${CMAKE_CURRENT_BINARY_DIR}/build-corrosion"
            -DCORROSION_VERBOSE_OUTPUT=ON
            -DCORROSION_TESTS=OFF
            -DCMAKE_BUILD_TYPE=Release
            -G${CMAKE_GENERATOR}
            "-DCMAKE_INSTALL_PREFIX=${test_install_path}"
    )
    add_test(NAME "install_corrosion_build"
        COMMAND
            ${CMAKE_COMMAND} --build "${CMAKE_CURRENT_BINARY_DIR}/build-corrosion" --config Release
    )
    add_test(NAME "install_corrosion_install"
        COMMAND
            ${CMAKE_COMMAND} --install "${CMAKE_CURRENT_BINARY_DIR}/build-corrosion" --config Release
    )
    set_tests_properties("install_corrosion_configure" PROPERTIES FIXTURES_SETUP "fixture_corrosion_configure")
    set_tests_properties("install_corrosion_build" PROPERTIES FIXTURES_SETUP "fixture_corrosion_build")
    set_tests_properties("install_corrosion_build" PROPERTIES FIXTURES_REQUIRED "fixture_corrosion_configure")
    set_tests_properties("install_corrosion_install" PROPERTIES FIXTURES_REQUIRED "fixture_corrosion_build")
    set_tests_properties("install_corrosion_install" PROPERTIES FIXTURES_SETUP "fixture_corrosion_install")

    add_test(NAME "install_corrosion_build_cleanup"
        COMMAND "${CMAKE_COMMAND}" -E remove_directory "${CMAKE_CURRENT_BINARY_DIR}/build-corrosion")
    set_tests_properties("install_corrosion_build_cleanup" PROPERTIES
        FIXTURES_CLEANUP "fixture_corrosion_configure;fixture_corrosion_build"
    )

    add_test(NAME "install_corrosion_cleanup"
        COMMAND "${CMAKE_COMMAND}" -E remove_directory "${test_install_path}")
    set_tests_properties("install_corrosion_cleanup" PROPERTIES
        FIXTURES_CLEANUP "fixture_corrosion_configure;fixture_corrosion_build;fixture_corrosion_install"
    )
endif()
```

--------------------------------

### Run MMseqs2 Docker Container with Mounted Volume

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Example of
running the MMseqs2 Docker container, mounting the current directory, and executing a command.

```shell
docker run -v "$(pwd):/app" ghcr.io/soedinglab/mmseqs2 easy-search /app/QUERY.fasta /app/DB.fasta /app/result.m8 /app/tmp
```

--------------------------------

### Build and Run PZstandard Tests

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/contrib/pzstd/README.md

Commands to build and run tests for PZstandard, assuming gtest is installed or built via make.

```bash
make tests && make check
```

--------------------------------

### Executing Cargo Install for Generator Build

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/corrosion/generator/CMakeLists.txt

Installs the Corrosion generator using cargo install, specifying the Rust compiler and output directory. Handles potential build failures.

```cmake
execute_process(
    COMMAND ${CMAKE_COMMAND}
        -E env
            # If the Generator is built at configure of a project (instead of being pre-installed)
            # We don't want environment variables like `RUSTFLAGS` affecting the Generator build.
            --unset=RUSTFLAGS
            "CARGO_BUILD_RUSTC=${RUSTC_EXECUTABLE}"
        "${CARGO_EXECUTABLE}" install
            --path "."
            --root "${generator_destination}"
            --locked
            ${_CORROSION_QUIET_OUTPUT_FLAG}
    WORKING_DIRECTORY "${generator_src}"
    RESULT_VARIABLE generator_build_failed
)
if(generator_build_failed)
    message(FATAL_ERROR "Building CMake Generator for Corrosion - failed")
else()
    message(STATUS "Building CMake Generator for Corrosion - done")
endif()
```

--------------------------------

### Install MMseqs2 and related tools with Conda

Source: https://github.com/soedinglab/mmseqs2/wiki/Tutorials

Sets up a Conda environment named 'tutorial' with MMseqs2, Plass, and other necessary bioinformatics tools. Activate the environment before running commands.
```bash
conda create -n tutorial -c conda-forge -c bioconda mmseqs2 plass megahit prodigal hmmer sra-tools
conda activate tutorial
```

--------------------------------

### Install Shared Zstd Library

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/build/cmake/lib/CMakeLists.txt

Installs the shared Zstd library to runtime, library, and archive destinations.

```cmake
IF (ZSTD_BUILD_SHARED)
    INSTALL(TARGETS libzstd_shared
        RUNTIME DESTINATION "bin"
        LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}"
        ARCHIVE DESTINATION "${CMAKE_INSTALL_LIBDIR}")
ENDIF()
```

--------------------------------

### Compile MMseqs2 from Source on Linux

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Clones the MMseqs2 repository, builds it from source using CMake and Make, and installs it. Optimizes performance for the specific system.

```bash
git clone https://github.com/soedinglab/MMseqs2.git
cd MMseqs2
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
make -j8
make install
export PATH=$(pwd)/bin/:$PATH
```

--------------------------------

### List Available Databases

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Run 'mmseqs databases' without parameters to view a list of prepared databases, their types, taxonomy information, and download URLs. Use '-h' for extended descriptions.
```bash
# mmseqs databases
Usage: mmseqs databases [options]

  Name                  Type        Taxonomy  Url
- UniRef100             Aminoacid   yes       https://www.uniprot.org/help/uniref
- UniRef90              Aminoacid   yes       https://www.uniprot.org/help/uniref
- UniRef50              Aminoacid   yes       https://www.uniprot.org/help/uniref
- UniProtKB             Aminoacid   yes       https://www.uniprot.org/help/uniprotkb
- UniProtKB/TrEMBL      Aminoacid   yes       https://www.uniprot.org/help/uniprotkb
- UniProtKB/Swiss-Prot  Aminoacid   yes       https://www.uniprot.org
- NR                    Aminoacid   yes       https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA
- NT                    Nucleotide  -         https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA
- GTDB                  Aminoacid   yes       https://gtdb.ecogenomic.org
- PDB                   Aminoacid   -         https://www.rcsb.org
- PDB70                 Profile     -         https://github.com/soedinglab/hh-suite
- Pfam-A.full           Profile     -         https://pfam.xfam.org
- Pfam-A.seed           Profile     -         https://pfam.xfam.org
- Pfam-B                Profile     -         https://xfam.wordpress.com/2020/06/30/a-new-pfam-b-is-released
- CDD                   Profile     -         https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
- eggNOG                Profile     -         http://eggnog5.embl.de
- VOGDB                 Profile     -         https://vogdb.org
- dbCAN2                Profile     -         http://bcb.unl.edu/dbCAN2
- SILVA                 Nucleotide  yes       https://www.arb-silva.de
- Resfinder             Nucleotide  -         https://cge.cbs.dtu.dk/services/ResFinder
- Kalamari              Nucleotide  yes       https://github.com/lskatz/Kalamari
```

--------------------------------

### Install MMseqs2 via Conda

Source: https://github.com/soedinglab/mmseqs2/blob/master/README.md

Install MMseqs2 using conda, specifying the conda-forge and bioconda channels.

```bash
conda install -c conda-forge -c bioconda mmseqs2
```

--------------------------------

### Basic TinyExpr Library Build

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/tinyexpr/CMakeLists.txt

Configures and builds the static TinyExpr C library. It sets compiler flags and installs the library and header files.

```cmake
cmake_minimum_required(VERSION 3.15)
project(tinyexpr C)

option(TE_POW_FROM_RIGHT "Evaluate exponents from right to left."
    OFF)
option(TE_NAT_LOG "Define the log function as natural logarithm." OFF)
option(build_tinyexpr_test "Build TinyExpr tests." OFF)
option(build_tinyexpr_test_pr "Build TinyExpr tests PR." OFF)
option(build_tinyexpr_bench "Build TinyExpr benchmark." OFF)
option(build_tinyexpr_example "Build TinyExpr example." OFF)
option(build_tinyexpr_example2 "Build TinyExpr example 2." OFF)
option(build_tinyexpr_example3 "Build TinyExpr example 3." OFF)

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -ansi -Wall -Wshadow -fPIC -O3")

set(SOURCE_FILES
    tinyexpr.c
    tinyexpr.h
)

add_library(tinyexpr STATIC ${SOURCE_FILES})
set_target_properties(tinyexpr PROPERTIES
    COMPILE_FLAGS "${MMSEQS_C_FLAGS}"
    LINK_FLAGS "${MMSEQS_C_FLAGS}")

if (TE_POW_FROM_RIGHT)
    target_compile_definitions(tinyexpr PRIVATE TE_POW_FROM_RIGHT)
endif()
if (TE_NAT_LOG)
    target_compile_definitions(tinyexpr PRIVATE TE_NAT_LOG)
endif()

install(TARGETS tinyexpr ARCHIVE DESTINATION lib)
install(FILES tinyexpr.h DESTINATION include COMPONENT Devel)
```

--------------------------------

### Clone and Build Format Benchmarks

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/fmt/README.md

Clone the format-benchmark repository and configure it with CMake to prepare for running benchmarks.

```bash
git clone --recursive https://github.com/fmtlib/format-benchmark.git
cd format-benchmark
cmake .
```

--------------------------------

### Prepare Swiss-Prot Reference Database

Source: https://github.com/soedinglab/mmseqs2/wiki/Tutorials

Create an MMseqs2 sequence database and a taxonomic database from the downloaded Swiss-Prot data.
```bash
mmseqs createdb uniprot_sprot_2018_03.fasta.gz uniprot_sprot
```

```bash
mmseqs createtaxdb uniprot_sprot tmp --tax-mapping-file uniprot_sprot_2018_03_mapping.tsv
```

--------------------------------

### Find Installed Corrosion Package

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/corrosion/doc/src/setup_corrosion.md

After installing Corrosion and ensuring it's in your PATH, find and use it in your CMakeLists.txt.

```cmake
find_package(Corrosion REQUIRED)
```

--------------------------------

### Create Sequence Database and Initial Clustering

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Creates a sequence database from a trimmed FASTA file and performs an initial clustering. This sets up the database that will be updated later.

```bash
mmseqs createdb DB_trimmed.fasta DB_trimmed
mmseqs cluster DB_trimmed DB_trimmed_clu tmp
```

--------------------------------

### Example LCA output

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Example of the resulting TSV file containing taxonomic assignments for each set representative.

```tsv
A7ZUJ8 91347 order Enterobacterales
```

--------------------------------

### Create Profile Database

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Use this command to write the 4-byte `seqDb.dbtype` file with the value 2, which marks the MMseqs2 database as a profile database.

```bash
awk 'BEGIN { printf("%c%c%c%c",2,0,0,0); exit; }' > seqDb.dbtype
```

--------------------------------

### MMseqs2 Database Example Data

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Example data file content for an MMseqs2 database containing four sequences.

```text
PSSLDIRL
\0GTLKRLSAHYTPAW
\0AEAIFIHEG
\0YTHGAGFDNDI
\0
```

--------------------------------

### Install Corrosion via Homebrew

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/corrosion/doc/src/setup_corrosion.md

Install Corrosion using the Homebrew package manager. Note that this is an unofficial, community-maintained package.
```bash
brew install corrosion
```

--------------------------------

### MMseqs2 Database Example Index

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Example index file content corresponding to the MMseqs2 database data, showing IDs, offsets, and sizes.

```text
10 0 9
11 9 15
12 24 10
13 34 12
```

--------------------------------

### Create MMseqs2 sequence and taxonomy databases

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Commands to create a sequence database and a taxonomy database from input files.

```bash
mmseqs createdb sequences.fasta seqdb
mmseqs createtaxdb seqdb tmp --tax-mapping-file taxonomy.tsv
```

--------------------------------

### ZSTD CLI Usage Example

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/lib/dll/example/README.md

Shows the basic usage syntax for the ZSTD command-line interface. Use -h or -H for a full list of commands.

```bash
Usage: zstd [arg] [input] [output]
```

--------------------------------

### Converted Alignment Results Example

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Shows an example of alignment results converted into a flat file format, potentially using tools like createtsv or convertalis.

```text
Q0KJ32 Q0KJ32 783 1.000 7.540E-260 0 418 419 0 418 419 419M
Q0KJ32 C0W539 260 0.455 1.305E-79 26 368 379 21 363 369 173M2D41M2D65M6I21M2D37M
Q0KJ32 D6KVP9 233 0.434 2.830E-70 25 364 379 30 367 373 162M2I16M3I10M1I5M6D16M2D67M6I25M2D27M
```

--------------------------------

### Download and Install MMseqs2 Static AVX2 Binary

Source: https://github.com/soedinglab/mmseqs2/blob/master/README.md

Download the fastest static build of MMseqs2 with AVX2 support, extract it, and add its bin directory to your PATH.
```bash
wget https://mmseqs.com/latest/mmseqs-linux-avx2.tar.gz; tar xvfz mmseqs-linux-avx2.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH
```

--------------------------------

### Example Taxonomy Classification Output

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

This is an example of the taxonomy classification output produced by the 'lca' module, showing a numeric taxonomic identifier, rank, and name for two sequences.

```text
1758121 subspecies Limosa lapponica baueri
0 no rank unclassified
```

--------------------------------

### Configure and Build Zstandard with Meson

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/build/meson/README.md

Configure the zstandard build with specific options such as the build type and enabling the contrib programs and tests. Then, build the project using ninja.

```sh
meson --buildtype=release -D with-contrib=true -D with-tests=true builddir
cd builddir
ninja             # to build
ninja install     # to install
```

--------------------------------

### Run Block Aligner Accuracy Example

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/block-aligner/vis/block_aligner_accuracy_vis.ipynb

Executes the `uc_accuracy` example using cargo, enabling AVX2 SIMD features. The output is captured for further processing.

```python
output = !cd .. && cargo run --example uc_accuracy --release --features simd_avx2 --quiet
output
```

--------------------------------

### Create index for expandable profile database

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Builds an index for a precomputed profile database to speed up searches. Requires sufficient RAM.

```bash
mmseqs createindex uniref30_2103_db tmp --split 1
```

--------------------------------

### Example Kraken-style Taxonomy Report Output

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

An example of the output format for a Kraken-style taxonomy report, showing clade percentages, read counts, rank, and scientific names.
```text
5.6829 362 362 no rank 0 unclassified
94.3171 6008 43 no rank 1 root
87.8493 5596 126 no rank 131567 cellular organisms
42.5903 2713 79 superkingdom 2759 Eukaryota
32.8257 2091 38 no rank 33154 Opisthokonta
24.0502 1532 2 kingdom 33208 Metazoa
23.8776 1521 3 no rank 6072 Eumetazoa
23.2810 1483 49 no rank 33213 Bilateria
14.2857 910 2 no rank 33511 Deuterostomia
13.9560 889 3 phylum 7711 Chordata
13.3124 848 0 subphylum 89593 Craniata
```

--------------------------------

### Initialize ZSTD Compression Context with Advanced Parameters

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/doc/zstd_manual.html

Initializes a ZSTD compression context with advanced parameters, including dictionary, ZSTD_parameters, and pledged source size. Use this for fine-grained control over compression.

```c
size_t ZSTD_compressBegin_advanced(ZSTD_CCtx* cctx, const void* dict, size_t dictSize, ZSTD_parameters params, unsigned long long pledgedSrcSize);
```

--------------------------------

### Run PSSM Accuracy Example

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/block-aligner/vis/block_aligner_accuracy_vis.ipynb

Executes the 'pssm_accuracy' example from the Rust project using Cargo, enabling AVX2 SIMD instructions and running quietly. The output is captured.

```python
output = !cd .. && cargo run --example pssm_accuracy --release --features simd_avx2 --quiet
output
```

--------------------------------

### Install Dependencies for Compiling MMseqs2 with Clang (macOS)

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Installs necessary build tools and libraries (CMake, libomp, zlib, bzip2) for compiling MMseqs2 with Clang on macOS using Homebrew.

```bash
brew install cmake libomp zlib bzip2
```

--------------------------------

### Build Zstandard with Makefile

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/README.md

Generates the zstd CLI in the root directory by invoking 'make' in the project's root directory.
The Makefile also supports 'make install' and 'make check' for installation and testing.

```bash
make
```

--------------------------------

### PZstandard Help Command

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/contrib/pzstd/README.md

Displays the help message for PZstandard, showing all available options and usage information.

```bash
pzstd --help
```

--------------------------------

### Example Taxonomy Top Hit Report Output

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

An example of a taxonomy top hit report, detailing target identifier, alignment counts, coverage, sequence identity, and taxonomic information.

```text
A0A6B9SVR4 6 0.744 1.026 0.419 112596 species Wolbachia phage WO d_Viruses;-_Duplodnaviria;-_Heunggongvirae;p_Uroviricota;c_Caudoviricetes;o_Caudovirales;f_Myoviridae;-_unclassified Myoviridae;s_Wolbachia phage WO
```

--------------------------------

### ZSTD_isFrame

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/doc/zstd_manual.html

Checks if the provided buffer starts with a valid Zstandard frame identifier. It considers legacy and skippable frame identifiers. Returns 0 if the buffer is too small or does not start with a valid identifier.

```APIDOC
## ZSTD_isFrame

### Description
Determines if the beginning of a given buffer contains a valid Zstandard frame identifier. This function is useful for identifying Zstandard compressed data streams.

### Parameters
- **buffer** (const void*) - Pointer to the buffer to check.
- **size** (size_t) - The size of the buffer.

### Returns
- unsigned - Returns 1 if the buffer starts with a valid frame identifier, 0 otherwise. Note that legacy frame identifiers are only considered valid if legacy support is enabled.
```

--------------------------------

### Database Construction

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/libmarv/Readme.md

Constructs a database from a FASTA file using the 'makedb' tool.
The input file can be gzipped, and up to 2 billion sequences are supported. Ensure the output directory exists.

```bash
mkdir -p dbfolder
./makedb input.fa(.gz) dbfolder/dbname [options]
```

--------------------------------

### Install Dependencies for Compiling MMseqs2 with GCC (macOS)

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Installs necessary build tools and libraries (CMake, GCC 12, zlib, bzip2) for compiling MMseqs2 with GCC on macOS using Homebrew.

```bash
brew install cmake gcc@12 zlib bzip2
```

--------------------------------

### Initialize Compression Context Parameters with Advanced Parameters

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/doc/zstd_manual.html

Initializes compression and frame parameters using a ZSTD_parameters structure. All other parameters are reset to their default values.

```c
size_t ZSTD_CCtxParams_init_advanced(ZSTD_CCtx_params* cctxParams, ZSTD_parameters params);
```

--------------------------------

### Start and Stop MMseqs2 GPU Server

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Starts the MMseqs2 GPU server in the background and then stops it using its PID. Ensure that the same --max-seqs and --prefilter-mode parameters, along with CUDA_VISIBLE_DEVICES, are used for both gpuserver and search.

```bash
mmseqs gpuserver targetDB_gpu &
PID=$!
mmseqs search queryDB targetDB_gpu resultDB tmp --gpu 1 --gpu-server 1 --db-load-mode 2
kill $PID
```

--------------------------------

### Compiling and Generating Zstandard Manual

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/contrib/gen_html/README.md

Demonstrates the steps to compile the gen_html program using 'make' and then execute it to generate the zstd manual HTML file.
```bash
make
./gen_html.exe 1.1.1 ../../lib/zstd.h zstd_manual.html
```

--------------------------------

### Build All with Visual Studio 2015

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/build/VS_scripts/README.md

Executes the build for both Release Win32 and Release x64 versions using Visual Studio 2015.

```batch
build.VS2015.cmd
```

--------------------------------

### Example Contig Taxonomy Assignment Output

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Example output from the 'aggregatetax' module for nucleotide-protein searches. It includes query accession, taxid, rank name, fragment counts (retained, assigned, in agreement), support, and lineage information.

```text
NC_001133.9 4932 species Saccharomyces cerevisiae 32 32 30 0.890 131567;2759;33154;4751;451864;4890;716545;147537;4891;4892;4893;4930;4932
```

--------------------------------

### Parse GET Options from URL

Source: https://github.com/soedinglab/mmseqs2/blob/master/data/resources/krona_prelude.html

Parses GET parameters from the URL to configure visualization options such as collapse, color, dataset, depth, keys, font size, and node selection. Default values are set if parameters are not present.
```javascript
var datasetDefault = 0;
var maxDepthDefault;
var nodeDefault = 0;

if ( urlHalves[1] ) {
    var vars = urlHalves[1].split('&');

    for ( i = 0; i < vars.length; i++ ) {
        var pair = vars[i].split('=');

        switch ( pair[0] ) {
            case 'collapse':
                collapse = pair[1] == 'true';
                break;
            case 'color':
                hueDefault = pair[1] == 'true';
                break;
            case 'dataset':
                datasetDefault = Number(pair[1]);
                break;
            case 'depth':
                maxDepthDefault = Number(pair[1]) + 1;
                break;
            case 'key':
                showKeys = pair[1] == 'true';
                break;
            case 'font':
                fontSize = Number(pair[1]);
                break;
            case 'node':
                nodeDefault = Number(pair[1]);
                break;
            default:
                getVariables.push(pair[0] + '=' + pair[1]);
                break;
        }
    }
}
```

--------------------------------

### Example Taxonomy Output with Ranks

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Example of taxonomy classification output when specific ranks (genus, family, order, superkingdom) are requested using the --lca-ranks parameter. The output includes query accession, taxid, rank name, scientific name, and a semicolon-concatenated string of requested taxa.

```text
NB501858:55:HMHW7BGXB:1:23301:17888:3880 8932 species Columba livia
NB501858:55:HMHW7BGXB:3:12402:9002:13498 131567 no rank cellular organisms
NB501858:55:HMHW7BGXB:4:23405:2354:17246 299123 subspecies Lonchura striata domestica
NB501858:55:HMHW7BGXB:4:11506:25310:7474 117571 no rank Euteleostomi
NB501858:55:HMHW7BGXB:1:21310:9510:6655 0 no rank unclassified
NB501858:55:HMHW7BGXB:1:11112:6821:9848 1758121 subspecies Limosa lapponica baueri
NB501858:55:HMHW7BGXB:2:22303:18627:2744 2182385 species Brachybacterium endophyticum
NB501858:55:HMHW7BGXB:4:22410:13879:7449 8825 superorder Neognathae
NB501858:55:HMHW7BGXB:3:13402:20359:7200 97097 species Phaethon lepturus
```

--------------------------------

### Run Format Benchmark Speed Test

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/fmt/README.md

Execute the speed test for the format-benchmark suite after building.
```bash
make speed-test
```

--------------------------------

### Get Percentage

Source: https://github.com/soedinglab/mmseqs2/blob/master/data/resources/krona_prelude.html

Converts a fraction to a percentage value, rounded to the nearest integer.

```javascript
function getPercentage(fraction) {
    return round(fraction * 100);
}
```

--------------------------------

### Enable POWER8 AltiVec Instructions

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Use this flag for PPC64LE systems that support POWER8 AltiVec instructions.

```bash
cmake .. -DHAVE_POWER8=1
```

--------------------------------

### Get ZSTD Version Number

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/doc/zstd_manual.html

Useful for checking the dynamic library version at runtime.

```c
unsigned ZSTD_versionNumber(void);
```

--------------------------------

### Run XXHash Userland Tests

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/contrib/linux-kernel/README.md

Navigate to the test directory, build the Google Test framework, compile the XXHash userland tests, and execute them to verify the xxHash kernel module patch.

```bash
cd test && make googletest && make XXHashUserLandTest && ./XXHashUserLandTest
```

--------------------------------

### Build align

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/libmarv/Readme.md

Compiles the 'align' component of the software. Ensure you have the necessary software requirements installed.

```bash
make align
```

--------------------------------

### Create Zstandard Dictionary

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/README.md

Trains Zstandard on a set of files to create a dictionary for improved small data compression. Specify the training data path and the output dictionary name.
```bash
zstd --train FullPathToTrainingSet/* -o dictionaryName
```

--------------------------------

### Download and Install MMseqs2 Static SSE2 Binary

Source: https://github.com/soedinglab/mmseqs2/blob/master/README.md

Download the slowest static build of MMseqs2 with SSE2 support for very old systems, extract it, and add its bin directory to your PATH.

```bash
wget https://mmseqs.com/latest/mmseqs-linux-sse2.tar.gz; tar xvfz mmseqs-linux-sse2.tar.gz; export PATH=$(pwd)/mmseqs/bin/:$PATH
```

--------------------------------

### Querying the Database (Default Output)

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/libmarv/Readme.md

Queries a constructed database using the 'align' tool. Results are output to stdout in plain text by default. Requires query file and database path.

```bash
./align --query queries.fa(.gz) --db dbfolder/dbname
```

--------------------------------

### Iostreams Formatting Example

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/fmt/README.md

Illustrates the verbosity of iostreams for formatting floating-point numbers compared to printf.

```c++
std::cout << std::setprecision(2) << std::fixed << 1.23456 << "\n";
```

--------------------------------

### Registering a New MMseqs2 Command

Source: https://github.com/soedinglab/mmseqs2/wiki/MMseqs2-Developer-Guide

Example of how a new command is registered in `src/mmseqs.cpp`, including its name, function pointer, argument parsing structure, and documentation.

```cpp
{"search", search, &par.searchworkflow, COMMAND_MAIN,
    "Search with query sequence or profile DB (iteratively) through target sequence DB",
    "Searches with the sequences or profiles query DB through the target sequence DB by running the prefilter tool and the align tool for Smith-Waterman alignment. For each query a results file with sequence matches is written as entry into a database of search results (alignmentDB).\nIn iterative profile search mode, the detected sequences satisfying user-specified criteria are aligned to the query MSA, and the resulting query profile is used for the next search iteration. Iterative profile searches are usually much more sensitive than (and at least as sensitive as) searches with single query sequences.",
    "Martin Steinegger ",
    " ",
    CITATION_MMSEQS2},
```

--------------------------------

### Find System Zstd Library

Source: https://github.com/soedinglab/mmseqs2/blob/master/CMakeLists.txt

Locates the Zstd library and its include directories if Zstd is installed on the system.

```cmake
include(FindPackageHandleStandardArgs)
find_path(ZSTD_INCLUDE_DIRS NAMES zstd.h REQUIRED)
# We use ZSTD_findDecompressedSize which is only available with ZSTD_STATIC_LINKING_ONLY
find_library(ZSTD_LIBRARIES NAMES libzstd.a libzstd_static REQUIRED)
find_package_handle_standard_args(ZSTD DEFAULT_MSG ZSTD_LIBRARIES ZSTD_INCLUDE_DIRS)
mark_as_advanced(ZSTD_LIBRARIES ZSTD_INCLUDE_DIRS)
include_directories(${ZSTD_INCLUDE_DIRS})
```

--------------------------------

### Create Generic Database

Source: https://github.com/soedinglab/mmseqs2/wiki/Home

Use this command to initialize an empty MMseqs2 database file for generic data, such as header databases.

```bash
awk 'BEGIN { printf("%c%c%c%c",12,0,0,0); exit; }' > seqDb.dbtype
```

--------------------------------

### Write to a File Efficiently

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/fmt/README.md

Example of writing formatted output to a file using fmt::output_file for potentially faster I/O operations compared to fprintf.
```c++
#include <fmt/os.h>

int main() {
  auto out = fmt::output_file("guide.txt");
  out.print("Don't {}", "Panic");
}
```

--------------------------------

### ZSTD_CCtxParam Set/Get Parameter

Source: https://github.com/soedinglab/mmseqs2/blob/master/lib/zstd/doc/zstd_manual.html

Functions for setting and getting individual compression parameters within a `ZSTD_CCtx_params` structure.

```APIDOC
## ZSTD_CCtxParam Set/Get Parameter

### Description
Functions for setting and getting individual compression parameters within a `ZSTD_CCtx_params` structure.

### Functions

- **size_t ZSTD_CCtxParam_setParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, unsigned value);**
  Sets a single compression parameter, identified by the `ZSTD_cParameter` enum, to the specified `value`.
  Parameters must be applied to a `ZSTD_CCtx` using `ZSTD_CCtx_setParametersUsingCCtxParams()`.
  Note: when `value` is an enum, cast it to `unsigned` for proper type checking.
  @result: 0 on success, or an error code (which can be tested with `ZSTD_isError()`).

- **size_t ZSTD_CCtxParam_getParameter(ZSTD_CCtx_params* params, ZSTD_cParameter param, unsigned* value);**
  Gets the current value of a single compression parameter, identified by the `ZSTD_cParameter` enum.
  The value is stored in the `unsigned* value` pointer.
  @result: 0 on success, or an error code (which can be tested with `ZSTD_isError()`).
```
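The set/get pair above returns `0` on success and otherwise an error code carried in the same `size_t` result. The following is a minimal sketch of that calling pattern, not zstd code: every `demo_*` name, the `DEMO_*` enums, and the value ranges are hypothetical stand-ins, and the error encoding (errors stored as the highest `size_t` values) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names exist in zstd. */
typedef enum { DEMO_P_LEVEL = 1, DEMO_P_WINDOW_LOG = 2 } demo_param;

typedef enum {
    DEMO_ERR_UNSUPPORTED = 1,
    DEMO_ERR_OUT_OF_BOUND = 2,
    DEMO_ERR_MAX = 3 /* sentinel: one past the largest real error */
} demo_error;

typedef struct {
    unsigned level;
    unsigned window_log;
} demo_params;

/* Assumed encoding: an error is one of the highest size_t values
 * (SIZE_MAX for error 1, SIZE_MAX - 1 for error 2, ...). */
static size_t demo_make_error(demo_error e) { return (size_t)0 - (size_t)e; }

/* A result is an error when it lies above the encoded sentinel; 0 is success. */
int demo_is_error(size_t code) { return code > demo_make_error(DEMO_ERR_MAX); }

/* Store one parameter; the accepted ranges below are arbitrary demo bounds. */
size_t demo_set_parameter(demo_params *p, demo_param param, unsigned value) {
    assert(p != NULL);
    switch (param) {
    case DEMO_P_LEVEL:
        if (value > 22) return demo_make_error(DEMO_ERR_OUT_OF_BOUND);
        p->level = value;
        return 0;
    case DEMO_P_WINDOW_LOG:
        if (value < 10 || value > 31) return demo_make_error(DEMO_ERR_OUT_OF_BOUND);
        p->window_log = value;
        return 0;
    default:
        return demo_make_error(DEMO_ERR_UNSUPPORTED);
    }
}

/* Read one parameter back through an out-pointer, mirroring the getter shape. */
size_t demo_get_parameter(const demo_params *p, demo_param param, unsigned *value) {
    assert(p != NULL && value != NULL);
    switch (param) {
    case DEMO_P_LEVEL:      *value = p->level;      return 0;
    case DEMO_P_WINDOW_LOG: *value = p->window_log; return 0;
    default:                return demo_make_error(DEMO_ERR_UNSUPPORTED);
    }
}
```

A caller would set a value, pass the result to the error predicate before trusting it, and only then read the value back. A rejected value leaves the stored parameter unchanged in this sketch.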