### C++ Example Usage

Source: https://github.com/tlemane/kmtricks/wiki/repartition-api

This C++ code demonstrates how to load a repartition, get the partition for a k-mer's minimizer, and use partition-aware hashing.

```APIDOC
## C++ Example Usage

### Description
This example shows how to initialize the `Repartition` class, retrieve the partition for a k-mer's minimizer, and then use this partition to initialize a partition-aware window hasher.

### Method
C++ Class Initialization and Method Calls

### Usage
```cpp
#include <kmtricks/public.hpp>

using namespace km;

int main(int argc, char* argv[])
{
  // Load the repartition from a file
  Repartition repart("kmtricks_dir/repartition_gatb/repartition.minimRepart");

  // Create a Kmer object
  Kmer<32> kmer("ACGTACGTACGT");
  uint8_t minim_size = 4;

  // Get the partition for the kmer's minimizer
  uint32_t partition = repart.get_partition(kmer.minimizer(minim_size).value());

  // Initialize a partition-aware hash window
  using WHType = KmerHashers<0>::WinHasher<32>;
  HashWindow hw("kmtricks_dir/hash.info");
  WHType whasher(partition, hw.get_window_size_bits());

  // Compute the hash
  uint64_t whash = whasher(kmer);
  return 0;
}
```

### Parameters
#### Repartition Constructor
- **filePath** (string) - Required - Path to the repartition file (e.g., `kmtricks_dir/repartition_gatb/repartition.minimRepart`).

#### `get_partition` Method
- **minimizer_hash** (uint64_t) - Required - The hash value of a minimizer.

#### `HashWindow` Constructor
- **filePath** (string) - Required - Path to the hash info file (e.g., `kmtricks_dir/hash.info`).

#### `WinHasher` Constructor
- **partition** (uint32_t) - Required - The partition ID obtained from `Repartition::get_partition`.
- **window_size_bits** (uint32_t) - Required - The size of the hash window in bits, obtained from `HashWindow::get_window_size_bits()`.
```

--------------------------------

### Install kmtricks with Nix

Source: https://github.com/tlemane/kmtricks/wiki/Installation

Use Nix to enter a shell environment with kmtricks available.

```bash
nix shell github:tlemane/kmtricks
```

--------------------------------

### Example fof.txt content

Source: https://github.com/tlemane/kmtricks/wiki/combine

Example content for the `fof.txt` file used with `kmtricks combine`, listing the matrices to be merged.

```text
matrix1
matrix2
matrix3
```

--------------------------------

### Install kmtricks with Conda

Source: https://github.com/tlemane/kmtricks/wiki/Installation

Use Conda to create an environment, activate it, and install kmtricks from the conda-forge and bioconda channels.

```bash
conda create -p kmtricks_env
conda activate ./kmtricks_env
conda install -c conda-forge -c bioconda kmtricks
```

--------------------------------

### Check kmtricks Build Configuration

Source: https://context7.com/tlemane/kmtricks/llms.txt

Use the `kmtricks infos` command to display compiled-in limits, library versions, and build flags. This is useful for verifying your kmtricks installation and understanding its capabilities.

```bash
# Show compiled-in limits, library versions, and build flags
kmtricks infos
```

--------------------------------

### File-of-File (fof) Format Example

Source: https://github.com/tlemane/kmtricks/wiki/Input-data

Illustrates the structure of the file-of-file (fof) format for specifying input samples and their associated files, including an optional minimum abundance threshold.

```text
A1 : /path/to/fastq_A1_1 ! 4
B1 : /path/to/fastq_B1_1 ; /with/multiple/fasta_B1_2 ! 2
```

--------------------------------

### Kmer<32> Usage Example

Source: https://github.com/tlemane/kmtricks/wiki/k-mer-api

Demonstrates basic Kmer<32> operations including initialization, reverse complement, canonical form, comparisons, string conversion, binary I/O, data access, and hashing. Ensure necessary headers are included and the namespace is used.

```cpp
#include <fstream>
#include <iostream>

#include <kmtricks/public.hpp>

using namespace km;

int main(int argc, char* argv[])
{
  Kmer<32> kmer("ACGTACGTACGT");
  Kmer<32> kmer2("TACTACTACTAC");

  // k-mer operations
  Kmer<32> rev = kmer.rev_comp();
  Kmer<32> cano = kmer.canonical();

  // comparisons
  bool a = kmer == kmer2;
  bool b = kmer != kmer2;
  bool c = kmer < kmer2;
  bool d = kmer > kmer2;

  // string representation
  std::cout << kmer.to_string() << std::endl;
  std::cout << kmer.to_bit_string() << std::endl;

  // io
  {
    std::ofstream out("kmer_file", std::ios::out | std::ios::binary);
    kmer.dump(out);
  }
  {
    std::ifstream in("kmer_file", std::ios::in | std::ios::binary);
    Kmer<32> loaded(12); // kmer_size
    loaded.load(in);
  }

  // access
  const uint64_t* kmer_data = kmer.get_data64();
  const uint8_t* kmer_data8 = kmer.get_data8();
  uint8_t value = kmer.at2bit(0); // 0 (A)
  char nt = kmer.at(0);           // 'A'

  // Hash
  using HType = KmerHashers<0>::Hasher<32>;
  HType hasher;
  uint64_t hash = hasher(kmer);

  // KmerHashers<0> is the folly hash. KmerHashers<1> uses xxHash: add
  // #define WITH_XXHASH before including kmtricks and link with xxHash.
}
```

--------------------------------

### Conditional Installation of Project Files

Source: https://github.com/tlemane/kmtricks/blob/master/thirdparty/gatb-core-stripped/CMakeLists.txt

Installs project files like LICENCE and THIRDPARTIES.md, and the 'boost' directory from thirdparty, unless GATB_CORE_INSTALL_EXCLUDE is defined.

```cmake
IF (NOT DEFINED GATB_CORE_INSTALL_EXCLUDE)
    INSTALL (FILES ${PROJECT_SOURCE_DIR}/LICENCE DESTINATION . OPTIONAL)
    INSTALL (FILES ${PROJECT_SOURCE_DIR}/THIRDPARTIES.md DESTINATION .
OPTIONAL) INSTALL (DIRECTORY ${PROJECT_SOURCE_DIR}/thirdparty/boost DESTINATION ./include) ENDIF() ``` -------------------------------- ### Basic kmtricks Plugin Example Source: https://github.com/tlemane/kmtricks/wiki/Plugins A simple plugin that filters k-mers based on a count threshold. It overrides `process_kmer` to discard rows with abundances below the configured threshold and `configure` to set the threshold from a string argument. ```cpp #include // DMAX_C is a compile definition set by cmake using count_type = typename km::selectC::type; class BasicEx : public km::IMergePlugin { public: BasicEx() = default; private: unsigned int m_threshold {0}; // Override process_kmer // Discard lines which contain abundances less than a threshold bool process_kmer(const uint64_t* kmer_data, std::vector& count_vector) override { for (auto& c : count_vector) if (c < m_threshold) return false; return true; } // Override configure (not necessary if you don't need configuration) // The string is passed to kmtricks with --plugin-config // Here it's a simple example where the string is a threshold // It could be a path to a config file for instance void configure(const std::string& s) override { m_threshold = std::stoll(s); } }; // Make the plugin loadable extern "C" std::string plugin_name() { return "BasicEx"; } extern "C" int use_template() { return 0; } extern "C" km::IMergePlugin* create0() { return new BasicEx(); } extern "C" void destroy(km::IMergePlugin* p) { delete p; } ``` -------------------------------- ### Dump count matrix Source: https://github.com/tlemane/kmtricks/wiki/dump Example of dumping a count matrix file (.count), which typically represents k-mer counts across different samples or conditions. The output shows k-mers and their corresponding values in the matrix. ```bash > kmtricks dump --input ./km_dir/matrices/matrix_0.count AAAC 2 5 ... 
CCAT 3 0 ``` -------------------------------- ### Compile and Run kmtricks with a Plugin Source: https://context7.com/tlemane/kmtricks/llms.txt Compile your C++ plugin using the provided `install.sh` script. Then, run the `kmtricksp` pipeline, specifying the path to your compiled plugin and its configuration. ```bash # Compile plugins (using conda-installed kmtricks binary, 4 bytes/count) ./install.sh -c 4 -p -q # Run with plugin (kmtricksp = plugin-enabled binary) kmtricksp pipeline \ --file ./samples.fof \ --run-dir ./km_plugin \ --mode kmer:count:bin \ --hard-min 1 \ --plugin ./build/plugins/libmy_filter.so \ --plugin-config 5 \ --threads 16 ``` -------------------------------- ### Run kmtricks Docker Container with Plugin Support Source: https://github.com/tlemane/kmtricks/blob/master/docker/README.md Run the kmtricks Docker container with the plugin entrypoint ('kmtricksp'), mounting a local directory to '/tmp'. Use this command when you need to utilize kmtricks plugins. ```bash docker run --rm -v $PWD/SHARED:/tmp --entrypoint kmtricksp kmtricks-d ``` -------------------------------- ### Build kmtricks from Source Source: https://context7.com/tlemane/kmtricks/llms.txt Clones the kmtricks repository and builds the tool. Supports custom build configurations for different k-mer sizes, count types, threads, and optimization levels. ```bash # Clone with submodules git clone --recursive https://github.com/tlemane/kmtricks # Default build (Release, k up to 128, 4 bytes/count, run tests) ./install.sh # Custom build: Debug mode, k up to 256, 2 bytes/count, 16 threads, no native opt ./install.sh -r Debug -k "32 64 128 256" -c 2 -j 16 -n # Use conda-provided compilers (no system gcc/clang required) ./install.sh -e ``` -------------------------------- ### Dump kmer counts Source: https://github.com/tlemane/kmtricks/wiki/dump Example of dumping the contents of a .kmer file, showing k-mer sequences and their associated counts. This is useful for inspecting raw k-mer data. 
```bash > kmtricks dump --input ./km_dir/counts/partition_0/D1.kmer AAAC 2 ... CCAT 3 ``` -------------------------------- ### Build kmtricks Docker Image Source: https://github.com/tlemane/kmtricks/blob/master/docker/README.md Clone the repository and build the Docker image using the provided Dockerfile. This command tags the image as 'kmtricks-d'. ```bash git clone https://github.com/tlemane/kmtricks cd kmtricks/docker docker build -f Dockerfile -t kmtricks-d . ``` -------------------------------- ### Run kmtricksp with a Plugin Source: https://github.com/tlemane/kmtricks/wiki/Plugins Executes the kmtricksp binary (kmtricks with plugin support) using a compiled plugin. The --plugin-config argument is used to pass configuration to the plugin. ```bash # To be consistent with the examples, an integer is used for --plugin-config # In real case, it would be something like a path to a config file kmtricksp --plugin build/plugins/lib.so --plugin-config 12 ``` -------------------------------- ### Dump histogram data Source: https://github.com/tlemane/kmtricks/wiki/dump Example of dumping a histogram file (.hist). This output includes metadata like lower and upper bounds for counts, out-of-bounds sums, and the distribution of counts within the specified range. ```bash > kmtricks dump --input ./km_dir/histograms/D1.hist @LOWER=1 @UPPER=255 @OOB_L=0 @OOB_U=2452 1 150 2 80 ... 255 42 ``` -------------------------------- ### Open PAMatrixFileMerger Source: https://github.com/tlemane/kmtricks/wiki/IOs-API Initializes a PAMatrixFileMerger to combine multiple presence/absence matrix files. Requires a list of file paths and the k-mer size. 
```cpp std::vector paths {"./km_dir/counts/matrices/matrix_0.pa", "./km_dir/counts/matrices/matrix_1.pa"}; PAMatrixFileMerger pmfm(paths, kmer_size); ``` -------------------------------- ### Template Plugin Implementation in C++ Source: https://github.com/tlemane/kmtricks/wiki/Plugins Defines a template plugin for kmtricks that allows custom k-mer processing. It includes configuration, k-mer size setting, and a custom processing logic based on a threshold and the k-mer's starting character. ```cpp #include // Same as BasicEx using count_type = typename km::selectC::type; template class TemplateEx : public km::IMergePlugin { public: TemplateEx() = default; private: unsigned int m_threshold {0}; // Declare a k-mer km::Kmer m_kmer; public: // same as BasicEx void configure(const std::string& s) { m_threshold = std::stoll(s); } // Override set_kmer_size to pass the k-mer size to m_kmer void set_kmer_size(size_t kmer_size) override { this->m_kmer_size = kmer_size; m_kmer.set_k(this->m_kmer_size); } // Override process_kmer // Discard lines which contain abundances less than a threshold if the k-mer starts with 'A' bool process_kmer(const uint64_t* kmer_data, std::vector& count_vector) override { m_kmer.set64_p(kmer_data); if (m_kmer.at(0) == 'A') { for (auto& c : count_vector) { if (c < m_threshold) { return false; } } } return true; } }; // Make the plugin loadable extern "C" std::string plugin_name() { return "TemplateEx"; } extern "C" int use_template() { return 1; } extern "C" km::IMergePlugin* create32() { return new TemplateEx<32>(); } // call if --kmer-size < 32 extern "C" km::IMergePlugin* create64() { return new TemplateEx<64>(); } // call if --kmer-size < 64 extern "C" km::IMergePlugin* create512() { return new TemplateEx<512>(); } // call if --kmer-size < 512 // With create32, create64 and create512, the plugin supports k-mer size in [8, 64) and [480, 512) ``` -------------------------------- ### Open a KmerReader Source: 
https://github.com/tlemane/kmtricks/wiki/IOs-API Initializes a KmerReader to read a counted k-mer file. Specify the buffer size and the file path. ```cpp KmerReader reader("./km_dir/counts/partition_0/D1.kmer"); ``` -------------------------------- ### Open MatrixFileAggregator Source: https://github.com/tlemane/kmtricks/wiki/IOs-API Initializes a MatrixFileAggregator to combine multiple count matrix files. Provide a list of file paths and the k-mer size. ```cpp std::vector paths {"./km_dir/counts/matrices/matrix_0.count", "./km_dir/counts/matrices/matrix_1.count"}; MatrixFileAggregator mfa(paths, kmer_size); ``` -------------------------------- ### kmtricks dump command usage Source: https://github.com/tlemane/kmtricks/wiki/dump Displays the usage instructions and available options for the `kmtricks dump` command. This includes required arguments like `--run-dir` and `--input`, as well as optional flags for output, threads, and verbosity. ```bash kmtricks dump v1.0.0 DESCRIPTION Dump kmtricks's files in human readable format. USAGE kmtricks dump --run-dir --input [-o/--output ] [-t/--threads ] [-v/--verbose ] [-h/--help] [--version] OPTIONS [global] --run-dir - kmtricks runtime directory --input - path to file. -o --output - output file. {stdout} [common] -t --threads - number of threads. {8} -h --help - show this message and exit. [⚑] --version - show version and exit. [⚑] -v --verbose - verbosity level [debug|info|warning|error]. {info} ``` -------------------------------- ### Open PAMatrixFileAggregator Source: https://github.com/tlemane/kmtricks/wiki/IOs-API Initializes a PAMatrixFileAggregator to combine multiple presence/absence matrix files. Provide a list of file paths and the k-mer size. 
```cpp
std::vector<std::string> paths {"./km_dir/counts/matrices/matrix_0.pa", "./km_dir/counts/matrices/matrix_1.pa"};
PAMatrixFileAggregator pmfa(paths, kmer_size);
```

--------------------------------

### Load Repartition and Perform Partition-Aware Hashing in C++

Source: https://github.com/tlemane/kmtricks/wiki/repartition-api

This C++ snippet demonstrates loading a pre-computed repartition and performing partition-aware hashing. It requires the repartition file and hash information.

```cpp
#include <kmtricks/public.hpp>

using namespace km;

int main(int argc, char* argv[])
{
  Repartition repart("kmtricks_dir/repartition_gatb/repartition.minimRepart");

  Kmer<32> kmer("ACGTACGTACGT");
  uint8_t minim_size = 4;
  uint32_t partition = repart.get_partition(kmer.minimizer(minim_size).value());

  // Partition-aware hashing
  using WHType = KmerHashers<0>::WinHasher<32>;
  HashWindow hw("kmtricks_dir/hash.info");
  WHType whasher(partition, hw.get_window_size_bits());
  uint64_t whash = whasher(kmer);

  return 0;
}
```

--------------------------------

### kmtricks superk Usage

Source: https://github.com/tlemane/kmtricks/wiki/superk

Displays the usage instructions and available options for the `kmtricks superk` command. Use this to understand the command's parameters and their functions.

```bash
kmtricks superk v1.0.0

DESCRIPTION
  Compute super-k-mers.

USAGE
  kmtricks superk --run-dir --id [--restrict-to-list ] [-t/--threads ] [-v/--verbose ]
                  [--cpr] [-h/--help] [--version]

OPTIONS
  [global]
     --run-dir          - kmtricks runtime directory.
     --id               - sample ID, as define in the input fof.
     --restrict-to-list - process only some partitions, comma separated.
     --cpr              - output compressed super-k-mers. [⚑]

  [common]
    -t --threads - number of threads. {8}
    -h --help    - show this message and exit. [⚑]
       --version - show version and exit. [⚑]
    -v --verbose - verbosity level [debug|info|warning|error].
{info} ``` -------------------------------- ### Open KmerFileAggregator Source: https://github.com/tlemane/kmtricks/wiki/IOs-API Initializes a KmerFileAggregator to combine multiple counted k-mer files. Provide a list of file paths and the k-mer size. ```cpp uint32_t kmer_size = 32; std::vector paths {"./km_dir/counts/partition_0/D1.kmer", "./km_dir/counts/partition_1/D1.kmer"}; KmerFileAggregator agg(paths, kmer_size); ``` -------------------------------- ### Initialize Hash Matrix and PA-Hash Matrix Readers C++ Source: https://context7.com/tlemane/kmtricks/llms.txt Initializes readers for hash matrix and PA-hash matrix files. Requires `WITH_KM_IO` and `kmtricks/public.hpp`. ```cpp #define WITH_KM_IO #include using namespace km; // Hash matrix and PA-hash matrix readers HashMatrixReader<4096> hmr("./km_run/matrices/matrix_0.count_hash"); PAHashMatrixReader<4096> phmr("./km_run/matrices/matrix_0.pa_hash"); ``` -------------------------------- ### Run kmtricks Docker Container Source: https://github.com/tlemane/kmtricks/blob/master/docker/README.md Execute the kmtricks Docker container, mounting a local directory to '/tmp' for shared data. Replace '' with your desired command-line arguments for kmtricks. ```bash docker run --rm -v $PWD/SHARED:/tmp kmtricks-d ``` -------------------------------- ### Inspect Individual Partition Files Source: https://context7.com/tlemane/kmtricks/llms.txt Use 'kmtricks dump' to convert kmtricks binary files into human-readable text. This command operates at the partition level. 
```bash
# Dump k-mer counts for sample D1, partition 0
kmtricks dump --run-dir ./km_run --input ./km_run/counts/partition_0/D1.kmer
```

```bash
# Dump a count matrix partition
kmtricks dump --run-dir ./km_run --input ./km_run/matrices/matrix_0.count
```

```bash
# Dump a k-mer histogram
kmtricks dump --run-dir ./km_run --input ./km_run/histograms/D1.hist
```

```bash
# Write dump to a file
kmtricks dump --run-dir ./km_run --input ./km_run/counts/partition_0/D1.kmer \
              --output ./D1_partition0.txt
```

--------------------------------

### Stream and Dump K-mer Count Files

Source: https://context7.com/tlemane/kmtricks/llms.txt

Shows how to stream an individual k-mer count partition, dump it as text, aggregate multiple partitions, and merge them into a globally sorted stream or binary file. Requires WITH_KM_IO.

```cpp
#define WITH_KM_IO
#include <kmtricks/public.hpp>

using namespace km;

// --- Stream a single partition ---
KmerReader<4096> reader("./km_run/counts/partition_0/D1.kmer");
uint32_t k = reader.infos().kmer_size;
Kmer<32> kmer;
kmer.set_k(k);
count_type count; // uint8_t / uint16_t / uint32_t depending on build
while (reader.read<32, 4>(kmer, count))
  std::cout << kmer.to_string() << " " << (int)count << "\n";

// --- Dump as text shortcut ---
KmerReader<4096> r2("./km_run/counts/partition_0/D1.kmer");
r2.write_as_text(std::cout);

// --- Aggregate multiple partitions (not globally sorted) ---
std::vector<std::string> parts = {
  "./km_run/counts/partition_0/D1.kmer",
  "./km_run/counts/partition_1/D1.kmer"
};
KmerFileAggregator<32, 4> agg(parts, k);
agg.write_as_text(std::cout);

// --- Merge partitions into a globally sorted stream ---
KmerFileMerger<32, 4> merger(parts, k);
while (merger.next())
  std::cout << merger.current().to_string() << " " << (int)merger.count() << "\n";

bool lz4 = true;
merger.write_as_bin("./D1.sorted.kmer", lz4);
```

--------------------------------

### Runtime K-mer Implementation Selection with Executor API Source:
https://github.com/tlemane/kmtricks/wiki/executor-api

Demonstrates how to use the `const_loop_executor` to select the best k-mer implementation at runtime. The `KMER_LIST` macro defines the available k-mer sizes, and `KMER_N` gives the number of entries in that list. The `exec` method takes the functor and its parameters, and dynamically chooses the appropriate `Kmer` specialization.

```cpp
#define KMER_LIST 32, 64, 96, 128
#define KMER_N 4 // size of KMER_LIST
#include <kmtricks/public.hpp>

using namespace km;

template<size_t MAX_K>
struct MyFunctor
{
  void operator()(int an_int, float a_float, const std::string& a_string)
  {
    std::cout << "Use " << Kmer<MAX_K>::name() << std::endl;
  }
};

int main(int argc, char* argv[])
{
  uint32_t kmer_size = 20;

  // first param is the k-mer size for the executor, others are MyFunctor::operator() parameters
  const_loop_executor<0, KMER_N>::exec<MyFunctor>(kmer_size, 0, 0.0, ""); // print "Use Kmer<32> - uint64_t"

  kmer_size = 40;
  const_loop_executor<0, KMER_N>::exec<MyFunctor>(kmer_size, 0, 0.0, ""); // print "Use Kmer<64> - __uint128_t" if available, else "Use Kmer<64> - uint64_t[2]"

  kmer_size = 80;
  const_loop_executor<0, KMER_N>::exec<MyFunctor>(kmer_size, 0, 0.0, ""); // print "Use Kmer<96> - uint64_t[3]"

  kmer_size = 150;
  const_loop_executor<0, KMER_N>::exec<MyFunctor>(kmer_size, 0, 0.0, ""); // throw, needs to increase KMER_LIST

  return 0;
}
```

--------------------------------

### Count K-mers with kmtricks pipeline

Source: https://github.com/tlemane/kmtricks/wiki/merge-api

This command starts k-mer counting with the kmtricks pipeline. It specifies the input file, the run directory, the number of partitions, the k-mer size, the minimum abundance, and the output mode, and it stops after the counting stage.
```bash kmtricks pipeline --file kmtricks.fof --run-dir kmtricks_dir --nb-partitions 4 --kmer-size 20 --count-abundance-min 1 --mode kmer:count:bin --until count ``` -------------------------------- ### Compute Repartition with kmtricks Source: https://github.com/tlemane/kmtricks/wiki/repartition-api Use this command to compute a minimizer repartition. Ensure you have the necessary input file and specify the k-mer size. ```bash kmtricks repart --file kmtricks.fof --run-dir kmtricks_dir --kmer-size 12 ``` -------------------------------- ### Define Test Executable and Link Libraries Source: https://github.com/tlemane/kmtricks/blob/master/tests/CMakeLists.txt Configures the main test executable, including finding test files, setting runtime output directory, defining compile flags, and linking dependencies. ```cmake file(GLOB_RECURSE TEST_FILES "*_test.cpp") set (CMAKE_RUNTIME_OUTPUT_DIRECTORY ${PROJECT_SOURCE_DIR}/tests/) add_executable(${PROJECT_NAME}-tests ${TEST_FILES}) target_compile_definitions(${PROJECT_NAME}-tests PRIVATE DMAX_C=${MAX_C}) target_link_libraries(${PROJECT_NAME}-tests PRIVATE build_type_flags headers links deps) ``` -------------------------------- ### Open MatrixFileMerger Source: https://github.com/tlemane/kmtricks/wiki/IOs-API Initializes a MatrixFileMerger to merge multiple count matrix files into a single sorted output. Provide file paths and k-mer size. ```cpp std::vector paths {"./km_dir/counts/matrices/matrix_0.count", "./km_dir/counts/matrices/matrix_1.count"}; MatrixFileMerger mfm(paths, kmer_size); ``` -------------------------------- ### kmtricks pipeline: K-mer Count Matrix Source: https://context7.com/tlemane/kmtricks/llms.txt Runs the full kmtricks pipeline to generate a compressed binary k-mer abundance matrix. Enables k-mer rescue and specifies various minimum abundance thresholds. 
```bash # K-mer count matrix (binary, compressed) with k-mer rescue enabled kmtricks pipeline \ --file ./samples.fof \ --run-dir ./km_run \ --kmer-size 31 \ --hard-min 1 \ --soft-min 3 \ --share-min 1 \ --recurrence-min 2 \ --mode kmer:count:bin \ --cpr \ --threads 16 ``` -------------------------------- ### Run kmtricks with a Plugin Source: https://github.com/tlemane/kmtricks/wiki/Plugins Executes the kmtricks binary with a compiled plugin. This command is used when both kmtricks and the plugin have been compiled together. ```bash ./bin/kmtricks --plugin build/plugins/lib.so --plugin-config 12 ``` -------------------------------- ### Compile Plugins and kmtricks Together Source: https://github.com/tlemane/kmtricks/wiki/Plugins Compiles both the plugins and the kmtricks binary. This is used when building kmtricks from source with plugin support enabled. ```bash ./install.sh -p ``` -------------------------------- ### Build HowDeSBT Index Source: https://github.com/tlemane/kmtricks/wiki/Index-example Constructs a HowDeSBT index from the previously generated Bloom filters. The `--howde` flag enables the determined brief tree structure. ```bash kmtricks index --run-dir ./index_example --howde ``` -------------------------------- ### kmtricks combine Usage Source: https://github.com/tlemane/kmtricks/wiki/combine Displays the usage information for the `kmtricks combine` command, including available options and their descriptions. ```bash kmtricks combine v1.4.0 DESCRIPTION Combine kmtricks's matrices (support kmer/hash matrices). USAGE kmtricks combine --fof --output [-t/--threads ] [-v/--verbose ] [--cpr] [-h/--help] [--version] OPTIONS [global] --fof - input fof, one kmtricks run per line. --output - output directory. --cpr - compress output. [⚑] [common] -t --threads - number of threads. {12} -h --help - show this message and exit. [⚑] --version - show version and exit. [⚑] -v --verbose - verbosity level [debug|info|warning|error]. 
{info} ``` -------------------------------- ### Generate File of Files (fof) from Folder Source: https://context7.com/tlemane/kmtricks/llms.txt Automatically generates a 'file-of-files' (fof) input format from a directory containing FASTQ files. Each line in the fof represents a sample. ```bash ls -1 /data/samples/*.fastq.gz | sort | awk '{print ++i" : "$1}' > samples.fof ``` -------------------------------- ### Build K-mer Count Matrix Source: https://github.com/tlemane/kmtricks/wiki/Membership-matrix-example Use the `kmtricks pipeline` command to build a k-mer count matrix. Specify the file of sample lists, the run directory, the desired modes, a hard minimum k-mer count, and enable LZ4 compression. ```bash kmtricks pipeline --file ./data/kmtricks.fof \ --run-dir ./membership_example \ --mode kmer:pa:bin \ --hard-min 2 \ --lz4 ``` -------------------------------- ### kmtricks pipeline Usage Source: https://github.com/tlemane/kmtricks/wiki/kmtricks-pipeline This command-line interface provides a comprehensive set of options for running the kmtricks pipeline. Use --until to control the execution flow. ```bash kmtricks pipeline v1.4.0 DESCRIPTION kmtricks pipeline (run all the steps, repart -> superk -> count -> merge) USAGE kmtricks pipeline --file --run-dir [--kmer-size ] [--hard-min ] [--mode ] [--repart-from ] [--soft-min ] [--recurrence-min ] [--share-min ] [--until ] [--minimizer-size ] [--minimizer-type ] [--repartition-type ] [--nb-partitions ] [--restrict-to ] [--restrict-to-list ] [--focus ] [--bloom-size ] [--bf-format ] [--bitw ] [-t/--threads ] [-v/--verbose ] [--hist] [--kff-output] [--keep-tmp] [--skip-merge] [--cpr] [-h/--help] [--version] OPTIONS [global] --file - kmtricks input file, see README.md. --run-dir - kmtricks runtime directory. --kmer-size - size of a k-mer. [8, 127]. {31} --hard-min - min abundance to keep a k-mer. {2} --mode - matrix mode , see README {kmer:count:bin} --hist - compute k-mer histograms. 
[⚑] --kff-output - output counted k-mers in kff format (only with --until count). [⚑] --keep-tmp - keep tmp files. [⚑] --repart-from - use repartition from another kmtricks run. [merge options] --soft-min - during merge, min abundance to keep a k-mer, see README. {1} --recurrence-min - min recurrence to keep a k-mer. {1} --share-min - save a non-solid k-mer if it is solid in N other samples. {0} [pipeline control] --until - run until [all|repart|superk|count|merge] {all} [advanced performance tweaks] --minimizer-size - size of minimizers. [4, 15] {10} --minimizer-type - minimizer type (0=lexi, 1=freq). {0} --repartition-type - minimizer repartition (0=unordered, 1=ordered). {0} --nb-partitions - number of partitions (0=auto). {0} --restrict-to - Process only a fraction of partitions. [0.05, 1.0] {1.0} --restrict-to-list - Process only some partitions, comma separated. --focus - 0: focus on disk usage, 1: focus on speed. [0.0, 1.0] {0.5} --cpr - compression for kmtricks's tmp files. [⚑] [hash mode configuration] --bloom-size - bloom filter size {10000000} --bf-format - bloom filter format. [howdesbt|sdsl] {howdesbt} --bitw - entry width of cbf, with --mode hash:bfc:bin {2} [common] -t --threads - number of threads. {12} -h --help - show this message and exit. [⚑] --version - show version and exit. [⚑] -v --verbose - verbosity level [debug|info|warning|error]. {info} ``` -------------------------------- ### With IO Support Source: https://github.com/tlemane/kmtricks/wiki/kmtricks-API Enables I/O operations by defining `WITH_KM_IO`. This requires linking with lz4 and TurboPFor libraries and provides various readers and writers for different data formats. 
```APIDOC ## With -> #define WITH_KM_IO, requires link with lz4 and TurboPFor (see [IOs](https://github.com/tlemane/kmtricks/wiki/io-api)) * `km::KmerMerger`, `km::HashMerger` (see [Merge](https://github.com/tlemane/kmtricks/wiki/merge)) * `km::KmerReader`, `km::KmerWriter` * `km::HashReader`, `km::HashWriter` * `km::MatrixReader`, `km::MatrixWriter` * `km::PAMatrixReader`, `km::PAMatrixWriter` * `km::HashMatrixReader`, `km::HashMatrixWriter` * `km::PAHashMatrixReader`, `km::PAHashMatrixWriter` * `km::BitVectorReader`, `km::BitVectorWriter` * `km::VectorMatrixReader`, `km::VectorMatrixWriter` * `km::HistReader`, `km::HistWriter` ``` -------------------------------- ### GATB-Core Include Directories Source: https://github.com/tlemane/kmtricks/blob/master/thirdparty/gatb-core-stripped/CMakeLists.txt Sets the include directories for linking GATB-Core binaries. This includes ZLIB, project-specific paths, and extra include paths defined by EXTRALIBS_INC. ```cmake set (gatb-core-includes ${ZLIB_INCLUDE_DIR} ${PROJECT_BINARY_DIR}/include ${PROJECT_BINARY_DIR}/thirdparty/LZ4/src/LZ4/lib ${PROJECT_BINARY_DIR}/include/${CMAKE_BUILD_TYPE} ${PROJECT_SOURCE_DIR}/src ${PROJECT_SOURCE_DIR}/thirdparty ${gatb-core-extra-libraries-inc}) ``` -------------------------------- ### Open KmerFileMerger Source: https://github.com/tlemane/kmtricks/wiki/IOs-API Initializes a KmerFileMerger to merge multiple counted k-mer files into a single sorted output. Provide file paths and k-mer size. ```cpp uint32_t kmer_size = 32; std::vector paths {"./km_dir/counts/partition_0/D1.kmer", "./km_dir/counts/partition_1/D1.kmer"}; KmerFileMerger merger(paths, kmer_size); ``` -------------------------------- ### Open MatrixReader Source: https://github.com/tlemane/kmtricks/wiki/IOs-API Initializes a MatrixReader to read a count matrix file. Specify the buffer size and the file path. 
```cpp MatrixReader reader("./km_dir/counts/matrices/matrix_0.count"); ``` -------------------------------- ### Repartition API Source: https://context7.com/tlemane/kmtricks/llms.txt This section explains how to use the Repartition API to load a pre-computed minimizer repartition and assign k-mers to partitions or compute partition-aware hashes. ```APIDOC ## Load repartition ```cpp #include using namespace km; // Compute the repartition first: // kmtricks repart --file samples.fof --run-dir km_run --kmer-size 31 int main() { // Load repartition Repartition repart("./km_run/repartition_gatb/repartition.minimRepart"); Kmer<32> kmer("ACGTACGTACGTACGT"); uint8_t minim_size = 10; // Which partition does this k-mer belong to? uint32_t partition = repart.get_partition(kmer.minimizer(minim_size).value()); std::cout << "Partition: " << partition << "\n"; // Partition-aware windowed hash (used internally by kmtricks merge) HashWindow hw("./km_run/hash.info"); using WH = KmerHashers<0>::WinHasher<32>; WH whasher(partition, hw.get_window_size_bits()); uint64_t whash = whasher(kmer); std::cout << "Windowed hash: " << whash << "\n"; } ``` ``` -------------------------------- ### Minimal Include (Header-Only) Source: https://github.com/tlemane/kmtricks/wiki/kmtricks-API This section lists the core components available when only including the public header without specific feature defines. These are fundamental building blocks for k-mer operations. 
```APIDOC
## Minimal Include (Header-Only)

* `km::Kmer` (see [Kmer](https://github.com/tlemane/kmtricks/wiki/k-mer-api))
* `km::KmerHashers<0>::Hasher` (see [Kmer](https://github.com/tlemane/kmtricks/wiki/k-mer-api))
* `km::KmerHashers<0>::WinHasher` (see [Repartition](https://github.com/tlemane/kmtricks/wiki/repartition-api))
* `km::HashWindow` (see [Repartition](https://github.com/tlemane/kmtricks/wiki/repartition-api))
* `km::KHist` (see [example](https://github.com/tlemane/kmtricks/wiki/Real-life-example))
* `km::Repartition` (see [Repartition](https://github.com/tlemane/kmtricks/wiki/repartition-api))
* `km::const_loop_executor<0, KMER_LIST_SIZE>` (see [Executor](https://github.com/tlemane/kmtricks/wiki/executor-api))
* `km::ITask`, `km::TaskPool` (see [Task](https://github.com/tlemane/kmtricks/wiki/Real-life-example))
* `km::Timer` (see [example](https://github.com/tlemane/kmtricks/wiki/Real-life-example))
```

--------------------------------

### Open PAMatrixReader

Source: https://github.com/tlemane/kmtricks/wiki/IOs-API

Initializes a PAMatrixReader to read a presence/absence matrix file. Specify the buffer size and the file path.

```cpp
PAMatrixReader reader("./km_dir/counts/matrices/matrix_0.pa");
```

--------------------------------

### kmtricks Build Script Usage

Source: https://github.com/tlemane/kmtricks/wiki/Installation

This script builds kmtricks from source. Its options cover the build type, k-mer sizes, tests, thread count, and optional interface support.

```bash
kmtricks build script.
Usage:
  ./install.sh [-r str] [-k LIST[int]] [-t int] [-c int] [-j int] [-w] [-o] [-m] [-s] [-n] [-e] [-h]
Options:
  -r -> build type {Release}.
  -k -> k-mer size {"32 64 96 128"}.
  -t <0|1|2> -> tests: 0 = disabled, 1 = compile, 2 = compile and run {2}.
  -c <1|2|4> -> byte per count {4}.
  -j -> nb threads {8}.
  -o -> build socks interface {disabled}
  -w -> build with howdesbt support {disabled}
  -m -> build all modules {disabled}
  -s -> static build {disabled}.
  -n -> disable -march=native {enabled}
  -e -> use conda to install compilers/dependencies {disabled}
  -h -> show help.
```

--------------------------------

### Read a counted k-mer file

Source: https://github.com/tlemane/kmtricks/wiki/IOs-API

Demonstrates how to open a k-mer reader and stream its content or dump it as text.

```APIDOC
## Read a counted k-mer file

### Description
Opens a k-mer reader for a specified file and allows streaming its content or dumping it as text.

### Usage

1. Open a KmerReader:
```cpp
KmerReader reader("./km_dir/counts/partition_0/D1.kmer");
```

2. Stream k-mers and counts:
```cpp
// The k-mer size is stored in the file header
uint32_t kmer_size = reader.infos().kmer_size;
Kmer<32> kmer;
kmer.set_k(kmer_size);
count_type count;

while (reader.read(kmer, count)) {
  std::cout << kmer.to_string() << " " << std::to_string(count) << "\n";
}
```

3. Dump as text:
```cpp
reader.write_as_text(std::cout);
```
```

--------------------------------

### Dump Matrix Files to Binary

Source: https://github.com/tlemane/kmtricks/wiki/IOs-API

Writes matrix data to a single binary file, optionally compressed with LZ4.

```cpp
// `pmfa` must already be open (its declaration is not part of this snippet)
bool lz4_compress = true;
pmfa.write_as_bin("./matrix.pa", lz4_compress);
```

--------------------------------

### Define Input Files for K-mer Matrix

Source: https://github.com/tlemane/kmtricks/wiki/Count-matrix-example

Specify the input FASTA files for each sample in a file-of-files (fof) list.

```bash
data
├── 1.fasta
├── 2.fasta
└── kmtricks.fof
```

```bash
> cat data/kmtricks.fof
D1: data/1.fasta
D2: data/2.fasta
```

--------------------------------

### Stream and Merge Count Matrix Files

Source: https://context7.com/tlemane/kmtricks/llms.txt

Streams individual count matrix partitions and merges multiple partitions into a globally sorted output stream or binary file. Requires WITH_KM_IO.
```cpp
#define WITH_KM_IO
#include <kmtricks/public.hpp>

using namespace km;

// --- Stream a count matrix partition ---
MatrixReader<4096> reader("./km_run/matrices/matrix_0.count");
uint32_t k = reader.infos().kmer_size;
Kmer<32> kmer;
kmer.set_k(k);
std::vector counts(reader.infos().nb_counts);

while (reader.read<32, 4>(kmer, counts)) {
  std::cout << kmer.to_string();
  for (auto c : counts) std::cout << " " << (int)c;
  std::cout << "\n";
}

// --- Merge multiple matrix partitions into globally sorted output ---
std::vector<std::string> parts = {
  "./km_run/matrices/matrix_0.count",
  "./km_run/matrices/matrix_1.count"
};
MatrixFileMerger<32, 4> mfm(parts, k);

while (mfm.next()) {
  std::cout << mfm.current().to_string();
  for (auto c : mfm.counts()) std::cout << " " << (int)c;
  std::cout << "\n";
}

// Write the merged matrix as a single LZ4-compressed binary file
bool lz4 = true;
mfm.write_as_bin("./matrix.sorted.count", lz4);
```

--------------------------------

### Build K-mer Count Matrix

Source: https://github.com/tlemane/kmtricks/wiki/Count-matrix-example

Construct a k-mer count matrix from the specified input files using the pipeline command. The `--hard-min`, `--soft-min`, and `--share-min` parameters control k-mer filtering during counting and merging.

```bash
kmtricks pipeline --file ./data/kmtricks.fof \
                  --run-dir ./matrix_example \
                  --mode kmer:count:bin \
                  --hard-min 1 \
                  --soft-min 3 \
                  --share-min 1 \
                  --cpr
```