### Install System Dependencies for Kmdiff (Ubuntu/Debian)

Source: https://context7.com/tlemane/kmdiff/llms.txt

Installs necessary system libraries required for Kmdiff on Ubuntu/Debian-based systems.

```bash
sudo apt-get install \
  libgsl-dev libopenblas-dev liblapacke-dev \
  libbz2-dev zlib1g-dev zlib1g
```

--------------------------------

### Install Kmdiff with Default and Custom Options

Source: https://context7.com/tlemane/kmdiff/llms.txt

Demonstrates default installation and custom builds with options for build type, k-mer sizes, parallel jobs, and dependency management.

```bash
./install.sh
```

```bash
./install.sh -r Debug -k "32 64" -s 0 -p -j 16
```

```bash
./install.sh -e -j 8
```

--------------------------------

### Build kmdiff from source

Source: https://context7.com/tlemane/kmdiff/llms.txt

Clone the kmdiff repository, including submodules, and navigate into the directory to build from source using the provided install script.

```bash
git clone --recursive https://github.com/tlemane/kmdiff
cd kmdiff
```

--------------------------------

### Install Build Dependencies on Fedora

Source: https://github.com/tlemane/kmdiff/blob/main/README.md

Install necessary build dependencies for kmdiff on Fedora systems using dnf.

```bash
sudo dnf install openblas openblas-devel lapack lapack-devel gsl gsl-devel bzip2-devel
```

--------------------------------

### IModel Helper Methods for Statistical Analysis

Source: https://context7.com/tlemane/kmdiff/llms.txt

Examples of static utility methods available within the `process` method of an `IModel` implementation for calculating statistics on k-mer counts.

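For readers who want the arithmetic spelled out, the following standalone sketch computes the same kinds of statistics (mean, number of samples with a non-zero count, standard deviation) over a plain `std::vector`. It is an illustration only: the names `mean_of`, `mean_and_positives` and `sd_of` are hypothetical and are not the `IModel` helpers, and the handling of empty inputs is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins written against std::vector so the sketch
// compiles without kmdiff's headers.

// Arithmetic mean: sum / n (0 for an empty input, by assumption).
inline double mean_of(const std::vector<std::uint32_t>& counts) {
  if (counts.empty()) return 0.0;
  double sum = 0.0;
  for (auto c : counts) sum += c;
  return sum / static_cast<double>(counts.size());
}

// Mean plus the number of samples whose count is > 0.
inline std::pair<double, std::size_t> mean_and_positives(
    const std::vector<std::uint32_t>& counts) {
  std::size_t positives = 0;
  for (auto c : counts) {
    if (c > 0) ++positives;
  }
  return {mean_of(counts), positives};
}

// Population standard deviation: sqrt(E[x^2] - E[x]^2).
inline double sd_of(const std::vector<std::uint32_t>& counts) {
  if (counts.empty()) return 0.0;
  double sum_sq = 0.0;
  for (auto c : counts) sum_sq += static_cast<double>(c) * static_cast<double>(c);
  const double m = mean_of(counts);
  const double var = sum_sq / static_cast<double>(counts.size()) - m * m;
  return std::sqrt(var > 0.0 ? var : 0.0);  // clamp tiny negative rounding errors
}
```

For the counts `{2, 4, 4, 4, 5, 5, 7, 9}` the mean is 5 and the standard deviation is 2. The kmdiff calls themselves follow.
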
```cpp
// Inside a process() override, `base` = IModel

// Arithmetic mean of a Range
double m = base::mean(controls);   // sum / n

// Mean + count of samples with count > 0
auto [mean, positives] = base::mean_count(controls);

// Sum + count of samples with count > 0
// (second binding renamed so both examples can share one scope)
auto [sum, case_positives] = base::sum_count(cases);

// Standard deviation
double s = base::sd(controls);     // sqrt(E[x^2] - E[x]^2)
```

--------------------------------

### Install Build Dependencies on Ubuntu/Debian

Source: https://github.com/tlemane/kmdiff/blob/main/README.md

Install necessary build dependencies for kmdiff on Ubuntu or Debian-based systems using apt-get.

```bash
sudo apt-get install libgsl-dev libopenblas-dev liblapacke-dev libbz2-dev zlib1g-dev zlib1g
```

--------------------------------

### Install Build Dependencies on Arch Linux

Source: https://github.com/tlemane/kmdiff/blob/main/README.md

Install necessary build dependencies for kmdiff on Arch Linux using pacman.

```bash
sudo pacman -S lapack lapacke openblas gsl bzip2 zlib
```

--------------------------------

### Build kmdiff from Source

Source: https://github.com/tlemane/kmdiff/blob/main/README.md

Build kmdiff from source using the provided install script. Various build options are available to customize the compilation process.

```bash
kmdiff build script.
Usage:
  ./install.sh [-r str] [-k LIST[int]] [-t int] [-c int] [-j int] [-s int] [-p] [-e] [-d] [-h]
Options:
  -r -> build type {Release}.
  -k -> k-mer size {"32 64 96 128"}.
  -t <0|1|2> -> tests: 0 = disabled, 1 = compile, 2 = compile and run {2}.
  -c <1|2|4> -> byte per count {4}.
  -j -> nb threads {8}.
  -s <0|1> -> population stratification correction 0 = disabled, 1 = enabled {1}
              (-s 1 requires GSL + lapacke + OpenBLAS)
  -p -> compile with plugins support {disabled}
  -e -> use conda to install compilers/dependencies {disabled}
  -d -> delete cmake cache {disabled}
  -h -> show help.
```

--------------------------------

### Install kmdiff via Conda

Source: https://github.com/tlemane/kmdiff/blob/main/README.md

Use Conda to create an environment and install kmdiff. This is a convenient method for managing dependencies.

```bash
conda create -p ./kmdiff-env
conda activate ./kmdiff-env
conda install -c bioconda -c tlemane kmdiff
```

--------------------------------

### Install Build Dependencies on macOS

Source: https://github.com/tlemane/kmdiff/blob/main/README.md

Install necessary build dependencies for kmdiff on macOS using Homebrew.

```bash
brew install gsl lapack openblas bzip2 zlib
```

--------------------------------

### Install kmdiff via Conda

Source: https://context7.com/tlemane/kmdiff/llms.txt

Install kmdiff and its dependencies into a new Conda environment using the provided commands. Activate the environment before using kmdiff.

```bash
conda create -p ./kmdiff-env
conda activate ./kmdiff-env
conda install -c bioconda -c tlemane kmdiff
```

--------------------------------

### Full End-to-End kmdiff Workflow

Source: https://context7.com/tlemane/kmdiff/llms.txt

A complete example of the differential k-mer analysis workflow, including preparing a file-of-files, counting k-mers, detecting differential k-mers, and inspecting the results.

```bash
# 1. Prepare file-of-files (controls listed before cases)
cat > fof.txt << 'EOF'
CTRL0: /data/ctrl_0_R1.fastq.gz ; /data/ctrl_0_R2.fastq.gz
CTRL1: /data/ctrl_1_R1.fastq.gz ; /data/ctrl_1_R2.fastq.gz
CTRL2: /data/ctrl_2_R1.fastq.gz ; /data/ctrl_2_R2.fastq.gz
CASE0: /data/case_0_R1.fastq.gz ; /data/case_0_R2.fastq.gz
CASE1: /data/case_1_R1.fastq.gz ; /data/case_1_R2.fastq.gz
CASE2: /data/case_2_R1.fastq.gz ; /data/case_2_R2.fastq.gz
EOF

# 2. Count k-mers (skip if kmtricks matrix already exists with --hist)
kmdiff count \
  --file fof.txt \
  --run-dir km_run \
  --kmer-size 31 \
  --hard-min 2 \
  --recurrence-min 1 \
  --threads 16

# 3. Detect differentially represented k-mers
kmdiff diff \
  --km-run km_run \
  -1 3 \
  -2 3 \
  --output-dir results \
  --significance 0.05 \
  --correction bonferroni \
  --threads 16

# 4. Inspect results
grep -c ">" results/case_kmers.fasta      # number of case-enriched k-mers
grep -c ">" results/control_kmers.fasta   # number of control-enriched k-mers

# 5. Example FASTA output
head -4 results/case_kmers.fasta
# >0|pvalue=1.2e-15|mean_ctrl=0.33|mean_case=18.6
# ACGTTAGCTAGCTAGCTAGCTAGCTAGCTAG
# >1|pvalue=4.7e-12|mean_ctrl=0.00|mean_case=11.2
# TTAGCTAGCTAGCTAGCTAGCTAGCTAGCTA
```

--------------------------------

### Get kmdiff build and environment diagnostics

Source: https://context7.com/tlemane/kmdiff/llms.txt

Run `kmdiff infos` to print detailed diagnostic information about the build environment, including host system, compiler versions, build flags, and third-party library versions. This output is useful for bug reports.

```bash
kmdiff infos
# - HOST -
# build host: Linux-5.15.0-x86_64
# run host: Linux 5.15.0 x86_64
# - BUILD -
# c compiler: gcc-11.3.0
# cxx compiler: g++-11.3.0
# conda: OFF
# static: OFF
# dev: OFF
# popstrat: ON
# plugin: OFF
# kmer: 32,64,96,128
# max_c: 4294967295
#
# - GIT SHA1 / VERSION -
# kmdiff: a3f1c2d...
# kmtricks: b9e0d7a...
# ...
```

--------------------------------

### Custom Statistical Model Plugin for Kmdiff

Source: https://context7.com/tlemane/kmdiff/llms.txt

Example C++ code for creating a custom statistical model plugin. Compile as a shared library and load at runtime using --cmodel.

```cpp
// File: my_model.cpp
// Compile: g++ -O3 -shared -fPIC -o libmy_model.so my_model.cpp \
//          -I/path/to/kmdiff/include -std=c++17
//
// NOTE: the header names and template arguments in this listing were lost in
// extraction; the ones shown are reconstructed assumptions and should be
// checked against the kmdiff plugin headers.
#include <algorithm>          // std::min
#include <cstddef>            // std::size_t
#include <string>
#include <tuple>
#include <kmdiff/imodel.hpp>  // assumed header providing kmdiff::IModel

template <std::size_t MAX_C>
class MyModel : public kmdiff::IModel<MAX_C>
{
  using base = kmdiff::IModel<MAX_C>;
  using count_type = typename base::count_type;

public:
  MyModel() = default;
  virtual ~MyModel() {}

  // Called once with the string passed via --config (e.g., a config file path)
  void configure(const std::string& config) override
  {
    // parse config if needed; leave empty if unused
  }

  // Called for every k-mer; returns (p-value, significance, mean_controls, mean_cases)
  // kmdiff::Range supports const iteration and operator[]
  kmdiff::model_ret_t process(
      const kmdiff::Range<count_type>& controls,
      const kmdiff::Range<count_type>& cases) override
  {
    double mean_ctrl = base::mean(controls);
    double mean_case = base::mean(cases);

    // Example: trivial fold-change based p-value (replace with real test)
    double pvalue = (mean_case > 0 && mean_ctrl > 0)
        ? std::min(mean_ctrl / mean_case, mean_case / mean_ctrl)
        : 1.0;

    kmdiff::Significance sign = (mean_case > mean_ctrl)
        ? kmdiff::Significance::CASE
        : kmdiff::Significance::CONTROL;

    return std::make_tuple(pvalue, sign, mean_ctrl, mean_case);
  }
};

// Required C linkage exports
// (maximum counts 255 / 65535 / 4294967295 are assumed for the 8-, 16- and 32-bit variants)
extern "C" std::string plugin_name() { return "MyModel"; }
extern "C" MyModel<255>* create8() { return new MyModel<255>(); }
extern "C" MyModel<65535>* create16() { return new MyModel<65535>(); }
extern "C" MyModel<4294967295>* create32() { return new MyModel<4294967295>(); }
```

--------------------------------

### Build and Use Kmdiff with Custom Model Plugin

Source: https://context7.com/tlemane/kmdiff/llms.txt

Commands to build Kmdiff with plugin support and then use a custom statistical model at runtime.

```bash
# Build kmdiff with plugin support
./install.sh -p

# Use the custom model at runtime
./kmdiff_build/bin/kmdiff diff \
  --km-run kcount_dir \
  -1 10 -2 10 \
  --cmodel ./libmy_model.so \
  --config /path/to/model_config.cfg \
  --output-dir out_custom
```

--------------------------------

### Compile and Run kmdiff with Plugin

Source: https://github.com/tlemane/kmdiff/blob/main/plugins/README.md

Compile the kmdiff project and then run the kmdiff executable with your custom plugin. The plugin is specified using the --cmodel flag.

```bash
./install.sh -p
./kmdiff_build/bin/kmdiff diff --cmodel ./kmdiff_build/plugins/libex_model.so --config plugin_config.cfg [kmdiff args ...]
```

--------------------------------

### Configure Test Executable and Link Libraries

Source: https://github.com/tlemane/kmdiff/blob/main/tests/CMakeLists.txt

Sets the runtime output directory for the test executable and links necessary libraries. This ensures the test executable can be found and run with its dependencies.

```cmake
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
add_executable(${PROJECT_NAME}-tests ${TEST_FILES})
target_link_libraries(${PROJECT_NAME}-tests ${PROJECT_NAME} headers tests)
```

--------------------------------

### Define Test Files for Executable

Source: https://github.com/tlemane/kmdiff/blob/main/tests/CMakeLists.txt

Lists the C++ source files that will be compiled into the test executable. Ensure all test files are included here.

```cmake
set(TEST_FILES
    "main_test.cpp"
    "accumulator_test.cpp"
    "linear_test.cpp"
    "corrector_test.cpp"
    "kmer_test.cpp"
    "kff_test.cpp"
    "factorial_test.cpp"
    "model_test.cpp"
    "utils_test.cpp"
    "merge_test.cpp")
```

--------------------------------

### Write a Custom kmdiff Model Plugin

Source: https://github.com/tlemane/kmdiff/blob/main/plugins/README.md

Implement a custom model by inheriting from kmdiff::IModel.
The process function is called for each k-mer and should return a tuple containing p-value, significance, mean controls, and mean cases. Make sure to export plugin_name and the creation functions for different k-mer sizes.

```cpp
// NOTE: the header name and template arguments in this listing were lost in
// extraction; the ones shown are reconstructed assumptions and should be
// checked against the kmdiff plugin headers.
#include <kmdiff/imodel.hpp>  // assumed header providing kmdiff::IModel

template <std::size_t MAX_C>
class ExModel : public kmdiff::IModel<MAX_C>
{
  using base = kmdiff::IModel<MAX_C>;
  using count_type = typename base::count_type;

public:
  ExModel() = default;
  virtual ~ExModel() {}

  // The string is passed to kmdiff with --config
  // It could be a path to a config file for instance
  void configure(const std::string& config) override {}

  // Called on each k-mer during matrix streaming
  // kmdiff::Range is a view on a vector slice, it supports
  // const iterations and const random access with the subscript operator
  //
  // This function returns a tuple
  // (kmdiff::model_ret_t = std::tuple<double, kmdiff::Significance, double, double>):
  //   p-value
  //   significance -> kmdiff::Significance::CONTROL or kmdiff::Significance::CASE
  //   mean controls
  //   mean cases
  kmdiff::model_ret_t process(const kmdiff::Range<count_type>& controls,
                              const kmdiff::Range<count_type>& cases) override
  {
    double mean_controls = base::mean(controls);
    double mean_cases = base::mean(cases);
    double pvalue = 0.000000000001;
    return std::make_tuple(pvalue, kmdiff::Significance::CONTROL, mean_controls, mean_cases);
  }
};

// Make the plugin loadable
// (maximum counts 255 / 65535 / 4294967295 are assumed for the 8-, 16- and 32-bit variants)
extern "C" std::string plugin_name() { return "ExModel"; }
extern "C" ExModel<255>* create8() { return new ExModel<255>(); }
extern "C" ExModel<65535>* create16() { return new ExModel<65535>(); }
extern "C" ExModel<4294967295>* create32() { return new ExModel<4294967295>(); }
```

--------------------------------

### Run kmdiff with Existing kmtricks Run

Source: https://context7.com/tlemane/kmdiff/llms.txt

Use this command to run kmdiff when a pre-existing kmtricks run (built with --hist) is available. Specify the path to the kmtricks run, output directory, and significance level.

```bash
kmdiff diff \
  --km-run /path/to/existing/kmtricks_run \
  -1 50 \
  -2 50 \
  --output-dir results2 \
  --significance 0.01 \
  --kff-output \
  --threads 32
```

--------------------------------

### Create Executable Target

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Defines an executable target, linking it with necessary headers and the project library.

```cmake
add_executable(${PROJECT_NAME}-bin main.cc)
target_link_libraries(${PROJECT_NAME}-bin PRIVATE headers)
target_link_libraries(${PROJECT_NAME}-bin PRIVATE ${PROJECT_NAME})
```

--------------------------------

### Build Plugins with CMake

Source: https://github.com/tlemane/kmdiff/blob/main/plugins/CMakeLists.txt

This snippet finds all C++ source files for plugins and compiles them into shared libraries. It links against the 'headers' target.

```cmake
file(GLOB_RECURSE PLUGIN_SOURCES "*.cpp")

foreach(PLUGIN ${PLUGIN_SOURCES})
  get_filename_component(PLUGIN_LIB ${PLUGIN} NAME_WLE)
  add_library(${PLUGIN_LIB} SHARED ${PLUGIN})
  target_link_libraries(${PLUGIN_LIB} headers)
endforeach()
```

--------------------------------

### Apply Multiple-Testing Correction in Kmdiff

Source: https://context7.com/tlemane/kmdiff/llms.txt

Demonstrates using different multiple-testing correction methods (Bonferroni, Benjamini-Hochberg, Holm, Šidák) via the --correction flag in kmdiff diff.

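As background for the four method names, the sketch below implements their textbook definitions on a plain vector of p-values. It is a minimal illustration under stated assumptions, not kmdiff's code: the function names are hypothetical, and kmdiff's own implementation may differ in details such as tie handling or how the pre-filter interacts with the correction.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Bonferroni per-test threshold: alpha / N.
inline double bonferroni_threshold(double alpha, std::size_t n_tests) {
  return alpha / static_cast<double>(n_tests);
}

// Sidak per-test threshold: 1 - (1 - alpha)^(1/N).
inline double sidak_threshold(double alpha, std::size_t n_tests) {
  return 1.0 - std::pow(1.0 - alpha, 1.0 / static_cast<double>(n_tests));
}

// Holm step-down: walk the sorted p-values and reject while
// p_(i) <= alpha / (N - i + 1) (1-based rank i); stop at the first failure.
inline std::size_t holm_rejections(std::vector<double> pvalues, double alpha) {
  std::sort(pvalues.begin(), pvalues.end());
  const std::size_t n = pvalues.size();
  std::size_t rejected = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (pvalues[i] > alpha / static_cast<double>(n - i)) break;
    ++rejected;
  }
  return rejected;
}

// Benjamini-Hochberg step-up: find the largest rank i with
// p_(i) <= (i / N) * alpha and reject the i smallest p-values.
inline std::size_t benjamini_hochberg_rejections(std::vector<double> pvalues,
                                                 double alpha) {
  std::sort(pvalues.begin(), pvalues.end());
  const std::size_t n = pvalues.size();
  std::size_t rejected = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const double limit = alpha * static_cast<double>(i + 1) / static_cast<double>(n);
    if (pvalues[i] <= limit) rejected = i + 1;
  }
  return rejected;
}
```

On the eight p-values `{0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205}` with alpha = 0.05, Holm rejects one hypothesis and Benjamini-Hochberg rejects two, which shows the usual ordering from more to less conservative.
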
```bash
# Bonferroni (default): α_adjusted = α / N_tests (most conservative)
kmdiff diff --km-run kc_dir -1 10 -2 10 --correction bonferroni -s 0.05

# Benjamini-Hochberg: controls false discovery rate (less conservative)
kmdiff diff --km-run kc_dir -1 10 -2 10 --correction benjamini -s 0.05

# Holm (step-down Bonferroni): uniformly more powerful than Bonferroni
kmdiff diff --km-run kc_dir -1 10 -2 10 --correction holm -s 0.05

# Šidák: α_adjusted = 1 - (1 - α)^(1/N)
kmdiff diff --km-run kc_dir -1 10 -2 10 --correction sidak -s 0.05
```

--------------------------------

### Create Library Target

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Defines a library target for the project, linking it with specified flags and dependencies.

```cmake
add_library(${PROJECT_NAME} ${CPP_FILES})
target_link_libraries(${PROJECT_NAME} PUBLIC warning_flags build_type_flags headers links deps)
```

--------------------------------

### Aggregate Project Files

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Iterates through found source files and appends them to a PROJECT_FILES list for use in targets.

```cmake
set(PROJECT_FILES "")

foreach(project_file ${CPP_FILES_2})
  list(APPEND PROJECT_FILES ${project_file})
endforeach()

foreach(project_file ${HPP_FILES})
  list(APPEND PROJECT_FILES ${project_file})
endforeach()
```

--------------------------------

### Clone kmdiff Repository

Source: https://github.com/tlemane/kmdiff/blob/main/plugins/README.md

Clone the kmdiff repository including submodules.

```bash
git clone --recursive https://github.com/tlemane/kmdiff
```

--------------------------------

### Add Compile Definitions

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Adds preprocessor definitions to the build. KMD_USE_IRLS is always defined. KMD_DEV_MODE, WITH_POPSTRAT, and WITH_PLUGIN are defined based on corresponding CMake variables.

```cmake
add_compile_definitions(KMD_USE_IRLS)
```

```cmake
if(DEV_BUILD)
  add_compile_definitions(KMD_DEV_MODE)
endif()
```

```cmake
if(WITH_POPSTRAT)
  add_compile_definitions(WITH_POPSTRAT)
endif()
```

```cmake
if(WITH_PLUGIN)
  add_compile_definitions(WITH_PLUGIN)
endif()
```

--------------------------------

### Basic kmdiff diff command

Source: https://context7.com/tlemane/kmdiff/llms.txt

Run kmdiff diff with raw p-value threshold only. No correction is applied.

```bash
kmdiff diff --km-run kc_dir -1 10 -2 10 --correction disabled -s 0.05
```

--------------------------------

### Count k-mers with kmtricks

Source: https://github.com/tlemane/kmdiff/blob/main/README.md

Use 'kmdiff count' to count k-mers from input read files. Specify the file containing read paths and the output directory. Optional arguments control k-mer size, abundance thresholds, and performance.

```bash
control1: /path/to/control1_read1.fastq ; /path/to/control1_read2.fastq
control2: /path/to/control2_read1.fastq ; /path/to/control2_read2.fastq
case1: /path/to/case1_read1.fastq ; /path/to/case1_read2.fastq
case2: /path/to/case2_read1.fastq ; /path/to/case2_read2.fastq
```

```bash
kmdiff count v1.1.0

DESCRIPTION
  Count k-mers with kmtricks.

USAGE
  kmdiff count -f/--file -d/--run-dir [-k/--kmer-size ] [-c/--hard-min ] [-r/--recurrence-min ]
               [--minimizer-type ] [--minimizer-size ] [--repartition-type ]
               [--nb-partitions ] [-t/--threads ] [-v/--verbose ] [-h/--help] [--version]

OPTIONS
  [global]
    -f --file    - fof that contains path of read files
    -d --run-dir - output directory.
    -k --kmer-size      - size of k-mers [8, 127] {31}
    -c --hard-min       - min abundance to keep a k-mer {1}
    -r --recurrence-min - min recurrence to keep a k-mer {1}

  [advanced performance tweaks]
    --minimizer-type   - minimizer type (0=lexi, 1=freq) {0}
    --minimizer-size   - size of minimizer {10}
    --repartition-type - minimizer repartition (0=unordered, 1=ordered) {0}
    --nb-partitions    - number of partitions (0=auto) {0}

  [common]
    -t --threads - number of threads. {8}
    -h --help    - show this message and exit. [⚑]
       --version - show version and exit. [⚑]
    -v --verbose - Verbosity level [debug|info|warning|error]. {info}
```

--------------------------------

### Configure Static Build for Non-Apple Platforms

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Conditionally links the executable with '-static' flag when STATIC_BUILD is enabled and the platform is not Apple.

```cmake
if (STATIC_BUILD AND NOT APPLE)
  target_link_libraries(${PROJECT_NAME}-bin PRIVATE -static)
endif()
```

--------------------------------

### Count 31-mers with Minimum Abundance

Source: https://github.com/tlemane/kmdiff/blob/main/examples/README.md

Use this command to count 31-mers from a file and discard those with an abundance less than 2. Specify the input file, output directory for counts, k-mer size, and the hard minimum abundance.

```bash
kmdiff count --file fof.txt --run-dir kcount_dir --kmer-size 31 --hard-min 2
```

--------------------------------

### Full population stratification workflow with kmdiff

Source: https://context7.com/tlemane/kmdiff/llms.txt

Perform differential k-mer analysis with population stratification correction using logistic regression. This requires a gender file and specifies parameters for PCA and the number of principal components to use.

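The `kmdiff diff` help describes the gender file as one sample per line with the id and the gender (M, F, U), space-separated. Before launching a long run it can be handy to sanity-check such a file; the sketch below is one way to do that. The function `read_gender_file` is hypothetical and not part of kmdiff, and its choices (skipping blank lines, ignoring anything after the second field, throwing on an unknown code) are assumptions.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical checker: reads "<id> <M|F|U>" lines into a map keyed by sample id.
inline std::map<std::string, char> read_gender_file(std::istream& in) {
  std::map<std::string, char> genders;
  std::string line;
  std::size_t line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    std::istringstream fields(line);
    std::string id, code;
    if (!(fields >> id)) continue;  // blank line: nothing to check
    const bool ok = static_cast<bool>(fields >> code) && code.size() == 1 &&
                    std::string("MFU").find(code[0]) != std::string::npos;
    if (!ok) {
      throw std::runtime_error("bad gender entry on line " + std::to_string(line_no));
    }
    genders[id] = code[0];
  }
  return genders;
}
```

The returned map can then be compared against the sample ids in the file-of-files, for example to confirm that every control and case has an entry.
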
```bash
# Gender file format: one sample per line, space-separated id and gender code
cat gender.txt
# CONTROL0 F
# CONTROL1 M
# CONTROL2 U   # U = unknown
# CASE0 F
# CASE1 M

# Full population stratification workflow
kmdiff diff \
  --km-run kcount_dir \
  -1 100 \
  -2 100 \
  --output-dir out_ps \
  --significance 0.05 \
  --correction bonferroni \
  --pop-correction \
  --gender gender.txt \
  --kmer-pca 0.001 \
  --n-pc 4 \
  --ploidy 2 \
  --threads 16

# Outputs (same as without --pop-correction):
# out_ps/control_kmers.fasta
# out_ps/case_kmers.fasta
```

--------------------------------

### Link Options for Non-Apple Platforms

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Conditionally applies '-export-dynamic' link option to the library target when the platform is not Apple.

```cmake
if (NOT APPLE)
  target_link_options(${PROJECT_NAME} PUBLIC -export-dynamic)
endif()
```

--------------------------------

### Include ClangFormat Module

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Includes the ClangFormat module to enable code formatting checks within the build system.

```cmake
include(ClangFormat)
```

--------------------------------

### Add ClangFormat Target

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Creates a CMake target for code formatting using the included ClangFormat module and the aggregated project files.

```cmake
add_format_project_target("${PROJECT_FILES}")
```

--------------------------------

### kmdiff diff with cutoff for pre-filtering

Source: https://context7.com/tlemane/kmdiff/llms.txt

Use the --cutoff option to pre-filter k-mers by alpha/N before the final correction. This saves memory and time, especially for large k-mer counts. The default N is 100000, but can be increased.

```bash
kmdiff diff --km-run kc_dir -1 10 -2 10 --correction bonferroni \
  --cutoff 500000 -s 0.05
```

--------------------------------

### Set Executable Output Name

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Sets the output name of the executable target to 'kmdiff'.

```cmake
set_target_properties(${PROJECT_NAME}-bin PROPERTIES OUTPUT_NAME kmdiff)
```

--------------------------------

### Perform differential k-mer analysis with kmdiff diff

Source: https://context7.com/tlemane/kmdiff/llms.txt

Use `kmdiff diff` to identify significantly over- or under-represented k-mers. Specify the k-mer run directory, number of controls and cases, significance level, and desired multiple-testing correction method. Output can be FASTA or KFF format. Population stratification correction is available with additional parameters.

```bash
# Basic run: 10 controls, 10 cases, significance 0.01, Bonferroni correction (default)
kmdiff diff \
  --km-run kcount_dir \
  -1 10 \
  -2 10 \
  --output-dir out \
  --significance 0.01

# Expected outputs:
#   out/control_kmers.fasta : k-mers enriched in controls
#   out/case_kmers.fasta    : k-mers enriched in cases
#
# FASTA header format:
#   >kmer_id|pvalue=|mean_control=|mean_case=

# Benjamini-Hochberg FDR correction, KFF output format, in-memory mode
kmdiff diff \
  --km-run kcount_dir \
  -1 20 \
  -2 20 \
  --output-dir out_bh \
  --significance 0.05 \
  --correction benjamini \
  --kff-output \
  --in-memory \
  --threads 16

# With population stratification correction (requires GSL + OpenBLAS build)
kmdiff diff \
  --km-run kcount_dir \
  -1 50 \
  -2 50 \
  --output-dir out_popstrat \
  --significance 0.05 \
  --pop-correction \
  --gender gender.txt \
  --kmer-pca 0.001 \
  --n-pc 4 \
  --ploidy 2 \
  --threads 16

# Save significant k-mer matrix for downstream inspection
kmdiff diff \
  --km-run kcount_dir \
  -1 10 -2 10 \
  --output-dir out \
  --save-sk

# Dump the significant k-mer matrix as text with kmtricks
kmtricks aggregate \
  --run-dir out/positive_kmer_matrix \
  --matrix kmer \
  --cpr-in
```

--------------------------------

### Perform differential k-mer analysis

Source: https://github.com/tlemane/kmdiff/blob/main/README.md

Use 'kmdiff diff' to analyze differential k-mers. This command aggregates k-mers and identifies significant ones. It requires the directory from a kmtricks run and the number of controls and cases. Options include significance thresholds, correction methods, and output formats.

```bash
kmdiff diff v1.1.0

DESCRIPTION
  Differential k-mers analysis.

USAGE
  kmdiff diff -d/--km-run -1/--nb-controls -2/--nb-cases [-o/--output-dir ]
              [-s/--significance ] [-u/--cutoff ] [-c/--correction ] [--gender ]
              [--kmer-pca ] [--ploidy ] [--n-pc ] [-t/--threads ] [-v/--verbose ]
              [-f/--kff-output] [-m/--in-memory] [--keep-tmp] [--pop-correction]
              [-h/--help] [--version]

OPTIONS
  [global]
    -d --km-run       - kmtricks run directory.
    -o --output-dir   - output directory. {./kmdiff_output}
    -1 --nb-controls  - number of controls.
    -2 --nb-cases     - number of cases.
    -s --significance - significance threshold. {0.05}
    -u --cutoff       - Divide the significance threshold by N.
                        Since a large number of k-mers are tested, k-mers with p-values too close
                        to the significance threshold will not pass the last steps of correction.
                        It allows to discard some k-mers a bit earlier and thus save space and time. {100000}
    -c --correction   - significance correction. (bonferroni|benjamini|sidak|holm|disabled) {bonferroni}
    -f --kff-output   - output significant k-mers in kff format. [⚑]
    -m --in-memory    - in-memory correction. [⚑]
       --keep-tmp     - keep tmp files. [⚑]
       --save-sk      - build the matrix of significant k-mers. [⚑]

  [population stratification]
    --pop-correction - apply correction for population stratification. [⚑]
    --gender         - gender file, one sample per line with the id and the gender (M,F,U), space-separated.
    --kmer-pca       - proportion of k-mers used for PCA (in [0.0, 0.05]). {0.001}
    --ploidy         - ploidy level. {2}
    --n-pc           - number of principal components (in [2, 10]). {2}

  [common]
    -t --threads - number of threads. {8}
    -h --help    - show this message and exit. [⚑]
       --version - show version and exit. [⚑]
    -v --verbose - Verbosity level [debug|info|warning|error]. {info}
```

--------------------------------

### Glob Source Files

Source: https://github.com/tlemane/kmdiff/blob/main/src/CMakeLists.txt

Uses file(GLOB) to find C++ source files. The RELATIVE option ensures paths are relative to the project source directory.

```cmake
file(GLOB CPP_FILES RELATIVE ${PROJECT_SOURCE_DIR}/src "*.cpp")
```

```cmake
file(GLOB CPP_FILES_2 RELATIVE ${PROJECT_SOURCE_DIR} "*.cpp")
```

--------------------------------

### Detect Significant K-mers

Source: https://github.com/tlemane/kmdiff/blob/main/examples/README.md

This command detects significant k-mers from a k-mer count directory. It requires the k-mer run directory, the number of controls and cases, an output directory, and a significance threshold.

```bash
kmdiff diff --km-run kcount_dir -1 10 -2 10 --output-dir out -s 0.01
```

--------------------------------

### Count k-mers with kmdiff count

Source: https://context7.com/tlemane/kmdiff/llms.txt

Use `kmdiff count` to generate a k-mer count matrix from sequencing samples. Ensure controls precede cases in the file-of-files (FOF). The `--hist` flag is automatically enabled for compatibility with `kmdiff diff`.

```bash
# FOF format: one sample per line, controls first
#