### Quick Start: Install Kalign Python with All Example Dependencies

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Install the Kalign Python package along with all its example-specific dependencies, including `psutil` for performance monitoring, as part of the quick start guide.

```bash
pip install kalign[all]
pip install psutil  # For performance monitoring
```

--------------------------------

### Install Kalign Python Library

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Installs the core Kalign library for Python using pip, the standard Python package installer. This is the most basic installation method.

```bash
pip install kalign
```

--------------------------------

### Verify Kalign Python Installation

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Checks the installed Kalign version and performs a quick alignment test with sample sequences to confirm the library is working correctly after installation.

```python
import kalign

print(f"Kalign version: {kalign.__version__}")

# Quick test
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
aligned = kalign.align(sequences)
print("✅ Kalign is working!")
```

--------------------------------

### Quick Start: Run Kalign Python Basic Usage Example

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the basic usage example script from the `python-examples` directory to quickly see Kalign's core functionalities in action.

```bash
python python-examples/basic_usage.py
```

--------------------------------

### Install Kalign Python with Ecosystem Integrations

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Installs Kalign along with optional dependencies for integration with other bioinformatics libraries like Biopython or scikit-bio, or all available integrations for extended functionality.
```bash
pip install kalign[biopython]
```

```bash
pip install kalign[skbio]
```

```bash
pip install kalign[all]
```

--------------------------------

### Example R Package Installation Command

Source: https://github.com/timolassmann/kalign/blob/main/scripts/benchmark.org

This R command provides an example of how to install a new R package, specifically 'tidyverse', using the 'install.packages()' function. This is a common way to add new libraries to an R environment.

```R
install.packages("tidyverse")
```

--------------------------------

### Quick Start: Run Kalign Python Ecosystem Integration Example

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the ecosystem integration example script from the `python-examples` directory to explore Kalign's interoperability with other bioinformatics tools.

```bash
python python-examples/ecosystem_integration.py
```

--------------------------------

### Retrieve kalign Package Information and Function Documentation in Python

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This Python code demonstrates how to access metadata for the installed `kalign` package, including its version, author, and contact details. It also illustrates the use of Python's `help()` function to display in-line documentation for specific `kalign` functions, such as `kalign.align` and `kalign.utils.alignment_stats`, directly from the console.
```python
import kalign

# Check version and info
print(f"Kalign version: {kalign.__version__}")
print(f"Author: {kalign.__author__}")
print(f"Contact: {kalign.__email__}")

# View help for specific functions
help(kalign.align)
help(kalign.utils.alignment_stats)
```

--------------------------------

### Install Kalign Python Ecosystem Integration Dependencies

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Install optional Python packages such as Biopython, scikit-bio, Pandas, and Matplotlib required for the Kalign ecosystem integration examples.

```bash
pip install kalign[all]
# or individual packages:
pip install biopython scikit-bio pandas matplotlib
```

--------------------------------

### Quick Start: Run Kalign Python Performance Benchmarks

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the performance benchmarks script from the `python-examples` directory to evaluate and optimize Kalign's performance on your specific system.

```bash
python python-examples/performance_benchmarks.py
```

--------------------------------

### Basic Sequence Analysis Workflow with kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This function outlines a typical basic sequence analysis workflow using `kalign`. It covers performing the alignment, calculating alignment statistics (length, gap fraction, conservation, identity) using `kalign.utils.alignment_stats`, and generating a consensus sequence with `kalign.utils.consensus_sequence`, providing a complete end-to-end example.

```python
import kalign

def analyze_sequences(sequences, name="analysis"):
    """Complete sequence analysis workflow."""
    print(f"🧬 Analyzing {len(sequences)} sequences ({name})")

    # 1. Align sequences
    print(" Aligning...")
    aligned = kalign.align(sequences, seq_type="auto", n_threads=4)

    # 2. Calculate statistics
    print(" Calculating statistics...")
    stats = kalign.utils.alignment_stats(aligned)
    print(f" ✅ Alignment length: {stats['length']}")
    print(f" ✅ Gap fraction: {stats['gap_fraction']:.2%}")
    print(f" ✅ Conservation: {stats['conservation']:.2%}")
    print(f" ✅ Average identity: {stats['identity']:.2%}")

    # 3. Generate consensus
    consensus = kalign.utils.consensus_sequence(aligned, threshold=0.5)
    print(f" ✅ Consensus: {consensus}")

    return aligned, stats, consensus
```

--------------------------------

### Install Kalign Python Performance Benchmarks Dependencies

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Install optional Python packages like `psutil` and `matplotlib` for memory monitoring and plotting within the Kalign performance benchmarks.

```bash
pip install psutil matplotlib  # For memory monitoring and plotting
```

--------------------------------

### Kalign Installation Issue Reporting Template

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

Template for reporting problems encountered during Kalign installation, detailing system, installation method, error messages, and build tools. This helps in diagnosing environment-specific installation failures.

```Markdown
**System**: [OS and version]
**Installation method**: [pip/conda/source]
**Error message**: [complete error output]
**Build tools**: [cmake version, compiler version]
```

--------------------------------

### Implement Complete File-Based Alignment Workflow

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Provides a Python function demonstrating a full workflow for reading sequences from a file, performing alignment with auto-detection and multi-threading, and writing the results to multiple output file formats.
```python
import kalign

def align_file_workflow(input_file, output_prefix):
    """Complete file-based alignment workflow."""
    # Read sequences
    print(f"Reading sequences from {input_file}...")
    sequences, ids = kalign.io.read_sequences(input_file)
    print(f"Found {len(sequences)} sequences")

    # Align with optimal settings
    print("Aligning sequences...")
    aligned = kalign.align(
        sequences,
        seq_type="auto",  # Auto-detect
        n_threads=4       # Use 4 threads
    )

    # Write in multiple formats
    print(f"Writing alignments to {output_prefix}.*")
    kalign.io.write_fasta(aligned, f"{output_prefix}.fasta", ids=ids)
    kalign.io.write_clustal(aligned, f"{output_prefix}.aln", ids=ids)

    print("✅ Complete!")
    return aligned, ids

# Usage
aligned, ids = align_file_workflow("input.fasta", "aligned")
```

--------------------------------

### Integrate kalign with Biopython

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This example demonstrates integration between `kalign` and Biopython. It shows how to perform an alignment and directly receive the result as a Biopython `MultipleSeqAlignment` object, giving immediate access to Biopython's functionality for sequence manipulation, analysis, and file format conversion (e.g., Clustal, Phylip).
```python
import kalign
from Bio import AlignIO

# Align and get Biopython object
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
ids = ["seq1", "seq2", "seq3"]
aln_bp = kalign.align(sequences, fmt="biopython", ids=ids)

# Use Biopython's rich functionality
print(f"Alignment length: {aln_bp.get_alignment_length()}")
print(f"Number of sequences: {len(aln_bp)}")

# Export to different formats
AlignIO.write(aln_bp, "output.clustal", "clustal")
AlignIO.write(aln_bp, "output.phylip", "phylip")

# Access individual sequences
for record in aln_bp:
    print(f"{record.id}: {record.seq}")
```

--------------------------------

### Kalign Basic Sequence Alignment Example

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

A quick start example demonstrating how to perform a basic sequence alignment with Kalign. It shows importing the library, defining a list of sequences, and calling the `align` function, followed by a success message.

```Python
import kalign

sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
aligned = kalign.align(sequences)
print("✅ Alignment successful!")
```

--------------------------------

### Basic Sequence Alignment with kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Illustrates a fundamental usage of the kalign library to align a list of input sequences. It demonstrates how to call `analyze_sequences` to obtain aligned sequences, alignment statistics, and a consensus sequence for a given set of biological sequences.
```python
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
aligned, stats, consensus = analyze_sequences(sequences, "DNA test")
```

--------------------------------

### Read Sequences from File with Kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Shows two methods to read sequences from a FASTA file: using `align_from_file` for direct alignment or `read_sequences` for more control over the sequences before alignment.

```python
import kalign

# Method 1: Use align_from_file (simple)
aligned = kalign.align_from_file("sequences.fasta", seq_type="protein")

# Method 2: Read then align (more control)
sequences, ids = kalign.io.read_sequences("sequences.fasta")
aligned = kalign.align(sequences, seq_type="protein")

# Print with IDs
for seq_id, seq in zip(ids, aligned):
    print(f"{seq_id}: {seq}")
```

--------------------------------

### Optimize kalign Alignment Performance

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This Python function provides a practical guide to optimizing `kalign` performance. It covers explicitly specifying the sequence type, setting the number of threads from the CPU count (capped at 16), and advising on chunking for very large datasets.

```python
import os

import kalign

def optimize_alignment(sequences, seq_type_hint=None):
    """Optimize alignment for performance."""
    # 1. Specify sequence type if known; otherwise fall back to auto-detection
    if seq_type_hint is None:
        seq_type_hint = "auto"

    # 2. Use appropriate thread count (os.cpu_count() may return None)
    n_threads = min(16, os.cpu_count() or 1)  # Diminishing returns after 16

    # 3. For large datasets, consider chunking
    if len(sequences) > 10000:
        print(f"Large dataset ({len(sequences)} sequences)")
        print("Consider splitting into smaller chunks")

    # 4. Perform alignment
    aligned = kalign.align(
        sequences,
        seq_type=seq_type_hint,
        n_threads=n_threads
    )
    return aligned

# Usage
aligned = optimize_alignment(sequences, seq_type_hint="protein")
```

--------------------------------

### Install Kalign dependencies

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Installs the Python packages Biopython, scikit-bio, pandas, matplotlib, and psutil, which the Kalign examples and related data processing tasks use.

```bash
pip install biopython scikit-bio pandas matplotlib psutil
```

--------------------------------

### Handle Sequence Types in Kalign Alignment

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Explains how Kalign manages sequence types, including automatic detection for convenience and explicit specification for performance optimization and handling divergent protein sequences. Using string constants for types is recommended.

```python
import kalign

# Kalign automatically detects sequence type
mixed_sequences = ["ATCGATCG", "AUCGAUCG", "ACDEFGHIK"]
aligned = kalign.align(mixed_sequences)  # seq_type="auto" is default
```

```python
import kalign

# DNA sequences
dna_seqs = ["ATCGATCG", "ATCGTCG", "ATCGCG"]
aligned = kalign.align(dna_seqs, seq_type="dna")

# Protein sequences
protein_seqs = ["ACDEFGHIK", "ACDEFGH", "ACDEFGHIKL"]
aligned = kalign.align(protein_seqs, seq_type="protein")

# Divergent proteins (for highly divergent sequences)
divergent_seqs = ["ACDEFGHIK", "MNPQRSTVW", "ACDEFGHIKL"]
aligned = kalign.align(divergent_seqs, seq_type="divergent")
```

```python
import kalign

# Using string constants (recommended)
aligned = kalign.align(sequences, seq_type="protein")
```

--------------------------------

### Write Aligned Sequences to File with Kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Illustrates how to save aligned sequences to different file formats
(FASTA, Clustal) using Kalign's `write_alignment` and `kalign.io` functions, including options for custom sequence IDs.

```python
import kalign

# Align sequences
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
aligned = kalign.align(sequences)

# Method 1: Simple write
kalign.write_alignment(aligned, "output.fasta")

# Method 2: With custom IDs and format
ids = ["sequence_1", "sequence_2", "sequence_3"]
kalign.io.write_fasta(aligned, "output.fasta", ids=ids)
kalign.io.write_clustal(aligned, "output.aln", ids=ids)
```

--------------------------------

### Create Minimal Reproducible Example for Kalign Issues

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

This Python code provides a template for constructing a minimal, self-contained example that reliably reproduces a specific issue encountered with the Kalign library. Isolating the problem to the fewest lines of code makes debugging and issue reporting more efficient.

```python
import kalign

# Minimal example that reproduces the issue
sequences = ["ATCG", "ATCG"]  # Replace with your problematic sequences

try:
    result = kalign.align(sequences)
    print("Success")
except Exception as e:
    print(f"Error: {e}")
```

--------------------------------

### Kalign Exploring Example Scripts

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

Commands to run the example scripts provided with the Kalign library. These examples cover basic usage, ecosystem integration, and performance benchmarking.
```Bash
python python-examples/basic_usage.py
python python-examples/ecosystem_integration.py
python python-examples/performance_benchmarks.py
```

--------------------------------

### Develop a custom Kalign application template in Python

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Presents a Python script template for developing custom applications that use the Kalign library. It demonstrates a typical workflow including sequence alignment, statistical analysis using `kalign.utils.alignment_stats`, and custom output generation, serving as a starting point for user-specific projects.

```python
#!/usr/bin/env python3
"""
Custom Kalign Application

Adapt this template for your specific use case.
"""
import kalign

def your_custom_analysis(sequences):
    """Your custom analysis pipeline."""
    # Step 1: Align sequences
    aligned = kalign.align(sequences, seq_type="auto", n_threads=4)

    # Step 2: Your custom analysis
    stats = kalign.utils.alignment_stats(aligned)

    # Step 3: Custom output
    print(f"Analysis complete: {stats['conservation']:.2%} conservation")
    return aligned, stats

# Your application logic here
if __name__ == "__main__":
    sequences = ["ATCG", "ATCG", "TACG"]
    aligned, stats = your_custom_analysis(sequences)
```

--------------------------------

### Python Kalign Batch Processing, Reporting, and Example Usage

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This snippet demonstrates core functionalities for managing and reporting on batch processing tasks using Kalign. It includes methods for printing a summary of processing results, saving a detailed JSON report, and a complete example of how to set up and execute a batch processing workflow, including the creation of dummy input files and processing a directory.
```Python
# Excerpt: tail of a KalignBatchProcessor method. The class definition and the
# imports it relies on (json, time, pathlib.Path) are not shown in this snippet.
        self.stats['failed_alignments'] += sum(1 for r in results if r['status'] != 'success')
        self.stats['total_sequences'] += sum(r.get('n_sequences', 0) for r in results)
        self.stats['total_time'] += end_time - start_time

        # Print summary
        self.print_summary(results, end_time - start_time)

        return results

    def print_summary(self, results, total_time):
        """Print processing summary."""
        successful = [r for r in results if r['status'] == 'success']
        failed = [r for r in results if r['status'] != 'success']

        print(f"\n📊 Batch Processing Summary:")
        print(f" Total files: {len(results)}")
        print(f" Successful: {len(successful)}")
        print(f" Failed: {len(failed)}")
        print(f" Total time: {total_time:.2f} seconds")
        print(f" Average time per file: {total_time/len(results):.2f} seconds")

        if successful:
            total_sequences = sum(r['n_sequences'] for r in successful)
            total_alignment_time = sum(r['time'] for r in successful)
            print(f" Total sequences aligned: {total_sequences}")
            print(f" Average sequences per second: {total_sequences/total_alignment_time:.1f}")

        if failed:
            print(f"\n❌ Failed files:")
            for result in failed[:5]:  # Show first 5 failures
                print(f" {result['input_file']}: {result['message']}")
            if len(failed) > 5:
                print(f" ... and {len(failed) - 5} more")

    def save_report(self, results, output_file="kalign_batch_report.json"):
        """Save detailed processing report."""
        report = {
            'processing_stats': self.stats,
            'timestamp': time.time(),
            'results': results
        }
        with open(output_file, 'w') as f:
            json.dump(report, f, indent=2)
        print(f"📄 Report saved to {output_file}")

# Example usage
def batch_processing_example():
    """Example of batch processing workflow."""
    # Create test files (in real usage, you'd have existing files)
    test_dir = Path("./test_sequences")
    test_dir.mkdir(exist_ok=True)

    # Create sample files
    for i in range(5):
        sequences = [
            f"ATCGATCG{'A' * i}TCGATCG",
            f"ATCGATCG{'T' * i}TCGATCG",
            f"ATCGATCG{'C' * i}TCGATCG"
        ]
        with open(test_dir / f"sequences_{i}.fasta", 'w') as f:
            for j, seq in enumerate(sequences):
                f.write(f">seq_{i}_{j}\n{seq}\n")

    # Process with batch processor
    processor = KalignBatchProcessor(n_workers=2)
    results = processor.process_directory(
        test_dir,
        output_dir=test_dir / "aligned",
        seq_type="dna",
        n_threads=1  # Use 1 thread per process since we're using multiple processes
    )

    # Save report
    processor.save_report(results, "batch_report.json")
    return results

# results = batch_processing_example()
```

--------------------------------

### Example DNA Sequence Input for Diversity Analysis

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-ecosystem.md

Defines a list of DNA sequences and demonstrates how they are passed to a `skbio_diversity_analysis` function to initiate a diversity analysis workflow, returning alignment, position statistics, variable regions, and distance matrix.
```python
dna_sequences = [
    "ATCGATCGATCGATCGATCG",
    "ATCGATCGTCGATCGATCG",
    "ATCGATCGATCATCGATCG",
    "ATCGATCGAGCTCGATCG",
    "ATCGATCGATCGATCAACG"
]

alignment, position_stats, variable_regions, distances = skbio_diversity_analysis(dna_sequences)
```

--------------------------------

### Install Kalign Python Package

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

Command to install the basic Kalign Python package using pip.

```bash
pip install kalign
```

--------------------------------

### Automate Sequence File Processing with kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Presents a Python function `process_sequence_files` designed for batch processing of sequence files. It iterates over the files matching a pattern in an input directory, reads sequences using `kalign.io.read_sequences`, performs alignment, writes results to an output directory, and collects alignment statistics. Errors are caught per file, and a summary report lists the files that were processed successfully.
```python
import kalign
import os
from pathlib import Path

def process_sequence_files(input_dir, output_dir, file_pattern="*.fasta"):
    """Process multiple sequence files in a directory."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)

    # Find all matching files
    files = list(input_path.glob(file_pattern))
    print(f"Found {len(files)} files to process")

    results = []
    for file_path in files:
        print(f"Processing {file_path.name}...")
        try:
            # Read sequences
            sequences, ids = kalign.io.read_sequences(str(file_path))

            # Align
            aligned = kalign.align(sequences, n_threads=4)

            # Write aligned sequences
            output_file = output_path / f"aligned_{file_path.name}"
            kalign.io.write_fasta(aligned, str(output_file), ids=ids)

            # Calculate stats
            stats = kalign.utils.alignment_stats(aligned)
            results.append({
                'file': file_path.name,
                'n_sequences': len(sequences),
                'alignment_length': stats['length'],
                'conservation': stats['conservation']
            })
            print(f" ✅ {len(sequences)} sequences, length {stats['length']}")
        except Exception as e:
            print(f" ❌ Error: {e}")

    # Summary report
    print(f"\n📊 Summary of {len(results)} successful alignments:")
    for result in results:
        print(f" {result['file']}: {result['n_sequences']} seqs, "
              f"length {result['alignment_length']}, "
              f"conservation {result['conservation']:.2%}")

    return results

# Usage
# results = process_sequence_files("input_sequences", "aligned_sequences")
```

--------------------------------

### Troubleshooting: Resolve Kalign Python Module Installation Issues

Source: https://github.com/timolassmann/kalign/blob/main/README.md

Offers steps to resolve common Python installation problems for the Kalign module. It suggests upgrading pip, setuptools, and wheel, then installing directly from the GitHub repository.
```bash
pip install --upgrade pip setuptools wheel
pip install git+https://github.com/TimoLassmann/kalign.git
```

--------------------------------

### Perform Basic Sequence Alignment with Kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Demonstrates how to perform multiple sequence alignment for different biological sequence types (DNA, Protein, RNA) using Kalign's `align` function, including auto-detection and explicit type specification for optimized results.

```python
import kalign

# Define your sequences
dna_sequences = [
    "ATCGATCGATCGATCG",
    "ATCGATCGTCGATCG",
    "ATCGATCGATCATCG",
    "ATCGATCGAGATCG"
]

# Align them (auto-detects DNA)
aligned = kalign.align(dna_sequences)

# Print results
for i, seq in enumerate(aligned):
    print(f"Seq {i+1}: {seq}")
```

```python
import kalign

protein_sequences = [
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ",
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQF",
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALP"
]

# Explicitly specify protein type for better performance
aligned = kalign.align(protein_sequences, seq_type="protein")

for i, seq in enumerate(aligned):
    print(f"Protein {i+1}: {seq}")
```

```python
import kalign

rna_sequences = [
    "AUCGAUCGAUCGAUCG",
    "AUCGAUCGUCGAUCG",
    "AUCGAUCGAUCAUCG"
]

aligned = kalign.align(rna_sequences, seq_type="rna")

for i, seq in enumerate(aligned):
    print(f"RNA {i+1}: {seq}")
```

--------------------------------

### Kalign Installation and Verification

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

Steps to install the Kalign Python library using pip and verify the installation by importing the library and printing its version. This ensures the package is correctly set up and ready for use.
```Bash
pip install kalign[all]
python -c "import kalign; print(f'Kalign {kalign.__version__} ready!')"
```

--------------------------------

### Perform Comparative Sequence Analysis with kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Defines a Python function `compare_sequences` that performs pairwise identity analysis. It uses `kalign.align` for sequence alignment and `kalign.utils.pairwise_identity_matrix` to calculate similarity, identifying the most and least similar pairs among a set of sequences. A usage example follows the function.

```python
import kalign
import numpy as np

def compare_sequences(sequences, ids=None):
    """Compare sequences with pairwise identity analysis."""
    if ids is None:
        ids = [f"Seq{i+1}" for i in range(len(sequences))]

    # Align sequences
    aligned = kalign.align(sequences)

    # Calculate pairwise identities
    identity_matrix = kalign.utils.pairwise_identity_matrix(aligned)

    # Find most and least similar pairs
    n = len(sequences)
    max_identity = 0
    min_identity = 1
    max_pair = None
    min_pair = None

    for i in range(n):
        for j in range(i+1, n):
            identity = identity_matrix[i, j]
            if identity > max_identity:
                max_identity = identity
                max_pair = (ids[i], ids[j])
            if identity < min_identity:
                min_identity = identity
                min_pair = (ids[i], ids[j])

    print(f"Most similar: {max_pair[0]} vs {max_pair[1]} ({max_identity:.2%})")
    print(f"Least similar: {min_pair[0]} vs {min_pair[1]} ({min_identity:.2%})")

    return aligned, identity_matrix

# Usage
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG", "GCTAGCTAGCTA"]
ids = ["Human", "Mouse", "Rat", "Chicken"]
aligned, matrix = compare_sequences(sequences, ids)
```

--------------------------------

### Run Kalign Python Examples

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

Commands to execute the basic usage and ecosystem integration example scripts provided with the Kalign package.
```bash
python python-examples/basic_usage.py
python python-examples/ecosystem_integration.py
```

--------------------------------

### Install Biopython for Kalign Integration

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

These commands provide two methods to install Biopython: either directly using pip or by installing Kalign with its optional Biopython dependencies, enabling Biopython-formatted output.

```bash
pip install kalign[biopython]
# or
pip install biopython
```

--------------------------------

### Run Kalign Python Basic Usage Example

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the `basic_usage.py` script to demonstrate core Kalign features like sequence alignment, file I/O, and multi-threading.

```bash
python basic_usage.py
```

--------------------------------

### Create Kalign Benchmark Directory Structure

Source: https://github.com/timolassmann/kalign/blob/main/scripts/benchmark.org

Initial setup script to create the directories for the Kalign benchmark project under the home directory, including data, programs, and scratch space.

```bash
cd
mkdir -p kalignbenchmark
cd kalignbenchmark
mkdir -p data
mkdir -p programs
mkdir -p scratch
```

--------------------------------

### Fix CMake Not Found Errors during Kalign Build

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

Provides solutions for `CMake must be installed` errors encountered during Kalign installation, including pip/conda installation and version checks to ensure compatibility (requires CMake 3.18+).
```bash
pip install cmake
# or
conda install cmake
# or system package manager
```

```bash
cmake --version
```

--------------------------------

### Customize Gap Penalties in kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This snippet demonstrates how to fine-tune the alignment process by adjusting gap penalties. It shows examples for creating more conservative alignments (fewer gaps) by increasing `gap_open` and `gap_extend` penalties, and more aggressive alignments (more gaps) by decreasing them. It also illustrates how to set `terminal_gap_extend` to control penalties for gaps at the ends of sequences.

```python
import kalign

# Conservative alignment (fewer gaps)
aligned = kalign.align(
    sequences,
    gap_open=-15.0,   # Higher penalty for opening gaps
    gap_extend=-2.0   # Higher penalty for extending gaps
)

# Aggressive alignment (more gaps)
aligned = kalign.align(
    sequences,
    gap_open=-5.0,    # Lower penalty for opening gaps
    gap_extend=-0.5   # Lower penalty for extending gaps
)

# Custom terminal gap handling
aligned = kalign.align(
    sequences,
    gap_open=-10.0,
    gap_extend=-1.0,
    terminal_gap_extend=0.0  # No penalty for terminal gaps
)
```

--------------------------------

### Set Up Kalign Python Bindings for Development

Source: https://github.com/timolassmann/kalign/blob/main/CONTRIBUTING.md

Instructions for setting up the Python development environment for Kalign. This involves navigating to the Python module directory, installing the package in editable mode for local development, and verifying the installation by importing the `kalign` module and printing its version.

```bash
cd python
pip install -e .
python -c "import kalign; print(kalign.__version__)"
```

--------------------------------

### Convert kalign Output to Biopython Objects

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This example demonstrates how to obtain a Biopython `MultipleSeqAlignment` object from a standard `kalign` alignment result (a list of aligned sequence strings). Note that the helper strips the gaps and re-runs the alignment with `fmt="biopython"`, so it realigns the sequences rather than converting the existing alignment in place. This lets users move to other bioinformatics libraries even if the initial alignment wasn't requested in a specific format.

```python
import kalign

# Start with plain alignment
aligned = kalign.align(sequences)

# Convert to ecosystem objects when needed
def convert_to_biopython(aligned_seqs, ids=None):
    """Convert plain alignment to Biopython (re-aligns the ungapped sequences)."""
    if ids is None:
        ids = [f"seq{i}" for i in range(len(aligned_seqs))]
    return kalign.align(
        [seq.replace('-', '') for seq in aligned_seqs],  # Remove gaps
        fmt="biopython",
        ids=ids
    )

# Usage
ids = ["sequence_1", "sequence_2", "sequence_3"]
aln_bp = convert_to_biopython(aligned, ids)
```

--------------------------------

### Download and Compile Kalign3 from GitHub

Source: https://github.com/timolassmann/kalign/blob/main/scripts/benchmark.org

Instructions to clone the Kalign3 source code from its GitHub repository, configure it for installation, compile the program, run self-checks, and finally install the executable into the benchmark's designated programs directory.
```bash
cd ~/kalignbenchmark/programs
mkdir -p kalign3_src
cd kalign3_src
git clone https://github.com/TimoLassmann/kalign.git
cd kalign
./autogen.sh
./configure --bindir=$HOME/kalignbenchmark/programs
make
make check
make install
```

--------------------------------

### Verify Biopython Installation and Module Accessibility in Python

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

This Python script checks if Biopython is correctly installed and if its core modules, such as `Bio.Align`, `Bio.SeqRecord`, and `Bio.Seq`, are accessible. It reports the installation status, helping to confirm readiness for Kalign integration.

```python
try:
    import Bio
    print(f"Biopython version: {Bio.__version__}")

    # Test specific modules
    from Bio.Align import MultipleSeqAlignment
    from Bio.SeqRecord import SeqRecord
    from Bio.Seq import Seq
    print("✅ Biopython modules accessible")
except ImportError as e:
    print(f"❌ Biopython issue: {e}")
```

--------------------------------

### Example Output: Kalign Python Performance Benchmarking Suite

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Console output from the Kalign performance benchmarking suite, detailing system information, thread scaling results, and a recommended optimal thread count.

```text
🔬 Kalign Performance Benchmarking Suite

📊 System Information:
Platform: darwin
CPU cores: 8
Memory: 16.0 GB
Kalign version: 3.4.1

Thread Scaling Benchmark
Testing 1 threads...
Average: 0.245s (±0.003s)
Speedup: 1.00x
Efficiency: 100.0%
Testing 4 threads...
Average: 0.089s (±0.002s)
Speedup: 2.75x
Efficiency: 68.8%

🎯 Recommended thread count: 4 (efficiency: 68.8%)
```

--------------------------------

### Configure Threading for kalign Alignments

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This example illustrates how to manage the number of CPU threads used by `kalign` for alignments.
It shows setting a global default thread count using `kalign.set_num_threads` based on available CPU cores, and also how to override this setting for individual alignment calls using the `n_threads` parameter. The current default thread count can be queried with `kalign.get_num_threads`.

```python
import kalign
import os

# sequences1, sequences2 and sequences3 are lists of sequence strings defined elsewhere

# Set global thread count (os.cpu_count() can return None, so fall back to 1)
cpu_count = os.cpu_count() or 1
kalign.set_num_threads(cpu_count)

# All subsequent alignments use all CPUs
aligned1 = kalign.align(sequences1)
aligned2 = kalign.align(sequences2)

# Override for specific alignment
aligned3 = kalign.align(sequences3, n_threads=1)  # Single-threaded

# Check current setting
print(f"Using {kalign.get_num_threads()} threads by default")
```

--------------------------------

### Run MUSCLE and Kalign on Balifam Dataset

Source: https://github.com/timolassmann/kalign/blob/main/scripts/balibase_test.org

Illustrates a more complex benchmark scenario involving a Balifam dataset. It concatenates the two input files, runs MUSCLE on the combined file, and then uses Kalign and `qscore` for evaluation. The "<>" placeholder is for environment setup.

```bash
<>
IN=~/kalignbenchmark/data/BB20021-package.vie
ADD=~/kalignbenchmark/data/bb3_release/RV20/BB20021.tfa
REF=~/kalignbenchmark/data/bb3_release/RV20/BB20021.msf
OUT=~/kalignbenchmark/scratch/BB20021_muscle.fa
cd ~/kalignbenchmark/scratch
cat $IN $ADD > test.fa
time muscle3.8.31_i86linux32 -maxiters 2 -in test.fa -out $OUT
kalign $REF -r -f fasta -o ref.fa
qscore -test $OUT -ref ref.fa
```

--------------------------------

### Resolve Kalign Package Installation Failures

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

Addresses common issues leading to `pip install kalign` failures, including missing build tools and system dependencies on Linux, macOS, and Windows. Solutions involve updating pip components, installing system-level packages, or using conda-forge.

```bash
pip install --upgrade pip setuptools wheel
pip install --upgrade cmake scikit-build-core pybind11
pip install kalign
```

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install cmake build-essential

# CentOS/RHEL
sudo yum install cmake gcc gcc-c++ make
```

```bash
# Install Xcode command line tools
xcode-select --install

# Or install via Homebrew
brew install cmake
```

```bash
# Install Visual Studio Build Tools
# Or use conda-forge
conda install -c conda-forge kalign
```

--------------------------------

### Python Example for Distributed Kalign Alignment

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This Python function `distributed_alignment_example` showcases how to use the `DistributedKalign` class to process sequence alignments in a distributed manner. It generates sample DNA sequence groups, submits them to the distributed processor, waits for results, and reports on successful and failed tasks. Essential setup steps for Redis and Celery workers are also provided for running the distributed system.

```python
# Assumes the DistributedKalign class from the same guide is already defined or imported
def distributed_alignment_example():
    """Example of distributed alignment processing."""

    # Initialize distributed processor
    distributed = DistributedKalign()

    # Create sequence groups for processing
    sequence_groups = []
    for i in range(10):
        group = [
            f"ATCGATCG{'A' * (i % 5)}TCGATCG",
            f"ATCGATCG{'T' * (i % 5)}TCGATCG",
            f"ATCGATCG{'C' * (i % 5)}TCGATCG",
            f"ATCGATCG{'G' * (i % 5)}TCGATCG"
        ]
        sequence_groups.append(group)

    # Submit batch
    task_ids = distributed.submit_batch(
        sequence_groups,
        seq_type="dna",
        n_threads=2
    )

    # Wait for completion
    results = distributed.wait_for_completion(task_ids, timeout=600)

    # Analyze results
    successful = [r for r in results if r['status'] == 'success']
    failed = [r for r in results if r['status'] == 'error']

    print("\nDistributed processing complete:")
    print(f"  Successful: {len(successful)}")
    print(f"  Failed: {len(failed)}")

    return results

# To run distributed processing:
# 1. Start Redis: redis-server
# 2. Start Celery workers: celery -A your_module worker --loglevel=info
# 3. Run: results = distributed_alignment_example()
```

--------------------------------

### Access Kalign module help documentation

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Runs a Python command directly from the shell to invoke the built-in `help()` function for the `kalign` module. This provides detailed information about its functions, classes, and methods, useful for quick reference.

```bash
python -c "import kalign; help(kalign)"
```

--------------------------------

### Install scikit-bio for Kalign Integration

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

These commands provide methods to install scikit-bio: either directly via Kalign's extra dependencies using pip or through Conda, enabling scikit-bio-formatted output from Kalign.

```bash
pip install kalign[skbio]
# or
conda install -c conda-forge scikit-bio
```

--------------------------------

### Build Kalign from Source

Source: https://github.com/timolassmann/kalign/blob/main/CONTRIBUTING.md

Provides a step-by-step guide to compile the Kalign project from source. This includes cloning the repository, setting up the build environment with CMake, compiling the code, and running the integrated tests to ensure a successful build.

```bash
# Clone your fork
git clone https://github.com/yourusername/kalign.git
cd kalign

# Create build directory
mkdir build && cd build

# Configure and build
cmake ..
make

# Run tests
make test
```

--------------------------------

### Python: Local Development Setup for Kalign Module

Source: https://github.com/timolassmann/kalign/blob/main/README.md

Explains how to set up the Kalign Python module for local development using `uv pip install -e .`. This command installs the package in editable mode, allowing changes to be reflected without reinstallation.

```bash
uv pip install -e .
```

--------------------------------

### Run Kalign Python Performance Benchmarks Example

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the `performance_benchmarks.py` script to perform comprehensive performance testing, including thread scaling and memory usage analysis for Kalign.

```bash
python performance_benchmarks.py
```

--------------------------------

### Python Example: Demonstrating Memory-Efficient Alignment Usage

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This example function demonstrates how to use the `memory_efficient_alignment` function. It generates a large synthetic dataset of sequences and then processes them in chunks, showcasing the memory monitoring capabilities. This helps in understanding how the chunking and garbage collection work in practice.

```python
import numpy as np

# Assumes memory_efficient_alignment from the same guide is already defined
def memory_usage_example():
    """Example of memory usage monitoring."""

    # Create large sequence set
    print("Creating large sequence set...")
    base_seq = "ATCGATCGATCG" * 100  # 1200 bp sequences

    large_sequence_set = []
    for i in range(500):  # 500 sequences
        seq = list(base_seq)
        # Add mutations (a redrawn base may equal the original, so at most 2% change)
        n_mutations = len(seq) // 50  # 2% of positions
        positions = np.random.choice(len(seq), n_mutations, replace=False)
        for pos in positions:
            seq[pos] = np.random.choice(['A', 'T', 'C', 'G'])
        large_sequence_set.append(''.join(seq))

    print(f"Created {len(large_sequence_set)} sequences of length {len(large_sequence_set[0])}")

    # Process with memory monitoring
    aligned = memory_efficient_alignment(
        large_sequence_set,
        chunk_size=100,
        monitor_memory=True
    )

    print(f"Final result: {len(aligned)} aligned sequences")
    return aligned
```

--------------------------------

### Example Usage for Gap Penalty Optimization

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This Python snippet provides an example of input sequences that can be used with a gap penalty optimization function (e.g., `optimize_gap_penalties`). It demonstrates how to define test sequences for alignment and includes a commented-out line showing how the optimization function might be called.

```python
test_sequences = [
    "ATCGATCGATCGATCGATCG",
    "ATCGATCGTCGATCGATCG",
    "ATCGATCGATCATCGATCG",
    "ATCGATCGAGCTCGATCG"
]

# best_params, param_results = optimize_gap_penalties(test_sequences)
```

--------------------------------

### Install BAliBASE data files

Source: https://github.com/timolassmann/kalign/blob/main/scripts/balibase_test.org

This script navigates to the home directory, creates a 'data' directory, downloads the BAliBASE_R1-5.tar.gz archive if it doesn't already exist, and then extracts it. This is the first step to set up the BAliBASE test environment.

```sh
cd
mkdir -p data
cd data
if [ ! -f BAliBASE_R1-5.tar.gz ]; then
    wget https://www.lbgi.fr/balibase/BalibaseDownload/BAliBASE_R1-5.tar.gz
fi
tar -zxvf BAliBASE_R1-5.tar.gz
```

--------------------------------

### Example: Parallel Kalign Sequence Alignment with Thread Pool (Python)

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This example demonstrates how to use the `KalignThreadPool` class to perform parallel sequence alignment. It sets up multiple groups of sequences and processes them concurrently using a thread pool, showcasing the performance benefits and proper usage of the context manager.

```python
import time

# Assumes the KalignThreadPool class from the same guide is already defined
# Example usage
def parallel_alignment_example():
    """Example of parallel alignment processing."""

    # Create multiple sequence groups
    sequence_groups = []
    for group in range(5):
        group_sequences = [
            f"ATCGATCG{'A' * group}TCGATCG",
            f"ATCGATCG{'T' * group}TCGATCG",
            f"ATCGATCG{'C' * group}TCGATCG"
        ]
        sequence_groups.append(group_sequences)

    print(f"Aligning {len(sequence_groups)} groups with thread pool...")

    start_time = time.time()

    with KalignThreadPool(max_workers=4) as pool:
        alignments = pool.align_multiple(sequence_groups, n_threads=1)

    end_time = time.time()

    print(f"Completed in {end_time - start_time:.2f} seconds")
    print(f"Processed {len(alignments)} alignments")

    return alignments

# alignments = parallel_alignment_example()
```

--------------------------------

### Troubleshoot Kalign ImportError on First Use

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

Helps resolve `ImportError` issues (e.g., DLL load failed on Windows, cannot find shared library on Linux) when importing Kalign. Solutions include verifying installation paths, reinstalling the package, checking dependencies, or trying alternative installation methods like conda-forge.

```python
import kalign
print(kalign.__version__)
print(kalign.__file__)
```

```bash
pip uninstall kalign
pip install kalign --no-cache-dir
```

```python
import numpy  # Should work
```

```bash
conda install -c conda-forge kalign
```

--------------------------------

### Running Kalign Project Tests

Source: https://github.com/timolassmann/kalign/blob/main/CONTRIBUTING.md

Provides command-line instructions for executing both C/C++ and Python test suites within the Kalign project. Users can navigate to the respective build or python directories and run the specified commands to verify functionality.

```bash
# C/C++ tests
cd build
make test

# Python tests
cd python
python -m pytest
```

--------------------------------

### Example Output: Kalign Python Basic Sequence Alignment

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Console output demonstrating the results of a basic Kalign sequence alignment, including input sequences, aligned sequences, and various alignment statistics.

```text
🧬 Kalign Python Examples
==================================================

Example 1: Basic Sequence Alignment
==================================================

Input sequences:
  Seq 1: ATCGATCGATCGATCG
  Seq 2: ATCGATCGTCGATCG
  Seq 3: ATCGATCGATCATCG
  Seq 4: ATCGATCGAGATCG

Aligned sequences:
  Seq 1: ATCGATCGATCGATCG
  Seq 2: ATCGATCG-TCGATCG
  Seq 3: ATCGATCGATC-ATCG
  Seq 4: ATCGATCGA--GATCG

Alignment statistics:
  Length: 16
  Gap fraction: 18.75%
  Conservation: 75.00%
  Average identity: 81.67%
```

--------------------------------

### Troubleshooting Kalign Python Optional Dependency Import Errors

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

A common error message indicating missing optional dependencies for Kalign, prompting the user to install them for full functionality.

```text
Import errors for optional dependencies:
```
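
A minimal sketch, not taken from the Kalign documentation, of one way to find out which optional dependencies are missing before running the examples. It relies only on the standard library's `importlib.util.find_spec`. The `OPTIONAL_MODULES` mapping and the `missing_optional_dependencies` helper are hypothetical names introduced for illustration, and the import names `Bio`, `skbio`, and `psutil` are assumed to be the modules the examples need.

```python
import importlib.util

# Hypothetical mapping: label shown to the user -> top-level module to look for.
# The module names are assumptions about what the examples import.
OPTIONAL_MODULES = {
    "biopython": "Bio",
    "scikit-bio": "skbio",
    "psutil": "psutil",
}


def missing_optional_dependencies(modules=OPTIONAL_MODULES):
    """Return a sorted list of labels whose module cannot be found."""
    return sorted(
        label
        for label, module in modules.items()
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_optional_dependencies()
    if missing:
        print("Missing optional dependencies: " + ", ".join(missing))
        print("Try: pip install kalign[all]  (and pip install psutil)")
    else:
        print("All optional dependencies found")
```

Checking with `find_spec` avoids importing the packages themselves, so the check stays fast and does not fail partway through when one of them is broken.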