### Quick Start: Install Kalign Python with All Example Dependencies

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Install the Kalign Python package with all of its optional extras, plus `psutil` for performance monitoring, before running the example scripts.

```bash
pip install kalign[all]
pip install psutil  # For performance monitoring
```

--------------------------------

### Install Kalign Python Library

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Installs the core Kalign library for Python using pip, the standard Python package installer. This is the most basic installation method.

```bash
pip install kalign
```

--------------------------------

### Verify Kalign Python Installation

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Checks the installed Kalign version and performs a quick alignment test with sample sequences to confirm the library is working correctly after installation.

```python
import kalign
print(f"Kalign version: {kalign.__version__}")

# Quick test
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
aligned = kalign.align(sequences)
print("✅ Kalign is working!")
```

--------------------------------

### Quick Start: Run Kalign Python Basic Usage Example

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the basic usage example script from the `python-examples` directory to quickly see Kalign's core functionalities in action.

```bash
python python-examples/basic_usage.py
```

--------------------------------

### Install Kalign Python with Ecosystem Integrations

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Installs Kalign along with optional dependencies for integration with other bioinformatics libraries like Biopython or scikit-bio, or all available integrations for extended functionality.

```bash
pip install kalign[biopython]
```

```bash
pip install kalign[skbio]
```

```bash
pip install kalign[all]
```
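
After installing one of the extras, it can help to confirm which optional packages are actually importable. The following is a minimal sketch that uses only the standard library; the `OPTIONAL_MODULES` mapping and the `check_optional_dependencies` helper are illustrative names, not part of Kalign. The package list is taken from the dependencies named in this guide.

```python
from importlib.util import find_spec

# pip name -> import name. The two can differ:
# biopython is imported as "Bio", scikit-bio as "skbio".
OPTIONAL_MODULES = {
    "biopython": "Bio",
    "scikit-bio": "skbio",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "psutil": "psutil",
}

def check_optional_dependencies(modules=OPTIONAL_MODULES):
    """Return {pip_name: bool} telling whether each module can be imported."""
    return {
        pip_name: find_spec(import_name) is not None
        for pip_name, import_name in modules.items()
    }

# Print a short availability report for the current environment
for pip_name, available in check_optional_dependencies().items():
    print(f"{pip_name}: {'installed' if available else 'missing'}")
```

`find_spec` only looks the module up and does not import it, so the check stays fast even for heavy packages.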

--------------------------------

### Example R Package Installation Command

Source: https://github.com/timolassmann/kalign/blob/main/scripts/benchmark.org

This R command provides an example of how to install a new R package, specifically 'tidyverse', using the 'install.packages()' function. This is a common way to add new libraries to an R environment.

```R
install.packages("tidyverse")
```

--------------------------------

### Quick Start: Run Kalign Python Ecosystem Integration Example

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the ecosystem integration example script from the `python-examples` directory to explore Kalign's interoperability with other bioinformatics tools.

```bash
python python-examples/ecosystem_integration.py
```

--------------------------------

### Retrieve kalign Package Information and Function Documentation in Python

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This Python code demonstrates how to access metadata for the installed `kalign` package, including its version, author, and contact details. It also illustrates the use of Python's `help()` function to display in-line documentation for specific `kalign` functions, such as `kalign.align` and `kalign.utils.alignment_stats`, directly from the console.

```python
import kalign

# Check version and info
print(f"Kalign version: {kalign.__version__}")
print(f"Author: {kalign.__author__}")
print(f"Contact: {kalign.__email__}")

# View help for specific functions
help(kalign.align)
help(kalign.utils.alignment_stats)
```

--------------------------------

### Install Kalign Python Ecosystem Integration Dependencies

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Install optional Python packages such as Biopython, scikit-bio, Pandas, and Matplotlib required for the Kalign ecosystem integration examples.

```bash
pip install kalign[all]
# or individual packages:
pip install biopython scikit-bio pandas matplotlib
```

--------------------------------

### Quick Start: Run Kalign Python Performance Benchmarks

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the performance benchmarks script from the `python-examples` directory to evaluate and optimize Kalign's performance on your specific system.

```bash
python python-examples/performance_benchmarks.py
```

--------------------------------

### Basic Sequence Analysis Workflow with kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This comprehensive function outlines a typical basic sequence analysis workflow using `kalign`. It covers performing the alignment, calculating various alignment statistics (length, gap fraction, conservation, identity) using `kalign.utils.alignment_stats`, and generating a consensus sequence with `kalign.utils.consensus_sequence`, providing a complete end-to-end example.

```python
import kalign

def analyze_sequences(sequences, name="analysis"):
    """Complete sequence analysis workflow."""
    
    print(f"🧬 Analyzing {len(sequences)} sequences ({name})")
    
    # 1. Align sequences
    print("  Aligning...")
    aligned = kalign.align(sequences, seq_type="auto", n_threads=4)
    
    # 2. Calculate statistics
    print("  Calculating statistics...")
    stats = kalign.utils.alignment_stats(aligned)
    
    print(f"  ✅ Alignment length: {stats['length']}")
    print(f"  ✅ Gap fraction: {stats['gap_fraction']:.2%}")
    print(f"  ✅ Conservation: {stats['conservation']:.2%}")
    print(f"  ✅ Average identity: {stats['identity']:.2%}")
    
    # 3. Generate consensus
    consensus = kalign.utils.consensus_sequence(aligned, threshold=0.5)
    print(f"  ✅ Consensus: {consensus}")
    
    return aligned, stats, consensus
```
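
To make the reported statistics more concrete, here is a small pure-Python sketch of how a gap fraction and a threshold-based consensus could be computed from a list of aligned strings. It is an illustration under stated assumptions, not Kalign's implementation: the helper names, the `"N"` placeholder, and the exact definitions (gaps counted over all cells, threshold applied to the full column height) are choices made for this example, and `kalign.utils` may define these quantities differently.

```python
from collections import Counter

def gap_fraction(aligned):
    """Fraction of all alignment cells that are gap characters ('-')."""
    total = sum(len(seq) for seq in aligned)
    gaps = sum(seq.count("-") for seq in aligned)
    return gaps / total if total else 0.0

def majority_consensus(aligned, threshold=0.5, unknown="N"):
    """Most common residue per column, or `unknown` if it is below threshold."""
    consensus = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            consensus.append("-")  # column contains only gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Assumption: frequency is measured against the full column height
        consensus.append(residue if count / len(column) >= threshold else unknown)
    return "".join(consensus)

aligned = ["ATCGATCG", "ATCG-TCG", "ATCGAT-G"]
print(f"Gap fraction: {gap_fraction(aligned):.2%}")
print(f"Consensus:    {majority_consensus(aligned)}")
```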

--------------------------------

### Install Kalign Python Performance Benchmarks Dependencies

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Install optional Python packages like `psutil` and `matplotlib` for memory monitoring and plotting within the Kalign performance benchmarks.

```bash
pip install psutil matplotlib  # For memory monitoring and plotting
```

--------------------------------

### Kalign Installation Issue Reporting Template

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

Template for reporting problems encountered during Kalign installation, detailing system, installation method, error messages, and build tools. This helps in diagnosing environment-specific installation failures.

```Markdown
**System**: [OS and version]
**Installation method**: [pip/conda/source]
**Error message**: [complete error output]
**Build tools**: [cmake version, compiler version]
```
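
Part of this template can be filled in automatically. The sketch below collects system and package details with the standard library only; `installed_version` and `environment_report` are hypothetical helper names for this example, and the extra Python and package-version fields go slightly beyond the template above.

```python
import platform
import sys
from importlib import metadata

def installed_version(package):
    """Return the installed version of a pip package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"

def environment_report():
    """Build a Markdown snippet describing the current environment."""
    lines = [
        f"**System**: {platform.system()} {platform.release()} ({platform.machine()})",
        f"**Python**: {sys.version.split()[0]}",
        f"**kalign**: {installed_version('kalign')}",
        f"**cmake (pip package)**: {installed_version('cmake')}",
    ]
    return "\n".join(lines)

print(environment_report())
```

The installation method, the complete error output, and the compiler version still need to be added by hand.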

--------------------------------

### Implement Complete File-Based Alignment Workflow

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Provides a Python function demonstrating a full workflow: read sequences from a file, align them with automatic sequence-type detection and four threads, and write the result in both FASTA and Clustal formats.

```python
import kalign

def align_file_workflow(input_file, output_prefix):
    """Complete file-based alignment workflow."""
    
    # Read sequences
    print(f"Reading sequences from {input_file}...")
    sequences, ids = kalign.io.read_sequences(input_file)
    print(f"Found {len(sequences)} sequences")
    
    # Align with optimal settings
    print("Aligning sequences...")
    aligned = kalign.align(
        sequences,
        seq_type="auto",  # Auto-detect
        n_threads=4       # Use 4 threads
    )
    
    # Write in multiple formats
    print(f"Writing alignments to {output_prefix}.*")
    kalign.io.write_fasta(aligned, f"{output_prefix}.fasta", ids=ids)
    kalign.io.write_clustal(aligned, f"{output_prefix}.aln", ids=ids)
    
    print("✅ Complete!")
    return aligned, ids

# Usage
aligned, ids = align_file_workflow("input.fasta", "aligned")
```

--------------------------------

### Integrate kalign with Biopython

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This example demonstrates seamless integration between `kalign` and Biopython. It shows how to perform an alignment and directly receive the result as a Biopython `MultipleSeqAlignment` object, enabling immediate access to Biopython's extensive functionalities for sequence manipulation, analysis, and file format conversion (e.g., Clustal, Phylip).

```python
import kalign
from Bio import AlignIO

# Align and get Biopython object
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
ids = ["seq1", "seq2", "seq3"]

aln_bp = kalign.align(sequences, fmt="biopython", ids=ids)

# Use Biopython's rich functionality
print(f"Alignment length: {aln_bp.get_alignment_length()}")
print(f"Number of sequences: {len(aln_bp)}")

# Export to different formats
AlignIO.write(aln_bp, "output.clustal", "clustal")
AlignIO.write(aln_bp, "output.phylip", "phylip")

# Access individual sequences
for record in aln_bp:
    print(f"{record.id}: {record.seq}")
```

--------------------------------

### Kalign Basic Sequence Alignment Example

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

A quick start example demonstrating how to perform a basic sequence alignment with Kalign. It shows importing the library, defining a list of sequences, and calling the `align` function, followed by a success message.

```python
import kalign

sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
aligned = kalign.align(sequences)
print("✅ Alignment successful!")
```

--------------------------------

### Basic Sequence Alignment with kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Calls the `analyze_sequences` helper (defined in the "Basic Sequence Analysis Workflow with kalign" example) on three short DNA sequences and unpacks the aligned sequences, alignment statistics, and consensus sequence it returns.

```python
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
aligned, stats, consensus = analyze_sequences(sequences, "DNA test")
```

--------------------------------

### Read Sequences from File with Kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Shows two methods to read sequences from a FASTA file: using `align_from_file` for direct alignment or `read_sequences` for more control over the sequences before alignment.

```python
import kalign

# Method 1: Use align_from_file (simple)
aligned = kalign.align_from_file("sequences.fasta", seq_type="protein")

# Method 2: Read then align (more control)
sequences, ids = kalign.io.read_sequences("sequences.fasta")
aligned = kalign.align(sequences, seq_type="protein")

# Print with IDs
for seq_id, seq in zip(ids, aligned):
    print(f"{seq_id}: {seq}")
```
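
For readers who want to see what "read then align" involves, here is a minimal FASTA parser in pure Python. It is a sketch, not the code behind `kalign.io.read_sequences`: the `parse_fasta` name is hypothetical, it takes text rather than a filename, and it only mirrors the `(sequences, ids)` return order used above.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into (sequences, ids)."""
    sequences, ids = [], []
    current = []  # sequence lines collected for the record being read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ids:
                sequences.append("".join(current))
            header = line[1:].strip()
            # Use the first whitespace-delimited token as the ID
            ids.append(header.split()[0] if header else "")
            current = []
        elif ids:
            # Sequence data may be wrapped over several lines
            current.append(line)
    if ids:
        sequences.append("".join(current))
    return sequences, ids

example = ">seq1 first record\nATCG\nATCG\n>seq2\nGGCC\n"
sequences, ids = parse_fasta(example)
for seq_id, seq in zip(ids, sequences):
    print(f"{seq_id}: {seq}")
```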

--------------------------------

### Optimize kalign Alignment Performance

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This Python function provides a practical guide to optimizing `kalign` performance. It covers key strategies such as explicitly specifying the sequence type, dynamically setting the number of threads based on CPU count (with a practical upper limit), and advising on chunking for very large datasets to improve efficiency.

```python
import os

import kalign

def optimize_alignment(sequences, seq_type_hint=None):
    """Optimize alignment for performance."""
    
    # 1. Specify sequence type if known; otherwise let Kalign auto-detect
    if seq_type_hint is None:
        seq_type_hint = "auto"
    
    # 2. Use appropriate thread count (os.cpu_count() can return None)
    n_threads = min(16, os.cpu_count() or 1)  # Diminishing returns after 16
    
    # 3. For large datasets, consider chunking
    if len(sequences) > 10000:
        print(f"Large dataset ({len(sequences)} sequences)")
        print("Consider splitting into smaller chunks")
    
    # 4. Perform alignment
    aligned = kalign.align(
        sequences,
        seq_type=seq_type_hint,
        n_threads=n_threads
    )
    
    return aligned

# Usage
protein_seqs = ["ACDEFGHIK", "ACDEFGH", "ACDEFGHIKL"]
aligned = optimize_alignment(protein_seqs, seq_type_hint="protein")
```
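
The chunking advice can be sketched with a small generator, shown here alongside a defensive thread-count helper. Both helper names are hypothetical and the code uses only the standard library. One caveat worth keeping in mind: aligning chunks separately yields one independent alignment per chunk, not a single alignment of the whole dataset.

```python
import os

def pick_thread_count(cap=16):
    """Thread count bounded by `cap`; falls back to 1 if the CPU count is unknown."""
    return max(1, min(cap, os.cpu_count() or 1))

def chunked(items, size):
    """Yield successive slices of `items` containing at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

sequences = [f"ATCG{'A' * i}" for i in range(25)]
print(f"Using {pick_thread_count()} threads")
for index, chunk in enumerate(chunked(sequences, 10)):
    # Each chunk could be passed to kalign.align() on its own
    print(f"chunk {index}: {len(chunk)} sequences")
```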

--------------------------------

### Install Kalign dependencies

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Installs the optional Python packages used by the Kalign example scripts: Biopython, scikit-bio, pandas, and matplotlib for the ecosystem integration examples, and psutil for performance monitoring.

```bash
pip install biopython scikit-bio pandas matplotlib psutil
```

--------------------------------

### Handle Sequence Types in Kalign Alignment

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Explains how Kalign manages sequence types, including automatic detection for convenience and explicit specification for performance optimization and handling divergent protein sequences. Using string constants for types is recommended.

```python
import kalign

# Kalign automatically detects sequence type
mixed_sequences = ["ATCGATCG", "AUCGAUCG", "ACDEFGHIK"]
aligned = kalign.align(mixed_sequences)  # seq_type="auto" is default
```

```python
import kalign

# DNA sequences
dna_seqs = ["ATCGATCG", "ATCGTCG", "ATCGCG"]
aligned = kalign.align(dna_seqs, seq_type="dna")

# Protein sequences
protein_seqs = ["ACDEFGHIK", "ACDEFGH", "ACDEFGHIKL"]
aligned = kalign.align(protein_seqs, seq_type="protein")

# Divergent proteins (for highly divergent sequences)
divergent_seqs = ["ACDEFGHIK", "MNPQRSTVW", "ACDEFGHIKL"]
aligned = kalign.align(divergent_seqs, seq_type="divergent")
```

```python
import kalign

# Using string constants (recommended)
aligned = kalign.align(sequences, seq_type="protein")
```

--------------------------------

### Write Aligned Sequences to File with Kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Illustrates how to save aligned sequences to different file formats (FASTA, Clustal) using Kalign's `write_alignment` and `kalign.io` functions, including options for custom sequence IDs.

```python
import kalign

# Align sequences
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]
aligned = kalign.align(sequences)

# Method 1: Simple write
kalign.write_alignment(aligned, "output.fasta")

# Method 2: With custom IDs and format
ids = ["sequence_1", "sequence_2", "sequence_3"]
kalign.io.write_fasta(aligned, "output.fasta", ids=ids)
kalign.io.write_clustal(aligned, "output.aln", ids=ids)
```
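
To show what the FASTA output with custom IDs looks like, here is a pure-Python formatter. It is an illustrative sketch rather than the implementation of `kalign.io.write_fasta`; the `format_fasta` name, the default `seqN` IDs, and the 60-column line wrapping are assumptions made for this example.

```python
def format_fasta(aligned, ids=None, width=60):
    """Return aligned sequences as FASTA text, wrapping lines at `width`."""
    if ids is None:
        ids = [f"seq{i + 1}" for i in range(len(aligned))]
    if len(ids) != len(aligned):
        raise ValueError("ids and aligned must have the same length")
    lines = []
    for seq_id, seq in zip(ids, aligned):
        lines.append(f">{seq_id}")
        # Gap characters are kept so the alignment columns are preserved
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)

aligned = ["ATCGATCGATCG", "ATCG-TCGATCG", "ATCGATC-ATCG"]
ids = ["sequence_1", "sequence_2", "sequence_3"]
print(format_fasta(aligned, ids), end="")
```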

--------------------------------

### Create Minimal Reproducible Example for Kalign Issues

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

This Python code provides a template for constructing a minimal, self-contained example that reliably reproduces a specific issue encountered with the Kalign library. By isolating the problem to the fewest lines of code, it significantly aids in debugging and facilitates more efficient issue reporting and resolution.

```python
import kalign

# Minimal example that reproduces the issue
sequences = ["ATCG", "ATCG"]  # Replace with your problematic sequences
try:
    result = kalign.align(sequences)
    print("Success")
except Exception as e:
    print(f"Error: {e}")
```

--------------------------------

### Kalign Exploring Example Scripts

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

Commands to run various example scripts provided with the Kalign library. These examples cover basic usage, ecosystem integration, and performance benchmarking, helping users understand different functionalities and optimization techniques.

```bash
python python-examples/basic_usage.py
python python-examples/ecosystem_integration.py
python python-examples/performance_benchmarks.py
```

--------------------------------

### Develop a custom Kalign application template in Python

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Presents a comprehensive Python script template for developing custom applications that leverage the Kalign library. It demonstrates a typical workflow including sequence alignment, statistical analysis using `kalign.utils.alignment_stats`, and custom output generation, serving as a starting point for user-specific projects.

```python
#!/usr/bin/env python3
"""
Custom Kalign Application

Adapt this template for your specific use case.
"""

import kalign

def your_custom_analysis(sequences):
    """Your custom analysis pipeline."""
    
    # Step 1: Align sequences
    aligned = kalign.align(sequences, seq_type="auto", n_threads=4)
    
    # Step 2: Your custom analysis
    stats = kalign.utils.alignment_stats(aligned)
    
    # Step 3: Custom output
    print(f"Analysis complete: {stats['conservation']:.2%} conservation")
    
    return aligned, stats

# Your application logic here
if __name__ == "__main__":
    sequences = ["ATCG", "ATCG", "TACG"]
    aligned, stats = your_custom_analysis(sequences)
```

--------------------------------

### Python Kalign Batch Processing, Reporting, and Example Usage

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This excerpt from a `KalignBatchProcessor` class shows how batch processing results are summarized and reported. It begins partway through a processing method, so the class definition and the imports the code relies on (`json`, `time`, and `pathlib.Path`) are not shown. The excerpt covers printing a summary of processing results, saving a detailed JSON report, and an example function that creates dummy input files and processes a directory.

```python
        self.stats['failed_alignments'] += sum(1 for r in results if r['status'] != 'success')
        self.stats['total_sequences'] += sum(r.get('n_sequences', 0) for r in results)
        self.stats['total_time'] += end_time - start_time
        
        # Print summary
        self.print_summary(results, end_time - start_time)
        
        return results
    
    def print_summary(self, results, total_time):
        """Print processing summary."""
        
        successful = [r for r in results if r['status'] == 'success']
        failed = [r for r in results if r['status'] != 'success']
        
        print(f"\n📊 Batch Processing Summary:")
        print(f"   Total files: {len(results)}")
        print(f"   Successful: {len(successful)}")
        print(f"   Failed: {len(failed)}")
        print(f"   Total time: {total_time:.2f} seconds")
        print(f"   Average time per file: {total_time/len(results):.2f} seconds")
        
        if successful:
            total_sequences = sum(r['n_sequences'] for r in successful)
            total_alignment_time = sum(r['time'] for r in successful)
            print(f"   Total sequences aligned: {total_sequences}")
            print(f"   Average sequences per second: {total_sequences/total_alignment_time:.1f}")
        
        if failed:
            print(f"\n❌ Failed files:")
            for result in failed[:5]:  # Show first 5 failures
                print(f"   {result['input_file']}: {result['message']}")
            if len(failed) > 5:
                print(f"   ... and {len(failed) - 5} more")
    
    def save_report(self, results, output_file="kalign_batch_report.json"):
        """Save detailed processing report."""
        
        report = {
            'processing_stats': self.stats,
            'timestamp': time.time(),
            'results': results
        }
        
        with open(output_file, 'w') as f:
            json.dump(report, f, indent=2)
        
        print(f"📄 Report saved to {output_file}")

# Example usage
def batch_processing_example():
    """Example of batch processing workflow."""
    
    # Create test files (in real usage, you'd have existing files)
    test_dir = Path("./test_sequences")
    test_dir.mkdir(exist_ok=True)
    
    # Create sample files
    for i in range(5):
        sequences = [
            f"ATCGATCG{'A' * i}TCGATCG",
            f"ATCGATCG{'T' * i}TCGATCG",
            f"ATCGATCG{'C' * i}TCGATCG"
        ]
        
        with open(test_dir / f"sequences_{i}.fasta", 'w') as f:
            for j, seq in enumerate(sequences):
                f.write(f">seq_{i}_{j}\n{seq}\n")
    
    # Process with batch processor
    processor = KalignBatchProcessor(n_workers=2)
    
    results = processor.process_directory(
        test_dir,
        output_dir=test_dir / "aligned",
        seq_type="dna",
        n_threads=1  # Use 1 thread per process since we're using multiple processes
    )
    
    # Save report
    processor.save_report(results, "batch_report.json")
    
    return results

# results = batch_processing_example()
```

--------------------------------

### Example DNA Sequence Input for Diversity Analysis

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-ecosystem.md

Defines a list of DNA sequences and demonstrates how they are passed to a skbio_diversity_analysis function to initiate a diversity analysis workflow, returning alignment, position statistics, variable regions, and distance matrix.

```python
dna_sequences = [
    "ATCGATCGATCGATCGATCG",
    "ATCGATCGTCGATCGATCG",
    "ATCGATCGATCATCGATCG",
    "ATCGATCGAGCTCGATCG",
    "ATCGATCGATCGATCAACG"
]

alignment, position_stats, variable_regions, distances = skbio_diversity_analysis(dna_sequences)
```

--------------------------------

### Install Kalign Python Package

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

Command to install the basic Kalign Python package using pip.

```bash
pip install kalign
```

--------------------------------

### Automate Sequence File Processing with kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Presents a Python function, `process_sequence_files`, for batch processing of sequence files. It searches a single input directory for files matching a pattern, reads each one with `kalign.io.read_sequences`, aligns the sequences, writes the result to an output directory, and records alignment statistics. Per-file errors are caught and printed so that one bad file does not stop the run, and a summary of the successful alignments is printed at the end.

```python
import kalign
from pathlib import Path

def process_sequence_files(input_dir, output_dir, file_pattern="*.fasta"):
    """Process multiple sequence files in a directory."""
    
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    # parents=True also creates any missing parent directories
    output_path.mkdir(parents=True, exist_ok=True)
    
    # Find all matching files
    files = list(input_path.glob(file_pattern))
    print(f"Found {len(files)} files to process")
    
    results = []
    
    for file_path in files:
        print(f"Processing {file_path.name}...")
        
        try:
            # Read sequences
            sequences, ids = kalign.io.read_sequences(str(file_path))
            
            # Align
            aligned = kalign.align(sequences, n_threads=4)
            
            # Write aligned sequences
            output_file = output_path / f"aligned_{file_path.name}"
            kalign.io.write_fasta(aligned, str(output_file), ids=ids)
            
            # Calculate stats
            stats = kalign.utils.alignment_stats(aligned)
            
            results.append({
                'file': file_path.name,
                'n_sequences': len(sequences),
                'alignment_length': stats['length'],
                'conservation': stats['conservation']
            })
            
            print(f"  ✅ {len(sequences)} sequences, length {stats['length']}")
            
        except Exception as e:
            print(f"  ❌ Error: {e}")
    
    # Summary report
    print(f"\n📊 Summary of {len(results)} successful alignments:")
    for result in results:
        print(f"  {result['file']}: {result['n_sequences']} seqs, "
              f"length {result['alignment_length']}, "
              f"conservation {result['conservation']:.2%}")
    
    return results

# Usage
# results = process_sequence_files("input_sequences", "aligned_sequences")
```

--------------------------------

### Troubleshooting: Resolve Kalign Python Module Installation Issues

Source: https://github.com/timolassmann/kalign/blob/main/README.md

Offers steps to resolve common Python installation problems for the Kalign module. It suggests upgrading pip, setuptools, and wheel, then installing directly from the GitHub repository.

```bash
pip install --upgrade pip setuptools wheel
pip install git+https://github.com/TimoLassmann/kalign.git
```

--------------------------------

### Perform Basic Sequence Alignment with Kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Demonstrates how to perform multiple sequence alignment for different biological sequence types (DNA, Protein, RNA) using Kalign's `align` function, including auto-detection and explicit type specification for optimized results.

```python
import kalign

# Define your sequences
dna_sequences = [
    "ATCGATCGATCGATCG",
    "ATCGATCGTCGATCG",
    "ATCGATCGATCATCG",
    "ATCGATCGAGATCG"
]

# Align them (auto-detects DNA)
aligned = kalign.align(dna_sequences)

# Print results
for i, seq in enumerate(aligned):
    print(f"Seq {i+1}: {seq}")
```

```python
import kalign

protein_sequences = [
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ",
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQF",
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALP"
]

# Explicitly specify protein type for better performance
aligned = kalign.align(protein_sequences, seq_type="protein")

for i, seq in enumerate(aligned):
    print(f"Protein {i+1}: {seq}")
```

```python
import kalign

rna_sequences = [
    "AUCGAUCGAUCGAUCG",
    "AUCGAUCGUCGAUCG",
    "AUCGAUCGAUCAUCG"
]

aligned = kalign.align(rna_sequences, seq_type="rna")

for i, seq in enumerate(aligned):
    print(f"RNA {i+1}: {seq}")
```

--------------------------------

### Kalign Installation and Verification

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

Steps to install the Kalign Python library using pip and verify the installation by importing the library and printing its version. This ensures the package is correctly set up and ready for use.

```bash
pip install kalign[all]
python -c "import kalign; print(f'Kalign {kalign.__version__} ready!')"
```

--------------------------------

### Perform Comparative Sequence Analysis with kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

Defines a Python function `compare_sequences` that performs pairwise identity analysis. It utilizes `kalign.align` for sequence alignment and `kalign.utils.pairwise_identity_matrix` to calculate similarity, identifying the most and least similar pairs among a set of sequences. The function also includes a practical usage example.

```python
import kalign
import numpy as np

def compare_sequences(sequences, ids=None):
    """Compare sequences with pairwise identity analysis."""
    
    if ids is None:
        ids = [f"Seq{i+1}" for i in range(len(sequences))]
    
    # Align sequences
    aligned = kalign.align(sequences)
    
    # Calculate pairwise identities
    identity_matrix = kalign.utils.pairwise_identity_matrix(aligned)
    
    # Find most and least similar pairs
    n = len(sequences)
    max_identity = 0
    min_identity = 1
    max_pair = None
    min_pair = None
    
    for i in range(n):
        for j in range(i+1, n):
            identity = identity_matrix[i, j]
            if identity > max_identity:
                max_identity = identity
                max_pair = (ids[i], ids[j])
            if identity < min_identity:
                min_identity = identity
                min_pair = (ids[i], ids[j])
    
    print(f"Most similar: {max_pair[0]} vs {max_pair[1]} ({max_identity:.2%})")
    print(f"Least similar: {min_pair[0]} vs {min_pair[1]} ({min_identity:.2%})")
    
    return aligned, identity_matrix

# Usage
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG", "GCTAGCTAGCTA"]
ids = ["Human", "Mouse", "Rat", "Chicken"]
aligned, matrix = compare_sequences(sequences, ids)
```
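
For intuition about what a pairwise identity matrix contains, the sketch below computes one in pure Python from aligned strings. The definition used here (matches divided by the number of columns where neither sequence has a gap) is an assumption chosen for illustration; `kalign.utils.pairwise_identity_matrix` may use a different denominator, and the helper names are hypothetical.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0

def identity_matrix_sketch(aligned):
    """Symmetric matrix (list of lists) of pairwise identities."""
    n = len(aligned)
    return [
        [pairwise_identity(aligned[i], aligned[j]) for j in range(n)]
        for i in range(n)
    ]

aligned = ["ATCGATCG", "ATCG-TCG", "AACGATCG"]
for row in identity_matrix_sketch(aligned):
    print(" ".join(f"{value:.2f}" for value in row))
```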

--------------------------------

### Run Kalign Python Examples

Source: https://github.com/timolassmann/kalign/blob/main/README-python.md

Commands to execute the basic usage and ecosystem integration example scripts provided with the Kalign package.

```bash
python python-examples/basic_usage.py
python python-examples/ecosystem_integration.py
```

--------------------------------

### Install Biopython for Kalign Integration

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

These commands provide two methods to install Biopython: either directly using pip or by installing Kalign with its optional Biopython dependencies, enabling Biopython-formatted output.

```bash
pip install kalign[biopython]
# or
pip install biopython
```

--------------------------------

### Run Kalign Python Basic Usage Example

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Execute the `basic_usage.py` script to demonstrate core Kalign features like sequence alignment, file I/O, and multi-threading.

```bash
python basic_usage.py
```

--------------------------------

### Create Kalign Benchmark Directory Structure

Source: https://github.com/timolassmann/kalign/blob/main/scripts/benchmark.org

Initial setup script to create the necessary directories for the Kalign benchmark project, including data, programs, and scratch space. This ensures a clean and organized environment for subsequent steps.

```bash
cd
mkdir -p kalignbenchmark
cd kalignbenchmark
mkdir -p data
mkdir -p programs
mkdir -p scratch

--------------------------------

### Fix CMake Not Found Errors during Kalign Build

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

Provides solutions for `CMake must be installed` errors encountered during Kalign installation, including pip/conda installation and version checks to ensure compatibility (requires CMake 3.18+).

```bash
pip install cmake
# or
conda install cmake
# or system package manager
```

```bash
cmake --version
```
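
The version check can also be scripted. The sketch below locates `cmake` on the `PATH`, parses the output of `cmake --version`, and compares it with the 3.18 minimum stated above. The helper names are hypothetical, and the parser assumes the usual `cmake version X.Y.Z` first line.

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 18)  # minimum version stated in the troubleshooting guide

def parse_cmake_version(output):
    """Extract (major, minor) from `cmake --version` output, or None."""
    match = re.search(r"cmake version (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def cmake_status():
    """Return a short human-readable status string for the local CMake."""
    exe = shutil.which("cmake")
    if exe is None:
        return "cmake not found on PATH"
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    version = parse_cmake_version(result.stdout)
    if version is None:
        return "could not parse cmake version"
    if version < MIN_CMAKE:
        return f"cmake {version[0]}.{version[1]} is older than 3.18"
    return f"cmake {version[0]}.{version[1]} is new enough"

print(cmake_status())
```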

--------------------------------

### Customize Gap Penalties in kalign

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This snippet demonstrates how to fine-tune the alignment process by adjusting gap penalties. It shows examples for creating more conservative alignments (fewer gaps) by increasing `gap_open` and `gap_extend` penalties, and more aggressive alignments (more gaps) by decreasing them. It also illustrates how to set `terminal_gap_extend` to control penalties for gaps at the ends of sequences.

```python
import kalign

# Conservative alignment (fewer gaps)
aligned = kalign.align(
    sequences,
    gap_open=-15.0,      # Higher penalty for opening gaps
    gap_extend=-2.0      # Higher penalty for extending gaps
)

# Aggressive alignment (more gaps)
aligned = kalign.align(
    sequences,
    gap_open=-5.0,       # Lower penalty for opening gaps
    gap_extend=-0.5      # Lower penalty for extending gaps
)

# Custom terminal gap handling
aligned = kalign.align(
    sequences,
    gap_open=-10.0,
    gap_extend=-1.0,
    terminal_gap_extend=0.0  # No penalty for terminal gaps
)
```
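
To see why more negative values lead to fewer gaps, it helps to compare the score of a single gap under each setting. The sketch below uses a generic affine gap model in which the first gap column costs `gap_open` and each further column costs `gap_extend`. This convention is an assumption for illustration only; Kalign's internal scoring may combine the two parameters differently.

```python
def affine_gap_score(length, gap_open, gap_extend):
    """Score of one gap spanning `length` columns under a generic affine model.

    Assumed convention: gap_open for the first column, gap_extend for each
    additional column. Not necessarily the formula Kalign uses internally.
    """
    if length < 1:
        return 0.0
    return gap_open + (length - 1) * gap_extend

settings = [
    ("conservative", -15.0, -2.0),
    ("aggressive", -5.0, -0.5),
]
for label, gap_open, gap_extend in settings:
    scores = [affine_gap_score(n, gap_open, gap_extend) for n in (1, 3, 10)]
    print(f"{label:>12}: gap lengths 1, 3, 10 score {scores}")
```

Under this model a three-column gap scores -19.0 with the conservative settings and -6.0 with the aggressive ones, so the aligner has to gain much more from matching residues before opening a gap pays off.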

--------------------------------

### Set Up Kalign Python Bindings for Development

Source: https://github.com/timolassmann/kalign/blob/main/CONTRIBUTING.md

Instructions for setting up the Python development environment for Kalign. This involves navigating to the Python module directory, installing the package in editable mode for local development, and verifying the installation by importing the `kalign` module and printing its version.

```bash
cd python
pip install -e .
python -c "import kalign; print(kalign.__version__)"
```

--------------------------------

### Convert kalign Output to Biopython Objects

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This example shows how to obtain a Biopython `MultipleSeqAlignment` object when you already have a plain `kalign` result (a list of aligned sequence strings). Note that the helper does not convert the existing alignment in place: it strips the gap characters and re-runs `kalign.align` with `fmt="biopython"`, so the sequences are aligned a second time. This is useful when the initial alignment was not requested in a specific format.

```python
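import kalign

# Example input (any list of sequence strings works)
sequences = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]

# Start with plain alignment
aligned = kalign.align(sequences)

# Convert to ecosystem objects when needed
def convert_to_biopython(aligned_seqs, ids=None):
    """Strip gaps and re-align, returning a Biopython alignment."""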
    if ids is None:
        ids = [f"seq{i}" for i in range(len(aligned_seqs))]
    
    return kalign.align(
        [seq.replace('-', '') for seq in aligned_seqs],  # Remove gaps
        fmt="biopython",
        ids=ids
    )

# Usage
ids = ["sequence_1", "sequence_2", "sequence_3"]
aln_bp = convert_to_biopython(aligned, ids)
```

--------------------------------

### Download and Compile Kalign3 from GitHub

Source: https://github.com/timolassmann/kalign/blob/main/scripts/benchmark.org

Instructions to clone the Kalign3 source code from its GitHub repository, configure it for installation, compile the program, run self-checks, and finally install the executable into the benchmark's designated programs directory.

```bash
cd ~/kalignbenchmark/programs
mkdir -p kalign3_src
cd kalign3_src
git clone https://github.com/TimoLassmann/kalign.git
cd kalign
./autogen.sh
./configure --bindir=$HOME/kalignbenchmark/programs
make
make check
make install
```

--------------------------------

### Verify Biopython Installation and Module Accessibility in Python

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

This Python script checks if Biopython is correctly installed and if its core modules, such as `Bio.Align`, `Bio.SeqRecord`, and `Bio.Seq`, are accessible. It provides clear feedback on the installation status, helping to confirm readiness for Kalign integration.

```python
try:
    import Bio
    print(f"Biopython version: {Bio.__version__}")
    
    # Test specific modules
    from Bio.Align import MultipleSeqAlignment
    from Bio.SeqRecord import SeqRecord
    from Bio.Seq import Seq
    print("✅ Biopython modules accessible")
    
except ImportError as e:
    print(f"❌ Biopython issue: {e}")
```

--------------------------------

### Example Output: Kalign Python Performance Benchmarking Suite

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Console output from the Kalign performance benchmarking suite, detailing system information, thread scaling results, and a recommended optimal thread count.

```text
🔬 Kalign Performance Benchmarking Suite
📊 System Information:
   Platform: darwin
   CPU cores: 8
   Memory: 16.0 GB
   Kalign version: 3.4.1

Thread Scaling Benchmark
Testing 1 threads...
  Average: 0.245s (±0.003s)
  Speedup: 1.00x
  Efficiency: 100.0%

Testing 4 threads...
  Average: 0.089s (±0.002s)
  Speedup: 2.75x
  Efficiency: 68.8%

🎯 Recommended thread count: 4 (efficiency: 68.8%)
```
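
The benchmark script's own formulas are not shown here, but the reported figures are consistent with the usual definitions: speedup is the single-thread time divided by the multi-thread time, and efficiency is speedup divided by the thread count. A minimal sketch using the timings from the output above (`scaling_metrics` is a hypothetical helper name):

```python
def scaling_metrics(baseline_s: float, parallel_s: float, n_threads: int):
    """Return (speedup, efficiency) for a parallel run against a 1-thread baseline."""
    speedup = baseline_s / parallel_s
    efficiency = speedup / n_threads
    return speedup, efficiency


# Timings from the sample output: 0.245s with 1 thread, 0.089s with 4 threads
speedup, efficiency = scaling_metrics(0.245, 0.089, 4)
print(f"Speedup: {speedup:.2f}x")        # Speedup: 2.75x
print(f"Efficiency: {efficiency:.1%}")   # Efficiency: 68.8%
```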

--------------------------------

### Configure Threading for kalign Alignments

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-quickstart.md

This example illustrates how to manage the number of CPU threads used by `kalign` for alignments. It shows setting a global default thread count using `kalign.set_num_threads` based on available CPU cores, and also how to override this setting for individual alignment calls using the `n_threads` parameter. The current default thread count can be queried with `kalign.get_num_threads`.

```python
import kalign
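import os

# Example inputs (any lists of sequence strings work)
sequences1 = sequences2 = sequences3 = ["ATCGATCGATCG", "ATCGTCGATCG", "ATCGATCATCG"]

# Set global thread count
cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
kalign.set_num_threads(cpu_count)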

# All subsequent alignments use all CPUs
aligned1 = kalign.align(sequences1)
aligned2 = kalign.align(sequences2)

# Override for specific alignment
aligned3 = kalign.align(sequences3, n_threads=1)  # Single-threaded

# Check current setting
print(f"Using {kalign.get_num_threads()} threads by default")
```

--------------------------------

### Run MUSCLE and Kalign on Balifam Dataset

Source: https://github.com/timolassmann/kalign/blob/main/scripts/balibase_test.org

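Illustrates a more complex benchmark scenario on a Balifam test case. It concatenates two input files, aligns the combined set with MUSCLE, uses Kalign to write the reference alignment as FASTA, and scores the MUSCLE output against that reference with `qscore`. The source is an Org-mode file, so `<<path>>` appears to be a noweb-style reference that stands in for an environment setup block defined elsewhere in it.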

```bash
<<path>>

IN=~/kalignbenchmark/data/BB20021-package.vie
ADD=~/kalignbenchmark/data/bb3_release/RV20/BB20021.tfa
REF=~/kalignbenchmark/data/bb3_release/RV20/BB20021.msf

OUT=~/kalignbenchmark/scratch/BB20021_muscle.fa

cd ~/kalignbenchmark/scratch

cat $IN $ADD > test.fa
time muscle3.8.31_i86linux32 -maxiters 2 -in test.fa -out $OUT
kalign $REF -r -f fasta -o ref.fa
qscore -test $OUT -ref ref.fa
```

--------------------------------

### Resolve Kalign Package Installation Failures

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

Addresses common issues leading to `pip install kalign` failures, including missing build tools and system dependencies on Linux, macOS, and Windows. Solutions involve updating pip components, installing system-level packages, or using conda-forge.

```bash
pip install --upgrade pip setuptools wheel
pip install --upgrade cmake scikit-build-core pybind11
pip install kalign
```

```bash
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install cmake build-essential

# CentOS/RHEL
sudo yum install cmake gcc gcc-c++ make
```

```bash
# Install Xcode command line tools
xcode-select --install

# Or install via Homebrew
brew install cmake
```

```bash
# Install Visual Studio Build Tools
# Or use conda-forge
conda install -c conda-forge kalign
```

--------------------------------

### Python Example for Distributed Kalign Alignment

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This Python function `distributed_alignment_example` showcases how to use the `DistributedKalign` class to process sequence alignments in a distributed manner. It generates sample DNA sequence groups, submits them to the distributed processor, waits for results, and reports on successful and failed tasks. Essential setup steps for Redis and Celery workers are also provided for running the distributed system.

```python
def distributed_alignment_example():
    """Example of distributed alignment processing."""
    
    # Initialize distributed processor
    # (DistributedKalign is not defined in this snippet; define or import it first)
    distributed = DistributedKalign()
    
    # Create sequence groups for processing
    sequence_groups = []
    for i in range(10):
        group = [
            f"ATCGATCG{'A' * (i % 5)}TCGATCG",
            f"ATCGATCG{'T' * (i % 5)}TCGATCG",
            f"ATCGATCG{'C' * (i % 5)}TCGATCG",
            f"ATCGATCG{'G' * (i % 5)}TCGATCG"
        ]
        sequence_groups.append(group)
    
    # Submit batch
    task_ids = distributed.submit_batch(
        sequence_groups,
        seq_type="dna",
        n_threads=2
    )
    
    # Wait for completion
    results = distributed.wait_for_completion(task_ids, timeout=600)
    
    # Analyze results
    successful = [r for r in results if r['status'] == 'success']
    failed = [r for r in results if r['status'] == 'error']
    
    print("\nDistributed processing complete:")
    print(f"  Successful: {len(successful)}")
    print(f"  Failed: {len(failed)}")
    
    return results

# To run distributed processing:
# 1. Start Redis: redis-server
# 2. Start Celery workers: celery -A your_module worker --loglevel=info
# 3. Run: results = distributed_alignment_example()
```

--------------------------------

### Access Kalign module help documentation

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Runs a Python command directly from the shell to invoke the built-in `help()` function for the `kalign` module. This provides detailed information about its functions, classes, and methods, useful for quick reference.

```bash
python -c "import kalign; help(kalign)"
```

--------------------------------

### Install scikit-bio for Kalign Integration

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

These commands provide methods to install scikit-bio: either directly via Kalign's extra dependencies using pip or through Conda, enabling scikit-bio-formatted output from Kalign.

```bash
pip install kalign[skbio]
# or
conda install -c conda-forge scikit-bio
```

--------------------------------

### Build Kalign from Source

Source: https://github.com/timolassmann/kalign/blob/main/CONTRIBUTING.md

Provides a step-by-step guide to compile the Kalign project from source. This includes cloning the repository, setting up the build environment with CMake, compiling the code, and running the integrated tests to ensure a successful build.

```bash
# Clone your fork
git clone https://github.com/yourusername/kalign.git
cd kalign

# Create build directory
mkdir build && cd build

# Configure and build
cmake ..
make

# Run tests
make test
```

--------------------------------

### Python: Local Development Setup for Kalign Module

Source: https://github.com/timolassmann/kalign/blob/main/README.md

Explains how to set up the Kalign Python module for local development using `uv pip install -e .`. This command installs the package in editable mode, allowing changes to be reflected without reinstallation.

```bash
uv pip install -e .
```

--------------------------------

### Run Kalign Python Performance Benchmarks Example

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Run the `performance_benchmarks.py` script to test Kalign's performance, including thread scaling and memory usage analysis. The command assumes the script is in the current directory.

```bash
python performance_benchmarks.py
```

--------------------------------

### Python Example: Demonstrating Memory-Efficient Alignment Usage

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This example function demonstrates how to use the `memory_efficient_alignment` function. It generates a large synthetic dataset of sequences and then processes them in chunks, showcasing the memory monitoring capabilities. This helps in understanding how the chunking and garbage collection work in practice.

```python
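import numpy as np

# memory_efficient_alignment is not defined in this snippet; define or import it first.
def memory_usage_example():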
    """Example of memory usage monitoring."""
    
    # Create large sequence set
    print("Creating large sequence set...")
    base_seq = "ATCGATCGATCG" * 100  # 1200 bp sequences
    large_sequence_set = []
    
    for i in range(500):  # 500 sequences
        seq = list(base_seq)
        # Add mutations
        n_mutations = len(seq) // 50  # 2% mutations
        positions = np.random.choice(len(seq), n_mutations, replace=False)
        for pos in positions:
            seq[pos] = np.random.choice(['A', 'T', 'C', 'G'])
        large_sequence_set.append(''.join(seq))
    
    print(f"Created {len(large_sequence_set)} sequences of length {len(large_sequence_set[0])}")
    
    # Process with memory monitoring
    aligned = memory_efficient_alignment(
        large_sequence_set, 
        chunk_size=100, 
        monitor_memory=True
    )
    
    print(f"Final result: {len(aligned)} aligned sequences")
    return aligned
```
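
The implementation of `memory_efficient_alignment` is not shown in this section. As a rough illustration of the chunking step alone, the sketch below splits a sequence list into fixed-size pieces. The helper `iter_chunks` is hypothetical, and how the per-chunk alignments are combined afterwards is not covered here.

```python
from typing import Iterator, List, Sequence


def iter_chunks(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


# 500 placeholder sequences split with chunk_size=100, as in the example above
chunks = list(iter_chunks(["ATCG"] * 500, chunk_size=100))
print(f"{len(chunks)} chunks of {len(chunks[0])} sequences")
```

Using a generator means only one chunk needs to be materialized at a time when the caller iterates lazily instead of building the full list.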

--------------------------------

### Example Usage for Gap Penalty Optimization

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This Python snippet provides an example of input sequences that can be used with a gap penalty optimization function (e.g., `optimize_gap_penalties`). It demonstrates how to define test sequences for alignment and includes a commented-out line showing how the optimization function might be called.

```python
test_sequences = [
    "ATCGATCGATCGATCGATCG",
    "ATCGATCGTCGATCGATCG",
    "ATCGATCGATCATCGATCG",
    "ATCGATCGAGCTCGATCG"
]

# best_params, param_results = optimize_gap_penalties(test_sequences)
```
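
The body of `optimize_gap_penalties` is not included in this section, so its search strategy and scoring criterion are unknown. One plausible shape for such a function is a plain grid search. The sketch below is an assumption, not the documented implementation: `grid_search_gap_penalties`, the caller-supplied `align_fn` and `score_fn`, and the default grids are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]


def grid_search_gap_penalties(
    sequences: Sequence[str],
    align_fn: Callable[..., List[str]],
    score_fn: Callable[[List[str]], float],
    gap_opens: Sequence[float] = (-15.0, -10.0, -5.0),
    gap_extends: Sequence[float] = (-2.0, -1.0, -0.5),
) -> Tuple[Params, List[Tuple[Params, float]]]:
    """Try every (gap_open, gap_extend) pair and return the best-scoring one."""
    results: List[Tuple[Params, float]] = []
    for gap_open, gap_extend in product(gap_opens, gap_extends):
        params = {"gap_open": gap_open, "gap_extend": gap_extend}
        aligned = align_fn(sequences, **params)  # e.g. kalign.align
        results.append((params, score_fn(aligned)))
    # Higher score is better; ties resolve to the first pair tried
    best_params = max(results, key=lambda item: item[1])[0]
    return best_params, results
```

With Kalign installed, a call might look like `grid_search_gap_penalties(test_sequences, kalign.align, my_score)`, where `my_score` is whatever quality measure suits the data, for example the negated gap count.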

--------------------------------

### Install BAliBASE data files

Source: https://github.com/timolassmann/kalign/blob/main/scripts/balibase_test.org

This script changes to the home directory, creates a `data` directory, downloads the `BAliBASE_R1-5.tar.gz` archive if it is not already present, and then extracts it. This is the first step in setting up the BAliBASE test environment.

```sh
cd ~
mkdir -p data
cd data
if [ ! -f BAliBASE_R1-5.tar.gz ]; then
    wget https://www.lbgi.fr/balibase/BalibaseDownload/BAliBASE_R1-5.tar.gz
fi
tar -zxvf BAliBASE_R1-5.tar.gz
```

--------------------------------

### Example: Parallel Kalign Sequence Alignment with Thread Pool (Python)

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-performance.md

This example demonstrates how to use the `KalignThreadPool` class to perform parallel sequence alignment. It sets up multiple groups of sequences and processes them concurrently using a thread pool, showcasing the performance benefits and proper usage of the context manager.

```python
import time

# Example usage (KalignThreadPool is not defined in this snippet; define or import it first)
def parallel_alignment_example():
    """Example of parallel alignment processing."""
    
    # Create multiple sequence groups
    sequence_groups = []
    for group in range(5):
        group_sequences = [
            f"ATCGATCG{'A' * group}TCGATCG",
            f"ATCGATCG{'T' * group}TCGATCG", 
            f"ATCGATCG{'C' * group}TCGATCG"
        ]
        sequence_groups.append(group_sequences)
    
    print(f"Aligning {len(sequence_groups)} groups with thread pool...")
    
    start_time = time.time()
    
    with KalignThreadPool(max_workers=4) as pool:
        alignments = pool.align_multiple(sequence_groups, n_threads=1)
    
    end_time = time.time()
    
    print(f"Completed in {end_time - start_time:.2f} seconds")
    print(f"Processed {len(alignments)} alignments")
    
    return alignments

# alignments = parallel_alignment_example()
```
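
`KalignThreadPool` itself is not shown in this section. A similar effect can be sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. The helper below is an assumption about how such a pool might work, not the documented class: `align_groups_in_parallel` is a hypothetical name and the alignment function is supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def align_groups_in_parallel(
    groups: Sequence[Sequence[str]],
    align_fn: Callable[[Sequence[str]], List[str]],
    max_workers: int = 4,
) -> List[List[str]]:
    """Apply `align_fn` to each group concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs
        return list(pool.map(align_fn, groups))
```

With Kalign installed, `align_groups_in_parallel(sequence_groups, lambda g: kalign.align(g, n_threads=1))` would mirror the example above. Whether threads give a real speedup depends on whether the native alignment call releases the GIL, which the source does not state.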

--------------------------------

### Troubleshoot Kalign ImportError on First Use

Source: https://github.com/timolassmann/kalign/blob/main/python-docs/python-troubleshooting.md

Helps resolve `ImportError` issues (e.g., DLL load failed on Windows, cannot find shared library on Linux) when importing Kalign. Solutions include verifying installation paths, reinstalling the package, checking dependencies, or trying alternative installation methods like conda-forge.

```python
import kalign
print(kalign.__version__)
print(kalign.__file__)
```

```bash
pip uninstall kalign
pip install kalign --no-cache-dir
```

```python
import numpy  # Should work
```

```bash
conda install -c conda-forge kalign
```

--------------------------------

### Running Kalign Project Tests

Source: https://github.com/timolassmann/kalign/blob/main/CONTRIBUTING.md

Provides command-line instructions for executing both C/C++ and Python test suites within the Kalign project. Users can navigate to the respective build or python directories and run the specified commands to verify functionality.

```bash
# C/C++ tests (run from the repository root)
cd build
make test

# Python tests (run from the repository root, not from build/)
cd python
python -m pytest
```

--------------------------------

### Example Output: Kalign Python Basic Sequence Alignment

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

Console output demonstrating the results of a basic Kalign sequence alignment, including input sequences, aligned sequences, and various alignment statistics.

```text
🧬 Kalign Python Examples
==================================================
Example 1: Basic Sequence Alignment
==================================================
Input sequences:
  Seq 1: ATCGATCGATCGATCG
  Seq 2: ATCGATCGTCGATCG
  Seq 3: ATCGATCGATCATCG
  Seq 4: ATCGATCGAGATCG

Aligned sequences:
  Seq 1: ATCGATCGATCGATCG
  Seq 2: ATCGATCG-TCGATCG
  Seq 3: ATCGATCGATC-ATCG
  Seq 4: ATCGATCGA--GATCG

Alignment statistics:
  Length: 16
  Gap fraction: 18.75%
  Conservation: 75.00%
  Average identity: 81.67%
```
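
A quick way to sanity-check output like this is to confirm that every aligned row has the same length and that removing the gap characters recovers the original input. The sketch below uses the sequences from the sample output; `check_alignment` is a hypothetical helper, not part of the Kalign API.

```python
from typing import Sequence

inputs = ["ATCGATCGATCGATCG", "ATCGATCGTCGATCG", "ATCGATCGATCATCG", "ATCGATCGAGATCG"]
aligned = ["ATCGATCGATCGATCG", "ATCGATCG-TCGATCG", "ATCGATCGATC-ATCG", "ATCGATCGA--GATCG"]


def check_alignment(inputs: Sequence[str], aligned: Sequence[str], gap: str = "-") -> int:
    """Validate an alignment against its inputs and return the alignment length."""
    if len(inputs) != len(aligned):
        raise ValueError("number of aligned rows differs from number of inputs")
    lengths = {len(row) for row in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned rows have different lengths")
    for index, (original, row) in enumerate(zip(inputs, aligned), start=1):
        if row.replace(gap, "") != original:
            raise ValueError(f"row {index} does not match its input sequence")
    return lengths.pop()


print(check_alignment(inputs, aligned))  # prints 16
```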

--------------------------------

### Troubleshooting Kalign Python Optional Dependency Import Errors

Source: https://github.com/timolassmann/kalign/blob/main/python-examples/README.md

A common error message indicating missing optional dependencies for Kalign, prompting the user to install them for full functionality.

```text
Import errors for optional dependencies:
```
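
The message above is shown without the list of modules that follows it. To find out which optional packages are missing before running the examples, a check along these lines may help. It is a sketch: `missing_optional_modules` is a hypothetical helper, and the module list is an assumption based on the packages named in the installation instructions (Biopython, scikit-bio, psutil).

```python
from importlib.util import find_spec

# Import name -> pip package name (assumed set of optional dependencies)
OPTIONAL_MODULES = {
    "Bio": "biopython",
    "skbio": "scikit-bio",
    "psutil": "psutil",
}


def missing_optional_modules(modules=OPTIONAL_MODULES):
    """Return the pip names of modules that cannot be found for import."""
    return [pkg for module, pkg in modules.items() if find_spec(module) is None]


missing = missing_optional_modules()
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All optional packages found")
```

Anything reported as missing can be installed individually or all at once with `pip install kalign[all]`.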