### Install microWakeWord and Dependencies (Python)

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

Installs the microWakeWord library and its dependencies, including forks of `pymicro-features` and `audio-metadata` for compatibility. It also clones the microWakeWord repository and installs it in editable mode. A session restart is recommended after installation.

```python
# Installs microWakeWord. Be sure to restart the session after this is finished.
import platform

if platform.system() == "Darwin":
    # `pymicro-features` is installed from a fork to support building on macOS
    !pip install 'git+https://github.com/puddly/pymicro-features@puddly/minimum-cpp-version'

# `audio-metadata` is installed from a fork to unpin `attrs` from a version that breaks Jupyter
!pip install 'git+https://github.com/whatsnowplaying/audio-metadata@d4ebb238e6a401bb1a5aaaac60c9e2b3cb30929f'

!git clone https://github.com/kahrendt/microWakeWord
!pip install -e ./microWakeWord

```
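
After restarting the session, a quick importability check can catch a failed install before a later cell fails in a less obvious way. The sketch below uses only the standard library; the helper name `check_installed` is hypothetical, and the module names are simply the ones imported elsewhere in these snippets.

```python
import importlib.util


def check_installed(module_names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# `microwakeword` is the package installed above; `yaml` and `datasets`
# are imported by later snippets.
status = check_installed(["microwakeword", "yaml", "datasets"])
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Only top-level module names are passed here, since `find_spec` on a dotted name imports the parent package first.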

--------------------------------

### Initialize Feature Handler and Get Training Data

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Initializes a FeatureHandler object with configuration and retrieves training data. It supports data augmentation using SpecAugment policies for time and frequency masking.

```python
feature_handler = FeatureHandler(config)

# Get training batch with augmentation
train_data, train_labels, train_weights = feature_handler.get_data(
    mode="training",
    batch_size=128,
    features_length=150,
    truncation_strategy="default",
    augmentation_policy={
        "time_mask_max_size": 5,   # SpecAugment time masking
        "time_mask_count": 2,
        "freq_mask_max_size": 5,   # SpecAugment frequency masking
        "freq_mask_count": 2,
        "freq_mix_prob": 0.0,
        "mix_up_prob": 0.0,
    },
)
print(f"Batch shapes - data: {train_data.shape}, labels: {train_labels.shape}")

# Get validation data (no augmentation)
val_data, val_labels, val_weights = feature_handler.get_data(
    mode="validation",
    batch_size=128,
    features_length=150,
    truncation_strategy="truncate_start",
)

# Check dataset statistics
print(f"Training samples: {feature_handler.get_mode_size('training')}")
print(f"Validation duration: {feature_handler.get_mode_duration('validation_ambient'):.2f}s")
```
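
To make the four masking parameters concrete, here is a minimal sketch of SpecAugment-style masking on a plain list-of-lists spectrogram. It is an illustration, not the library's implementation: the function name `apply_masks`, the zero fill value, and the uniform choice of mask size are all assumptions.

```python
import random


def apply_masks(spectrogram, time_mask_max_size, time_mask_count,
                freq_mask_max_size, freq_mask_count, rng=None):
    """Return a copy of `spectrogram` (rows = time frames, columns = frequency
    channels) with random time and frequency bands set to zero."""
    if rng is None:
        rng = random.Random()
    masked = [list(frame) for frame in spectrogram]
    n_frames, n_channels = len(masked), len(masked[0])

    # Time masking: blank out up to `time_mask_max_size` consecutive frames.
    for _ in range(time_mask_count):
        size = rng.randint(0, min(time_mask_max_size, n_frames))
        start = rng.randint(0, n_frames - size)
        for t in range(start, start + size):
            masked[t] = [0.0] * n_channels

    # Frequency masking: blank out up to `freq_mask_max_size` adjacent channels.
    for _ in range(freq_mask_count):
        size = rng.randint(0, min(freq_mask_max_size, n_channels))
        start = rng.randint(0, n_channels - size)
        for frame in masked:
            frame[start:start + size] = [0.0] * size

    return masked


spectrogram = [[1.0] * 40 for _ in range(150)]  # 150 frames x 40 channels
masked = apply_masks(spectrogram, 5, 2, 5, 2, rng=random.Random(0))
print(sum(v == 0.0 for frame in masked for v in frame), "values masked")
```

Setting a `*_count` or `*_max_size` value to 0 leaves that axis unmasked in this sketch.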

--------------------------------

### Generate Multiple Wake Word Samples (Python)

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

Generates a larger quantity of wake word samples (up to 1000) for model training. This script is the starting point for improving model performance. It suggests experimenting with noise parameters, generating negative samples, and using diverse phonetic pronunciations for the wake word.

```python
# Generates a larger amount of wake word samples.
# Start here when trying to improve your model.
# See https://github.com/rhasspy/piper-sample-generator for the full set of
# parameters. In particular, experiment with noise-scales and noise-scale-ws,
# generating negative samples similar to the wake word, and generating many more
# wake word samples, possibly with different phonetic pronunciations.

!python3 piper-sample-generator/generate_samples.py "{target_word}" \
--max-samples 1000 \
--batch-size 100 \
--output-dir generated_samples

```

--------------------------------

### Generate Single Wake Word Sample (Python)

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

Generates a single audio sample of a target wake word using the piper-sample-generator. It handles cloning the generator repository, downloading a pre-trained model, and installing necessary libraries like PyTorch. The generated sample is provided as an audio element for verification.

```python
# Generates 1 sample of the target word for manual verification.

target_word = 'khum_puter'  # Phonetic spellings may produce better samples

import os
import sys
import platform

from IPython.display import Audio

if not os.path.exists("./piper-sample-generator"):
    if platform.system() == "Darwin":
        !git clone -b mps-support https://github.com/kahrendt/piper-sample-generator
    else:
        !git clone https://github.com/rhasspy/piper-sample-generator

    !wget -O piper-sample-generator/models/en_US-libritts_r-medium.pt 'https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt'

    # Install system dependencies
    !pip install torch torchaudio piper-phonemize-cross==1.2.1

    if "piper-sample-generator/" not in sys.path:
        sys.path.append("piper-sample-generator/")

!python3 piper-sample-generator/generate_samples.py "{target_word}" \
--max-samples 1 \
--batch-size 1 \
--output-dir generated_samples

Audio("generated_samples/0.wav", autoplay=True)

```

--------------------------------

### Command Line Training and Evaluation

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Demonstrates how to train and evaluate wake word models using the command-line interface. It covers training new models, resuming from checkpoints, and evaluating existing models with various test options.

```bash
# Train a new wake word model with MixedNet architecture
python -m microwakeword.model_train_eval \
    --training_config='training_parameters.yaml' \
    --train 1 \
    --restore_checkpoint 0 \
    --test_tflite_streaming_quantized 1 \
    --use_weights "best_weights" \
    mixednet \
    --pointwise_filters "64,64,64,64" \
    --repeat_in_block "1,1,1,1" \
    --mixconv_kernel_sizes '[5], [7,11], [9,15], [23]' \
    --residual_connection "0,0,0,0" \
    --first_conv_filters 32 \
    --first_conv_kernel_size 5 \
    --stride 3

# Resume training from checkpoint
python -m microwakeword.model_train_eval \
    --training_config='training_parameters.yaml' \
    --train 1 \
    --restore_checkpoint 1 \
    --test_tflite_streaming_quantized 1 \
    mixednet \
    --pointwise_filters "48,48,48,48" \
    --stride 3

# Only evaluate an existing model (no training)
python -m microwakeword.model_train_eval \
    --training_config='training_parameters.yaml' \
    --train 0 \
    --test_tf_nonstreaming 1 \
    --test_tflite_nonstreaming 1 \
    --test_tflite_streaming_quantized 1 \
    --use_weights "best_weights" \
    mixednet
```
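
When the same command is launched repeatedly with different settings, it can help to assemble the argument list in Python. The following is a sketch only: `build_train_command` is a hypothetical helper, and it uses no flags beyond those already shown in the commands above.

```python
import shlex


def build_train_command(config_path, train=True, restore_checkpoint=False,
                        model="mixednet", model_flags=None):
    """Assemble the argument list for `python -m microwakeword.model_train_eval`."""
    args = [
        "python", "-m", "microwakeword.model_train_eval",
        f"--training_config={config_path}",
        "--train", str(int(train)),
        "--restore_checkpoint", str(int(restore_checkpoint)),
        "--test_tflite_streaming_quantized", "1",
        "--use_weights", "best_weights",
        model,
    ]
    # Model-specific flags go after the model name, as in the commands above.
    for flag, value in (model_flags or {}).items():
        args += [f"--{flag}", str(value)]
    return args


command = build_train_command(
    "training_parameters.yaml",
    model_flags={"pointwise_filters": "64,64,64,64", "stride": 3},
)
print(shlex.join(command))
# To launch it: subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with values such as the bracketed kernel-size string.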

--------------------------------

### Audio Clip Loading and Management with Clips Class (Python)

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Shows how to use the `Clips` class to load, preprocess, and manage audio files for wake word training. It supports filtering by duration, splitting datasets, removing silence, and repeating short clips. This class is essential for preparing audio data.

```python
from microwakeword.audio.clips import Clips

# Load wake word audio samples from a directory
clips = Clips(
    input_directory="generated_samples",
    file_pattern="*.wav",
    min_clip_duration_s=0.5,        # Filter out clips shorter than 0.5s
    max_clip_duration_s=3.0,        # Filter out clips longer than 3s
    repeat_clip_min_duration_s=2.0, # Repeat short clips until they reach 2s
    remove_silence=True,            # Use WebRTC VAD to trim silence
    random_split_seed=42,           # Seed for reproducible train/test splits
    split_count=0.1,                # 10% for test and validation each
    trimmed_clip_duration_s=None,   # Optional: trim clips to fixed duration
    trim_zeros=False                # Optional: remove leading/trailing zeros
)

# Get a random audio clip as numpy array
random_clip = clips.get_random_clip()
print(f"Clip shape: {random_clip.shape}, dtype: {random_clip.dtype}")

# Iterate over all clips in a specific split
for clip_audio in clips.audio_generator(split="train", repeat=1):
    print(f"Processing clip with {len(clip_audio)} samples")

# Use a generator for random sampling during training
for clip_audio in clips.random_audio_generator(max_clips=100):
    # Process each random clip
    pass
```
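
The duration options can be pictured with a small sketch. It follows the inline comments above (filter by minimum and maximum duration, repeat short clips until they reach a minimum length) and is not the `Clips` implementation; the helper names and the 16 kHz sample rate are assumptions.

```python
SAMPLE_RATE = 16000  # Assumed, matching the 16 kHz audio used in these snippets


def keep_clip(samples, min_s=0.5, max_s=3.0, sample_rate=SAMPLE_RATE):
    """True if the clip duration lies within [min_s, max_s]."""
    duration_s = len(samples) / sample_rate
    return min_s <= duration_s <= max_s


def repeat_to_min_duration(samples, min_s=2.0, sample_rate=SAMPLE_RATE):
    """Concatenate whole copies of a short clip until it lasts at least min_s."""
    repeated = list(samples)
    if not repeated:
        return repeated  # Nothing to repeat
    while len(repeated) / sample_rate < min_s:
        repeated += samples
    return repeated


clip = [0] * 12000                                       # 0.75 s of audio
print(keep_clip(clip))                                   # True
print(len(repeat_to_min_duration(clip)) / SAMPLE_RATE)   # 2.25
```

Because whole copies are appended, the result can overshoot the minimum (2.25 s here rather than exactly 2.0 s).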

--------------------------------

### Download and Prepare Audioset Dataset

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This code snippet downloads one archive of the AudioSet dataset (`bal_train09.tar`), extracts it, and converts the FLAC audio files to 16-bit PCM WAV format at a 16000 Hz sampling rate. It handles directory creation and uses `wget` for downloading and `tar` for extraction.

```python
import os
import datasets
import scipy.io.wavfile
import numpy as np
from pathlib import Path
from tqdm import tqdm

if not os.path.exists("audioset"):
    os.mkdir("audioset")

    fname = "bal_train09.tar"
    out_dir = f"audioset/{fname}"
    link = "https://huggingface.co/datasets/agkphysics/AudioSet/resolve/main/data/" + fname
    !wget -O {out_dir} {link}
    !cd audioset && tar -xf bal_train09.tar

    output_dir = "./audioset_16k"
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)

    # Save clips to 16-bit PCM wav files
    audioset_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("audioset/audio").glob("**/*.flac")]})
    audioset_dataset = audioset_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(audioset_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".flac", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))
```
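
The conversion `(array * 32767).astype(np.int16)` assumes every sample lies in [-1.0, 1.0]; values outside that range are not guaranteed to convert sensibly. A minimal standard-library sketch of the same conversion with explicit clipping is shown below. The helper names `float_to_pcm16` and `write_wav` are hypothetical.

```python
import struct
import wave


def float_to_pcm16(samples):
    """Convert floats to 16-bit integers, clipping anything outside [-1.0, 1.0]."""
    return [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]


def write_wav(path, samples, sample_rate=16000):
    """Write mono 16-bit PCM audio with the standard-library `wave` module."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


print(float_to_pcm16([0.0, 0.5, 1.0, 1.7, -2.0]))
# [0, 16383, 32767, 32767, -32767]
```

With NumPy arrays, the equivalent guard would be `np.clip(array, -1.0, 1.0)` before scaling.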

--------------------------------

### Configure and Apply Audio Augmentation Pipeline

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Sets up an audio augmentation pipeline with various effects like EQ, distortion, pitch shift, noise, and background sounds. It then loads audio clips, applies augmentation, and saves the result. This is useful for increasing the diversity of training data.

```python
from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from microwakeword.audio.audio_utils import save_clip

# Configure augmentation pipeline
augmenter = Augmentation(
    augmentation_duration_s=3.2,  # Output duration in seconds
    augmentation_probabilities={
        "SevenBandParametricEQ": 0.1,   # Random EQ adjustments
        "TanhDistortion": 0.1,           # Soft clipping distortion
        "PitchShift": 0.1,               # Pitch variation +-3 semitones
        "BandStopFilter": 0.1,           # Notch filtering
        "AddColorNoise": 0.25,           # Pink/brown/white noise
        "AddBackgroundNoise": 0.75,      # Mix in background audio
        "Gain": 1.0,                     # Random gain adjustment
        "GainTransition": 0.25,          # Gradual volume changes
        "RIR": 0.5,                       # Room impulse response reverb
    },
    impulse_paths=["mit_rirs"],           # Directories with IR wav files
    background_paths=["fma_16k", "audioset_16k"],  # Background audio dirs
    background_min_snr_db=-5,
    background_max_snr_db=10,
    min_gain_db=-45,
    max_gain_db=0,
    min_jitter_s=0.195,  # Random padding before wake word
    max_jitter_s=0.205,
)

# Load clips and apply augmentation
clips = Clips(input_directory="generated_samples", file_pattern="*.wav")
random_clip = clips.get_random_clip()
augmented_clip = augmenter.augment_clip(random_clip)

# Save augmented audio for verification
save_clip(augmented_clip, "augmented_output.wav")

# Use generator for batch processing
clip_generator = clips.audio_generator()
for augmented in augmenter.augment_generator(clip_generator):
    # Process each augmented clip
    pass
```
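
The `background_min_snr_db` and `background_max_snr_db` settings bound the signal-to-noise ratio at which background audio is mixed in. As a rough illustration of what mixing at a given SNR means (not the library's code; the helper names are hypothetical and both signals are assumed to be non-silent and equally long):

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, background, snr_db):
    """Scale `background` so the speech/background RMS ratio equals snr_db, then add."""
    target_background_rms = rms(speech) / (10 ** (snr_db / 20))
    scale = target_background_rms / rms(background)
    return [s + scale * b for s, b in zip(speech, background)]


speech = [0.5, -0.5, 0.5, -0.5]
background = [0.1, 0.1, -0.1, -0.1]
mixed = mix_at_snr(speech, background, snr_db=10)

# Recover the scaled background and confirm the ratio that was requested.
scaled_background = [m - s for m, s in zip(mixed, speech)]
achieved = 20 * math.log10(rms(speech) / rms(scaled_background))
print(f"Achieved SNR: {achieved:.1f} dB")  # 10.0
```

A negative SNR such as -5 dB means the background is louder than the wake word, which makes for harder training examples.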

--------------------------------

### Download and Prepare Free Music Archive (FMA) Dataset

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This code snippet downloads the Free Music Archive (FMA) extra small dataset, extracts the zip file, and converts MP3 audio files to 16-bit PCM WAV format at a 16000Hz sampling rate. It manages directory creation and uses `wget` for downloading and `unzip` for extraction.

```python
import os
import datasets
import scipy.io.wavfile
import numpy as np
from pathlib import Path
from tqdm import tqdm

output_dir = "./fma"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    fname = "fma_xs.zip"
    link = "https://huggingface.co/datasets/mchl914/fma_xsmall/resolve/main/" + fname
    out_dir = f"fma/{fname}"
    !wget -O {out_dir} {link}
    !cd {output_dir} && unzip -q {fname}

    output_dir = "./fma_16k"
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)

    # Save clips to 16-bit PCM wav files
    fma_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("fma/fma_small").glob("**/*.mp3")]})
    fma_dataset = fma_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(fma_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".mp3", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))
```

--------------------------------

### Save YAML Training Configuration for Micro Wake Word

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This script defines and saves a YAML configuration file that controls the training process for the micro wake word model. It includes parameters such as window step, training directory, and detailed feature set configurations including sampling weights, penalty weights, truth labels, and truncation strategies. These hyperparameters significantly impact model quality.

```python
# Save a yaml config that controls the training process
# These hyperparameters can make a huge difference in model quality.
# Experiment with sampling and penalty weights and increasing the number of
# training steps.

import yaml
import os

config = {}

config["window_step_ms"] = 10

config["train_dir"] = (
    "trained_models/wakeword"
)


# Each feature_dir should have at least one of the following folders with this structure:
#  training/
#    ragged_mmap_folders_ending_in_mmap
#  testing/
#    ragged_mmap_folders_ending_in_mmap
#  testing_ambient/
#    ragged_mmap_folders_ending_in_mmap
#  validation/
#    ragged_mmap_folders_ending_in_mmap
#  validation_ambient/
#    ragged_mmap_folders_ending_in_mmap
#
#  sampling_weight: Weight for choosing a spectrogram from this set in the batch
#  penalty_weight: Penalizing weight for incorrect predictions from this set
#  truth: Boolean whether this set has positive samples or negative samples
#  truncation_strategy: How spectrograms in the set are truncated when they are longer than necessary for training
#       - random: choose a random portion of the entire spectrogram - useful for long negative samples
#       - truncate_start: remove the start of the spectrogram
#       - truncate_end: remove the end of the spectrogram
#       - split: Split the longer spectrogram into separate spectrograms offset by 100 ms. Only for ambient sets

config["features"] = [
    {
        "features_dir": "generated_augmented_features",
        "sampling_weight": 2.0,
        "penalty_weight": 1.0,
        "truth": True,
        "truncation_strategy": "truncate_start",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/speech",
        "sampling_weight": 10.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/dinner_party",
        "sampling_weight": 10.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/no_speech",
        "sampling_weight": 5.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/dinner_party_eval",
        "sampling_weight": 0.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "split",
        "type": "mmap",
    },
]

# Number of training steps in each iteration - various other settings are configured as lists that correspond to different steps
config["training_steps"] = [10000]

# To save this config to a YAML file:
# with open('config.yaml', 'w') as f:
#     yaml.dump(config, f, default_flow_style=False)

```
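
To get a feel for the `sampling_weight` values, the sketch below converts them into expected batch fractions. It assumes feature sets are drawn in proportion to their weights, which is an interpretation of the comment above rather than a confirmed detail of the library; `batch_shares` is a hypothetical helper.

```python
def batch_shares(features):
    """Map each features_dir to its expected fraction of a training batch,
    assuming sets are drawn in proportion to `sampling_weight`."""
    total = sum(f["sampling_weight"] for f in features)
    return {f["features_dir"]: f["sampling_weight"] / total for f in features}


features = [
    {"features_dir": "generated_augmented_features", "sampling_weight": 2.0},
    {"features_dir": "negative_datasets/speech", "sampling_weight": 10.0},
    {"features_dir": "negative_datasets/dinner_party", "sampling_weight": 10.0},
    {"features_dir": "negative_datasets/no_speech", "sampling_weight": 5.0},
    {"features_dir": "negative_datasets/dinner_party_eval", "sampling_weight": 0.0},
]
shares = batch_shares(features)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption, the positive set would make up about 7.4% of each batch (2 out of a total weight of 27), and the evaluation-only set with weight 0.0 would never be sampled for training.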

--------------------------------

### Configure Audio Augmentation Pipeline

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This Python code sets up the audio augmentation pipeline using classes from the `microwakeword.audio` module. It defines parameters for clip loading, augmentation types, and background noise/reverb integration, aiming to improve model robustness.

```python
# Sets up the augmentations.
# To improve your model, experiment with these settings and use more sources of
# background clips.

from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from microwakeword.audio.spectrograms import SpectrogramGeneration

clips = Clips(input_directory='generated_samples',
              file_pattern='*.wav',
              max_clip_duration_s=None,
              remove_silence=False,
              random_split_seed=10,
              split_count=0.1,
              )
augmenter = Augmentation(augmentation_duration_s=3.2,
                         augmentation_probabilities = {
                                "SevenBandParametricEQ": 0.1,
                                "TanhDistortion": 0.1,
                                "PitchShift": 0.1,
                                "BandStopFilter": 0.1,
                                "AddColorNoise": 0.1,
                                "AddBackgroundNoise": 0.75,
                                "Gain": 1.0,
                                "RIR": 0.5,
                            },
                         impulse_paths = ['mit_rirs'],
                         background_paths = ['fma_16k', 'audioset_16k'],
                         background_min_snr_db = -5,
                         background_max_snr_db = 10,
                         min_jitter_s = 0.195,
                         max_jitter_s = 0.205,
                         )

```
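
One way to reason about `augmentation_probabilities` is to treat each entry as an independent chance that the effect is applied to a clip. That independence is an assumption here, not something the notebook states, and `choose_augmentations` is a hypothetical helper.

```python
import random

augmentation_probabilities = {
    "SevenBandParametricEQ": 0.1,
    "TanhDistortion": 0.1,
    "PitchShift": 0.1,
    "BandStopFilter": 0.1,
    "AddColorNoise": 0.1,
    "AddBackgroundNoise": 0.75,
    "Gain": 1.0,
    "RIR": 0.5,
}


def choose_augmentations(probabilities, rng):
    """Pick each augmentation independently with its own probability."""
    return [name for name, p in probabilities.items() if rng.random() < p]


# Under independence, the expected number of effects is the sum of probabilities.
expected_per_clip = sum(augmentation_probabilities.values())
print(f"Expected augmentations per clip: {expected_per_clip:.2f}")  # 2.75

rng = random.Random(0)
print(choose_augmentations(augmentation_probabilities, rng))
```

An entry with probability 1.0, such as `Gain`, is selected for every clip in this model.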

--------------------------------

### Train and Evaluate Micro Wake Word Model

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This command initiates the training and evaluation process for a micro wake word model using a specified configuration file. It supports resuming training from checkpoints, testing various model formats (TFLite, quantized, streaming), and uses the best-weighted model for final testing. The command also includes parameters for model architecture, such as filter counts and kernel sizes.

```python
# Trains a model. When finished, it will quantize and convert the model to a
# streaming version suitable for on-device detection.
# It will resume if stopped, but it will start over at the configured training
# steps in the yaml file.
# Change --train 0 to only convert and test the best-weighted model.
# On Google colab, it doesn't print the mini-batch results, so it may appear
# stuck for several minutes! Additionally, it is very slow compared to training
# on a local GPU.

!python -m microwakeword.model_train_eval \
--training_config='training_parameters.yaml' \
--train 1 \
--restore_checkpoint 1 \
--test_tf_nonstreaming 0 \
--test_tflite_nonstreaming 0 \
--test_tflite_nonstreaming_quantized 0 \
--test_tflite_streaming 0 \
--test_tflite_streaming_quantized 1 \
--use_weights "best_weights" \
mixednet \
--pointwise_filters "64,64,64,64" \
--repeat_in_block  "1, 1, 1, 1" \
--mixconv_kernel_sizes '[5], [7,11], [9,15], [23]' \
--residual_connection "0,0,0,0" \
--first_conv_filters 32 \
--first_conv_kernel_size 5 \
--stride 3

```
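
The architecture flags are passed as strings, so a typo in one list is easy to miss. The sketch below parses the values used above and checks that they all describe the same number of blocks. The parsing helpers are hypothetical, and the idea that each list holds one entry per block is an assumption based on every list in the command having four entries.

```python
import ast


def parse_int_list(text):
    """Parse a string such as "64,64,64,64" into [64, 64, 64, 64]."""
    return [int(part) for part in text.split(",")]


def parse_kernel_sizes(text):
    """Parse a string such as '[5], [7,11], [9,15], [23]' into a list of lists."""
    # Wrapping in "(...,)" makes a single "[5]" parse as a one-element tuple too.
    return [list(block) for block in ast.literal_eval(f"({text},)")]


pointwise_filters = parse_int_list("64,64,64,64")
repeat_in_block = parse_int_list("1, 1, 1, 1")
residual_connection = parse_int_list("0,0,0,0")
mixconv_kernel_sizes = parse_kernel_sizes("[5], [7,11], [9,15], [23]")

lengths = {
    len(pointwise_filters),
    len(repeat_in_block),
    len(residual_connection),
    len(mixconv_kernel_sizes),
}
print("blocks:", len(mixconv_kernel_sizes), "consistent:", len(lengths) == 1)
```

`ast.literal_eval` only evaluates literals, so it is a safe choice for parsing the bracketed kernel-size string.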

--------------------------------

### Download Negative Datasets for Micro Wake Word

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This script downloads pre-generated spectrogram features for various negative datasets used in the micro wake word project. It creates a 'negative_datasets' directory and uses `wget` and `unzip` to download and extract zip files from Hugging Face. This process can be time-consuming.

```python
# Downloads pre-generated spectrogram features (made for microWakeWord in
# particular) for various negative datasets. This can be slow!

import os

output_dir = './negative_datasets'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    link_root = "https://huggingface.co/datasets/kahrendt/microwakeword/resolve/main/"
    filenames = ['dinner_party.zip', 'dinner_party_eval.zip', 'no_speech.zip', 'speech.zip']
    for fname in filenames:
        link = link_root + fname

        zip_path = f"negative_datasets/{fname}"
        !wget -O {zip_path} {link}
        !unzip -q {zip_path} -d {output_dir}
```
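
After the downloads finish, it can be worth confirming that each dataset has the split layout the training configuration expects (`training/`, `validation/`, `validation_ambient/`, `testing/`, `testing_ambient/`, each holding folders whose names end in `mmap`). The helper below is a hypothetical sketch using only `pathlib`.

```python
from pathlib import Path

SPLITS = ("training", "validation", "validation_ambient", "testing", "testing_ambient")


def summarize_feature_dir(features_dir):
    """Return {split: [mmap folder names]} for the split folders that exist."""
    root = Path(features_dir)
    summary = {}
    for split in SPLITS:
        split_dir = root / split
        if split_dir.is_dir():
            summary[split] = sorted(
                p.name for p in split_dir.iterdir()
                if p.is_dir() and p.name.endswith("mmap")
            )
    return summary


for name in ("dinner_party", "dinner_party_eval", "no_speech", "speech"):
    print(name, summarize_feature_dir(Path("negative_datasets") / name))
```

An empty dictionary for a dataset suggests the archive did not extract where expected.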

--------------------------------

### Training Configuration YAML

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Defines the training parameters for wake word models using a YAML file. It specifies feature set configurations, training hyperparameters, optimization metrics, and SpecAugment settings.

```yaml
# training_parameters.yaml
window_step_ms: 10
train_dir: "trained_models/my_wake_word"

features:
  - features_dir: "features/positive_samples"
    sampling_weight: 2.0
    penalty_weight: 1.0
    truth: true
    truncation_strategy: "truncate_start"
    type: "mmap"

  - features_dir: "features/negative_speech"
    sampling_weight: 10.0
    penalty_weight: 1.0
    truth: false
    truncation_strategy: "random"
    type: "mmap"

  - features_dir: "features/ambient_noise"
    sampling_weight: 5.0
    penalty_weight: 1.0
    truth: false
    truncation_strategy: "random"
    type: "mmap"

# Training schedule (lists correspond to training phases)
training_steps: [10000, 5000]
learning_rates: [0.001, 0.0001]
positive_class_weight: [1, 1]
negative_class_weight: [20, 20]

# SpecAugment settings per phase
time_mask_max_size: [5, 3]
time_mask_count: [2, 1]
freq_mask_max_size: [5, 3]
freq_mask_count: [2, 1]

batch_size: 128
eval_step_interval: 500
clip_duration_ms: 1500

# Best model selection criteria
target_minimization: 0.9
minimization_metric: "ambient_false_positives_per_hour"
maximization_metric: "average_viable_recall"
```
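
Because several settings are lists with one entry per training phase, a mismatch in list lengths is an easy mistake. The following conservative check is a sketch with a hypothetical helper name; whether the library itself tolerates shorter lists is not stated here, so this simply flags any difference.

```python
PER_PHASE_KEYS = (
    "learning_rates", "positive_class_weight", "negative_class_weight",
    "time_mask_max_size", "time_mask_count",
    "freq_mask_max_size", "freq_mask_count",
)


def mismatched_phase_lists(config):
    """Return the per-phase keys whose list length differs from training_steps."""
    phases = len(config["training_steps"])
    return [
        key for key in PER_PHASE_KEYS
        if key in config and len(config[key]) != phases
    ]


config = {
    "training_steps": [10000, 5000],
    "learning_rates": [0.001, 0.0001],
    "positive_class_weight": [1, 1],
    "negative_class_weight": [20, 20],
    "time_mask_max_size": [5, 3],
    "time_mask_count": [2],  # Deliberately one entry short
}
print(mismatched_phase_lists(config))  # ['time_mask_count']
```

The same function works on a dictionary loaded from the YAML file with `yaml.safe_load`.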

--------------------------------

### Configure Training Parameters for Micro Wake Word

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This Python code snippet configures various training parameters for a micro wake word model, including class weights, learning rates, batch size, SpecAugment settings, evaluation intervals, clip duration, and model selection metrics. The configuration is then saved to a YAML file named 'training_parameters.yaml'.

```python
import os
import yaml

config = {}

config["positive_class_weight"] = [1]
config["negative_class_weight"] = [20]

config["learning_rates"] = [
    0.001,
]  # Learning rates for Adam optimizer - list that corresponds to training steps
config["batch_size"] = 128

config["time_mask_max_size"] = [
    0
]  # SpecAugment - list that corresponds to training steps
config["time_mask_count"] = [0]  # SpecAugment - list that corresponds to training steps
config["freq_mask_max_size"] = [
    0
]  # SpecAugment - list that corresponds to training steps
config["freq_mask_count"] = [0]  # SpecAugment - list that corresponds to training steps

config["eval_step_interval"] = (
    500  # Test the validation sets after every this many steps
)
config["clip_duration_ms"] = (
    1500  # Maximum length of wake word that the streaming model will accept
)

# The best model weights are chosen first by minimizing the specified minimization metric below the specified target_minimization
# Once the target has been met, it chooses the maximum of the maximization metric. Set 'minimization_metric' to None to only maximize
# Available metrics:
#   - "loss" - cross entropy error on validation set
#   - "accuracy" - accuracy of validation set
#   - "recall" - recall of validation set
#   - "precision" - precision of validation set
#   - "false_positive_rate" - false positive rate of validation set
#   - "false_negative_rate" - false negative rate of validation set
#   - "ambient_false_positives" - count of false positives from the split validation_ambient set
#   - "ambient_false_positives_per_hour" - estimated number of false positives per hour on the split validation_ambient set
config["target_minimization"] = 0.9
config["minimization_metric"] = None  # Set to None to disable

config["maximization_metric"] = "average_viable_recall"

with open(os.path.join("training_parameters.yaml"), "w") as file:
    documents = yaml.dump(config, file)

```
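
The two-stage selection rule described in the comments (first drive the minimization metric below `target_minimization`, then maximize the maximization metric) can be sketched as a comparison function. This is one reading of those comments, not the library's code; `is_better` and `select_best` are hypothetical names, and treating "meets the target" as `<=` is an assumption.

```python
def is_better(candidate, best, minimization_metric, maximization_metric,
              target_minimization):
    """Return True if the `candidate` metrics should replace `best`."""
    if best is None:
        return True
    if minimization_metric is None:
        return candidate[maximization_metric] > best[maximization_metric]

    cand_min = candidate[minimization_metric]
    best_min = best[minimization_metric]
    cand_ok = cand_min <= target_minimization
    best_ok = best_min <= target_minimization
    if cand_ok and best_ok:
        # Both meet the target: prefer the higher maximization metric.
        return candidate[maximization_metric] > best[maximization_metric]
    if cand_ok != best_ok:
        # Only one meets the target: that one wins.
        return cand_ok
    # Neither meets the target: keep pushing the minimization metric down.
    return cand_min < best_min


def select_best(checkpoints, minimization_metric, maximization_metric,
                target_minimization):
    """Return the index of the checkpoint chosen by the rule above."""
    best_index = None
    for index, metrics in enumerate(checkpoints):
        best = None if best_index is None else checkpoints[best_index]
        if is_better(metrics, best, minimization_metric, maximization_metric,
                     target_minimization):
            best_index = index
    return best_index


checkpoints = [
    {"ambient_false_positives_per_hour": 3.0, "average_viable_recall": 0.90},
    {"ambient_false_positives_per_hour": 1.5, "average_viable_recall": 0.80},
    {"ambient_false_positives_per_hour": 0.8, "average_viable_recall": 0.70},
    {"ambient_false_positives_per_hour": 0.5, "average_viable_recall": 0.75},
    {"ambient_false_positives_per_hour": 0.2, "average_viable_recall": 0.72},
]
print(select_best(checkpoints, "ambient_false_positives_per_hour",
                  "average_viable_recall", 0.9))  # 3
print(select_best(checkpoints, None, "average_viable_recall", 0.9))  # 0
```

With the minimization metric disabled, the first checkpoint wins on recall alone even though its false-positive rate is the worst of the five.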

--------------------------------

### Generate Spectrograms with Sliding Window for Streaming Inference

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Configures `SpectrogramGeneration` to create spectrogram features from audio clips, incorporating augmentation. It utilizes a sliding window approach (`slide_frames`) to simulate streaming inference and generates spectrograms for training data in RaggedMmap format.

```python
from microwakeword.audio.spectrograms import SpectrogramGeneration
from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from mmap_ninja.ragged import RaggedMmap
import os

# Setup clips and augmentation
clips = Clips(
    input_directory="generated_samples",
    file_pattern="*.wav",
    random_split_seed=42,
    split_count=0.1
)

augmenter = Augmentation(
    augmentation_duration_s=3.2,
    impulse_paths=["mit_rirs"],
    background_paths=["audioset_16k"],
)

# Configure spectrogram generation with sliding window
# slide_frames simulates streaming inference by generating multiple
# spectrograms from the same audio shifted by one frame each
spectrogram_gen = SpectrogramGeneration(
    clips=clips,
    augmenter=augmenter,
    step_ms=10,           # Window step size for features
    slide_frames=10,      # Generate 10 shifted versions per clip
    split_spectrogram_duration_s=None,  # Optional: split into fixed chunks
)

# Get a single random spectrogram
spectrogram = spectrogram_gen.get_random_spectrogram()
print(f"Spectrogram shape: {spectrogram.shape}")  # (time_steps, 40)

# Generate and save spectrograms to RaggedMmap format for training
output_dir = "features/training/wakeword_mmap"
os.makedirs(output_dir, exist_ok=True)

RaggedMmap.from_generator(
    out_dir=output_dir,
    sample_generator=spectrogram_gen.spectrogram_generator(
        split="train",  # Use training split
        repeat=2        # Repeat clips twice with different augmentations
    ),
    batch_size=100,
    verbose=True,
)
```
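
The idea behind `slide_frames` (several spectrograms from the same audio, each shifted by one frame) can be illustrated on a plain list of frames. The sketch below is only a mental model: `sliding_views` is a hypothetical helper, and the library may trim or pad differently.

```python
def sliding_views(spectrogram, slide_frames):
    """Yield `slide_frames` equally long views, each starting one frame later."""
    view_length = len(spectrogram) - slide_frames + 1
    if view_length <= 0:
        raise ValueError("spectrogram is shorter than slide_frames")
    for offset in range(slide_frames):
        yield spectrogram[offset:offset + view_length]


# 320 frames of 40 channels; each frame is filled with its own index so the
# shift is easy to see.
frames = [[t] * 40 for t in range(320)]
views = list(sliding_views(frames, slide_frames=10))

print(len(views), "views of", len(views[0]), "frames each")  # 10 views of 311 frames each
print("first frame index per view:", [view[0][0] for view in views])
```

Each view places the wake word at a slightly different position, which is what a streaming model sees as audio arrives frame by frame.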

--------------------------------

### Augment and Play a Random Audio Clip

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This snippet demonstrates how to augment a randomly selected audio clip using the configured augmenter and then play it back. It saves the augmented clip to a WAV file and uses `IPython.display.Audio` for playback within an environment like a Jupyter notebook.

```python
# Augment a random clip and play it back to verify it works well

from IPython.display import Audio
from microwakeword.audio.audio_utils import save_clip

random_clip = clips.get_random_clip()
augmented_clip = augmenter.augment_clip(random_clip)
save_clip(augmented_clip, 'augmented_clip.wav')

Audio("augmented_clip.wav", autoplay=True)

```

--------------------------------

### Compute Wake Word Detection Metrics

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Calculates and prints key performance metrics such as accuracy, recall, precision, and false positive rate for wake word detection models from counts of true positives, true negatives, false positives, and false negatives. The example also estimates false accepts per hour from streaming probabilities and builds ROC curve coordinates.

```python
import numpy as np
from microwakeword.test import (
    compute_metrics,
    compute_false_accepts_per_hour,
    generate_roc_curve,
    tflite_streaming_model_roc
)
from microwakeword.inference import Model

# Compute classification metrics
metrics = compute_metrics(
    true_positives=950,
    true_negatives=9800,
    false_positives=200,
    false_negatives=50
)
print(f"Accuracy: {metrics['accuracy']:.4f}")
print(f"Recall: {metrics['recall']:.4f}")
print(f"Precision: {metrics['precision']:.4f}")
print(f"False Positive Rate: {metrics['false_positive_rate']:.4f}")

# Compute false accepts per hour from streaming probabilities
# streaming_probs_list: List of probability arrays from negative audio clips
streaming_probs_list = [
    np.random.random(1000),  # Simulated probabilities
    np.random.random(2000),
]
cutoffs = np.arange(0, 1.01, 0.01)

faph = compute_false_accepts_per_hour(
    streaming_probabilities_list=streaming_probs_list,
    cutoffs=cutoffs,
    ignore_slices_after_accept=75,  # Cooldown after detection
    stride=3,
    step_s=0.02,
)
print(f"False accepts/hour at 0.5 threshold: {faph[50]:.2f}")

# Generate ROC curve coordinates
false_rejections = np.linspace(0, 1, len(cutoffs))  # Example data
x_coords, y_coords, cutoffs_at_points = generate_roc_curve(
    false_accepts_per_hour=faph,
    false_rejections=false_rejections,
    cutoffs=cutoffs,
    max_faph=2.0,
)

# Area under curve
auc = np.trapz(y_coords, x_coords)
print(f"ROC AUC: {auc:.4f}")
```
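
For reference, the classification metrics follow the standard confusion-matrix definitions. The sketch below computes them directly from the same counts; it is not the library's `compute_metrics`, and the function name `classification_metrics` is hypothetical. It does not guard against empty classes, so a zero denominator raises `ZeroDivisionError`.

```python
def classification_metrics(true_positives, true_negatives,
                           false_positives, false_negatives):
    """Standard confusion-matrix metrics from raw counts."""
    total = true_positives + true_negatives + false_positives + false_negatives
    return {
        "accuracy": (true_positives + true_negatives) / total,
        "recall": true_positives / (true_positives + false_negatives),
        "precision": true_positives / (true_positives + false_positives),
        "false_positive_rate": false_positives / (false_positives + true_negatives),
        "false_negative_rate": false_negatives / (false_negatives + true_positives),
    }


reference = classification_metrics(
    true_positives=950,
    true_negatives=9800,
    false_positives=200,
    false_negatives=50,
)
print(f"Accuracy: {reference['accuracy']:.4f}")                        # 0.9773
print(f"Recall: {reference['recall']:.4f}")                            # 0.9500
print(f"Precision: {reference['precision']:.4f}")                      # 0.8261
print(f"False Positive Rate: {reference['false_positive_rate']:.4f}")  # 0.0200
```

Recall and false negative rate always sum to 1, which makes a quick sanity check on any reported numbers.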

--------------------------------

### Download Audio Data for Augmentation (Python)

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

Downloads Room Impulse Response (RIR) data from the MIT environmental impulse responses dataset and noise/background audio from the AudioSet dataset for data augmentation. The RIR data is saved as 16-bit PCM WAV files. Note that the downloaded data has mixed licenses, making models trained with it suitable only for non-commercial personal use.

```python
# Downloads audio data for augmentation. This can be slow!
# Borrowed from openWakeWord's automatic_model_training.ipynb, accessed March 4, 2024
#
# **Important note!** The data downloaded here has a mixture of different
# licenses and usage restrictions. As such, any custom models trained with this
# data should be considered as appropriate for **non-commercial** personal use only.


import datasets
import scipy.io.wavfile
import os

import numpy as np

from pathlib import Path
from tqdm import tqdm

## Download MIT RIR data

output_dir = "./mit_rirs"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    rir_dataset = datasets.load_dataset("davidscripka/MIT_environmental_impulse_responses", split="train", streaming=True)
    # Save clips to 16-bit PCM wav files
    for row in tqdm(rir_dataset):
        name = row['audio']['path'].split('/')[-1]
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))

## Download noise and background audio

# Audioset Dataset (https://research.google.com/audioset/dataset/index.html)
# Download one part of the audioset .tar files, extract, and convert to 16khz
# For full-scale training, it's recommended to download the entire dataset from
# https://huggingface.co/datasets/agkphysics/AudioSet, and

```

--------------------------------

### Audio Data Augmentation with Augmentation Class (Python)

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Illustrates the use of the `Augmentation` class to apply various transformations to audio data, enhancing model robustness and generalization. Supported augmentations include EQ, distortion, pitch shifting, and noise addition. This class is typically used in conjunction with the `Clips` class.

```python
from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from microwakeword.audio.audio_utils import save_clip

# Example usage (assuming clips and augmentation objects are initialized)
# augmentation = Augmentation(...)
# clips = Clips(...)
# for clip_audio in clips.audio_generator(split="train"):
#     augmented_clip = augmentation.augment(clip_audio)
#     save_clip(augmented_clip, "augmented_output.wav")
```
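
To make the noise-addition idea concrete, here is a small NumPy-only sketch that mixes background noise into a clip at a chosen signal-to-noise ratio. It is an illustration under stated assumptions: `mix_at_snr` is a hypothetical helper, the signals are synthetic, and the `Augmentation` class may implement noise mixing differently.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise to speech so the mix has the requested SNR (hypothetical helper)."""
    noise = np.resize(noise, speech.shape)  # loop or trim the noise to the clip length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that scales the noise power to speech_power / 10^(snr_db / 10)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # synthetic "speech"
noise = rng.standard_normal(8000)                            # synthetic background noise
noisy = mix_at_snr(speech, noise, snr_db=10.0)

achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
print(f"Achieved SNR: {achieved:.1f} dB")
```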

--------------------------------

### Audio Utilities for Feature Generation and Processing

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Provides functions for generating spectrogram features from audio, saving audio clips, and removing silence using WebRTC VAD. It utilizes `scipy.io.wavfile` for audio loading and `microwakeword.audio.audio_utils` for processing.

```python
import numpy as np
from scipy.io import wavfile
from microwakeword.audio.audio_utils import (
    generate_features_for_clip,
    save_clip,
    remove_silence_webrtc
)

# Load audio file
sample_rate, audio_data = wavfile.read("wake_word_sample.wav")

# Generate spectrogram features using micro_speech preprocessor
# Returns 40-channel features compatible with TFLite Micro
spectrogram = generate_features_for_clip(
    audio_samples=audio_data,
    step_ms=20,      # Window step size (10 or 20 ms typical)
    use_c=True       # Use C implementation (faster)
)
print(f"Spectrogram shape: {spectrogram.shape}")  # (time_steps, 40)

# Remove silence from audio using WebRTC VAD
trimmed_audio = remove_silence_webrtc(
    audio_data=audio_data.astype(np.float32) / 32768.0,  # Normalize to float
    frame_duration=0.030,  # 30ms frames
    sample_rate=16000,
    min_start=2000,        # Keep at least first 2000 samples
)

# Save processed audio
save_clip(
    audio_samples=trimmed_audio,
    output_file="processed_wake_word.wav"
)
```
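
The `step_ms` argument controls how many spectrogram time steps a clip produces. As a rough guide, the frame count can be estimated as below. This sketch assumes a 30 ms analysis window, which is an assumption and not something stated above, and `expected_frames` is a hypothetical helper. The exact count returned by `generate_features_for_clip` may differ.

```python
def expected_frames(num_samples, sample_rate=16000, window_ms=30, step_ms=20):
    """Estimate the number of feature frames for a clip (window_ms is an assumption)."""
    window = sample_rate * window_ms // 1000  # samples per analysis window
    step = sample_rate * step_ms // 1000      # samples between window starts
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // step

# One second of 16 kHz audio at the two typical step sizes
for step_ms in (10, 20):
    print(step_ms, expected_frames(16000, step_ms=step_ms))
```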

--------------------------------

### TFLite Model Inference for Wake Word Detection (Python)

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Demonstrates how to load a trained TensorFlow Lite wake word model and perform inference on an audio clip. It handles audio loading, feature extraction, and probability thresholding for wake word detection. Requires numpy, scipy, and the microwakeword library.

```python
import numpy as np
from scipy.io import wavfile
from microwakeword.inference import Model

# Load a trained TFLite wake word model
model = Model(
    tflite_model_path="trained_models/wakeword/tflite_stream_state_internal_quant/stream_state_internal_quant.tflite",
    stride=3  # Match the stride used during training
)

# Load audio file (must be 16kHz, 16-bit PCM)
sample_rate, audio_data = wavfile.read("test_audio.wav")
assert sample_rate == 16000, "Audio must be 16kHz"

# Run inference on the audio clip
# Returns a list of probabilities for each inference window
probabilities = model.predict_clip(audio_data, step_ms=20)

# Check if wake word was detected (probability > threshold)
threshold = 0.5
for i, prob in enumerate(probabilities):
    if prob > threshold:
        print(f"Wake word detected at window {i} with probability {prob:.3f}")

# Alternative: Run inference directly on a spectrogram
from microwakeword.audio.audio_utils import generate_features_for_clip
spectrogram = generate_features_for_clip(audio_data, step_ms=20)
probabilities = model.predict_spectrogram(spectrogram)
```
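
A single window above the threshold can be a spurious spike, so detections are often smoothed over several consecutive windows. The sketch below averages the most recent probabilities before comparing against a cutoff. The function name `detect` and the `cutoff` and `window_size` values are illustrative assumptions, not part of the `Model` API.

```python
from collections import deque

def detect(probabilities, cutoff=0.9, window_size=5):
    """Return indices where the mean of the last `window_size` probabilities exceeds `cutoff`."""
    recent = deque(maxlen=window_size)
    hits = []
    for i, prob in enumerate(probabilities):
        recent.append(float(prob))
        if len(recent) == window_size and sum(recent) / window_size > cutoff:
            hits.append(i)
            recent.clear()  # require a fresh window before the next detection
    return hits

# The isolated spike at index 1 is ignored; the sustained run triggers one detection
probs = [0.1, 0.99, 0.2, 0.95, 0.96, 0.97, 0.98, 0.99, 0.3, 0.1]
print(detect(probs))
```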

--------------------------------

### Manage Training Data with FeatureHandler

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Initializes the `FeatureHandler` class for managing spectrogram features during model training. It configures multiple feature sets (positive wake word, negative speech, ambient) with different sampling weights and truncation strategies, supporting both mmap files and on-the-fly generation.

```python
from microwakeword.data import FeatureHandler

# Configuration for feature loading
config = {
    "stride": 3,
    "window_step_ms": 10,
    "batch_size": 128,
    "spectrogram_length": 150,  # Number of time frames
    "features": [
        {
            "type": "mmap",
            "features_dir": "features/positive",
            "truth": True,              # Positive wake word samples
            "sampling_weight": 2.0,     # Higher weight = sampled more often
            "penalty_weight": 1.0,      # Weight for incorrect predictions
            "truncation_strategy": "truncate_start",
        },
        {
            "type": "mmap",
            "features_dir": "features/negative_speech",
            "truth": False,             # Negative samples (not wake word)
            "sampling_weight": 10.0,
            "penalty_weight": 1.0,
            "truncation_strategy": "random",
        },
        {
            "type": "mmap",
            "features_dir": "features/ambient",
            "truth": False,
            "sampling_weight": 0.0,     # Only used for validation/testing
            "penalty_weight": 1.0,
            "truncation_strategy": "split",  # Split long clips for ambient
        },
    ],
}

# Example of initializing FeatureHandler (actual usage would involve training loop)
# feature_handler = FeatureHandler(config=config)
```
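
One way to read the `sampling_weight` values is as relative sampling frequencies. The sketch below normalizes the three weights from the config into probabilities and draws the composition of one batch. This is an interpretation for illustration only, and `FeatureHandler` may distribute samples differently internally.

```python
import numpy as np

# Sampling weights copied from the config above (positive, negative speech, ambient)
sampling_weights = np.array([2.0, 10.0, 0.0])
probabilities = sampling_weights / sampling_weights.sum()

rng = np.random.default_rng(0)
batch_size = 128
# Draw which feature set each sample in one batch would come from
sources = rng.choice(len(sampling_weights), size=batch_size, p=probabilities)
counts = np.bincount(sources, minlength=len(sampling_weights))
print(probabilities, counts)
```

Under this reading, the ambient set with weight `0.0` never contributes to a training batch, which matches its "only used for validation/testing" comment.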

--------------------------------

### Save Model Summary to Text File (Python)

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Saves a summary of a trained streaming model to a specified text file. This function is useful for documenting model architecture and parameters during the training process.

```python
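from microwakeword.utils import save_model_summary

# Assumes `streaming_model` is a streaming model such as the one returned by `convert_model_saved`
save_model_summary(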
    model=streaming_model,
    path="trained_models/my_wake_word/stream_state_internal",
    file_name="model_summary.txt"
)
```

--------------------------------

### Save Augmented Training, Validation, and Test Sets

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This code segment outlines the process of augmenting samples and saving them into distinct training, validation, and testing sets. It highlights the importance of using realistic or differently generated samples for validation and testing to ensure accurate model benchmarking.

```python
# Augment samples and save the training, validation, and testing sets.
# Validating and testing samples generated the same way can make the model
# benchmark better than it performs in real-world use. Use real samples or TTS
# samples generated with a different TTS engine to potentially get more accurate

```

--------------------------------

### Convert Keras Models to TFLite with TensorFlow

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Provides utilities to convert trained Keras models into streaming TFLite format, with options for quantization. It supports saving both streaming and non-streaming inference versions and includes steps for feature handling and quantization calibration.

```python
import tensorflow as tf
from microwakeword.utils import (
    convert_model_saved,
    convert_saved_model_to_tflite,
    to_streaming_inference,
    save_model_summary
)
from microwakeword.layers.modes import Modes
from microwakeword.data import FeatureHandler

# Configuration
config = {
    "train_dir": "trained_models/my_wake_word",
    "stride": 3,
    "spectrogram_length": 150,
    "batch_size": 1,
}

# Assume 'model' is a trained Keras model
# Convert to streaming SavedModel format
streaming_model = convert_model_saved(
    model=model,
    config=config,
    folder="stream_state_internal",
    mode=Modes.STREAM_INTERNAL_STATE_INFERENCE,
)

# Also save non-streaming version
nonstreaming_model = convert_model_saved(
    model=model,
    config=config,
    folder="non_stream",
    mode=Modes.NON_STREAM_INFERENCE,
)

# Initialize feature handler for quantization calibration
feature_handler = FeatureHandler(config)

# Convert to quantized TFLite
convert_saved_model_to_tflite(
    config=config,
    audio_processor=feature_handler,
    path_to_model="trained_models/my_wake_word/stream_state_internal",
    folder="trained_models/my_wake_word/tflite_quant",
    fname="wake_word_quantized.tflite",
    quantize=True,  # Full int8 quantization
)

# Convert to non-quantized TFLite
convert_saved_model_to_tflite(
    config=config,
    audio_processor=feature_handler,
    path_to_model="trained_models/my_wake_word/stream_state_internal",
    folder="trained_models/my_wake_word/tflite",
    fname="wake_word.tflite",
    quantize=False,
)

```
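
To give a sense of what int8 quantization does to values, here is a generic affine quantization sketch in NumPy. It is not the TFLite converter's implementation: `quantize_int8` and `dequantize` are hypothetical helpers, and the calibration data is random with a made-up range. It shows how a scale and zero point are derived from observed values, which is why the converter needs representative features for calibration.

```python
import numpy as np

def quantize_int8(values: np.ndarray):
    """Affine-quantize floats to int8; returns (quantized, scale, zero_point)."""
    lo = min(float(values.min()), 0.0)  # the range must include zero
    hi = max(float(values.max()), 0.0)
    scale = (hi - lo) / 255.0 or 1.0    # avoid a zero scale for constant input
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(values / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map int8 values back to approximate floats."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
# Made-up calibration batch shaped like one spectrogram (150 frames, 40 channels)
calibration = rng.uniform(0.0, 26.0, size=(150, 40)).astype(np.float32)
q, scale, zero_point = quantize_int8(calibration)
error = np.max(np.abs(dequantize(q, scale, zero_point) - calibration))
print(f"scale={scale:.4f}, zero_point={zero_point}, max error={error:.4f}")
```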

--------------------------------

### Generate Augmented Spectrogram Features for Micro Wake Word

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This script generates augmented spectrogram features for training, validation, and testing of the micro wake word model. It creates output directories and utilizes `SpectrogramGeneration` to produce spectrograms, which are then saved using `RaggedMmap`. The `slide_frames` parameter is adjusted for different splits to simulate streaming inference.

```python
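import os
from mmap_ninja.ragged import RaggedMmap
from microwakeword.audio.spectrograms import SpectrogramGeneration

# Assumes `clips` (a Clips instance) and `augmenter` (an Augmentation instance)
# were created in earlier notebook cells.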

output_dir = 'generated_augmented_features'

if not os.path.exists(output_dir):
    os.mkdir(output_dir)

splits = ["training", "validation", "testing"]
for split in splits:
  out_dir = os.path.join(output_dir, split)
  if not os.path.exists(out_dir):
      os.mkdir(out_dir)


  split_name = "train"
  repetition = 2

  spectrograms = SpectrogramGeneration(clips=clips,
                                     augmenter=augmenter,
                                     slide_frames=10,    # Uses the same spectrogram repeatedly, just shifted over by one frame. This simulates the streaming inferences while training/validating in nonstreaming mode.
                                     step_ms=10,
                                     )
  if split == "validation":
    split_name = "validation"
    repetition = 1
  elif split == "testing":
    split_name = "test"
    repetition = 1
    spectrograms = SpectrogramGeneration(clips=clips,
                                     augmenter=augmenter,
                                     slide_frames=1,    # The testing set uses the streaming version of the model, so no artificial repetition is necessary
                                     step_ms=10,
                                     )

  RaggedMmap.from_generator(
      out_dir=os.path.join(out_dir, 'wakeword_mmap'),
      sample_generator=spectrograms.spectrogram_generator(split=split_name, repeat=repetition),
      batch_size=100,
      verbose=True,
  )
```
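
The effect of `slide_frames` can be pictured as taking several overlapping views of one spectrogram, each starting one frame later. The sketch below illustrates that idea with plain NumPy slicing. The helper `slide` is hypothetical, and `SpectrogramGeneration` may choose window lengths and offsets differently.

```python
import numpy as np

def slide(spectrogram: np.ndarray, slide_frames: int):
    """Yield `slide_frames` views of one spectrogram, each offset by one more frame."""
    length = spectrogram.shape[0] - slide_frames + 1
    for offset in range(slide_frames):
        yield spectrogram[offset : offset + length]

# Dummy spectrogram: 20 frames, 40 channels
spectrogram = np.arange(20 * 40, dtype=np.float32).reshape(20, 40)
windows = list(slide(spectrogram, slide_frames=10))
print(len(windows), windows[0].shape)
```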

--------------------------------

### Download Trained TFLite Micro Wake Word Model

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This Python code snippet uses the `google.colab.files` module to download the trained TFLite streaming model file. The downloaded model can be used on-device, and further instructions for creating a Model JSON file and adjusting probability thresholds are provided in the accompanying documentation.

```python
# Downloads the tflite model file. To use on the device, you need to write a
# Model JSON file. See https://esphome.io/components/micro_wake_word for the
# documentation and
# https://github.com/esphome/micro-wake-word-models/tree/main/models/v2 for
# examples. Adjust the probability threshold based on the test results obtained
# after training is finished. You may also need to increase the Tensor arena
# model size if the model fails to load.

from google.colab import files

files.download("trained_models/wakeword/tflite_stream_state_internal_quant/stream_state_internal_quant.tflite")

```
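
The Model JSON file mentioned in the comment can be drafted with a few lines of Python. The sketch below is an assumption-laden starting point: the field names follow the layout of the v2 examples in the linked models repository as best recalled, and every value is a placeholder. Check both against the ESPHome documentation before use.

```python
import json

# All values are placeholders; field names should be verified against the v2 examples
manifest = {
    "type": "micro",
    "wake_word": "My Wake Word",
    "author": "Your Name",
    "website": "https://example.com",
    "model": "stream_state_internal_quant.tflite",
    "trained_languages": ["en"],
    "version": 2,
    "micro": {
        "probability_cutoff": 0.97,   # tune from the test results after training
        "sliding_window_size": 5,
        "feature_step_size": 10,
        "tensor_arena_size": 30000,   # increase if the model fails to load
        "minimum_esphome_version": "2024.7.0",
    },
}

manifest_text = json.dumps(manifest, indent=2)
print(manifest_text)
```

The resulting text would be saved as a `.json` file alongside the downloaded `.tflite` model.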

--------------------------------

### Define MixedNet Model Architecture with TensorFlow

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Constructs the MixedNet model architecture using TensorFlow, supporting configurable layers, residual connections, and spatial attention. It parses command-line arguments for model configuration and calculates the required spectrogram length.

```python
import tensorflow as tf
from microwakeword.mixednet import model, model_parameters, spectrogram_slices_dropped
import argparse

# Create argument parser with model parameters
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="model_name")
parser_mixednet = subparsers.add_parser("mixednet")
model_parameters(parser_mixednet)

# Parse model configuration
flags = parser.parse_args([
    "mixednet",
    "--pointwise_filters", "64,64,64,64",
    "--repeat_in_block", "1,1,1,1",
    "--mixconv_kernel_sizes", "[5], [7,11], [9,15], [23]",
    "--residual_connection", "0,0,0,0",
    "--first_conv_filters", "32",
    "--first_conv_kernel_size", "5",
    "--stride", "3",
    "--pooled", "0",
    "--max_pool", "0",
    "--spatial_attention", "0",
])

# Calculate spectrogram length needed for the model
slices_dropped = spectrogram_slices_dropped(flags)
print(f"Spectrogram slices dropped due to valid padding: {slices_dropped}")

# Build the model
input_shape = (150, 40)  # (time_steps, features)
batch_size = 128

wake_word_model = model(
    flags=flags,
    shape=input_shape,
    batch_size=batch_size
)

wake_word_model.summary()

# Model outputs probability between 0 and 1
# Input: (batch, time, features)
# Output: (batch, 1) - wake word probability

```
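
The "slices dropped" figure comes from valid padding: a stride-1 convolution with kernel size `k` and no padding shortens the time axis by `k - 1` frames. The simplified sketch below sums that loss over the largest kernel in each MixConv block from the flags above. It deliberately ignores the first convolution, the stride, and block repeats, so it is not a reimplementation of `spectrogram_slices_dropped`, and `frames_dropped` is a hypothetical helper.

```python
def frames_dropped(kernel_sizes):
    """Frames lost to 'valid' padding by a stack of stride-1 1D convolutions."""
    return sum(k - 1 for k in kernel_sizes)

# Largest kernel per MixConv block from the flags above: [5], [7,11], [9,15], [23]
# (assumes the largest kernel in a block determines how many frames that block drops)
block_kernels = [max(ks) for ks in ([5], [7, 11], [9, 15], [23])]
dropped = frames_dropped(block_kernels)
print(block_kernels, dropped)
```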
