### Install microWakeWord and Dependencies (Python)

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

Installs the microWakeWord library and its dependencies, including forks of `pymicro-features` and `audio-metadata` for compatibility. It also clones the microWakeWord repository and installs it in editable mode. A session restart is recommended after installation.

```python
# Installs microWakeWord. Be sure to restart the session after this is finished.
import platform

if platform.system() == "Darwin":
    # `pymicro-features` is installed from a fork to support building on macOS
    !pip install 'git+https://github.com/puddly/pymicro-features@puddly/minimum-cpp-version'

# `audio-metadata` is installed from a fork to unpin `attrs` from a version that breaks Jupyter
!pip install 'git+https://github.com/whatsnowplaying/audio-metadata@d4ebb238e6a401bb1a5aaaac60c9e2b3cb30929f'

!git clone https://github.com/kahrendt/microWakeWord
!pip install -e ./microWakeWord
```

--------------------------------

### Initialize Feature Handler and Get Training Data

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Initializes a FeatureHandler object with configuration and retrieves training data. It supports data augmentation using SpecAugment policies for time and frequency masking.

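
As a rough illustration of what time and frequency masking do to a spectrogram, the following is a minimal sketch in plain Python. It is not the library's implementation: the `apply_masks` helper is hypothetical, and only its parameter names are borrowed from the `augmentation_policy` keys used in this section.

```python
import random


def apply_masks(spectrogram, time_mask_max_size, time_mask_count,
                freq_mask_max_size, freq_mask_count, rng):
    """Return a copy of `spectrogram` (a list of frames, each a list of
    channel values) with random time and frequency bands set to zero."""
    masked = [list(frame) for frame in spectrogram]
    n_frames = len(masked)
    n_channels = len(masked[0]) if masked else 0

    # Time masking: zero a run of consecutive frames.
    for _ in range(time_mask_count):
        size = min(rng.randint(0, time_mask_max_size), n_frames)
        if size == 0:
            continue
        start = rng.randint(0, n_frames - size)
        for t in range(start, start + size):
            masked[t] = [0.0] * n_channels

    # Frequency masking: zero a run of consecutive channels in every frame.
    for _ in range(freq_mask_count):
        size = min(rng.randint(0, freq_mask_max_size), n_channels)
        if size == 0:
            continue
        start = rng.randint(0, n_channels - size)
        for frame in masked:
            for c in range(start, start + size):
                frame[c] = 0.0
    return masked


rng = random.Random(0)
spectrogram = [[1.0] * 40 for _ in range(150)]  # 150 frames x 40 channels
masked = apply_masks(spectrogram, 5, 2, 5, 2, rng)
zeroed = sum(value == 0.0 for frame in masked for value in frame)
print(f"{zeroed} of {150 * 40} values masked")
```

Setting a mask count or maximum size to 0 disables that kind of masking in this sketch, which mirrors how a configuration with zeros would be expected to switch SpecAugment off.
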
```python
from microwakeword.data import FeatureHandler

# `config` is a feature configuration dict, such as the one shown in the
# "Manage Training Data with FeatureHandler" section.
feature_handler = FeatureHandler(config)

# Get training batch with augmentation
train_data, train_labels, train_weights = feature_handler.get_data(
    mode="training",
    batch_size=128,
    features_length=150,
    truncation_strategy="default",
    augmentation_policy={
        "time_mask_max_size": 5,  # SpecAugment time masking
        "time_mask_count": 2,
        "freq_mask_max_size": 5,  # SpecAugment frequency masking
        "freq_mask_count": 2,
        "freq_mix_prob": 0.0,
        "mix_up_prob": 0.0,
    },
)
print(f"Batch shapes - data: {train_data.shape}, labels: {train_labels.shape}")

# Get validation data (no augmentation)
val_data, val_labels, val_weights = feature_handler.get_data(
    mode="validation",
    batch_size=128,
    features_length=150,
    truncation_strategy="truncate_start",
)

# Check dataset statistics
print(f"Training samples: {feature_handler.get_mode_size('training')}")
print(f"Validation ambient duration: {feature_handler.get_mode_duration('validation_ambient'):.2f}s")
```

--------------------------------

### Generate Multiple Wake Word Samples (Python)

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

Generates a larger quantity of wake word samples (up to 1000) for model training. This script is the starting point for improving model performance. It suggests experimenting with noise parameters, generating negative samples, and using diverse phonetic pronunciations for the wake word.

```python
# Generates a larger amount of wake word samples.
# Start here when trying to improve your model.
# See https://github.com/rhasspy/piper-sample-generator for the full set of
# parameters. In particular, experiment with noise-scales and noise-scale-ws,
# generating negative samples similar to the wake word, and generating many more
# wake word samples, possibly with different phonetic pronunciations.
# Relies on `target_word` and the piper-sample-generator checkout from the
# single-sample step.

!python3 piper-sample-generator/generate_samples.py "{target_word}" \
--max-samples 1000 \
--batch-size 100 \
--output-dir generated_samples
```

--------------------------------

### Generate Single Wake Word Sample (Python)

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

Generates a single audio sample of a target wake word using the piper-sample-generator. It handles cloning the generator repository, downloading a pre-trained model, and installing necessary libraries like PyTorch. The generated sample is provided as an audio element for verification.

```python
# Generates 1 sample of the target word for manual verification.

target_word = 'khum_puter'  # Phonetic spellings may produce better samples

import os
import sys
import platform

from IPython.display import Audio

if not os.path.exists("./piper-sample-generator"):
    if platform.system() == "Darwin":
        !git clone -b mps-support https://github.com/kahrendt/piper-sample-generator
    else:
        !git clone https://github.com/rhasspy/piper-sample-generator

    !wget -O piper-sample-generator/models/en_US-libritts_r-medium.pt 'https://github.com/rhasspy/piper-sample-generator/releases/download/v2.0.0/en_US-libritts_r-medium.pt'

    # Install system dependencies
    !pip install torch torchaudio piper-phonemize-cross==1.2.1

    if "piper-sample-generator/" not in sys.path:
        sys.path.append("piper-sample-generator/")

!python3 piper-sample-generator/generate_samples.py "{target_word}" \
--max-samples 1 \
--batch-size 1 \
--output-dir generated_samples

Audio("generated_samples/0.wav", autoplay=True)
```

--------------------------------

### Command Line Training and Evaluation

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Demonstrates how to train and evaluate wake word models using the command-line interface. It covers training new models, resuming from checkpoints, and evaluating existing models with various test options.

```bash
# Train a new wake word model with MixedNet architecture
python -m microwakeword.model_train_eval \
  --training_config='training_parameters.yaml' \
  --train 1 \
  --restore_checkpoint 0 \
  --test_tflite_streaming_quantized 1 \
  --use_weights "best_weights" \
  mixednet \
  --pointwise_filters "64,64,64,64" \
  --repeat_in_block "1,1,1,1" \
  --mixconv_kernel_sizes '[5], [7,11], [9,15], [23]' \
  --residual_connection "0,0,0,0" \
  --first_conv_filters 32 \
  --first_conv_kernel_size 5 \
  --stride 3

# Resume training from checkpoint
python -m microwakeword.model_train_eval \
  --training_config='training_parameters.yaml' \
  --train 1 \
  --restore_checkpoint 1 \
  --test_tflite_streaming_quantized 1 \
  mixednet \
  --pointwise_filters "48,48,48,48" \
  --stride 3

# Only evaluate an existing model (no training)
python -m microwakeword.model_train_eval \
  --training_config='training_parameters.yaml' \
  --train 0 \
  --test_tf_nonstreaming 1 \
  --test_tflite_nonstreaming 1 \
  --test_tflite_streaming_quantized 1 \
  --use_weights "best_weights" \
  mixednet
```

--------------------------------

### Audio Clip Loading and Management with Clips Class (Python)

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Shows how to use the `Clips` class to load, preprocess, and manage audio files for wake word training. It supports filtering by duration, splitting datasets, removing silence, and repeating short clips. This class is essential for preparing audio data.

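
To make the duration filtering and short-clip repetition concrete, here is a minimal plain-Python sketch. The `prepare_clip` helper is hypothetical, and whether `Clips` repeats whole copies of a clip (as done here) or pads and truncates differently is an assumption; only the parameter names follow the `Clips` arguments used in this section.

```python
def prepare_clip(samples, sample_rate=16000, min_clip_duration_s=0.5,
                 max_clip_duration_s=3.0, repeat_clip_min_duration_s=2.0):
    """Drop clips outside the duration limits and repeat short clips until
    they reach the minimum repeated duration. Returns None if filtered out."""
    if not samples:
        return None
    duration_s = len(samples) / sample_rate
    if duration_s < min_clip_duration_s or duration_s > max_clip_duration_s:
        return None

    target_length = int(repeat_clip_min_duration_s * sample_rate)
    repeated = list(samples)
    while len(repeated) < target_length:
        repeated.extend(samples)  # append another full copy of the clip
    return repeated


one_second = [0] * 16000
print(len(prepare_clip(one_second)))   # 1 s clip repeated to reach 2 s
print(prepare_clip([0] * 4000))        # 0.25 s clip is filtered out
```
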
```python
from microwakeword.audio.clips import Clips

# Load wake word audio samples from a directory
clips = Clips(
    input_directory="generated_samples",
    file_pattern="*.wav",
    min_clip_duration_s=0.5,         # Filter out clips shorter than 0.5s
    max_clip_duration_s=3.0,         # Filter out clips longer than 3s
    repeat_clip_min_duration_s=2.0,  # Repeat short clips until they reach 2s
    remove_silence=True,             # Use WebRTC VAD to trim silence
    random_split_seed=42,            # Seed for reproducible train/test splits
    split_count=0.1,                 # 10% for test and validation each
    trimmed_clip_duration_s=None,    # Optional: trim clips to fixed duration
    trim_zeros=False                 # Optional: remove leading/trailing zeros
)

# Get a random audio clip as numpy array
random_clip = clips.get_random_clip()
print(f"Clip shape: {random_clip.shape}, dtype: {random_clip.dtype}")

# Iterate over all clips in a specific split
for clip_audio in clips.audio_generator(split="train", repeat=1):
    print(f"Processing clip with {len(clip_audio)} samples")

# Use a generator for random sampling during training
for clip_audio in clips.random_audio_generator(max_clips=100):
    # Process each random clip
    pass
```

--------------------------------

### Download and Prepare Audioset Dataset

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This code snippet downloads the Audioset dataset, extracts it, and converts FLAC audio files to 16-bit PCM WAV format at a 16 kHz sampling rate. It handles directory creation and uses `wget` for downloading and `tar` for extraction.

```python
import os
import datasets
import scipy.io.wavfile
import numpy as np

from pathlib import Path
from tqdm import tqdm

if not os.path.exists("audioset"):
    os.mkdir("audioset")

    fname = "bal_train09.tar"
    out_dir = f"audioset/{fname}"
    link = "https://huggingface.co/datasets/agkphysics/AudioSet/resolve/main/data/" + fname
    !wget -O {out_dir} {link}
    !cd audioset && tar -xf bal_train09.tar

output_dir = "./audioset_16k"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

    # Save clips to 16-bit PCM wav files
    audioset_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("audioset/audio").glob("**/*.flac")]})
    audioset_dataset = audioset_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(audioset_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".flac", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))
```

--------------------------------

### Configure and Apply Audio Augmentation Pipeline

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Sets up an audio augmentation pipeline with various effects like EQ, distortion, pitch shift, noise, and background sounds. It then loads audio clips, applies augmentation, and saves the result. This is useful for increasing the diversity of training data.

```python
from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from microwakeword.audio.audio_utils import save_clip

# Configure augmentation pipeline
augmenter = Augmentation(
    augmentation_duration_s=3.2,  # Output duration in seconds
    augmentation_probabilities={
        "SevenBandParametricEQ": 0.1,  # Random EQ adjustments
        "TanhDistortion": 0.1,         # Soft clipping distortion
        "PitchShift": 0.1,             # Pitch variation +-3 semitones
        "BandStopFilter": 0.1,         # Notch filtering
        "AddColorNoise": 0.25,         # Pink/brown/white noise
        "AddBackgroundNoise": 0.75,    # Mix in background audio
        "Gain": 1.0,                   # Random gain adjustment
        "GainTransition": 0.25,        # Gradual volume changes
        "RIR": 0.5,                    # Room impulse response reverb
    },
    impulse_paths=["mit_rirs"],                    # Directories with IR wav files
    background_paths=["fma_16k", "audioset_16k"],  # Background audio dirs
    background_min_snr_db=-5,
    background_max_snr_db=10,
    min_gain_db=-45,
    max_gain_db=0,
    min_jitter_s=0.195,  # Random padding before wake word
    max_jitter_s=0.205,
)

# Load clips and apply augmentation
clips = Clips(input_directory="generated_samples", file_pattern="*.wav")
random_clip = clips.get_random_clip()
augmented_clip = augmenter.augment_clip(random_clip)

# Save augmented audio for verification
save_clip(augmented_clip, "augmented_output.wav")

# Use generator for batch processing
clip_generator = clips.audio_generator()
for augmented in augmenter.augment_generator(clip_generator):
    # Process each augmented clip
    pass
```

--------------------------------

### Download and Prepare Free Music Archive (FMA) Dataset

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This code snippet downloads the Free Music Archive (FMA) extra small dataset, extracts the zip file, and converts MP3 audio files to 16-bit PCM WAV format at a 16 kHz sampling rate. It manages directory creation and uses `wget` for downloading and `unzip` for extraction.

```python
import os
import datasets
import scipy.io.wavfile
import numpy as np

from pathlib import Path
from tqdm import tqdm

output_dir = "./fma"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    fname = "fma_xs.zip"
    link = "https://huggingface.co/datasets/mchl914/fma_xsmall/resolve/main/" + fname
    out_dir = f"fma/{fname}"
    !wget -O {out_dir} {link}
    !cd {output_dir} && unzip -q {fname}

output_dir = "./fma_16k"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

    # Save clips to 16-bit PCM wav files
    fma_dataset = datasets.Dataset.from_dict({"audio": [str(i) for i in Path("fma/fma_small").glob("**/*.mp3")]})
    fma_dataset = fma_dataset.cast_column("audio", datasets.Audio(sampling_rate=16000))
    for row in tqdm(fma_dataset):
        name = row['audio']['path'].split('/')[-1].replace(".mp3", ".wav")
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))
```

--------------------------------

### Save YAML Training Configuration for Micro Wake Word

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This script defines a configuration that controls the training process for the micro wake word model. It includes parameters such as window step, training directory, and detailed feature set configurations including sampling weights, penalty weights, truth labels, and truncation strategies. These hyperparameters significantly impact model quality.

```python
# Save a yaml config that controls the training process
# These hyperparameters can make a huge difference in model quality.
# Experiment with sampling and penalty weights and increasing the number of
# training steps.

import yaml
import os

config = {}

config["window_step_ms"] = 10

config["train_dir"] = (
    "trained_models/wakeword"
)

# Each feature_dir should have at least one of the following folders with this structure:
#  training/
#    ragged_mmap_folders_ending_in_mmap
#  testing/
#    ragged_mmap_folders_ending_in_mmap
#  testing_ambient/
#    ragged_mmap_folders_ending_in_mmap
#  validation/
#    ragged_mmap_folders_ending_in_mmap
#  validation_ambient/
#    ragged_mmap_folders_ending_in_mmap
#
#  sampling_weight: Weight for choosing a spectrogram from this set in the batch
#  penalty_weight: Penalizing weight for incorrect predictions from this set
#  truth: Boolean whether this set has positive samples or negative samples
#  truncation_strategy = If spectrograms in the set are longer than necessary for training, how are they truncated
#       - random: choose a random portion of the entire spectrogram - useful for long negative samples
#       - truncate_start: remove the start of the spectrogram
#       - truncate_end: remove the end of the spectrogram
#       - split: Split the longer spectrogram into separate spectrograms offset by 100 ms. Only for ambient sets

config["features"] = [
    {
        "features_dir": "generated_augmented_features",
        "sampling_weight": 2.0,
        "penalty_weight": 1.0,
        "truth": True,
        "truncation_strategy": "truncate_start",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/speech",
        "sampling_weight": 10.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/dinner_party",
        "sampling_weight": 10.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/no_speech",
        "sampling_weight": 5.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "random",
        "type": "mmap",
    },
    {
        "features_dir": "negative_datasets/dinner_party_eval",
        "sampling_weight": 0.0,
        "penalty_weight": 1.0,
        "truth": False,
        "truncation_strategy": "split",
        "type": "mmap",
    },
]

# Number of training steps in each iteration - various other settings are configured as lists that correspond to different steps
config["training_steps"] = [10000]

# To save this config to a YAML file:
# with open('config.yaml', 'w') as f:
#     yaml.dump(config, f, default_flow_style=False)
```

--------------------------------

### Configure Audio Augmentation Pipeline

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This Python code sets up the audio augmentation pipeline using classes from the `microwakeword.audio` module. It defines parameters for clip loading, augmentation types, and background noise/reverb integration, aiming to improve model robustness.

```python
# Sets up the augmentations.
# To improve your model, experiment with these settings and use more sources of
# background clips.

from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from microwakeword.audio.spectrograms import SpectrogramGeneration

clips = Clips(
    input_directory='generated_samples',
    file_pattern='*.wav',
    max_clip_duration_s=None,
    remove_silence=False,
    random_split_seed=10,
    split_count=0.1,
)
augmenter = Augmentation(
    augmentation_duration_s=3.2,
    augmentation_probabilities={
        "SevenBandParametricEQ": 0.1,
        "TanhDistortion": 0.1,
        "PitchShift": 0.1,
        "BandStopFilter": 0.1,
        "AddColorNoise": 0.1,
        "AddBackgroundNoise": 0.75,
        "Gain": 1.0,
        "RIR": 0.5,
    },
    impulse_paths=['mit_rirs'],
    background_paths=['fma_16k', 'audioset_16k'],
    background_min_snr_db=-5,
    background_max_snr_db=10,
    min_jitter_s=0.195,
    max_jitter_s=0.205,
)
```

--------------------------------

### Train and Evaluate Micro Wake Word Model

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This command initiates the training and evaluation process for a micro wake word model using a specified configuration file. It supports resuming training from checkpoints, testing various model formats (TFLite, quantized, streaming), and uses the best-weighted model for final testing. The command also includes parameters for model architecture, such as filter counts and kernel sizes.

```bash
# Trains a model. When finished, it will quantize and convert the model to a
# streaming version suitable for on-device detection.
# It will resume if stopped, but it will start over at the configured training
# steps in the yaml file.
# Change --train 0 to only convert and test the best-weighted model.
# On Google Colab, it doesn't print the mini-batch results, so it may appear
# stuck for several minutes! Additionally, it is very slow compared to training
# on a local GPU.

!python -m microwakeword.model_train_eval \
--training_config='training_parameters.yaml' \
--train 1 \
--restore_checkpoint 1 \
--test_tf_nonstreaming 0 \
--test_tflite_nonstreaming 0 \
--test_tflite_nonstreaming_quantized 0 \
--test_tflite_streaming 0 \
--test_tflite_streaming_quantized 1 \
--use_weights "best_weights" \
mixednet \
--pointwise_filters "64,64,64,64" \
--repeat_in_block "1, 1, 1, 1" \
--mixconv_kernel_sizes '[5], [7,11], [9,15], [23]' \
--residual_connection "0,0,0,0" \
--first_conv_filters 32 \
--first_conv_kernel_size 5 \
--stride 3
```

--------------------------------

### Download Negative Datasets for Micro Wake Word

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This script downloads pre-generated spectrogram features for various negative datasets used in the micro wake word project. It creates a 'negative_datasets' directory and uses `wget` and `unzip` to download and extract zip files from Hugging Face. This process can be time-consuming.

```python
# Downloads pre-generated spectrogram features (made for microWakeWord in
# particular) for various negative datasets. This can be slow!

import os

output_dir = './negative_datasets'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    link_root = "https://huggingface.co/datasets/kahrendt/microwakeword/resolve/main/"
    filenames = ['dinner_party.zip', 'dinner_party_eval.zip', 'no_speech.zip', 'speech.zip']
    for fname in filenames:
        link = link_root + fname

        zip_path = f"negative_datasets/{fname}"
        !wget -O {zip_path} {link}
        !unzip -q {zip_path} -d {output_dir}
```

--------------------------------

### Training Configuration YAML

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Defines the training parameters for wake word models using a YAML file. It specifies feature set configurations, training hyperparameters, optimization metrics, and SpecAugment settings.

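
Several training settings are lists with one entry per training phase, for example `training_steps: [10000, 5000]` paired with `learning_rates: [0.001, 0.0001]`. The sketch below shows one way to read such lists: find the phase a given step falls in and pick the matching entry from every list. The `settings_at_step` helper is hypothetical, and how the trainer consumes these lists internally is an assumption.

```python
from bisect import bisect_right
from itertools import accumulate


def settings_at_step(step, training_steps, **per_phase_lists):
    """Return the per-phase values in effect at 1-based training `step`."""
    boundaries = list(accumulate(training_steps))  # e.g. [10000, 15000]
    if not 1 <= step <= boundaries[-1]:
        raise ValueError(f"step must be between 1 and {boundaries[-1]}")
    phase = bisect_right(boundaries, step - 1)  # index of the active phase
    return {name: values[phase] for name, values in per_phase_lists.items()}


schedule = {
    "learning_rates": [0.001, 0.0001],
    "time_mask_max_size": [5, 3],
    "negative_class_weight": [20, 20],
}
for step in (1, 10000, 10001, 15000):
    print(step, settings_at_step(step, [10000, 5000], **schedule))
```

Under this reading, every per-phase list needs as many entries as `training_steps`, which is why the single-phase notebook configuration uses one-element lists throughout.
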
```yaml
# training_parameters.yaml
window_step_ms: 10
train_dir: "trained_models/my_wake_word"

features:
  - features_dir: "features/positive_samples"
    sampling_weight: 2.0
    penalty_weight: 1.0
    truth: true
    truncation_strategy: "truncate_start"
    type: "mmap"
  - features_dir: "features/negative_speech"
    sampling_weight: 10.0
    penalty_weight: 1.0
    truth: false
    truncation_strategy: "random"
    type: "mmap"
  - features_dir: "features/ambient_noise"
    sampling_weight: 5.0
    penalty_weight: 1.0
    truth: false
    truncation_strategy: "random"
    type: "mmap"

# Training schedule (lists correspond to training phases)
training_steps: [10000, 5000]
learning_rates: [0.001, 0.0001]
positive_class_weight: [1, 1]
negative_class_weight: [20, 20]

# SpecAugment settings per phase
time_mask_max_size: [5, 3]
time_mask_count: [2, 1]
freq_mask_max_size: [5, 3]
freq_mask_count: [2, 1]

batch_size: 128
eval_step_interval: 500
clip_duration_ms: 1500

# Best model selection criteria
target_minimization: 0.9
minimization_metric: "ambient_false_positives_per_hour"
maximization_metric: "average_viable_recall"
```

--------------------------------

### Configure Training Parameters for Micro Wake Word

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This Python code snippet configures various training parameters for a micro wake word model, including class weights, learning rates, batch size, SpecAugment settings, evaluation intervals, clip duration, and model selection metrics. The configuration is then saved to a YAML file named 'training_parameters.yaml'.

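
The model selection settings (`target_minimization`, `minimization_metric`, `maximization_metric`) describe a two-stage rule: first push the minimization metric below the target, then prefer the highest maximization metric among evaluations that meet it. The following is a minimal sketch of that rule over a list of evaluation results. The `pick_best` helper and the sample numbers are hypothetical, and treating "below the target" as less than or equal is an assumption.

```python
def pick_best(evaluations, target_minimization, minimization_metric,
              maximization_metric):
    """Choose the evaluation whose weights would be kept as 'best'."""
    if minimization_metric is None:
        # Minimization disabled: only maximize.
        return max(evaluations, key=lambda e: e[maximization_metric])

    meeting_target = [
        e for e in evaluations if e[minimization_metric] <= target_minimization
    ]
    if meeting_target:
        return max(meeting_target, key=lambda e: e[maximization_metric])
    # Target not reached yet: keep the evaluation closest to it.
    return min(evaluations, key=lambda e: e[minimization_metric])


evaluations = [
    {"step": 500, "ambient_false_positives_per_hour": 3.2, "average_viable_recall": 0.95},
    {"step": 1000, "ambient_false_positives_per_hour": 0.8, "average_viable_recall": 0.81},
    {"step": 1500, "ambient_false_positives_per_hour": 0.6, "average_viable_recall": 0.88},
]
best = pick_best(evaluations, 0.9, "ambient_false_positives_per_hour",
                 "average_viable_recall")
print(best["step"])
```
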
```python
import os
import yaml

# Note: starting from an empty dict writes only the keys set below. Merge them
# with the feature-set settings (window_step_ms, train_dir, features,
# training_steps) if a single file is meant to hold the whole configuration.
config = {}

config["positive_class_weight"] = [1]
config["negative_class_weight"] = [20]

config["learning_rates"] = [
    0.001,
]  # Learning rates for Adam optimizer - list that corresponds to training steps
config["batch_size"] = 128

config["time_mask_max_size"] = [
    0
]  # SpecAugment - list that corresponds to training steps
config["time_mask_count"] = [0]  # SpecAugment - list that corresponds to training steps
config["freq_mask_max_size"] = [
    0
]  # SpecAugment - list that corresponds to training steps
config["freq_mask_count"] = [0]  # SpecAugment - list that corresponds to training steps

config["eval_step_interval"] = (
    500  # Evaluate on the validation sets after this many steps
)
config["clip_duration_ms"] = (
    1500  # Maximum length of wake word that the streaming model will accept
)

# The best model weights are chosen first by minimizing the specified minimization metric below the specified target_minimization
# Once the target has been met, it chooses the maximum of the maximization metric. Set 'minimization_metric' to None to only maximize
# Available metrics:
#   - "loss" - cross entropy error on validation set
#   - "accuracy" - accuracy of validation set
#   - "recall" - recall of validation set
#   - "precision" - precision of validation set
#   - "false_positive_rate" - false positive rate of validation set
#   - "false_negative_rate" - false negative rate of validation set
#   - "ambient_false_positives" - count of false positives from the split validation_ambient set
#   - "ambient_false_positives_per_hour" - estimated number of false positives per hour on the split validation_ambient set
config["target_minimization"] = 0.9
config["minimization_metric"] = None  # Set to None to disable

config["maximization_metric"] = "average_viable_recall"

with open("training_parameters.yaml", "w") as file:
    yaml.dump(config, file)
```

--------------------------------

### Generate Spectrograms with Sliding Window for Streaming Inference

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Configures `SpectrogramGeneration` to create spectrogram features from audio clips, incorporating augmentation. It utilizes a sliding window approach (`slide_frames`) to simulate streaming inference and generates spectrograms for training data in RaggedMmap format.

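
The sliding-window idea can be pictured as taking several views of one spectrogram, each shifted by a single frame, so the model sees the wake word at slightly different positions, as it would in a stream. The sketch below is one plausible reading of that idea and not the library's implementation: `sliding_views` is a hypothetical helper, and the exact way `slide_frames` trims or shifts frames in `SpectrogramGeneration` is an assumption.

```python
def sliding_views(spectrogram, slide_frames):
    """Return `slide_frames` equally long views of `spectrogram`, each
    starting one frame later than the previous one."""
    window = len(spectrogram) - slide_frames + 1
    if window <= 0:
        raise ValueError("spectrogram is shorter than slide_frames")
    return [spectrogram[offset:offset + window] for offset in range(slide_frames)]


# 20 frames, each labelled by its index so the shift is easy to see.
spectrogram = [[float(t)] * 40 for t in range(20)]
views = sliding_views(spectrogram, slide_frames=10)
print(len(views), len(views[0]))        # number of views, frames per view
print(views[0][0][0], views[1][0][0])   # first frame index of views 0 and 1
```
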
```python
from microwakeword.audio.spectrograms import SpectrogramGeneration
from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from mmap_ninja.ragged import RaggedMmap
import os

# Set up clips and augmentation
clips = Clips(
    input_directory="generated_samples",
    file_pattern="*.wav",
    random_split_seed=42,
    split_count=0.1
)

augmenter = Augmentation(
    augmentation_duration_s=3.2,
    impulse_paths=["mit_rirs"],
    background_paths=["audioset_16k"],
)

# Configure spectrogram generation with sliding window
# slide_frames simulates streaming inference by generating multiple
# spectrograms from the same audio shifted by one frame each
spectrogram_gen = SpectrogramGeneration(
    clips=clips,
    augmenter=augmenter,
    step_ms=10,                         # Window step size for features
    slide_frames=10,                    # Generate 10 shifted versions per clip
    split_spectrogram_duration_s=None,  # Optional: split into fixed chunks
)

# Get a single random spectrogram
spectrogram = spectrogram_gen.get_random_spectrogram()
print(f"Spectrogram shape: {spectrogram.shape}")  # (time_steps, 40)

# Generate and save spectrograms to RaggedMmap format for training
output_dir = "features/training/wakeword_mmap"
os.makedirs(output_dir, exist_ok=True)

RaggedMmap.from_generator(
    out_dir=output_dir,
    sample_generator=spectrogram_gen.spectrogram_generator(
        split="train",  # Use training split
        repeat=2        # Repeat clips twice with different augmentations
    ),
    batch_size=100,
    verbose=True,
)
```

--------------------------------

### Augment and Play a Random Audio Clip

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This snippet demonstrates how to augment a randomly selected audio clip using the configured augmenter and then play it back. It saves the augmented clip to a WAV file and uses `IPython.display.Audio` for playback within an environment like a Jupyter notebook.

```python
# Augment a random clip and play it back to verify it works well

from IPython.display import Audio
from microwakeword.audio.audio_utils import save_clip

random_clip = clips.get_random_clip()
augmented_clip = augmenter.augment_clip(random_clip)
save_clip(augmented_clip, 'augmented_clip.wav')

Audio("augmented_clip.wav", autoplay=True)
```

--------------------------------

### Compute Wake Word Detection Metrics

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Calculates and prints key performance metrics such as accuracy, recall, precision, and false positive rate for wake word detection models. It takes true positives, true negatives, false positives, and false negatives as input. The example also estimates false accepts per hour from streaming probabilities (simulated here with random values) and generates ROC curve coordinates.

```python
import numpy as np
from microwakeword.test import (
    compute_metrics,
    compute_false_accepts_per_hour,
    generate_roc_curve,
    tflite_streaming_model_roc
)
from microwakeword.inference import Model

# Compute classification metrics
metrics = compute_metrics(
    true_positives=950,
    true_negatives=9800,
    false_positives=200,
    false_negatives=50
)
print(f"Accuracy: {metrics['accuracy']:.4f}")
print(f"Recall: {metrics['recall']:.4f}")
print(f"Precision: {metrics['precision']:.4f}")
print(f"False Positive Rate: {metrics['false_positive_rate']:.4f}")

# Compute false accepts per hour from streaming probabilities
# streaming_probs_list: List of probability arrays from negative audio clips
streaming_probs_list = [
    np.random.random(1000),  # Simulated probabilities
    np.random.random(2000),
]
cutoffs = np.arange(0, 1.01, 0.01)

faph = compute_false_accepts_per_hour(
    streaming_probabilities_list=streaming_probs_list,
    cutoffs=cutoffs,
    ignore_slices_after_accept=75,  # Cooldown after detection
    stride=3,
    step_s=0.02,
)
print(f"False accepts/hour at 0.5 threshold: {faph[50]:.2f}")

# Generate ROC curve coordinates
false_rejections = np.linspace(0, 1, len(cutoffs))  # Example data
x_coords, y_coords, cutoffs_at_points = generate_roc_curve(
    false_accepts_per_hour=faph,
    false_rejections=false_rejections,
    cutoffs=cutoffs,
    max_faph=2.0,
)

# Area under curve
auc = np.trapz(y_coords, x_coords)
print(f"ROC AUC: {auc:.4f}")
```

--------------------------------

### Download Audio Data for Augmentation (Python)

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

Downloads Room Impulse Response (RIR) data from the MIT environmental impulse responses dataset and noise/background audio from the AudioSet dataset for data augmentation. The RIR data is saved as 16-bit PCM WAV files. Note that the downloaded data has mixed licenses, making models trained with it suitable only for non-commercial personal use.

```python
# Downloads audio data for augmentation. This can be slow!
# Borrowed from openWakeWord's automatic_model_training.ipynb, accessed March 4, 2024
#
# **Important note!** The data downloaded here has a mixture of different
# licenses and usage restrictions. As such, any custom models trained with this
# data should be considered as appropriate for **non-commercial** personal use only.

import datasets
import scipy.io.wavfile
import os

import numpy as np

from pathlib import Path
from tqdm import tqdm

## Download MIT RIR data

output_dir = "./mit_rirs"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    rir_dataset = datasets.load_dataset("davidscripka/MIT_environmental_impulse_responses", split="train", streaming=True)
    # Save clips to 16-bit PCM wav files
    for row in tqdm(rir_dataset):
        name = row['audio']['path'].split('/')[-1]
        scipy.io.wavfile.write(os.path.join(output_dir, name), 16000, (row['audio']['array']*32767).astype(np.int16))

## Download noise and background audio

# Audioset Dataset (https://research.google.com/audioset/dataset/index.html)
# Download one part of the audioset .tar files, extract, and convert to 16 kHz
# For full-scale training, it's recommended to download the entire dataset from
# https://huggingface.co/datasets/agkphysics/AudioSet, and
```

--------------------------------

### Audio Data Augmentation with Augmentation Class (Python)

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Illustrates the use of the `Augmentation` class to apply various transformations to audio data, enhancing model robustness and generalization. Supported augmentations include EQ, distortion, pitch shifting, and noise addition. This class is typically used in conjunction with the `Clips` class.

```python
from microwakeword.audio.augmentation import Augmentation
from microwakeword.audio.clips import Clips
from microwakeword.audio.audio_utils import save_clip

# Example usage (assuming clips and augmentation objects are initialized)
# augmentation = Augmentation(...)
# clips = Clips(...)

# for clip_audio in clips.audio_generator(split="train"):
#     augmented_clip = augmentation.augment_clip(clip_audio)
#     save_clip(augmented_clip, "augmented_output.wav")
```

--------------------------------

### Audio Utilities for Feature Generation and Processing

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Provides functions for generating spectrogram features from audio, saving audio clips, and removing silence using WebRTC VAD. It utilizes `scipy.io.wavfile` for audio loading and `microwakeword.audio.audio_utils` for processing.

```python
import numpy as np
from scipy.io import wavfile
from microwakeword.audio.audio_utils import (
    generate_features_for_clip,
    save_clip,
    remove_silence_webrtc
)

# Load audio file
sample_rate, audio_data = wavfile.read("wake_word_sample.wav")

# Generate spectrogram features using micro_speech preprocessor
# Returns 40-channel features compatible with TFLite Micro
spectrogram = generate_features_for_clip(
    audio_samples=audio_data,
    step_ms=20,  # Window step size (10 or 20 ms typical)
    use_c=True   # Use C implementation (faster)
)
print(f"Spectrogram shape: {spectrogram.shape}")  # (time_steps, 40)

# Remove silence from audio using WebRTC VAD
trimmed_audio = remove_silence_webrtc(
    audio_data=audio_data.astype(np.float32) / 32768.0,  # Normalize to float
    frame_duration=0.030,  # 30ms frames
    sample_rate=16000,
    min_start=2000,        # Keep at least first 2000 samples
)

# Save processed audio
save_clip(
    audio_samples=trimmed_audio,
    output_file="processed_wake_word.wav"
)
```

--------------------------------

### TFLite Model Inference for Wake Word Detection (Python)

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Demonstrates how to load a trained TensorFlow Lite wake word model and perform inference on an audio clip. It handles audio loading, feature extraction, and probability thresholding for wake word detection. Requires numpy, scipy, and the microwakeword library.

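
Probability thresholding on a stream usually needs a cooldown, otherwise one spoken wake word triggers on several consecutive windows. The sketch below shows that idea in plain Python. The `detect` helper is hypothetical; its `cooldown_windows` argument plays a role similar to the `ignore_slices_after_accept` parameter seen in the metrics example, but the exact semantics of that parameter in the library are an assumption.

```python
def detect(probabilities, threshold=0.5, cooldown_windows=75):
    """Return the window indices where a detection fires, ignoring the
    `cooldown_windows` windows that follow each accepted detection."""
    detections = []
    next_allowed = 0
    for index, probability in enumerate(probabilities):
        if index >= next_allowed and probability > threshold:
            detections.append(index)
            next_allowed = index + cooldown_windows + 1
    return detections


probabilities = [0.1, 0.9, 0.95, 0.2, 0.8]
print(detect(probabilities, threshold=0.5, cooldown_windows=2))  # → [1, 4]
```

Window 2 also exceeds the threshold but falls inside the cooldown started at window 1, so it is not reported a second time.
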
```python
import numpy as np
from scipy.io import wavfile
from microwakeword.inference import Model

# Load a trained TFLite wake word model
model = Model(
    tflite_model_path="trained_models/wakeword/tflite_stream_state_internal_quant/stream_state_internal_quant.tflite",
    stride=3  # Match the stride used during training
)

# Load audio file (must be 16kHz, 16-bit PCM)
sample_rate, audio_data = wavfile.read("test_audio.wav")
assert sample_rate == 16000, "Audio must be 16kHz"

# Run inference on the audio clip
# Returns a list of probabilities for each inference window
probabilities = model.predict_clip(audio_data, step_ms=20)

# Check if wake word was detected (probability > threshold)
threshold = 0.5
for i, prob in enumerate(probabilities):
    if prob > threshold:
        print(f"Wake word detected at window {i} with probability {prob:.3f}")

# Alternative: Run inference directly on a spectrogram
from microwakeword.audio.audio_utils import generate_features_for_clip

spectrogram = generate_features_for_clip(audio_data, step_ms=20)
probabilities = model.predict_spectrogram(spectrogram)
```

--------------------------------

### Manage Training Data with FeatureHandler

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Initializes the `FeatureHandler` class for managing spectrogram features during model training. It configures multiple feature sets (positive wake word, negative speech, ambient) with different sampling weights and truncation strategies, supporting both mmap files and on-the-fly generation.
```python
from microwakeword.data import FeatureHandler

# Configuration for feature loading
config = {
    "stride": 3,
    "window_step_ms": 10,
    "batch_size": 128,
    "spectrogram_length": 150,  # Number of time frames
    "features": [
        {
            "type": "mmap",
            "features_dir": "features/positive",
            "truth": True,  # Positive wake word samples
            "sampling_weight": 2.0,  # Higher weight = sampled more often
            "penalty_weight": 1.0,  # Weight for incorrect predictions
            "truncation_strategy": "truncate_start",
        },
        {
            "type": "mmap",
            "features_dir": "features/negative_speech",
            "truth": False,  # Negative samples (not wake word)
            "sampling_weight": 10.0,
            "penalty_weight": 1.0,
            "truncation_strategy": "random",
        },
        {
            "type": "mmap",
            "features_dir": "features/ambient",
            "truth": False,
            "sampling_weight": 0.0,  # Only used for validation/testing
            "penalty_weight": 1.0,
            "truncation_strategy": "split",  # Split long clips for ambient
        },
    ],
}

# Example of initializing FeatureHandler (actual usage would involve training loop)
# feature_handler = FeatureHandler(config=config)
```

--------------------------------

### Save Model Summary to Text File (Python)

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Saves a summary of a trained streaming model to a specified text file. This function is useful for documenting model architecture and parameters during the training process.

```python
from microwakeword.utils import save_model_summary

# `streaming_model` is assumed to be the model returned by `convert_model_saved`
save_model_summary(
    model=streaming_model,
    path="trained_models/my_wake_word/stream_state_internal",
    file_name="model_summary.txt"
)
```

--------------------------------

### Save Augmented Training, Validation, and Test Sets

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This code segment outlines the process of augmenting samples and saving them into distinct training, validation, and testing sets. It highlights the importance of using real samples, or samples generated a different way, for validation and testing so that benchmarks reflect real-world performance.
```python
# Augment samples and save the training, validation, and testing sets.
# Validating and testing samples generated the same way can make the model
# benchmark better than it performs in real-world use. Use real samples or TTS
# samples generated with a different TTS engine to potentially get more accurate
```

--------------------------------

### Convert Keras Models to TFLite with TensorFlow

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Provides utilities to convert trained Keras models into streaming TFLite format, with options for quantization. It supports saving both streaming and non-streaming inference versions and includes steps for feature handling and quantization calibration.

```python
import tensorflow as tf
from microwakeword.utils import (
    convert_model_saved,
    convert_saved_model_to_tflite,
    to_streaming_inference,
    save_model_summary
)
from microwakeword.layers.modes import Modes
from microwakeword.data import FeatureHandler

# Configuration
config = {
    "train_dir": "trained_models/my_wake_word",
    "stride": 3,
    "spectrogram_length": 150,
    "batch_size": 1,
}

# Assume 'model' is a trained Keras model
# Convert to streaming SavedModel format
streaming_model = convert_model_saved(
    model=model,
    config=config,
    folder="stream_state_internal",
    mode=Modes.STREAM_INTERNAL_STATE_INFERENCE,
)

# Also save non-streaming version
nonstreaming_model = convert_model_saved(
    model=model,
    config=config,
    folder="non_stream",
    mode=Modes.NON_STREAM_INFERENCE,
)

# Initialize feature handler for quantization calibration
feature_handler = FeatureHandler(config)

# Convert to quantized TFLite
convert_saved_model_to_tflite(
    config=config,
    audio_processor=feature_handler,
    path_to_model="trained_models/my_wake_word/stream_state_internal",
    folder="trained_models/my_wake_word/tflite_quant",
    fname="wake_word_quantized.tflite",
    quantize=True,  # Full int8 quantization
)

# Convert to non-quantized TFLite
convert_saved_model_to_tflite(
    config=config,
    audio_processor=feature_handler,
    path_to_model="trained_models/my_wake_word/stream_state_internal",
    folder="trained_models/my_wake_word/tflite",
    fname="wake_word.tflite",
    quantize=False,
)
```

--------------------------------

### Generate Augmented Spectrogram Features for Micro Wake Word

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This script generates augmented spectrogram features for training, validation, and testing of the micro wake word model. It creates the output directories and uses `SpectrogramGeneration` to produce spectrograms, which are then saved using `RaggedMmap`. The `slide_frames` parameter is set to 10 for the training and validation splits to simulate streaming inference, and to 1 for the testing split.

```python
import os
from mmap_ninja.ragged import RaggedMmap

# Assumes `clips`, `augmenter`, and `SpectrogramGeneration` were defined in earlier notebook cells.

output_dir = 'generated_augmented_features'

if not os.path.exists(output_dir):
    os.mkdir(output_dir)

splits = ["training", "validation", "testing"]
for split in splits:
    out_dir = os.path.join(output_dir, split)
    if not os.path.exists(out_dir):
        os.mkdir(out_dir)

    split_name = "train"
    repetition = 2

    spectrograms = SpectrogramGeneration(clips=clips,
                                         augmenter=augmenter,
                                         slide_frames=10,  # Uses the same spectrogram repeatedly, just shifted over by one frame. This simulates the streaming inferences while training/validating in nonstreaming mode.
                                         step_ms=10,
                                         )
    if split == "validation":
        split_name = "validation"
        repetition = 1
    elif split == "testing":
        split_name = "test"
        repetition = 1
        spectrograms = SpectrogramGeneration(clips=clips,
                                             augmenter=augmenter,
                                             slide_frames=1,  # The testing set uses the streaming version of the model, so no artificial repetition is necessary
                                             step_ms=10,
                                             )

    RaggedMmap.from_generator(
        out_dir=os.path.join(out_dir, 'wakeword_mmap'),
        sample_generator=spectrograms.spectrogram_generator(split=split_name, repeat=repetition),
        batch_size=100,
        verbose=True,
    )
```

--------------------------------

### Download Trained TFLite Micro Wake Word Model

Source: https://github.com/ohf-voice/micro-wake-word/blob/main/notebooks/basic_training_notebook.ipynb

This Python code snippet uses the `google.colab.files` module to download the trained TFLite streaming model file. The downloaded model can be used on-device, and further instructions for creating a Model JSON file and adjusting probability thresholds are provided in the accompanying documentation.

```python
# Downloads the tflite model file. To use on the device, you need to write a
# Model JSON file. See https://esphome.io/components/micro_wake_word for the
# documentation and
# https://github.com/esphome/micro-wake-word-models/tree/main/models/v2 for
# examples. Adjust the probability threshold based on the test results obtained
# after training is finished. You may also need to increase the Tensor arena
# model size if the model fails to load.

from google.colab import files

files.download("trained_models/wakeword/tflite_stream_state_internal_quant/stream_state_internal_quant.tflite")
```

--------------------------------

### Define MixedNet Model Architecture with TensorFlow

Source: https://context7.com/ohf-voice/micro-wake-word/llms.txt

Constructs the MixedNet model architecture using TensorFlow, supporting configurable layers, residual connections, and spatial attention.
It parses command-line arguments for model configuration and reports how many spectrogram slices are dropped due to valid padding.

```python
import tensorflow as tf
from microwakeword.mixednet import model, model_parameters, spectrogram_slices_dropped
import argparse

# Create argument parser with model parameters
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="model_name")
parser_mixednet = subparsers.add_parser("mixednet")
model_parameters(parser_mixednet)

# Parse model configuration
flags = parser.parse_args([
    "mixednet",
    "--pointwise_filters", "64,64,64,64",
    "--repeat_in_block", "1,1,1,1",
    "--mixconv_kernel_sizes", "[5], [7,11], [9,15], [23]",
    "--residual_connection", "0,0,0,0",
    "--first_conv_filters", "32",
    "--first_conv_kernel_size", "5",
    "--stride", "3",
    "--pooled", "0",
    "--max_pool", "0",
    "--spatial_attention", "0",
])

# Calculate how many spectrogram slices the model drops
slices_dropped = spectrogram_slices_dropped(flags)
print(f"Spectrogram slices dropped due to valid padding: {slices_dropped}")

# Build the model
input_shape = (150, 40)  # (time_steps, features)
batch_size = 128

wake_word_model = model(
    flags=flags,
    shape=input_shape,
    batch_size=batch_size
)
wake_word_model.summary()

# Model outputs probability between 0 and 1
# Input: (batch, time, features)
# Output: (batch, 1) - wake word probability
```
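
Because the model emits one probability per inference window, a single noisy window can cross a fixed threshold. The following is a minimal sketch of one possible post-processing step, assuming a moving average over the most recent windows is acceptable for your use case. The function name `detect_wake_word`, the `window_size` parameter, and the example probabilities are hypothetical and are not part of the microWakeWord API.

```python
from collections import deque
from typing import Iterable, List


def detect_wake_word(
    probabilities: Iterable[float],
    threshold: float = 0.5,
    window_size: int = 5,
) -> List[int]:
    """Return window indices where the moving-average probability exceeds the threshold.

    `window_size` and the averaging rule are illustrative assumptions, not
    values taken from the microWakeWord library.
    """
    recent = deque(maxlen=window_size)  # keeps only the latest `window_size` values
    detections = []
    for index, probability in enumerate(probabilities):
        recent.append(float(probability))
        # Evaluate only once the window is full, so one early spike cannot trigger.
        if len(recent) == window_size and sum(recent) / window_size > threshold:
            detections.append(index)
    return detections


# Made-up probabilities: an isolated spike at index 2, then a sustained run.
example = [0.1, 0.0, 0.9, 0.1, 0.0, 0.7, 0.9, 0.95, 0.9, 0.8, 0.2]
print(detect_wake_word(example, threshold=0.5, window_size=3))
```

In this sketch the isolated spike is averaged away, while the sustained run of high probabilities is reported. A larger `window_size` trades detection latency for fewer false triggers, so both values should be tuned against the test results obtained after training.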