Source: https://github.com/xiph/rnnoise
# RNNoise: Neural Network Noise Suppression Library

RNNoise is a real-time noise suppression library based on a recurrent neural network and designed for full-band speech enhancement. It uses a hybrid DSP/deep learning approach to denoise audio, operating on 16-bit PCM mono audio sampled at 48 kHz. The project provides both a C library for production use and Python-based training tools for creating custom models.

The project includes a complete training pipeline that produces noise suppression models with PyTorch or Keras, with support for model quantization and export to optimized C code. Models can be embedded in the library at compile time or loaded dynamically at runtime. The library is optimized for real-time performance with AVX2 and SSE4.1 support, making it suitable for low-latency applications such as VoIP, podcasting, and live audio processing.

## API Reference

### Initialize DenoiseState with Default Model

Creates a new denoising state using the built-in model. The state maintains internal buffers for frame processing and must be destroyed after use.

```c
#include <stdio.h>
#include "rnnoise.h"

int main(void) {
    // Create denoise state with the default (built-in) model
    DenoiseState *st = rnnoise_create(NULL);
    if (st == NULL) {
        fprintf(stderr, "Failed to create denoise state\n");
        return 1;
    }

    // Process audio frames here...

    // Clean up
    rnnoise_destroy(st);
    return 0;
}
```

### Process Audio Frame

Processes a single frame of audio (480 samples at 48 kHz = 10 ms). The input and output buffers must each hold exactly 480 `float` samples, kept in the 16-bit range (-32768 to 32767) rather than normalized to [-1, 1]. Note the argument order: the output buffer comes first, as in `rnnoise_process_frame(st, out, in)`. The return value is the voice activity detection (VAD) probability for the frame, between 0 and 1.
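Because RNNoise works on `float` samples that keep the scale of the original 16-bit PCM (the conversion loops in the example that follows copy values without rescaling), the conversion on either side of `rnnoise_process_frame()` is a plain widening cast on the way in and a clamp on the way out. A minimal sketch of two such helpers; `pcm16_to_float` and `float_to_pcm16` are hypothetical names, not part of the RNNoise API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: widen 16-bit PCM to float without rescaling, since
   RNNoise expects values in the same -32768..32767 range. */
static void pcm16_to_float(const short *in, float *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i];
}

/* Hypothetical helper: clamp to the 16-bit range, then round to nearest.
   Converting an out-of-range float straight to short is undefined behavior
   in C, and a plain cast would truncate toward zero. */
static void float_to_pcm16(const float *in, short *out, size_t n) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        float v = in[i];
        if (v > 32767.0f) v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        out[i] = (short)(v >= 0.0f ? v + 0.5f : v - 0.5f);
    }
}
```

Clamping matters because denoised output can occasionally exceed the input's peak level; without it, a loud sample would wrap or trigger undefined behavior instead of saturating.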
```c
#include <stdio.h>
#include "rnnoise.h"

#define FRAME_SIZE 480

int main(void) {
    FILE *f_input, *f_output;
    DenoiseState *st;
    float x[FRAME_SIZE];
    short tmp[FRAME_SIZE];
    int i;

    st = rnnoise_create(NULL);
    f_input = fopen("noisy_speech.raw", "rb");
    f_output = fopen("denoised_speech.raw", "wb");
    if (st == NULL || f_input == NULL || f_output == NULL) {
        fprintf(stderr, "Initialization failed\n");
        return 1;
    }

    while (1) {
        // Read one frame of 16-bit PCM samples; stop at EOF or a partial frame
        if (fread(tmp, sizeof(short), FRAME_SIZE, f_input) != FRAME_SIZE) break;

        // Convert to float (no scaling: RNNoise expects the 16-bit range)
        for (i = 0; i < FRAME_SIZE; i++) x[i] = tmp[i];

        // Process frame (in-place processing supported: out == in)
        float vad_prob = rnnoise_process_frame(st, x, x);
        (void)vad_prob; // VAD probability (0..1), unused in this example

        // Clamp and convert back to 16-bit
        for (i = 0; i < FRAME_SIZE; i++) {
            float v = x[i];
            if (v > 32767.0f) v = 32767.0f;
            if (v < -32768.0f) v = -32768.0f;
            tmp[i] = (short)v;
        }
        fwrite(tmp, sizeof(short), FRAME_SIZE, f_output);
    }

    rnnoise_destroy(st);
    fclose(f_input);
    fclose(f_output);
    return 0;
}
```

### Load Custom Model from File

Loads a custom trained model from a binary weights file. The model object must remain valid throughout the lifetime of any DenoiseState using it.

```c
#include <stdio.h>
#include "rnnoise.h"

int main(void) {
    RNNModel *model;
    DenoiseState *st;

    // Load custom model from file
    model = rnnoise_model_from_filename("weights_blob.bin");
    if (model == NULL) {
        fprintf(stderr, "Failed to load model\n");
        return 1;
    }

    // Create denoise state with custom model
    st = rnnoise_create(model);
    if (st == NULL) {
        fprintf(stderr, "Failed to create denoise state\n");
        rnnoise_model_free(model);
        return 1;
    }

    // Process audio frames...

    // Clean up in the correct order: state first, then model
    rnnoise_destroy(st);
    rnnoise_model_free(model);
    return 0;
}
```

### Load Model from Memory Buffer

Loads a model from a memory buffer owned by the application. This is useful for embedded systems or when models are stored in application resources.
```c
#include <stdio.h>
#include <stdlib.h>
#include "rnnoise.h"

int main(void) {
    FILE *f;
    long file_size;
    unsigned char *buffer;
    RNNModel *model;
    DenoiseState *st;

    // Read the entire model file into memory
    f = fopen("weights_blob.bin", "rb");
    if (f == NULL) {
        fprintf(stderr, "Failed to open weights_blob.bin\n");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    file_size = ftell(f);
    fseek(f, 0, SEEK_SET);
    buffer = file_size > 0 ? malloc(file_size) : NULL;
    if (buffer == NULL || fread(buffer, 1, file_size, f) != (size_t)file_size) {
        fprintf(stderr, "Failed to read model file\n");
        fclose(f);
        free(buffer);
        return 1;
    }
    fclose(f);

    // Load model from buffer (the buffer must stay valid until the model is freed)
    model = rnnoise_model_from_buffer(buffer, (int)file_size);
    st = model ? rnnoise_create(model) : NULL;
    if (st == NULL) {
        fprintf(stderr, "Failed to load model\n");
        if (model) rnnoise_model_free(model);
        free(buffer);
        return 1;
    }

    // Process audio...

    rnnoise_destroy(st);
    rnnoise_model_free(model);
    free(buffer);
    return 0;
}
```

### Get Frame Size

Retrieves the frame size required by the library. It always returns 480 samples (10 ms at 48 kHz).

```c
#include <stdio.h>
#include <stdlib.h>
#include "rnnoise.h"

int main(void) {
    int frame_size = rnnoise_get_frame_size();
    printf("RNNoise frame size: %d samples\n", frame_size);
    printf("At 48 kHz, this is %.1f ms\n", (float)frame_size / 48000.0f * 1000.0f);

    // Allocate buffers with the correct size
    float *input_buffer = malloc(frame_size * sizeof(float));
    float *output_buffer = malloc(frame_size * sizeof(float));
    if (input_buffer == NULL || output_buffer == NULL) {
        fprintf(stderr, "Memory allocation failed\n");
        free(input_buffer);
        free(output_buffer);
        return 1;
    }

    // Use buffers...

    free(input_buffer);
    free(output_buffer);
    return 0;
}
```

### Generate Training Features

Creates training feature files from clean speech and noise samples. This is the first step in training a custom model.

```bash
# Build the feature dumping tool
./autogen.sh
./configure
make

# Generate training features with RIR augmentation
./dump_features -rir_list rir_list.txt \
    clean_speech.pcm \
    background_noise.pcm \
    foreground_noise.pcm \
    training_features.f32 \
    200000

# Output: training_features.f32 containing 200,000 sequences
# Each frame is stored as 98 float32 values:
# [65 input features, 32 gain targets, 1 VAD target]
```

### Train Custom Model with PyTorch

Trains a new RNNoise model from the generated features. The training script accepts options for the network size (`--cond-size`, `--gru-size`) and for the training hyperparameters.
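The training notes recommend about 75,000 weight updates in total, while `--epochs` counts passes over the feature file, so the right epoch count depends on how much data was generated. A back-of-the-envelope sketch, assuming (not confirmed by this document) that the script cuts the feature file into whole sequences of `--sequence-length` frames, performs one weight update per batch of `--batch-size` sequences, and drops a trailing partial batch; the frame layout of 98 float32 values is the one listed for the feature file. All helper names are hypothetical:

```c
#include <assert.h>

#define FLOATS_PER_FRAME 98   /* 65 input features + 32 gain targets + 1 VAD target */
#define BYTES_PER_VALUE 4     /* float32 */

/* Number of frames stored in a feature file of the given size in bytes. */
static long long feature_frames(long long file_bytes) {
    assert(file_bytes >= 0);
    return file_bytes / (FLOATS_PER_FRAME * BYTES_PER_VALUE);
}

/* Assumed weight updates per epoch: whole sequences of seq_len frames,
   grouped into batches, with a trailing partial batch dropped. */
static long long updates_per_epoch(long long frames, long long seq_len, long long batch) {
    assert(seq_len > 0 && batch > 0);
    return (frames / seq_len) / batch;
}

/* Smallest number of epochs whose total updates reach target_updates. */
static long long epochs_for_updates(long long target_updates, long long per_epoch) {
    assert(per_epoch > 0);
    return (target_updates + per_epoch - 1) / per_epoch;
}
```

For a hypothetical 20,000,000-frame feature file with `--sequence-length 2000` and `--batch-size 128`, this gives 10,000 sequences, 78 updates per epoch, and 962 epochs to reach 75,000 updates.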
```bash
# Train with custom parameters (the training scripts live in torch/rnnoise/)
python3 torch/rnnoise/train_rnnoise.py \
    training_features.f32 \
    output_models/ \
    --batch-size 128 \
    --lr 0.001 \
    --epochs 200 \
    --sequence-length 2000 \
    --cond-size 128 \
    --gru-size 384 \
    --gamma 0.25

# Training produces one checkpoint per epoch:
#   output_models/checkpoints/rnnoise_1.pth
#   output_models/checkpoints/rnnoise_2.pth
#   ...
#   output_models/checkpoints/rnnoise_200.pth
# Choose the epoch with the best validation performance;
# typically aim for about 75,000 weight updates in total.
```

### Export Trained Model to C Code

Converts a trained PyTorch checkpoint into optimized C source files, with optional quantization to reduce model size.

```bash
# Export with quantization
python3 torch/rnnoise/dump_rnnoise_weights.py \
    --quantize \
    output_models/checkpoints/rnnoise_50.pth \
    exported_c_code/

# Generated files:
#   exported_c_code/rnnoise_data.c
#   exported_c_code/rnnoise_data.h

# Copy to src/ directory
cp exported_c_code/rnnoise_data.c src/
cp exported_c_code/rnnoise_data.h src/

# Rebuild library with new model
make clean
make
make install

# Without quantization (larger but potentially more accurate)
python3 torch/rnnoise/dump_rnnoise_weights.py \
    output_models/checkpoints/rnnoise_50.pth \
    exported_c_code/
```

### Build Library with Custom Compilation Flags

Compiles the library with optimizations for specific CPU architectures. AVX2 support significantly improves performance.
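Before relying on AVX2 (for example when building with `-march=native` or shipping a binary to other machines), it can help to check what the target CPU actually supports. A minimal sketch using the GCC/Clang built-ins `__builtin_cpu_init()` and `__builtin_cpu_supports()`; this is a generic compiler feature, not part of RNNoise, and the helper names are hypothetical. On non-x86 targets or other compilers the helpers simply report 0:

```c
#include <assert.h>
#include <stdio.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_CPU_BUILTINS 1
#endif

/* Returns 1 if the running CPU supports AVX2, 0 otherwise (or if unknown). */
static int cpu_has_avx2(void) {
#ifdef HAVE_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Returns 1 if the running CPU supports SSE4.1, 0 otherwise (or if unknown). */
static int cpu_has_sse41(void) {
#ifdef HAVE_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") ? 1 : 0;
#else
    return 0;
#endif
}

/* Print a one-line summary of the SIMD levels mentioned for RNNoise. */
static void print_simd_support(void) {
    printf("SSE4.1: %s, AVX2: %s\n",
           cpu_has_sse41() ? "yes" : "no",
           cpu_has_avx2() ? "yes" : "no");
}
```

`__builtin_cpu_supports()` only accepts a string literal, which is why each feature gets its own small helper.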
```bash
# Clone and prepare
git clone https://gitlab.xiph.org/xiph/rnnoise.git
cd rnnoise

# Build with AVX2 support (recommended for modern CPUs).
# Note: -march=native targets the build machine's CPU, so the
# resulting binary may not run on older processors.
./autogen.sh
./configure --enable-x86-rtcd CFLAGS="-O3 -march=native"
make
sudo make install

# Build without default model (for runtime loading only)
./configure CFLAGS="-O3 -march=native -DUSE_WEIGHTS_FILE"
make

# Build examples
./configure --enable-examples
make
./examples/rnnoise_demo input.raw output.raw
# Input and output format: 16-bit PCM, mono, 48 kHz
```

### Export Model as Binary Blob

Generates a binary model file, loadable at runtime, from a build whose embedded weights are your custom model.

```bash
# Build with your custom model
cd src
cp my_rnnoise_data.c rnnoise_data.c
cp my_rnnoise_data.h rnnoise_data.h
cd ..
make

# Export model to binary format
./examples/dump_weights_blob
# Generated: weights_blob.bin
```

The resulting file can be loaded at runtime with `rnnoise_model_from_filename()` (by path) or `rnnoise_model_from_file()` (from an open `FILE *`). With `rnnoise_model_from_file()`, the file must stay open until the model is freed:

```c
#include <stdio.h>
#include "rnnoise.h"

int main(void) {
    FILE *model_file = fopen("weights_blob.bin", "rb");
    if (model_file == NULL) return 1;
    RNNModel *model = rnnoise_model_from_file(model_file);
    DenoiseState *st = model ? rnnoise_create(model) : NULL;
    int ok = (st != NULL);
    if (ok) {
        // Process audio...
        rnnoise_destroy(st);
    }
    if (model != NULL) rnnoise_model_free(model);
    fclose(model_file); // Close only after freeing the model
    return ok ? 0 : 1;
}
```

### Command-Line Demo Tool

Processes raw 16-bit PCM audio files through the noise suppression pipeline. The tool operates on raw audio without WAV headers.
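Since the raw files carry no header, their length in samples, frames, and seconds follows directly from the file size (2 bytes per sample, 480 samples per frame, 48,000 samples per second). A small sketch of that arithmetic; the helper names are hypothetical:

```c
#include <assert.h>

#define SAMPLE_RATE 48000     /* samples per second */
#define FRAME_SIZE 480        /* samples per RNNoise frame (10 ms) */
#define BYTES_PER_SAMPLE 2    /* 16-bit mono PCM */

/* Complete 480-sample frames in a raw s16 mono file of the given size. */
static long long raw_full_frames(long long file_bytes) {
    assert(file_bytes >= 0);
    return file_bytes / (BYTES_PER_SAMPLE * FRAME_SIZE);
}

/* Trailing samples that do not fill a whole frame. */
static long long raw_leftover_samples(long long file_bytes) {
    assert(file_bytes >= 0);
    return (file_bytes / BYTES_PER_SAMPLE) % FRAME_SIZE;
}

/* Duration of the audio in seconds. */
static double raw_duration_seconds(long long file_bytes) {
    assert(file_bytes >= 0);
    return (double)(file_bytes / BYTES_PER_SAMPLE) / SAMPLE_RATE;
}
```

For example, a 96,100-byte file holds 100 full frames (about 1.0 s of audio) plus 50 trailing samples, which a loop that stops on a short `fread()`, like the one in the Process Audio Frame example, never processes.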
```bash
# Prepare input audio (must be 16-bit PCM, mono, 48 kHz, raw format)
# Convert from WAV using ffmpeg or sox:
ffmpeg -i noisy_input.wav -f s16le -acodec pcm_s16le -ar 48000 -ac 1 noisy_input.raw

# Run noise suppression
./examples/rnnoise_demo noisy_input.raw denoised_output.raw

# Convert back to WAV for playback
ffmpeg -f s16le -ar 48000 -ac 1 -i denoised_output.raw denoised_output.wav

# Note: input and output are RAW format, not WAV
# Each sample is a 16-bit signed integer in machine byte order
# (s16le matches little-endian machines such as x86 and typical ARM)
# No headers or metadata are included
```

### Pre-allocated State Initialization

Uses a pre-allocated memory buffer for the denoise state instead of letting the library allocate it. This is useful for embedded systems.

```c
#include <stdio.h>
#include <stdlib.h>
#include "rnnoise.h"

int main(void) {
    int state_size;
    void *state_memory;
    DenoiseState *st;

    // Get required memory size
    state_size = rnnoise_get_size();
    printf("DenoiseState requires %d bytes\n", state_size);

    // Allocate memory
    state_memory = malloc(state_size);
    if (state_memory == NULL) {
        fprintf(stderr, "Memory allocation failed\n");
        return 1;
    }

    // Initialize the pre-allocated state with the default model
    st = (DenoiseState *)state_memory;
    if (rnnoise_init(st, NULL) != 0) {
        fprintf(stderr, "Initialization failed\n");
        free(state_memory);
        return 1;
    }

    // Process audio frames...

    // No rnnoise_destroy() needed: the caller owns this memory
    free(state_memory);
    return 0;
}
```

## Usage Summary

RNNoise is primarily used for real-time noise suppression in applications that need low-latency speech enhancement. Common use cases include VoIP clients, podcasting software, live streaming applications, voice assistants, and hearing aid signal processing. The library processes audio in 10 ms frames (480 samples at 48 kHz), which keeps the processing granularity small enough for real-time use. The neural network uses a GRU-based architecture that balances quality and computational efficiency.

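Real-time audio callbacks rarely deliver exactly 480 samples at a time, so live integrations usually buffer incoming audio until a full 10 ms frame is available. A minimal sketch of such an accumulator, assuming 16-bit mono input at 48 kHz; `FrameAccum`, `frame_accum_init`, and `frame_accum_feed` are hypothetical names, not part of RNNoise:

```c
#include <assert.h>
#include <string.h>

#define FRAME_SIZE 480 /* RNNoise frame: 10 ms at 48 kHz */

/* Hypothetical helper: collects arbitrary-sized chunks of 16-bit samples
   into complete float frames ready for rnnoise_process_frame(). */
typedef struct {
    float buf[FRAME_SIZE];
    int fill; /* samples currently stored in buf */
} FrameAccum;

static void frame_accum_init(FrameAccum *a) {
    memset(a, 0, sizeof *a);
}

/* Copies samples from in[0..n) until the input runs out or the frame is
   full. Returns the number of samples consumed and sets *ready to 1 when
   buf holds a complete frame; the next call then starts a new frame. */
static int frame_accum_feed(FrameAccum *a, const short *in, int n, int *ready) {
    assert(a != NULL && in != NULL && n >= 0 && ready != NULL);
    if (a->fill == FRAME_SIZE)
        a->fill = 0; /* previous frame has been handed out */
    int take = FRAME_SIZE - a->fill;
    if (take > n)
        take = n;
    for (int i = 0; i < take; i++)
        a->buf[a->fill + i] = (float)in[i]; /* keep the 16-bit scale */
    a->fill += take;
    *ready = (a->fill == FRAME_SIZE);
    return take;
}
```

Inside a callback, call `frame_accum_feed()` repeatedly, advancing the input pointer by the returned count, and pass `acc.buf` to `rnnoise_process_frame()` (in place) whenever `ready` is set. Output then lags the input by up to one frame, which is the price of the fixed frame size.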
Integration typically involves reading audio samples, converting to float format, calling `rnnoise_process_frame()` for each 480-sample frame, and writing the denoised output. The library supports both embedded models compiled into the binary and external models loaded at runtime, allowing flexibility in deployment scenarios. Custom models can be trained on domain-specific datasets using the provided Python training scripts, enabling optimization for particular acoustic environments or speaker characteristics. The model export pipeline handles quantization and optimization, producing efficient C code suitable for embedded systems or resource-constrained environments.