# RNNoise: Neural Network Noise Suppression Library

Repository: https://github.com/xiph/rnnoise

RNNoise is a real-time noise suppression library based on a recurrent neural network designed for full-band speech enhancement. The library uses a hybrid DSP/deep learning approach to denoise mono audio sampled at 48 kHz. Samples are passed to the API as floats on the 16-bit PCM scale, and the bundled demo reads and writes raw 16-bit PCM. It provides both a C library for production use and Python-based training tools for creating custom models.

The project includes a complete training pipeline that generates noise-suppressed speech models using PyTorch or Keras, with support for model quantization and export to optimized C code. Models can be embedded in the library at compile time or loaded dynamically at runtime. The library is optimized for real-time performance with AVX2 and SSE4.1 support, making it suitable for low-latency applications like VoIP, podcasting, and live audio processing.

## API Reference

### Initialize DenoiseState with Default Model

Creates a new denoising state. Passing `NULL` selects the built-in model. The state holds internal buffers for frame processing and must be released with `rnnoise_destroy()` after use.

```c
#include <stdio.h>
#include "rnnoise.h"

int main() {
    // Create denoise state with default model
    DenoiseState *st = rnnoise_create(NULL);
    if (st == NULL) {
        fprintf(stderr, "Failed to create denoise state\n");
        return 1;
    }

    // Process audio frames here...

    // Clean up
    rnnoise_destroy(st);
    return 0;
}
```

### Process Audio Frame

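Processes a single frame of audio (480 samples at 48 kHz = 10 ms). Input and output buffers must each hold exactly 480 float samples on the 16-bit scale (not normalized to [-1, 1]), and they may point to the same buffer. Returns the voice activity detection (VAD) probability for the frame, between 0 and 1.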

```c
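#include <stdio.h>
#include "rnnoise.h"

#define FRAME_SIZE 480

// Clamp a float sample to the 16-bit range before converting; casting an
// out-of-range float directly to short is undefined behavior in C.
static short to_pcm16(float v) {
    if (v > 32767.0f) return 32767;
    if (v < -32768.0f) return -32768;
    return (short)v;
}

int main(void) {
    FILE *f_input, *f_output;
    DenoiseState *st;
    float x[FRAME_SIZE];
    short tmp[FRAME_SIZE];
    int i;

    f_input = fopen("noisy_speech.raw", "rb");
    f_output = fopen("denoised_speech.raw", "wb");
    if (f_input == NULL || f_output == NULL) {
        fprintf(stderr, "Failed to open input or output file\n");
        return 1;
    }

    st = rnnoise_create(NULL);
    if (st == NULL) {
        fprintf(stderr, "Failed to create denoise state\n");
        return 1;
    }

    // Read 16-bit PCM samples; a trailing partial frame is dropped
    while (fread(tmp, sizeof(short), FRAME_SIZE, f_input) == FRAME_SIZE) {
        // Convert to float, keeping the 16-bit scale (no normalization)
        for (i = 0; i < FRAME_SIZE; i++)
            x[i] = tmp[i];

        // Process frame (in-place processing supported); returns VAD probability
        float vad_prob = rnnoise_process_frame(st, x, x);
        (void)vad_prob;  // e.g. use it to gate transmission during silence

        // Convert back to 16-bit with clamping
        for (i = 0; i < FRAME_SIZE; i++)
            tmp[i] = to_pcm16(x[i]);

        fwrite(tmp, sizeof(short), FRAME_SIZE, f_output);
    }

    rnnoise_destroy(st);
    fclose(f_input);
    fclose(f_output);
    return 0;
}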
```

### Load Custom Model from File

Loads a custom trained model from a binary weights file. The model object must remain valid throughout the lifetime of any DenoiseState using it.

```c
#include <stdio.h>
#include "rnnoise.h"

int main() {
    RNNModel *model;
    DenoiseState *st;

    // Load custom model from file
    model = rnnoise_model_from_filename("weights_blob.bin");
    if (model == NULL) {
        fprintf(stderr, "Failed to load model\n");
        return 1;
    }

    // Create denoise state with custom model
    st = rnnoise_create(model);
    if (st == NULL) {
        fprintf(stderr, "Failed to create denoise state\n");
        rnnoise_model_free(model);
        return 1;
    }

    // Process audio frames...

    // Clean up in correct order
    rnnoise_destroy(st);
    rnnoise_model_free(model);

    return 0;
}
```

### Load Model from Memory Buffer

Loads a model from a caller-owned memory buffer. Useful for embedded systems or when models are stored in application resources. The buffer must stay valid until `rnnoise_model_free()` is called.

```c
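#include <stdio.h>
#include <stdlib.h>
#include "rnnoise.h"

int main(void) {
    FILE *f;
    long file_size;
    unsigned char *buffer;
    RNNModel *model;
    DenoiseState *st;

    // Read entire model file into memory
    f = fopen("weights_blob.bin", "rb");
    if (f == NULL) {
        fprintf(stderr, "Failed to open weights_blob.bin\n");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    file_size = ftell(f);
    fseek(f, 0, SEEK_SET);

    buffer = file_size > 0 ? malloc(file_size) : NULL;
    if (buffer == NULL || fread(buffer, 1, file_size, f) != (size_t)file_size) {
        fprintf(stderr, "Failed to read model file\n");
        free(buffer);
        fclose(f);
        return 1;
    }
    fclose(f);

    // Load model from buffer (buffer must outlive the model)
    model = rnnoise_model_from_buffer(buffer, (int)file_size);
    if (model == NULL) {
        fprintf(stderr, "Failed to load model\n");
        free(buffer);
        return 1;
    }
    st = rnnoise_create(model);
    if (st == NULL) {
        rnnoise_model_free(model);
        free(buffer);
        return 1;
    }

    // Process audio...

    rnnoise_destroy(st);
    rnnoise_model_free(model);
    free(buffer);
    return 0;
}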
```

### Get Frame Size

Retrieves the frame size required by the library. Always returns 480 samples for 48 kHz audio.

```c
#include <stdio.h>
#include <stdlib.h>
#include "rnnoise.h"

int main() {
    int frame_size = rnnoise_get_frame_size();
    printf("RNNoise frame size: %d samples\n", frame_size);
    printf("At 48 kHz, this is %.1f ms\n",
           (float)frame_size / 48000.0f * 1000.0f);

    // Allocate buffers with correct size
    float *input_buffer = malloc(frame_size * sizeof(float));
    float *output_buffer = malloc(frame_size * sizeof(float));

    // Use buffers...

    free(input_buffer);
    free(output_buffer);
    return 0;
}
```

### Generate Training Features

Creates training feature files from clean speech and noise samples. This is the first step in training a custom model.

```bash
# Build the feature dumping tool
./autogen.sh
./configure
make

# Generate training features with RIR augmentation
./dump_features -rir_list rir_list.txt \
    clean_speech.pcm \
    background_noise.pcm \
    foreground_noise.pcm \
    training_features.f32 \
    200000

# Output: training_features.f32 containing 200,000 sequences
# Each frame record is 98 float32 values:
# [65 input features, 32 gain targets, 1 VAD target]
```

### Train Custom Model with PyTorch

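Trains a new RNNoise model from the generated features. The training script (`train_rnnoise.py` in `torch/rnnoise/`) exposes hyperparameters such as batch size, learning rate, number of epochs, sequence length, and layer sizes (`--cond-size`, `--gru-size`).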

```bash
# Train model with custom parameters (run from the repository root)
python3 torch/rnnoise/train_rnnoise.py \
    training_features.f32 \
    output_models/ \
    --batch-size 128 \
    --lr 0.001 \
    --epochs 200 \
    --sequence-length 2000 \
    --cond-size 128 \
    --gru-size 384 \
    --gamma 0.25

# Training produces one checkpoint file per epoch:
# output_models/checkpoints/rnnoise_1.pth
# output_models/checkpoints/rnnoise_2.pth
# ...
# output_models/checkpoints/rnnoise_200.pth

# Choose the epoch with the best validation performance;
# typically aim for ~75,000 weight updates in total
```

### Export Trained Model to C Code

Converts a trained PyTorch model to optimized C source files with optional quantization for smaller model size.

```bash
# Export with quantization (smaller model)
python3 torch/rnnoise/dump_rnnoise_weights.py \
    --quantize \
    output_models/checkpoints/rnnoise_50.pth \
    exported_c_code/

# Generated files:
# exported_c_code/rnnoise_data.c
# exported_c_code/rnnoise_data.h

# Copy to src/ directory
cp exported_c_code/rnnoise_data.c src/
cp exported_c_code/rnnoise_data.h src/

# Rebuild library with new model
make clean
make
make install

# Without quantization (larger but potentially more accurate)
python3 torch/rnnoise/dump_rnnoise_weights.py \
    output_models/checkpoints/rnnoise_50.pth \
    exported_c_code/
```

### Build Library with Custom Compilation Flags

Compiles the library with optimizations for specific CPU architectures. AVX2 support significantly improves performance.

```bash
# Clone and prepare
git clone https://gitlab.xiph.org/xiph/rnnoise.git
cd rnnoise

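# Build with runtime x86 CPU detection (AVX2/SSE4.1), recommended for modern CPUs;
# -march=native tunes for the build machine, so the binary may not run on older CPUs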
./autogen.sh
./configure --enable-x86-rtcd CFLAGS="-O3 -march=native"
make
sudo make install

# Build without default model (for runtime loading only)
./configure CFLAGS="-O3 -march=native -DUSE_WEIGHTS_FILE"
make

# Build examples
./configure --enable-examples
make
./examples/rnnoise_demo input.raw output.raw

# Expected output format: 16-bit PCM, mono, 48 kHz
```

### Export Model as Binary Blob

Generates a loadable binary model file from a compiled library with embedded weights.

```bash
# Build with your custom model
cd src
cp my_rnnoise_data.c rnnoise_data.c
cp my_rnnoise_data.h rnnoise_data.h
cd ..
make

# Export model to binary format
./examples/dump_weights_blob

# Generated: weights_blob.bin
# This file can be loaded at runtime using rnnoise_model_from_file()
```

Example: loading the binary model in C. The `FILE *` is closed only after the model has been freed.

```c
#include <stdio.h>
#include "rnnoise.h"

int main(void) {
    FILE *model_file = fopen("weights_blob.bin", "rb");
    if (model_file == NULL)
        return 1;

    RNNModel *model = rnnoise_model_from_file(model_file);
    if (model == NULL) {
        fclose(model_file);
        return 1;
    }
    DenoiseState *st = rnnoise_create(model);
    if (st == NULL) {
        rnnoise_model_free(model);
        fclose(model_file);
        return 1;
    }

    // Process audio...

    rnnoise_destroy(st);
    rnnoise_model_free(model);
    fclose(model_file);  // Close only after freeing model
    return 0;
}
```

### Command-Line Demo Tool

Processes raw 16-bit PCM audio files through the noise suppression pipeline. Operates on raw audio without WAV headers.

```bash
# Prepare input audio (must be 16-bit PCM, mono, 48kHz, raw format)
# Convert from WAV, e.g. with ffmpeg:
ffmpeg -i noisy_input.wav -f s16le -acodec pcm_s16le -ar 48000 -ac 1 noisy_input.raw

# Run noise suppression
./examples/rnnoise_demo noisy_input.raw denoised_output.raw

# Convert back to WAV for playback
ffmpeg -f s16le -ar 48000 -ac 1 -i denoised_output.raw denoised_output.wav

# Note: Input and output are RAW format, not WAV
# Each sample is a 16-bit signed integer in the host's native byte order
# (ffmpeg's s16le is little-endian, which matches x86 and most ARM hosts)
# No headers or metadata included
```

### Pre-allocated State Initialization

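Initializes a denoise state in caller-provided memory instead of letting `rnnoise_create()` allocate it. Useful on embedded systems, where that memory might come from a static or pool buffer rather than `malloc()` as in this example.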

```c
#include <stdio.h>
#include <stdlib.h>
#include "rnnoise.h"

int main() {
    int state_size;
    void *state_memory;
    DenoiseState *st;

    // Get required memory size
    state_size = rnnoise_get_size();
    printf("DenoiseState requires %d bytes\n", state_size);

    // Allocate memory
    state_memory = malloc(state_size);
    if (state_memory == NULL) {
        fprintf(stderr, "Memory allocation failed\n");
        return 1;
    }

    // Initialize pre-allocated state
    st = (DenoiseState *)state_memory;
    if (rnnoise_init(st, NULL) != 0) {
        fprintf(stderr, "Initialization failed\n");
        free(state_memory);
        return 1;
    }

    // Process audio frames...

    // Don't call rnnoise_destroy() here; it is meant for states returned by
    // rnnoise_create(). Release caller-owned memory with its own deallocator.
    free(state_memory);
    return 0;
}
```

## Usage Summary

RNNoise is primarily used for real-time audio noise suppression in applications requiring low-latency speech enhancement. Common use cases include VoIP clients, podcasting software, live streaming applications, voice assistants, and hearing aid signal processing. The library processes audio in 10ms frames (480 samples at 48 kHz), making it suitable for real-time applications with minimal latency overhead. The neural network model uses a GRU-based architecture that balances quality and computational efficiency.

Integration typically involves reading audio samples, converting them to float (keeping the 16-bit sample scale rather than normalizing to [-1, 1]), calling `rnnoise_process_frame()` for each 480-sample frame, and writing the denoised output. The library supports both embedded models compiled into the binary and external models loaded at runtime, allowing flexibility in deployment.

Custom models can be trained on domain-specific datasets using the provided Python training scripts, enabling optimization for particular acoustic environments or speaker characteristics. The model export pipeline handles optional quantization and produces C code suitable for embedded systems and other resource-constrained environments.