### Main Application Entry Point

Source: https://context7.com/espressif/esp-sr/llms.txt

Sets up the hardware drivers and starts the speech recognition tasks. Initialize I2S and any other peripherals before calling `speech_recognition_init()` and creating the tasks.

```c
void app_main(void) {
    // Initialize I2S, GPIO, etc.
    i2s_driver_install(...);

    // Initialize speech recognition
    speech_recognition_init();

    // Start processing tasks
    xTaskCreatePinnedToCore(feed_task, "feed", 4096, NULL, 5, NULL, 0);
    xTaskCreatePinnedToCore(detect_task, "detect", 8192, NULL, 5, NULL, 1);
}
```

--------------------------------

### MultiNet Initialization and Setup

Source: https://github.com/espressif/esp-sr/blob/master/docs/zh_CN/speech_command_recognition/README.rst

Steps to initialize and configure MultiNet for command recognition, including model loading and setting command words.

```APIDOC
## MultiNet Initialization

### Description

This section covers the necessary steps for initializing the MultiNet module, including loading the appropriate models and configuring command words.

### Model Loading

Refer to the documentation on model loading for details on how to load MultiNet models.

- **See also**: Model Loading (`../flash_model/README`)

### Setting Command Words

Configure the specific command words that MultiNet should recognize.

- **See also**: the command requirements section (`command-requirements`)
```

--------------------------------

### Install g2p-en Package

Source: https://github.com/espressif/esp-sr/blob/master/tool/README.md

Install the g2p-en Python package using pip. This is required for processing English text into phonemes for the MultiNet5 model.

```bash
pip install g2p_en
```

--------------------------------

### WakeNet Wake Word Detection Example

Source: https://context7.com/espressif/esp-sr/llms.txt

This example demonstrates how to initialize, configure, and use the WakeNet engine for detecting wake words from audio streams. Ensure the necessary headers are included and models are available.

```c
#include "esp_wn_iface.h"
#include "esp_wn_models.h"

// Get WakeNet model handle
const esp_wn_iface_t *wakenet = esp_wn_handle_from_name("wn9_hilexin");
// Available models: wn9_hilexin, wn9_hiesp, wn9_alexa, wn9_nihaoxiaozhi, etc.

// Create WakeNet instance
model_iface_data_t *model_data = wakenet->create("wn9_hilexin", DET_MODE_90);
// DET_MODE_90: Normal sensitivity
// DET_MODE_95: Aggressive (higher detection, more false positives)

// Get processing parameters
int chunksize = wakenet->get_samp_chunksize(model_data);  // Samples per frame
int sample_rate = wakenet->get_samp_rate(model_data);     // 16000 Hz
int word_num = wakenet->get_word_num(model_data);         // Number of wake words

// Get wake word information
for (int i = 1; i <= word_num; i++) {
    char *name = wakenet->get_word_name(model_data, i);
    float threshold = wakenet->get_det_threshold(model_data, i);
    printf("Wake word %d: %s (threshold: %.2f)\n", i, name, threshold);
}

// Set custom detection threshold (0.4 - 0.9999)
wakenet->set_det_threshold(model_data, 0.85, 1);  // threshold=0.85, word_index=1

// Process audio frames
int16_t *audio_buffer = (int16_t *)malloc(chunksize * sizeof(int16_t));
while (1) {
    // Fill audio_buffer with 16-bit mono audio @ 16kHz
    read_audio(audio_buffer, chunksize);

    wakenet_state_t state = wakenet->detect(model_data, audio_buffer);

    if (state == WAKENET_DETECTED) {
        int channel = wakenet->get_triggered_channel(model_data);
        int start_point = wakenet->get_start_point(model_data);
        printf("Wake word detected! Channel: %d, Start: %d samples back\n",
               channel, start_point);
    }
}

// Reset state for new session
wakenet->clean(model_data);

// Cleanup
wakenet->destroy(model_data);
free(audio_buffer);

```
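
The comment in the example gives 0.4 - 0.9999 as the accepted threshold range. A minimal sketch of guarding a user-supplied value before it reaches `set_det_threshold` could look like the following; `clamp_det_threshold` and the two macros are hypothetical helpers, not part of ESP-SR, and the bounds are taken only from that comment.

```c
#include <assert.h>

/* Bounds quoted in the WakeNet example comment (0.4 - 0.9999). */
#define WN_THRESHOLD_MIN 0.4f
#define WN_THRESHOLD_MAX 0.9999f

/* Hypothetical helper: clamp a requested detection threshold into the
 * documented range so out-of-range values are never passed on. */
static float clamp_det_threshold(float requested)
{
    assert(requested == requested);  /* reject NaN, which never equals itself */
    if (requested < WN_THRESHOLD_MIN) {
        return WN_THRESHOLD_MIN;
    }
    if (requested > WN_THRESHOLD_MAX) {
        return WN_THRESHOLD_MAX;
    }
    return requested;
}
```

With this in place, the call from the example would become `wakenet->set_det_threshold(model_data, clamp_det_threshold(user_value), 1);`, where `user_value` is whatever the application reads from its own settings.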

--------------------------------

### Run ESP-SR Test Suite

Source: https://github.com/espressif/esp-sr/blob/master/test_apps/README.md

Commands to install test dependencies and execute the pytest suite for the ESP32-S3 target.

```bash
pip install -r test_apps/requirement.txt
pytest test_apps --target esp32s3
```

--------------------------------

### Get Sample Chunk Size - MultiNet

Source: https://github.com/espressif/esp-sr/blob/master/docs/zh_CN/speech_command_recognition/README.rst

Retrieves the required sample chunk size for MultiNet input. This value must match the AFE fetch frame length.

```c
int mu_chunksize = multinet->get_samp_chunksize(model_data);
```
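
Because the MultiNet frame length has to equal the AFE fetch frame length, a start-up check can catch a mismatch early. The following is a small sketch in plain C; the helper names are hypothetical and not part of ESP-SR, the inputs are assumed to come from `multinet->get_samp_chunksize()` and the AFE's fetch chunk size, and samples are assumed to be 16-bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one MultiNet frame of 16-bit mono samples. */
static size_t mn_frame_bytes(int chunksize)
{
    assert(chunksize > 0);
    return (size_t)chunksize * sizeof(int16_t);
}

/* Duration of one frame in whole milliseconds at the given sample rate. */
static int frame_duration_ms(int chunksize, int sample_rate)
{
    assert(chunksize > 0 && sample_rate > 0);
    return chunksize * 1000 / sample_rate;
}

/* Returns 1 when the AFE fetch frame and the MultiNet frame agree. */
static int chunk_sizes_match(int afe_fetch_chunksize, int mn_chunksize)
{
    return afe_fetch_chunksize == mn_chunksize;
}
```

For an illustrative chunk size of 512 samples at 16000 Hz, `mn_frame_bytes` gives 1024 bytes and `frame_duration_ms` gives 32 ms; the real chunk size depends on the loaded model.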

--------------------------------

### Get Recognition Results - MultiNet

Source: https://github.com/espressif/esp-sr/blob/master/docs/zh_CN/speech_command_recognition/README.rst

Retrieves the results of the command word recognition. Call this API when the state is ESP_MN_STATE_DETECTED.

```c
esp_mn_results_t *mn_result = multinet->get_results(model_data);
```

--------------------------------

### Create AFE Instance from Configuration

Source: https://github.com/espressif/esp-sr/blob/master/docs/zh_CN/audio_front_end/README.rst

Creates an AFE instance using the provided configuration. Obtain the AFE interface handle first, then use it to create the data structure for the AFE instance.

```c
// Get the handle
esp_afe_sr_iface_t *afe_handle = esp_afe_handle_from_config(afe_config);
// Create the instance
esp_afe_sr_data_t *afe_data = afe_handle->create_from_config(afe_config);
```

--------------------------------

### Initialize AFE Configuration

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/audio_front_end/README.rst

Use afe_config_init to set up the AFE parameters, including input format and model selection.

```c
srmodel_list_t *models = esp_srmodel_init("model");
afe_config_t *afe_config = afe_config_init("MMNR", models, AFE_TYPE_SR, AFE_MODE_HIGH_PERF);
```

--------------------------------

### Prepare English Commands for MultiNet6

Source: https://github.com/espressif/esp-sr/blob/master/tool/README.md

Create a text file with command IDs and their corresponding sentences for English. Each command line has the format `command_id command_sentence`; the first line of the example is a header that describes this layout.

```text
# command_id command_sentence
1 TELL ME A JOKE
2 MAKE A COFFEE
```
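
To make the line layout concrete, here is a hedged sketch of how an application might read such a file itself. Both functions are hypothetical helpers, not part of the ESP-SR tooling, and treating lines that start with `#` as comments is an assumption based on the header line in the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Return the command ID at the start of a line, or -1 for blank lines,
 * '#' header/comment lines, and malformed input. */
static int parse_command_id(const char *line)
{
    assert(line != NULL);
    line += strspn(line, " \t");               /* skip leading blanks */
    if (!isdigit((unsigned char)*line)) {
        return -1;                             /* blank, '#', or not a number */
    }
    char *end = NULL;
    errno = 0;
    long id = strtol(line, &end, 10);
    if (errno == ERANGE || id > INT_MAX) {
        return -1;                             /* ID does not fit in an int */
    }
    if (*end != ' ' && *end != '\t') {
        return -1;                             /* no separator after the ID */
    }
    return (int)id;
}

/* Return a pointer to the sentence part of a command line, or NULL when
 * the line carries no command. A trailing newline is left to the caller. */
static const char *command_sentence(const char *line)
{
    if (parse_command_id(line) < 0) {
        return NULL;
    }
    line += strspn(line, " \t");               /* leading blanks */
    line += strspn(line, "0123456789");        /* the ID digits */
    line += strspn(line, " \t");               /* separator */
    return (*line != '\0' && *line != '\n') ? line : NULL;
}
```

Each line read with `fgets` can be passed to both helpers; lines for which `parse_command_id` returns -1 are skipped.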

--------------------------------

### Build ESP-SR Test Applications

Source: https://github.com/espressif/esp-sr/blob/master/test_apps/README.md

Commands to set up the environment and build test applications for the ESP32-S3 target.

```bash
. ${IDF_PATH}/export.sh
pip install idf_build_apps
python test_apps/build_apps.py test_apps -t esp32s3
```

--------------------------------

### Implement AEC using AFE and Direct Interfaces

Source: https://context7.com/espressif/esp-sr/llms.txt

Demonstrates both the recommended AFE AEC interface for multi-channel audio and the direct AEC interface for manual reference signal handling.

```c
#include "esp_afe_aec.h"
#include "esp_aec.h"

// Using AFE AEC interface (recommended)
afe_aec_handle_t *afe_aec = afe_aec_create(
    "MNR",               // input_format: M=mic, N=unused, R=reference
    4,                   // filter_length in frames
    AFE_TYPE_SR,         // AFE_TYPE_SR or AFE_TYPE_VC
    AFE_MODE_LOW_COST    // AFE_MODE_LOW_COST or AFE_MODE_HIGH_PERF
);

int frame_size = afe_aec->frame_size;
int nch = afe_aec->pcm_config.total_ch_num;

int16_t *input_data = (int16_t *)malloc(frame_size * nch * sizeof(int16_t));
int16_t *output_data = (int16_t *)malloc(frame_size * sizeof(int16_t));

while (1) {
    // Read multi-channel audio (interleaved format)
    read_multichannel_audio(input_data, frame_size * nch);

    // Process AEC
    afe_aec_process(afe_aec, input_data, output_data);

    // output_data contains echo-cancelled audio
    process_clean_audio(output_data, frame_size);
}

afe_aec_destroy(afe_aec);

// Using direct AEC interface
aec_handle_t *aec = aec_create(
    16000,              // sample_rate
    4,                  // filter_length
    1,                  // mic_channels
    AEC_MODE_SR_LOW_COST  // or AEC_MODE_SR_HIGH_PERF
);

int16_t *mic_data = (int16_t *)malloc(frame_size * sizeof(int16_t));
int16_t *ref_data = (int16_t *)malloc(frame_size * sizeof(int16_t));
int16_t *out_data = (int16_t *)malloc(frame_size * sizeof(int16_t));

while (1) {
    read_mic_audio(mic_data, frame_size);
    read_speaker_ref(ref_data, frame_size);

    aec_process(aec, mic_data, ref_data, out_data);

    // out_data has echo removed
}

aec_destroy(aec);
```

--------------------------------

### Manage AFE Instance and Pipeline

Source: https://context7.com/espressif/esp-sr/llms.txt

Creates an AFE instance from a configuration and provides methods to control audio processing algorithms and retrieve runtime parameters.

```c
// Get AFE handle from configuration
const esp_afe_sr_iface_t *afe_handle = esp_afe_handle_from_config(afe_config);

// Create AFE instance
esp_afe_sr_data_t *afe_data = afe_handle->create_from_config(afe_config);

// Print the processing pipeline
afe_handle->print_pipeline(afe_data);
// Output: [input] -> |AEC(VOIP_HIGH_PERF)| -> |WakeNet(wn9_hilexin)| -> [output]

// Get processing parameters
int feed_chunksize = afe_handle->get_feed_chunksize(afe_data);   // Samples per frame
int feed_nch = afe_handle->get_feed_channel_num(afe_data);       // Input channel count
int fetch_chunksize = afe_handle->get_fetch_chunksize(afe_data); // Output samples per frame
int sample_rate = afe_handle->get_samp_rate(afe_data);           // Sample rate (16000 Hz)

// Control individual algorithms
afe_handle->disable_wakenet(afe_data);   // Disable wake word detection
afe_handle->enable_wakenet(afe_data);    // Re-enable wake word detection
afe_handle->disable_aec(afe_data);       // Disable echo cancellation
afe_handle->enable_aec(afe_data);        // Re-enable echo cancellation
afe_handle->disable_vad(afe_data);       // Disable voice activity detection
afe_handle->enable_vad(afe_data);        // Re-enable voice activity detection
afe_handle->reset_vad(afe_data);         // Reset VAD state

// Adjust wake word detection threshold (0.4 - 0.9999)
afe_handle->set_wakenet_threshold(afe_data, 1, 0.8);  // wakenet_index=1, threshold=0.8
afe_handle->reset_wakenet_threshold(afe_data, 1);     // Reset to default

// Cleanup
afe_handle->destroy(afe_data);
afe_config_free(afe_config);
esp_srmodel_deinit(models);
```

--------------------------------

### Get MultiNet Chunk Size

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/speech_command_recognition/README.rst

Retrieves the required frame length (chunk size) for data passed to MultiNet. This size must match the AFE fetch frame length.

```APIDOC
## multinet->get_samp_chunksize

### Description
Get the length of the frame that needs to be passed to MultiNet. This size is exactly the same as the number of data points per frame obtained in AFE.

### Method
Function pointer on the MultiNet interface handle, called as `multinet->get_samp_chunksize(...)`

### Endpoint
(Not applicable, this is a library function)

### Parameters
#### Path Parameters
None
#### Query Parameters
None
#### Request Body
- **model_data** (const model_iface_data_t *) - Required - The model object to query.

### Request Example
(Not applicable)

### Response
#### Success Response (200)
- int - The size of the sample chunk (short) required for each frame passed to MultiNet.

### Response Example
```c
int mu_chunksize = multinet->get_samp_chunksize(model_data);
```
```

--------------------------------

### Configure Audio Front-End (AFE)

Source: https://context7.com/espressif/esp-sr/llms.txt

Initializes AFE models and configures processing parameters like AEC, NS, VAD, and WakeNet. Requires valid model partitions and specific input channel formatting.

```c
#include "esp_afe_sr_models.h"
#include "esp_afe_sr_iface.h"

// Initialize models from partition
srmodel_list_t *models = esp_srmodel_init("model");

// Initialize AFE configuration
// input_format: "MMNR" = 2 mic channels, 1 unused, 1 reference
afe_config_t *afe_config = afe_config_init("MMNR", models, AFE_TYPE_SR, AFE_MODE_HIGH_PERF);

// Customize configuration parameters
afe_config->aec_init = true;                    // Enable Acoustic Echo Cancellation
afe_config->ns_init = true;                     // Enable Noise Suppression
afe_config->vad_init = true;                    // Enable Voice Activity Detection
afe_config->vad_mode = VAD_MODE_1;              // VAD aggressiveness (0-4)
afe_config->vad_min_speech_ms = 128;            // Minimum speech duration (ms)
afe_config->vad_min_noise_ms = 1000;            // Minimum noise duration (ms)
afe_config->wakenet_init = true;                // Enable WakeNet
afe_config->wakenet_mode = DET_MODE_90;         // Wake word detection sensitivity
afe_config->agc_init = true;                    // Enable Automatic Gain Control
afe_config->afe_linear_gain = 1.0;              // Output gain factor [0.1 - 10.0]
afe_config->memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM;

// Print configuration for debugging
afe_config_print(afe_config);
```
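
The `input_format` string encodes the channel layout one letter per channel. As a rough illustration, the sketch below counts each kind of channel and rejects unknown letters; the function names are hypothetical, and the assumption that only `M`, `R`, and `N` are valid comes from the letters described for the AFE input format, not from a check in the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how often `kind` ('M', 'R' or 'N') appears in an AFE input
 * format string such as "MMNR". */
static int count_channels(const char *input_format, char kind)
{
    assert(input_format != NULL);
    int n = 0;
    for (const char *p = input_format; *p != '\0'; p++) {
        if (*p == kind) {
            n++;
        }
    }
    return n;
}

/* Return 1 when the string is non-empty and uses only M, R and N. */
static int input_format_is_valid(const char *input_format)
{
    assert(input_format != NULL);
    size_t len = strlen(input_format);
    return len > 0 && strspn(input_format, "MRN") == len;
}
```

For `"MMNR"` this reports two microphone channels, one reference channel, and one unused channel, which matches the comment in the configuration example.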

--------------------------------

### Audio Front-End (AFE) Configuration

Source: https://context7.com/espressif/esp-sr/llms.txt

This section covers the initialization and configuration of the AFE framework, including setting up audio processing algorithms like AEC, NS, VAD, and WakeNet.

```APIDOC
## Audio Front-End (AFE) Configuration

The AFE framework processes audio data for speech recognition and voice communication applications. It initializes and configures all audio processing algorithms including AEC, NS, VAD, and WakeNet detection based on the input format specification where 'M' represents microphone channels, 'R' represents playback reference channels, and 'N' represents unused channels.

### Method
C Code Example

### Endpoint
N/A

### Parameters
#### Request Body
- **input_format** (string) - Required - Specifies microphone, reference, and unused channels (e.g., "MMNR").
- **models** (srmodel_list_t*) - Required - Pointer to initialized speech recognition models.
- **afe_type** (afe_type_t) - Required - Type of AFE (e.g., AFE_TYPE_SR).
- **afe_mode** (afe_mode_t) - Required - Performance mode (e.g., AFE_MODE_HIGH_PERF).

### Request Example
```c
#include "esp_afe_sr_models.h"
#include "esp_afe_sr_iface.h"

// Initialize models from partition
srmodel_list_t *models = esp_srmodel_init("model");

// Initialize AFE configuration
afe_config_t *afe_config = afe_config_init("MMNR", models, AFE_TYPE_SR, AFE_MODE_HIGH_PERF);

// Customize configuration parameters
afe_config->aec_init = true;                    // Enable Acoustic Echo Cancellation
afe_config->ns_init = true;                     // Enable Noise Suppression
afe_config->vad_init = true;                    // Enable Voice Activity Detection
afe_config->vad_mode = VAD_MODE_1;              // VAD aggressiveness (0-4)
afe_config->vad_min_speech_ms = 128;            // Minimum speech duration (ms)
afe_config->vad_min_noise_ms = 1000;            // Minimum noise duration (ms)
afe_config->wakenet_init = true;                // Enable WakeNet
afe_config->wakenet_mode = DET_MODE_90;         // Wake word detection sensitivity
afe_config->agc_init = true;                    // Enable Automatic Gain Control
afe_config->afe_linear_gain = 1.0;              // Output gain factor [0.1 - 10.0]
afe_config->memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM;

// Print configuration for debugging
afe_config_print(afe_config);
```

### Response
#### Success Response (200)
- **afe_config_t*** - Pointer to the initialized AFE configuration structure.

#### Response Example
N/A
```

--------------------------------

### Create AFE Instance from Configuration

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/audio_front_end/migration_guide.rst

Obtain an AFE handle using `esp_afe_handle_from_config` with the initialized configuration. The previous `ESP_AFE_SR_HANDLE` and `ESP_AFE_VC_HANDLE` are no longer used.

```c
esp_afe_sr_iface_t *afe_handle = esp_afe_handle_from_config(afe_config);
```

--------------------------------

### Configure ESP-SR Project CMakeLists.txt

Source: https://github.com/espressif/esp-sr/blob/master/test_apps/esp-sr/CMakeLists.txt

Sets up the CMake build system for the ESP-SR test subproject. Includes necessary components and defines the project name.

```cmake
cmake_minimum_required(VERSION 3.5)

# Include the components directory of the main application:
#
set(EXTRA_COMPONENT_DIRS "$ENV{IDF_PATH}/tools/unit-test-app/components"
                         "../../../esp-sr")

include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(esp_sr_test)
```

--------------------------------

### Create AFE Instance

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/audio_front_end/README.rst

Initialize the AFE handle and data instance based on the previously defined configuration.

```c
// get handle
esp_afe_sr_iface_t *afe_handle = esp_afe_handle_from_config(afe_config);
// create instance
esp_afe_sr_data_t *afe_data = afe_handle->create_from_config(afe_config);
```

--------------------------------

### Prepare Chinese Commands for MultiNet6

Source: https://github.com/espressif/esp-sr/blob/master/tool/README.md

Create a text file with command IDs and their corresponding Pinyin sentences for Chinese. Each command line has the format `command_id command_sentence`; the first line of the example is a header that describes this layout.

```text
# command_id command_sentence
1 da kai kong tiao
2 guan bi kong tiao
```

--------------------------------

### Initialize and Configure MultiNet Speech Commands

Source: https://context7.com/espressif/esp-sr/llms.txt

This snippet demonstrates how to initialize the MultiNet model, add custom speech commands with IDs and strings, and optionally add phonemes for improved accuracy. It also shows how to update the command list and print active commands. Ensure the correct model name (e.g., "mn7_en") is used.

```c
#include "esp_mn_iface.h"
#include "esp_mn_models.h"
#include "esp_mn_speech_commands.h"

// Get MultiNet model handle
const esp_mn_iface_t *multinet = esp_mn_handle_from_name("mn7_en");
// Available: mn5q8_cn, mn5q8_en, mn6_cn, mn6_en, mn7_cn, mn7_en

// Create MultiNet instance with timeout (ms)
model_iface_data_t *mn_model = multinet->create("mn7_en", 6000);  // 6 second timeout

// Initialize speech commands management
esp_mn_commands_alloc(multinet, mn_model);

// Add speech commands (command_id, command_string)
esp_mn_commands_add(1, "turn on the light");
esp_mn_commands_add(2, "turn off the light");
esp_mn_commands_add(3, "play music");
esp_mn_commands_add(4, "stop music");
esp_mn_commands_add(5, "increase volume");
esp_mn_commands_add(6, "decrease volume");

// For MultiNet7, optionally add phonemes for better accuracy
esp_mn_commands_phoneme_add(7, "tell me a joke", "TfL Mm c qbK");

// Apply command changes (required after add/remove/modify)
esp_mn_error_t *err = esp_mn_commands_update();
if (err != NULL) {
    printf("Error adding %d commands\n", err->num);
}

// Print active commands
esp_mn_active_commands_print();

// Get processing parameters
int mn_chunksize = multinet->get_samp_chunksize(mn_model);
char *language = multinet->get_language(mn_model);  // "en" or "cn"
```

```c
// Modify commands at runtime
esp_mn_commands_modify("play music", "start playing");
esp_mn_commands_remove("stop music");
esp_mn_commands_update();

// Clear all commands
esp_mn_commands_clear();
esp_mn_commands_update();

// Cleanup
multinet->destroy(mn_model);
esp_mn_commands_free();
```

--------------------------------

### Initialize AFE Configuration

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/audio_front_end/migration_guide.rst

Use `afe_config_init` to initialize configurations, specifying the input format, models, AFE type, and performance mode. The previous `AFE_CONFIG_DEFAULT()` method is removed.

```c
afe_config_t *afe_config = afe_config_init("MMNR", models, AFE_TYPE_SR, AFE_MODE_HIGH_PERF);
afe_config_print(afe_config); // print all configurations
```

--------------------------------

### AFE Instance Creation and Pipeline Control

Source: https://context7.com/espressif/esp-sr/llms.txt

This section describes how to create an AFE instance, control the processing pipeline, adjust algorithm parameters, and manage the AFE lifecycle.

```APIDOC
## AFE Instance Creation and Pipeline Control

Create an AFE instance using the configuration and access the interface handle for audio processing operations. The handle provides methods to feed audio data, fetch processed results, and control individual algorithms in the pipeline.

### Method
C Code Example

### Endpoint
N/A

### Parameters
#### Path Parameters
N/A

#### Query Parameters
N/A

#### Request Body
N/A

### Request Example
```c
// Get AFE handle from configuration
const esp_afe_sr_iface_t *afe_handle = esp_afe_handle_from_config(afe_config);

// Create AFE instance
esp_afe_sr_data_t *afe_data = afe_handle->create_from_config(afe_config);

// Print the processing pipeline
afe_handle->print_pipeline(afe_data);
// Output: [input] -> |AEC(VOIP_HIGH_PERF)| -> |WakeNet(wn9_hilexin)| -> [output]

// Get processing parameters
int feed_chunksize = afe_handle->get_feed_chunksize(afe_data);   // Samples per frame
int feed_nch = afe_handle->get_feed_channel_num(afe_data);       // Input channel count
int fetch_chunksize = afe_handle->get_fetch_chunksize(afe_data); // Output samples per frame
int sample_rate = afe_handle->get_samp_rate(afe_data);           // Sample rate (16000 Hz)

// Control individual algorithms
afe_handle->disable_wakenet(afe_data);   // Disable wake word detection
afe_handle->enable_wakenet(afe_data);    // Re-enable wake word detection
afe_handle->disable_aec(afe_data);       // Disable echo cancellation
afe_handle->enable_aec(afe_data);        // Re-enable echo cancellation
afe_handle->disable_vad(afe_data);       // Disable voice activity detection
afe_handle->enable_vad(afe_data);        // Re-enable voice activity detection
afe_handle->reset_vad(afe_data);         // Reset VAD state

// Adjust wake word detection threshold (0.4 - 0.9999)
afe_handle->set_wakenet_threshold(afe_data, 1, 0.8);  // wakenet_index=1, threshold=0.8
afe_handle->reset_wakenet_threshold(afe_data, 1);     // Reset to default

// Cleanup
afe_handle->destroy(afe_data);
afe_config_free(afe_config);
esp_srmodel_deinit(models);
```

### Response
#### Success Response (200)
- **esp_afe_sr_data_t*** - Pointer to the created AFE instance.
- **int** - Processing chunk size, channel number, fetch chunk size, sample rate.
- **void** - Control functions return void.

#### Response Example
N/A
```

--------------------------------

### Initialize Speech Recognition System

Source: https://context7.com/espressif/esp-sr/llms.txt

Initializes the Audio Front-End (AFE) and MultiNet for speech recognition. Ensure models are loaded and AFE/MultiNet configurations are set appropriately for your hardware and desired performance.

```c
#include "esp_afe_sr_models.h"
#include "esp_afe_sr_iface.h"
#include "esp_mn_iface.h"
#include "esp_mn_models.h"
#include "esp_mn_speech_commands.h"

// Global handles
static srmodel_list_t *models = NULL;
static const esp_afe_sr_iface_t *afe_handle = NULL;
static esp_afe_sr_data_t *afe_data = NULL;
static const esp_mn_iface_t *multinet = NULL;
static model_iface_data_t *mn_model = NULL;

void speech_recognition_init(void) {
    // Load models
    models = esp_srmodel_init("model");

    // Configure AFE
    afe_config_t *afe_config = afe_config_init("MMR", models, AFE_TYPE_SR, AFE_MODE_HIGH_PERF);
    afe_config->wakenet_model_name = "wn9_hiesp";
    afe_config->vad_init = true;
    afe_config->vad_min_speech_ms = 128;

    // Create AFE
    afe_handle = esp_afe_handle_from_config(afe_config);
    afe_data = afe_handle->create_from_config(afe_config);
    afe_config_free(afe_config);

    // Initialize MultiNet
    multinet = esp_mn_handle_from_name("mn7_en");
    mn_model = multinet->create("mn7_en", 6000);

    // Configure speech commands
    esp_mn_commands_alloc(multinet, mn_model);
    esp_mn_commands_add(1, "turn on the light");
    esp_mn_commands_add(2, "turn off the light");
    esp_mn_commands_add(3, "set brightness to maximum");
    esp_mn_commands_add(4, "set brightness to minimum");
    esp_mn_commands_update();
}
```

--------------------------------

### Implement AFE Feed and Fetch Tasks

Source: https://context7.com/espressif/esp-sr/llms.txt

Demonstrates the structure of feed and fetch tasks for the AFE pipeline, including buffer allocation, I2S reading, and result processing.

```c
// The feed buffer must hold feed_chunksize * feed_nch 16-bit samples;
// it is allocated inside feed_task once those values are known.

// Feed Task - runs on dedicated core
void feed_task(void *arg) {
    afe_task_into_t *task_info = (afe_task_into_t *)arg;
    const esp_afe_sr_iface_t *afe_handle = task_info->afe_handle;
    esp_afe_sr_data_t *afe_data = task_info->afe_data;

    int feed_chunksize = afe_handle->get_feed_chunksize(afe_data);
    int feed_nch = afe_handle->get_feed_channel_num(afe_data);
    int16_t *i2s_buff = (int16_t *)malloc(feed_chunksize * feed_nch * sizeof(int16_t));
    size_t bytes_read = 0;  // Filled in by i2s_read()

    while (1) {
        // Read audio from I2S (channel-interleaved format)
        i2s_read(I2S_NUM_0, i2s_buff, feed_chunksize * feed_nch * sizeof(int16_t),
                 &bytes_read, portMAX_DELAY);

        // Feed to AFE pipeline
        afe_handle->feed(afe_data, i2s_buff);
    }
    free(i2s_buff);
    vTaskDelete(NULL);
}

// Fetch Task - runs on dedicated core
void fetch_task(void *arg) {
    afe_task_into_t *task_info = (afe_task_into_t *)arg;
    const esp_afe_sr_iface_t *afe_handle = task_info->afe_handle;
    esp_afe_sr_data_t *afe_data = task_info->afe_data;

    while (1) {
        // Fetch processed audio and detection results
        afe_fetch_result_t *result = afe_handle->fetch(afe_data);
        // Or with custom timeout:
        // afe_fetch_result_t *result = afe_handle->fetch_with_delay(afe_data, 100 / portTICK_PERIOD_MS);

        if (!result || result->ret_value == ESP_FAIL) {
            break;
        }

        // Access processed audio data
        int16_t *audio_data = result->data;
        int data_size = result->data_size;        // Size in bytes
        float volume_db = result->data_volume;    // Volume in dB

        // Check VAD cache (prevents speech truncation)
        if (result->vad_cache_size > 0) {
            int16_t *vad_cache = result->vad_cache;
            // Prepend vad_cache to audio_data for complete speech
        }

        // Check Voice Activity Detection state
        if (result->vad_state == VAD_SPEECH) {
            printf("Speech detected\n");
        } else {
            printf("Silence/Noise\n");
        }

        // Check Wake Word Detection
        if (result->wakeup_state == WAKENET_DETECTED) {
            int wake_word_index = result->wake_word_index;    // Which wake word (1-based)
            int model_index = result->wakenet_model_index;    // Which model detected
            int wake_length = result->wake_word_length;       // Samples of wake word
            printf("Wake word %d detected by model %d\n", wake_word_index, model_index);
        }
    }
    vTaskDelete(NULL);
}

// Create tasks
xTaskCreatePinnedToCore(feed_task, "feed", 8*1024, &task_info, 5, NULL, 0);
xTaskCreatePinnedToCore(fetch_task, "fetch", 8*1024, &task_info, 5, NULL, 1);
```
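
The feed buffer arithmetic in the task above can be isolated into two small helpers. This is a sketch under the assumption that "channel-interleaved" means sample k of every channel is stored together, frame after frame; `feed_buffer_bytes` and `interleaved_index` are hypothetical names, not ESP-SR functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes required for one feed call: feed_chunksize samples for each of
 * feed_nch channels, 16 bits per sample. */
static size_t feed_buffer_bytes(int feed_chunksize, int feed_nch)
{
    assert(feed_chunksize > 0 && feed_nch > 0);
    return (size_t)feed_chunksize * (size_t)feed_nch * sizeof(int16_t);
}

/* Position of sample number `frame` of channel `channel` inside an
 * interleaved buffer holding `nch` channels. */
static size_t interleaved_index(int frame, int channel, int nch)
{
    assert(nch > 0 && frame >= 0 && channel >= 0 && channel < nch);
    return (size_t)frame * (size_t)nch + (size_t)channel;
}
```

With a four-channel layout such as `"MMNR"`, the reference channel is channel 3 under this assumption, so its samples sit at indices 3, 7, 11, and so on.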

--------------------------------

### Create and Configure VAD Instance

Source: https://context7.com/espressif/esp-sr/llms.txt

This snippet shows how to create a Voice Activity Detection (VAD) instance using different modes or custom parameters. It highlights the available modes from `VAD_MODE_0` to `VAD_MODE_4` and allows for detailed configuration of sample rate, frame length, and speech/noise durations. Choose the mode and parameters that best suit your application's accuracy and performance needs.

```c
#include "esp_vad.h"

// Create VAD instance with mode
vad_handle_t vad = vad_create(VAD_MODE_3);
// VAD_MODE_0: Normal (more speech detected)
// VAD_MODE_1: Aggressive
// VAD_MODE_2: Very Aggressive
// VAD_MODE_3: Very Very Aggressive
// VAD_MODE_4: Very Very Very Aggressive (less false positives)

// Or create with detailed parameters
vad_handle_t vad_custom = vad_create_with_param(
    VAD_MODE_2,        // mode
    16000,             // sample_rate (8000, 16000, or 32000)
    30,                // frame_length_ms (10, 20, or 30)
    200,               // min_speech_ms
    500                // min_noise_ms
);
```
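
The sample rate and frame length together fix how many samples each VAD frame contains. The sketch below computes that count and rejects combinations outside the values listed in the comments above (8000, 16000, or 32000 Hz; 10, 20, or 30 ms). `vad_frame_samples` is a hypothetical helper, not part of `esp_vad.h`.

```c
#include <assert.h>

/* Return the number of samples in one VAD frame, or -1 when the sample
 * rate or frame length is not one of the values listed in the example. */
static int vad_frame_samples(int sample_rate, int frame_length_ms)
{
    int rate_ok = sample_rate == 8000 || sample_rate == 16000 ||
                  sample_rate == 32000;
    int length_ok = frame_length_ms == 10 || frame_length_ms == 20 ||
                    frame_length_ms == 30;
    if (!rate_ok || !length_ok) {
        return -1;
    }
    /* All accepted rates are multiples of 1000, so this division is exact. */
    return sample_rate / 1000 * frame_length_ms;
}
```

For the parameters used in `vad_create_with_param` above (16000 Hz, 30 ms), each frame holds 480 samples.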

--------------------------------

### Generate Model Binary for Arduino

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/flash_model/README.rst

Use this Python script to generate the srmodels.bin file required for manual loading in the Arduino framework.

```bash
python {esp-sr_path}/movemodel.py -d1 {sdkconfig_path} -d2 {esp-sr_path} -d3 {build_path}
```

--------------------------------

### MultiNet Running and Detection

Source: https://github.com/espressif/esp-sr/blob/master/docs/zh_CN/speech_command_recognition/README.rst

Instructions on how to run MultiNet for command recognition after AFE and WakeNet are enabled, including data formatting and API calls.

```APIDOC
## MultiNet Running

### Description

Once the AFE and WakeNet are enabled, MultiNet can be run for command recognition. Ensure that the frame length passed to MultiNet matches the AFE fetch frame length. The supported audio format is 16 kHz, 16-bit, mono.

### Get Sample Chunk Size

Determine the required chunk size for audio data to be fed into MultiNet.

```c
int mu_chunksize = multinet->get_samp_chunksize(model_data);
```

- **`mu_chunksize`**: The number of `short` type audio samples per frame required by MultiNet. This should be equal to the number of samples fetched by AFE per frame.

### MultiNet Detection

Feed the real-time fetched audio data from AFE into the `detect` API.

```c
esp_mn_state_t mn_state = multinet->detect(model_data, buff);
```

- **`buff`**: The audio data buffer. Its length should be `mu_chunksize * sizeof(int16_t)`.
```

--------------------------------

### Configure CMakeLists.txt for ESP-SR Test Project

Source: https://github.com/espressif/esp-sr/blob/master/test_apps/esp32c5/CMakeLists.txt

Sets the required CMake version, defines extra component directories, and includes the ESP-IDF project configuration.

```cmake
cmake_minimum_required(VERSION 3.5)

# Include the components directory of the main application:
#
set(EXTRA_COMPONENT_DIRS "$ENV{IDF_PATH}/tools/unit-test-app/components"
                         "../../../esp-sr")

include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(esp32c5_test)
```

--------------------------------

### Add Speech Commands via menuconfig

Source: https://github.com/espressif/esp-sr/blob/master/tool/README.md

Navigate through the ESP-IDF configuration menu to add speech commands. Access the 'ESP Speech Recognition' section and select 'Add speech commands'.

```bash
idf.py menuconfig
ESP Speech Recognition -> Add speech commands
```

--------------------------------

### Configure Voice Data Partition

Source: https://github.com/espressif/esp-sr/blob/master/test_apps/esp-tts/main/CMakeLists.txt

Defines a custom build target to include voice data and configures the flash process to write the data to a specific partition.

```cmake
set(voice_data_image ${PROJECT_DIR}/../../esp-tts/esp_tts_chinese/esp_tts_voice_data_xiaoxin_small.dat)
add_custom_target(voice_data ALL DEPENDS ${voice_data_image})
add_dependencies(flash voice_data)

partition_table_get_partition_info(size "--partition-name voice_data" "size")
partition_table_get_partition_info(offset "--partition-name voice_data" "offset")

if("${size}" AND "${offset}")
    esptool_py_flash_to_partition(flash "voice_data" "${voice_data_image}")
else()
    set(message "Failed to find model in partition table file"
                "Please add a line(Name=voice_data, Type=data, Size=3890K) to the partition file.")
endif()
```

--------------------------------

### Configure Model Partition and Flash Target in CMake

Source: https://github.com/espressif/esp-sr/blob/master/CMakeLists.txt

Use this logic within a CMake component file to verify the existence of a 'model' partition and register the custom build and flash commands.

```cmake
if(CONFIG_PARTITION_TABLE_CUSTOM)
    partition_table_get_partition_info(size "--partition-name model" "size")
    partition_table_get_partition_info(offset "--partition-name model" "offset")

    if("${size}" AND "${offset}")
        set(MVMODEL_EXE ${COMPONENT_PATH}/model/movemodel.py)
        idf_build_get_property(build_dir BUILD_DIR)
        set(image_file ${build_dir}/srmodels/srmodels.bin)

        add_custom_command(
            OUTPUT ${image_file}
            COMMENT "Move and Pack models..."
            COMMAND python ${MVMODEL_EXE} -d1 ${SDKCONFIG} -d2 ${COMPONENT_PATH} -d3 ${build_dir}
            DEPENDS ${SDKCONFIG}
            VERBATIM)

        add_custom_target(srmodels_bin ALL DEPENDS ${image_file})
        add_dependencies(flash srmodels_bin)
        esptool_py_flash_to_partition(flash "model" "${image_file}")
    else()
        message(WARNING "Failed to find model in partition table file. "
                        "Please add a line(Name=model) to the partition file if you want to use esp-sr models.")
    endif()
endif()
```

--------------------------------

### Initialize and Use Chinese TTS

Source: https://context7.com/espressif/esp-sr/llms.txt

Initializes the TTS module by finding a voice data partition, creating a voice set from a template, and then synthesizing Chinese text, pinyin, or monetary amounts. Ensure the 'voice_data' partition exists and is correctly formatted. The speech rate can be adjusted during playback.

```c
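#include <stdio.h>
#include "driver/i2s.h"              // i2s_write()
#include "esp_tts.h"
#include "esp_tts_voice_xiaole.h"
#include "esp_tts_voice_template.h"  // esp_tts_voice_template
#include "esp_partition.h"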

// Initialize voice set from partition
const esp_partition_t *part = esp_partition_find_first(
    ESP_PARTITION_TYPE_DATA,
    ESP_PARTITION_SUBTYPE_DATA_FAT,
    "voice_data"
);
if (part == NULL) {
    printf("Voice data partition not found!\n");
    return;
}

spi_flash_mmap_handle_t mmap;
uint16_t *voicedata;
esp_partition_mmap(part, 0, part->size, SPI_FLASH_MMAP_DATA,
                   (const void **)&voicedata, &mmap);

// Create voice set from template
esp_tts_voice_t *voice = esp_tts_voice_set_init(&esp_tts_voice_template, voicedata);

// Create TTS handle
esp_tts_handle_t tts = esp_tts_create(voice);

// Receives the byte count from each i2s_write() call below
size_t bytes_written = 0;

// Synthesize Chinese text
char *text = "欢迎使用乐鑫语音合成";
if (esp_tts_parse_chinese(tts, text)) {
    int len = 0;
    do {
        // Get audio data stream (speed: 0=slowest, 5=fastest)
        short *data = esp_tts_stream_play(tts, &len, 3);  // speed=3

        if (len > 0) {
            // Output via I2S (16-bit mono @ 16kHz)
            i2s_write(I2S_NUM_0, data, len * sizeof(short), 
                      &bytes_written, portMAX_DELAY);
        }
    } while (len > 0);
}

// Synthesize using pinyin
char *pinyin = "da4 jia1 hao3";  // 大家好
if (esp_tts_parse_pinyin(tts, pinyin)) {
    int len = 0;
    do {
        short *data = esp_tts_stream_play(tts, &len, 2);
        if (len > 0) {
            i2s_write(I2S_NUM_0, data, len * sizeof(short), 
                      &bytes_written, portMAX_DELAY);
        }
    } while (len > 0);
}

// Synthesize payment amounts (for payment terminals)
// yuan=72, jiao=1, fen=0, mode=ALI_PAY_MODE
if (esp_tts_parse_money(tts, 72, 1, 0, ALI_PAY_MODE)) {
    // "支付宝收款 72.1 元"
    int len = 0;
    do {
        short *data = esp_tts_stream_play(tts, &len, 2);
        if (len > 0) {
            i2s_write(I2S_NUM_0, data, len * sizeof(short), 
                      &bytes_written, portMAX_DELAY);
        }
    } while (len > 0);
}
// Pay modes: NONE_MODE, ALI_PAY_MODE, WEIXIN_PAY_MODE

// Reset TTS state for next synthesis
esp_tts_stream_reset(tts);

// Cleanup
esp_tts_destroy(tts);
esp_tts_voice_set_free(voice);

```

--------------------------------

### Configure ESP-SR Component

Source: https://github.com/espressif/esp-sr/blob/master/esp-tts/CMakeLists.txt

Sets the include directory for the Chinese TTS headers, registers the component, and links the prebuilt TTS and voice-set libraries that match `IDF_TARGET` (`esp32`, `esp32s2`, or `esp32s3`).

```cmake
set(COMPONENT_ADD_INCLUDEDIRS
    ./esp_tts_chinese/include
    )

register_component()

target_link_libraries(${COMPONENT_TARGET} INTERFACE "-L ${CMAKE_CURRENT_SOURCE_DIR}/esp_tts_chinese")

if(IDF_TARGET STREQUAL "esp32")
target_link_libraries(${COMPONENT_TARGET} INTERFACE
    esp_tts_chinese 
    voice_set_xiaole 
    voice_set_template
    )
endif()

if(IDF_TARGET STREQUAL "esp32s2")
target_link_libraries(${COMPONENT_TARGET} INTERFACE
    esp_tts_chinese_esp32s2 
    voice_set_xiaole_esp32s2
    voice_set_template_esp32s2
    )
endif()

if(IDF_TARGET STREQUAL "esp32s3")
target_link_libraries(${COMPONENT_TARGET} INTERFACE
    esp_tts_chinese_esp32s3
    voice_set_xiaole_esp32s3
    )
endif()
```

--------------------------------

### Initialize and Stream TTS Synthesis

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/speech_synthesis/readme.rst

Maps the voice data partition into memory, initializes the voice set, and streams synthesized Chinese text to I2S output.

```c
#include "esp_tts.h"
#include "esp_tts_voice_female.h"
#include "esp_partition.h"

/*** 1. Create the esp tts handle ***/

// Initialize the voice set from a separate voice data partition
const esp_partition_t *part = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_DATA_FAT, "voice_data");
if (part == NULL) {
    printf("Couldn't find voice data partition!\n");
    return;
}
spi_flash_mmap_handle_t mmap;
uint16_t *voicedata;
esp_err_t err = esp_partition_mmap(part, 0, part->size, SPI_FLASH_MMAP_DATA, (const void **)&voicedata, &mmap);
if (err != ESP_OK) {
    printf("Couldn't map voice data partition!\n");
    return;
}
esp_tts_voice_t *voice = esp_tts_voice_set_init(&esp_tts_voice_template, voicedata);
esp_tts_handle_t tts_handle = esp_tts_create(voice);

/*** 2. Parse text and synthesize wave data ***/
char *text="欢迎使用乐鑫语音合成";
if (esp_tts_parse_chinese(tts_handle, text)) {  // parse text into pinyin list
    int len[1]={0};
    do {
        short *data=esp_tts_stream_play(tts_handle, len, 4); // streaming synthesis
        i2s_audio_play(data, len[0]*2, portMAX_DELAY);  // i2s output
    } while(len[0]>0);
    i2s_zero_dma_buffer(0);
}
```

--------------------------------

### Add Speech Commands via Reset Function

Source: https://github.com/espressif/esp-sr/blob/master/tool/README.md

Use the `multinet->reset` function to dynamically add speech commands. Command IDs are separated by ';', and alternative phrases for the same command ID are separated by ','.

```c
// Function definition
// typedef void (*esp_mn_iface_op_reset_t)(model_iface_data_t *model_data, char *command_str, char *err_phrase_id);

// "," separates different phrases that share the same command id
// ";" separates different command ids
char *new_commands_str = "hcLb WkLD,hi fST;TkN nN jc LiT;TkN eF jc LiT;";
char err_id[256];
multinet->reset(model_data, new_commands_str, err_id);
// hello world,hi ESP -> command id=0
// turn on the light -> command id=1
// turn off the light -> command id=2
```
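
The separator rules can be illustrated with a small, plain C sketch that runs on any host. `command_id_of_phrase` is a hypothetical helper, not part of ESP-SR, and it assumes that IDs start at 0 and follow the order of the ';'-separated groups, as the comments in the example suggest.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an ESP-SR API): returns the command ID that a
 * phrase would receive in a command string where ';' separates command IDs
 * and ',' separates phrases sharing one ID. IDs are assumed to start at 0.
 * Returns -1 if the phrase is not present.
 */
int command_id_of_phrase(const char *commands, const char *phrase)
{
    size_t phrase_len = strlen(phrase);
    int id = 0;
    const char *p = commands;

    while (*p != '\0') {
        /* Length of the current phrase: up to the next ',' or ';' or end. */
        size_t len = strcspn(p, ",;");
        if (len == phrase_len && strncmp(p, phrase, len) == 0) {
            return id;
        }
        p += len;
        if (*p == ';') {
            id++;   /* the next group gets the next command ID */
        }
        if (*p != '\0') {
            p++;    /* skip the separator */
        }
    }
    return -1;
}
```

With the string from the example, both `"hcLb WkLD"` and `"hi fST"` map to the same ID, while each light command gets its own.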

--------------------------------

### Process Audio with MultiNet for Command Detection

Source: https://context7.com/espressif/esp-sr/llms.txt

This snippet shows how to process audio buffers using a MultiNet model after wake word detection. It continuously detects speech commands, retrieves results including command IDs and probabilities, and handles recognized commands. The model is cleaned after each detection or timeout. Ensure `audio_buffer` is filled with audio data matching `mn_chunksize`.

```c
// Process audio after wake word detection
int16_t *audio_buffer = (int16_t *)malloc(mn_chunksize * sizeof(int16_t));
while (1) {
    // Get audio from AFE fetch
    // audio_buffer should match mn_chunksize

    esp_mn_state_t mn_state = multinet->detect(mn_model, audio_buffer);

    if (mn_state == ESP_MN_STATE_DETECTING) {
        // Still listening...
    } else if (mn_state == ESP_MN_STATE_DETECTED) {
        // Command recognized
        esp_mn_results_t *results = multinet->get_results(mn_model);

        printf("Recognized: %s\n", results->string);
        printf("Top %d results:\n", results->num);

        for (int i = 0; i < results->num; i++) {
            int cmd_id = results->command_id[i];
            float prob = results->prob[i];
            char *cmd_str = esp_mn_commands_get_string(cmd_id);
            printf("  [%d] %s (prob: %.2f)\n", cmd_id, cmd_str, prob);
        }

        // Use top result
        int best_command = results->command_id[0];
        handle_command(best_command);

        // Reset for next recognition
        multinet->clean(mn_model);

    } else if (mn_state == ESP_MN_STATE_TIMEOUT) {
        // No command detected within timeout
        printf("Recognition timeout\n");
        multinet->clean(mn_model);
        break;  // Wait for next wake word
    }
}
free(audio_buffer);  // Release the buffer once recognition ends
```

--------------------------------

### Run Multinet G2P Script

Source: https://github.com/espressif/esp-sr/blob/master/tool/README.md

Run `multinet_g2p.py` with the `-t` option and a command string in which ',' separates phrases and ';' separates commands. The script prints the phoneme representation of the input text.

```bash
python multinet_g2p.py -t "hello world,hi ESP;turn on the light;turn off the light"
```

```text
------
in: hello world,hi ESP;turn on the light;turn off the light
out: hcLb WkLD,hi fST;TkN nN jc LiT;TkN eF jc LiT;
```

--------------------------------

### MultiNet6 Command Configuration Format

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/speech_command_recognition/README.rst

Format for defining English speech commands using command IDs and graphemes.

```text
# command_id,command_grapheme
1,TELL ME A JOKE
2,MAKE A COFFEE
```

--------------------------------

### Allocate Model Partition in CSV

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/flash_model/README.rst

Add this line to your partitions.csv file to reserve space for speech recognition models.

```csv
model,  data,        ,         ,    6000K
```
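
To check whether a model image fits, it can help to convert the size field to bytes. ESP-IDF partition tables accept `K` and `M` suffixes as multiples of 1024 and 1024*1024. The following is a minimal host-side sketch; `partition_size_bytes` is a hypothetical helper, not an ESP-IDF function.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of ESP-IDF): converts a partition-table
 * size field such as "6000K", "4M", or "0x10000" to bytes.
 * Returns 0 on malformed input.
 */
unsigned long partition_size_bytes(const char *field)
{
    char *end = NULL;
    /* Base 0 accepts decimal and 0x-prefixed hex (a leading 0 is read as octal). */
    unsigned long value = strtoul(field, &end, 0);

    if (end == field) {
        return 0;   /* no digits found */
    }
    if ((*end == 'K' || *end == 'k') && end[1] == '\0') {
        return value * 1024UL;
    }
    if ((*end == 'M' || *end == 'm') && end[1] == '\0') {
        return value * 1024UL * 1024UL;
    }
    return (*end == '\0') ? value : 0;
}
```

Under this convention, `6000K` reserves 6,144,000 bytes for the model partition.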

--------------------------------

### MultiNet7 Command Configuration Format

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/speech_command_recognition/README.rst

Format for defining English speech commands using command IDs, graphemes, and phonemes.

```text
# command_id,command_grapheme,command_phoneme
1,tell me a joke,TfL Mm c qbK
2,sing a song,Sgl c Sel
```
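
As a rough illustration of the three-field layout, the sketch below splits one line into its ID, grapheme, and phoneme parts. `mn7_command_t` and `parse_mn7_line` are hypothetical names, and the 64-byte field buffers are an assumption made for the example, not a limit taken from ESP-SR.

```c
#include <stdio.h>

/* Hypothetical record for one line of the command file. */
typedef struct {
    int  id;
    char grapheme[64];
    char phoneme[64];
} mn7_command_t;

/*
 * Hypothetical parser (not an ESP-SR API): splits one
 * "command_id,command_grapheme,command_phoneme" line into its fields.
 * Returns 1 on success, 0 for comments, blank lines, or malformed input.
 */
int parse_mn7_line(const char *line, mn7_command_t *out)
{
    if (line[0] == '#' || line[0] == '\0' || line[0] == '\n') {
        return 0;
    }
    /* %63[^,\n] reads up to 63 characters that are neither ',' nor newline. */
    int n = sscanf(line, "%d,%63[^,\n],%63[^,\n]",
                   &out->id, out->grapheme, out->phoneme);
    return n == 3;
}
```

A line that carries only an ID and a grapheme is rejected by this sketch, because it expects all three fields.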

--------------------------------

### Process VAD Cache and Fetch Results

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/vadnet/README.rst

Fetch audio data using the AFE handle. When `vad_cache_size` is greater than zero, the result carries cached VAD audio; writing it out first helps prevent truncation of the first word. The snippet also prints the current VAD state, which is either 'noise' or 'speech'.

```c
afe_fetch_result_t *result = afe_handle->fetch(afe_data);
if (result->vad_cache_size > 0) {
    printf("vad cache size: %d\n", result->vad_cache_size);
    fwrite(result->vad_cache, 1, result->vad_cache_size, fp);  // fp: an already-open output FILE*
}

printf("vad state: %s\n", result->vad_state == VAD_SILENCE ? "noise" : "speech");
```
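
The ordering matters when the cached audio and the current frame go to the same destination. The sketch below uses a stand-in struct rather than the real `afe_fetch_result_t`, and it assumes the cached audio precedes the current frame in time. All names in it are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the fetch-result fields used here (hypothetical type). */
typedef struct {
    const int16_t *data;        /* current audio frame */
    int data_size;              /* frame size in bytes */
    const int16_t *vad_cache;   /* audio cached before speech onset */
    int vad_cache_size;         /* cache size in bytes, 0 if none */
} fetch_result_sketch_t;

/*
 * Copies the VAD cache (if any) followed by the current frame into `out`.
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
size_t append_with_vad_cache(const fetch_result_sketch_t *r,
                             uint8_t *out, size_t cap)
{
    size_t cache = r->vad_cache_size > 0 ? (size_t)r->vad_cache_size : 0;
    size_t frame = r->data_size > 0 ? (size_t)r->data_size : 0;

    if (cache + frame > cap) {
        return 0;
    }
    if (cache > 0) {
        memcpy(out, r->vad_cache, cache);   /* earlier audio first */
    }
    if (frame > 0) {
        memcpy(out + cache, r->data, frame);
    }
    return cache + frame;
}
```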

--------------------------------

### Print Active Speech Commands

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/speech_command_recognition/README.rst

Prints all currently active speech commands.

```c
/**
* @brief Print all commands in linked list.
*/
void esp_mn_active_commands_print(void);
```

--------------------------------

### Feed Audio Data to AFE

Source: https://github.com/espressif/esp-sr/blob/master/docs/en/audio_front_end/README.rst

Allocate a buffer for audio data based on the AFE's required chunk size and channel count.

```c
int feed_chunksize = afe_handle->get_feed_chunksize(afe_data);
int feed_nch = afe_handle->get_feed_channel_num(afe_data);
int16_t *feed_buff = (int16_t *) malloc(feed_chunksize * feed_nch * sizeof(int16_t));
```
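
The size calculation can be wrapped in a small helper that also validates its inputs. This is a host-side sketch; `alloc_feed_buffer` is a hypothetical function, and the chunk size and channel count in the usage note are example values, not values reported by the AFE.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocates a zeroed buffer for `chunksize` samples
 * per channel and `nch` channels of 16-bit audio. If `bytes_out` is not
 * NULL, it receives the buffer size in bytes. Returns NULL on invalid
 * input or allocation failure. The caller frees the buffer with free().
 */
int16_t *alloc_feed_buffer(int chunksize, int nch, size_t *bytes_out)
{
    if (chunksize <= 0 || nch <= 0) {
        return NULL;
    }
    size_t samples = (size_t)chunksize * (size_t)nch;
    int16_t *buf = calloc(samples, sizeof(int16_t));
    if (buf != NULL && bytes_out != NULL) {
        *bytes_out = samples * sizeof(int16_t);
    }
    return buf;
}
```

For example, a chunk size of 512 samples with 3 channels would need 512 * 3 * 2 = 3072 bytes.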