### Install wasmtime CLI

Source: https://github.com/linebender/fearless_simd/blob/main/fearless_simd_tests/README.md

Install the wasmtime command-line interface for running WebAssembly binaries. This is a prerequisite for executing the WebAssembly tests.

```sh
cargo install --locked wasmtime-cli
```

--------------------------------

### SimdInto Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to use the .simd_into() method for converting scalar values and arrays into SIMD vectors.

```rust
dispatch!(level, simd => {
    let v: f32x4<_> = 1.0.simd_into(simd);
    let v: f32x4<_> = [1.0, 2.0, 3.0, 4.0].simd_into(simd);
});
```

--------------------------------

### u8x16 Initialization Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Shows how to initialize u8x16 vectors using splat and from_slice.

```rust
dispatch!(level, simd => {
    let ascii = u8x16::splat(simd, 65);  // ASCII 'A'
    let bytes = u8x16::from_slice(simd, b"Hello, World!!!!");  // Exactly 16 bytes, one per lane
});
```

--------------------------------

### Install wasmi CLI with SIMD support

Source: https://github.com/linebender/fearless_simd/blob/main/fearless_simd_tests/README.md

Install the wasmi command-line interface with SIMD support enabled. This is an alternative runtime for executing WebAssembly tests.

```sh
cargo install --locked --features simd wasmi_cli
```

--------------------------------

### WebAssembly SIMD Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Illustrates using the `kernel!` macro for WebAssembly SIMD128 intrinsics, demonstrating 32-bit float vector addition.

```rust
use fearless_simd::kernel;

#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
use core::arch::wasm32::{v128, f32x4_add};

kernel!(
    fn add_f32x4_wasm(wasm: WasmSimd128, a: v128, b: v128) -> v128 {
        f32x4_add(a, b)
    }
);
```

--------------------------------

### SimdMask Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to build a mask with the `simd_lt` comparison and then query it with `any` and `all` to branch on the result of a lane-wise comparison.

```rust
dispatch!(level, simd => {
    let a: f32x4<_> = ...;
    let b: f32x4<_> = ...;
    
    let mask = a.simd_lt(b);
    if mask.any() {
        println!("At least one element of a < b");
    }
    if mask.all() {
        println!("All elements of a < b");
    }
});
```

--------------------------------

### ARM Neon Intrinsics Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Shows how to use the `kernel!` macro for ARM Neon intrinsics, specifically for 32-bit float vector addition.

```rust
use fearless_simd::kernel;

#[cfg(target_arch = "aarch64")]
use core::arch::aarch64::{float32x4_t, vaddq_f32};

kernel!(
    fn add_f32x4_intrinsic(neon: Neon, a: float32x4_t, b: float32x4_t) -> float32x4_t {
        vaddq_f32(a, b)
    }
);
```

--------------------------------

### x86_64 AVX2 Intrinsics Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Demonstrates how to use the `kernel!` macro to wrap x86_64 AVX2 intrinsic functions for 32-bit integer addition.

```rust
use fearless_simd::kernel;

#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_add_epi32};

kernel!(
    fn add_i32x8_intrinsic(avx2: Avx2, a: __m256i, b: __m256i) -> __m256i {
        _mm256_add_epi32(a, b)
    }
);
```

--------------------------------

### i8x16 Wrapping Arithmetic Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Demonstrates wrapping arithmetic for i8x16 vectors. Addition and multiplication wrap on overflow.

```rust
dispatch!(level, simd => {
    let a = i8x16::splat(simd, 100);
    let b = i8x16::splat(simd, 50);
    let sum = a + b;  // Wraps: [150 -> -106, 150 -> -106, ...]
});
```
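For intuition, the same lane-wise wrapping can be modeled in plain scalar Rust. This is a sketch that does not use fearless_simd: `wrapping_add_lanes` is a hypothetical helper, and a `[i8; 16]` array stands in for an `i8x16` vector.

```rust
/// Scalar model of lane-wise wrapping addition (hypothetical helper, not a
/// fearless_simd API). A `[i8; 16]` array stands in for an `i8x16` vector.
fn wrapping_add_lanes(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    // `i8::wrapping_add` wraps around on overflow instead of panicking.
    std::array::from_fn(|i| a[i].wrapping_add(b[i]))
}
```

With every lane of `a` set to 100 and every lane of `b` set to 50, each output lane is 150 - 256 = -106, which matches the comment in the example above.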

--------------------------------

### Usage Example of WithSimd

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to use the `WithSimd` trait with a closure. This example creates a `Level` and then dispatches a closure to it, returning a `u32` value.

```rust
let level = Level::new();
let result = level.dispatch(|_level| {
    42u32  // Returns u32
});
```

--------------------------------

### Recommended Library API Pattern with Fearless SIMD

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

This example demonstrates a recommended pattern for libraries using Fearless SIMD, where the public API accepts a Level parameter to dispatch to appropriate SIMD implementations. It includes separate functions for AVX2, SSE4.2, NEON, and fallback.

```rust
use fearless_simd::{Level, Simd};

// Public API accepts Level parameter
pub fn process_data(level: Level, data: &mut [f32]) {
    match level {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Avx2(avx2) => process_avx2(avx2, data),
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Sse4_2(sse42) => process_sse42(sse42, data),
        #[cfg(target_arch = "aarch64")]
        Level::Neon(neon) => process_neon(neon, data),
        Level::Fallback(fb) => process_fallback(fb, data),
        _ => {}
    }
}

#[inline(always)]
fn process_avx2<S: Simd>(simd: S, data: &mut [f32]) {
    // AVX2-specific implementation
}

#[inline(always)]
fn process_sse42<S: Simd>(simd: S, data: &mut [f32]) {
    // SSE4.2 implementation
}

#[inline(always)]
fn process_neon<S: Simd>(simd: S, data: &mut [f32]) {
    // NEON implementation
}

#[inline(always)]
fn process_fallback<S: Simd>(simd: S, data: &mut [f32]) {
    // Fallback implementation
}

// Application code
fn main() {
    let level = Level::new();
    let mut data = vec![1.0, 2.0, 3.0, 4.0];
    process_data(level, &mut data);
}
```

--------------------------------

### Array to Vector Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Shows how to convert a `[f32; 4]` array into an `f32x4` SIMD vector, either with `.simd_into()` (provided by `SimdInto`) or directly with `simd_from`.

```rust
dispatch!(level, simd => {
    let v: f32x4<_> = [1.0, 2.0, 3.0, 4.0].simd_into(simd);
    let v: f32x4<_> = f32x4::simd_from(simd, [1.0, 2.0, 3.0, 4.0]);
});
```

--------------------------------

### Using Kernel-Generated Functions Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Demonstrates how to use a function generated by the `kernel!` macro in a `main` function, including obtaining a SIMD level token and converting between vector types and raw intrinsic types.

```rust
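use fearless_simd::{i32x8, kernel, Level, Simd, dispatch, prelude::*};

// Raw AVX2 type and intrinsic used by the kernel below
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_add_epi32};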

kernel!(
    fn add_i32x8_intrinsic(avx2: Avx2, a: __m256i, b: __m256i) -> __m256i {
        _mm256_add_epi32(a, b)
    }
);

fn main() {
    let level = Level::new();
    if let Some(avx2) = level.as_avx2() {
        let a: i32x8<_> = [1, 2, 3, 4, 5, 6, 7, 8].simd_into(avx2);
        let b: i32x8<_> = [10, 20, 30, 40, 50, 60, 70, 80].simd_into(avx2);
        
        // Convert vector to raw intrinsic type
        let a_raw: __m256i = a.into();
        let b_raw: __m256i = b.into();
        
        // Call kernel function
        let sum_raw = add_i32x8_intrinsic(avx2, a_raw, b_raw);
        
        // Convert back to vector
        let sum: i32x8<_> = sum_raw.simd_into(avx2);
        
        assert_eq!(
            <[i32; 8]>::from(sum),
            [11, 22, 33, 44, 55, 66, 77, 88]
        );
    }
}
```

--------------------------------

### Scalar to Vector (Splat) Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to convert an `f32` scalar into an `f32x4` SIMD vector, either with `.simd_into()` (provided by `SimdInto`) or directly with `simd_from`. Both forms broadcast the scalar to every lane.

```rust
dispatch!(level, simd => {
    let v: f32x4<_> = 1.0.simd_into(simd);  // Via SimdInto
    let v: f32x4<_> = f32x4::simd_from(simd, 1.0);  // Direct
    // Both create [1.0, 1.0, 1.0, 1.0]
});
```

--------------------------------

### SimdSplit Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Shows how to split a `f32x8` vector into two `f32x4` vectors using the `split` method. The method returns a tuple containing the lower and higher halves of the original vector.

```rust
dispatch!(level, simd => {
    let wide: f32x8<_> = ...;
    let (lo, hi): (f32x4<_>, f32x4<_>) = wide.split();
});
```

--------------------------------

### SimdCombine Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Illustrates how to combine two `f32x4` vectors into a single `f32x8` vector using the `combine` method. This operation effectively doubles the width of the vector.

```rust
dispatch!(level, simd => {
    let a: f32x4<_> = ...;
    let b: f32x4<_> = ...;
    let combined: f32x8<_> = a.combine(b);  // Doubles width
});
```

--------------------------------

### SimdCvtTruncate Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to use `truncate_from` and `truncate_from_precise` to convert a `f32x4` vector to `i32x4`. The `truncate_from` method has undefined behavior for out-of-range values, while `truncate_from_precise` provides safe, saturating behavior.

```rust
dispatch!(level, simd => {
    let f: f32x4<_> = ...;
    let i: i32x4<_> = i32x4::truncate_from(f);          // Undefined for OOB
    let i2: i32x4<_> = i32x4::truncate_from_precise(f); // Safe, saturating
});
```
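To make "saturating" concrete, here is a scalar sketch built on Rust's `as` cast, which truncates toward zero, clamps to the `i32` range, and maps NaN to 0. `truncate_saturating` is a hypothetical helper, not part of fearless_simd, and the source does not state whether `truncate_from_precise` treats NaN the same way.

```rust
/// Scalar model of a saturating float-to-int conversion (hypothetical helper).
/// A `[f32; 4]` array stands in for an `f32x4` vector.
fn truncate_saturating(lanes: [f32; 4]) -> [i32; 4] {
    // Rust's `as` cast truncates toward zero, saturates at the bounds of
    // `i32`, and converts NaN to 0.
    lanes.map(|x| x as i32)
}
```

For example, `3.0e9` is above `i32::MAX`, so it clamps to `i32::MAX` rather than producing an arbitrary value.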

--------------------------------

### SSE4.2 Intrinsic Access Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Demonstrates how to define a kernel function for SSE4.2 using `kernel!` macro for specific operations like `_mm_add_ps`. This requires the `std::arch::x86_64::_mm_add_ps` intrinsic and is conditional on the target architecture.

```rust
use fearless_simd::kernel;

#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128, _mm_add_ps};

kernel!(
    fn add_sse42(sse42: Sse4_2, a: __m128, b: __m128) -> __m128 {
        _mm_add_ps(a, b)
    }
);
```

--------------------------------

### SimdElement Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Illustrates how to use the SimdElement trait in generic function bounds to access the associated mask type.

```rust
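// Example usage in trait bounds:
fn example<T: SimdElement>() {
    // `T::Mask` is the mask type that corresponds to `T`. A `type` alias
    // declared inside the function body cannot name the outer generic `T`,
    // so the associated type is used directly.
    let _mask_type_name = core::any::type_name::<T::Mask>();
}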
```

--------------------------------

### Getting SIMD Token via Witness

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Shows how to retrieve the SIMD token from a vector using the `witness()` method. This is part of the `SimdBase` trait.

```rust
dispatch!(level, simd => {
    let v: f32x4<_> = f32x4::splat(simd, 1.0);
    let token = v.witness();  // Get back the Simd token
});
```

--------------------------------

### Rust SimdCvtFloat Trait Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates converting an `i32x4` integer SIMD vector to an `f32x4` floating-point SIMD vector using the `SimdCvtFloat` trait. The snippet runs inside `dispatch!`, which selects the SIMD level behind the vector types.

```rust
dispatch!(level, simd => {
    let i: i32x4<_> = ...;
    let f: f32x4<_> = f32x4::float_from(i);
});
```

--------------------------------

### Rust Select Trait Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates using the `Select` trait to choose, lane by lane, between two SIMD vectors based on a mask. Because the mask is `a < b`, taking `a` where it is set and `b` elsewhere yields the lane-wise minimum.

```rust
dispatch!(level, simd => {
    let a: f32x4<_> = ...;
    let b: f32x4<_> = ...;

    let mask: mask32x4<_> = a.simd_lt(b);
    let result = mask.select(a, b);  // Returns min(a, b)
});
```
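The compare-then-select idiom can be checked against a scalar model. The sketch below uses plain arrays in place of SIMD vectors. `lt_mask` and `select_lanes` are hypothetical helpers written for illustration, not fearless_simd functions.

```rust
/// Lane-wise `a < b`, producing one boolean per lane.
fn lt_mask(a: [f32; 4], b: [f32; 4]) -> [bool; 4] {
    std::array::from_fn(|i| a[i] < b[i])
}

/// Picks `if_true[i]` where the mask lane is set, otherwise `if_false[i]`.
fn select_lanes(mask: [bool; 4], if_true: [f32; 4], if_false: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| if mask[i] { if_true[i] } else { if_false[i] })
}
```

Selecting with `lt_mask(a, b)` returns the smaller element in each lane. When two lanes are equal the mask is false and the value from `b` is taken, which is the same number.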

--------------------------------

### Define sRGB to Linear RGB Conversion with AVX2 Intrinsics

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Defines a kernel function for sRGB to linear RGB conversion using AVX2 intrinsics. This example is simplified and focuses on the `kernel!` macro usage.

```rust
use fearless_simd::kernel;

// sRGB to linear RGB using intrinsics
kernel!(
    fn srgb_to_linear_avx2(
        avx2: Avx2,
        r: __m256,
        g: __m256,
        b: __m256
    ) -> (__m256, __m256, __m256) {
        // Use specific AVX2 instructions for exact behavior needed
        // (Example only; actual conversion more complex)
        (r, g, b)  // Simplified
    }
);
```

--------------------------------

### Rust Bytes Trait Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Shows how to use the `Bytes` trait to convert a floating-point SIMD vector to bytes, bitcast it to unsigned integers, and reconstruct it from bytes. All three operations reinterpret the same bits; no numeric conversion takes place.

```rust
dispatch!(level, simd => {
    let f: f32x4<_> = ...;
    
    // Convert to bytes
    let bytes: u8x16<_> = f.to_bytes();
    
    // Bitcast to unsigned integers
    let u: u32x4<_> = f.bitcast();
    
    // Reconstruct from bytes
    let f2: f32x4<_> = f32x4::from_bytes(bytes);
});
```
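A scalar sketch of what "bitcast" and "to bytes" mean for four `f32` lanes follows. The helper names are hypothetical, and the use of native byte order is an assumption. The source does not specify how the library lays out bytes.

```rust
/// Reinterprets each f32 lane as its raw IEEE 754 bit pattern.
fn f32_lanes_to_bits(lanes: [f32; 4]) -> [u32; 4] {
    lanes.map(f32::to_bits)
}

/// Flattens four f32 lanes into 16 bytes (native byte order, an assumption).
fn f32_lanes_to_bytes(lanes: [f32; 4]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (dst, lane) in out.chunks_exact_mut(4).zip(lanes) {
        dst.copy_from_slice(&lane.to_ne_bytes());
    }
    out
}

/// Rebuilds the four f32 lanes from 16 bytes written by `f32_lanes_to_bytes`.
fn f32_lanes_from_bytes(bytes: [u8; 16]) -> [f32; 4] {
    std::array::from_fn(|i| {
        let mut word = [0u8; 4];
        word.copy_from_slice(&bytes[i * 4..i * 4 + 4]);
        f32::from_ne_bytes(word)
    })
}
```

The round trip through bytes is lossless because only the representation changes. For instance, `1.0f32` has the bit pattern `0x3F80_0000`.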

--------------------------------

### Import Prelude

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/09-api-index.md

Re-exports all public traits for ergonomic use, making them available as methods on vector types.

```rust
use fearless_simd::prelude::*;
```

--------------------------------

### Fearless SIMD Module Structure

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/01-overview.md

Illustrates the directory structure of the fearless_simd crate, showing the organization of its source files.

```text
fearless_simd/src/
├── lib.rs                    # Entry point, Level enum, re-exports
├── traits.rs                 # User-facing traits (SimdFrom, Select, Bytes, etc.)
├── macros.rs                 # dispatch! and kernel! macros
├── kernel_macros.rs          # kernel! implementation
├── support.rs                # Internal utilities
├── transmute.rs              # Bit-casting operations
└── generated/                # Auto-generated code
    ├── simd_trait.rs         # Simd trait definition
    ├── simd_types.rs         # Vector type definitions
    ├── ops.rs                # Operator overloads
    ├── fallback.rs           # Scalar implementation
    ├── sse4_2.rs             # SSE4.2 implementation
    ├── avx2.rs               # AVX2 implementation
    ├── neon.rs               # Neon implementation
    └── wasm.rs               # WebAssembly implementation
```

--------------------------------

### Dispatch SIMD Functions

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/README.md

Demonstrates how to use the `dispatch!` macro to call a function with the appropriate SIMD implementation based on the available hardware capabilities. Requires a `Level` context.

```rust
let level = Level::new();
dispatch!(level, simd => my_simd_function(simd, data));
```

--------------------------------

### Get Current SIMD Level

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/03-simd-trait.md

Retrieves the SIMD level token for the current SIMD implementation. This is useful for conditional logic or logging the active SIMD level.

```rust
use fearless_simd::{Level, Simd, dispatch};

dispatch!(Level::new(), simd => {
    let current_level = simd.level();
    println!("Running with level: {:?}", current_level);
});
```

--------------------------------

### Common Usage Pattern for Fearless SIMD

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/01-overview.md

Demonstrates how to define and use SIMD-aware functions with Fearless SIMD, including CPU capability detection and dispatch.

```rust
use fearless_simd::{Level, Simd, dispatch, prelude::*};

// Define SIMD-aware function
#[inline(always)]
fn my_simd_function<S: Simd>(simd: S, data: &mut [f32]) {
    // Use simd parameter to access SIMD operations
    // All operations must stay generic over S: Simd
}

// In application code
fn main() {
    let mut data = vec![1.0, 2.0, 3.0, 4.0];  // Example input
    let level = Level::new();  // Detect CPU capabilities once
    dispatch!(level, simd => {
        my_simd_function(simd, &mut data);
    });
}
```

--------------------------------

### Basic SIMD Operation with Dispatch

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/09-api-index.md

Demonstrates basic SIMD operations using the dispatch macro for automatic selection of the appropriate SIMD level. Requires importing Level, Simd, dispatch, f32x4, and prelude traits.

```rust
use fearless_simd::{Level, Simd, dispatch, f32x4, prelude::*};

let level = Level::new();
dispatch!(level, simd => {
    let v: f32x4<_> = [1.0, 2.0, 3.0, 4.0].simd_into(simd);
    // Use v with prelude traits
});
```

--------------------------------

### Get Mutable Slice View of Vector Elements

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Use `as_mut_slice` to obtain a mutable slice view of the vector's elements. This allows modifying elements directly.

```rust
let slice: &mut [f32] = v.as_mut_slice();
slice[0] = 1.0;
```

--------------------------------

### Constructing Mask Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Demonstrates how to create mask vectors from comparisons, scalar booleans, and bitmasks. Ensure the appropriate SIMD context is available.

```rust
dispatch!(level, simd => {
    let a = f32x4::splat(simd, 1.0);
    let b = f32x4::splat(simd, 2.0);
    
    // Create from comparison
    let mask: mask32x4<_> = a.simd_lt(b);  // All lanes true (1.0 < 2.0)
    
    // Create from scalar boolean
    let all_true: mask8x16<_> = mask8x16::splat(simd, true);
    let all_false: mask8x16<_> = mask8x16::splat(simd, false);
    
    // Create from bitmask
    let mask_from_bits = mask8x16::from_bitmask(simd, 0xFF); // First 8 lanes true
});
```

--------------------------------

### Get Slice View of Vector Elements

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Use `as_slice` to obtain an immutable slice view of the vector's elements. This allows reading elements without copying.

```rust
let slice: &[f32] = v.as_slice();
```

--------------------------------

### WasmSimd128 Intrinsic Access

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Access WebAssembly SIMD128 intrinsics directly using `core::arch::wasm32`. This example demonstrates adding two `v128` vectors using `f32x4_add`.

```rust
use fearless_simd::kernel;

#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
use core::arch::wasm32::{v128, f32x4_add};

kernel!(
    fn add_wasm(wasm: WasmSimd128, a: v128, b: v128) -> v128 {
        f32x4_add(a, b)
    }
);
```

--------------------------------

### Accessing NEON SIMD

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Demonstrates how to check for and obtain a NEON SIMD implementation using the Level API. This is useful for conditionally using NEON features at runtime.

```rust
use fearless_simd::Level;

let level = Level::new();
if let Some(neon) = level.as_neon() {
    // Use NEON SIMD
}
```

--------------------------------

### Load-Process-Store Pattern

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

Demonstrates a basic load-process-store pattern for SIMD operations. Ensure functions passed to `dispatch!` have `#[inline(always)]`.

```rust
#[inline(always)]
fn template<S: Simd>(simd: S, data: &mut [f32]) {
    for chunk in data.chunks_exact_mut(4) {
        // Load
        let mut vec: f32x4<_> = f32x4::from_slice(simd, chunk);
        
        // Process
        vec = vec * 2.0;
        
        // Store
        vec.store_slice(chunk);
    }
}
```
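Note that `chunks_exact_mut(4)` yields only full chunks, so a slice whose length is not a multiple of 4 leaves a tail of up to three elements unprocessed. One common way to handle it is a scalar loop over the remainder. The sketch below models the vector step with a plain `[f32; 4]` array; `double_in_place` is a hypothetical name.

```rust
/// Doubles every element, four at a time, then handles the leftover tail.
fn double_in_place(data: &mut [f32]) {
    let mut chunks = data.chunks_exact_mut(4);
    for chunk in &mut chunks {
        // "Load" four lanes, process them, and "store" them back.
        let mut lanes = [0.0f32; 4];
        lanes.copy_from_slice(chunk);
        for lane in &mut lanes {
            *lane *= 2.0;
        }
        chunk.copy_from_slice(&lanes);
    }
    // `into_remainder` returns the trailing elements that did not fill a chunk.
    for x in chunks.into_remainder() {
        *x *= 2.0;
    }
}
```

In real SIMD code the inner array would be replaced by the vector load, multiply, and store shown above, while the remainder loop stays scalar.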

--------------------------------

### Slide Elements within f32x4 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/03-simd-trait.md

Concatenates two f32x4 vectors and extracts N elements starting at a specified shift. Used for rotations and shifts across the full vector width.

```rust
fn slide_f32x4<const SHIFT: usize>(self, a: f32x4<Self>, b: f32x4<Self>) -> f32x4<Self>
```
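The description can be read as "concatenate, then take a 4-lane window". The scalar sketch below models that reading. It assumes that `a` occupies the low lanes of the concatenation and that `SHIFT` counts lanes from the start; the source does not confirm either detail, and `slide_model` is a hypothetical helper.

```rust
/// Scalar model of a slide: concatenate `a` and `b`, then take four lanes
/// starting at `SHIFT`. Lane ordering is an assumption, not library fact.
fn slide_model<const SHIFT: usize>(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    assert!(SHIFT <= 4, "SHIFT must be at most the lane count");
    let mut concat = [0.0f32; 8];
    concat[..4].copy_from_slice(&a);
    concat[4..].copy_from_slice(&b);
    std::array::from_fn(|i| concat[SHIFT + i])
}
```

Under these assumptions a shift of 0 returns `a`, a shift of 4 returns `b`, and values in between return a window that straddles both inputs.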

--------------------------------

### Constructing and Using f64x2 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Illustrates the creation of an `f64x2` vector using `splat` and its conversion to a standard Rust array. This requires a `Simd` token in scope.

```rust
dispatch!(level, simd => {
    let v = f64x2::splat(simd, 3.14159);
    let arr: [f64; 2] = v.as_array();  // Deref implements this automatically
});
```

--------------------------------

### Library Code with Generic SIMD Implementation

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/09-api-index.md

Illustrates how to structure library code to handle different SIMD levels generically. It uses a match statement to select the appropriate SIMD implementation and a generic function for the core logic.

```rust
use fearless_simd::{Level, Simd, SimdBase};

pub fn process(level: Level, data: &mut [f32]) {
    match level {
        Level::Avx2(avx2) => process_impl(avx2, data),
        Level::Sse4_2(sse42) => process_impl(sse42, data),
        // ...
    }
}

#[inline(always)]
fn process_impl<S: Simd>(simd: S, data: &mut [f32]) {
    // Generic implementation
}
```

--------------------------------

### Basic Usage of dispatch! Macro

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Demonstrates how to use the dispatch! macro to call a function with the appropriate SIMD level. The function being called should be marked with #[inline(always)].

```rust
use fearless_simd::{Level, Simd, dispatch, prelude::*};

#[inline(always)]
fn add_vectors<S: Simd>(simd: S, a: &[f32], b: &[f32], out: &mut [f32]) {
    for i in 0..a.len() {
        // Actual SIMD code here
        out[i] = a[i] + b[i];
    }
}

fn main() {
    let level = Level::new();
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    let mut out = [0.0; 4];
    
    dispatch!(level, simd => add_vectors(simd, &a, &b, &mut out));
    
    assert_eq!(out, [6.0, 8.0, 10.0, 12.0]);
}
```

--------------------------------

### Run WebAssembly SIMD tests

Source: https://github.com/linebender/fearless_simd/blob/main/fearless_simd_tests/README.md

Execute the WebAssembly tests using a specified target and runtime. This command configures the Rust compiler flags to enable +simd128 and +relaxed-simd features for the WebAssembly target.

```sh
cargo test --target wasm32-wasip1 \
    --config 'target.wasm32-wasip1.rustflags = "-Ctarget-feature=+simd128,+relaxed-simd"' \
    --config 'target.wasm32-wasip1.rustdocflags = "-Ctarget-feature=+simd128,+relaxed-simd"' \
    --config 'target.wasm32-wasip1.runner = "wasmtime"' # or "wasmi_cli" if you installed that
```

--------------------------------

### Simd Trait Definition

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/03-simd-trait.md

The Simd trait defines associated types for native-width vectors of different scalar types (f32, f64, u8, i8, u16, i16, u32, i32) and mask types. It also includes methods like `level` to get the SIMD level and `vectorize` to execute a closure with SIMD enabled.

```APIDOC
## Trait: Simd

### Description
Defines core SIMD operations and associated types for native-width vectors.

### Associated Types
- `f32s`: Native-width vector of f32 elements.
- `f64s`: Native-width vector of f64 elements.
- `u8s`: Native-width vector of u8 elements.
- `i8s`: Native-width vector of i8 elements.
- `u16s`: Native-width vector of u16 elements.
- `i16s`: Native-width vector of i16 elements.
- `u32s`: Native-width vector of u32 elements.
- `i32s`: Native-width vector of i32 elements.
- `mask8s`: Mask vector for i8 elements.
- `mask16s`: Mask vector for i16 elements.
- `mask32s`: Mask vector for i32 elements.
- `mask64s`: Mask vector for i64 elements.

### Methods
- **`level(self) -> Level`**: Returns the SIMD level of the current token.
- **`vectorize<F: FnOnce() -> R, R>(self, f: F) -> R`**: Executes a closure `f` with SIMD enabled.
```

--------------------------------

### Construct Fallback SIMD Level

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Demonstrates how to obtain the Fallback SIMD level. This can be done through runtime detection or explicit selection, the latter requiring a feature flag.

```rust
use fearless_simd::Level;

// Runtime detection fallback
let level = Level::new();
if level.is_fallback() {
    println!("No SIMD support on this CPU");
}

// Explicit fallback (requires feature)
#[cfg(feature = "force_support_fallback")]
let level = Level::fallback();
```

--------------------------------

### Create f32x4 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/README.md

Demonstrates four ways to create an `f32x4` vector: splatting a scalar, converting an array, loading from a slice, and calling a function per lane. Requires a `Simd` token (`simd`) and, for `from_slice`, a slice named `slice` in scope.

```rust
let v1: f32x4<_> = f32x4::splat(simd, 1.0);           // [1.0, 1.0, 1.0, 1.0]
let v2: f32x4<_> = [1.0, 2.0, 3.0, 4.0].simd_into(simd);  // From array
let v3: f32x4<_> = f32x4::from_slice(simd, slice);    // From slice
let v4: f32x4<_> = f32x4::from_fn(simd, |i| i as f32); // From function
```

--------------------------------

### Level::as_sse4_2

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Provides access to the SSE4.2 token on x86/x86_64 architectures.

````APIDOC
## Level::as_sse4_2

### Description
Provides access to SSE4.2 token. Returns token even if AVX2 is available (hierarchical compatibility).

### Method
`as_sse4_2(self) -> Option<Sse4_2>`

### Availability
`#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]`

### CPU Features Enabled
- sse4.2, cmpxchg16b, popcnt (minimum)
- Implied by SSE4.2: sse4.1, ssse3, sse3, sse2, sse

### Usage Example
```rust
if let Some(sse42) = level.as_sse4_2() {
    // Can safely use SSE4.2 intrinsics
}
```
````

--------------------------------

### SIMD Vector Comparisons and Selection

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/README.md

Shows how to compare SIMD vectors element-wise to produce a mask, use the mask for conditional selection, and test whether any or all lanes are true. The example uses the less-than comparison `simd_lt`.

```rust
let mask = a.simd_lt(b);      // Less than (returns mask)
let result = mask.select(a, b); // Conditional selection
if mask.any() { }             // Check if any lane is true
if mask.all() { }             // Check if all lanes are true
```

--------------------------------

### SimdInto Blanket Implementation

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Provides a blanket implementation for SimdInto, allowing types that implement SimdFrom to be converted using .simd_into().

```rust
impl<F, T: SimdFrom<F, S>, S: Simd> SimdInto<T, S> for F {
    fn simd_into(self, simd: S) -> T {
        SimdFrom::simd_from(simd, self)
    }
}
```
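The blanket implementation follows the same idea as `From`/`Into` in the standard library, with an extra token argument. The self-contained sketch below reproduces the pattern with hypothetical stand-in traits (`Token`, `FromWith`, `IntoWith`) and a toy `Vec4` type. None of these names come from fearless_simd.

```rust
/// Hypothetical marker trait standing in for `Simd`.
trait Token: Copy {}

/// Hypothetical counterpart of `SimdFrom`: build `Self` from `F` given a token.
trait FromWith<F, S: Token> {
    fn from_with(token: S, value: F) -> Self;
}

/// Hypothetical counterpart of `SimdInto`.
trait IntoWith<T, S: Token> {
    fn into_with(self, token: S) -> T;
}

// Blanket impl: every `F` that some `T` can be built from gets `into_with`.
impl<F, T: FromWith<F, S>, S: Token> IntoWith<T, S> for F {
    fn into_with(self, token: S) -> T {
        T::from_with(token, self)
    }
}

#[derive(Clone, Copy)]
struct ScalarToken;
impl Token for ScalarToken {}

/// Toy four-lane vector backed by an array.
#[derive(Debug, PartialEq)]
struct Vec4([f32; 4]);

impl FromWith<f32, ScalarToken> for Vec4 {
    fn from_with(_token: ScalarToken, value: f32) -> Self {
        Vec4([value; 4]) // broadcast the scalar to every lane
    }
}

impl FromWith<[f32; 4], ScalarToken> for Vec4 {
    fn from_with(_token: ScalarToken, value: [f32; 4]) -> Self {
        Vec4(value)
    }
}
```

Only the `FromWith` impls are written by hand. Calls such as `1.0f32.into_with(ScalarToken)` and `[1.0f32, 2.0, 3.0, 4.0].into_with(ScalarToken)` resolve through the single blanket impl, with the target type taken from the binding's annotation.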

--------------------------------

### Create Vector using Trait-Based Constructor

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

The `simd_from` constructor supports creating vectors from arrays and scalars, offering a flexible way to initialize vector types.

```rust
let v1: f32x4<_> = f32x4::simd_from(simd, 1.0);
let v2: f32x4<_> = f32x4::simd_from(simd, [1.0, 2.0, 3.0, 4.0]);
```

--------------------------------

### dispatch! Macro with Return Value

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Shows how to use the dispatch! macro when the operation returns a value. The SIMD token can be ignored using `_simd` if not needed.

```rust
use fearless_simd::{Level, dispatch};

let level = Level::new();
let result = dispatch!(level, _simd => {
    // Perform computation
    1 + 2 + 3
});
assert_eq!(result, 6);
```

--------------------------------

### Splitting and Combining AVX2 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Demonstrates how to split a wide 256-bit AVX2 vector into two 128-bit SSE-width vectors (lo and hi) and then combine them back. Useful for processing wider vectors with SSE-compatible logic.

```rust
use fearless_simd::{f32x8, f32x4, Level, dispatch, prelude::*};

dispatch!(level, simd => {
    let wide: f32x8<_> = ...;
    let (lo, hi): (f32x4<_>, f32x4<_>) = wide.split();
    // Process lo and hi independently
    let combined: f32x8<_> = lo.combine(hi);
});
```
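As a mental model, splitting and combining are inverse operations on the lane array. The scalar sketch below assumes the lower-indexed lanes form `lo`. The helper names are hypothetical and plain arrays stand in for `f32x8` and `f32x4`.

```rust
/// Splits eight lanes into a low half (lanes 0..4) and a high half (lanes 4..8).
fn split_halves(wide: [f32; 8]) -> ([f32; 4], [f32; 4]) {
    let lo = std::array::from_fn(|i| wide[i]);
    let hi = std::array::from_fn(|i| wide[i + 4]);
    (lo, hi)
}

/// Concatenates two four-lane halves back into eight lanes.
fn combine_halves(lo: [f32; 4], hi: [f32; 4]) -> [f32; 8] {
    std::array::from_fn(|i| if i < 4 { lo[i] } else { hi[i - 4] })
}
```

Splitting and then combining returns the original eight lanes unchanged.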

--------------------------------

### Defining NEON Intrinsics with kernel!

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Shows how to define a function that uses ARM NEON intrinsics via the `kernel!` macro. This allows for direct access to NEON operations when targeting aarch64.

```rust
use fearless_simd::kernel;

#[cfg(target_arch = "aarch64")]
use core::arch::aarch64::{float32x4_t, vaddq_f32};

kernel!(
    fn add_neon(neon: Neon, a: float32x4_t, b: float32x4_t) -> float32x4_t {
        vaddq_f32(a, b)
    }
);
```

--------------------------------

### Create Vector by Calling a Function for Each Lane

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Use `from_fn` to create a vector by applying a closure to each lane's index. This allows for dynamic initialization based on lane position.

```rust
let v: i32x4<_> = i32x4::from_fn(simd, |i| (i * 2) as i32);  // [0, 2, 4, 6]
```
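The standard library offers the same per-index construction for arrays through `std::array::from_fn`, which can serve as a scalar reference when testing lane initializers. `lanes_from_fn` below is a hypothetical wrapper for illustration.

```rust
/// Builds a 4-lane array by calling `f` with each lane index (0 through 3).
fn lanes_from_fn(f: impl FnMut(usize) -> i32) -> [i32; 4] {
    std::array::from_fn(f)
}
```

Calling `lanes_from_fn(|i| (i * 2) as i32)` produces the same lane values as the vector example above.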

--------------------------------

### Level::new()

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Detects available CPU features at runtime and returns the best available SIMD level. This method requires the `std` feature or a `wasm32` target. It is recommended to call this once and reuse the result for efficiency.
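One way to follow the "detect once, reuse" advice is to cache the result in a `OnceLock`. The sketch below is self-contained and does not use fearless_simd: `DetectedLevel`, `detect`, and `cached_level` are hypothetical stand-ins, and detection uses the standard library's `is_x86_feature_detected!` macro.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the library's `Level` enum.
#[allow(dead_code)] // some variants are never built on non-x86 targets
#[derive(Clone, Copy, Debug, PartialEq)]
enum DetectedLevel {
    Fallback,
    Sse42,
    Avx2,
}

/// Counts how many times detection actually runs.
static DETECT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Runtime feature detection (hypothetical, modeled on the text above).
fn detect() -> DetectedLevel {
    DETECT_CALLS.fetch_add(1, Ordering::Relaxed);
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return DetectedLevel::Avx2;
        }
        if std::arch::is_x86_feature_detected!("sse4.2") {
            return DetectedLevel::Sse42;
        }
    }
    DetectedLevel::Fallback
}

/// Detects on the first call and returns the cached value afterwards.
fn cached_level() -> DetectedLevel {
    static LEVEL: OnceLock<DetectedLevel> = OnceLock::new();
    *LEVEL.get_or_init(detect)
}
```

However many times `cached_level()` is called, `detect` runs only once. Passing the detected value down as a parameter, as the library recommends, achieves the same effect without a global.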

````APIDOC
## Level::new()

### Description
Detects available CPU features at runtime and returns the best available SIMD level.

### Method
`Level::new() -> Self`

### Availability
Requires `std` feature or `wasm32` target

### Returns
`Level` - The highest SIMD level available on the current CPU

### Panics
No panics; returns fallback if detection fails

### Usage Example
```rust
use fearless_simd::Level;

let level = Level::new();
// level is now one of: Fallback, Neon, WasmSimd128, Sse4_2, or Avx2
// depending on CPU capabilities and compilation target
```

### Notes
- Should be called once per application and stored/passed around
- Repeating this call is inefficient as it re-detects capabilities each time
- Applications should prefer creating `Level` once and passing it as needed
````

--------------------------------

### Detect CPU SIMD Level and Use Conditionally

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

This snippet shows how to detect the available SIMD level on the CPU and conditionally execute code based on it. It includes checks for AVX2, SSE4.2, NEON, and a fallback.

```rust
use fearless_simd::Level;

fn main() {
    let level = Level::new();
    
    println!("CPU SIMD Level: {:?}", level);
    
    match level {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Avx2(_) => println!("Using AVX2 - 256-bit vectors"),
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Sse4_2(_) => println!("Using SSE4.2 - 128-bit vectors"),
        #[cfg(target_arch = "aarch64")]
        Level::Neon(_) => println!("Using ARM NEON - 128-bit vectors"),
        Level::Fallback(_) => println!("Using scalar fallback"),
        _ => {}
    }
    
    // Conditional processing
    if !level.is_fallback() {
        println!("SIMD acceleration available!");
    }
}
```
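For comparison, the standard library offers the same kind of runtime check through its feature-detection macros. The sketch below does not show how fearless_simd implements `Level::new()` (the source does not say); it only illustrates runtime detection with `std`. `best_feature` is a hypothetical helper.

```rust
/// Returns the name of the widest feature set detected at runtime,
/// checking from most to least capable (hypothetical helper).
fn best_feature() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if std::arch::is_x86_feature_detected!("sse4.2") {
            return "sse4.2";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    // No supported SIMD feature was found, or the target is not covered above.
    "fallback"
}
```

The ordering matters: testing for AVX2 before SSE4.2 ensures the most capable option wins when both are present.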

--------------------------------

### Performing Operations on Mask Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Illustrates common logical operations (AND, OR, NOT) and tests (any, all) on mask vectors. Requires a SIMD context and pre-defined vectors `a`, `b`, and `c`; the masks are produced by comparing them lane by lane.

```rust
dispatch!(level, simd => {
    let mask_a: mask32x4<_> = a.simd_lt(b);
    let mask_b: mask32x4<_> = b.simd_lt(c);
    
    let combined = mask_a & mask_b;     // Logical AND
    let either = mask_a | mask_b;       // Logical OR
    let negated = !mask_a;              // Logical NOT
    
    // Test masks
    if mask_a.any() {
        println!("At least one lane is true");
    }
    if mask_a.all() {
        println!("All lanes are true");
    }
});
```
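To make the lane-wise semantics concrete, the sketch below models a four-lane mask as `[bool; 4]` using only `std`. The names (`Mask4`, `lt`, `and`, `or`, `not`, `any`, `all`) are hypothetical stand-ins and say nothing about how the crate represents masks internally.

```rust
/// Scalar model of a four-lane mask: one boolean per lane.
type Mask4 = [bool; 4];

/// Lane-wise "less than" comparison.
fn lt(a: [i32; 4], b: [i32; 4]) -> Mask4 {
    std::array::from_fn(|i| a[i] < b[i])
}

/// Lane-wise logical AND.
fn and(a: Mask4, b: Mask4) -> Mask4 {
    std::array::from_fn(|i| a[i] && b[i])
}

/// Lane-wise logical OR.
fn or(a: Mask4, b: Mask4) -> Mask4 {
    std::array::from_fn(|i| a[i] || b[i])
}

/// Lane-wise logical NOT.
fn not(a: Mask4) -> Mask4 {
    a.map(|lane| !lane)
}

/// True if at least one lane is set.
fn any(m: Mask4) -> bool {
    m.iter().any(|&lane| lane)
}

/// True only if every lane is set.
fn all(m: Mask4) -> bool {
    m.iter().all(|&lane| lane)
}
```

With `a = [1, 5, 3, 7]` and `b = [2, 4, 6, 8]`, `lt(a, b)` is `[true, false, true, true]`, so `any` is true and `all` is false.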

--------------------------------

### Performing Arithmetic Operations on f32x4 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Shows element-wise arithmetic operations like addition, multiplication, division, and square root on `f32x4` vectors. Ensure the `prelude` is imported and a `Simd` token is available.

```rust
use fearless_simd::prelude::*;

dispatch!(level, simd => {
    let a = f32x4::splat(simd, 2.0);
    let b = f32x4::splat(simd, 3.0);
    let sum = a + b;                    // Element-wise: [5.0, 5.0, 5.0, 5.0]
    let prod = a * b;                   // [6.0, 6.0, 6.0, 6.0]
    let div = b / a;                    // [1.5, 1.5, 1.5, 1.5]
    let sq = a.sqrt();                  // [1.414..., 1.414..., 1.414..., 1.414...]
});
```

--------------------------------

### Custom Kernel with AVX2 Intrinsics

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

Defines a custom kernel with the `kernel!` macro that uses AVX2 intrinsics to bitwise-AND every 32-bit lane of a 256-bit integer vector with `0xFF`. The example imports the intrinsics from `std::arch::x86_64`, so it applies to x86_64 targets only.

```rust
use fearless_simd::{Level, Simd, dispatch, kernel, prelude::*};

#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_and_si256, _mm256_set1_epi32};

// Gate the kernel the same way as the intrinsics it uses.
#[cfg(target_arch = "x86_64")]
kernel!(
    fn mask_bits_avx2(avx2: Avx2, v: __m256i) -> __m256i {
        // Keep only the low 8 bits of every 32-bit lane.
        let mask = _mm256_set1_epi32(0xFF);
        _mm256_and_si256(v, mask)
    }
);

fn main() {
    let level = Level::new();
    
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(avx2) = level.as_avx2() {
            let v: i32x8<_> = [255, 256, 257, 258, 259, 260, 261, 262].simd_into(avx2);
            let masked_raw = mask_bits_avx2(avx2, v.into());
            let masked: i32x8<_> = masked_raw.simd_into(avx2);
            
            assert_eq!(
                <[i32; 8]>::from(masked),
                [255, 0, 1, 2, 3, 4, 5, 6]
            );
        }
    }
}
```
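For context, the same masking can be written against `std::arch` directly. Doing so shows the steps a token-based API is meant to carry for you: a `#[target_feature]` function, a runtime feature test, and an `unsafe` call. This sketch uses only `std`; `mask_low_byte` and `mask_low_byte_avx2` are hypothetical names, and the scalar branch is an added fallback for other targets.

```rust
/// AVX2 path: must only be called after AVX2 has been detected.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn mask_low_byte_avx2(v: [i32; 8]) -> [i32; 8] {
    use std::arch::x86_64::{
        __m256i, _mm256_and_si256, _mm256_loadu_si256, _mm256_set1_epi32, _mm256_storeu_si256,
    };
    let mut out = [0i32; 8];
    // SAFETY: both arrays hold exactly 256 bits, and the unaligned
    // load/store intrinsics impose no alignment requirement.
    unsafe {
        let x = _mm256_loadu_si256(v.as_ptr() as *const __m256i);
        let masked = _mm256_and_si256(x, _mm256_set1_epi32(0xFF));
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, masked);
    }
    out
}

/// Keeps the low 8 bits of each lane, using AVX2 when it is available.
fn mask_low_byte(v: [i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed on the line above.
            return unsafe { mask_low_byte_avx2(v) };
        }
    }
    // Scalar fallback: same result, one lane at a time.
    v.map(|x| x & 0xFF)
}
```

Both paths return `[255, 0, 1, 2, 3, 4, 5, 6]` for the input used in the example above.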

--------------------------------

### Conceptual Expansion of dispatch! Macro

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

This simplified code shows, conceptually, how the `dispatch!` macro expands into a `match` over `Level`, with each arm gated by `cfg` for its target architecture.

```rust
// What the macro expands to (conceptually):
{
    match level {
        #[cfg(target_arch = "aarch64")]
        Level::Neon(neon) => {
            let simd = neon;
            neon.vectorize(|| operation)
        },
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Avx2(avx2) => {
            let simd = avx2;
            avx2.vectorize(|| operation)
        },
        // ... other levels
    }
}
```
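The same shape can be reproduced in miniature with `macro_rules!`. The sketch below is a toy model, not the crate's macro: `DemoSimd`, `ScalarToken`, `WideToken`, `DemoLevel`, and `demo_dispatch!` are all hypothetical. It shows how binding a different token type in each arm lets one generic function be instantiated once per level.

```rust
/// Stand-in for a SIMD token trait: each token reports its f32 lane count.
trait DemoSimd: Copy {
    const F32_LANES: usize;
}

#[derive(Clone, Copy)]
struct ScalarToken;
#[derive(Clone, Copy)]
struct WideToken;

impl DemoSimd for ScalarToken {
    const F32_LANES: usize = 1;
}
impl DemoSimd for WideToken {
    const F32_LANES: usize = 4;
}

/// Stand-in for a level enum whose variants carry their token.
#[derive(Clone, Copy)]
enum DemoLevel {
    Scalar(ScalarToken),
    Wide(WideToken),
}

/// Toy dispatch: binds `$simd` to the token of whichever variant matches,
/// then evaluates `$op` with that concrete token type.
macro_rules! demo_dispatch {
    ($level:expr, $simd:ident => $op:expr) => {
        match $level {
            DemoLevel::Scalar($simd) => $op,
            DemoLevel::Wide($simd) => $op,
        }
    };
}

/// Generic code written once; instantiated separately for each token type.
fn lanes<S: DemoSimd>(_simd: S) -> usize {
    S::F32_LANES
}

fn lanes_for(level: DemoLevel) -> usize {
    demo_dispatch!(level, simd => lanes(simd))
}
```

Because `$op` is repeated in every arm, `simd` has a different concrete type in each one, which is why the generic function can be specialised per level.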

--------------------------------

### dispatch!(level, simd => operation)

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/09-api-index.md

Runs `operation` with `simd` bound to the SIMD token that matches `level`, and evaluates to the value of `operation`. The macro hides the per-platform branching needed to select the appropriate instruction set.

```APIDOC
## dispatch!(level, simd => operation)

### Description
Invokes operation with best available SIMD level.

### Returns
Value of operation expression

### Platforms
All (uses appropriate conditional code paths)
```

--------------------------------

### Accumulation Pattern

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

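Sums a slice by loading four lanes at a time and adding each vector's lanes into a scalar accumulator; elements left over after the last full chunk are added separately. When calling this through `dispatch!`, create the `Level` once with `Level::new()` and reuse it.

```rust
use fearless_simd::{Simd, f32x4};

#[inline(always)]
fn sum<S: Simd>(simd: S, data: &[f32]) -> f32 {
    let mut acc = 0.0;
    let chunks = data.chunks_exact(4);
    // Elements that do not fill a whole vector (0 to 3 of them).
    let remainder = chunks.remainder();
    for chunk in chunks {
        let vec: f32x4<_> = f32x4::from_slice(simd, chunk);
        let arr: [f32; 4] = vec.into();
        acc += arr.iter().sum::<f32>();
    }
    acc + remainder.iter().sum::<f32>()
}
```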
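A common variation keeps one partial sum per lane and reduces across lanes only once, after the loop, rather than once per chunk. The sketch below models that with a plain `[f32; 4]` so it runs with `std` alone. `sum_lanewise` is a hypothetical name; in fearless_simd code the accumulator would presumably be an `f32x4` updated with `+`.

```rust
/// Sums `data` with four running partial sums (one per lane) and a single
/// cross-lane reduction at the end. Leftover elements are added afterwards.
fn sum_lanewise(data: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = data.chunks_exact(4);
    // Up to three trailing elements that do not fill a whole chunk.
    let tail = chunks.remainder();
    for chunk in chunks {
        // Models a lane-wise vector add: acc += chunk.
        for (lane, &x) in acc.iter_mut().zip(chunk) {
            *lane += x;
        }
    }
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}
```

Floating-point addition is not associative, so this ordering can differ in the last bits from a strict left-to-right sum. For small integer-valued inputs the results are identical.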

--------------------------------

### Using dispatch! from a Generic Function

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Demonstrates how to call the dispatch! macro from within a generic function. The SIMD token is passed to the generic function which contains the actual SIMD logic.

```rust
use fearless_simd::{Level, Simd, dispatch};

fn wrapper(level: Level) {
    dispatch!(level, simd => generic_function(simd));
}

#[inline(always)]
fn generic_function<S: Simd>(simd: S) {
    // SIMD code here
}
```

--------------------------------

### Access SSE4.2 SIMD Token (x86/x86_64)

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Provides access to the SSE4.2 token on x86 and x86_64 architectures. This is useful for leveraging SSE4.2 intrinsics, even when higher-level features like AVX2 are available.

```rust
if let Some(sse42) = level.as_sse4_2() {
    // Can safely use SSE4.2 intrinsics
}
```
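The token returned here acts as proof that the feature was detected, so functions that depend on the feature can demand the token as an argument. The sketch below models that idea with `std` only; `Sse42Proof`, `detect`, `count_bits`, and `count_bits_any` are hypothetical names, not fearless_simd APIs.

```rust
/// Zero-sized proof that SSE4.2 was detected. The private field means the
/// only way to obtain a value outside this module is through `detect`.
#[derive(Clone, Copy, Debug)]
pub struct Sse42Proof {
    _private: (),
}

impl Sse42Proof {
    /// Returns a proof token only when the CPU reports SSE4.2 at runtime.
    pub fn detect() -> Option<Self> {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if std::arch::is_x86_feature_detected!("sse4.2") {
                return Some(Self { _private: () });
            }
        }
        None
    }
}

/// Requires the proof token; a real implementation could rely on
/// SSE4.2-era instructions here. This sketch uses the portable method.
pub fn count_bits(_proof: Sse42Proof, x: u64) -> u32 {
    x.count_ones()
}

/// Picks the token-gated path when possible, otherwise a plain fallback.
pub fn count_bits_any(x: u64) -> u32 {
    match Sse42Proof::detect() {
        Some(proof) => count_bits(proof, x),
        None => x.count_ones(),
    }
}
```

The token carries no data, so passing it around costs nothing at runtime; its value is purely in the type system.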

--------------------------------

### Level::as_avx2

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Provides access to the AVX2 token on x86/x86_64 architectures if the x86-64-v3 feature set is available.

```APIDOC
## Level::as_avx2

### Description
Provides access to AVX2 token. Returns token if x86-64-v3 feature set is available.

### Method
`as_avx2(self) -> Option<Avx2>`

### Availability
`#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]`

### CPU Features Enabled
- avx2, bmi1, bmi2, cmpxchg16b, f16c, fma, lzcnt, movbe, popcnt, xsave
- All lower levels (sse4.2, etc.) are implied

### Usage Example
```rust
if let Some(avx2) = level.as_avx2() {
    // Can use full AVX2 capabilities
    let result = fearless_simd::dispatch!(level, simd => function(simd));
}
```
```

--------------------------------

### Level::as_neon

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Provides access to the Neon SIMD token on aarch64 architectures if available.

```APIDOC
## Level::as_neon

### Description
Provides access to Neon SIMD token if available. Returns the token even if a higher level (not yet supported) is available.

### Method
`as_neon(self) -> Option<Neon>`

### Availability
`#[cfg(target_arch = "aarch64")]`

### Usage Example
```rust
let level = Level::new();
#[cfg(target_arch = "aarch64")]
{
    if let Some(neon) = level.as_neon() {
        // Use Neon intrinsics with kernel! macro
        kernel!(/* ... */);
    }
}
```
```

--------------------------------

### Vector Addition with Fearless SIMD

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

Adds two `f32` slices element-wise, four lanes at a time, and returns the sums in a new `Vec`. `b` must be at least as long as `a`, because both slices are indexed with ranges derived from `a.len()`.

```rust
use fearless_simd::{Level, Simd, dispatch, f32x4, prelude::*};

#[inline(always)]
fn add_arrays<S: Simd>(simd: S, a: &[f32], b: &[f32]) -> Vec<f32> {
    let mut result = Vec::with_capacity(a.len());
    
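    for i in (0..a.len()).step_by(4) {
        let chunk_size = (a.len() - i).min(4);
        // Zero-pad so the final, shorter chunk still supplies four lanes;
        // this assumes `from_slice` expects a full-width slice.
        let mut a_buf = [0.0f32; 4];
        let mut b_buf = [0.0f32; 4];
        a_buf[..chunk_size].copy_from_slice(&a[i..i + chunk_size]);
        b_buf[..chunk_size].copy_from_slice(&b[i..i + chunk_size]);
        
        let a_vec: f32x4<_> = f32x4::from_slice(simd, &a_buf);
        let b_vec: f32x4<_> = f32x4::from_slice(simd, &b_buf);
        let sum = a_vec + b_vec;
        
        // Copy out only the lanes that correspond to real input.
        for j in 0..chunk_size {
            result.push(sum[j]);
        }
    }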
    result
}

fn main() {
    let level = Level::new();
    let a = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let b = vec![10.0, 20.0, 30.0, 40.0, 50.0];
    
    let result = dispatch!(level, simd => add_arrays(simd, &a, &b));
    assert_eq!(result, vec![11.0, 22.0, 33.0, 44.0, 55.0]);
}
```

--------------------------------

### SIMD Vector Arithmetic Operations

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/README.md

Illustrates common element-wise arithmetic operations on SIMD vectors, including addition, multiplication, division, negation, square root, and fused multiply-add. Assumes `a`, `b`, and `c` are floating-point vectors of the same type.

```rust
let sum = a + b;              // Element-wise addition
let prod = a * b;             // Element-wise multiplication
let quotient = a / b;         // Element-wise division
let neg = -a;                 // Negation
let sqrt = a.sqrt();          // Square root
let fused = a.mul_add(b, c);  // (a * b) + c
```
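As a scalar reference for what each line computes per lane, the sketch below models a four-lane float vector as `[f32; 4]` with `std` only. `Lanes`, `zip_with`, and `mul_add` are hypothetical helpers. Whether the crate's `mul_add` rounds once on every backend is not stated in the source, so treat the single-rounding behaviour of `f32::mul_add` as a property of this model only.

```rust
/// Scalar model of a four-lane f32 vector.
type Lanes = [f32; 4];

/// Applies a binary operation lane by lane.
fn zip_with(a: Lanes, b: Lanes, f: impl Fn(f32, f32) -> f32) -> Lanes {
    std::array::from_fn(|i| f(a[i], b[i]))
}

/// Lane-wise (a * b) + c; `f32::mul_add` rounds once instead of twice.
fn mul_add(a: Lanes, b: Lanes, c: Lanes) -> Lanes {
    std::array::from_fn(|i| a[i].mul_add(b[i], c[i]))
}
```

For example, with `a = [1.0, 4.0, 9.0, 16.0]`, `b = [2.0; 4]`, and `c = [0.5; 4]`, `mul_add(a, b, c)` is `[2.5, 8.5, 18.5, 32.5]` and `a.map(f32::sqrt)` is `[1.0, 2.0, 3.0, 4.0]`.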