### Install wasmtime CLI

Source: https://github.com/linebender/fearless_simd/blob/main/fearless_simd_tests/README.md

Install the wasmtime command-line interface for running WebAssembly binaries. This is a prerequisite for executing the WebAssembly tests.

```sh
cargo install --locked wasmtime-cli
```

--------------------------------

### SimdInto Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to use the `.simd_into()` method for converting scalar values and arrays into SIMD vectors.

```rust
dispatch!(level, simd => {
    let v: f32x4<_> = 1.0.simd_into(simd);
    let v: f32x4<_> = [1.0, 2.0, 3.0, 4.0].simd_into(simd);
});
```

--------------------------------

### u8x16 Initialization Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Shows how to initialize u8x16 vectors using `splat` and `from_slice`.

```rust
dispatch!(level, simd => {
    let ascii = u8x16::splat(simd, 65); // ASCII 'A'
    // Assumption: the slice length must match the vector width (16 lanes),
    // so the literal is padded to exactly 16 bytes.
    let bytes = u8x16::from_slice(simd, b"Hello, World!!!!");
});
```

--------------------------------

### Install wasmi CLI with SIMD support

Source: https://github.com/linebender/fearless_simd/blob/main/fearless_simd_tests/README.md

Install the wasmi command-line interface with SIMD support enabled. This is an alternative runtime for executing WebAssembly tests.

```sh
cargo install --locked --features simd wasmi_cli
```

--------------------------------

### WebAssembly SIMD Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Illustrates using the `kernel!` macro for WebAssembly SIMD128 intrinsics, demonstrating 32-bit float vector addition.
```rust
use fearless_simd::kernel;
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
use core::arch::wasm32::{v128, f32x4_add};

kernel!(
    fn add_f32x4_wasm(wasm: WasmSimd128, a: v128, b: v128) -> v128 {
        f32x4_add(a, b)
    }
);
```

--------------------------------

### SimdMask Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to use SimdMask methods like `simd_lt`, `any`, and `all` to perform conditional operations on SIMD vectors.

```rust
dispatch!(level, simd => {
    let a: f32x4<_> = ...;
    let b: f32x4<_> = ...;
    let mask = a.simd_lt(b);
    if mask.any() {
        println!("At least one element of a < b");
    }
    if mask.all() {
        println!("All elements of a < b");
    }
});
```

--------------------------------

### ARM Neon Intrinsics Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Shows how to use the `kernel!` macro for ARM Neon intrinsics, specifically for 32-bit float vector addition.

```rust
use fearless_simd::kernel;
#[cfg(target_arch = "aarch64")]
use core::arch::aarch64::{float32x4_t, vaddq_f32};

kernel!(
    fn add_f32x4_intrinsic(neon: Neon, a: float32x4_t, b: float32x4_t) -> float32x4_t {
        vaddq_f32(a, b)
    }
);
```

--------------------------------

### x86_64 AVX2 Intrinsics Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Demonstrates how to use the `kernel!` macro to wrap x86_64 AVX2 intrinsic functions for 32-bit integer addition.

```rust
use fearless_simd::kernel;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_add_epi32};

kernel!(
    fn add_i32x8_intrinsic(avx2: Avx2, a: __m256i, b: __m256i) -> __m256i {
        _mm256_add_epi32(a, b)
    }
);
```

--------------------------------

### i8x16 Wrapping Arithmetic Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Demonstrates wrapping arithmetic for i8x16 vectors.
Addition and multiplication wrap on overflow.

```rust
dispatch!(level, simd => {
    let a = i8x16::splat(simd, 100);
    let b = i8x16::splat(simd, 50);
    let sum = a + b; // Wraps: [150 -> -106, 150 -> -106, ...]
});
```

--------------------------------

### Usage Example of WithSimd

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to use the `WithSimd` trait with a closure. This example creates a `Level` and then dispatches a closure to it, returning a `u32` value.

```rust
let level = Level::new();
let result = level.dispatch(|_level| {
    42 // Returns u32
});
```

--------------------------------

### Recommended Library API Pattern with Fearless SIMD

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

This example demonstrates a recommended pattern for libraries using Fearless SIMD, where the public API accepts a `Level` parameter to dispatch to appropriate SIMD implementations. It includes separate functions for AVX2, SSE4.2, NEON, and fallback.
```rust
use fearless_simd::{Level, Simd};

// Public API accepts Level parameter
pub fn process_data(level: Level, data: &mut [f32]) {
    match level {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Avx2(avx2) => process_avx2(avx2, data),
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Sse4_2(sse42) => process_sse42(sse42, data),
        #[cfg(target_arch = "aarch64")]
        Level::Neon(neon) => process_neon(neon, data),
        Level::Fallback(fb) => process_fallback(fb, data),
        _ => {}
    }
}

#[inline(always)]
fn process_avx2<S: Simd>(simd: S, data: &mut [f32]) {
    // AVX2-specific implementation
}

#[inline(always)]
fn process_sse42<S: Simd>(simd: S, data: &mut [f32]) {
    // SSE4.2 implementation
}

#[inline(always)]
fn process_neon<S: Simd>(simd: S, data: &mut [f32]) {
    // NEON implementation
}

#[inline(always)]
fn process_fallback<S: Simd>(simd: S, data: &mut [f32]) {
    // Fallback implementation
}

// Application code
fn main() {
    let level = Level::new();
    let mut data = vec![1.0, 2.0, 3.0, 4.0];
    process_data(level, &mut data);
}
```

--------------------------------

### Array to Vector Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Shows how to convert a `[f32; 4]` array into an f32x4 SIMD vector using `SimdInto` (via macro) or directly with `simd_from`.

```rust
dispatch!(level, simd => {
    let v: f32x4<_> = [1.0, 2.0, 3.0, 4.0].simd_into(simd);
    let v: f32x4<_> = f32x4::simd_from(simd, [1.0, 2.0, 3.0, 4.0]);
});
```

--------------------------------

### Using Kernel-Generated Functions Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Demonstrates how to use a function generated by the `kernel!` macro in a `main` function, including obtaining a SIMD level token and converting between vector types and raw intrinsic types.
```rust
use fearless_simd::{i32x8, Level, Simd, dispatch, kernel, prelude::*};
// Intrinsic imports, mirroring the AVX2 kernel example in the same source file
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_add_epi32};

kernel!(
    fn add_i32x8_intrinsic(avx2: Avx2, a: __m256i, b: __m256i) -> __m256i {
        _mm256_add_epi32(a, b)
    }
);

fn main() {
    let level = Level::new();
    if let Some(avx2) = level.as_avx2() {
        let a: i32x8<_> = [1, 2, 3, 4, 5, 6, 7, 8].simd_into(avx2);
        let b: i32x8<_> = [10, 20, 30, 40, 50, 60, 70, 80].simd_into(avx2);

        // Convert vector to raw intrinsic type
        let a_raw: __m256i = a.into();
        let b_raw: __m256i = b.into();

        // Call kernel function
        let sum_raw = add_i32x8_intrinsic(avx2, a_raw, b_raw);

        // Convert back to vector
        let sum: i32x8<_> = sum_raw.simd_into(avx2);
        assert_eq!(
            <[i32; 8]>::from(sum),
            [11, 22, 33, 44, 55, 66, 77, 88]
        );
    }
}
```

--------------------------------

### Scalar to Vector (Splat) Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to convert an f32 scalar to an f32x4 SIMD vector using `SimdInto` (via macro) or directly with `simd_from`.

```rust
dispatch!(level, simd => {
    let v: f32x4<_> = 1.0.simd_into(simd);         // Via SimdInto
    let v: f32x4<_> = f32x4::simd_from(simd, 1.0); // Direct
    // Both create [1.0, 1.0, 1.0, 1.0]
});
```

--------------------------------

### SimdSplit Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Shows how to split an `f32x8` vector into two `f32x4` vectors using the `split` method. The method returns a tuple containing the lower and higher halves of the original vector.

```rust
dispatch!(level, simd => {
    let wide: f32x8<_> = ...;
    let (lo, hi): (f32x4<_>, f32x4<_>) = wide.split();
});
```

--------------------------------

### SimdCombine Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Illustrates how to combine two `f32x4` vectors into a single `f32x8` vector using the `combine` method. This operation effectively doubles the width of the vector.
```rust
dispatch!(level, simd => {
    let a: f32x4<_> = ...;
    let b: f32x4<_> = ...;
    let combined: f32x8<_> = a.combine(b); // Doubles width
});
```

--------------------------------

### SimdCvtTruncate Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates how to use `truncate_from` and `truncate_from_precise` to convert a `f32x4` vector to `i32x4`. The `truncate_from` method has undefined behavior for out-of-range values, while `truncate_from_precise` provides safe, saturating behavior.

```rust
dispatch!(level, simd => {
    let f: f32x4<_> = ...;
    let i: i32x4<_> = i32x4::truncate_from(f);          // Undefined for OOB
    let i2: i32x4<_> = i32x4::truncate_from_precise(f); // Safe, saturating
});
```

--------------------------------

### SSE4.2 Intrinsic Access Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Demonstrates how to define a kernel function for SSE4.2 using the `kernel!` macro for specific operations like `_mm_add_ps`. This requires the `std::arch::x86_64::_mm_add_ps` intrinsic and is conditional on the target architecture.

```rust
use fearless_simd::kernel;
// __m128 is imported alongside the intrinsic because the signature names it
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128, _mm_add_ps};

kernel!(
    fn add_sse42(sse42: Sse4_2, a: __m128, b: __m128) -> __m128 {
        _mm_add_ps(a, b)
    }
);
```

--------------------------------

### SimdElement Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Illustrates how to use the SimdElement trait in generic function bounds to access the associated mask type.

```rust
// Example usage in trait bounds:
fn example<T: SimdElement>() {
    type MaskType = T::Mask; // Get the corresponding mask type
}
```

--------------------------------

### Getting SIMD Token via Witness

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Shows how to retrieve the SIMD token from a vector using the `witness()` method. This is part of the `SimdBase` trait.
```rust
dispatch!(level, simd => {
    let v: f32x4<_> = f32x4::splat(simd, 1.0);
    let token = v.witness(); // Get back the Simd token
});
```

--------------------------------

### Rust SimdCvtFloat Trait Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates converting an i32x4 integer SIMD vector to an f32x4 floating-point SIMD vector using the SimdCvtFloat trait. Ensure the appropriate dispatch macro is used.

```rust
dispatch!(level, simd => {
    let i: i32x4<_> = ...;
    let f: f32x4<_> = f32x4::float_from(i);
});
```

--------------------------------

### Rust Select Trait Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Demonstrates using the Select trait to return the minimum of two SIMD vectors based on a mask. Ensure the appropriate dispatch macro is used.

```rust
dispatch!(level, simd => {
    let mask: mask32x4<_> = a.simd_lt(b);
    let result = mask.select(a, b); // Returns min(a, b)
});
```

--------------------------------

### Define sRGB to Linear RGB Conversion with AVX2 Intrinsics

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Defines a kernel function for sRGB to linear RGB conversion using AVX2 intrinsics. This example is simplified and focuses on the `kernel!` macro usage.

```rust
use fearless_simd::kernel;

// sRGB to linear RGB using intrinsics
kernel!(
    fn srgb_to_linear_avx2(
        avx2: Avx2,
        r: __m256,
        g: __m256,
        b: __m256
    ) -> (__m256, __m256, __m256) {
        // Use specific AVX2 instructions for exact behavior needed
        // (Example only; actual conversion more complex)
        (r, g, b) // Simplified
    }
);
```

--------------------------------

### Rust Bytes Trait Usage Example

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Shows how to use the Bytes trait to convert a floating-point SIMD vector to bytes, bitcast it to unsigned integers, and reconstruct it from bytes.
Ensure the appropriate dispatch macro is used.

```rust
dispatch!(level, simd => {
    let f: f32x4<_> = ...;

    // Convert to bytes
    let bytes: u8x16<_> = f.to_bytes();

    // Bitcast to unsigned integers
    let u: u32x4<_> = f.bitcast();

    // Reconstruct from bytes
    let f2: f32x4<_> = f32x4::from_bytes(bytes);
});
```

--------------------------------

### Import Prelude

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/09-api-index.md

Re-exports all public traits for ergonomic use, making them available as methods on vector types.

```rust
use fearless_simd::prelude::*;
```

--------------------------------

### Fearless SIMD Module Structure

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/01-overview.md

Illustrates the directory structure of the fearless_simd crate, showing the organization of its source files.

```text
fearless_simd/src/
├── lib.rs              # Entry point, Level enum, re-exports
├── traits.rs           # User-facing traits (SimdFrom, Select, Bytes, etc.)
├── macros.rs           # dispatch! and kernel! macros
├── kernel_macros.rs    # kernel! implementation
├── support.rs          # Internal utilities
├── transmute.rs        # Bit-casting operations
└── generated/          # Auto-generated code
    ├── simd_trait.rs   # Simd trait definition
    ├── simd_types.rs   # Vector type definitions
    ├── ops.rs          # Operator overloads
    ├── fallback.rs     # Scalar implementation
    ├── sse4_2.rs       # SSE4.2 implementation
    ├── avx2.rs         # AVX2 implementation
    ├── neon.rs         # Neon implementation
    └── wasm.rs         # WebAssembly implementation
```

--------------------------------

### Dispatch SIMD Functions

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/README.md

Demonstrates how to use the `dispatch!` macro to call a function with the appropriate SIMD implementation based on the available hardware capabilities. Requires a `Level` context.
```rust
let level = Level::new();
dispatch!(level, simd => my_simd_function(simd, data));
```

--------------------------------

### Get Current SIMD Level

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/03-simd-trait.md

Retrieves the SIMD level token for the current SIMD implementation. This is useful for conditional logic or logging the active SIMD level.

```rust
use fearless_simd::{Level, Simd, dispatch};

dispatch!(Level::new(), simd => {
    let current_level = simd.level();
    println!("Running with level: {:?}", current_level);
});
```

--------------------------------

### Common Usage Pattern for Fearless SIMD

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/01-overview.md

Demonstrates how to define and use SIMD-aware functions with Fearless SIMD, including CPU capability detection and dispatch.

```rust
use fearless_simd::{Level, Simd, dispatch, prelude::*};

// Define SIMD-aware function
#[inline(always)]
fn my_simd_function<S: Simd>(simd: S, data: &mut [f32]) {
    // Use simd parameter to access SIMD operations
    // All operations must stay generic over S: Simd
}

// In application code
let level = Level::new(); // Detect CPU capabilities once
dispatch!(level, simd => {
    my_simd_function(simd, &mut data);
});
```

--------------------------------

### Basic SIMD Operation with Dispatch

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/09-api-index.md

Demonstrates basic SIMD operations using the dispatch macro for automatic selection of the appropriate SIMD level. Requires importing Level, Simd, dispatch, f32x4, and prelude traits.
```rust
use fearless_simd::{Level, Simd, dispatch, f32x4, prelude::*};

let level = Level::new();
dispatch!(level, simd => {
    let v: f32x4<_> = [1.0, 2.0, 3.0, 4.0].simd_into(simd);
    // Use v with prelude traits
});
```

--------------------------------

### Get Mutable Slice View of Vector Elements

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Use `as_mut_slice` to obtain a mutable slice view of the vector's elements. This allows modifying elements directly.

```rust
let slice: &mut [f32] = v.as_mut_slice();
slice[0] = 1.0;
```

--------------------------------

### Constructing Mask Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Demonstrates how to create mask vectors from comparisons, scalar booleans, and bitmasks. Ensure the appropriate SIMD context is available.

```rust
dispatch!(level, simd => {
    let a = f32x4::splat(simd, 1.0);
    let b = f32x4::splat(simd, 2.0);

    // Create from comparison
    let mask: mask32x4<_> = a.simd_lt(b); // All lanes true (1.0 < 2.0)

    // Create from scalar boolean
    let all_true: mask8x16<_> = mask8x16::splat(simd, true);
    let all_false: mask8x16<_> = mask8x16::splat(simd, false);

    // Create from bitmask
    let mask_from_bits = mask8x16::from_bitmask(simd, 0xFF); // First 8 lanes true
});
```

--------------------------------

### Get Slice View of Vector Elements

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Use `as_slice` to obtain an immutable slice view of the vector's elements. This allows reading elements without copying.

```rust
let slice: &[f32] = v.as_slice();
```

--------------------------------

### WasmSimd128 Intrinsic Access

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Access WebAssembly SIMD128 intrinsics directly using `core::arch::wasm32`. This example demonstrates adding two `v128` vectors using `f32x4_add`.
```rust
use fearless_simd::kernel;
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
use core::arch::wasm32::{v128, f32x4_add};

kernel!(
    fn add_wasm(wasm: WasmSimd128, a: v128, b: v128) -> v128 {
        f32x4_add(a, b)
    }
);
```

--------------------------------

### Accessing NEON SIMD

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Demonstrates how to check for and obtain a NEON SIMD implementation using the Level API. This is useful for conditionally using NEON features at runtime.

```rust
use fearless_simd::Level;

let level = Level::new();
if let Some(neon) = level.as_neon() {
    // Use NEON SIMD
}
```

--------------------------------

### Load-Process-Store Pattern

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

Demonstrates a basic load-process-store pattern for SIMD operations. Ensure functions passed to `dispatch!` have `#[inline(always)]`.

```rust
#[inline(always)]
fn template<S: Simd>(simd: S, data: &mut [f32]) {
    for chunk in data.chunks_exact_mut(4) {
        // Load
        let mut vec: f32x4<_> = f32x4::from_slice(simd, chunk);

        // Process
        vec = vec * 2.0;

        // Store
        vec.store_slice(chunk);
    }
}
```

--------------------------------

### Slide Elements within f32x4 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/03-simd-trait.md

Concatenates two f32x4 vectors and extracts N elements starting at a specified shift. Used for rotations and shifts across the full vector width.

```rust
// Generic parameters were lost in extraction; the const parameter name `SHIFT`
// is reconstructed from the description and may differ from the actual source.
fn slide_f32x4<const SHIFT: usize>(self, a: f32x4<Self>, b: f32x4<Self>) -> f32x4<Self>
```

--------------------------------

### Constructing and Using f64x2 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Illustrates the creation of an `f64x2` vector using `splat` and its conversion to a standard Rust array. This requires a `Simd` token in scope.
```rust
dispatch!(level, simd => {
    let v = f64x2::splat(simd, 3.14159);
    let arr: [f64; 2] = v.as_array(); // Deref implements this automatically
});
```

--------------------------------

### Library Code with Generic SIMD Implementation

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/09-api-index.md

Illustrates how to structure library code to handle different SIMD levels generically. It uses a match statement to select the appropriate SIMD implementation and a generic function for the core logic.

```rust
use fearless_simd::{Level, Simd, SimdBase};

pub fn process(level: Level, data: &mut [f32]) {
    match level {
        Level::Avx2(avx2) => process_impl(avx2, data),
        Level::Sse4_2(sse42) => process_impl(sse42, data),
        // ...
    }
}

#[inline(always)]
fn process_impl<S: Simd>(simd: S, data: &mut [f32]) {
    // Generic implementation
}
```

--------------------------------

### Basic Usage of dispatch! Macro

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Demonstrates how to use the `dispatch!` macro to call a function with the appropriate SIMD level. The function being called should be marked with `#[inline(always)]`.

```rust
use fearless_simd::{Level, Simd, dispatch, prelude::*};

#[inline(always)]
fn add_vectors<S: Simd>(simd: S, a: &[f32], b: &[f32], out: &mut [f32]) {
    for i in 0..a.len() {
        // Actual SIMD code here
        out[i] = a[i] + b[i];
    }
}

fn main() {
    let level = Level::new();
    let a = [1.0, 2.0, 3.0, 4.0];
    let b = [5.0, 6.0, 7.0, 8.0];
    let mut out = [0.0; 4];

    dispatch!(level, simd => add_vectors(simd, &a, &b, &mut out));

    assert_eq!(out, [6.0, 8.0, 10.0, 12.0]);
}
```

--------------------------------

### Run WebAssembly SIMD tests

Source: https://github.com/linebender/fearless_simd/blob/main/fearless_simd_tests/README.md

Execute the WebAssembly tests using a specified target and runtime. This command configures the Rust compiler flags to enable the `+simd128` and `+relaxed-simd` features for the WebAssembly target.
```sh
cargo test --target wasm32-wasip1 \
    --config 'target.wasm32-wasip1.rustflags = "-Ctarget-feature=+simd128,+relaxed-simd"' \
    --config 'target.wasm32-wasip1.rustdocflags = "-Ctarget-feature=+simd128,+relaxed-simd"' \
    --config 'target.wasm32-wasip1.runner = "wasmtime"' # or "wasmi_cli" if you installed that
```

--------------------------------

### Simd Trait Definition

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/03-simd-trait.md

The Simd trait defines associated types for native-width vectors of different scalar types (f32, f64, u8, i8, u16, i16, u32, i32) and mask types. It also includes methods like `level` to get the SIMD level and `vectorize` to execute a closure with SIMD enabled.

```APIDOC
## Trait: Simd

### Description
Defines core SIMD operations and associated types for native-width vectors.

### Associated Types
- `f32s`: Native-width vector of f32 elements.
- `f64s`: Native-width vector of f64 elements.
- `u8s`: Native-width vector of u8 elements.
- `i8s`: Native-width vector of i8 elements.
- `u16s`: Native-width vector of u16 elements.
- `i16s`: Native-width vector of i16 elements.
- `u32s`: Native-width vector of u32 elements.
- `i32s`: Native-width vector of i32 elements.
- `mask8s`: Mask vector for i8 elements.
- `mask16s`: Mask vector for i16 elements.
- `mask32s`: Mask vector for i32 elements.
- `mask64s`: Mask vector for i64 elements.

### Methods
- **`level(self) -> Level`**: Returns the SIMD level of the current token.
- **`vectorize<F: FnOnce() -> R, R>(self, f: F) -> R`**: Executes a closure `f` with SIMD enabled.
```

--------------------------------

### Construct Fallback SIMD Level

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Demonstrates how to obtain the Fallback SIMD level. This can be done through runtime detection or explicit selection, the latter requiring a feature flag.
```rust
use fearless_simd::Level;

// Runtime detection fallback
let level = Level::new();
if level.is_fallback() {
    println!("No SIMD support on this CPU");
}

// Explicit fallback (requires feature)
#[cfg(feature = "force_support_fallback")]
let level = Level::fallback();
```

--------------------------------

### Create f32x4 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/README.md

Demonstrates various methods for creating f32x4 vector types, including splatting, from arrays, slices, and functions. Requires a SIMD context.

```rust
let v1: f32x4<_> = f32x4::splat(simd, 1.0);               // [1.0, 1.0, 1.0, 1.0]
let v2: f32x4<_> = [1.0, 2.0, 3.0, 4.0].simd_into(simd);  // From array
let v3: f32x4<_> = f32x4::from_slice(simd, slice);        // From slice
let v4: f32x4<_> = f32x4::from_fn(simd, |i| i as f32);    // From function
```

--------------------------------

### Level::as_sse4_2

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Provides access to the SSE4.2 token on x86/x86_64 architectures.

```APIDOC
## Level::as_sse4_2

### Description
Provides access to SSE4.2 token. Returns token even if AVX2 is available (hierarchical compatibility).

### Method
`as_sse4_2(self) -> Option<Sse4_2>`

### Availability
`#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]`

### CPU Features Enabled
- sse4.2, cmpxchg16b, popcnt (minimum)
- Other features implied: sse4.1, ssse3, sse3, sse2, sse (AVX is a higher feature level and is not implied by SSE4.2)

### Usage Example
```rust
if let Some(sse42) = level.as_sse4_2() {
    // Can safely use SSE4.2 intrinsics
}
```
```

--------------------------------

### SIMD Vector Comparisons and Selection

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/README.md

Shows how to perform element-wise comparisons between SIMD vectors, generate masks, and use masks for conditional selection or checking lane states. Covers the less-than comparison and checking whether any or all lanes satisfy a condition.
```rust
let mask = a.simd_lt(b);        // Less than (returns mask)
let result = mask.select(a, b); // Conditional selection
if mask.any() { }               // Check if any lane is true
if mask.all() { }               // Check if all lanes are true
```

--------------------------------

### SimdInto Blanket Implementation

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/05-traits.md

Provides a blanket implementation of `SimdInto`: any value `F` can be converted with `.simd_into()` into a type `T` that implements `SimdFrom<F, S>`.

```rust
impl<F, T: SimdFrom<F, S>, S: Simd> SimdInto<T, S> for F {
    fn simd_into(self, simd: S) -> T {
        SimdFrom::simd_from(simd, self)
    }
}
```

--------------------------------

### Create Vector using Trait-Based Constructor

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

The `simd_from` constructor supports creating vectors from arrays and scalars, offering a flexible way to initialize vector types.

```rust
let v1: f32x4<_> = f32x4::simd_from(simd, 1.0);
let v2: f32x4<_> = f32x4::simd_from(simd, [1.0, 2.0, 3.0, 4.0]);
```

--------------------------------

### dispatch! Macro with Return Value

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Shows how to use the `dispatch!` macro when the operation returns a value. The SIMD token can be ignored using `_simd` if not needed.

```rust
use fearless_simd::{Level, dispatch};

let level = Level::new();
let result = dispatch!(level, _simd => {
    // Perform computation
    1 + 2 + 3
});
assert_eq!(result, 6);
```

--------------------------------

### Splitting and Combining AVX2 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Demonstrates how to split a wide 256-bit AVX2 vector into two 128-bit SSE-width vectors (lo and hi) and then combine them back. Useful for processing wider vectors with SSE-compatible logic.
```rust
use fearless_simd::{f32x8, f32x4, Level, dispatch, prelude::*};

dispatch!(level, simd => {
    let wide: f32x8<_> = ...;
    let (lo, hi): (f32x4<_>, f32x4<_>) = wide.split();

    // Process lo and hi independently
    let combined: f32x8<_> = lo.combine(hi);
});
```

--------------------------------

### Defining NEON Intrinsics with kernel!

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/07-simd-implementations.md

Shows how to define a function that uses ARM NEON intrinsics via the `kernel!` macro. This allows for direct access to NEON operations when targeting aarch64.

```rust
use fearless_simd::kernel;
// float32x4_t is imported alongside the intrinsic because the signature names it
#[cfg(target_arch = "aarch64")]
use core::arch::aarch64::{float32x4_t, vaddq_f32};

kernel!(
    fn add_neon(neon: Neon, a: float32x4_t, b: float32x4_t) -> float32x4_t {
        vaddq_f32(a, b)
    }
);
```

--------------------------------

### Create Vector by Calling a Function for Each Lane

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Use `from_fn` to create a vector by applying a closure to each lane's index. This allows for dynamic initialization based on lane position.

```rust
let v: i32x4<_> = i32x4::from_fn(simd, |i| (i * 2) as i32); // [0, 2, 4, 6]
```

--------------------------------

### Level::new()

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Detects available CPU features at runtime and returns the best available SIMD level. This method requires the `std` feature or a `wasm32` target. It is recommended to call this once and reuse the result for efficiency.

```APIDOC
## Level::new()

### Description
Detects available CPU features at runtime and returns the best available SIMD level.

### Method
`Level::new() -> Self`

### Availability
Requires `std` feature or `wasm32` target

### Returns
`Level` - The highest SIMD level available on the current CPU

### Panics
No panics; returns fallback if detection fails

### Usage Example
```rust
use fearless_simd::Level;

let level = Level::new();
// level is now one of: Fallback, Neon, WasmSimd128, Sse4_2, or Avx2
// depending on CPU capabilities and compilation target
```

### Notes
- Should be called once per application and stored/passed around
- Repeating this call is inefficient as it re-detects capabilities each time
- Applications should prefer creating `Level` once and passing it as needed
```

--------------------------------

### Detect CPU SIMD Level and Use Conditionally

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

This snippet shows how to detect the available SIMD level on the CPU and conditionally execute code based on it. It includes checks for AVX2, SSE4.2, NEON, and a fallback.

```rust
use fearless_simd::Level;

fn main() {
    let level = Level::new();
    println!("CPU SIMD Level: {:?}", level);

    match level {
        #[cfg(target_arch = "x86_64")]
        Level::Avx2(_) => println!("Using AVX2 - 256-bit vectors"),
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Sse4_2(_) => println!("Using SSE4.2 - 128-bit vectors"),
        #[cfg(target_arch = "aarch64")]
        Level::Neon(_) => println!("Using ARM NEON - 128-bit vectors"),
        Level::Fallback(_) => println!("Using scalar fallback"),
        _ => {}
    }

    // Conditional processing
    if !level.is_fallback() {
        println!("SIMD acceleration available!");
    }
}
```

--------------------------------

### Performing Operations on Mask Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Illustrates common logical operations (AND, OR, NOT) and tests (any, all) on mask vectors. Requires a SIMD context and pre-defined masks.
```rust
dispatch!(level, simd => {
    let mask_a: mask32x4<_> = a.simd_lt(b);
    let mask_b: mask32x4<_> = b.simd_lt(c);

    let combined = mask_a & mask_b; // Logical AND
    let either = mask_a | mask_b;   // Logical OR
    let negated = !mask_a;          // Logical NOT

    // Test masks
    if mask_a.any() {
        println!("At least one lane is true");
    }
    if mask_a.all() {
        println!("All lanes are true");
    }
});
```

--------------------------------

### Performing Arithmetic Operations on f32x4 Vectors

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/04-vector-types.md

Shows element-wise arithmetic operations like addition, multiplication, division, and square root on `f32x4` vectors. Ensure the `prelude` is imported and a `Simd` token is available.

```rust
use fearless_simd::prelude::*;

dispatch!(level, simd => {
    let a = f32x4::splat(simd, 2.0);
    let b = f32x4::splat(simd, 3.0);

    let sum = a + b;   // Element-wise: [5.0, 5.0, 5.0, 5.0]
    let prod = a * b;  // [6.0, 6.0, 6.0, 6.0]
    let div = b / a;   // [1.5, 1.5, 1.5, 1.5]
    let sq = a.sqrt(); // [1.414..., 1.414..., 1.414..., 1.414...]
});
```

--------------------------------

### Custom Kernel with AVX2 Intrinsics

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

Demonstrates how to define a custom kernel using AVX2 intrinsics for bitwise AND operations on 256-bit integers. This approach is specific to x86_64 architectures.
```rust
use fearless_simd::{Level, Simd, dispatch, kernel, prelude::*};

#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_and_si256, _mm256_set1_epi32};

kernel!(
    fn mask_bits_avx2(avx2: Avx2, v: __m256i) -> __m256i {
        let mask = _mm256_set1_epi32(0xFF);
        _mm256_and_si256(v, mask)
    }
);

fn main() {
    let level = Level::new();

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if let Some(avx2) = level.as_avx2() {
            let v: i32x8<_> = [255, 256, 257, 258, 259, 260, 261, 262].simd_into(avx2);
            let masked_raw = mask_bits_avx2(avx2, v.into());
            let masked: i32x8<_> = masked_raw.simd_into(avx2);

            assert_eq!(
                <[i32; 8]>::from(masked),
                [255, 0, 1, 2, 3, 4, 5, 6]
            );
        }
    }
}
```

--------------------------------

### Conceptual Expansion of dispatch! Macro

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

This simplified Rust code conceptually shows how the dispatch! macro expands to handle different SIMD levels based on target architecture and features.

```rust
// What the macro expands to (conceptually):
{
    match level {
        #[cfg(target_arch = "aarch64")]
        Level::Neon(neon) => {
            let simd = neon;
            neon.vectorize(|| operation)
        },
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        Level::Avx2(avx2) => {
            let simd = avx2;
            avx2.vectorize(|| operation)
        },
        // ... other levels
    }
}
```

--------------------------------

### dispatch!(level, simd => operation)

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/09-api-index.md

Invokes a SIMD operation with the best available SIMD level. This macro abstracts away the complexity of selecting the appropriate SIMD instruction set for the target platform.

```APIDOC
## dispatch!(level, simd => operation)

### Description
Invokes operation with best available SIMD level.
### Returns
Value of operation expression

### Platforms
All (uses appropriate conditional code paths)
```

--------------------------------

### Accumulation Pattern

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

Shows how to accumulate values from SIMD vectors. Cache `Level::new()` for efficiency.

```rust
use fearless_simd::{Simd, f32x4};

#[inline(always)]
fn sum<S: Simd>(simd: S, data: &[f32]) -> f32 {
    let mut acc = 0.0;
    // Note: chunks_exact skips a trailing remainder shorter than 4 elements.
    for chunk in data.chunks_exact(4) {
        let vec: f32x4<_> = f32x4::from_slice(simd, chunk);
        let arr: [f32; 4] = vec.into();
        acc += arr.iter().sum::<f32>();
    }
    acc
}
```

--------------------------------

### Using dispatch! from a Generic Function

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/06-dispatch-kernel.md

Demonstrates how to use the dispatch! macro together with a generic function. The wrapper invokes dispatch!, which passes the SIMD token to the generic function containing the actual SIMD logic.

```rust
use fearless_simd::{Level, Simd, dispatch};

fn wrapper(level: Level) {
    dispatch!(level, simd => generic_function(simd));
}

#[inline(always)]
fn generic_function<S: Simd>(simd: S) {
    // SIMD code here
}
```

--------------------------------

### Access SSE4.2 SIMD Token (x86/x86_64)

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Provides access to the SSE4.2 token on x86 and x86_64 architectures. This is useful for leveraging SSE4.2 intrinsics, even when higher-level features like AVX2 are available.

```rust
if let Some(sse42) = level.as_sse4_2() {
    // Can safely use SSE4.2 intrinsics
}
```

--------------------------------

### Level::as_avx2

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Provides access to the AVX2 token on x86/x86_64 architectures if the x86-64-v3 feature set is available.

```APIDOC
## Level::as_avx2

### Description
Provides access to AVX2 token. Returns token if x86-64-v3 feature set is available.
### Method
`as_avx2(self) -> Option<Avx2>`

### Availability
`#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]`

### CPU Features Enabled
- avx2, bmi1, bmi2, cmpxchg16b, f16c, fma, lzcnt, movbe, popcnt, xsave
- All lower levels (sse4.2, etc.) are implied

### Usage Example
```rust
if let Some(avx2) = level.as_avx2() {
    // Can use full AVX2 capabilities
    let result = fearless_simd::dispatch!(level, simd => function(simd));
}
```
```

--------------------------------

### Level::as_neon

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/02-level.md

Provides access to the Neon SIMD token on aarch64 architectures if available.

```APIDOC
## Level::as_neon

### Description
Provides access to Neon SIMD token if available. Returns the token even if a higher level (not yet supported) is available.

### Method
`as_neon(self) -> Option<Neon>`

### Availability
`#[cfg(target_arch = "aarch64")]`

### Usage Example
```rust
// Gate the whole block so `level` is never referenced on other targets.
#[cfg(target_arch = "aarch64")]
{
    let level = Level::new();
    if let Some(neon) = level.as_neon() {
        // Use Neon intrinsics with kernel! macro
        kernel!(/* ... */);
    }
}
```
```

--------------------------------

### Vector Addition with Fearless SIMD

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/08-examples.md

Performs element-wise addition of two float slices using SIMD for performance. Both input slices must have the same length, because `b` is indexed with ranges derived from `a.len()`.
```rust
use fearless_simd::{Level, Simd, dispatch, f32x4, prelude::*};

#[inline(always)]
fn add_arrays<S: Simd>(simd: S, a: &[f32], b: &[f32]) -> Vec<f32> {
    let mut result = Vec::with_capacity(a.len());

    for i in (0..a.len()).step_by(4) {
        let chunk_size = (a.len() - i).min(4);

        // Zero-pad the final partial chunk into a full 4-lane buffer,
        // assuming `from_slice` expects exactly 4 elements.
        let mut a_buf = [0.0f32; 4];
        let mut b_buf = [0.0f32; 4];
        a_buf[..chunk_size].copy_from_slice(&a[i..i + chunk_size]);
        b_buf[..chunk_size].copy_from_slice(&b[i..i + chunk_size]);

        let a_vec: f32x4<_> = f32x4::from_slice(simd, &a_buf);
        let b_vec: f32x4<_> = f32x4::from_slice(simd, &b_buf);
        let sum = a_vec + b_vec;

        // Copy out only the lanes that correspond to real input elements.
        for j in 0..chunk_size {
            result.push(sum[j]);
        }
    }

    result
}

fn main() {
    let level = Level::new();
    let a = vec![1.0, 2.0, 3.0, 4.0, 5.0];
    let b = vec![10.0, 20.0, 30.0, 40.0, 50.0];

    let result = dispatch!(level, simd => add_arrays(simd, &a, &b));
    assert_eq!(result, vec![11.0, 22.0, 33.0, 44.0, 55.0]);
}
```

--------------------------------

### SIMD Vector Arithmetic Operations

Source: https://github.com/linebender/fearless_simd/blob/main/_autodocs/README.md

Illustrates common element-wise arithmetic operations on SIMD vectors, including addition, multiplication, division, negation, square root, and fused multiply-add. Assumes `a`, `b`, and `c` are vectors of the same type.

```rust
let sum = a + b;             // Element-wise addition
let prod = a * b;            // Element-wise multiplication
let quotient = a / b;        // Element-wise division
let neg = -a;                // Negation
let sqrt = a.sqrt();         // Square root
let fused = a.mul_add(b, c); // (a * b) + c
```
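
--------------------------------

### Scalar Reference for Element-wise Arithmetic (illustrative sketch)

The element-wise operations listed above can be modelled in plain Rust, which may help when checking SIMD results against a scalar baseline. This is a minimal sketch that does not use fearless_simd at all: the `Lanes` alias and the helper functions `zip_lanes`, `add`, `mul`, `div`, `neg`, `sqrt`, and `mul_add` are hypothetical names introduced here for illustration. It assumes each operation acts independently on each lane. Whether a given backend rounds `mul_add` once (fused) or twice is not stated in the snippets above.

```rust
/// Scalar stand-in for a 4-lane f32 vector (hypothetical; not the library's `f32x4`).
type Lanes = [f32; 4];

/// Apply a binary operation independently to each pair of lanes.
fn zip_lanes(a: Lanes, b: Lanes, op: impl Fn(f32, f32) -> f32) -> Lanes {
    std::array::from_fn(|i| op(a[i], b[i]))
}

/// Models `a + b`.
fn add(a: Lanes, b: Lanes) -> Lanes {
    zip_lanes(a, b, |x, y| x + y)
}

/// Models `a * b`.
fn mul(a: Lanes, b: Lanes) -> Lanes {
    zip_lanes(a, b, |x, y| x * y)
}

/// Models `a / b`.
fn div(a: Lanes, b: Lanes) -> Lanes {
    zip_lanes(a, b, |x, y| x / y)
}

/// Models `-a`.
fn neg(a: Lanes) -> Lanes {
    a.map(|x| -x)
}

/// Models `a.sqrt()`.
fn sqrt(a: Lanes) -> Lanes {
    a.map(f32::sqrt)
}

/// Models `a.mul_add(b, c)`, i.e. (a * b) + c per lane.
/// Uses the standard library's `f32::mul_add`, which rounds once.
fn mul_add(a: Lanes, b: Lanes, c: Lanes) -> Lanes {
    std::array::from_fn(|i| a[i].mul_add(b[i], c[i]))
}
```

For example, with `a = [2.0; 4]` and `b = [3.0; 4]`, `add(a, b)` yields `[5.0; 4]` and `div(b, a)` yields `[1.5; 4]`, matching the lane values shown in the `f32x4` arithmetic snippet earlier in this section.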