ArrayFire (arrayfire/arrayfire)

https://github.com/arrayfire/arrayfire

# ArrayFire

ArrayFire is a high-performance tensor computing library designed for parallel and massively-parallel architectures. It provides a unified, easy-to-use API that abstracts away the complexity of programming GPUs, CPUs, and other accelerators, allowing developers to write portable code that runs efficiently on CUDA, OpenCL, oneAPI, and CPU backends. The library is built around a single container object, the `array`, which represents multi-dimensional data stored on the device and enables automatic memory management and just-in-time (JIT) kernel compilation for optimized performance.

ArrayFire offers hundreds of accelerated functions spanning linear algebra, signal processing, image processing, computer vision, machine learning, and statistics. It supports multiple data types including single and double precision floats, complex numbers, half precision, and various integer types. The library features automatic kernel fusion that combines multiple operations into single kernels to minimize memory transfers and maximize throughput, making it ideal for scientific computing, financial modeling, deep learning, and real-time data processing applications.
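To make the idea of kernel fusion concrete, the following is a conceptual sketch in plain C++. It does not use ArrayFire and is not a description of how the JIT is implemented; the function names `unfused` and `fused` are hypothetical. It only illustrates why evaluating an expression such as `sin(a)*sin(a) + cos(a)*cos(a)` in one pass avoids the temporary buffers that separate passes would need.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused: every operation is its own pass and writes a full temporary buffer.
std::vector<float> unfused(const std::vector<float>& a) {
    std::vector<float> s(a.size()), c(a.size()), out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) s[i] = std::sin(a[i]);
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = std::cos(a[i]);
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = s[i] * s[i] + c[i] * c[i];
    return out;
}

// Fused: a single pass over the data with no intermediate buffers.
std::vector<float> fused(const std::vector<float>& a) {
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float s = std::sin(a[i]);
        const float c = std::cos(a[i]);
        out[i] = s * s + c * c;
    }
    return out;
}
```

Both functions compute the same values; the fused version reads each input element once and writes each output element once, which is the memory-traffic saving that fusion aims for.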

## Array Creation and Initialization

The `af::array` class is the fundamental data container in ArrayFire. Arrays can be created with random values, constants, or initialized from host memory, providing flexible data initialization for parallel computing workloads.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    // Set device and display info
    af::setDevice(0);
    af::info();

    // Create arrays with random uniform values [0, 1)
    array A = randu(5, 3, f32);        // 5x3 single precision
    array B = randu(100, 100, f64);    // 100x100 double precision
    array C = randu(4, 4, 4);          // 4x4x4 3D array

    // Create arrays with random normal distribution (mean=0, std=1)
    array normal = randn(1000, 1000);

    // Create constant arrays
    array zeros = constant(0, 5, 5);
    array ones = constant(1, 10, 10);
    array identity = af::identity(4, 4);

    // Create from host data
    float host_data[] = {1, 2, 3, 4, 5, 6};
    array D(2, 3, host_data, afHost);  // 2x3 array from host

    // Create using dim4 for dimensions
    dim4 dims(16, 4, 1, 1);
    array E = constant(2, dims);

    // Print arrays
    af_print(A);
    af_print(D);

    // Get array properties
    printf("Dimensions: %lld x %lld\n", A.dims(0), A.dims(1));
    printf("Total elements: %lld\n", A.elements());
    printf("Type: %d\n", A.type());

    return 0;
}
```
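ArrayFire stores arrays in column-major order, so a host buffer passed to the `array` constructor is read column by column. The helper below is a small plain C++ sketch (the names `colMajorOffset` and `elementAt` are hypothetical, not part of ArrayFire) showing how a `(row, col)` pair maps to an offset in such a buffer.

```cpp
#include <cstddef>

// Linear offset of element (row, col) in a column-major matrix with `rows` rows.
constexpr std::size_t colMajorOffset(std::size_t row, std::size_t col, std::size_t rows) {
    return row + col * rows;
}

// Read element (row, col) from a column-major host buffer.
inline float elementAt(const float* data, std::size_t row, std::size_t col, std::size_t rows) {
    return data[colMajorOffset(row, col, rows)];
}
```

For the buffer `{1, 2, 3, 4, 5, 6}` interpreted as a 2x3 matrix, the columns are `(1, 2)`, `(3, 4)`, and `(5, 6)`, so element `(0, 1)` is `3` and element `(1, 2)` is `6`.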

## Element-wise Operations and Mathematical Functions

ArrayFire supports a comprehensive set of element-wise arithmetic operations and mathematical functions that are automatically JIT-compiled and fused for optimal performance.

```cpp
#include <arrayfire.h>
using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(1000, 1000);
    array B = randu(1000, 1000);

    // Basic arithmetic (element-wise)
    array sum = A + B;
    array diff = A - B;
    array prod = A * B;
    array quot = A / B;

    // Mathematical functions
    array sine = sin(A);
    array cosine = cos(A);
    array tangent = tan(A);
    array exponential = exp(A);
    array logarithm = log(A);
    array sqroot = sqrt(A);
    array power = pow(A, 2.0f);
    array absolute = abs(A - 0.5f);

    // Hyperbolic functions
    array hyp_sin = sinh(A);
    array hyp_cos = cosh(A);

    // Comparison operations (returns boolean array)
    array mask = A > 0.5f;
    array equal = A == B;

    // Conditional selection
    array selected = select(mask, A, B);  // where mask true, use A; else B

    // Clamping values
    array clamped = clamp(A, 0.2f, 0.8f);

    // Complex numbers
    array real_part = randu(10, 10);
    array imag_part = randu(10, 10);
    array complex_arr = complex(real_part, imag_part);
    array magnitude = abs(complex_arr);
    array phase = arg(complex_arr);

    af_print(sum);

    return 0;
}
```

## Matrix Operations and Linear Algebra (BLAS)

ArrayFire provides high-performance BLAS operations including matrix multiplication, transpose, and various matrix decompositions through an intuitive API.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(4, 3, f32);
    array B = randu(3, 5, f32);

    // Matrix multiplication
    array C = matmul(A, B);           // A * B
    array D = matmulNT(A, A);         // A * A^T
    array E = matmulTN(A, A);         // A^T * A
    array F = matmulTT(B, A);         // B^T * A^T (5x3 times 3x4)

    // Transpose
    array At = A.T();                 // Transpose
    array Ah = A.H();                 // Hermitian (conjugate transpose)

    // Dot product (for vectors)
    array v1 = randu(100);
    array v2 = randu(100);
    array dot_prod = dot(v1, v2);

    // Batch matrix multiplication
    array batch_A = randu(4, 4, 10);  // 10 matrices of 4x4
    array batch_B = randu(4, 4, 10);
    array batch_C = matmul(batch_A, batch_B);  // Multiplies all 10 pairs

    // Matrix norms
    double frobenius = norm(A, AF_NORM_EUCLID);
    double l1_norm = norm(A, AF_NORM_MATRIX_1);
    double inf_norm = norm(A, AF_NORM_MATRIX_INF);

    printf("Frobenius norm: %f\n", frobenius);
    af_print(C);

    return 0;
}
```
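The transposed variants only work when the inner dimensions agree after the transposes are applied. The following plain C++ sketch (the `Shape` alias and `matmulShape` function are hypothetical helpers, not ArrayFire API, and require C++17) computes the result shape of `op(A) * op(B)` or reports a mismatch.

```cpp
#include <optional>
#include <utility>

using Shape = std::pair<long long, long long>;  // (rows, cols)

// Result shape of op(A) * op(B), or std::nullopt if the inner dimensions disagree.
std::optional<Shape> matmulShape(Shape a, bool transA, Shape b, bool transB) {
    if (transA) std::swap(a.first, a.second);
    if (transB) std::swap(b.first, b.second);
    if (a.second != b.first) return std::nullopt;
    return Shape{a.first, b.second};
}
```

With a 4x3 matrix `A` and a 3x5 matrix `B`, `A * B` is 4x5, `A * A^T` is 4x4, and `A^T * A` is 3x3. The product `A^T * B^T` is not conformable (3x4 times 5x3), whereas `B^T * A^T` yields a 5x4 result.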

## Linear Algebra Decompositions and Solvers (LAPACK)

ArrayFire supports essential matrix decompositions including LU, QR, Cholesky, and SVD, along with linear system solvers for scientific computing applications.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    af::setDevice(0);

    // SVD decomposition
    float h_buffer[] = {1, 4, 2, 5, 3, 6};
    array in(2, 3, h_buffer);

    array u, s_vec, vt;
    svd(u, s_vec, vt, in);

    // Reconstruct original matrix
    array s_mat = diag(s_vec, 0, false);
    array reconstructed = matmul(u, s_mat, vt(seq(2), span));

    af_print(s_vec);  // Singular values
    af_print(u);      // Left singular vectors
    af_print(vt);     // Right singular vectors (transposed)

    // LU decomposition
    array A = randu(4, 4);
    array lower, upper, pivot;
    lu(lower, upper, pivot, A);

    // QR decomposition
    array Q, R, tau;
    qr(Q, R, tau, A);

    // Cholesky decomposition (for positive definite matrices)
    array sym = matmulNT(A, A) + af::identity(4, 4) * 4;  // Make positive definite
    array L;
    int info = cholesky(L, sym, true);  // Upper triangular

    // Solve linear system Ax = b
    array b = randu(4, 1);
    array x = solve(A, b);

    // Verify solution
    array residual = matmul(A, x) - b;
    printf("Residual norm: %e\n", norm(residual));

    // Matrix inverse
    array A_inv = inverse(A);

    // Pseudo-inverse (Moore-Penrose)
    array rect = randu(5, 3);
    array pinv = pinverse(rect);

    // Determinant
    double det_val = det<double>(A);
    printf("Determinant: %f\n", det_val);

    // Matrix rank
    unsigned int r = rank(A);
    printf("Rank: %u\n", r);

    return 0;
}
```

## Signal Processing and FFT

ArrayFire includes comprehensive signal processing functions with 1D, 2D, and 3D Fast Fourier Transforms, convolution, and correlation operations optimized for parallel execution.

```cpp
#include <arrayfire.h>
using namespace af;

int main() {
    af::setDevice(0);

    // 1D FFT
    array signal = randu(1024);
    array spectrum = fft(signal);
    array recovered = ifft(spectrum);

    // 2D FFT (for images)
    array image = randu(256, 256);
    array freq_domain = fft2(image);
    array spatial = ifft2(freq_domain);

    // 3D FFT
    array volume = randu(64, 64, 64);
    array vol_fft = fft3(volume);

    // FFT with normalization (scale by 1/sqrt(N), N = 1024)
    array normalized = fftNorm(signal, 1.0 / 32.0);

    // In-place FFT (must be complex input)
    array complex_signal = complex(signal, constant(0, signal.dims()));
    fftInPlace(complex_signal);

    // Real-to-complex FFT (optimized for real inputs)
    array rfft_result = fftR2C<1>(signal);

    // 1D Convolution
    array kernel = randu(5);
    array convolved = convolve1(signal, kernel);

    // 2D Convolution
    array img = randu(100, 100);
    array filter = gaussianKernel(5, 5, 1.0, 1.0);
    array blurred = convolve2(img, filter);

    // Separable convolution (more efficient)
    float h1[] = {1, 1, 1};
    float h2[] = {-1, 0, 1};
    array row_kernel(3, h1);
    array col_kernel(3, h2);
    array sep_conv = convolve(col_kernel, row_kernel, img);  // column filter first, then row filter

    // Correlation
    array correlated = convolve2(img, flip(flip(filter, 0), 1));

    // FIR filter
    array fir_coeffs = randu(10);
    array filtered = fir(fir_coeffs, signal);

    // IIR filter
    array a_coeffs = randu(5);
    array b_coeffs = randu(5);
    array iir_out = iir(b_coeffs, a_coeffs, signal);

    // Signal interpolation
    array positions = randu(100) * 1023;  // Interpolation positions
    array interpolated = approx1(signal, positions, AF_INTERP_LINEAR);

    af_print(spectrum(seq(10)));  // First 10 frequency components

    return 0;
}
```
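The relationship between convolution and correlation used above (correlation is convolution with a flipped kernel) can be sketched in plain C++. This is an illustrative 1D version that returns the full-length result; it is not ArrayFire's implementation, and ArrayFire's default convolution mode returns an output the size of the signal instead. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Full 1D convolution: out[n] = sum over k of signal[n - k] * kernel[k].
std::vector<float> convolveFull(const std::vector<float>& signal,
                                const std::vector<float>& kernel) {
    if (signal.empty() || kernel.empty()) return {};
    std::vector<float> out(signal.size() + kernel.size() - 1, 0.0f);
    for (std::size_t i = 0; i < signal.size(); ++i)
        for (std::size_t k = 0; k < kernel.size(); ++k)
            out[i + k] += signal[i] * kernel[k];
    return out;
}

// Cross-correlation expressed as convolution with the reversed kernel.
std::vector<float> correlateFull(const std::vector<float>& signal,
                                 std::vector<float> kernel) {
    std::reverse(kernel.begin(), kernel.end());
    return convolveFull(signal, kernel);
}
```

For a symmetric kernel such as a Gaussian the two operations coincide, which is why the flip only matters for asymmetric filters like `{1, 0, -1}`.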

## Image Processing

ArrayFire provides a rich set of image processing functions including filtering, morphological operations, color space conversions, and geometric transformations.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    af::setDevice(0);

    // Load image
    array img = loadImage("image.jpg", true);  // true for color

    // Color space conversions
    array gray = colorSpace(img, AF_GRAY, AF_RGB);
    array hsv = colorSpace(img, AF_HSV, AF_RGB);

    // Image resizing
    array resized = resize(img, 200, 300, AF_INTERP_BILINEAR);
    array scaled = resize(0.5f, 0.5f, img);  // Scale by factor

    // Image rotation
    array rotated = rotate(img, af::Pi / 4, true);  // 45 degrees, crop

    // Geometric transformations
    array transform_matrix = randu(3, 2);  // Affine transform
    array transformed = transform(img, transform_matrix);

    // Edge detection with Sobel
    array gx, gy;
    sobel(gx, gy, gray);
    array edges = hypot(gx, gy);

    // Gaussian blur
    array gauss_kernel = gaussianKernel(5, 5, 1.0, 1.0);
    array blurred = convolve(gray, gauss_kernel);

    // Median filter (noise removal)
    array denoised = medfilt(gray, 5, 5);

    // Bilateral filter (edge-preserving smoothing)
    array bilateral_out = bilateral(gray, 3.0f, 40.0f);

    // Morphological operations
    array kernel = constant(1, 5, 5);
    array dilated = dilate(gray, kernel);
    array eroded = erode(gray, kernel);
    array opened = dilate(erode(gray, kernel), kernel);  // Opening: erode, then dilate
    array closed = erode(dilate(gray, kernel), kernel);  // Closing: dilate, then erode

    // Histogram
    array hist = histogram(gray, 256, 0.0, 255.0);

    // Histogram equalization
    array equalized = histEqual(gray, hist);

    // Gradients
    array dx, dy;
    grad(dx, dy, gray);

    // Integral image (summed area table)
    array integral = af::sat(gray);

    // Save image
    saveImage("output.jpg", resized);

    // Check if ImageIO is available
    if (isImageIOAvailable()) {
        printf("Image I/O is available\n");
    }

    return 0;
}
```
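Morphological opening and closing are compositions of erosion and dilation: opening erodes and then dilates, closing dilates and then erodes. The plain C++ sketch below shows this on a 1D signal with a flat window; the function names and the border handling (the window is clamped to the valid range) are assumptions made for illustration and do not describe ArrayFire's `erode` or `dilate`.

```cpp
#include <algorithm>
#include <vector>

// Sliding-window min (erosion) or max (dilation) with radius r.
std::vector<int> morph1d(const std::vector<int>& in, int r, bool take_max) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(in.size());
    for (int i = 0; i < n; ++i) {
        const int lo = std::max(0, i - r);
        const int hi = std::min(n - 1, i + r);
        int acc = in[lo];
        for (int j = lo + 1; j <= hi; ++j)
            acc = take_max ? std::max(acc, in[j]) : std::min(acc, in[j]);
        out[i] = acc;
    }
    return out;
}

std::vector<int> erode1d(const std::vector<int>& in, int r)  { return morph1d(in, r, false); }
std::vector<int> dilate1d(const std::vector<int>& in, int r) { return morph1d(in, r, true); }

// Opening removes small bright features; closing fills small dark gaps.
std::vector<int> open1d(const std::vector<int>& in, int r)  { return dilate1d(erode1d(in, r), r); }
std::vector<int> close1d(const std::vector<int>& in, int r) { return erode1d(dilate1d(in, r), r); }
```

With radius 1, opening `{0,1,0,0,1,1,1,0,0}` removes the isolated `1` but keeps the run of three, while closing `{1,1,0,1,1}` fills the single-sample gap.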

## Computer Vision and Feature Detection

ArrayFire includes computer vision functions for feature detection, extraction, and matching, enabling efficient implementation of vision algorithms on GPUs.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    af::setDevice(0);

    array img = randu(480, 640);  // Grayscale image

    // FAST corner detection
    features fast_feats = fast(img, 20.0f, 9, true, 0.05f, 3);
    printf("FAST features found: %zu\n", fast_feats.getNumFeatures());

    // Get feature coordinates
    array x_coords = fast_feats.getX();
    array y_coords = fast_feats.getY();
    array scores = fast_feats.getScore();

    // Harris corner detection (max_corners, min_response, sigma, block_size, k_thr)
    features harris_feats = harris(img, 500, 1e5f, 1.0f, 0, 0.04f);

    // ORB feature detection and description
    features orb_feats;
    array orb_desc;
    orb(orb_feats, orb_desc, img, 20.0f, 500, 1.2f, 8, true);

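    // SIFT feature detection (if available)
    // Arguments: n_layers, contrast_thr, edge_thr, init_sigma, double_input,
    //            intensity_scale (1.0 for images in [0, 1]), feature_ratio
    features sift_feats;
    array sift_desc;
    sift(sift_feats, sift_desc, img, 3, 0.04f, 10.0f, 1.6f, true, 1.0f, 0.05f);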

    // Feature matching (Hamming distance for binary descriptors)
    array img2 = randu(480, 640);
    features orb_feats2;
    array orb_desc2;
    orb(orb_feats2, orb_desc2, img2, 20.0f, 500, 1.2f, 8, true);

    array idx, dist;
    hammingMatcher(idx, dist, orb_desc, orb_desc2, 0, 1);

    // Nearest neighbor matching
    array nn_idx, nn_dist;
    nearestNeighbour(nn_idx, nn_dist, sift_desc, sift_desc, 1, 1, AF_SSD);

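    // Homography estimation (RANSAC) from matched point coordinates
    array x_src = randu(100) * 640;
    array y_src = randu(100) * 480;
    array x_dst = x_src + randn(100) * 0.01f;
    array y_dst = y_src + randn(100) * 0.01f;
    array H;
    int inliers = 0;  // Number of inliers, returned by reference
    homography(H, inliers, x_src, y_src, x_dst, y_dst, AF_HOMOGRAPHY_RANSAC, 3.0f, 1000);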

    // Template matching
    array templ = img(seq(100, 150), seq(100, 150));
    array match_result = matchTemplate(img, templ, AF_SAD);  // Lower SAD = better match

    af_print(x_coords(seq(10)));  // First 10 x coordinates

    return 0;
}
```

## Statistics Functions

ArrayFire provides comprehensive statistical functions for computing means, variances, correlations, and other statistical measures across array dimensions.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    af::setDevice(0);

    array data = randu(1000, 100);  // 1000 samples, 100 features

    // Mean along dimension
    array col_means = mean(data, 0);       // Mean of each column
    array row_means = mean(data, 1);       // Mean of each row

    // Weighted mean (weights have the same shape as the data)
    array weights = randu(1000, 100);
    array weighted_mean = mean(data, weights, 0);

    // Global mean (scalar result)
    double global_mean = mean<double>(data);

    // Variance
    array col_var = var(data, AF_VARIANCE_SAMPLE, 0);
    array pop_var = var(data, AF_VARIANCE_POPULATION, 0);

    // Standard deviation
    array col_stdev = stdev(data, AF_VARIANCE_SAMPLE, 0);

    // Mean and variance together (more efficient)
    array m, v;
    meanvar(m, v, data, array(), AF_VARIANCE_SAMPLE, 0);

    // Covariance matrix
    array X = randu(100, 10);
    array Y = randu(100, 10);
    array cov_matrix = cov(X, Y, false);  // false = not biased

    // Correlation coefficient (returned as a host scalar)
    double corr = corrcoef<double>(X, Y);

    // Median
    array col_median = median(data, 0);

    // Min and max
    array col_min = min(data, 0);
    array col_max = max(data, 0);

    // Min/max with indices
    array min_vals, min_idx;
    min(min_vals, min_idx, data, 0);

    // Global min/max (scalar)
    double global_min = min<double>(data);
    double global_max = max<double>(data);

    // Sum and product
    array col_sum = sum(data, 0);
    array col_prod = product(data, 0);

    // Cumulative operations
    array cum_sum = accum(data, 0);  // Cumulative sum
    array scan_prod = scan(data, 0, AF_BINARY_MUL);  // Cumulative product

    // Count non-zero elements
    unsigned count_nonzero = count<unsigned>(data > 0.5f);

    // Any and all (logical reductions)
    bool any_positive = anyTrue<bool>(data > 0);
    bool all_positive = allTrue<bool>(data > 0);

    printf("Global mean: %f\n", global_mean);
    printf("Global min: %f, max: %f\n", global_min, global_max);

    return 0;
}
```

## Random Number Generation

ArrayFire provides flexible random number generation with support for multiple distributions and engine types, enabling reproducible simulations on parallel hardware.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    af::setDevice(0);

    // Set global seed for reproducibility
    setSeed(12345);

    // Uniform distribution [0, 1)
    array uniform = randu(1000, 1000);

    // Normal distribution (mean=0, std=1)
    array normal = randn(1000, 1000);

    // Create custom random engine
    randomEngine engine(AF_RANDOM_ENGINE_MERSENNE_GP11213, 42);

    // Generate with custom engine
    array custom_uniform = randu(dim4(100, 100), f32, engine);
    array custom_normal = randn(dim4(100, 100), f32, engine);

    // Scale and shift for custom distributions
    array custom_uniform_range = uniform * 10 + 5;  // [5, 15)
    array custom_normal_scaled = normal * 2 + 10;   // mean=10, std=2

    // Get/set engine type
    engine.setType(AF_RANDOM_ENGINE_PHILOX_4X32_10);
    randomEngineType rtype = engine.getType();

    // Get/set seed
    engine.setSeed(9999);
    unsigned long long current_seed = engine.getSeed();

    // Set default random engine type
    setDefaultRandomEngineType(AF_RANDOM_ENGINE_MERSENNE_GP11213);

    // Get default engine
    randomEngine default_engine = getDefaultRandomEngine();

    // Integer random values
    array int_random = (randu(100, 100) * 100).as(s32);  // 0-99 integers

    // Permutation: sort random keys and use the resulting index order
    array indices = range(dim4(100));
    array sorted_keys, perm;
    sort(sorted_keys, perm, randu(100));
    array shuffled = indices(perm);

    printf("Current seed: %llu\n", current_seed);
    af_print(uniform(seq(5), seq(5)));  // 5x5 sample

    return 0;
}
```
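One common way to build a random permutation on parallel hardware is to draw one random key per element, sort the keys, and keep the resulting index order. The plain C++ sketch below illustrates the technique with the standard library; `randomPermutation` is a hypothetical helper, not an ArrayFire function.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Returns a permutation of 0..n-1 obtained by sorting indices by random keys.
std::vector<int> randomPermutation(std::size_t n, std::uint32_t seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    std::vector<float> keys(n);
    for (float& k : keys) k = dist(gen);

    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);
    std::sort(perm.begin(), perm.end(),
              [&keys](int a, int b) { return keys[a] < keys[b]; });
    return perm;
}
```

Unlike drawing random indices directly, this approach never repeats or drops an element, and fixing the seed makes the shuffle reproducible.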

## Array Indexing and Slicing

ArrayFire provides powerful indexing capabilities using sequences, spans, and index arrays for efficient data access and manipulation on parallel devices.

```cpp
#include <arrayfire.h>
using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(10, 10, 3);

    // Single element access
    float val = A(0, 0, 0).scalar<float>();

    // Row and column access
    array first_row = A.row(0);
    array last_col = A.col(end);

    // Multiple rows/columns
    array first_3_rows = A.rows(0, 2);
    array last_3_cols = A.cols(end - 2, end);

    // Slice access (3rd dimension)
    array first_slice = A.slice(0);
    array slices = A.slices(0, 1);

    // Using seq for ranges
    array sub = A(seq(2, 5), seq(3, 7), 0);  // Rows 2-5, cols 3-7, slice 0

    // Strided access
    array every_other_row = A(seq(0, end, 2), span);  // Every other row

    // Using span for all elements
    array all_rows = A(span, 0);  // All rows, first column

    // Boolean indexing
    array mask = A > 0.5f;
    array selected = A(mask);  // Returns 1D array of selected elements

    // Index array
    array indices = (randu(5) * 9).as(s32);  // Random indices 0-9
    array indexed = A(indices, 0);  // Select rows by indices

    // Assignment with indexing
    A(seq(0, 2), seq(0, 2), 0) = constant(0, 3, 3);  // Set subarray to 0
    A.row(5) = randu(1, 10, 3);  // Replace row 5
    A(A < 0.1f) = 0;  // Set small values to 0

    // Flip arrays
    array flipped_rows = flip(A, 0);  // Reverse along dim 0 (row order, i.e. upside down)
    array flipped_cols = flip(A, 1);  // Reverse along dim 1 (column order, i.e. mirrored)

    // Reshape arrays
    array B = randu(12);
    array reshaped = moddims(B, 3, 4);      // Reshape to 3x4
    array flattened = flat(A);               // Flatten to 1D

    // Join arrays
    array C = randu(10, 5);
    array D = randu(10, 5);
    array joined_cols = join(1, C, D);       // Join along columns
    array joined_rows = join(0, C.T(), D.T());  // Join along rows

    // Tile (repeat) array
    array tiled = tile(C, 2, 3);  // Repeat 2x in rows, 3x in cols

    // Reorder dimensions
    array reordered = reorder(A, 2, 0, 1);  // Permute dimensions

    af_print(sub);

    return 0;
}
```

## Reduction Operations

ArrayFire provides efficient parallel reduction operations for computing sums, products, and logical aggregations across specified dimensions.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(1000, 1000);

    // Sum reductions
    array col_sums = sum(A, 0);           // Sum along rows (result: 1 x 1000)
    array row_sums = sum(A, 1);           // Sum along columns (result: 1000 x 1)
    double total_sum = sum<double>(A);    // Global sum (scalar)

    // Product reductions
    array col_prods = product(A, 0);
    double total_prod = product<double>(A(seq(10), seq(10)));  // Small subarray

    // Min/Max reductions
    array col_mins = min(A, 0);
    array col_maxs = max(A, 0);
    double global_min = min<double>(A);
    double global_max = max<double>(A);

    // Min/Max with indices
    array min_vals, min_indices;
    min(min_vals, min_indices, A, 0);

    array max_vals, max_indices;
    max(max_vals, max_indices, A, 0);

    // Logical reductions
    array B = A > 0.5f;
    bool any_true = anyTrue<bool>(B);      // Any element true?
    bool all_true = allTrue<bool>(B);      // All elements true?

    array any_per_col = anyTrue(B, 0);     // Any true per column
    array all_per_col = allTrue(B, 0);     // All true per column

    // Count non-zero (qualified with af:: because the local variable is also named count)
    unsigned count = af::count<unsigned>(B);
    array counts_per_col = af::count(B, 0);

    // Scan operations (prefix sums)
    array prefix_sum = accum(A, 0);                    // Cumulative sum
    array prefix_prod = scan(A, 0, AF_BINARY_MUL);    // Cumulative product
    array prefix_max = scan(A, 0, AF_BINARY_MAX);     // Running maximum
    array prefix_min = scan(A, 0, AF_BINARY_MIN);     // Running minimum

    // Exclusive scan (first element is identity)
    array excl_scan = scan(A, 0, AF_BINARY_ADD, false);  // false = exclusive

    // Scan by key (segmented scan)
    array keys = (randu(1000) * 5).as(s32);  // 5 different keys
    array data = randu(1000);
    array keyed_scan = scanByKey(keys, data, 0, AF_BINARY_ADD);

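    // Reduce by key: sums runs of consecutive equal keys, so sort by key first
    array sorted_keys, sorted_data;
    sort(sorted_keys, sorted_data, keys, data);
    array unique_keys, reduced_vals;
    sumByKey(unique_keys, reduced_vals, sorted_keys, sorted_data, 0);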

    // Where (find indices of non-zero elements)
    array indices = where(A > 0.9f);

    // Set operations
    array set1 = (randu(100) * 50).as(s32);
    array set2 = (randu(100) * 50).as(s32);
    array unique_vals = setUnique(set1);
    array union_vals = setUnion(set1, set2);
    array intersect_vals = setIntersect(set1, set2);

    printf("Total sum: %f\n", total_sum);
    printf("Global min: %f, max: %f\n", global_min, global_max);
    printf("Count > 0.5: %u\n", count);

    return 0;
}
```
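The difference between an inclusive and an exclusive scan is easiest to see on a small input. The following plain C++ sketch (hypothetical helper names, standard library only) computes both prefix sums on the host.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive scan: out[i] = in[0] + ... + in[i].
std::vector<int> inclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: out[i] = in[0] + ... + in[i - 1], with out[0] = 0 (the identity).
std::vector<int> exclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}
```

For the input `{1, 2, 3, 4}` the inclusive scan is `{1, 3, 6, 10}` and the exclusive scan is `{0, 1, 3, 6}`; the exclusive result is the inclusive result shifted right by one with the identity element in front.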

## Sorting and Ordering

ArrayFire provides high-performance parallel sorting algorithms for ordering arrays along any dimension, with support for both values and index tracking.

```cpp
#include <arrayfire.h>
using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(10, 5);

    // Sort along first dimension (columns)
    array sorted = sort(A, 0, true);   // true = ascending
    array sorted_desc = sort(A, 0, false);  // false = descending

    // Sort with indices
    array sorted_vals, indices;
    sort(sorted_vals, indices, A, 0, true);

    // Sort by keys (keys and values must have the same dimensions)
    array keys = randu(100);
    array values = randu(100);
    array sorted_keys, sorted_values;
    sort(sorted_keys, sorted_values, keys, values);

    // Get unique elements (setUnique expects a vector, so flatten first)
    array unique_vals = setUnique(flat(A));

    // If the input is already sorted, pass is_sorted = true to skip the internal sort
    array unique_presorted = setUnique(sort(flat(A)), true);

    // Diff (differences between consecutive elements)
    array first_diff = diff1(A, 0);   // First-order difference
    array second_diff = diff2(A, 0);  // Second-order difference

    // Unwrap (convert to column-major sequence of patches)
    array img = randu(100, 100);
    array unwrapped = unwrap(img, 3, 3, 1, 1);  // 3x3 patches, stride 1

    // Wrap (reassembles patches; overlapping patch values are summed)
    array wrapped = wrap(unwrapped, 100, 100, 3, 3, 1, 1);

    af_print(sorted);
    af_print(indices);

    return 0;
}
```

## Sparse Matrix Operations

ArrayFire supports sparse matrix operations with CSR, CSC, and COO storage formats, enabling efficient computation with large sparse datasets.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    af::setDevice(0);

    // Create sparse matrix from dense
    array dense = randu(100, 100);
    dense = dense * (dense > 0.9f);  // Make sparse (~10% non-zero)

    array sparse_csr = sparse(dense, AF_STORAGE_CSR);
    array sparse_csc = sparse(dense, AF_STORAGE_CSC);
    array sparse_coo = sparse(dense, AF_STORAGE_COO);

    // Create sparse from components
    float values[] = {1, 2, 3, 4, 5};
    int row_idx[] = {0, 0, 1, 2, 2};
    int col_idx[] = {0, 2, 1, 0, 2};

    array vals(5, values);
    array rows(5, row_idx);
    array cols(5, col_idx);

    array sp = sparse(3, 3, vals, rows, cols, AF_STORAGE_COO);

    // Convert between formats
    array csr_from_coo = sparseConvertTo(sp, AF_STORAGE_CSR);

    // Convert sparse to dense (af:: qualifier needed: a local variable is also named dense)
    array back_to_dense = af::dense(sparse_csr);

    // Get sparse matrix info
    array sp_vals, sp_rows, sp_cols;
    af::storage stype;
    sparseGetInfo(sp_vals, sp_rows, sp_cols, stype, sparse_csr);

    // Get individual components
    array just_vals = sparseGetValues(sparse_csr);
    array just_rows = sparseGetRowIdx(sparse_csr);
    array just_cols = sparseGetColIdx(sparse_csr);
    dim_t nnz = sparseGetNNZ(sparse_csr);

    // Sparse matrix-vector multiplication
    array x = randu(100, 1);
    array y = matmul(sparse_csr, x);

    // Sparse matrix-matrix multiplication
    array B = randu(100, 10);
    array C = matmul(sparse_csr, B);

    printf("Non-zero elements: %lld\n", nnz);
    af_print(just_vals);

    return 0;
}
```
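The COO and CSR formats differ mainly in how row positions are stored: COO keeps one row index per non-zero, while CSR keeps one offset per row. The plain C++ sketch below derives a CSR row-pointer array from COO row indices; `cooRowsToCsrRowPtr` is a hypothetical helper for illustration and is not how ArrayFire performs the conversion.

```cpp
#include <vector>

// Builds the CSR row-pointer array (length num_rows + 1) from COO row indices.
// The values and column indices can be reused unchanged only if the COO
// entries are already ordered by row.
std::vector<int> cooRowsToCsrRowPtr(const std::vector<int>& row_idx, int num_rows) {
    std::vector<int> row_ptr(num_rows + 1, 0);
    for (int r : row_idx) ++row_ptr[r + 1];  // Count the entries in each row
    for (int r = 0; r < num_rows; ++r)       // Prefix sum turns counts into offsets
        row_ptr[r + 1] += row_ptr[r];
    return row_ptr;
}
```

For the 3x3 example with row indices `{0, 0, 1, 2, 2}`, the row pointers are `{0, 2, 3, 5}`: row 0 owns entries 0 and 1, row 1 owns entry 2, and row 2 owns entries 3 and 4.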

## Device Management and Memory

ArrayFire provides functions for managing multiple devices, controlling memory allocation, and synchronizing computations across CPU and GPU.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    // Display system info
    af::info();

    // Get number of available devices
    int num_devices = getDeviceCount();
    printf("Number of devices: %d\n", num_devices);

    // Get current device
    int current_device = getDevice();
    printf("Current device: %d\n", current_device);

    // Set device
    setDevice(0);

    // Check device capabilities
    bool has_double = isDoubleAvailable(0);
    bool has_half = isHalfAvailable(0);
    printf("Double precision: %s\n", has_double ? "yes" : "no");
    printf("Half precision: %s\n", has_half ? "yes" : "no");

    // Get device info
    char name[64], platform[64], toolkit[64], compute[64];
    deviceInfo(name, platform, toolkit, compute);
    printf("Device: %s, Platform: %s\n", name, platform);

    // Memory management
    array A = randu(1000, 1000);

    // Synchronize (wait for all operations to complete)
    af::sync();
    af::sync(0);  // Sync specific device

    // Force evaluation of lazy expressions
    A.eval();

    // Multiple array evaluation
    array B = sin(A);
    array C = cos(A);
    eval(B, C);  // Evaluate both

    // Memory info
    size_t alloc_bytes, alloc_buffers, lock_bytes, lock_buffers;
    deviceMemInfo(&alloc_bytes, &alloc_buffers, &lock_bytes, &lock_buffers);
    printf("Allocated: %zu bytes in %zu buffers\n", alloc_bytes, alloc_buffers);

    // Manual memory management
    void* device_ptr = allocV2(1024 * sizeof(float));  // Allocate 1024 floats
    // ... use device_ptr ...
    freeV2(device_ptr);  // Free memory

    // Garbage collection
    deviceGC();  // Free unused memory

    // Set memory step size
    setMemStepSize(1024 * 1024);  // 1 MB minimum allocation

    // Copy data to host
    float* host_data = A.host<float>();
    // ... use host_data ...
    freeHost(host_data);

    // Get device pointer (for interop)
    float* dev_ptr = A.device<float>();
    A.unlock();  // Unlock after use

    // Lock array (prevent garbage collection)
    A.lock();
    // ... array is protected ...
    A.unlock();

    return 0;
}
```

## GFOR: Parallel Loop Execution

ArrayFire's GFOR construct enables batched execution of operations across loop iterations, automatically parallelizing computations that would otherwise require sequential processing.

```cpp
#include <arrayfire.h>
using namespace af;

int main() {
    af::setDevice(0);

    // Traditional sequential loop
    array A = randu(100, 100);
    array B = randu(100, 100);
    array results(100, 1);

    // Sequential (slow)
    for (int i = 0; i < 100; i++) {
        results(i) = sum<float>(A.col(i) * B.col(i));
    }

    // Parallel with GFOR (fast)
    array results_gfor(100, 1);
    gfor (seq i, 100) {
        results_gfor(i) = sum(A.col(i) * B.col(i));
    }

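    // batchFunc: apply a binary function with broadcasting over mismatched dimensions
    auto add = [](const array& lhs, const array& rhs) { return lhs + rhs; };
    array bias = randu(100, 1);
    array batch_result = batchFunc(A, bias, add);  // Adds bias to every column of A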

    // Process multiple images at once: convolve2 batches over the third dimension
    array images = randu(256, 256, 100);  // 100 images
    array processed = convolve2(images, gaussianKernel(5, 5));

    // Batch matrix operations
    array matrices = randu(4, 4, 100);  // 100 4x4 matrices
    array determinants(100);

    // det<T>() returns a host scalar, so it cannot be batched by gfor; use a plain loop
    for (int i = 0; i < 100; i++) {
        determinants(i) = det<float>(matrices(span, span, i));
    }

    af_print(results(seq(10)));
    af_print(results_gfor(seq(10)));

    return 0;
}
```

## Timing and Performance Measurement

ArrayFire provides built-in timing utilities for measuring execution performance of GPU operations, essential for optimization and benchmarking.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

// Function to time
void matmul_benchmark() {
    array A = randu(1000, 1000);
    array B = randu(1000, 1000);
    array C = matmul(A, B);
    C.eval();  // Force evaluation
}

void fft_benchmark() {
    array A = randu(1024, 1024);
    array B = fft2(A);
    B.eval();
}

int main() {
    af::setDevice(0);
    af::info();

    // Method 1: Using timer class
    timer::start();

    array A = randu(1000, 1000);
    array B = randu(1000, 1000);
    array C = matmul(A, B);
    C.eval();
    af::sync();  // Wait for completion

    double elapsed = timer::stop();
    printf("Matrix multiply time: %.4f seconds\n", elapsed);

    // Method 2: Using timeit() for automatic averaging
    double matmul_time = timeit(matmul_benchmark);
    double fft_time = timeit(fft_benchmark);

    printf("Average matmul time: %.6f seconds\n", matmul_time);
    printf("Average FFT time: %.6f seconds\n", fft_time);

    // Benchmark different sizes
    printf("\nMatrix multiply benchmark:\n");
    for (int n = 256; n <= 2048; n *= 2) {
        array X = randu(n, n);
        array Y = randu(n, n);
        matmul(X, Y).eval();  // Warm-up run, excluded from timing
        af::sync();

        timer::start();

        for (int i = 0; i < 10; i++) {
            array Z = matmul(X, Y);
            Z.eval();
        }
        af::sync();

        double t = timer::stop() / 10.0;
        double gflops = (2.0 * n * n * n) / (t * 1e9);
        printf("Size %4d x %4d: %.4f ms, %.2f GFLOPS\n",
               n, n, t * 1000, gflops);
    }

    return 0;
}
```

## Backend Selection and Unified API

ArrayFire supports multiple compute backends (CUDA, OpenCL, oneAPI, CPU) through a unified API. When an application links against the unified `af` library, the backend can be selected at run time with `setBackend`, and the rest of the code stays the same.

```cpp
#include <arrayfire.h>
#include <cstdio>
using namespace af;

int main() {
    // Get available backends
    int backends = getAvailableBackends();

    printf("Available backends:\n");
    if (backends & AF_BACKEND_CUDA)   printf("  - CUDA\n");
    if (backends & AF_BACKEND_OPENCL) printf("  - OpenCL\n");
    if (backends & AF_BACKEND_CPU)    printf("  - CPU\n");
    if (backends & AF_BACKEND_ONEAPI) printf("  - oneAPI\n");

    // Get current backend
    af::Backend current = getActiveBackend();
    printf("Current backend: %d\n", current);

    // Set backend (unified library)
    // setBackend(AF_BACKEND_CUDA);     // Switch to CUDA
    // setBackend(AF_BACKEND_OPENCL);   // Switch to OpenCL
    // setBackend(AF_BACKEND_CPU);      // Switch to CPU

    // Set device for specific backend
    setDevice(0);  // First device of current backend

    // Same code works on all backends
    array A = randu(1000, 1000);
    array B = randu(1000, 1000);

    timer::start();
    array C = matmul(A, B);
    C.eval();
    af::sync();
    printf("Matmul time: %.4f seconds\n", timer::stop());

    // Get backend-specific info
    af::info();

    // Check LAPACK availability
    if (isLAPACKAvailable()) {
        printf("LAPACK is available\n");

        array M = randu(100, 100);
        array U, S, Vt;
        svd(U, S, Vt, M);
    }

    return 0;
}
```

ArrayFire is designed for high-performance computing applications where parallel processing is essential. Its primary use cases include scientific computing and numerical simulations, financial modeling and quantitative analysis, image and signal processing pipelines, machine learning and deep learning implementations, and real-time data analytics. The library excels in scenarios where operations on large multi-dimensional arrays can be parallelized across GPU cores, providing significant speedups over traditional CPU-based implementations.

Integration with existing codebases is straightforward through ArrayFire's C and C++ APIs, with additional bindings available for Python, Rust, Julia, and other languages. The unified backend architecture allows applications to target CUDA GPUs, OpenCL devices, Intel oneAPI accelerators, or multi-core CPUs with the same code. ArrayFire's JIT compilation engine automatically optimizes chains of operations, reducing memory transfers and kernel launch overhead. For performance-critical applications, developers can directly access device pointers for interoperability with existing CUDA/OpenCL code, making ArrayFire an ideal foundation for building high-performance numerical computing applications that need to scale across diverse hardware platforms.