ArrayFire
https://github.com/arrayfire/arrayfire
ArrayFire is a general-purpose tensor library that simplifies software development for parallel
...
# ArrayFire

ArrayFire is a high-performance tensor computing library designed for parallel and massively-parallel architectures. It provides a unified, easy-to-use API that abstracts away the complexity of programming GPUs, CPUs, and other accelerators, allowing developers to write portable code that runs efficiently on CUDA, OpenCL, oneAPI, and CPU backends. The library is built around a single container object, the `array`, which represents multi-dimensional data stored on the device and enables automatic memory management and just-in-time (JIT) kernel compilation for optimized performance.

ArrayFire offers hundreds of accelerated functions spanning linear algebra, signal processing, image processing, computer vision, machine learning, and statistics. It supports multiple data types including single and double precision floats, complex numbers, half precision, and various integer types. The library features automatic kernel fusion that combines multiple operations into single kernels to minimize memory transfers and maximize throughput, making it ideal for scientific computing, financial modeling, deep learning, and real-time data processing applications.

## Array Creation and Initialization

The `af::array` class is the fundamental data container in ArrayFire. Arrays can be created with random values, constants, or initialized from host memory, providing flexible data initialization for parallel computing workloads.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    // Select the device and print backend/device information
    af::setDevice(0);
    af::info();

    // Random uniform values in [0, 1)
    array A = randu(5, 3, f32);      // 5x3, single precision
    array B = randu(100, 100, f64);  // 100x100, double precision (needs device support)
    array C = randu(4, 4, 4);        // 4x4x4 3D array

    // Random normal values (mean = 0, std = 1)
    array normal = randn(1000, 1000);

    // Constant arrays (an integer literal yields an s32 array)
    array zeros = constant(0, 5, 5);
    array ones = constant(1, 10, 10);
    array identity = af::identity(4, 4);

    // Create from host data; storage is column-major,
    // so the columns of D are {1, 2}, {3, 4}, {5, 6}
    float host_data[] = {1, 2, 3, 4, 5, 6};
    array D(2, 3, host_data, afHost);

    // Use dim4 to describe the dimensions
    dim4 dims(16, 4, 1, 1);
    array E = constant(2, dims);

    // Print arrays
    af_print(A);
    af_print(D);

    // Query array properties
    printf("Dimensions: %lld x %lld\n", A.dims(0), A.dims(1));
    printf("Total elements: %lld\n", A.elements());
    printf("Type: %d\n", static_cast<int>(A.type()));

    return 0;
}
```

## Element-wise Operations and Mathematical Functions

ArrayFire supports a comprehensive set of element-wise arithmetic operations and mathematical functions that are automatically JIT-compiled and fused for optimal performance.

```cpp
#include <arrayfire.h>

using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(1000, 1000);
    array B = randu(1000, 1000);

    // Basic arithmetic (element-wise)
    array total = A + B;
    array diff = A - B;
    array prod = A * B;  // Element-wise product, not a matrix product
    array quot = A / B;

    // Mathematical functions
    array sine = sin(A);
    array cosine = cos(A);
    array tangent = tan(A);
    array exponential = exp(A);
    array logarithm = log(A);
    array sqroot = sqrt(A);
    array power = pow(A, 2.0f);
    array absolute = abs(A - 0.5f);

    // Hyperbolic functions
    array hyp_sin = sinh(A);
    array hyp_cos = cosh(A);

    // Comparison operations (return a boolean b8 array)
    array mask = A > 0.5f;
    array equal = A == B;

    // Conditional selection: where mask is true use A, otherwise B
    array selected = select(mask, A, B);

    // Clamp values to [0.2, 0.8]
    array clamped = clamp(A, 0.2f, 0.8f);

    // Complex numbers
    array real_part = randu(10, 10);
    array imag_part = randu(10, 10);
    array complex_arr = complex(real_part, imag_part);
    array magnitude = abs(complex_arr);
    array phase = arg(complex_arr);

    // Print only a 3x3 corner of the 1000x1000 result
    af_print(total(seq(3), seq(3)));

    return 0;
}
```

## Matrix Operations and Linear Algebra (BLAS)

ArrayFire provides high-performance BLAS operations, including matrix multiplication, transposition, dot products, and matrix norms, through an intuitive API.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(4, 3, f32);
    array B = randu(3, 5, f32);

    // Matrix multiplication
    array C = matmul(A, B);    // A * B      -> 4x5
    array D = matmulNT(A, A);  // A * A^T    -> 4x4
    array E = matmulTN(A, A);  // A^T * A    -> 3x3
    array F = matmulTT(B, A);  // B^T * A^T  -> 5x4 (inner dimensions must match)

    // Transpose
    array At = A.T();  // Transpose
    array Ah = A.H();  // Hermitian (conjugate transpose)

    // Dot product (for vectors)
    array v1 = randu(100);
    array v2 = randu(100);
    array dot_prod = dot(v1, v2);

    // Batched matrix multiplication
    array batch_A = randu(4, 4, 10);  // 10 matrices of 4x4
    array batch_B = randu(4, 4, 10);
    array batch_C = matmul(batch_A, batch_B);  // Multiplies all 10 pairs

    // Matrix norms
    double frobenius = norm(A, AF_NORM_EUCLID);  // 2-norm over all elements
    double l1_norm = norm(A, AF_NORM_MATRIX_1);
    double inf_norm = norm(A, AF_NORM_MATRIX_INF);

    printf("Frobenius norm: %f\n", frobenius);
    af_print(C);

    return 0;
}
```

## Linear Algebra Decompositions and Solvers (LAPACK)

ArrayFire supports essential matrix decompositions including LU, QR, Cholesky, and SVD, along with linear system solvers for scientific computing applications.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    af::setDevice(0);

    // SVD decomposition of a 2x3 matrix
    float h_buffer[] = {1, 4, 2, 5, 3, 6};
    array in(2, 3, h_buffer);
    array u, s_vec, vt;
    svd(u, s_vec, vt, in);

    // Reconstruct the original matrix from its factors
    array s_mat = diag(s_vec, 0, false);
    array reconstructed = matmul(u, s_mat, vt(seq(2), span));

    af_print(s_vec);  // Singular values
    af_print(u);      // Left singular vectors
    af_print(vt);     // Right singular vectors (transposed)

    // LU decomposition
    array A = randu(4, 4);
    array lower, upper, pivot;
    lu(lower, upper, pivot, A);

    // QR decomposition
    array Q, R, tau;
    qr(Q, R, tau, A);

    // Cholesky decomposition (for positive definite matrices)
    array sym = matmulNT(A, A) + af::identity(4, 4) * 4;  // Make positive definite
    array chol_upper;
    int status = cholesky(chol_upper, sym, true);  // true = upper triangular; 0 on success

    // Solve the linear system Ax = b
    array b = randu(4, 1);
    array x = solve(A, b);

    // Verify the solution
    array residual = matmul(A, x) - b;
    printf("Residual norm: %e\n", norm(residual));

    // Matrix inverse
    array A_inv = inverse(A);

    // Pseudo-inverse (Moore-Penrose)
    array rect = randu(5, 3);
    array pinv = pinverse(rect);

    // Determinant
    double det_val = det<double>(A);
    printf("Determinant: %f\n", det_val);

    // Matrix rank
    unsigned int r = af::rank(A);
    printf("Rank: %u\n", r);

    return 0;
}
```

## Signal Processing and FFT

ArrayFire includes comprehensive signal processing functions with 1D, 2D, and 3D Fast Fourier Transforms, convolution, and correlation operations optimized for parallel execution.

```cpp
#include <arrayfire.h>
#include <cmath>

using namespace af;

int main() {
    af::setDevice(0);

    // 1D FFT and inverse (both results are complex)
    array signal = randu(1024);
    array spectrum = fft(signal);
    array recovered = ifft(spectrum);

    // 2D FFT (for images)
    array image = randu(256, 256);
    array freq_domain = fft2(image);
    array spatial = ifft2(freq_domain);

    // 3D FFT
    array volume = randu(64, 64, 64);
    array vol_fft = fft3(volume);

    // FFT with an explicit normalization factor
    array normalized = fftNorm(signal, 1.0 / std::sqrt(1024.0));

    // In-place FFT (input must already be complex)
    array complex_signal = complex(signal, constant(0.0f, signal.dims()));
    fftInPlace(complex_signal);

    // Real-to-complex FFT (optimized for real inputs)
    array rfft_result = fftR2C<1>(signal);

    // 1D convolution
    array kernel = randu(5);
    array convolved = convolve1(signal, kernel);

    // 2D convolution
    array img = randu(100, 100);
    array filter = gaussianKernel(5, 5, 1.0, 1.0);
    array blurred = convolve2(img, filter);

    // Separable convolution (more efficient); the column filter comes first
    float h_col[] = {1, 1, 1};
    float h_row[] = {-1, 0, 1};
    array col_kernel(3, h_col);
    array row_kernel(3, h_row);
    array sep_conv = convolve(col_kernel, row_kernel, img);

    // Correlation, expressed as convolution with a flipped filter
    array correlated = convolve2(img, flip(flip(filter, 0), 1));

    // FIR filter
    array fir_coeffs = randu(10);
    array filtered = fir(fir_coeffs, signal);

    // IIR filter (random coefficients are for illustration only)
    array a_coeffs = randu(5);
    array b_coeffs = randu(5);
    array iir_out = iir(b_coeffs, a_coeffs, signal);

    // Signal interpolation at fractional positions in [0, 1023)
    array positions = randu(100) * 1023;
    array interpolated = approx1(signal, positions, AF_INTERP_LINEAR);

    af_print(spectrum(seq(10)));  // First 10 frequency components

    return 0;
}
```

## Image Processing

ArrayFire provides a rich set of image processing functions including filtering, morphological operations, color space conversions, and geometric transformations.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    af::setDevice(0);

    // Image I/O is an optional build feature; check before loading
    if (!isImageIOAvailable()) {
        printf("Image I/O is not available\n");
        return 1;
    }

    // Load image
    array img = loadImage("image.jpg", true);  // true = load as color

    // Color space conversions (arguments: image, to, from)
    array gray = colorSpace(img, AF_GRAY, AF_RGB);
    array hsv = colorSpace(img, AF_HSV, AF_RGB);

    // Image resizing
    array resized = resize(img, 200, 300, AF_INTERP_BILINEAR);
    array scaled = resize(0.5f, 0.5f, img);  // Scale by factor

    // Image rotation (angle in radians)
    array rotated = rotate(img, af::Pi / 4, true);  // 45 degrees, crop

    // Geometric transformation; a random 3x2 matrix stands in for a real affine transform
    array transform_matrix = randu(3, 2);
    array transformed = transform(img, transform_matrix);

    // Edge detection with Sobel
    array gx, gy;
    sobel(gx, gy, gray);
    array edges = hypot(gx, gy);

    // Gaussian blur
    array gauss_kernel = gaussianKernel(5, 5, 1.0, 1.0);
    array blurred = convolve(gray, gauss_kernel);

    // Median filter (noise removal)
    array denoised = medfilt(gray, 5, 5);

    // Bilateral filter (edge-preserving smoothing)
    array bilateral_out = bilateral(gray, 3.0f, 40.0f);

    // Morphological operations
    array kernel = constant(1.0f, 5, 5);
    array dilated = dilate(gray, kernel);
    array eroded = erode(gray, kernel);
    array opened = dilate(erode(gray, kernel), kernel);  // Opening: erode, then dilate
    array closed = erode(dilate(gray, kernel), kernel);  // Closing: dilate, then erode

    // Histogram
    array hist = histogram(gray, 256, 0.0, 255.0);

    // Histogram equalization
    array equalized = histEqual(gray, hist);

    // Gradients
    array dx, dy;
    grad(dx, dy, gray);

    // Summed area table (integral image)
    array integral = sat(gray);

    // Save image
    saveImage("output.jpg", resized);

    return 0;
}
```

## Computer Vision and Feature Detection

ArrayFire includes computer vision functions for feature detection, extraction, and matching, enabling efficient implementation of vision algorithms on GPUs.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    af::setDevice(0);

    // Synthetic grayscale image in [0, 255); use a real image in practice
    array img = randu(480, 640) * 255.0f;

    // FAST corner detection
    features fast_feats = fast(img, 20.0f, 9, true, 0.05f, 3);
    printf("FAST features found: %zu\n", fast_feats.getNumFeatures());

    // Get feature coordinates
    array x_coords = fast_feats.getX();
    array y_coords = fast_feats.getY();
    array scores = fast_feats.getScore();

    // Harris corner detection
    // (max_corners, min_response, sigma, block_size, k_thr)
    features harris_feats = harris(img, 500, 1e5f, 1.0f, 0, 0.04f);

    // ORB feature detection and description
    features orb_feats;
    array orb_desc;
    orb(orb_feats, orb_desc, img, 20.0f, 500, 1.2f, 8, true);

    // SIFT feature detection (if available in this build)
    // (n_layers, contrast_thr, edge_thr, init_sigma, double_input,
    //  intensity_scale, feature_ratio)
    features sift_feats;
    array sift_desc;
    sift(sift_feats, sift_desc, img, 3, 0.04f, 10.0f, 1.6f, true,
         1.0f / 256.0f, 0.05f);

    // Feature matching (Hamming distance for binary descriptors)
    array img2 = randu(480, 640) * 255.0f;
    features orb_feats2;
    array orb_desc2;
    orb(orb_feats2, orb_desc2, img2, 20.0f, 500, 1.2f, 8, true);

    // Descriptors are stored one per row, so distances run along dim 1
    array idx, dist;
    hammingMatcher(idx, dist, orb_desc, orb_desc2, 1, 1);

    // Nearest neighbor matching
    array nn_idx, nn_dist;
    nearestNeighbour(nn_idx, nn_dist, sift_desc, sift_desc, 1, 1, AF_SSD);

    // Homography estimation (RANSAC); x and y coordinates are passed
    // separately and the inlier count is returned through an int
    array x_src = randu(100) * 640.0f;
    array y_src = randu(100) * 480.0f;
    array x_dst = x_src + randn(100) * 0.5f;
    array y_dst = y_src + randn(100) * 0.5f;
    array H;
    int inliers = 0;
    homography(H, inliers, x_src, y_src, x_dst, y_dst,
               AF_HOMOGRAPHY_RANSAC, 3.0f, 1000);
    printf("Homography inliers: %d\n", inliers);

    // Template matching
    array templ = img(seq(100, 150), seq(100, 150));
    array match_result = matchTemplate(img, templ, AF_SAD);

    // Print the first 10 x coordinates when enough features were found
    if (fast_feats.getNumFeatures() >= 10) {
        af_print(x_coords(seq(10)));
    }

    return 0;
}
```

## Statistics Functions

ArrayFire provides comprehensive statistical functions for computing means, variances, correlations, and other statistical measures across array dimensions.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    af::setDevice(0);

    array data = randu(1000, 100);  // 1000 samples, 100 features

    // Mean along a dimension
    array col_means = mean(data, 0);  // Mean of each column
    array row_means = mean(data, 1);  // Mean of each row

    // Weighted mean (weights have the same shape as the data)
    array weights = randu(data.dims());
    array weighted_mean = mean(data, weights, 0);

    // Global mean (scalar result)
    double global_mean = mean<double>(data);

    // Variance
    array col_var = var(data, AF_VARIANCE_SAMPLE, 0);
    array pop_var = var(data, AF_VARIANCE_POPULATION, 0);

    // Standard deviation
    array col_stdev = stdev(data, AF_VARIANCE_SAMPLE, 0);

    // Mean and variance together (more efficient); empty array = no weights
    array m, v;
    meanvar(m, v, data, array(), AF_VARIANCE_SAMPLE, 0);

    // Covariance between the columns of X and Y
    array X = randu(100, 10);
    array Y = randu(100, 10);
    array cov_xy = cov(X, Y, AF_VARIANCE_SAMPLE);

    // Correlation coefficient (scalar result)
    float corr = corrcoef<float>(X, Y);

    // Median
    array col_median = median(data, 0);

    // Min and max
    array col_min = min(data, 0);
    array col_max = max(data, 0);

    // Min with indices
    array min_vals, min_idx;
    min(min_vals, min_idx, data, 0);

    // Global min/max (scalar)
    double global_min = min<double>(data);
    double global_max = max<double>(data);

    // Sum and product
    array col_sum = sum(data, 0);
    array col_prod = product(data, 0);

    // Cumulative operations
    array cum_sum = accum(data, 0);                 // Cumulative sum
    array scan_prod = scan(data, 0, AF_BINARY_MUL); // Cumulative product

    // Count non-zero elements
    unsigned count_nonzero = count<unsigned>(data > 0.5f);

    // Any and all (logical reductions)
    bool any_positive = anyTrue<bool>(data > 0);
    bool all_positive = allTrue<bool>(data > 0);

    printf("Global mean: %f\n", global_mean);
    printf("Global min: %f, max: %f\n", global_min, global_max);
    printf("Correlation: %f\n", corr);

    return 0;
}
```

## Random Number Generation

ArrayFire provides flexible random number generation with support for multiple distributions and engine types, enabling reproducible simulations on parallel hardware.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    af::setDevice(0);

    // Set the global seed for reproducibility
    setSeed(12345);

    // Uniform distribution [0, 1)
    array uniform = randu(1000, 1000);

    // Normal distribution (mean = 0, std = 1)
    array normal = randn(1000, 1000);

    // Create a custom random engine
    randomEngine engine(AF_RANDOM_ENGINE_MERSENNE_GP11213, 42);

    // Generate with the custom engine
    array custom_uniform = randu(dim4(100, 100), f32, engine);
    array custom_normal = randn(dim4(100, 100), f32, engine);

    // Scale and shift for custom distributions
    array custom_uniform_range = uniform * 10 + 5;  // [5, 15)
    array custom_normal_scaled = normal * 2 + 10;   // mean = 10, std = 2

    // Get/set engine type
    engine.setType(AF_RANDOM_ENGINE_PHILOX_4X32_10);
    randomEngineType rtype = engine.getType();

    // Get/set seed
    engine.setSeed(9999);
    unsigned long long current_seed = engine.getSeed();

    // Set the default random engine type
    setDefaultRandomEngineType(AF_RANDOM_ENGINE_MERSENNE_GP11213);

    // Get the default engine
    randomEngine default_engine = getDefaultRandomEngine();

    // Integer random values
    array int_random = (randu(100, 100) * 100).as(s32);  // Integers 0-99

    // Random permutation: sort random keys and keep the resulting index order
    array keys = randu(100);
    array sorted_keys, permutation;
    sort(sorted_keys, permutation, keys);
    array shuffled = range(dim4(100))(permutation);

    printf("Current seed: %llu\n", current_seed);
    af_print(uniform(seq(5), seq(5)));  // 5x5 sample

    return 0;
}
```

## Array Indexing and Slicing

ArrayFire provides powerful indexing capabilities using sequences, spans, and index arrays for efficient data access and manipulation on parallel devices.

```cpp
#include <arrayfire.h>

using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(10, 10, 3);

    // Single element access
    float val = A(0, 0, 0).scalar<float>();

    // Row and column access
    array first_row = A.row(0);
    array last_col = A.col(end);

    // Multiple rows/columns
    array first_3_rows = A.rows(0, 2);
    array last_3_cols = A.cols(end - 2, end);

    // Slice access (3rd dimension)
    array first_slice = A.slice(0);
    array slices = A.slices(0, 1);

    // Using seq for ranges
    array sub = A(seq(2, 5), seq(3, 7), 0);  // Rows 2-5, cols 3-7, slice 0

    // Strided access
    array every_other_row = A(seq(0, end, 2), span);  // Every other row

    // Using span for all elements
    array all_rows = A(span, 0);  // All rows, first column

    // Boolean indexing
    array mask = A > 0.5f;
    array selected = A(mask);  // Returns a 1D array of the selected elements

    // Index array
    array indices = (randu(5) * 9).as(s32);  // Random indices 0-8
    array indexed = A(indices, 0);           // Select rows by index

    // Assignment with indexing
    A(seq(0, 2), seq(0, 2), 0) = constant(0.0f, 3, 3);  // Set subarray to 0
    A.row(5) = randu(1, 10, 3);                         // Replace row 5
    A(A < 0.1f) = 0;                                    // Set small values to 0

    // Flip arrays
    array flipped_rows = flip(A, 0);  // Reverse along dim 0 (row order)
    array flipped_cols = flip(A, 1);  // Reverse along dim 1 (column order)

    // Reshape arrays
    array B = randu(12);
    array reshaped = moddims(B, 3, 4);  // Reshape to 3x4
    array flattened = flat(A);          // Flatten to 1D

    // Join arrays
    array C = randu(10, 5);
    array D = randu(10, 5);
    array joined_cols = join(1, C, D);          // Join along dim 1 -> 10x10
    array joined_rows = join(0, C.T(), D.T());  // Join along dim 0 -> 10x10

    // Tile (repeat) array
    array tiled = tile(C, 2, 3);  // Repeat 2x along dim 0, 3x along dim 1

    // Reorder dimensions
    array reordered = reorder(A, 2, 0, 1);  // Permute dimensions

    af_print(sub);

    return 0;
}
```

## Reduction Operations

ArrayFire provides efficient parallel reduction operations for computing sums, products, and logical aggregations across specified dimensions.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(1000, 1000);

    // Sum reductions
    array col_sums = sum(A, 0);         // Reduce along dim 0: one sum per column (1 x 1000)
    array row_sums = sum(A, 1);         // Reduce along dim 1: one sum per row (1000 x 1)
    double total_sum = sum<double>(A);  // Global sum (scalar)

    // Product reductions
    array col_prods = product(A, 0);
    double total_prod = product<double>(A(seq(10), seq(10)));  // Small subarray

    // Min/Max reductions
    array col_mins = min(A, 0);
    array col_maxs = max(A, 0);
    double global_min = min<double>(A);
    double global_max = max<double>(A);

    // Min/Max with indices
    array min_vals, min_indices;
    min(min_vals, min_indices, A, 0);
    array max_vals, max_indices;
    max(max_vals, max_indices, A, 0);

    // Logical reductions
    array B = A > 0.5f;
    bool any_true = anyTrue<bool>(B);   // Any element true?
    bool all_true = allTrue<bool>(B);   // All elements true?
    array any_per_col = anyTrue(B, 0);  // Any true per column
    array all_per_col = allTrue(B, 0);  // All true per column

    // Count non-zero (the variable must not be named `count`,
    // or it would hide the function)
    unsigned n_true = count<unsigned>(B);
    array counts_per_col = count(B, 0);

    // Scan operations (prefix sums)
    array prefix_sum = accum(A, 0);                // Cumulative sum
    array prefix_prod = scan(A, 0, AF_BINARY_MUL); // Cumulative product
    array prefix_max = scan(A, 0, AF_BINARY_MAX);  // Running maximum
    array prefix_min = scan(A, 0, AF_BINARY_MIN);  // Running minimum

    // Exclusive scan (first element is the identity)
    array excl_scan = scan(A, 0, AF_BINARY_ADD, false);  // false = exclusive

    // Scan by key (segmented scan); keys are sorted so that each of the
    // 5 key values forms one contiguous segment
    array keys = sort((randu(1000) * 5).as(s32));
    array data = randu(1000);
    array keyed_scan = scanByKey(keys, data, 0, AF_BINARY_ADD);

    // Reduce by key: one sum per run of equal keys
    array unique_keys, reduced_vals;
    sumByKey(unique_keys, reduced_vals, keys, data, 0);

    // Where (find indices of non-zero elements)
    array indices = where(A > 0.9f);

    // Set operations
    array set1 = (randu(100) * 50).as(s32);
    array set2 = (randu(100) * 50).as(s32);
    array unique_vals = setUnique(set1);
    array union_vals = setUnion(set1, set2);
    array intersect_vals = setIntersect(set1, set2);

    printf("Total sum: %f\n", total_sum);
    printf("Global min: %f, max: %f\n", global_min, global_max);
    printf("Count > 0.5: %u\n", n_true);

    return 0;
}
```

## Sorting and Ordering

ArrayFire provides high-performance parallel sorting algorithms for ordering arrays along any dimension, with support for both values and index tracking.

```cpp
#include <arrayfire.h>

using namespace af;

int main() {
    af::setDevice(0);

    array A = randu(10, 5);

    // Sort along the first dimension (within each column)
    array sorted = sort(A, 0, true);        // true = ascending
    array sorted_desc = sort(A, 0, false);  // false = descending

    // Sort with indices
    array sorted_vals, indices;
    sort(sorted_vals, indices, A, 0, true);

    // Sort by keys (reorder one array based on another of the same shape)
    array keys = randu(100);
    array values = randu(100);
    array sorted_keys, sorted_values;
    sort(sorted_keys, sorted_values, keys, values);

    // Unique elements (input is flattened to a vector first)
    array unique_vals = setUnique(flat(A));

    // Unique elements of already sorted input (second argument: is_sorted)
    array unique_sorted = setUnique(sort(flat(A)), true);

    // Differences between consecutive elements
    array first_diff = diff1(A, 0);   // First-order difference
    array second_diff = diff2(A, 0);  // Second-order difference

    // Unwrap: rearrange image patches into columns
    array img = randu(100, 100);
    array unwrapped = unwrap(img, 3, 3, 1, 1);  // 3x3 patches, stride 1

    // Wrap: map the patches back to a 100x100 image
    // (values from overlapping patches are summed)
    array wrapped = wrap(unwrapped, 100, 100, 3, 3, 1, 1);

    af_print(sorted);
    af_print(indices);

    return 0;
}
```

## Sparse Matrix Operations

ArrayFire supports sparse matrix operations with CSR, CSC, and COO storage formats, enabling efficient computation with large sparse datasets.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    af::setDevice(0);

    // Create sparse matrices from a dense one
    // (the variable must not be named `dense`, or it would hide af::dense)
    array dense_in = randu(100, 100);
    dense_in = dense_in * (dense_in > 0.9f);  // Keep ~10% of the entries non-zero
    array sparse_csr = sparse(dense_in, AF_STORAGE_CSR);
    array sparse_coo = sparse(dense_in, AF_STORAGE_COO);
    // CSC may not be supported by every function or backend
    array sparse_csc = sparse(dense_in, AF_STORAGE_CSC);

    // Create a sparse matrix from its components (COO: one row/col index per value)
    float values[] = {1, 2, 3, 4, 5};
    int row_idx[] = {0, 0, 1, 2, 2};
    int col_idx[] = {0, 2, 1, 0, 2};
    array vals(5, values);
    array rows(5, row_idx);
    array cols(5, col_idx);
    array sp = sparse(3, 3, vals, rows, cols, AF_STORAGE_COO);

    // Convert between formats
    array csr_from_coo = sparseConvertTo(sp, AF_STORAGE_CSR);

    // Convert sparse to dense
    array back_to_dense = dense(sparse_csr);

    // Get sparse matrix info
    array sp_vals, sp_rows, sp_cols;
    af::storage stype;
    sparseGetInfo(sp_vals, sp_rows, sp_cols, stype, sparse_csr);

    // Get individual components
    array just_vals = sparseGetValues(sparse_csr);
    array just_rows = sparseGetRowIdx(sparse_csr);
    array just_cols = sparseGetColIdx(sparse_csr);
    dim_t nnz = sparseGetNNZ(sparse_csr);

    // Sparse matrix times dense vector
    array x = randu(100, 1);
    array y = matmul(sparse_csr, x);

    // Sparse matrix times dense matrix
    array B = randu(100, 10);
    array C = matmul(sparse_csr, B);

    printf("Non-zero elements: %lld\n", nnz);
    af_print(just_vals);

    return 0;
}
```

## Device Management and Memory

ArrayFire provides functions for managing multiple devices, controlling memory allocation, and synchronizing computations across CPU and GPU.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    // Display system info
    af::info();

    // Get the number of available devices
    int num_devices = getDeviceCount();
    printf("Number of devices: %d\n", num_devices);

    // Get the current device
    int current_device = getDevice();
    printf("Current device: %d\n", current_device);

    // Set device
    setDevice(0);

    // Check device capabilities
    bool has_double = isDoubleAvailable(0);
    bool has_half = isHalfAvailable(0);
    printf("Double precision: %s\n", has_double ? "yes" : "no");
    printf("Half precision: %s\n", has_half ? "yes" : "no");

    // Get device info
    char name[64], platform[64], toolkit[64], compute[64];
    deviceInfo(name, platform, toolkit, compute);
    printf("Device: %s, Platform: %s\n", name, platform);

    array A = randu(1000, 1000);

    // Synchronize (wait for all queued operations to complete)
    af::sync();
    af::sync(0);  // Sync a specific device

    // Force evaluation of lazy expressions
    A.eval();

    // Evaluate several arrays together
    array B = sin(A);
    array C = cos(A);
    eval(B, C);

    // Memory info
    size_t alloc_bytes, alloc_buffers, lock_bytes, lock_buffers;
    deviceMemInfo(&alloc_bytes, &alloc_buffers, &lock_bytes, &lock_buffers);
    printf("Allocated: %zu bytes in %zu buffers\n", alloc_bytes, alloc_buffers);

    // Manual memory management
    void* device_ptr = allocV2(1024 * sizeof(float));  // Room for 1024 floats
    // ... use device_ptr ...
    freeV2(device_ptr);

    // Garbage collection: release unused buffers held by the memory manager
    deviceGC();

    // Set the memory step size
    setMemStepSize(1024 * 1024);  // 1 MB allocation granularity

    // Copy data to the host; the returned buffer must be released with freeHost
    float* host_data = A.host<float>();
    // ... use host_data ...
    freeHost(host_data);

    // Get the device pointer (for interop); this locks the array
    float* dev_ptr = A.device<float>();
    // ... use dev_ptr ...
    A.unlock();  // Hand the buffer back to ArrayFire's memory manager

    // Lock the array so the memory manager leaves its buffer alone
    A.lock();
    // ... array is protected ...
    A.unlock();

    return 0;
}
```

## GFOR: Parallel Loop Execution

ArrayFire's GFOR construct enables batched execution of operations across loop iterations, automatically parallelizing computations that would otherwise require sequential processing. The body of a `gfor` loop runs once over all iterations at the same time, so it may only contain operations that support batching.

```cpp
#include <arrayfire.h>

using namespace af;

// Function passed to batchFunc; it must be a free function,
// because C++ does not allow function definitions inside main()
static array elementwise_mul(const array& lhs, const array& rhs) {
    return lhs * rhs;
}

int main() {
    af::setDevice(0);

    array A = randu(100, 100);
    array B = randu(100, 100);

    // Sequential loop (slow): one reduction and one host transfer per iteration
    array results = constant(0.0f, 100);
    for (int i = 0; i < 100; i++) {
        results(i) = sum<float>(A.col(i) * B.col(i));
    }

    // Parallel with GFOR (fast): all 100 columns are processed as one batch.
    // Index with A(span, i); col() takes an int, not a seq.
    array results_gfor = constant(0.0f, 100);
    gfor (seq i, 100) {
        results_gfor(i) = sum(A(span, i) * B(span, i));
    }

    // batchFunc: broadcast a 100x1 vector across every column of A
    array scale = randu(100, 1);
    array scaled = batchFunc(A, scale, elementwise_mul);

    // Process multiple images in parallel
    array images = randu(256, 256, 100);  // 100 images
    array processed = constant(0.0f, 256, 256, 100);
    array gauss = gaussianKernel(5, 5);
    gfor (seq i, 100) {
        processed(span, span, i) = convolve2(images(span, span, i), gauss);
    }

    // det() returns a host scalar for a single matrix and is not batched,
    // so it needs an ordinary loop rather than gfor
    array matrices = randu(4, 4, 100);  // 100 4x4 matrices
    array determinants = constant(0.0f, 100);
    for (int i = 0; i < 100; i++) {
        determinants(i) = det<float>(matrices(span, span, i));
    }

    af_print(results(seq(10)));
    af_print(results_gfor(seq(10)));

    return 0;
}
```

## Timing and Performance Measurement

ArrayFire provides built-in timing utilities for measuring execution performance of GPU operations, essential for optimization and benchmarking.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

// Inputs for the timed functions; they are filled in main() so that
// random number generation is not part of the measurement
static array g_A, g_B, g_img;

// Functions to time
static void matmul_benchmark() {
    array C = matmul(g_A, g_B);
    C.eval();  // Force evaluation
}

static void fft_benchmark() {
    array F = fft2(g_img);
    F.eval();
}

int main() {
    af::setDevice(0);
    af::info();

    g_A = randu(1000, 1000);
    g_B = randu(1000, 1000);
    g_img = randu(1024, 1024);
    af::sync();  // Make sure the inputs exist before timing starts

    // Method 1: using the timer class
    timer::start();
    array C = matmul(g_A, g_B);
    C.eval();
    af::sync();  // Wait for completion
    double elapsed = timer::stop();
    printf("Matrix multiply time: %.4f seconds\n", elapsed);

    // Method 2: using timeit() for automatic averaging
    double matmul_time = timeit(matmul_benchmark);
    double fft_time = timeit(fft_benchmark);
    printf("Average matmul time: %.6f seconds\n", matmul_time);
    printf("Average FFT time: %.6f seconds\n", fft_time);

    // Benchmark different sizes
    printf("\nMatrix multiply benchmark:\n");
    for (int n = 256; n <= 2048; n *= 2) {
        array X = randu(n, n);
        array Y = randu(n, n);
        af::sync();  // Exclude input generation from the timing

        timer::start();
        for (int i = 0; i < 10; i++) {
            array Z = matmul(X, Y);
            Z.eval();
        }
        af::sync();
        double t = timer::stop() / 10.0;

        // A dense n x n matrix product costs about 2 * n^3 floating-point operations
        double gflops = (2.0 * n * n * n) / (t * 1e9);
        printf("Size %4d x %4d: %.4f ms, %.2f GFLOPS\n", n, n, t * 1000, gflops);
    }

    return 0;
}
```

## Backend Selection and Unified API

ArrayFire supports multiple compute backends (CUDA, OpenCL, oneAPI, CPU) through a unified API, allowing seamless switching between backends without code changes.

```cpp
#include <arrayfire.h>
#include <cstdio>

using namespace af;

int main() {
    // Get available backends (returned as a bitmask)
    int backends = getAvailableBackends();
    printf("Available backends:\n");
    if (backends & AF_BACKEND_CUDA) printf("  - CUDA\n");
    if (backends & AF_BACKEND_OPENCL) printf("  - OpenCL\n");
    if (backends & AF_BACKEND_CPU) printf("  - CPU\n");
    if (backends & AF_BACKEND_ONEAPI) printf("  - oneAPI\n");

    // Get the current backend
    af::Backend current = getActiveBackend();
    printf("Current backend: %d\n", static_cast<int>(current));

    // Set backend (requires linking against the unified library)
    // setBackend(AF_BACKEND_CUDA);    // Switch to CUDA
    // setBackend(AF_BACKEND_OPENCL);  // Switch to OpenCL
    // setBackend(AF_BACKEND_CPU);     // Switch to CPU

    // Set the device for the current backend
    setDevice(0);  // First device of the current backend

    // The same code works on all backends
    array A = randu(1000, 1000);
    array B = randu(1000, 1000);
    af::sync();

    timer::start();
    array C = matmul(A, B);
    C.eval();
    af::sync();
    printf("Matmul time: %.4f seconds\n", timer::stop());

    // Print backend-specific info
    af::info();

    // Check LAPACK availability before calling decompositions
    if (isLAPACKAvailable()) {
        printf("LAPACK is available\n");
        array M = randu(100, 100);
        array U, S, Vt;
        svd(U, S, Vt, M);
    }

    return 0;
}
```

ArrayFire is designed for high-performance computing applications where parallel processing is essential. Its primary use cases include scientific computing and numerical simulations, financial modeling and quantitative analysis, image and signal processing pipelines, machine learning and deep learning implementations, and real-time data analytics. The library excels in scenarios where operations on large multi-dimensional arrays can be parallelized across GPU cores, providing significant speedups over traditional CPU-based implementations. Integration with existing codebases is straightforward through ArrayFire's C and C++ APIs, with additional bindings available for Python, Rust, Julia, and other languages.

The unified backend architecture allows applications to target CUDA GPUs, OpenCL devices, Intel oneAPI accelerators, or multi-core CPUs with the same code. ArrayFire's JIT compilation engine automatically optimizes chains of operations, reducing memory transfers and kernel launch overhead. For performance-critical applications, developers can directly access device pointers for interoperability with existing CUDA/OpenCL code, making ArrayFire an ideal foundation for building high-performance numerical computing applications that need to scale across diverse hardware platforms.
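
To make the fusion claim above more concrete, here is a plain C++ analogy. It is not ArrayFire code and makes no claim about how ArrayFire's JIT is implemented; `unfused_hypot` and `fused_hypot` are hypothetical helper names chosen for this sketch. The first function mimics an eager array library that writes a full temporary for every element-wise operation, while the second computes the same expression in a single pass, which is the effect that fusing a chain of operations into one kernel aims for.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Eager style: every element-wise operation makes its own pass over memory
// and stores a full-size temporary (three temporaries, four passes).
std::vector<float> unfused_hypot(const std::vector<float>& x,
                                 const std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::vector<float> xx(n), yy(n), sum(n), out(n);
    for (std::size_t i = 0; i < n; ++i) xx[i] = x[i] * x[i];
    for (std::size_t i = 0; i < n; ++i) yy[i] = y[i] * y[i];
    for (std::size_t i = 0; i < n; ++i) sum[i] = xx[i] + yy[i];
    for (std::size_t i = 0; i < n; ++i) out[i] = std::sqrt(sum[i]);
    return out;
}

// Fused style: one pass over the inputs, no temporaries; the intermediate
// values never leave registers.
std::vector<float> fused_hypot(const std::vector<float>& x,
                               const std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::sqrt(x[i] * x[i] + y[i] * y[i]);
    }
    return out;
}
```

Both functions return the same values; the fused version simply reads each input once and writes the output once. With `af::array` operands, the corresponding expression would be written as `sqrt(x * x + y * y)`, and it is chains of element-wise operations like this one that the library's JIT is described as combining.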