### Complete OpenCL Initialization Example

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/configuration.md

A comprehensive example demonstrating the initialization process for the OpenCL wrapper. It includes setting build-time macros, environment variables, and device discovery.

```cpp
// Configure before includes
#define WORKGROUP_SIZE 128
#define PTX // Generate Nvidia PTX
#define LOG // Log compilation output
#define UTILITIES_REGEX
#define UTILITIES_FILE
#define CONSOLE_WIDTH 120
#include "opencl.hpp"

int main() {
    // Set environment variables before device discovery
    set_environment_variable((char*)"GPU_SINGLE_ALLOC_PERCENT=100");

    // Initialize devices
    vector<Device_Info> devices = get_devices();

    // Proceed with device and kernel initialization
    Device device(select_device_with_most_flops(devices));
    // ...
    return 0;
}
```

--------------------------------

### Minimal OpenCL-Wrapper Example

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/README.md

A basic example demonstrating device selection, memory allocation, kernel creation, execution, and result retrieval using the OpenCL-Wrapper library.

```cpp
#include "opencl.hpp"

int main() {
    // Select the fastest available device
    // ("my_kernel" must be defined in the OpenCL C code returned by get_opencl_c_code())
    Device device(select_device_with_most_flops());

    // Allocate device and host memory
    Memory<float> input(device, 1024);
    Memory<float> output(device, 1024);

    // Initialize host memory
    for (ulong i = 0; i < 1024; i++) {
        input[i] = (float)i;
    }

    // Create and execute kernel
    Kernel kernel(device, 1024, "my_kernel", input, output);
    input.write_to_device();
    kernel.run();
    output.read_from_device();

    // Access results
    println("First result: " + to_string(output[0]));
    return 0;
}
```

--------------------------------

### Setup Termux for OpenCL Development on Android

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Installs the essential development tools clang, git, and make inside Termux for OpenCL development on Android devices. Termux itself must already be installed.
```bash
apt update && apt upgrade -y
apt install -y clang git make
```

--------------------------------

### Complete Usage Example: Vector Addition

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

A full example demonstrating the creation of OpenCL C code, device selection, memory management, kernel creation, execution, data transfer, and result verification.

```cpp
#include <cassert>
#include "opencl.hpp"

string get_opencl_c_code() {
    return R( // ############## OpenCL C Code ##############
        kernel void vector_add(global float* A, global float* B, global float* C) {
            const uint i = get_global_id(0);
            C[i] = A[i] + B[i];
        }
    );
}

int main() {
    Device device(select_device_with_most_flops());
    const ulong N = 1024;
    Memory<float> A(device, N);
    Memory<float> B(device, N);
    Memory<float> C(device, N);

    // Initialize
    for (ulong i = 0; i < N; i++) {
        A[i] = (float)i;
        B[i] = (float)i * 2.0f;
    }

    // Create kernel
    Kernel add(device, N, "vector_add", A, B, C);

    // Copy inputs to the device, execute, and copy the result back
    A.write_to_device();
    B.write_to_device();
    add.run();
    C.read_from_device();

    // Verify results
    for (ulong i = 0; i < N; i++) {
        assert(C[i] == A[i] + B[i]);
    }
    println("Success!");
    return 0;
}
```

--------------------------------

### Install Nvidia GPU Drivers and OpenCL Runtime on Linux

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Installs Nvidia GPU drivers and the OpenCL runtime on Ubuntu Noble. These commands install the build tools, the OpenCL ICD loader and headers, and a specific driver version (`nvidia-driver-580`), then reboot.

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y g++ git make ocl-icd-libopencl1 ocl-icd-opencl-dev nvidia-driver-580
sudo shutdown -r now
```

--------------------------------

### Method Chaining Example

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Demonstrates fluent interface chaining for multiple kernel operations, including setting ranges, parameters, and executing kernels sequentially.
It also shows asynchronous execution with synchronization.

```cpp
Device device(select_device_with_most_flops());
Memory<float> input(device, 1024);
Memory<float> output(device, 1024);
Memory<float> output2(device, 1024); // alternative output buffer
uint threshold = 128u;               // additional scalar parameter
Kernel compute(device, 1024, "my_kernel", input, output);

// Chain operations: each call returns the Kernel, so calls can be chained
compute.set_ranges(512).run()
       .set_parameters(1, output2).run()
       .add_parameters(threshold).run(); // appends threshold as an extra kernel argument

// Async with proper synchronization
compute.enqueue_run().enqueue_run().finish_queue();
```

--------------------------------

### Install PoCL OpenCL Runtime on Linux

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Installs the Portable Computing Language (PoCL) OpenCL runtime on Ubuntu Noble. This is an alternative for CPU-based OpenCL execution.

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y g++ git make ocl-icd-libopencl1 ocl-icd-opencl-dev pocl-opencl-icd
```

--------------------------------

### Graceful Degradation Example

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/errors.md

Demonstrates falling back to the first discovered device and, when more than one device is available, choosing the one with the most FLOPs. This keeps the application running even if no device matches specific criteria.

```cpp
vector<Device_Info> devices = get_devices(false); // false suppresses console output
Device_Info selected = devices[0];
if (devices.size() > 1) {
    selected = select_device_with_most_flops(devices);
}
Device device(selected);
```

--------------------------------

### Install Intel CPU OpenCL Runtime (Option 1) on Linux

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Installs the Intel CPU Runtime for OpenCL using the oneAPI DPC++ Compiler and oneTBB. This option requires downloading specific versions and configuring the system's OpenCL vendor paths.
```bash
export OCLV="oclcpuexp-2025.21.10.0.10_160000_rel"
export TBBV="oneapi-tbb-2023.0.0"
sudo apt update && sudo apt upgrade -y
sudo apt install -y g++ git make ocl-icd-libopencl1 ocl-icd-opencl-dev
sudo mkdir -p ~/cpurt /opt/intel/${OCLV} /etc/OpenCL/vendors /etc/ld.so.conf.d
sudo wget -P ~/cpurt https://github.com/intel/llvm/releases/download/2025-WW45/${OCLV}.tar.gz
sudo wget -P ~/cpurt https://github.com/uxlfoundation/oneTBB/releases/download/v2023.0.0/${TBBV}-lin.tgz
sudo tar -zxvf ~/cpurt/${OCLV}.tar.gz -C /opt/intel/${OCLV}
sudo tar -zxvf ~/cpurt/${TBBV}-lin.tgz -C /opt/intel
echo /opt/intel/${OCLV}/x64/libintelocl.so | sudo tee /etc/OpenCL/vendors/intel_expcpu.icd
echo /opt/intel/${OCLV}/x64 | sudo tee /etc/ld.so.conf.d/libintelopenclexp.conf
sudo ln -sf /opt/intel/${TBBV}/lib/intel64/gcc4.8/libtbb.so /opt/intel/${OCLV}/x64
sudo ln -sf /opt/intel/${TBBV}/lib/intel64/gcc4.8/libtbbmalloc.so /opt/intel/${OCLV}/x64
sudo ln -sf /opt/intel/${TBBV}/lib/intel64/gcc4.8/libtbb.so.12 /opt/intel/${OCLV}/x64
sudo ln -sf /opt/intel/${TBBV}/lib/intel64/gcc4.8/libtbbmalloc.so.2 /opt/intel/${OCLV}/x64
sudo ldconfig -f /etc/ld.so.conf.d/libintelopenclexp.conf
sudo rm -r ~/cpurt
```

--------------------------------

### Install Intel GPU Drivers and OpenCL Runtime on Linux

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Installs the OpenCL runtime for Intel GPUs on Linux systems with Kernel 6.2 or later. These commands install the OpenCL ICD loader and the Intel OpenCL driver, then add the current user to the `render` group.

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y g++ git make ocl-icd-libopencl1 ocl-icd-opencl-dev intel-opencl-icd
sudo usermod -a -G render $(whoami)
sudo shutdown -r now
```

--------------------------------

### Capability Checking Example

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/errors.md

Checks for specific device capabilities like FP64 support before execution.
If a capability is missing, it logs a warning and suggests fallback strategies, such as using FP32 precision or switching to the CPU.

```cpp
Device_Info info = select_device_with_most_flops();
if (!info.is_fp64_capable) {
    println("Warning: FP64 not available, using FP32 precision");
    // Adjust algorithm or fall back to CPU
}
if (info.uses_ram) {
    println("Using zero-copy buffers for iGPU");
}
```

--------------------------------

### OpenCL Vector Addition Example

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Demonstrates a complete OpenCL vector addition using the wrapper library. It shows device selection, memory allocation, kernel creation, data initialization, kernel execution, and result retrieval.

```c++
#include "opencl.hpp"

int main() {
    Device device(select_device_with_most_flops()); // compile OpenCL C code for the fastest available device

    const uint N = 1024u; // size of vectors
    Memory<float> A(device, N); // allocate memory on both host and device
    Memory<float> B(device, N);
    Memory<float> C(device, N);

    Kernel add_kernel(device, N, "add_kernel", A, B, C); // kernel that runs on the device

    for(uint n=0u; n devices = get_devices();
} catch (...) {
    // Handle: Install appropriate drivers from README
}
```

--------------------------------

### OpenCL Event Usage Example

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/types.md

Shows the `cl::Event` type, which is used to synchronize OpenCL commands such as kernel executions and memory transfers.

```cpp
cl::Event // OpenCL event for synchronization between commands
```

--------------------------------

### Install AMD GPU Drivers and OpenCL Runtime on Linux

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Installs AMD GPU drivers and the OpenCL runtime on Ubuntu Noble. The commands first install the build tools and the OpenCL ICD loader, then use `amdgpu-install` to set up the graphics, ROCm, and OpenCL components, add the current user to the `render` and `video` groups, and reboot.
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y g++ git make ocl-icd-libopencl1 ocl-icd-opencl-dev
mkdir -p ~/amdgpu
wget -P ~/amdgpu https://repo.radeon.com/amdgpu-install/25.35.1/ubuntu/noble/amdgpu-install_7.2.1.70201-1_all.deb
sudo apt install -y ~/amdgpu/amdgpu-install*.deb
sudo amdgpu-install -y --usecase=graphics,rocm,opencl --opencl=rocr
sudo usermod -a -G render,video $(whoami)
rm -r ~/amdgpu
sudo shutdown -r now
```

--------------------------------

### Get Command-Line Arguments

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-utilities.md

Retrieves the command-line arguments passed to the program. It takes the standard `argc` and `argv` from `main` and returns them as a `vector<string>` for easier manipulation.

```cpp
inline vector<string> get_main_arguments(int argc, char* argv[])
```

```cpp
int main(int argc, char* argv[]) {
    vector<string> args = get_main_arguments(argc, argv);
    for (uint i = 0; i < (uint)args.size(); i++) {
        println("Arg " + to_string(i) + ": " + args[i]);
    }
    return 0;
}
```

--------------------------------

### Clock Class for Timing

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-utilities.md

A `Clock` class to measure elapsed time. Instantiating it starts the timer, `stop()` returns the elapsed time in seconds, and `start()` resets the timer.

```cpp
class Clock {
public:
    inline Clock(); // constructor starts timer
    inline void start(); // reset timer
    inline double stop() const; // elapsed time in seconds
};

Clock timer;
// ... do work ...
double elapsed = timer.stop();
println("Elapsed: " + to_string(elapsed, 3) + " seconds");
```

--------------------------------

### Method Chaining for Kernel Configuration

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/README.md

Demonstrates method chaining for configuring and executing a kernel.
This allows kernel ranges and parameters to be set, and the kernel run, in one concise and readable statement.

```cpp
kernel.set_ranges(2048)
      .set_parameters(1, output2)
      .run();
```

--------------------------------

### Compile OpenCL Wrapper on Windows with Visual Studio

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Instructions for setting up Visual Studio Community for C++ development, including the components needed to build the OpenCL wrapper.

```powershell
Add-WindowsCapability -Online -Name "Desktop.Cpp.Tools~~~~0.0.1.0"
winget install --id Microsoft.VisualStudio.2022.Community --override "--add Microsoft.VisualStudio.Workload.NativeDesktop;--includeOptional Microsoft.VisualStudio.Component.VC.v142,Microsoft.VisualStudio.Component.Windows10SDK.10240"
```

--------------------------------

### Device Constructor with Info and Code

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device.md

Initializes a Device object with specific device information and OpenCL C source code. The code is automatically compiled for the target device, and device capabilities are enabled.

```APIDOC
## Device(const Device_Info& info, const string& opencl_c_code)

### Description
Initializes a Device by compiling the OpenCL C code for the specified device. Automatically enables device capabilities (FP64, FP16, INT64 atomics) and applies device-specific workarounds. Prints device information and compilation status to the console. Fails with an error if code compilation fails.

### Parameters
- **info** (const Device_Info&): Required - Device information structure containing device capabilities and context
- **opencl_c_code** (const string&): Optional - OpenCL C source code as a string; automatically compiled for the device (Default: get_opencl_c_code())

### Throws/Rejects
- Error if `info` refers to an uninitialized device
- Error if OpenCL C code compilation fails, with details about the compilation errors

### Example
```cpp
#include "opencl.hpp"

int main() {
    Device_Info device_info = select_device_with_most_flops();
    Device device(device_info); // uses default get_opencl_c_code()
    // Device is now ready to use
    return 0;
}
```
```

--------------------------------

### Get Memory Dimensions

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

Returns the number of dimensions of the memory object.

```cpp
const uint dimensions() const
```

```cpp
Memory<float> mem(device, 1024, 3);
assert(mem.dimensions() == 3);
```

--------------------------------

### Device Class

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/README.md

Manages OpenCL device initialization and program compilation.

```APIDOC
## Device Class

### Description
Manages OpenCL device initialization, program compilation, and execution synchronization.

### Constructors
- `Device()`
  Default constructor; creates an uninitialized `Device` object.
- `Device(const Device_Info& info, const string& opencl_c_code)`
  Initializes the `Device` with the specified `Device_Info` and OpenCL C source code for program compilation.

### Methods
- `bool is_initialized() const`
  Checks whether the device has been successfully initialized and is ready for use.
- `void barrier()`
  Enqueues a synchronization barrier, so that subsequently enqueued commands start only after all preceding commands have completed.
- `void finish_queue()`
  Blocks execution until all previously enqueued commands on the device's command queue have finished.

### Members
- `Device_Info info`
  A structure containing the capabilities and specifications of the managed OpenCL device.
```

--------------------------------

### Get Memory Range

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

Calculates the total element count by multiplying length and dimensions.

```cpp
const ulong range() const
```

```cpp
Memory<float> mem(device, 1024, 3);
assert(mem.range() == 3072); // 1024 * 3
```

--------------------------------

### Clock Class

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-utilities.md

A class for measuring elapsed time. It starts timing on construction, can be reset, and reports the elapsed time.

```APIDOC
## Timing

### Clock Class

#### `Clock()`
Constructor that starts the timer upon object creation.

#### `start()`
Resets the timer to zero and restarts it.

#### `stop() const`
Returns the elapsed time in seconds since the timer was started or last reset.
```

--------------------------------

### Handle Multiple Devices

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device-selection.md

Demonstrates how to retrieve a list of all available OpenCL devices, display their names and indices, and then select a specific device by its ID. This is useful for applications that need to manage or choose among multiple available devices.
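Choosing among multiple devices usually means validating a user-supplied index before calling `select_device_with_id()`, which aborts on an out-of-range ID. A minimal standard C++ sketch of that validation, assuming the ID arrives as a string (for example from the command line); `parse_device_id` is a hypothetical helper, not part of OpenCL-Wrapper, and it deliberately falls back to device 0 instead of aborting:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of OpenCL-Wrapper): converts a user-supplied
// device ID string into an index that is guaranteed to be in range.
// Returns `fallback` (device 0 by default) on empty, non-numeric,
// or out-of-range input.
unsigned parse_device_id(const std::string& text, std::size_t device_count, unsigned fallback = 0u) {
    if (text.empty() || device_count == 0) return fallback;
    unsigned long value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return fallback; // reject signs, spaces, letters
        value = value * 10ul + static_cast<unsigned long>(c - '0');
        if (value >= device_count) return fallback; // value only grows, so stop early
    }
    return static_cast<unsigned>(value);
}
```

The returned index can then be passed as the first argument of `select_device_with_id()`.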
```cpp
vector<Device_Info> devices = get_devices();
println("Available devices:");
for (uint i = 0; i < (uint)devices.size(); i++) {
    println(" [" + to_string(i) + "] " + devices[i].name);
}
uint selected_id = 0; // default to first device
// user could input different ID here
Device device(select_device_with_id(selected_id, devices));
```

--------------------------------

### Get Host Buffer Pointer

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

Provides a pointer to the host buffer for direct memory access.

```cpp
T* const data()
const T* const data() const
```

```cpp
float* host_data = mem.data();
host_data[0] = 3.14f;
```

--------------------------------

### Compile and Run on Linux/macOS/Android

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/README.md

Make the script executable, then run it to compile and execute the project on Linux, macOS, or Android. Ensure that g++ supports C++17.

```bash
chmod +x make.sh
./make.sh
```

--------------------------------

### Get Memory Length

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

Returns the primary dimension size (number of elements per dimension).

```cpp
const ulong length() const
```

```cpp
Memory<float> mem(device, 1024, 3);
assert(mem.length() == 1024);
```

--------------------------------

### Initialize Device with Most FLOPS

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device-selection.md

A concise way to initialize a Device object by selecting the device with the highest floating-point operations per second (FLOPS). This is useful for performance-critical applications.
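Conceptually, picking the device with the most FLOPS is an arg-max over the discovered devices. The sketch below shows that idea in plain C++ over a hypothetical `DeviceSummary` struct, a stand-in for `Device_Info` that keeps only the `name` and `tflops` fields used elsewhere on this page. It is an illustration under that assumption, not OpenCL-Wrapper's implementation, which may take other factors into account.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for Device_Info with only the fields needed here.
struct DeviceSummary {
    std::string name;
    float tflops; // estimated peak throughput in TFLOPs/s
};

// Returns the index of the device with the highest tflops value.
// On ties, the device listed first wins.
std::size_t index_of_most_flops(const std::vector<DeviceSummary>& devices) {
    if (devices.empty()) throw std::invalid_argument("no devices available");
    std::size_t best = 0;
    for (std::size_t i = 1; i < devices.size(); i++) {
        if (devices[i].tflops > devices[best].tflops) best = i;
    }
    return best;
}
```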
```cpp
Device device(select_device_with_most_flops());
```

--------------------------------

### Get Memory Capacity

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

Returns the total capacity in bytes, calculated from length, dimensions, and the size of the data type.

```cpp
const ulong capacity() const
```

```cpp
Memory<float> mem(device, 1024, 3);
assert(mem.capacity() == 3072 * sizeof(float));
```

--------------------------------

### Kernel Constructor with Full Parameters

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Creates a Kernel object and links parameters to the kernel function. The global range is calculated automatically, and parameters can be Memory objects or scalar constants. The device must be initialized, the kernel name must exist in the compiled program, the parameter types must match the kernel signature, and the workgroup size must be valid.

```cpp
template<typename... T>
inline Kernel(
    const Device& device,
    const ulong N,
    const uint workgroup_size,
    const string& name,
    const T&... parameters
)
```

```cpp
Device device(select_device_with_most_flops());
Memory<float> input(device, 1024);
Memory<float> output(device, 1024);
uint threshold = 128u;
// Create kernel with parameters: input buffer, output buffer, constant
Kernel kernel(device, 1024, 64, "process_data", input, output, threshold);
```

--------------------------------

### Get Current Global Work Size

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Returns the current global work size N configured for the kernel.

```cpp
const ulong range() const
```

```cpp
Kernel kernel(device, 1024, "compute", buffer);
assert(kernel.range() == 1024);
```

--------------------------------

### Clock Class Definition

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/types.md

Defines a Clock class for timing operations. The constructor starts the timer, `start()` resets and restarts it, and `stop()` returns the elapsed time.
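A class with this interface can be built on `std::chrono::steady_clock`. The sketch below is one possible implementation under that assumption, not necessarily how OpenCL-Wrapper's `Clock` is written. It uses `steady_clock` because that clock never jumps backwards, unlike the wall clock.

```cpp
#include <chrono>

// Sketch of a Clock with the documented interface, built on std::chrono.
class Clock {
public:
    Clock() { start(); } // constructor starts timer
    void start() { t0 = std::chrono::steady_clock::now(); } // reset and restart timer
    double stop() const { // elapsed seconds since construction or last start()
        const auto elapsed = std::chrono::steady_clock::now() - t0;
        return std::chrono::duration<double>(elapsed).count();
    }
private:
    std::chrono::steady_clock::time_point t0;
};
```

Because `stop()` is `const` and only reads the clock, it can be called repeatedly to take split times without resetting the timer.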
```cpp
class Clock {
public:
    Clock(); // constructor starts timer
    void start(); // reset and restart timer
    double stop() const; // return elapsed time in seconds
};
```

--------------------------------

### Discover and List OpenCL Devices

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device-selection.md

Discovers all available OpenCL devices across all platforms and optionally prints a list of them to the console. Aborts if no devices are found.

```cpp
#include "opencl.hpp"

int main() {
    vector<Device_Info> devices = get_devices();
    println("Found " + to_string((uint)devices.size()) + " devices");
    for (uint i = 0; i < (uint)devices.size(); i++) {
        println(" [" + to_string(i) + "] " + devices[i].name + " (" + to_string(devices[i].tflops, 1) + " TFLOPs/s)");
    }
    return 0;
}
```

--------------------------------

### set_parameters

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Updates parameters starting at a specified position. This is useful for changing buffer or constant values without recreating the kernel.

```APIDOC
## Kernel::set_parameters

### Description
Updates parameters starting at the specified position. Useful for changing buffer or constant values without recreating the kernel.

### Method
`Kernel& set_parameters(const uint starting_position, const T&... parameters)`

### Parameters
- **starting_position** (const uint) - Required - Position where to start replacing parameters
- **parameters** (const T&...) - Required - New parameters to set

### Request Example
```cpp
Kernel kernel(device, 1024, "compute", input, output, 0u);
// Change output and threshold without recreating kernel
kernel.set_parameters(1, output2, 100u).run();
```
```

--------------------------------

### Initialize Device with OpenCL C Code

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device.md

Initializes a Device object by compiling OpenCL C source code, automatically enabling device capabilities and applying workarounds. The example passes only `device_info`, so the default `get_opencl_c_code()` is compiled; pass an OpenCL C source string as the second argument to compile custom code instead.

```cpp
#include "opencl.hpp"

int main() {
    Device_Info device_info = select_device_with_most_flops();
    Device device(device_info); // uses default get_opencl_c_code()
    // Device is now ready to use
    return 0;
}
```

--------------------------------

### Operator Overload for Host Buffer Pointer

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

An alternative syntax to get the host buffer pointer using the function call operator.

```cpp
T* const operator()()
const T* const operator()() const
```

```cpp
mem()[0] = 3.14f; // same as mem.data()[0]
```

--------------------------------

### Initialize OpenCL Device Before Use

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/errors.md

Ensure the `Device` object is properly initialized by calling its constructor with a valid device selection. Do not use a default-constructed `Device` without initializing it afterwards; otherwise an 'Uninitialized Device Used' error occurs.

```cpp
Device device; // ❌ Wrong - uninitialized
if (!device.is_initialized()) {
    device = Device(select_device_with_most_flops()); // ✓ Initialize
}
```

--------------------------------

### Get OpenCL Context

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device.md

Returns the underlying OpenCL context object.
This is useful for advanced OpenCL operations or for integration with other OpenCL code.

```cpp
Device device(select_device_with_most_flops());
cl::Context context = device.get_cl_context();
```

--------------------------------

### Device_Info Constructor

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device-info.md

Internal constructor used by get_devices() to query and initialize device information. Most users should use the device selection functions instead of calling it directly.

```cpp
Device_Info(const cl::Device& cl_device, const cl::Context& cl_context, const uint id)
```

```cpp
// Usually created through device selection functions
Device_Info info = select_device_with_most_flops();
```

--------------------------------

### Get Number of Kernel Parameters

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Retrieves the number of parameters bound to a kernel. This is useful for verifying the kernel's parameter configuration.

```cpp
uint get_number_of_parameters() const
```

```cpp
Kernel kernel(device, 1024, "compute", input, output, threshold);
assert(kernel.get_number_of_parameters() == 3);
```

--------------------------------

### Get OpenCL Command Queue

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device.md

Returns the command queue associated with this device. This is useful for advanced queue operations or direct OpenCL API calls.

```cpp
Device device(select_device_with_most_flops());
cl::CommandQueue queue = device.get_cl_queue();
```

--------------------------------

### Enable File I/O Utilities

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/configuration.md

Define this macro to enable file I/O functions such as read_file(), write_file(), and create_folder(). Requires the header. The find_files() function requires C++17.
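Functions of this kind can be written with `<fstream>` alone. The sketch below uses the hypothetical names `read_text_file` and `write_text_file` so that it is not mistaken for the library's own `read_file()`/`write_file()`, whose exact behavior (for example when a file is missing) may differ.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helpers illustrating file I/O with the standard library only.
// Reads a whole file into a string; returns an empty string if it cannot be opened.
std::string read_text_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return "";
    std::ostringstream contents;
    contents << file.rdbuf(); // copy the entire stream buffer
    return contents.str();
}

// Writes (overwrites) a file; returns false if it cannot be opened or written.
bool write_text_file(const std::string& path, const std::string& text) {
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file) return false;
    file << text;
    return static_cast<bool>(file);
}
```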
```cpp
#define UTILITIES_FILE
#include "utilities.hpp"

string code = read_file("kernel.hpp");
write_file("output.log", "Results");
```

--------------------------------

### Configure Compile-Time Options for OpenCL Wrapper

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/README.md

Set preprocessor directives before including `opencl.hpp` to customize build-time behavior. Options include setting the default workgroup size, enabling PTX assembly generation for Nvidia, and logging kernel compilation.

```cpp
#define WORKGROUP_SIZE 128 // Default 64 (optimal for AMD)
#define PTX // Generate Nvidia PTX assembly
#define LOG // Log kernel compilation to bin/kernel.log
#include "opencl.hpp"
```

--------------------------------

### Initialize Device with Default OpenCL C Code

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device.md

Initializes a Device object using the default OpenCL C code returned by `get_opencl_c_code()`. This constructor is suitable when the default code provides all the kernels you need.

```cpp
Device_Info device_info = select_device_with_most_flops();
Device device(device_info);
```

--------------------------------

### Get Compiled OpenCL Program

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device.md

Returns the compiled OpenCL program object containing all kernels. This can be used to extract program information or create additional kernels.

```cpp
Device device(select_device_with_most_flops());
cl::Program program = device.get_cl_program();
```

--------------------------------

### is_initialized

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device.md

Checks whether the Device has been properly initialized. This is true only if the Device was constructed with a Device_Info parameter.

```APIDOC
## is_initialized()

### Description
Checks whether the Device has been properly initialized.
Returns true only if the Device was constructed with a Device_Info parameter.

### Method
`const` member function

### Parameters
None

### Return Type
bool

### Example
```cpp
Device device1; // default constructor
if (!device1.is_initialized()) {
    device1 = Device(select_device_with_most_flops());
}
if (device1.is_initialized()) {
    println("Device is ready");
}
```
```

--------------------------------

### Set Kernel Parameters

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Updates the parameters of an existing kernel without recreating it, which is useful for changing buffer or constant values. Parameters are replaced starting at the specified position.

```cpp
template<typename... T>
inline Kernel& set_parameters(
    const uint starting_position,
    const T&... parameters
)
```

```cpp
Kernel kernel(device, 1024, "compute", input, output, 0u);
// Change output and threshold without recreating kernel
kernel.set_parameters(1, output2, 100u).run();
```

--------------------------------

### Multi-dimensional Array (Array of Structures) Allocation

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/README.md

Allocates memory for a multi-dimensional array, suitable for representing structures or matrices on the device. This example allocates space for 512 positions, each with x, y, and z coordinates.

```cpp
Memory<float> positions(device, 512, 3); // 512 positions with x,y,z
for (ulong i = 0; i < 512; i++) {
    positions.x[i] = 1.0f;
    positions.y[i] = 2.0f;
    positions.z[i] = 3.0f;
}
```

--------------------------------

### Memory Constructor with Allocation Options

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

Use this constructor to allocate host and/or device memory with optional multi-dimensional support. You can also specify an initial value and whether zero-copy buffers are allowed.
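The size accessors documented earlier on this page follow directly from the constructor arguments: `range()` is `N * dimensions` elements and `capacity()` is `range() * sizeof(T)` bytes. A small standalone sketch of that arithmetic; `MemoryShape` is a hypothetical type for illustration only and performs no allocation:

```cpp
#include <cstdint>

// Hypothetical type reproducing the size arithmetic of a Memory<T> buffer.
template<typename T>
struct MemoryShape {
    std::uint64_t N;          // length(): elements per dimension
    std::uint32_t dimensions; // dimensions(): number of components
    std::uint64_t range() const { return N * dimensions; }         // total elements
    std::uint64_t capacity() const { return range() * sizeof(T); } // total bytes
};
```

For example, `MemoryShape<float>{1024, 3}` gives a range of 3072 elements and a capacity of 3072 * 4 bytes, matching the `Memory<float> mem(device, 1024, 3)` examples.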
```cpp
// Constructor of the class template Memory<T>
Memory(
    Device& device,
    const ulong N,
    const uint dimensions=1u,
    const bool allocate_host=true,
    const bool allocate_device=true,
    const T value=(T)0,
    const bool allow_zero_copy=true
)
```

```cpp
Device device(select_device_with_most_flops());

// Simple 1D buffer
Memory<float> vec(device, 1024);

// Multi-dimensional buffer (1024 elements, 3 dimensions = 3072 total)
Memory<float> array(device, 1024, 3);

// Host-only buffer (no device allocation)
Memory<float> cpu_data(device, 512, 1, true, false);

// Device-only buffer (no host allocation)
Memory<float> gpu_only(device, 2048, 1, false, true);

// Initialize with non-zero value
Memory<float> initialized(device, 256, 1, true, true, 3.14f);
```

--------------------------------

### String Manipulation

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-utilities.md

Offers string utilities: get the string length, check for substring containment (a single match or any of a list), and check whether a string begins or ends with a specific substring.

```APIDOC
## String Manipulation

### Searching and Matching
```cpp
inline uint length(const string& s) // string length
inline bool contains(const string& s, const string& match) // substring search
inline bool contains_any(const string& s, const vector<string>& matches)
inline bool begins_with(const string& s, const string& match)
inline bool ends_with(const string& s, const string& match)
```

**Example:**
```cpp
if (contains(device_name, "Radeon")) { /* AMD GPU */ }
if (begins_with(version, "3.0")) { /* OpenCL 3.0 */ }
```
```

--------------------------------

### Device_Info Constructor

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device-info.md

The Device_Info constructor is used internally to initialize a Device_Info object with details from an OpenCL device. It queries capabilities, calculates metrics, and applies patches.
Most users should rely on device selection functions instead of calling this directly.

```APIDOC
## Device_Info Constructor

### Signature
`Device_Info(const cl::Device& cl_device, const cl::Context& cl_context, const uint id)`

### Description
Internal constructor called by `get_devices()`. Queries device capabilities from OpenCL, calculates performance metrics, and applies vendor-specific patches. Most users should use device selection functions instead of calling this directly.

### Parameters
- **cl_device** (const cl::Device&): OpenCL device object from cl::Platform::getDevices()
- **cl_context** (const cl::Context&): OpenCL context containing the device
- **id** (const uint): Unique device ID (assigned sequentially by get_devices())

### Request Example
```cpp
// Usually created through device selection functions
Device_Info info = select_device_with_most_flops();
```

## Default Constructor

### Signature
`Device_Info()`

### Description
Default constructor creating an uninitialized Device_Info.
```

--------------------------------

### Validate Kernel Parameter Binding

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/errors.md

Instantiate a kernel with its parameters, then use `get_number_of_parameters()` to verify that the number of bound parameters matches the kernel function's signature. This helps prevent parameter mismatch errors.

```cpp
Kernel k(device, 1024, "compute", buf1, buf2, 42u);
println("Kernel parameters: " + to_string(k.get_number_of_parameters()));
// Should match number of arguments to kernel function
```

--------------------------------

### Check Device Initialization Status

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device.md

Use this method to determine whether a Device object was constructed with valid device information.
This is useful for making sure a device is ready before performing operations, or for initializing it later if it was default-constructed.

```cpp
bool is_initialized() const
```

```cpp
Device device1; // default constructor, not yet initialized
if (!device1.is_initialized()) {
	device1 = Device(select_device_with_most_flops());
}
if (device1.is_initialized()) {
	println("Device is ready");
}
```

--------------------------------

### Retrieve and Print Device Information

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/errors.md

Use `get_devices()` to retrieve the list of available OpenCL devices, then call `print_device_info()` on each one to print its detailed specifications.

```cpp
vector<Device_Info> devices = get_devices(); // prints device list
for (const auto& d : devices) {
	print_device_info(d); // prints detailed device info
}
```

--------------------------------

### select_device_with_id

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device-selection.md

Selects and returns the OpenCL device with the specified ID (index). IDs are assigned sequentially starting from 0 during device enumeration. This allows explicit device selection by index; the function aborts with an error if the ID is out of range.

````APIDOC
## select_device_with_id

### Description

Selects and returns the OpenCL device with the specified ID (index). IDs are assigned sequentially starting from 0 during device enumeration. This function allows explicit device selection by index and aborts with an error if the provided ID is out of range.

### Parameters

- **id** (uint) - Required - Device ID (index starting from 0).
- **devices** (vector<Device_Info>&) - Optional - Devices to select from. Defaults to calling `get_devices()`.

### Return Type

Device_Info

### Errors

- Aborts with an error if id is greater than or equal to the number of available devices.

### Example

```cpp
#include "opencl.hpp"

int main() {
	vector<Device_Info> devices = get_devices();
	if (devices.size() > 1) {
		// Use the second device
		Device_Info device_info = select_device_with_id(1, devices);
		Device device(device_info);
		println("Selected device 1: " + device.info.name);
	}
	return 0;
}
```
````

--------------------------------

### get_devices

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device-selection.md

Discovers all available OpenCL devices across all platforms and returns them as a vector, optionally printing device information to the console. It also sets environment variables for optimal buffer allocation on AMD and Intel GPUs. The function aborts if no devices are found.

````APIDOC
## get_devices

### Description

Discovers all available OpenCL devices across all platforms and returns them as a vector. Optionally prints device information to the console. It also sets environment variables for optimal buffer allocation on AMD and Intel GPUs. The function aborts if no devices are found.

### Parameters

- **print_info** (bool) - Optional - If true, prints a list of all discovered devices to the console. Defaults to true.

### Return Type

vector<Device_Info>

### Errors

- Aborts if no OpenCL devices are available on the system and displays driver installation instructions for Windows or Linux.

### Example

```cpp
#include "opencl.hpp"

int main() {
	vector<Device_Info> devices = get_devices();
	println("Found " + to_string((uint)devices.size()) + " devices");
	for (uint i = 0; i < (uint)devices.size(); i++) {
		println(" [" + to_string(i) + "] " + devices[i].name + " (" + to_string(devices[i].tflops, 1) + " TFLOPs/s)");
	}
	return 0;
}
```
````

--------------------------------

### Buffer Size Validation Example

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/errors.md

Validates that the required buffer size does not exceed the device's capacity.
It calculates a maximum safe size from the available device memory and the global buffer limit, which can serve as the basis for error handling or for scaling the problem down.

```cpp
Device device(info); // info: a Device_Info, e.g. from select_device_with_most_flops()
// Device_Info reports both limits in MB
const ulong max_safe_size = min(device.info.max_global_buffer, device.info.memory / 2u);
if (needed_size > max_safe_size) { // needed_size: required buffer size in MB, computed by the caller
	println("Problem size exceeds device capacity");
	// Chunk the computation or reduce the problem size
}
```

--------------------------------

### Memory Constructor (External Host Buffer)

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

Creates a Memory object wrapping an externally allocated host buffer, which is useful for integrating with existing data structures. If allocate_device is true, the data is automatically copied to the device on construction.

````APIDOC
## Memory(Device& device, const ulong N, const uint dimensions, T* const host_buffer, const bool allocate_device, const bool allow_zero_copy)

### Description

Creates a Memory object wrapping an externally allocated host buffer. Useful for integrating with existing data structures. Automatically copies the data to the device on construction if allocate_device is true.

### Parameters

- **device** (Device&) - Required - Initialized Device
- **N** (const ulong) - Required - Length of buffer
- **dimensions** (const uint) - Required - Number of dimensions (1-16)
- **host_buffer** (T* const) - Required - External host buffer pointer (user-managed memory)
- **allocate_device** (const bool) - Optional - If true, allocate device memory and copy the external buffer to the device (Default: true)
- **allow_zero_copy** (const bool) - Optional - If true, use zero-copy for perfectly aligned external buffers (Default: true)

### Errors

- Error if host_buffer is nullptr and allocate_device is true
- Error if dimensions exceeds 16

### Example

```cpp
Device device(select_device_with_most_flops());
float host_array[1024];
// ... populate array ...
Memory<float> mem(device, 1024, 1, host_array);
// Data is copied to the device on construction; use mem.write_to_device()
// and mem.read_from_device() to transfer later changes
```
````

--------------------------------

### Kernel Constructor (Full)

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Creates a Kernel object, links parameters to the kernel function, and configures the work ranges. The global range is calculated automatically, and both Memory objects and scalar constants are accepted as parameters.

````APIDOC
## Kernel(const Device& device, const ulong N, const uint workgroup_size, const string& name, const T&... parameters)

### Description

Creates a Kernel object and links the parameters to the kernel function. The global range is automatically set to the smallest multiple of workgroup_size that is ≥ N. Parameters can be Memory objects of any type or scalar constants (int, float, uint, etc.).

### Parameters

- **device** (const Device&) - Required - Initialized Device containing compiled kernels
- **N** (const ulong) - Required - Global work size (number of work items to execute)
- **workgroup_size** (const uint) - Required - Local workgroup size (threads per workgroup)
- **name** (const string&) - Required - Name of the OpenCL kernel function in the compiled code
- **parameters** (const T&...) - Required - Variadic: Memory objects and scalar constants in kernel argument order

### Example

```cpp
Device device(select_device_with_most_flops());
Memory<float> input(device, 1024);
Memory<float> output(device, 1024);
uint threshold = 128u;
// Create kernel with parameters: input buffer, output buffer, constant
Kernel kernel(device, 1024, 64, "process_data", input, output, threshold);
```
````

--------------------------------

### File I/O Functions

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-utilities.md

Functions for interacting with the file system, including creating directories and files, and reading and writing file content. Requires the `UTILITIES_FILE` macro.
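To make the documented behavior of `write_file()` (overwrite an existing file) and `read_file()` (return the whole file as one string) concrete, here is a minimal sketch using only the C++ standard library. The names `write_text_file` and `read_text_file` are hypothetical stand-ins written for illustration, not the wrapper's implementation; in a real project you would define `UTILITIES_FILE` before including `opencl.hpp` and call the wrapper's own functions.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical stand-in for write_file(): opens the file in truncate mode, so
// existing content is overwritten. Returns false if opening or writing fails.
inline bool write_text_file(const std::string& filename, const std::string& content = "") {
	assert(!filename.empty()); // precondition: a file name is required
	std::ofstream file(filename, std::ios::out | std::ios::trunc | std::ios::binary);
	if (!file) return false;
	file << content;
	return static_cast<bool>(file);
}

// Hypothetical stand-in for read_file(): returns the entire file as one string.
// In this sketch a missing or unreadable file yields an empty string.
inline std::string read_text_file(const std::string& filename) {
	assert(!filename.empty());
	std::ifstream file(filename, std::ios::in | std::ios::binary);
	if (!file) return "";
	std::ostringstream buffer;
	buffer << file.rdbuf(); // copy the whole stream buffer in one step
	return buffer.str();
}
```

Writing to the same path twice keeps only the second content, mirroring the overwrite semantics described for `write_file()`. Returning an empty string for a missing file is a choice made in this sketch, not documented wrapper behavior.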
````APIDOC
## File I/O Functions

### File Operations

#### `create_folder(const string& path)`

Creates a new directory at the specified path.

#### `create_file_extension(const string& filename, const string& extension)`

Creates a file with a specified extension.

#### `read_file(const string& filename)`

Reads the entire content of a file into a string.

#### `write_file(const string& filename, const string& content="")`

Writes the given content to a file, overwriting it if it exists.

#### `find_files(const string& path, const string& extension=".*")`

Finds all files within a directory that match a given extension pattern (requires C++17, i.e. `UTILITIES_NO_CPP17` must not be defined).
````

--------------------------------

### run

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Executes the kernel synchronously: it enqueues the kernel and blocks until execution completes. Returns *this for chaining.

````APIDOC
## Kernel::run

### Description

Executes the kernel synchronously: enqueues kernel execution and blocks until it completes. Returns *this for chaining.

### Method

`Kernel& run(const uint t=1u, const vector<Event>* event_waitlist=nullptr, Event* event_returned=nullptr)`

### Parameters

- **t** (const uint) - Optional - Default: 1u - Number of times to execute the kernel
- **event_waitlist** (const vector<Event>*) - Optional - Default: nullptr - Events to wait for before execution
- **event_returned** (Event*) - Optional - Default: nullptr - Pointer to store the completion event

### Example

```cpp
Device device(select_device_with_most_flops());
Memory<float> data(device, 1024);
Kernel kernel(device, 1024, "process", data);
kernel.run(); // execute once
kernel.run(10); // execute 10 times in sequence
kernel.run().run(); // execute twice with chaining
```
````

--------------------------------

### Memory Constructor (Default)

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-memory.md

Allocates memory on the host and/or device, with optional multi-dimensional support. For multi-dimensional buffers (dimensions > 1), auxiliary pointers (x, y, z, w, s0-sF) are initialized for array-of-structures access.

````APIDOC
## Memory(Device& device, const ulong N, const uint dimensions, const bool allocate_host, const bool allocate_device, const T value, const bool allow_zero_copy)

### Description

Allocates memory on the host and/or device, with optional multi-dimensional support. For multi-dimensional buffers (dimensions > 1), auxiliary pointers (x, y, z, w, s0-sF) are initialized for array-of-structures access.

### Parameters

- **device** (Device&) - Required - Initialized Device to allocate buffers on
- **N** (const ulong) - Required - Length of buffer (number of elements)
- **dimensions** (const uint) - Optional - Number of dimensions for multi-dimensional array (1-16) (Default: 1u)
- **allocate_host** (const bool) - Optional - If true, allocate host memory for CPU access (Default: true)
- **allocate_device** (const bool) - Optional - If true, allocate device memory for GPU access (Default: true)
- **value** (const T) - Optional - Initial value for all elements (Default: (T)0)
- **allow_zero_copy** (const bool) - Optional - If true, use zero-copy buffers on CPUs/iGPUs when possible (Default: true)

### Errors

- Error if device is not initialized
- Error if N × dimensions = 0
- Error if memory allocation exceeds device capacity

### Example

```cpp
Device device(select_device_with_most_flops());

// Simple 1D buffer
Memory<float> buffer(device, 1024);

// Multi-dimensional buffer (1024 elements × 3 dimensions = 3072 values)
Memory<float> array(device, 1024, 3);

// Host-only buffer (no device allocation)
Memory<float> cpu_data(device, 512, 1, true, false);

// Device-only buffer (no host allocation)
Memory<float> gpu_only(device, 2048, 1, false, true);

// Initialize all elements with a non-zero value
Memory<float> initialized(device, 256, 1, true, true, 3.14f);
```
````

--------------------------------

### print_device_info

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-device-selection.md

Prints a formatted table of device specifications to the console. This function is called automatically by the Device constructor and displays key information such as the device ID, name, vendor, driver version, OpenCL version, compute units, memory, and cache sizes.

````APIDOC
## void print_device_info(const Device_Info& d)

### Description

Prints a formatted table of device specifications to the console. Called automatically by the Device constructor. Shows the device ID, name, vendor, driver version, OpenCL version, compute units, memory, and cache sizes.

### Parameters

- **d** (const Device_Info&) - Required - Device information to print

### Example

```cpp
#include "opencl.hpp"

int main() {
	Device_Info info = select_device_with_most_flops();
	print_device_info(info); // prints formatted device table
	return 0;
}
```

### Output Example

```
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Radeon RX 7900 XTX                                         |
| Device Vendor  | AMD                                                        |
| Device Driver  | 6440.0 (Linux)                                             |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 96 at 2500 MHz (6144 cores, 30.72 TFLOPs/s)                |
| Memory, Cache  | 24576 MB VRAM, 16384 KB global / 64 KB local               |
| Buffer Limits  | 24576 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
```
````

--------------------------------

### Kernel Constructor (Default Workgroup Size)

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Delegates to the full constructor, using the default WORKGROUP_SIZE (64) as the local workgroup size.

````APIDOC
## Kernel(const Device& device, const ulong N, const string& name, const T&... parameters)

### Description

Delegating constructor that uses the default WORKGROUP_SIZE (64). Equivalent to calling the full constructor with workgroup_size=WORKGROUP_SIZE.

### Parameters

- **device** (const Device&) - Required - Initialized Device containing compiled kernels
- **N** (const ulong) - Required - Global work size (number of work items to execute)
- **name** (const string&) - Required - Name of the OpenCL kernel function in the compiled code
- **parameters** (const T&...) - Required - Variadic: Memory objects and scalar constants in kernel argument order

### Example

```cpp
// Uses the default workgroup size of 64
// (device, input, output and threshold defined as in the full-constructor example)
Kernel kernel(device, 1024, "process", input, output, threshold);
```
````

--------------------------------

### Kernel Constructor with Default Workgroup Size

Source: https://github.com/projectphysx/opencl-wrapper/blob/master/_autodocs/api-reference-kernel.md

Delegates to the full constructor using the default workgroup size of 64. Useful for simpler kernel initialization.

```cpp
template<class... T>
inline Kernel(
	const Device& device,
	const ulong N,
	const string& name,
	const T&... parameters
)
```

```cpp
// Uses default workgroup size of 64
Kernel kernel(device, 1024, "process", input, output, threshold);
```
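The rounding rule documented for the Kernel constructors (the global range is the smallest multiple of the workgroup size that is ≥ N) comes down to a ceiling division followed by a multiplication. The sketch below shows that arithmetic. `global_range` and `padding_items` are hypothetical helpers written for illustration, not code taken from `opencl.hpp`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest multiple of workgroup_size that is >= N
// (integer ceiling division, then scale back up to a whole number of workgroups).
inline std::uint64_t global_range(const std::uint64_t N, const std::uint32_t workgroup_size) {
	assert(workgroup_size > 0u); // a workgroup must contain at least one work-item
	return ((N + workgroup_size - 1u) / workgroup_size) * workgroup_size;
}

// Hypothetical helper: number of extra work-items launched beyond the N requested.
inline std::uint64_t padding_items(const std::uint64_t N, const std::uint32_t workgroup_size) {
	return global_range(N, workgroup_size) - N;
}
```

With the default workgroup size of 64, N = 1024 needs no padding. N = 1000 launches 1024 work-items, so 24 of them fall past the end of the data. A device-side bounds check such as `if(n>=N) return;` is a common way to keep those extra work-items from accessing memory out of range. Whether a particular kernel needs one depends on how it indexes its buffers.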