### Queue Recording Example with Memory Pool

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates queue recording with a user-provided memory pool. The example creates a memory pool with specific properties (such as zero initialization), allocates from it with `async_malloc_from_pool`, and manages dependencies via events.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

void* Ptr = nullptr;
size_t AllocSize = 1024;
queue Queue{syclContext, syclDevice};

// Device memory pool with zero init property
sycl_ext::memory_pool MemPool{syclContext, syclDevice, usm::alloc::device,
                              {sycl_ext::property::memory_pool::zero_init{}}};

Graph.begin_recording(Queue);

// Add an async_malloc node and capture the returned pointer in Ptr.
// The zero_init property and usm::alloc kind of the pool will be respected,
// but the pool is otherwise ignored.
event AllocEvent = Queue.submit([&](handler& CGH) {
  Ptr = sycl_ext::async_malloc_from_pool(CGH, AllocSize, MemPool);
});

// Use Ptr in another graph node which depends on the allocation node
event OtherEventA = Queue.submit([&](handler& CGH) {
  CGH.depends_on(AllocEvent);
  // Do something with Ptr
  CGH.parallel_for(...);
});

// Use Ptr in another node which has an indirect dependency on the
// allocation node
event OtherEventB = Queue.submit([&](handler& CGH) {
  CGH.depends_on(OtherEventA);
  // Do something with Ptr
  CGH.parallel_for(...);
});

// Free Ptr, indicating it is no longer in use at this point in the graph,
// with a dependency on any leaf nodes using Ptr
Queue.submit([&](handler& CGH) {
  CGH.depends_on(OtherEventB);
  sycl_ext::async_free(CGH, Ptr);
});

Graph.end_recording(Queue);
```

--------------------------------

### Windows: Install OpenCL CPU RT and TBB

Source: https://intel.github.io/llvm/GetStartedGuide

Installs the OpenCL CPU runtime and TBB redistributable on Windows by running the install.bat script. Requires Administrator privileges.

```batch
REM Install OpenCL CPU RT
REM Answer N for ICD records cleanup
c:\oclcpu_rt_\install.bat oneapi-tbb-\redist\intel64\vc14
```

--------------------------------

### Windows: Install OpenCL FPGA Emulation RT and TBB

Source: https://intel.github.io/llvm/GetStartedGuide

Installs the OpenCL FPGA emulation runtime and TBB redistributable on Windows by running the install.bat script. Requires Administrator privileges.

```batch
REM Install OpenCL FPGA emulation RT
REM Answer Y to clean previous OCL_ICD_FILENAMES configuration and ICD records cleanup
c:\oclfpga_rt_\install.bat oneapi-tbb-\redist\intel64\vc14
```

--------------------------------

### SYCL Graph Host-Task Example

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates how to integrate host work into a SYCL Graph correctly by using host-tasks. Wrapping host work in a host-task makes it a node in the graph, so it runs as a dependency on every graph execution instead of only once.

```cpp
// Problematic when recorded: host code in the command-group function runs
// once, while the graph is being built, not on each graph execution.
Queue.submit([&](sycl::handler& CGH) {
  // Do some host work here to prepare for the kernel to be executed
  do_some_host_work();
  CGH.parallel_for(...);
});

// Correct: wrap the host work in a host-task so it becomes a graph node.
Graph.begin_recording(Queue);

sycl::event HostWorkEvent = Queue.submit([&](sycl::handler& CGH) {
  CGH.host_task([=]() { do_some_host_work(); });
});

Queue.submit([&](sycl::handler& CGH) {
  CGH.depends_on(HostWorkEvent);
  CGH.parallel_for(...);
});

Graph.end_recording(Queue);
```

--------------------------------

### Linux DPC++ Workspace Setup

Source: https://intel.github.io/llvm/GetStartedGuide

Sets up the DPC++ workspace on Linux by defining the DPCPP_HOME environment variable, creating the directory, and navigating into it.

```bash
export DPCPP_HOME=~/sycl_workspace
mkdir $DPCPP_HOME
cd $DPCPP_HOME
git
```

--------------------------------

### Dot Product with Explicit Graph Creation

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates creating a SYCL command graph explicitly to perform a dot product operation.
It shows how to add nodes with dependencies and execute the graph.

```C++
#include <sycl/sycl.hpp>
#include <sycl/ext/oneapi/experimental/graph.hpp>

int main() {
  namespace sycl_ext = sycl::ext::oneapi::experimental;

  const size_t n = 10;
  float alpha = 1.0f;
  float beta = 2.0f;
  float gamma = 3.0f;

  sycl::queue q;
  sycl_ext::command_graph g(q.get_context(), q.get_device());

  float* dotp = sycl::malloc_shared<float>(1, q);
  float* x = sycl::malloc_device<float>(n, q);
  float* y = sycl::malloc_device<float>(n, q);
  float* z = sycl::malloc_device<float>(n, q);

  // Add commands to the graph to create the following topology.
  //
  //     i
  //    / \
  //   a   b
  //    \ /
  //     c

  // init data on the device
  auto node_i = g.add([&](sycl::handler& h) {
    h.parallel_for(n, [=](sycl::id<1> it) {
      const size_t i = it[0];
      x[i] = 1.0f;
      y[i] = 2.0f;
      z[i] = 3.0f;
    });
  });

  auto node_a = g.add(
      [&](sycl::handler& h) {
        h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> it) {
          const size_t i = it[0];
          x[i] = alpha * x[i] + beta * y[i];
        });
      },
      {sycl_ext::property::node::depends_on(node_i)});

  auto node_b = g.add(
      [&](sycl::handler& h) {
        h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> it) {
          const size_t i = it[0];
          z[i] = gamma * z[i] + beta * y[i];
        });
      },
      {sycl_ext::property::node::depends_on(node_i)});

  auto node_c = g.add(
      [&](sycl::handler& h) {
        h.single_task([=]() {
          for (size_t i = 0; i ...

    ... ();
    if (IH.ext_codeplay_has_graph()) {
      auto NativeGraph =
          IH.ext_codeplay_get_native_graph<sycl::backend::ext_oneapi_cuda>();

      // Start capture stream calls into graph
      cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr,
                                  nullptr, 0,
                                  CU_STREAM_CAPTURE_MODE_GLOBAL);

      myNativeLibraryCall(NativeStream);

      // Stop capturing stream calls into graph
      cuStreamEndCapture(NativeStream, &NativeGraph);
    } else {
      myNativeLibraryCall(NativeStream);
    }
  });
});
```

--------------------------------

### Diamond Dependency with Queue Recording

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates using SYCL queue recording to build a command graph with a diamond dependency pattern. It covers beginning and ending the recording, finalizing the graph, and executing it.

```C++
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

queue q{default_selector{}};

// Lifetime of buffers must exceed the lifetime of graphs they are used in.
buffer bufferA{dataA.data(), range<1>{elements}};
bufferA.set_write_back(false);
buffer bufferB{dataB.data(), range<1>{elements}};
bufferB.set_write_back(false);
buffer bufferC{dataC.data(), range<1>{elements}};
bufferC.set_write_back(false);

{
  // New object representing graph of command-groups
  sycl_ext::command_graph graph(
      q.get_context(), q.get_device(),
      {sycl_ext::property::graph::assume_buffer_outlives_graph{}});

  // `q` will be put in the recording state where commands are recorded to
  // `graph` rather than submitted for execution immediately.
  graph.begin_recording(q);

  // Record commands to `graph` with the following topology.
  //
  //      increment_kernel
  //        /         \
  //    A->/        A->\
  //      /             \
  //   add_kernel  subtract_kernel
  //      \             /
  //    B->\        C->/
  //        \         /
  //     decrement_kernel

  // increment_kernel: writes A
  q.submit([&](handler& cgh) {
    auto pData = bufferA.get_access<access::mode::read_write>(cgh);
    cgh.parallel_for(range<1>(elements),
                     [=](item<1> id) { pData[id]++; });
  });

  // add_kernel: reads A, updates B
  q.submit([&](handler& cgh) {
    auto pData1 = bufferA.get_access<access::mode::read>(cgh);
    auto pData2 = bufferB.get_access<access::mode::read_write>(cgh);
    cgh.parallel_for(range<1>(elements),
                     [=](item<1> id) { pData2[id] += pData1[id]; });
  });

  // subtract_kernel: reads A, updates C
  q.submit([&](handler& cgh) {
    auto pData1 = bufferA.get_access<access::mode::read>(cgh);
    auto pData2 = bufferC.get_access<access::mode::read_write>(cgh);
    cgh.parallel_for(range<1>(elements),
                     [=](item<1> id) { pData2[id] -= pData1[id]; });
  });

  // decrement_kernel: updates B and C
  q.submit([&](handler& cgh) {
    auto pData1 = bufferB.get_access<access::mode::read_write>(cgh);
    auto pData2 = bufferC.get_access<access::mode::read_write>(cgh);
    cgh.parallel_for(range<1>(elements), [=](item<1> id) {
      pData1[id]--;
      pData2[id]--;
    });
  });

  // queue `q` will be returned to the executing state where commands are
  // submitted immediately for execution.
  graph.end_recording();

  // Finalize the modifiable graph to create an executable graph that can be
  // submitted for execution.
  auto exec_graph = graph.finalize();

  // Execute graph
  q.submit([&](handler& cgh) { cgh.ext_oneapi_graph(exec_graph); }).wait();
}

// Check output using host accessors
host_accessor hostAccA(bufferA);
host_accessor hostAccB(bufferB);
host_accessor hostAccC(bufferC);

...
```

--------------------------------

### DPC++ Frontend Device Configuration YAML Example

Source: https://intel.github.io/llvm/design/DeviceConfigFile

An example of a YAML file used by the DPC++ frontend to describe new targets. It includes aspects, sub-group sizes, and toolchain information. The frontend parses this file to update target information.

```yaml
intel_gpu_skl:
  aspects: [aspect_name1, aspect_name2]
  may_support_other_aspects: true/false
  sub-group-sizes: [1, 2, 4, 8]
  aot-toolchain: ocloc
  aot-toolchain-options: -device skl
```

--------------------------------

### SYCLcompat Kernel Launch Example

Source: https://intel.github.io/llvm/syclcompat/README

Demonstrates how to use the SYCLcompat launch interface to execute a device function such as `vectorAdd`. The function is passed as a template argument; `launch` accepts either `dim3` grid and block sizes or a `sycl::nd_range`, and constructs the kernel lambda internally.

```cpp
void vectorAdd(const float* A, const float* B, float* C, int n);

// Using dim3
syclcompat::launch<vectorAdd>(blocksPerGrid, threadsPerBlock, d_A, d_B, d_C, n);

// Using sycl::nd_range
auto range = sycl::nd_range<3>{blocksPerGrid * threadsPerBlock, threadsPerBlock};
syclcompat::launch<vectorAdd>(range, d_A, d_B, d_C, n);
```

--------------------------------

### clang-offload-packager Usage Example

Source: https://intel.github.io/llvm/design/OffloadDesign

Demonstrates the command-line usage of the `clang-offload-packager` tool for bundling device binaries. It specifies the input file, target triple, and offloading kind.

```bash
clang-offload-packager --image=file=,triple=,kind=
```

--------------------------------

### Dynamic Device Code Loading Example

Source: https://intel.github.io/llvm/design/SharedLibraries

An example demonstrating how a device binary image can be loaded into memory using a dlopen-like API for dynamic symbol resolution. This API is currently a TODO item as the SYCL standard does not yet define it.

```c++
// suppose, mylib.spv defines SYCL_EXTERNAL function foo, then this call:
device_image img = device_dlopen("mylib.spv");
// will make foo available for dynamic symbol resolution. If any subsequent
// JIT compilations try to compile device code with external reference to
// foo, it can now be resolved following the resolution mechanism described
// in this doc, and JIT compilation will succeed.
```

--------------------------------

### SYCL Device Code Compilation Example

Source: https://intel.github.io/llvm/-docs/index

This snippet provides an example of SYCL device code compilation within the DPC++ framework. It links to the relevant section in the Users Manual for detailed instructions and context.

```APIDOC
Users Manual - Example: SYCL device code compilation
- Description: Provides an example of how to compile SYCL device code using the DPC++ compiler.
- Related Sections:
  - Users Manual: https://intel.github.io/llvm-docs/UsersManual.html
  - Getting Started with oneAPI DPC++: https://intel.github.io/llvm-docs/GetStartedGuide.html
```

--------------------------------

### Concurrent SYCL Graph Execution with Pre-finalized Graphs

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Shows how to achieve concurrent SYCL graph execution by pre-finalizing multiple graphs for different parameter sets. This avoids host synchronization between executions and can improve device occupancy.

```cpp
namespace sycl_ext = sycl::ext::oneapi::experimental;

sycl_ext::command_graph ExecutableGraph = ModifiableGraph.finalize();
sycl_ext::command_graph ExecutableGraphOtherParams =
    ModifiableGraphWithNewParams.finalize();

Queue.ext_oneapi_graph(ExecutableGraph);
// We can now execute the second graph with no implicit dependency
Queue.ext_oneapi_graph(ExecutableGraphOtherParams);

// Repeatedly executing the graphs will create dependencies on their individual
// preceding executions, but not on each other.
Queue.ext_oneapi_graph(ExecutableGraph);
Queue.ext_oneapi_graph(ExecutableGraphOtherParams);
```

--------------------------------

### SYCL Native CPU Code Coverage Example

Source: https://intel.github.io/llvm/design/SYCLNativeCPU

Example usage for enabling LLVM's source-based code coverage for SYCL Native CPU, allowing coverage testing across device and host code.

```Shell
=native_cpu\vector-add.exe llvm-profdata\vector-add.exe=foo.profdata
```

--------------------------------

### Querying Target Information

Source: https://intel.github.io/llvm/design/DeviceConfigFile

Example C++ code demonstrating how to query the TargetTable for specific target information, such as aspects and sub-group sizes.

```C++
DeviceConfigFile::TargetInfo info = DeviceConfigFile::targets.find("TargetA");
if (info == DeviceConfigFile::targets.end()) {
  /* Target not found */
  ...
} else {
  auto aspects = info.aspects;
  auto maySupportOtherAspects = info.maySupportOtherAspects;
  auto subGroupSizes = info.subGroupSizes;
  ...
}
```

--------------------------------

### GPU Target Compilation Example

Source: https://intel.github.io/llvm/design/OffloadDesign

Demonstrates how to pass specific options to the OpenCL Offline Compiler (OCLOC) for GPU targets using the `-Xsycl-target-backend` and `--gpu-tool-arg` options.

```bash
# Example with multiple GPU targets and specific options
-fsycl -fsycl-targets=spir64_gen,intel_gpu_skl \
  -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc" \
  -Xsycl-target-backend=intel_gpu_skl "-options -extraopt_skl"
```

--------------------------------

### Device Link Compilation Flow Example

Source: https://intel.github.io/llvm/design/CompilerAndRuntimeDesign

Illustrates a three-step compilation process using `-fsycl-link` to separate device and host linking, which helps when device code compilation is time-consuming.

```shell
# Step 1: Device link (can be time-consuming)
dev_a.cpp dev_b.cpp -fsycl-link -c -o dev_image.o

# Step 2: Host Compile
host_a.cpp -c -o host_a.o
host_b.cpp -c -o host_b.o

# Step 3: Linking
dev_image.o host_a.o host_b.o -o executable
```

--------------------------------

### DPC++ SYCL Graph Usage Guide

Source: https://intel.github.io/llvm/-docs/index

A guide detailing the usage of SYCL Graph within DPC++. This includes examples and explanations for leveraging SYCL Graph for efficient parallel programming.

```APIDOC
SYCL Graph Usage Guide and Examples
- Description: Comprehensive guide on using SYCL Graph with DPC++, including usage patterns and practical examples.
- Link: https://intel.github.io/llvm-docs/syclgraph/SYCLGraphUsageGuide.html
```

--------------------------------

### Discovering SYCL Devices with sycl-ls

Source: https://intel.github.io/llvm/MultiTileCardWithLevelZero

Demonstrates how to use the `sycl-ls` tool to discover available SYCL devices, including CPUs and GPUs managed by different backends.

```bash
sycl-ls
```

--------------------------------

### Handling Library Warmups in SYCL Graph Recording

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates a scenario where a library function performs a warmup on its first execution. This example shows how to prevent such warmups from being captured in a SYCL graph by manually triggering the warmup before recording.

```cpp
// Hypothetical library function
void SomeLibrary::Operation(sycl::queue Queue) {
  static bool IsFirstTime = true;
  if (IsFirstTime) {
    // Warmup by launching kernel once
    do_warmup(Queue);
    // Execute the actual operation
    execute_operation(Queue);
    IsFirstTime = false;
  } else {
    execute_operation(Queue);
  }
}

// SYCL Application Code
Graph.begin_recording(Queue);

// do_warmup() will be captured here and executed every time the graph is
// executed in future which is undesirable.
SomeLibrary::Operation(Queue);

Graph.end_recording(Queue);
```

--------------------------------

### Creating Sub-devices from a Root Device

Source: https://intel.github.io/llvm/MultiTileCardWithLevelZero

Demonstrates how to partition a root device into sub-devices based on affinity domains using the `create_sub_devices` SYCL API.

```cpp
try {
  std::vector<sycl::device> SubDevices = RootDevice.create_sub_devices<
      sycl::info::partition_property::partition_by_affinity_domain>(
      sycl::info::partition_affinity_domain::next_partitionable);
} catch (const sycl::exception& e) {
  // The device could not be partitioned this way
}
```

--------------------------------

### Original Native Command Invocation (CUDA Example)

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Shows the original way to invoke a native library call using `ext_codeplay_enqueue_native_command` without SYCL-Graph compatibility. This code will schedule asynchronous work immediately if the queue is in a recording state.

```cpp
q.submit([&](sycl::handler& CGH) {
  CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
    auto NativeStream =
        IH.get_native_queue<sycl::backend::ext_oneapi_cuda>();
    myNativeLibraryCall(NativeStream);
  });
});
```

--------------------------------

### Explicit Graph Example with async_malloc

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates explicit graph creation using SYCL's experimental oneAPI extensions. It shows how to add nodes for asynchronous memory allocation (`async_malloc`) and deallocation (`async_free`), managing dependencies between these operations within the graph.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

void* Ptr = nullptr;
size_t AllocSize = 1024;

// Add an async_malloc node and capture the returned pointer in Ptr
auto AllocNode = Graph.add([&](handler& CGH) {
  Ptr = sycl_ext::async_malloc(CGH, usm::alloc::device, AllocSize);
});

// Use Ptr in another graph node which depends on AllocNode
auto OtherNodeA =
    Graph.add(..., {sycl_ext::property::node::depends_on{AllocNode}});

// Use Ptr in another node which has an indirect dependency on AllocNode
auto OtherNodeB =
    Graph.add(..., {sycl_ext::property::node::depends_on{OtherNodeA}});

// Free Ptr, indicating it is no longer in use at this point in the graph,
// with a dependency on any leaf nodes using Ptr
Graph.add([&](handler& CGH) { sycl_ext::async_free(CGH, Ptr); },
          {sycl_ext::property::node::depends_on{OtherNodeB}});
```

--------------------------------

### Build DPC++ Toolchain with HIP AMD Support

Source: https://intel.github.io/llvm/GetStartedGuide

Instructions for building the DPC++ toolchain with support for AMD HIP accelerators. This involves enabling the HIP backend during the build configuration.

```bash
# Example command (actual command may vary based on build system and options)
cmake -DDPCT_ENABLE_HIP=ON -DHIP_ROOT_DIR=/opt/rocm ..
make
```

--------------------------------

### Run DPC++ E2E Tests

Source: https://intel.github.io/llvm/GetStartedGuide

Instructions on how to execute the end-to-end (E2E) tests for the DPC++ toolchain. These tests verify the functionality of the compiler and runtime.

```bash
# Example command (actual command may vary based on test runner and configuration)
./bin/dpcpp --test=e2e
```

--------------------------------

### Dynamic Command Groups with Dynamic Parameters in SYCL

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates creating and updating a SYCL graph with a dynamic command group that utilizes dynamic parameters.
This example shows how to update a dynamic parameter and switch between different command group functions within the graph, then execute the updated graph.

```C++
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

size_t N = 1024;
queue Queue{};
auto MyContext = Queue.get_context();
auto MyDevice = Queue.get_device();
sycl_ext::command_graph Graph{MyContext, MyDevice};

int* PtrA = malloc_device<int>(N, Queue);
int* PtrB = malloc_device<int>(N, Queue);

// Kernels loaded from kernel bundle
const std::vector<kernel_id> BuiltinKernelIds =
    MyDevice.get_info<info::device::built_in_kernel_ids>();
kernel_bundle<bundle_state::executable> MyBundle =
    get_kernel_bundle<bundle_state::executable>(MyContext, {MyDevice},
                                                BuiltinKernelIds);

kernel BuiltinKernelA = MyBundle.get_kernel(BuiltinKernelIds[0]);
kernel BuiltinKernelB = MyBundle.get_kernel(BuiltinKernelIds[1]);

// Create a dynamic parameter with an initial value of PtrA
sycl_ext::dynamic_parameter DynamicPointerArg{Graph, PtrA};

// Create command groups for both kernels which use DynamicPointerArg
auto CgfA = [&](handler& cgh) {
  cgh.set_arg(0, DynamicPointerArg);
  cgh.parallel_for(range{N}, BuiltinKernelA);
};

auto CgfB = [&](handler& cgh) {
  cgh.set_arg(0, DynamicPointerArg);
  cgh.parallel_for(range{N / 2}, BuiltinKernelB);
};

// Construct a dynamic command-group with CgfA as the active cgf (index 0).
auto DynamicCG = sycl_ext::dynamic_command_group(Graph, {CgfA, CgfB});

// Create a dynamic command-group graph node.
auto DynamicCGNode = Graph.add(DynamicCG);

auto ExecGraph = Graph.finalize(sycl_ext::property::graph::updatable{});

// The graph will execute CgfA with PtrA.
Queue.ext_oneapi_graph(ExecGraph).wait();

// Update DynamicPointerArg with a new value
DynamicPointerArg.update(PtrB);

// Sets CgfB as active in the dynamic command-group (index 1).
DynamicCG.set_active_index(1);

// Calls update to update the executable graph node with the changes to
// DynamicCG and DynamicPointerArg.
ExecGraph.update(DynamicCGNode);

// The graph will execute CgfB with PtrB.
Queue.ext_oneapi_graph(ExecGraph).wait();
```

--------------------------------

### Build Doxygen Documentation

Source: https://intel.github.io/llvm/GetStartedGuide

Builds the Doxygen documentation for the DPC++ toolchain. Requires doxygen, graphviz, and sphinx. The documentation is generated in the `$DPCPP_HOME/llvm/build/tools/sycl/doc/html` directory.

```cmake
cmake -DSYCL_BUILD_DOCS=ON
cmake --build . --target doxygen-sycl
```

--------------------------------

### SYCL Device Assertion Example

Source: https://intel.github.io/llvm/design/Assert

Demonstrates the use of the standard C++ `assert` API within SYCL device code. When an assertion fails (here, when `Item[0] % 2` evaluates to zero), it triggers a call to `std::abort()` on the host side. The example includes the necessary headers and a basic SYCL queue setup.

```c++
#include <cassert>
#include <iostream>
#include <sycl/sycl.hpp>

using namespace sycl;

void user_func(item<2> Item) {
  assert((Item[0] % 2) && "Nil");
}

int main() {
  queue Q;
  Q.submit([&](handler& CGH) {
    CGH.parallel_for(range<2>{N, M}, [=](item<2> It) {
      do_smth();
      user_func(It);
      do_smth_else();
    });
  });
  Q.wait();
  std::cout << "One shouldn't see this message.";
  return 0;
}
```

--------------------------------

### Concurrent SYCL Graph Execution with Updates

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates how to re-execute the same SYCL graph with new parameters by updating it between submissions. Updating the graph forces a host synchronization.

```cpp
namespace sycl_ext = sycl::ext::oneapi::experimental;

sycl_ext::command_graph ExecutableGraph = ModifiableGraph.finalize();

Queue.ext_oneapi_graph(ExecutableGraph);

// Updating the graph here to use new memory, this forces a host synchronization
ExecutableGraph.update(ModifiableGraphWithNewParams);

// Re-execute the updated graph
Queue.ext_oneapi_graph(ExecutableGraph);
```

--------------------------------

### SYCL Graph-Owned Memory Allocation Example

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates allocating, using, and freeing temporary memory within a SYCL graph. This pattern is useful for libraries that need specialized SYCL kernels and want to integrate with SYCL graphs for lifetime management and memory reuse. It shows the use of `sycl_ext::async_malloc` and `sycl_ext::async_free` with proper event dependencies.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

// Library code, this example is assuming an out of order SYCL queue
void launchLibraryKernel(queue& SyclQueue) {
  size_t TempMemSize = 1024;
  void* Ptr = nullptr;

  // Get a pointer to some temporary memory for use in the kernel
  // This call creates an allocation node in the graph if this call is being
  // recorded.
  event AllocEvent = SyclQueue.submit([&](handler& CGH) {
    Ptr = sycl_ext::async_malloc(CGH, usm::alloc::device, TempMemSize);
  });

  // Submit the actual library kernel
  event KernelEvent = SyclQueue.submit([&](handler& CGH) {
    // Mark the allocation as a dependency so that the temporary memory
    // is available
    CGH.depends_on(AllocEvent);
    // Submit a kernel that uses the temp memory in Ptr
    CGH.parallel_for(...);
  });

  // Free the memory back to the pool or graph, indicating that it is free to
  // be re-used. Memory will not actually be released back to the OS.
  SyclQueue.submit([&](handler& CGH) {
    // Mark the kernel as a dependency before freeing
    CGH.depends_on(KernelEvent);
    sycl_ext::async_free(CGH, Ptr);
  });
}

// Application code
void recordLibraryCall(queue& SyclQueue, sycl_ext::command_graph& Graph) {
  Graph.begin_recording(SyclQueue);

  // Call into library to record queue commands to the graph
  launchLibraryKernel(SyclQueue);

  Graph.end_recording(SyclQueue);
}
```

--------------------------------

### Compile DPC++ Toolchain

Source: https://intel.github.io/llvm/GetStartedGuide

This snippet shows the command to compile the DPC++ toolchain after configuration, typically using a `compile.py` script.

```bash
CC=gcc CXX=g++ $DPCPP_HOME/llvm/buildbot/compile.py
```

--------------------------------

### Linux: Configure Library Paths

Source: https://intel.github.io/llvm/GetStartedGuide

Configures system-wide library paths to include the DPC++ runtime libraries, requiring sudo access.

```bash
echo| echo|
```

--------------------------------

### Dynamic Parameter Update with Accessors

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates how to use `sycl_ext::dynamic_parameter` with SYCL accessors for updating graph node arguments. Because the handler is not available outside the command-group scope, the dynamic parameter is created from a placeholder accessor.

```cpp
sycl::buffer bufferA{...};
sycl::buffer bufferB{...};

// Create graph dynamic parameter using a placeholder accessor, since the
// sycl::handler is not available here outside of the command-group scope.
sycl_ext::dynamic_parameter dynParamAccessor(myGraph, bufferA.get_access());

sycl_ext::node kernelNode = myGraph.add([&](sycl::handler& cgh) {
  // Require the accessor contained in the dynamic parameter
  cgh.require(dynParamAccessor);
  // Set the arg on the kernel using the dynamic parameter directly
  cgh.set_args(dynParamAccessor);
  cgh.parallel_for(sycl::range{n}, builtinKernel);
});

...
// Update the dynamic parameter with a placeholder accessor from bufferB instead
dynParamAccessor.update(bufferB.get_access());
```

--------------------------------

### Linux: Extract OpenCL FPGA Emulation RT

Source: https://intel.github.io/llvm/GetStartedGuide

Commands to create a directory and extract the OpenCL FPGA emulation runtime archive on a Linux system.

```bash
mkdir cd
```

--------------------------------

### In-Order Queue Recording with Event-less async_malloc

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates queue recording using an in-order queue and event-less asynchronous memory allocation functions. This simplifies dependency management by leveraging the in-order execution property of the queue.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

void* Ptr = nullptr;
size_t AllocSize = 1024;
queue Queue{syclContext, syclDevice, {property::queue::in_order{}}};

Graph.begin_recording(Queue);

// Add an async_malloc node and capture the returned pointer in Ptr
Ptr = sycl_ext::async_malloc(Queue, usm::alloc::device, AllocSize);

// Use Ptr in another graph node which has an in-order dependency on the
// allocation node
Queue.submit([&](handler& CGH) {
  // Do something with Ptr
  CGH.parallel_for(...);
});

// Use Ptr in another node which has an in-order dependency on the
// previous kernel.
Queue.submit([&](handler& CGH) {
  // Do something with Ptr
  CGH.parallel_for(...);
});

// Free Ptr, indicating it is no longer in use at this point in the graph,
// with an in-order dependency on the previous kernel.
sycl_ext::async_free(Queue, Ptr);

Graph.end_recording(Queue);
```

--------------------------------

### Dynamic Parameter Update with SYCL Graphs

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates how to create a SYCL graph with a kernel node, update its arguments dynamically using `sycl_ext::dynamic_parameter`, and re-execute the graph.
It covers updating both pointer and scalar arguments.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

queue myQueue;
auto myContext = myQueue.get_context();
auto myDevice = myQueue.get_device();

// USM allocations for kernel input/output
const size_t n = 1024;
int* ptrX = malloc_shared<int>(n, myQueue);
int* ptrY = malloc_device<int>(n, myQueue);

int* ptrZ = malloc_shared<int>(n, myQueue);
int* ptrQ = malloc_device<int>(n, myQueue);

// Kernel loaded from kernel bundle
const std::vector<kernel_id> builtinKernelIds =
    myDevice.get_info<info::device::built_in_kernel_ids>();
kernel_bundle<bundle_state::executable> myBundle =
    get_kernel_bundle<bundle_state::executable>(myContext, {myDevice},
                                                builtinKernelIds);
kernel builtinKernel = myBundle.get_kernel(builtinKernelIds[0]);

// Graph containing a kernel node
sycl_ext::command_graph myGraph(myContext, myDevice);

int myScalar = 42;
// Create graph dynamic parameters
sycl_ext::dynamic_parameter dynParamInput(myGraph, ptrX);
sycl_ext::dynamic_parameter dynParamScalar(myGraph, myScalar);

// The node uses ptrX as an input & output parameter, with operand
// myScalar as another argument.
sycl_ext::node kernelNode = myGraph.add([&](handler& cgh) {
  cgh.set_args(dynParamInput, ptrY, dynParamScalar);
  cgh.parallel_for(range{n}, builtinKernel);
});

// Create an executable graph with the updatable property.
auto execGraph = myGraph.finalize({sycl_ext::property::graph::updatable{}});

// Execute graph, then update without needing to wait for it to complete
myQueue.ext_oneapi_graph(execGraph);

// Change ptrX argument to ptrZ
dynParamInput.update(ptrZ);

// Change myScalar argument to newScalar
int newScalar = 12;
dynParamScalar.update(newScalar);

// Update kernelNode in the executable graph with the new parameters
execGraph.update(kernelNode);
// Execute graph again
myQueue.ext_oneapi_graph(execGraph);
myQueue.wait();

sycl::free(ptrX, myQueue);
sycl::free(ptrY, myQueue);
sycl::free(ptrZ, myQueue);
sycl::free(ptrQ, myQueue);
```

--------------------------------

### Build DPC++ Toolchain with libc++ Library

Source: https://intel.github.io/llvm/GetStartedGuide

Instructions for building the DPC++ toolchain using the libc++ library. This is a common configuration for C++ standard library support.

```bash
# Example command (actual command may vary based on build system and options)
cmake -DDPCT_USE_LIBCXX=ON ..
make
```

--------------------------------

### Emulating Multiple GPU Tiles

Source: https://intel.github.io/llvm/MultiTileCardWithLevelZero

Uses environment variables to emulate multiple tiles within a single GPU for testing sub-device partitioning.

```bash
CreateMultipleSubDevices=N NEOReadDebugKeys=1
```

--------------------------------

### Dynamic Command Group Update in SYCL Graphs

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Shows how to manage and update a SYCL graph that contains a dynamic command group. This allows switching between different command group definitions (e.g., different kernels or operations) within the same graph node.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

queue Queue{};
sycl_ext::command_graph Graph{Queue.get_context(), Queue.get_device()};

int* PtrA = malloc_device<int>(1024, Queue);
int* PtrB = malloc_device<int>(1024, Queue);

auto CgfA = [&](handler& cgh) {
  cgh.parallel_for(1024, [=](item<1> Item) { PtrA[Item.get_id()] = 1; });
};

auto CgfB = [&](handler& cgh) {
  cgh.parallel_for(512, [=](item<1> Item) { PtrB[Item.get_id()] = 2; });
};

// Construct a dynamic command-group with CgfA as the active cgf (index 0).
auto DynamicCG = sycl_ext::dynamic_command_group(Graph, {CgfA, CgfB});

// Create a dynamic command-group graph node.
auto DynamicCGNode = Graph.add(DynamicCG);

auto ExecGraph = Graph.finalize(sycl_ext::property::graph::updatable{});

// The graph will execute CgfA.
Queue.ext_oneapi_graph(ExecGraph).wait();

// Sets CgfB as active in the dynamic command-group (index 1).
DynamicCG.set_active_index(1);

// Calls update to update the executable graph node with the changes to DynamicCG.
ExecGraph.update(DynamicCGNode);

// The graph will execute CgfB.
Queue.ext_oneapi_graph(ExecGraph).wait();
```

--------------------------------

### Build DPC++ Application with CMake

Source: https://intel.github.io/llvm/GetStartedGuide

Guidance on how to build a DPC++ application using CMake. This typically involves creating a CMakeLists.txt file to define the build process.

```cmake
cmake_minimum_required(VERSION 3.10)
project(MyDPCppApp)

find_package(DPC++ REQUIRED)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE DPC++::dpcpp_runtime)
```

--------------------------------

### Whole Graph Update in SYCL

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates recording multiple SYCL kernels into a graph and then updating the entire graph to execute with different parameters. The kernels are first recorded and the graph is finalized as updatable; a second graph with the same topology is then recorded with the new parameters and used to update the executable graph.

```C++
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

// Enqueue several kernels which use inputPtr
void run_kernels(int* inputPtr, queue syclQueue) {
  event eventA = syclQueue.submit([&](handler& CGH) {
    CGH.parallel_for(...);
  });

  event eventB = syclQueue.submit([&](handler& CGH) {
    CGH.depends_on(eventA);
    CGH.parallel_for(...);
  });

  syclQueue.submit([&](handler& CGH) {
    CGH.depends_on(eventB);
    CGH.parallel_for(...);
  });
}

// USM allocations
const size_t n = 1024;
int* ptrA = malloc_device<int>(n, myQueue);
int* ptrB = malloc_device<int>(n, myQueue);

// Main graph which will be updated later
sycl_ext::command_graph mainGraph(myQueue);

// Record the kernels to mainGraph, using ptrA
mainGraph.begin_recording(myQueue);
run_kernels(ptrA, myQueue);
mainGraph.end_recording();

auto execMainGraph = mainGraph.finalize({sycl_ext::property::graph::updatable{}});

// Execute execMainGraph
myQueue.ext_oneapi_graph(execMainGraph);

// Record a second graph which records the same kernels, but using ptrB instead
sycl_ext::command_graph updateGraph(myQueue);
updateGraph.begin_recording(myQueue);
run_kernels(ptrB, myQueue);
updateGraph.end_recording();

// Update execMainGraph using updateGraph. We do not need to finalize
// updateGraph (this would be expensive)
execMainGraph.update(updateGraph);

// Execute execMainGraph again, which will now be operating on ptrB instead of
// ptrA
myQueue.ext_oneapi_graph(execMainGraph);
```

--------------------------------

### Example: Setting SYCL Cache Directory

Source: https://intel.github.io/llvm/design/KernelProgramCache

Sets the SYCL_CACHE_DIR environment variable in a bash shell to specify a custom location for the persistent cache.

```bash
export SYCL_CACHE_DIR="/path/to/my/sycl_cache"
```

--------------------------------

### SYCL Device Management API

Source: https://intel.github.io/llvm/syclcompat/README

API for managing SYCL devices, queues, and contexts.
Includes functions for queue creation, default queue management, device selection, and capability checking.

```APIDOC
SYCL Device Management API:

create_queue(bool print_on_async_exceptions = false, bool in_order = true)
  - Creates a new SYCL queue for the current device.
  - Parameters:
    - print_on_async_exceptions: If true, async exceptions are printed to stderr.
    - in_order: If true, operations on the queue are in-order.
  - Returns: A sycl::queue object.

get_default_queue()
  - Retrieves the default SYCL queue for the current device.
  - Returns: The default sycl::queue.

set_default_queue(const sycl::queue& q)
  - Sets the default SYCL queue for the current device.
  - This operation is blocking and waits for any submitted kernels in the previous default queue to complete.
  - If the previous default queue was the device's saved queue, the saved queue reference is updated.
  - Parameters:
    - q: The sycl::queue to set as the default.

wait(sycl::queue q = get_default_queue())
  - Waits for all queued kernels in the specified queue to complete.
  - Parameters:
    - q: The sycl::queue to wait on (defaults to the default queue).

wait_and_throw(sycl::queue q = get_default_queue())
  - Waits for all queued kernels in the specified queue to complete and throws any unhandled exceptions.
  - Parameters:
    - q: The sycl::queue to wait on (defaults to the default queue).

get_current_device_id()
  - Returns the ID of the current SYCL device.
  - Returns: The unsigned integer ID of the current device.

get_current_device()
  - Returns a reference to the current SYCL device extension (`device_ext`).
  - Returns: A reference to the current device_ext.

get_device(unsigned int id)
  - Returns a reference to a SYCL device extension by its ID.
  - Parameters:
    - id: The ID of the device to retrieve.
  - Returns: A reference to the device_ext.

get_default_context()
  - Retrieves the SYCL context associated with the default queue of the current device.
  - Returns: The sycl::context.

cpu_device()
  - Returns a reference to the CPU SYCL device extension.
  - Returns: A reference to the CPU device_ext.

filter_device(const std::vector<std::string>& dev_subnames)
  - Filters the available SYCL devices, keeping only those whose names contain any of the provided substrings.
  - Warning: This may alter device ID mappings and the current device. It is recommended to call this before any other SYCLcompat or SYCL APIs.
  - Parameters:
    - dev_subnames: A vector of device name substrings to filter by.

list_devices()
  - Prints all available SYCL devices and their IDs to standard output.

select_device(unsigned int id)
  - Selects a SYCL device by its ID.
  - Parameters:
    - id: The ID of the device to select.
  - Returns: The ID of the selected device.

get_device_id(const sycl::device& dev)
  - Retrieves the ID of a given SYCL device.
  - Parameters:
    - dev: The sycl::device object.
  - Returns: The unsigned integer ID of the device.

device_count()
  - Returns the total number of available SYCL devices.
  - Returns: The count of available devices.

has_capability_or_fail(const sycl::device& dev, const std::initializer_list<sycl::aspect>& props)
  - Checks if a given SYCL device supports all specified SYCL aspects.
  - If the device does not support all aspects, a sycl::exception is thrown.
  - Parameters:
    - dev: The sycl::device to check.
    - props: An initializer list of sycl::aspect values to verify.

Device Extension Class (`device_ext`):

device_ext()
  - Default constructor.

device_ext(const sycl::device& base, bool print_on_async_exceptions = false, bool in_order = true)
  - Constructor that initializes with a base SYCL device and optional queue properties.
  - Parameters:
    - base: The base sycl::device.
    - print_on_async_exceptions: Controls printing of async exceptions.
    - in_order: Controls queue ordering.

~device_ext()
  - Destructor. Waits on associated events and cleans up resources.

is_native_host_atomic_supported()
  - Checks if native host atomics are supported by the device.
  - Returns: True if supported, false otherwise.
```
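
The filtering rule documented for `filter_device` (keep a device when its name contains any of the given substrings) can be modeled in plain C++ without a SYCL toolchain. The sketch below is a host-only illustration, not the SYCLcompat implementation: `DeviceEntry`, `filter_by_subnames`, and `example_usage` are hypothetical names, the device names are made up, and renumbering the surviving IDs from 0 is an assumption chosen to illustrate the warning that filtering may alter device ID mappings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a device entry: an ID plus a device name.
struct DeviceEntry {
  unsigned id;
  std::string name;
};

// Keep only entries whose name contains at least one of the substrings.
// Assumption: surviving entries are renumbered from 0, which models the
// documented warning that filtering may change device ID mappings.
std::vector<DeviceEntry> filter_by_subnames(
    const std::vector<DeviceEntry>& devices,
    const std::vector<std::string>& dev_subnames) {
  std::vector<DeviceEntry> kept;
  for (const DeviceEntry& dev : devices) {
    const bool matches = std::any_of(
        dev_subnames.begin(), dev_subnames.end(),
        [&](const std::string& sub) {
          return dev.name.find(sub) != std::string::npos;
        });
    if (matches) {
      kept.push_back({static_cast<unsigned>(kept.size()), dev.name});
    }
  }
  return kept;
}

// Usage sketch with made-up device names.
void example_usage() {
  const std::vector<DeviceEntry> all = {
      {0, "Example CPU"}, {1, "Example GPU A"}, {2, "Example GPU B"}};
  const std::vector<DeviceEntry> gpus = filter_by_subnames(all, {"GPU"});
  assert(gpus.size() == 2);
  // "Example GPU A" had ID 1 before filtering and has ID 0 afterwards.
  assert(gpus[0].id == 0 && gpus[0].name == "Example GPU A");
  assert(gpus[1].id == 1 && gpus[1].name == "Example GPU B");
}
```

In this model, an ID obtained before filtering no longer refers to the same device afterwards, which is one reason the API description recommends calling `filter_device` before any other SYCLcompat or SYCL calls.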