### Queue Recording Example with Memory Pool

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates queue recording with a user-provided memory pool. This example shows how to create a memory pool with specific properties (like zero initialization) and use it with async_malloc_from_pool, managing dependencies via events.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

void* Ptr = nullptr;
size_t AllocSize = 1024;
queue Queue{syclContext, syclDevice};

// Device memory pool with zero init property
sycl_ext::memory_pool MemPool{syclContext, syclDevice, usm::alloc::device,
    {sycl_ext::property::memory_pool::zero_init{}}};
Graph.begin_recording(Queue);
// Add an async_malloc node and capture the returned pointer in Ptr,
// zero_init property and usm::alloc kind of pool will be respected but pool
// is otherwise ignored
event AllocEvent = Queue.submit([&](handler& CGH) {
    Ptr = sycl_ext::async_malloc_from_pool(CGH, AllocSize, MemPool);
});

// Use Ptr in another graph node which depends on AllocNode
event OtherEventA = Queue.submit([&](handler& CGH) {
    CGH.depends_on(AllocEvent);
    // Do something with Ptr
    CGH.parallel_for(...);
});
// Use Ptr in another node which has an indirect dependency on AllocNode
event OtherEventB = Queue.submit([&](handler& CGH) {
    CGH.depends_on(OtherEventA);
    // Do something with Ptr
    CGH.parallel_for(...);
});

// Free Ptr, indicating it is no longer in use at this point in the graph,
// with a dependency on any leaf nodes using Ptr
Queue.submit([&](handler& CGH) {
    CGH.depends_on(OtherEventB);
    sycl_ext::async_free(CGH, Ptr);
});

Graph.end_recording(Queue);

```

--------------------------------

### Windows: Install OpenCL CPU RT and TBB

Source: https://intel.github.io/llvm/GetStartedGuide

Installs the OpenCL CPU runtime and TBB redistributable on Windows by running the install.bat script. Requires Administrator privileges.

```batch
REM Install OpenCL CPU RT, passing the TBB redistributable directory to the installer
REM Answer N for ICD records cleanup
c:\oclcpu_rt_<cpu_version>\install.bat oneapi-tbb-<tbb_version>\redist\intel64\vc14
```

--------------------------------

### Windows: Install OpenCL FPGA Emulation RT and TBB

Source: https://intel.github.io/llvm/GetStartedGuide

Installs the OpenCL FPGA emulation runtime and TBB redistributable on Windows by running the install.bat script. Requires Administrator privileges.

```batch
REM Install OpenCL FPGA emulation RT, passing the TBB redistributable directory to the installer
REM Answer Y to clean previous OCL_ICD_FILENAMES configuration and ICD records cleanup
c:\oclfpga_rt_<fpga_version>\install.bat oneapi-tbb-<tbb_version>\redist\intel64\vc14
```

--------------------------------

### SYCL Graph Host-Task Example

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates how to correctly integrate host work into a SYCL Graph by using host-tasks. This ensures host operations are executed as dependencies within the graph, preventing issues with repeated graph execution.

```cpp
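// Original code: host work runs inside the command-group function. If this
// submission is recorded, the host work runs once at record time and is not
// repeated when the graph is executed.
Queue.submit([&](sycl::handler& CGH) {
    // Do some host work here to prepare for the kernel to be executed
    do_some_host_work();
    CGH.parallel_for(...);
});

// Graph-friendly version: wrap the host work in a host-task so it becomes a
// node that runs on every graph execution.
Graph.begin_recording(Queue);

sycl::event HostWorkEvent = Queue.submit([&](sycl::handler& CGH) {
    CGH.host_task([=]() {
        do_some_host_work();
    });
});

Queue.submit([&](sycl::handler& CGH) {
    CGH.depends_on(HostWorkEvent);
    CGH.parallel_for(...);
});

Graph.end_recording(Queue);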
```
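
The difference between record-time host work and a recorded host-task can be modelled without a SYCL runtime. The sketch below is a host-only mock: `MockGraph`, `Counters`, and `recordAndReplay` are hypothetical names introduced for illustration and are not part of the SYCL Graph API. It counts how often each kind of host work runs when a graph is replayed several times.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a recorded command graph: every node is a
// callable that runs again each time the graph is replayed.
struct MockGraph {
    std::vector<std::function<void()>> Nodes;
    void replay() const {
        for (const auto& Node : Nodes)
            Node();
    }
};

// Counts how often each kind of host work executed.
struct Counters {
    int RecordTime = 0; // host work placed directly in the command-group function
    int PerReplay = 0;  // host work wrapped in a node (the host-task pattern)
};

Counters recordAndReplay(int Replays) {
    Counters C;
    MockGraph Graph;

    // Host work written directly in the command-group function executes here,
    // while recording, and never becomes part of the graph.
    C.RecordTime++;

    // Host work wrapped in a node is stored and re-run on every replay.
    Graph.Nodes.push_back([&C] { C.PerReplay++; });

    for (int I = 0; I < Replays; ++I)
        Graph.replay();
    return C;
}
```

With three replays, the record-time counter stays at one while the per-replay counter reaches three, which is the behaviour the host-task version relies on.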

--------------------------------

### Linux DPC++ Workspace Setup

Source: https://intel.github.io/llvm/GetStartedGuide

Sets up the DPC++ workspace on Linux by defining the DPCPP_HOME environment variable, creating the directory, and navigating into it.

```bash
export DPCPP_HOME=~/sycl_workspace
mkdir $DPCPP_HOME
cd $DPCPP_HOME

git
```

--------------------------------

### Dot Product with Explicit Graph Creation

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates creating a SYCL command graph explicitly to perform a dot product operation. It shows how to add nodes with dependencies and execute the graph.

```C++
#include <sycl/sycl.hpp>
#include <sycl/ext/oneapi/experimental/graph.hpp>

int main() {
    namespace sycl_ext = sycl::ext::oneapi::experimental;

    const size_t n = 10;
    float alpha = 1.0f;
    float beta = 2.0f;
    float gamma = 3.0f;

    sycl::queue q;
    sycl_ext::command_graph g(q.get_context(), q.get_device());

    float* dotp = sycl::malloc_shared<float>(1, q);
    float* x = sycl::malloc_device<float>(n, q);
    float* y = sycl::malloc_device<float>(n, q);
    float* z = sycl::malloc_device<float>(n, q);

    // malloc_shared does not zero memory, so clear the accumulator before use
    *dotp = 0.0f;

    // Add commands to the graph to create the following topology.
    //
    //     i
    //    / \
    //   a   b
    //    \ /
    //     c

    // init data on the device
    auto node_i = g.add([&](sycl::handler& h) {
        h.parallel_for(n, [=](sycl::id<1> it) {
            const size_t i = it[0];
            x[i] = 1.0f;
            y[i] = 2.0f;
            z[i] = 3.0f;
        });
    });

    auto node_a = g.add([&](sycl::handler& h) {
        h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> it) {
            const size_t i = it[0];
            x[i] = alpha * x[i] + beta * y[i];
        });
    }, {sycl_ext::property::node::depends_on(node_i)});

    auto node_b = g.add([&](sycl::handler& h) {
        h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> it) {
            const size_t i = it[0];
            z[i] = gamma * z[i] + beta * y[i];
        });
    }, {sycl_ext::property::node::depends_on(node_i)});

    auto node_c = g.add(
        [&](sycl::handler& h) {
            h.single_task([=]() {
                for (size_t i = 0; i < n; i++) {
                    *dotp += x[i] * z[i];
                }
            });
        },
        {sycl_ext::property::node::depends_on(node_a, node_b)});

    auto exec = g.finalize();

    // use queue shortcut for graph submission
    q.ext_oneapi_graph(exec).wait();

    // memory can be freed inside or outside the graph
    sycl::free(x, q);
    sycl::free(y, q);
    sycl::free(z, q);
    sycl::free(dotp, q);

    return 0;
}

```
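
To check the value the graph should leave in `dotp`, a host-side reference can be useful. The sketch below mirrors the four nodes in plain C++ with no SYCL dependency. `reference_dotp` is a hypothetical helper name, and the sketch assumes the accumulator starts at zero.

```cpp
#include <cstddef>
#include <vector>

// Host-only reference for the i -> {a, b} -> c topology.
// reference_dotp is a hypothetical helper, not part of any SYCL API.
float reference_dotp(std::size_t n, float alpha, float beta, float gamma) {
    // node_i: initialise the three vectors
    std::vector<float> x(n, 1.0f), y(n, 2.0f), z(n, 3.0f);

    // node_a and node_b: independent updates that both read y
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = alpha * x[i] + beta * y[i];
        z[i] = gamma * z[i] + beta * y[i];
    }

    // node_c: accumulate the dot product, starting from an explicit zero
    float dotp = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        dotp += x[i] * z[i];
    return dotp;
}
```

With the constants from the example (`n = 10`, `alpha = 1`, `beta = 2`, `gamma = 3`), each `x[i]` becomes 5 and each `z[i]` becomes 13, so the reference returns 650.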

--------------------------------

### Windows DPC++ Workspace Setup

Source: https://intel.github.io/llvm/GetStartedGuide

Sets up the DPC++ workspace on Windows by defining the DPCPP_HOME environment variable, creating the directory, and cloning the LLVM repository.

```batch
set DPCPP_HOME=%USERPROFILE%\sycl_workspace
mkdir %DPCPP_HOME%
cd %DPCPP_HOME%

git clone --config core.autocrlf=false https://github.com/intel/llvm -b sycl

```

--------------------------------

### Install Zstd on Ubuntu

Source: https://intel.github.io/llvm/GetStartedGuide

Installs the zstd compression library on Ubuntu using the apt package manager. Note potential issues with static library builds on newer Ubuntu versions.

```bash
sudo apt update
sudo apt install libzstd-dev
```

--------------------------------

### Enqueue Native Command with SYCL-Graph (CUDA Example)

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates how to adapt existing code using ext_codeplay_enqueue_native_command to work with SYCL-Graph, ensuring native library calls are captured correctly as graph nodes when the queue is in a recording state. This example uses the CUDA backend.

```cpp
q.submit([&](sycl::handler& CGH) {
    CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
        auto NativeStream =
            IH.get_native_queue<sycl::backend::ext_oneapi_cuda>();
        if (IH.ext_codeplay_has_graph()) {
            auto NativeGraph =
                IH.ext_codeplay_get_native_graph<sycl::backend::ext_oneapi_cuda>();

            // Start capturing stream calls into the graph
            cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr,
                                        nullptr, 0,
                                        CU_STREAM_CAPTURE_MODE_GLOBAL);

            myNativeLibraryCall(NativeStream);

            // Stop capturing stream calls into the graph
            cuStreamEndCapture(NativeStream, &NativeGraph);
        } else {
            // Not recording: call the native library directly
            myNativeLibraryCall(NativeStream);
        }
    });
});
```

--------------------------------

### Diamond Dependency with Queue Recording

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates using SYCL queue recording to build a command graph with a diamond dependency pattern. It covers recording, ending recording, finalizing, and executing the graph.

```C++
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

queue q{default_selector{}};

// Lifetime of buffers must exceed the lifetime of graphs they are used in.
buffer<T> bufferA{dataA.data(), range<1>{elements}};
bufferA.set_write_back(false);
buffer<T> bufferB{dataB.data(), range<1>{elements}};
bufferB.set_write_back(false);
buffer<T> bufferC{dataC.data(), range<1>{elements}};
bufferC.set_write_back(false);

{
    // New object representing graph of command-groups
    sycl_ext::command_graph graph(q.get_context(), q.get_device(),
        {sycl_ext::property::graph::assume_buffer_outlives_graph{}});

    // `q` will be put in the recording state where commands are recorded to
    // `graph` rather than submitted for execution immediately.
    graph.begin_recording(q);

    // Record commands to `graph` with the following topology.
    //
    //      increment_kernel
    //       /         \
    //   A->/        A->\
    //     /             \
    //   add_kernel  subtract_kernel
    //     \             /
    //   B->\        C->/
    //       \         /
    //     decrement_kernel

    q.submit([&](handler& cgh) {
        auto pData = bufferA.get_access<access::mode::read_write>(cgh);
        cgh.parallel_for<increment_kernel>(range<1>(elements),
            [=](item<1> id) { pData[id]++; });
    });

    q.submit([&](handler& cgh) {
        auto pData1 = bufferA.get_access<access::mode::read>(cgh);
        auto pData2 = bufferB.get_access<access::mode::read_write>(cgh);
        cgh.parallel_for<add_kernel>(range<1>(elements),
            [=](item<1> id) { pData2[id] += pData1[id]; });
    });

    q.submit([&](handler& cgh) {
        auto pData1 = bufferA.get_access<access::mode::read>(cgh);
        auto pData2 = bufferC.get_access<access::mode::read_write>(cgh);
        cgh.parallel_for<subtract_kernel>(
            range<1>(elements), [=](item<1> id) {
                pData2[id] -= pData1[id];
            });
    });

    q.submit([&](handler& cgh) {
        auto pData1 = bufferB.get_access<access::mode::read_write>(cgh);
        auto pData2 = bufferC.get_access<access::mode::read_write>(cgh);
        cgh.parallel_for<decrement_kernel>(range<1>(elements), [=](item<1> id) {
            pData1[id]--;
            pData2[id]--;
        });
    });

    // queue `q` will be returned to the executing state where commands are
    // submitted immediately for execution.
    graph.end_recording();

    // Finalize the modifiable graph to create an executable graph that can be
    // submitted for execution.
    auto exec_graph = graph.finalize();

    // Execute graph
    q.submit([&](handler& cgh) {
        cgh.ext_oneapi_graph(exec_graph);
    }).wait();
}

// Check output using host accessors
host_accessor hostAccA(bufferA);
host_accessor hostAccB(bufferB);
host_accessor hostAccC(bufferC);

...
```
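
When checking the host accessors after execution, it can help to compute the expected buffer contents on the host. The sketch below applies the four recorded kernels in dependency order to plain vectors. `reference_diamond` is a hypothetical helper name, and `int` is an assumed element type standing in for `T`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only mirror of the diamond graph. The element type int is an
// assumption; the recorded example is generic over T.
void reference_diamond(std::vector<int>& a, std::vector<int>& b,
                       std::vector<int>& c) {
    // All three buffers share the same element count in the example.
    assert(a.size() == b.size() && b.size() == c.size());
    const std::size_t n = a.size();

    for (std::size_t i = 0; i < n; ++i) a[i]++;        // increment_kernel
    for (std::size_t i = 0; i < n; ++i) b[i] += a[i];  // add_kernel
    for (std::size_t i = 0; i < n; ++i) c[i] -= a[i];  // subtract_kernel
    for (std::size_t i = 0; i < n; ++i) {              // decrement_kernel
        b[i]--;
        c[i]--;
    }
}
```

The add and subtract steps only read `a` and write different vectors, which is why the graph can run them concurrently; the sequential order used here is just one valid schedule.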

--------------------------------

### DPC++ Frontend Device Configuration YAML Example

Source: https://intel.github.io/llvm/design/DeviceConfigFile

An example of a YAML file used by the DPC++ frontend to describe new targets. It includes aspects, sub-group sizes, and toolchain information. The frontend parses this file to update target information.

```yaml
intel_gpu_skl:
    aspects: [aspect_name1, aspect_name2]
    may_support_other_aspects: true/false
    sub-group-sizes: [1, 2, 4, 8]
    aot-toolchain: ocloc
    aot-toolchain-options: -device skl
```

--------------------------------

### SYCLcompat Kernel Launch Example

Source: https://intel.github.io/llvm/syclcompat/README

Demonstrates how to use the SYCLcompat launch interface to execute a device function like vectorAdd. It shows launching with dim3 and sycl::nd_range, highlighting the internal kernel lambda construction.

```cpp
void vectorAdd(const float* A, const float* B, float* C, int n);

// Using dim3
syclcompat::launch<vectorAdd>(blocksPerGrid, threadsPerBlock, d_A, d_B, d_C, n);

// Using sycl::nd_range
auto range = sycl::nd_range<3>{blocksPerGrid * threadsPerBlock,
 threadsPerBlock};
syclcompat::launch<vectorAdd>(range, d_A, d_B, d_C, n);
```

--------------------------------

### clang-offload-packager Usage Example

Source: https://intel.github.io/llvm/design/OffloadDesign

Demonstrates the command-line usage of the `clang-offload-packager` tool for bundling device binaries. It specifies the input file, target triple, and offloading kind.

```bash
clang-offload-packager --image=file=<name>,triple=<triple>,kind=<kind>
```

--------------------------------

### Dynamic Device Code Loading Example

Source: https://intel.github.io/llvm/design/SharedLibraries

An example demonstrating how a device binary image can be loaded into memory using a dlopen-like API for dynamic symbol resolution. This API is currently a TODO item as the SYCL standard does not yet define it.

```c++
// suppose, mylib.spv defines SYCL_EXTERNAL function foo, then this call:
device_image img = device_dlopen("mylib.spv");
// will make foo available for dynamic symbol resolution. If any subsequent
// JIT compilations try to compile device code with external reference to
// foo, it can now be resolved following the resolution mechanism described
// in this doc, and JIT compilation will succeed.

```

--------------------------------

### SYCL Device Code Compilation Example

Source: https://intel.github.io/llvm/-docs/index

This snippet provides an example of SYCL device code compilation within the DPC++ framework. It links to the relevant section in the Users Manual for detailed instructions and context.

```APIDOC
Users Manual - Example: SYCL device code compilation
  - Description: Provides an example of how to compile SYCL device code using the DPC++ compiler.
  - Related Sections:
    - Users Manual: https://intel.github.io/llvm-docs/UsersManual.html
    - Getting Started with oneAPI DPC++: https://intel.github.io/llvm-docs/GetStartedGuide.html
```

--------------------------------

### Concurrent SYCL Graph Execution with Pre-finalized Graphs

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Shows how to achieve concurrent SYCL graph execution by pre-finalizing multiple graphs for different parameter sets. This avoids host synchronization between executions and can improve device occupancy.

```cpp
namespace sycl_ext = sycl::ext::oneapi::experimental;

sycl_ext::command_graph<sycl_ext::graph_state::executable> ExecutableGraph
    = ModifiableGraph.finalize();
sycl_ext::command_graph<sycl_ext::graph_state::executable> ExecutableGraphOtherParams
    = ModifiableGraphWithNewParams.finalize();

Queue.ext_oneapi_graph(ExecutableGraph);
// We can now execute the second graph with no implicit dependency
Queue.ext_oneapi_graph(ExecutableGraphOtherParams);

// Repeatedly executing the graphs will create dependencies on their individual
// preceding executions, but not on each other.
Queue.ext_oneapi_graph(ExecutableGraph);
Queue.ext_oneapi_graph(ExecutableGraphOtherParams);
```

--------------------------------

### SYCL Native CPU Code Coverage Example

Source: https://intel.github.io/llvm/design/SYCLNativeCPU

Example usage for enabling LLVM's source-based code coverage for SYCL Native CPU, allowing coverage testing across device and host code.

```Shell
=native_cpu\vector-add.exe
llvm-profdata\vector-add.exe=foo.profdata
```

--------------------------------

### Querying Target Information

Source: https://intel.github.io/llvm/design/DeviceConfigFile

Example C++ code demonstrating how to query the TargetTable for specific target information, such as aspects and subgroup sizes.

```C++
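// Assuming `targets` is a map-like table keyed by target name, find() returns
// an iterator and the TargetInfo is the mapped value.
auto it = DeviceConfigFile::targets.find("TargetA");
if (it == DeviceConfigFile::targets.end()) {
    /* Target not found */
    ...
} else {
    const DeviceConfigFile::TargetInfo& info = it->second;
    auto aspects = info.aspects;
    auto maySupportOtherAspects = info.maySupportOtherAspects;
    auto subGroupSizes = info.subGroupSizes;
    ...
}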
```
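
The lookup pattern can be tried in isolation with stand-in types. In the sketch below, `TargetInfo`, `TargetTable`, and `findTarget` are hypothetical definitions that only borrow the field names used in the snippet; the real types live in the DPC++ sources and may differ.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins that reuse the field names from the snippet.
struct TargetInfo {
    std::vector<std::string> aspects;
    bool maySupportOtherAspects = false;
    std::vector<unsigned> subGroupSizes;
};
using TargetTable = std::map<std::string, TargetInfo>;

// Returns a pointer to the entry, or nullptr when the target is unknown.
const TargetInfo* findTarget(const TargetTable& targets,
                             const std::string& name) {
    auto it = targets.find(name);
    return it == targets.end() ? nullptr : &it->second;
}
```

Returning a pointer keeps the "not found" case explicit at the call site without copying the entry.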

--------------------------------

### GPU Target Compilation Example

Source: https://intel.github.io/llvm/design/OffloadDesign

Demonstrates how to pass specific options to the OpenCL Offline Compiler (OCLOC) for GPU targets using the `-Xsycl-target-backend` and `--gpu-tool-arg` options.

```bash
# Example with multiple GPU targets and specific options
-fsycl -fsycl-targets=spir64_gen,intel_gpu_skl -Xsycl-target-backend=spir64_gen "-device pvc -options -extraopt_pvc" -Xsycl-target-backend=intel_gpu_skl "-options -extraopt_skl"
```

--------------------------------

### Device Link Compilation Flow Example

Source: https://intel.github.io/llvm/design/CompilerAndRuntimeDesign

Illustrates a three-step compilation process using `-fsycl-link` to separate device and host linking, optimizing for cases where device code compilation is time-consuming.

```shell
# Step 1: Device link (can be time-consuming)
dev_a.cpp dev_b.cpp -fsycl-link -c -o dev_image.o

# Step 2: Host Compile
host_a.cpp -c -o host_a.o
host_b.cpp -c -o host_b.o

# Step 3: Linking
dev_image.o host_a.o host_b.o -o executable
```

--------------------------------

### DPC++ SYCL Graph Usage Guide

Source: https://intel.github.io/llvm/-docs/index

A guide detailing the usage of SYCL Graph within DPC++. This includes examples and explanations for leveraging SYCL Graph for efficient parallel programming.

```APIDOC
SYCL Graph Usage Guide and Examples
  - Description: Comprehensive guide on using SYCL Graph with DPC++, including usage patterns and practical examples.
  - Link: https://intel.github.io/llvm-docs/syclgraph/SYCLGraphUsageGuide.html
```

--------------------------------

### Discovering SYCL Devices with sycl-ls

Source: https://intel.github.io/llvm/MultiTileCardWithLevelZero

Demonstrates how to use the 'sycl-ls' tool to discover available SYCL devices, including CPUs and GPUs managed by different backends.

```bash
sycl-ls
```

--------------------------------

### Handling Library Warmups in SYCL Graph Recording

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates a scenario where a library function performs a warmup on its first execution. This example shows how to prevent such warmups from being captured in a SYCL graph by manually triggering the warmup before recording.

```cpp
// Hypothetical library function
void SomeLibrary::Operation(sycl::queue Queue) {
    static bool IsFirstTime = true;

    if (IsFirstTime) {
        // Warmup by launching kernel once
        do_warmup(Queue);
        // Execute the actual operation
        execute_operation(Queue);
        IsFirstTime = false;
    } else {
        execute_operation(Queue);
    }
}

// SYCL Application Code

Graph.begin_recording(Queue);

// do_warmup() will be captured here and executed every time the graph is
// executed in future which is undesirable.
SomeLibrary::Operation(Queue);

Graph.end_recording(Queue);
```
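
One way to keep the warmup out of the graph is to call the library once before recording starts, so that only steady-state work is captured. The sketch below models this with host-only mocks: `MockQueue`, `MockLibrary`, and `recordWithoutWarmup` are hypothetical names, and the `Recording` flag merely stands in for `Graph.begin_recording` and `Graph.end_recording`.

```cpp
#include <string>
#include <vector>

// Stand-in for sycl::queue: while Recording is true, submitted work is
// appended to Recorded instead of Executed.
struct MockQueue {
    bool Recording = false;
    std::vector<std::string> Recorded;
    std::vector<std::string> Executed;
    void submit(const std::string& Name) {
        (Recording ? Recorded : Executed).push_back(Name);
    }
};

// Mirrors the hypothetical library: the first call also launches a warmup.
struct MockLibrary {
    bool IsFirstTime = true;
    void Operation(MockQueue& Queue) {
        if (IsFirstTime) {
            Queue.submit("warmup");
            IsFirstTime = false;
        }
        Queue.submit("operation");
    }
};

// Trigger the warmup eagerly, then record only the steady-state work.
std::vector<std::string> recordWithoutWarmup(MockLibrary& Lib,
                                             MockQueue& Queue) {
    Lib.Operation(Queue);    // runs immediately: warmup + operation
    Queue.Recording = true;  // stands in for Graph.begin_recording(Queue)
    Lib.Operation(Queue);    // recorded: operation only
    Queue.Recording = false; // stands in for Graph.end_recording(Queue)
    return Queue.Recorded;
}
```

The extra eager call costs one execution of the operation up front, but every later graph execution then skips the warmup.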

--------------------------------

### Creating Sub-devices from a Root Device

Source: https://intel.github.io/llvm/MultiTileCardWithLevelZero

Demonstrates how to partition a root device into sub-devices based on affinity domains using the `create_sub_devices` SYCL API.

```cpp
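try {
    std::vector<sycl::device> SubDevices = RootDevice.create_sub_devices<
        sycl::info::partition_property::partition_by_affinity_domain>(
        sycl::info::partition_affinity_domain::next_partitionable);
} catch (const sycl::exception& e) {
    // Thrown when the device cannot be partitioned this way
}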

```

--------------------------------

### Original Native Command Invocation (CUDA Example)

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Shows the original way to invoke a native library call using ext_codeplay_enqueue_native_command without SYCL-Graph compatibility. This code will schedule asynchronous work immediately if the queue is in a recording state.

```cpp
q.submit([&](sycl::handler& CGH) {
    CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
        auto NativeStream =
            IH.get_native_queue<sycl::backend::ext_oneapi_cuda>();
        myNativeLibraryCall(NativeStream);
    });
});
```

--------------------------------

### Explicit Graph Example with async_malloc

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates explicit graph creation using SYCL's experimental oneAPI extensions. It shows how to add nodes for asynchronous memory allocation (async_malloc) and deallocation (async_free), managing dependencies between these operations within the graph.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

void* Ptr = nullptr;
size_t AllocSize = 1024;
// Add an async_malloc node and capture the returned pointer in Ptr
auto AllocNode = Graph.add([&](handler& CGH) {
    Ptr = sycl_ext::async_malloc(CGH, usm::alloc::device, AllocSize);
});

// Use Ptr in another graph node which depends on AllocNode
auto OtherNodeA = Graph.add(..., {sycl_ext::property::node::depends_on{AllocNode}});
// Use Ptr in another node which has an indirect dependency on AllocNode
auto OtherNodeB = Graph.add(..., {sycl_ext::property::node::depends_on{OtherNodeA}});

// Free Ptr, indicating it is no longer in use at this point in the graph,
// with a dependency on any leaf nodes using Ptr
Graph.add([&](handler& CGH) {
    sycl_ext::async_free(CGH, Ptr);
}, {sycl_ext::property::node::depends_on{OtherNodeB}});

```

--------------------------------

### Build DPC++ Toolchain with HIP AMD Support

Source: https://intel.github.io/llvm/GetStartedGuide

Instructions for building the DPC++ toolchain with support for AMD HIP accelerators. This involves enabling the HIP backend during the build configuration.

```bash
# Example command (actual command may vary based on build system and options)
python $DPCPP_HOME/llvm/buildbot/configure.py --hip
python $DPCPP_HOME/llvm/buildbot/compile.py
```

--------------------------------

### Run DPC++ E2E Tests

Source: https://intel.github.io/llvm/GetStartedGuide

Instructions on how to execute the end-to-end (E2E) tests for the DPC++ toolchain. These tests verify the functionality of the compiler and runtime.

```bash
# Example command (actual command may vary based on test runner and configuration)
./bin/dpcpp --test=e2e
```

--------------------------------

### Dynamic Command Groups with Dynamic Parameters in SYCL

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates creating and updating a SYCL graph with a dynamic command group that utilizes dynamic parameters. This example shows how to update a dynamic parameter and switch between different command group functions within the graph, then execute the updated graph.

```C++
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

size_t N = 1024;
queue Queue{};
auto MyContext = Queue.get_context();
auto MyDevice = Queue.get_device();
sycl_ext::command_graph Graph{MyContext, MyDevice};

int* PtrA = malloc_device<int>(N, Queue);
int* PtrB = malloc_device<int>(N, Queue);

// Kernels loaded from kernel bundle
const std::vector<kernel_id> BuiltinKernelIds =
    MyDevice.get_info<info::device::built_in_kernel_ids>();
kernel_bundle<bundle_state::executable> MyBundle =
    get_kernel_bundle<sycl::bundle_state::executable>(MyContext, {MyDevice}, BuiltinKernelIds);

kernel BuiltinKernelA = MyBundle.get_kernel(BuiltinKernelIds[0]);
kernel BuiltinKernelB = MyBundle.get_kernel(BuiltinKernelIds[1]);

// Create a dynamic parameter with an initial value of PtrA
sycl_ext::dynamic_parameter DynamicPointerArg{Graph, PtrA};

// Create command groups for both kernels which use DynamicPointerArg
auto CgfA = [&](handler& cgh) {
    cgh.set_arg(0, DynamicPointerArg);
    cgh.parallel_for(range{N}, BuiltinKernelA);
};

auto CgfB = [&](handler& cgh) {
    cgh.set_arg(0, DynamicPointerArg);
    cgh.parallel_for(range{N / 2}, BuiltinKernelB);
};

// Construct a dynamic command-group with CgfA as the active cgf (index 0).
auto DynamicCG = sycl_ext::dynamic_command_group(Graph, {CgfA, CgfB});

// Create a dynamic command-group graph node.
auto DynamicCGNode = Graph.add(DynamicCG);

auto ExecGraph = Graph.finalize(sycl_ext::property::graph::updatable{});

// The graph will execute CgfA with PtrA.
Queue.ext_oneapi_graph(ExecGraph).wait();

// Update DynamicPointerArg with a new value
DynamicPointerArg.update(PtrB);

// Sets CgfB as active in the dynamic command-group (index 1).
DynamicCG.set_active_index(1);

// Calls update to update the executable graph node with the changes to
// DynamicCG and DynamicPointerArg.
ExecGraph.update(DynamicCGNode);

// The graph will execute CgfB with PtrB.
Queue.ext_oneapi_graph(ExecGraph).wait();

```

--------------------------------

### Build Doxygen Documentation

Source: https://intel.github.io/llvm/GetStartedGuide

Builds the Doxygen documentation for the DPC++ toolchain. Requires doxygen, graphviz, and sphinx. The documentation is generated in the `$DPCPP_HOME/llvm/build/tools/sycl/doc/html` directory.

```cmake
cmake -DSYCL_BUILD_DOCS=ON <path_to_llvm_source>
cmake --build . --target doxygen-sycl
```

--------------------------------

### SYCL Device Assertion Example

Source: https://intel.github.io/llvm/design/Assert

Demonstrates the use of the standard C++ assert API within SYCL device code. When an assertion fails (e.g., Item[0] % 2 is false), it triggers a call to std::abort() on the host side. The example includes necessary headers and a basic SYCL queue setup.

```c++
#include <cassert>
#include <iostream>
#include <sycl/sycl.hpp>

using namespace sycl;

void user_func(item<2> Item) {
    assert((Item[0] % 2) && "Nil");
}

int main() {
    queue Q;
    Q.submit([&](handler& CGH) {
        CGH.parallel_for<class TheKernel>(range<2>{N, M}, [=](item<2> It) {
            do_smth();
            user_func(It);
            do_smth_else();
        });
    });
    Q.wait();
    std::cout << "One shouldn't see this message.";
    return 0;
}

```

--------------------------------

### Concurrent SYCL Graph Execution with Updates

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Shows re-executing the same SYCL graph with new parameters by updating it between submissions. The update forces a host synchronization, so the two executions do not overlap.

```cpp
namespace sycl_ext = sycl::ext::oneapi::experimental;

sycl_ext::command_graph<sycl_ext::graph_state::executable> ExecutableGraph
    = ModifiableGraph.finalize();

Queue.ext_oneapi_graph(ExecutableGraph);
// Updating the graph here to use new memory, this forces a host synchronization
ExecutableGraph.update(ModifiableGraphWithNewParams);

// Re-execute the updated graph
Queue.ext_oneapi_graph(ExecutableGraph);
```

--------------------------------

### SYCL Graph-Owned Memory Allocation Example

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates allocating, using, and freeing temporary memory within a SYCL graph. This pattern is useful for libraries that need specialized SYCL kernels and want to integrate with SYCL graphs for lifetime management and memory reuse. It shows the use of `sycl_ext::async_malloc` and `sycl_ext::async_free` with proper event dependencies.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

// Library code, this example is assuming an out of order SYCL queue
void launchLibraryKernel(queue& SyclQueue) {
    size_t TempMemSize = 1024;
    void* Ptr = nullptr;

    // Get a pointer to some temporary memory for use in the kernel
    // This call creates an allocation node in the graph if this call is being
    // recorded.
    event AllocEvent = SyclQueue.submit([&](handler& CGH) {
        Ptr = sycl_ext::async_malloc(CGH, usm::alloc::device, TempMemSize);
    });

    // Submit the actual library kernel
    event KernelEvent = SyclQueue.submit([&](handler& CGH) {
        // Mark the allocation as a dependency so that the temporary memory
        // is available
        CGH.depends_on(AllocEvent);
        // Submit a kernel that uses the temp memory in Ptr
        CGH.parallel_for(...);
    });

    // Free the memory back to the pool or graph, indicating that it is free to
    // be re-used. Memory will not actually be released back to the OS.
    SyclQueue.submit([&](handler& CGH) {
        // Mark the kernel as a dependency before freeing
        CGH.depends_on(KernelEvent);
        sycl_ext::async_free(CGH, Ptr);
    });
}

// Application code
void recordLibraryCall(queue& SyclQueue,
                       sycl_ext::command_graph<sycl_ext::graph_state::modifiable>& Graph) {
    Graph.begin_recording(SyclQueue);
    // Call into library to record queue commands to the graph
    launchLibraryKernel(SyclQueue);

    Graph.end_recording(SyclQueue);
}
```

--------------------------------

### Compile DPC++ Toolchain

Source: https://intel.github.io/llvm/GetStartedGuide

This snippet shows the command to compile the DPC++ toolchain after configuration, typically using a `compile.py` script.

```bash
CC=gcc CXX=g++ $DPCPP_HOME/llvm/buildbot/compile.py
```

--------------------------------

### Linux: Configure Library Paths

Source: https://intel.github.io/llvm/GetStartedGuide

Configures system-wide library paths to include the DPC++ runtime libraries, requiring sudo access.

```bash
echo|
echo|
```

--------------------------------

### Dynamic Parameter Update with Accessors

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates how to use `sycl_ext::dynamic_parameter` with SYCL accessors for updating graph node arguments. This method is useful when the handler is not directly available outside the command-group scope.

```cpp
sycl::buffer bufferA{...};
sycl::buffer bufferB{...};

// Create graph dynamic parameter using a placeholder accessor, since the
// sycl::handler is not available here outside of the command-group scope.
sycl_ext::dynamic_parameter dynParamAccessor(myGraph, bufferA.get_access());

sycl_ext::node kernelNode = myGraph.add([&](handler& cgh) {
    // Require the accessor contained in the dynamic parameter
    cgh.require(dynParamAccessor);
    // Set the arg on the kernel using the dynamic parameter directly
    cgh.set_args(dynParamAccessor);
    cgh.parallel_for(range{n}, builtinKernel);
});

...
// Update the dynamic parameter with a placeholder accessor from bufferB instead
dynParamAccessor.update(bufferB.get_access());

```

--------------------------------

### Linux: Extract OpenCL FPGA Emulation RT

Source: https://intel.github.io/llvm/GetStartedGuide

Commands to create a directory and extract the OpenCL FPGA emulation runtime archive on a Linux system.

```bash
mkdir
cd
```

--------------------------------

### In-Order Queue Recording with Event-less async_malloc

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates queue recording using an in-order queue and event-less asynchronous memory allocation functions. This simplifies dependency management by leveraging the in-order execution property of the queue.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

void* Ptr = nullptr;
size_t AllocSize = 1024;
queue Queue{syclContext, syclDevice, {property::queue::in_order{}}};

Graph.begin_recording(Queue);
// Add an async_malloc node and capture the returned pointer in Ptr
Ptr = sycl_ext::async_malloc(Queue, usm::alloc::device, AllocSize);

// Use Ptr in another graph node which has an in-order dependency on the
// allocation node
Queue.submit([&](handler& CGH) {
    // Do something with Ptr
    CGH.parallel_for(...);
});
// Use Ptr in another node which has an in-order dependency on the
// previous kernel.
Queue.submit([&](handler& CGH) {
    // Do something with Ptr
    CGH.parallel_for(...);
});

// Free Ptr, indicating it is no longer in use at this point in the graph,
// with an in-order dependency on the previous kernel.
sycl_ext::async_free(Queue, Ptr);

Graph.end_recording(Queue);

```

--------------------------------

### Dynamic Parameter Update with SYCL Graphs

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Demonstrates how to create a SYCL graph with a kernel node, update its arguments dynamically using `sycl_ext::dynamic_parameter`, and re-execute the graph. It covers updating both pointer and scalar arguments.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

queue myQueue;
auto myContext = myQueue.get_context();
auto myDevice = myQueue.get_device();

// USM allocations for kernel input/output
const size_t n = 1024;
int* ptrX = malloc_shared<int>(n, myQueue);
int* ptrY = malloc_device<int>(n, myQueue);

int* ptrZ = malloc_shared<int>(n, myQueue);
int* ptrQ = malloc_device<int>(n, myQueue);

// Kernel loaded from kernel bundle
const std::vector<kernel_id> builtinKernelIds =
    myDevice.get_info<info::device::built_in_kernel_ids>();
kernel_bundle<bundle_state::executable> myBundle =
    get_kernel_bundle<bundle_state::executable>(myContext, {myDevice}, builtinKernelIds);
kernel builtinKernel = myBundle.get_kernel(builtinKernelIds[0]);

// Graph containing a kernel node
sycl_ext::command_graph myGraph(myContext, myDevice);

int myScalar = 42;
// Create graph dynamic parameters
sycl_ext::dynamic_parameter dynParamInput(myGraph, ptrX);
sycl_ext::dynamic_parameter dynParamScalar(myGraph, myScalar);

// The node uses ptrX as an input & output parameter, with operand
// myScalar as another argument.
sycl_ext::node kernelNode = myGraph.add([&](handler& cgh) {
    cgh.set_args(dynParamInput, ptrY, dynParamScalar);
    cgh.parallel_for(range{n}, builtinKernel);
});

// Create an executable graph with the updatable property.
auto execGraph = myGraph.finalize({sycl_ext::property::graph::updatable{}});

// Execute graph, then update without needing to wait for it to complete
myQueue.ext_oneapi_graph(execGraph);

// Change ptrX argument to ptrZ
dynParamInput.update(ptrZ);

// Change myScalar argument to newScalar
int newScalar = 12;
dynParamScalar.update(newScalar);

// Update kernelNode in the executable graph with the new parameters
execGraph.update(kernelNode);
// Execute graph again
myQueue.ext_oneapi_graph(execGraph);
myQueue.wait();

sycl::free(ptrX, myQueue);
sycl::free(ptrY, myQueue);
sycl::free(ptrZ, myQueue);
sycl::free(ptrQ, myQueue);

```

--------------------------------

### Build DPC++ Toolchain with libc++ Library

Source: https://intel.github.io/llvm/GetStartedGuide

Instructions for building the DPC++ toolchain so that the runtime is built and linked against the libc++ standard library instead of libstdc++. The guide describes this support as experimental.

```bash
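# Configure with libc++ enabled, then compile.
# Option names vary between toolchain versions (older versions may also need
# explicit libc++ include and library paths); check the guide for your checkout.
python $DPCPP_HOME/llvm/buildbot/configure.py --use-libcxx
python $DPCPP_HOME/llvm/buildbot/compile.py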
```

--------------------------------

### Emulating Multiple GPU Tiles

Source: https://intel.github.io/llvm/MultiTileCardWithLevelZero

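Uses driver environment variables to emulate `N` tiles (sub-devices) on a single GPU, so sub-device partitioning can be tested without multi-tile hardware. Replace `N` with the number of tiles to emulate and set the variables in the environment of the application under test.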

```bash
CreateMultipleSubDevices=N NEOReadDebugKeys=1
```

--------------------------------

### Dynamic Command Group Update in SYCL Graphs

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Shows how to manage and update a SYCL graph that contains a dynamic command group. This allows switching between different command group definitions (e.g., different kernels or operations) within the same graph node.

```cpp
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

queue Queue{};
sycl_ext::command_graph Graph{Queue.get_context(), Queue.get_device()};

int* PtrA = malloc_device<int>(1024, Queue);
int* PtrB = malloc_device<int>(1024, Queue);

auto CgfA = [&](handler& cgh) {
    cgh.parallel_for(1024, [=](item<1> Item) {
        PtrA[Item.get_id()] = 1;
    });
};

auto CgfB = [&](handler& cgh) {
    cgh.parallel_for(512, [=](item<1> Item) {
        PtrB[Item.get_id()] = 2;
    });
};

// Construct a dynamic command-group with CgfA as the active cgf (index 0).
auto DynamicCG = sycl_ext::dynamic_command_group(Graph, {CgfA, CgfB});

// Create a dynamic command-group graph node.
auto DynamicCGNode = Graph.add(DynamicCG);

auto ExecGraph = Graph.finalize(sycl_ext::property::graph::updatable{});

// The graph will execute CgfA.
Queue.ext_oneapi_graph(ExecGraph).wait();

// Sets CgfB as active in the dynamic command-group (index 1).
DynamicCG.set_active_index(1);

// Calls update to update the executable graph node with the changes to DynamicCG.
ExecGraph.update(DynamicCGNode);

// The graph will execute CgfB.
Queue.ext_oneapi_graph(ExecGraph).wait();

```

--------------------------------

### Build DPC++ Application with CMake

Source: https://intel.github.io/llvm/GetStartedGuide

Guidance on how to build a DPC++ application using CMake. The approach is to use the DPC++ compiler as the C++ compiler and add the `-fsycl` flag in a CMakeLists.txt file.

```cmake
cmake_minimum_required(VERSION 3.10)

# Select the DPC++ compiler before the project() call; this assumes the
# clang++ from the DPC++ toolchain is on PATH.
set(CMAKE_CXX_COMPILER "clang++")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsycl")

project(MyDPCppApp LANGUAGES CXX)

add_executable(my_app main.cpp)
```

--------------------------------

### Whole Graph Update in SYCL

Source: https://intel.github.io/llvm/syclgraph/SYCLGraphUsageGuide

Illustrates recording multiple SYCL kernels into a graph and then updating the entire graph to execute with different parameters. The steps are: record the initial kernel executions, finalize the graph as updatable, record a second graph with the same kernels but different arguments, and pass that second graph to `update()` on the executable graph.

```C++
using namespace sycl;
namespace sycl_ext = sycl::ext::oneapi::experimental;

// Enqueue several kernels which use inputPtr
void run_kernels(int* inputPtr, queue syclQueue) {
  event eventA = syclQueue.submit([&](handler& CGH) {
    CGH.parallel_for(...);
  });
  event eventB = syclQueue.submit([&](handler& CGH) {
    CGH.depends_on(eventA);
    CGH.parallel_for(...);
  });
  syclQueue.submit([&](handler& CGH) {
    CGH.depends_on(eventB);
    CGH.parallel_for(...);
  });
}

// Queue used for allocation, recording, and execution
queue myQueue;

// USM allocations
const size_t n = 1024;
int* ptrA = malloc_device<int>(n, myQueue);
int* ptrB = malloc_device<int>(n, myQueue);

// Main graph which will be updated later
sycl_ext::command_graph mainGraph(myQueue);

// Record the kernels to mainGraph, using ptrA
mainGraph.begin_recording(myQueue);
run_kernels(ptrA, myQueue);
mainGraph.end_recording();

auto execMainGraph = mainGraph.finalize({sycl_ext::property::graph::updatable{}});

// Execute execMainGraph
myQueue.ext_oneapi_graph(execMainGraph);

// Record a second graph which records the same kernels, but using ptrB instead
sycl_ext::command_graph updateGraph(myQueue);
updateGraph.begin_recording(myQueue);
run_kernels(ptrB, myQueue);
updateGraph.end_recording();

// Update execMainGraph using updateGraph. We do not need to finalize
// updateGraph (this would be expensive)
execMainGraph.update(updateGraph);

// Execute execMainGraph again, which will now be operating on ptrB instead of
// ptrA
myQueue.ext_oneapi_graph(execMainGraph);

```

--------------------------------

### Example: Setting SYCL Cache Directory

Source: https://intel.github.io/llvm/design/KernelProgramCache

Example of how to set the SYCL_CACHE_DIR environment variable in a bash shell to specify a custom location for the persistent cache.

```bash
export SYCL_CACHE_DIR="/path/to/my/sycl_cache"
```
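An application or launcher script may want to log which cache directory is in effect before starting a SYCL program. The following is a minimal sketch in standard C++ with no SYCL dependency. The helper names `resolve_cache_dir` and `current_cache_dir` are hypothetical and not part of DPC++. The Linux fallback order used here (`$XDG_CACHE_HOME/libsycl_cache`, then `$HOME/.cache/libsycl_cache`) is an assumption based on the DPC++ environment variable documentation and should be checked against the version in use.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not a DPC++ API.
// Returns the directory the persistent cache is expected to use, given the
// values of the relevant environment variables (nullptr means "unset").
// Assumed precedence: SYCL_CACHE_DIR, then XDG_CACHE_HOME, then HOME.
std::string resolve_cache_dir(const char* sycl_cache_dir,
                              const char* xdg_cache_home,
                              const char* home) {
    auto is_set = [](const char* v) { return v != nullptr && *v != '\0'; };

    if (is_set(sycl_cache_dir))
        return std::string(sycl_cache_dir);
    if (is_set(xdg_cache_home))
        return std::string(xdg_cache_home) + "/libsycl_cache";
    if (is_set(home))
        return std::string(home) + "/.cache/libsycl_cache";
    return std::string();  // Nothing set: no directory can be determined.
}

// Convenience wrapper that reads the variables from the process environment.
std::string current_cache_dir() {
    return resolve_cache_dir(std::getenv("SYCL_CACHE_DIR"),
                             std::getenv("XDG_CACHE_HOME"),
                             std::getenv("HOME"));
}
```

With `SYCL_CACHE_DIR` exported as in the example above, `current_cache_dir()` returns `/path/to/my/sycl_cache`. Keeping the resolution logic separate from the `std::getenv` calls makes the precedence easy to check without modifying the process environment.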

--------------------------------

### SYCL Device Management API

Source: https://intel.github.io/llvm/syclcompat/README

API for managing SYCL devices, queues, and contexts. Includes functions for queue creation, default queue management, device selection, and capability checking.

```APIDOC
SYCL Device Management API:

create_queue(bool print_on_async_exceptions = false, bool in_order = true)
  - Creates a new SYCL queue for the current device.
  - Parameters:
    - print_on_async_exceptions: If true, async exceptions are printed to stderr.
    - in_order: If true, operations on the queue are in-order.
  - Returns: A sycl::queue object.

get_default_queue()
  - Retrieves the default SYCL queue for the current device.
  - Returns: The default sycl::queue.

set_default_queue(const sycl::queue& q)
  - Sets the default SYCL queue for the current device.
  - This operation is blocking and waits for any submitted kernels in the previous default queue to complete.
  - If the previous default queue was the device's saved queue, the saved queue reference is updated.
  - Parameters:
    - q: The sycl::queue to set as the default.

wait(sycl::queue q = get_default_queue())
  - Waits for all queued kernels in the specified queue to complete.
  - Parameters:
    - q: The sycl::queue to wait on (defaults to the default queue).

wait_and_throw(sycl::queue q = get_default_queue())
  - Waits for all queued kernels in the specified queue to complete and throws any unhandled exceptions.
  - Parameters:
    - q: The sycl::queue to wait on (defaults to the default queue).

get_current_device_id()
  - Returns the ID of the current SYCL device.
  - Returns: The unsigned integer ID of the current device.

get_current_device()
  - Returns a reference to the current SYCL device extension (`device_ext`).
  - Returns: A reference to the current device_ext.

get_device(unsigned int id)
  - Returns a reference to a SYCL device extension by its ID.
  - Parameters:
    - id: The ID of the device to retrieve.
  - Returns: A reference to the device_ext.

get_default_context()
  - Retrieves the SYCL context associated with the default queue of the current device.
  - Returns: The sycl::context.

cpu_device()
  - Returns a reference to the CPU SYCL device extension.
  - Returns: A reference to the CPU device_ext.

filter_device(const std::vector<std::string>& dev_subnames)
  - Filters the available SYCL devices, keeping only those whose names contain any of the provided substrings.
  - Warning: This may alter device ID mappings and the current device. It's recommended to call this before other SYCLcompat or SYCL APIs.
  - Parameters:
    - dev_subnames: A vector of strings representing device name substrings to filter by.

list_devices()
  - Prints all available SYCL devices and their IDs to standard output.

select_device(unsigned int id)
  - Selects a SYCL device by its ID.
  - Parameters:
    - id: The ID of the device to select.
  - Returns: The ID of the selected device.

get_device_id(const sycl::device& dev)
  - Retrieves the ID of a given SYCL device.
  - Parameters:
    - dev: The sycl::device object.
  - Returns: The unsigned integer ID of the device.

device_count()
  - Returns the total number of available SYCL devices.
  - Returns: The count of available devices.

has_capability_or_fail(const sycl::device& dev, const std::initializer_list<sycl::aspect>& props)
  - Checks if a given SYCL device supports all specified SYCL aspects.
  - If the device does not support all aspects, a sycl::exception is thrown.
  - Parameters:
    - dev: The sycl::device to check.
    - props: An initializer list of sycl::aspects to verify.

Device Extension Class (`device_ext`):
  device_ext()
    - Default constructor.
  device_ext(const sycl::device& base, bool print_on_async_exceptions = false, bool in_order = true)
    - Constructor that initializes with a base SYCL device and optional queue properties.
    - Parameters:
      - base: The base sycl::device.
      - print_on_async_exceptions: Controls printing of async exceptions.
      - in_order: Controls queue ordering.
  ~device_ext()
    - Destructor. Waits on associated events and cleans up resources.
  is_native_host_atomic_supported()
    - Checks if native host atomics are supported by the device.
    - Returns: True if supported, false otherwise.
```
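The matching rule described for `filter_device` (keep a device if its name contains any of the given substrings) can be illustrated without a SYCL installation. The sketch below is standard C++ only and is not SYCLcompat's implementation. The function `filter_names` is hypothetical, and case-sensitive matching is an assumption. It also shows why the warning about device ID mappings matters, under the further assumption that IDs follow a device's position in the list: once entries are removed, the positions of the remaining entries shift.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical illustration, not part of SYCLcompat.
// Keeps every name that contains at least one of the substrings in
// dev_subnames, preserving the original relative order.
std::vector<std::string> filter_names(
    const std::vector<std::string>& device_names,
    const std::vector<std::string>& dev_subnames) {
    std::vector<std::string> kept;
    for (const std::string& name : device_names) {
        const bool matches = std::any_of(
            dev_subnames.begin(), dev_subnames.end(),
            [&name](const std::string& sub) {
                // Case-sensitive substring match (an assumption).
                return name.find(sub) != std::string::npos;
            });
        if (matches)
            kept.push_back(name);
    }
    return kept;
}
```

For a list of three names where only the first and third match, the third name moves from index 2 to index 1 in the result. This is the kind of remapping the warning refers to, and the reason the documentation recommends calling `filter_device` before any other SYCLcompat or SYCL API.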