### CuPy Distributed Computing

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.signal.windows.blackmanharris

Initialization and backend setup for distributed GPU computing.

```APIDOC
## Distributed Computing API

### Description
This API facilitates distributed computation across multiple GPUs and nodes by providing initialization functions and backend support for communication libraries.

### Endpoints
- `cupyx.distributed.init_process_group`
- `cupyx.distributed.NCCLBackend`

### Details
- **init_process_group**: Initializes the distributed process group, setting up communication.
- **NCCLBackend**: Backend class that performs distributed communication through NVIDIA's NCCL primitives.
```

--------------------------------

### Launch and Synchronize CUDA Graph on a Stream (CuPy)

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ExternalStream

Demonstrates launching a CUDA graph `g` on a specific stream `s1` and then synchronizing that stream to ensure completion. It also shows how to make another stream `s2` current with a context manager before launching.

```python
import cupy as cp

# Assumes `g` is an already captured CUDA graph and `s1` an existing cp.cuda.Stream.
g.launch(stream=s1)
s1.synchronize()

s2 = cp.cuda.Stream()
with s2:
    g.launch()  # Launches on s2, the current stream inside the block
s2.synchronize()
```

--------------------------------

### cupy.cuda.runtime.driverGetVersion

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.signal.savgol_filter

Gets the version of the installed NVIDIA driver.

```APIDOC
## cupy.cuda.runtime.driverGetVersion

### Description
Gets the version of the installed NVIDIA driver.

### Method
N/A (Python function)

### Endpoint
`cupy.cuda.runtime.driverGetVersion`

### Parameters
N/A

### Request Example
N/A

### Response
#### Success Response (200)
- **version** (int) - The driver version.
#### Response Example
N/A
```

--------------------------------

### Get CUDA Driver Version (Python)

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.runtime.driverGetVersion

Retrieves the version of the NVIDIA CUDA driver installed on the system. This function takes no arguments and returns the driver version as an integer.

```python
import cupy

driver_version = cupy.cuda.runtime.driverGetVersion()
print(f"CUDA Driver Version: {driver_version}")
```

--------------------------------

### CuPy JIT Kernel Syntax

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.polynomial.polynomial.polycompanion

API documentation for Just-In-Time (JIT) kernel construction utilities.

```APIDOC
## cupyx.jit

### Description
Utilities for Just-In-Time (JIT) compilation of CUDA kernels, providing access to thread and block dimensions, synchronization primitives, and shared memory.

### Endpoints
* `cupyx.jit.rawkernel`
* `cupyx.jit.threadIdx`
* `cupyx.jit.blockDim`
* `cupyx.jit.blockIdx`
* `cupyx.jit.gridDim`
* `cupyx.jit.grid`
* `cupyx.jit.gridsize`
* `cupyx.jit.laneid`
* `cupyx.jit.warpsize`
* `cupyx.jit.range`
* `cupyx.jit.syncthreads`
* `cupyx.jit.syncwarp`
* `cupyx.jit.shfl_sync`
* `cupyx.jit.shfl_up_sync`
* `cupyx.jit.shfl_down_sync`
* `cupyx.jit.shfl_xor_sync`
* `cupyx.jit.shared_memory`
* `cupyx.jit.atomic_add`
* `cupyx.jit.atomic_sub`
* `cupyx.jit.atomic_exch`
* `cupyx.jit.atomic_min`
* `cupyx.jit.atomic_max`
* `cupyx.jit.atomic_inc`
* `cupyx.jit.atomic_dec`
* `cupyx.jit.atomic_cas`
* `cupyx.jit.atomic_and`
* `cupyx.jit.atomic_or`
* `cupyx.jit.atomic_xor`
* `cupyx.jit.cg.this_grid`
* `cupyx.jit.cg.this_thread_block`
* `cupyx.jit.cg.sync`
* `cupyx.jit.cg.memcpy_async`
* `cupyx.jit.cg.wait`
* `cupyx.jit.cg.wait_prior`
* `cupyx.jit._interface._JitRawKernel`
```

--------------------------------

### CuPy CUDA Kernel Launch API

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.jit.cg.this_grid

Documentation for
launching custom kernels (ElementwiseKernel, ReductionKernel, RawKernel) and managing kernel modules.

```APIDOC
## CuPy CUDA Kernel Launch API

### Description
Provides classes and functions for defining, compiling, and launching custom CUDA kernels, including element-wise, reduction, and raw kernels.

### Classes
- **`cupy.ElementwiseKernel(in_params, out_params, operation, name='kernel', reduce_dims=True, preamble='', no_return=False, return_tuple=False, **kwargs)`**
  - A kernel that applies an operation element-wise to input arrays.
- **`cupy.ReductionKernel(in_params, out_params, map_expr, reduce_expr, post_map_expr, identity, name='reduce_kernel', reduce_type=None, reduce_dims=True, preamble='', options=())`**
  - A kernel that performs reduction operations on input arrays.
- **`cupy.RawKernel(code, name, options=(), backend='nvrtc', *, translate_cucomplex=False, enable_cooperative_groups=False, jitify=False)`**
  - A kernel defined directly by raw CUDA C/C++ source code.
- **`cupy.RawModule(code=None, *, path=None, options=(), backend='nvrtc', translate_cucomplex=False, enable_cooperative_groups=False, name_expressions=None, jitify=False)`**
  - A module that can contain multiple raw kernels.

### Functions
- **`cupy.fuse(*args, **kwargs)`**
  - Decorator that fuses the operations of a Python function into a single kernel.

### JIT (Just-In-Time) Compilation API
- **`cupyx.jit.rawkernel(*, mode='cuda', device=False)`**
  - Decorator that compiles a Python function into a CUDA kernel.
- **`cupyx.jit.threadIdx`**, **`cupyx.jit.blockDim`**, **`cupyx.jit.blockIdx`**, **`cupyx.jit.gridDim`**
  - Thread and block indexing variables for JIT kernels.
- **`cupyx.jit.grid`**, **`cupyx.jit.gridsize`**
  - Functions that compute the calling thread's index in the grid and the total grid size.
- **`cupyx.jit.laneid`**, **`cupyx.jit.warpsize`**
  - Lane ID of the calling thread within its warp, and the warp size.
- **`cupyx.jit.range(*args, unroll=None)`**
  - Range for use in JIT kernel loops, with optional loop unrolling.
- **`cupyx.jit.syncthreads()`**, **`cupyx.jit.syncwarp()`**
  - Synchronization primitives within a thread block or warp.
- **`cupyx.jit.shfl_sync`**, **`cupyx.jit.shfl_up_sync`**, **`cupyx.jit.shfl_down_sync`**, **`cupyx.jit.shfl_xor_sync`**
  - Warp shuffle operations.
- **`cupyx.jit.shared_memory`**
  - Access to shared memory in JIT kernels.
- **`cupyx.jit.atomic_add`**, **`cupyx.jit.atomic_sub`**, **`cupyx.jit.atomic_exch`**, **`cupyx.jit.atomic_min`**, **`cupyx.jit.atomic_max`**, **`cupyx.jit.atomic_inc`**, **`cupyx.jit.atomic_dec`**, **`cupyx.jit.atomic_cas`**, **`cupyx.jit.atomic_and`**, **`cupyx.jit.atomic_or`**, **`cupyx.jit.atomic_xor`**
  - Atomic operations for JIT kernels.
- **`cupyx.jit.cg.this_grid`**, **`cupyx.jit.cg.this_thread_block`**, **`cupyx.jit.cg.sync`**, **`cupyx.jit.cg.memcpy_async`**, **`cupyx.jit.cg.wait`**, **`cupyx.jit.cg.wait_prior`**
  - Cooperative Groups (`cg`) APIs for JIT kernels.

### Internal Interfaces
- **`cupyx.jit._interface._JitRawKernel`**
  - Internal class representing a JIT raw kernel.

### Memoization
- **`cupy.memoize(for_each_device=False)`**
  - Decorator that memoizes a function's result for each argument (and, optionally, each device).
- **`cupy.clear_memo()`**
  - Clears the memoized results of all functions decorated by `memoize`.
```

--------------------------------

### Start Nested Profiling Range with RangePush and RangePop (Python)

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.nvtx.RangePush

This example demonstrates how to use `RangePush` and `RangePop` to create nested profiling ranges that mark specific sections of code for performance analysis. The `message` parameter names the range, and `id_color` selects a color for visualization. Each `RangePush` must be matched by a `RangePop`.

```python
import cupy
from cupy.cuda.nvtx import RangePush, RangePop

# Placeholder inputs for illustration
N = 3
A = cupy.arange(4)

RangePush("Nested Powers of A")  # Outer range
for i in range(N):
    RangePush("Iter {}: Double A".format(i))  # Inner range, one per iteration
    A = 2 * A
    RangePop()  # Ends the inner range
RangePop()  # Ends the outer range
```

--------------------------------

### cupy.show_config

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.tril_indices

Print the configuration of the current CuPy environment.

```APIDOC
## cupy.show_config

### Description
Print the configuration of the current CuPy environment.
### Method
N/A (Python function)

### Endpoint
N/A (Function within the library)

### Parameters
#### Path Parameters
None

#### Query Parameters
None

#### Request Body
None

### Request Example
```python
import cupy as cp

cp.show_config()
```

### Response
#### Success Response (200)
Prints the CuPy configuration to standard output.

#### Response Example (illustrative; actual fields and layout depend on the CuPy version and installation)
```
Platform: linux
Arch: x86_64
OS: linux
CUDA available: True
CUDA root: /usr/local/cuda-11.7
CUDA version: 11.7
cuDNN available: True
cuDNN version: 8.4
BLAS vendor: OpenBLAS
BLAS version: 0.3.19
```
```

--------------------------------

### cupy.get_array_module

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.minres

Gets the array module.

```APIDOC
## cupy.get_array_module

### Description
Gets the array module, which is either `cupy` or `numpy` depending on the types of the arguments.

### Method
N/A (Function)

### Endpoint
N/A (Function)

### Parameters
- **args** - Values used to determine whether NumPy or CuPy should be used.

### Request Example
N/A

### Response
Returns `cupy` or `numpy`, depending on the argument types.
```

--------------------------------

### Launching and Synchronizing CUDA Graphs on Streams

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.Stream

Demonstrates how to launch a CUDA graph on a specific stream and synchronize its execution. This gives fine-grained control over asynchronous operations and ensures that subsequent code waits for the graph to complete.

```python
# Assumes `g` is an already captured CUDA graph and `s1` an existing cupy.cuda.Stream.
g.launch(stream=s1)
s1.synchronize()
```

```python
# Assumes `import cupy as cp` and the same graph `g` as above.
s2 = cp.cuda.Stream()
with s2:
    g.launch()  # Launches on s2, the current stream inside the block
s2.synchronize()
```

--------------------------------

### cupyx.scipy.get_array_module

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.minres

Gets the array module for SciPy.

```APIDOC
## cupyx.scipy.get_array_module

### Description
Gets the SciPy-compatible module for the given arguments, so that the same code can run on CPU or GPU.
### Method
N/A (Function)

### Endpoint
N/A (Function)

### Parameters
- **args** - Values used to determine whether `scipy` or `cupyx.scipy` should be used.

### Request Example
N/A

### Response
Returns `cupyx.scipy` or `scipy`, depending on the argument types.
```

--------------------------------

### cupy.tril_indices_from

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.tril_indices_from

Get the indices for the lower-triangle of an array.

```APIDOC
## cupy.tril_indices_from

### Description
Returns the indices for the lower-triangle of `arr`.

### Method
N/A (This is a function call, not a REST API endpoint)

### Endpoint
N/A

### Parameters
#### Path Parameters
N/A

#### Query Parameters
N/A

#### Request Body
N/A

### Request Example
```python
import cupy

arr = cupy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
tril_indices = cupy.tril_indices_from(arr, k=0)
print(tril_indices)
# Output: (array([0, 1, 1, 2, 2, 2]), array([0, 0, 1, 0, 1, 2]))
```

### Response
#### Success Response (200)
N/A (This is a function call, not a REST API endpoint)

#### Response Example
N/A
```

--------------------------------

### CuPy Distributed Backend

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.signal.vectorstrength

Initialization for distributed computing with NCCL backend.

```APIDOC
## CuPy Distributed Backend

### Description
Functions for initializing distributed process groups, primarily using the NCCL backend for multi-GPU communication.

### API
- `cupyx.distributed.init_process_group`: Initializes a distributed process group.
- `cupyx.distributed.NCCLBackend`: Backend class that communicates through NCCL.
```

--------------------------------

### CuPy SciPy Sparse Matrix Constructors

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.SuperLU

Documentation for creating sparse matrices using various formats like COO, CSC, and CSR.
```APIDOC
## CuPy SciPy Sparse Matrix Constructors

### Description
Provides constructors for creating sparse matrices in different formats, including Coordinate (COO), Compressed Sparse Row (CSR), Compressed Sparse Column (CSC), and Diagonal (DIA).

### Method
Various (primarily class constructors)

### Endpoint
N/A (Class instantiations)

### Parameters
Refer to individual sparse matrix class documentation for specific parameters.

### Request Example
N/A

### Response
N/A (Returns sparse matrix objects)

#### Success Response (N/A)
N/A

#### Response Example
N/A

**Sparse Matrix Constructors:**
- `coo_matrix`
- `csc_matrix`
- `csr_matrix`
- `dia_matrix`
```

--------------------------------

### cupy.cuda.runtime.deviceGetMemPool

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.runtime.deviceGetMemPool

Get the current mempool of a device.

```APIDOC
## cupy.cuda.runtime.deviceGetMemPool

### Description
Gets the current memory pool of the given device.

### Method
N/A (Python function)

### Endpoint
`cupy.cuda.runtime.deviceGetMemPool`

### Parameters
- **device** (int) - Required - The device ID for which to retrieve the memory pool.

### Request Example
```python
import cupy

mempool_ptr = cupy.cuda.runtime.deviceGetMemPool(0)  # Device 0
```

### Response
#### Success Response (200)
- **mempool_ptr** (intptr_t) - A pointer to the memory pool.

#### Response Example
N/A (the returned value is an opaque pointer that varies between runs)
```

--------------------------------

### cupy.cuda.runtime.getDeviceCount

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.linalg.eigvalsh

Describes the getDeviceCount function to get the number of devices.

```APIDOC
## cupy.cuda.runtime.getDeviceCount

### Description
Retrieves the number of CUDA devices available.
### Method
N/A (Python function)

### Endpoint
`cupy.cuda.runtime.getDeviceCount`

### Parameters
N/A

### Request Example
N/A

### Response
Returns the number of devices as an int.
```

--------------------------------

### cupyx.scipy.sparse Matrix Constructors and Utilities

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.ndimage.uniform_filter

Documentation for sparse matrix constructors and utility functions in CuPy's SciPy-compatible implementation.

```APIDOC
## cupyx.scipy.sparse Matrix Constructors and Utilities

### Description
Provides functionality for creating and manipulating sparse matrices, including various formats and utility functions.

### Method
N/A (These are function/class calls, not API endpoints)

### Endpoint
N/A

### Parameters
Refer to individual function or class documentation for specific parameters.

### Request Example
N/A

### Response
N/A

### Available Functions/Classes:
- **Constructors:**
  - `coo_matrix`
  - `csc_matrix`
  - `csr_matrix`
  - `dia_matrix`
- **Utilities:**
  - `spmatrix` (Base class)
  - `eye`
  - `identity`
  - `kron`
  - `kronsum`
  - `diags`
  - `spdiags`
  - `tril`
  - `triu`
  - `bmat`
  - `hstack`
  - `vstack`
  - `rand`
  - `random`
  - `find`
  - `issparse`
  - `isspmatrix`
  - `isspmatrix_csc`
  - `isspmatrix_csr`
  - `isspmatrix_coo`
  - `isspmatrix_dia`
```

--------------------------------

### cupyx.scipy.sparse Matrix Constructors and Operations

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.tanh

Documentation for sparse matrix constructors and basic operations in the cupyx.scipy.sparse module.

```APIDOC
## cupyx.scipy.sparse Matrix Constructors and Operations

### Description
Provides tools for creating and manipulating sparse matrices, which are matrices where most elements are zero.

### Methods
- **coo_matrix**: Constructor for the COOrdinate format sparse matrix.
- **csc_matrix**: Constructor for the Compressed Sparse Column format sparse matrix.
- **csr_matrix**: Constructor for the Compressed Sparse Row format sparse matrix.
- **dia_matrix**: Constructor for the DIAgonal format sparse matrix.
- **spmatrix**: Base class for sparse matrices.
- **eye**: Creates a sparse matrix with ones on a diagonal.
- **identity**: Creates an identity matrix in sparse format.
- **kron**: Kronecker product of two sparse matrices.
- **kronsum**: Kronecker sum of two sparse matrices.
- **diags**: Construct a sparse matrix from diagonals.
- **spdiags**: Construct a sparse matrix from diagonals, given as rows of a data array.
- **tril**: Return the lower triangular part of a matrix in sparse format.
- **triu**: Return the upper triangular part of a matrix in sparse format.
- **bmat**: Construct a sparse matrix from a grid of sparse blocks.
- **hstack**: Stack sparse matrices horizontally.
- **vstack**: Stack sparse matrices vertically.
- **rand**: Create a random sparse matrix.
- **random**: Create a random sparse matrix.
- **find**: Return the indices and values of the non-zero elements.
- **issparse**: Check if an object is a sparse matrix.
- **isspmatrix**: Check if an object is a sparse matrix (any format).
- **isspmatrix_csc**: Check if an object is a CSC sparse matrix.
- **isspmatrix_csr**: Check if an object is a CSR sparse matrix.
- **isspmatrix_coo**: Check if an object is a COO sparse matrix.
- **isspmatrix_dia**: Check if an object is a DIA sparse matrix.

### Example
```python
import cupy as cp
from cupyx.scipy.sparse import csr_matrix

# Create a sparse matrix from COO-style (data, (row, col)) input.
# cupyx sparse matrices need a floating-point (or bool/complex) dtype, so data is float64.
row = cp.array([0, 0, 1, 2, 2, 2])
col = cp.array([0, 2, 2, 0, 1, 2])
data = cp.array([1, 2, 3, 4, 5, 6], dtype=cp.float64)
s = csr_matrix((data, (row, col)), shape=(3, 3))
print(s.toarray())  # Convert to dense array for printing
```
```

--------------------------------

### cupy.cuda.runtime.getDeviceProperties

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.linalg.eigvalsh

Describes the getDeviceProperties function to get the device properties.

```APIDOC
## cupy.cuda.runtime.getDeviceProperties

### Description
Retrieves properties of a CUDA device.
### Method
N/A (Python function)

### Endpoint
`cupy.cuda.runtime.getDeviceProperties`

### Parameters
- **device** (int) - The ID of the device to query.

### Request Example
N/A

### Response
Returns a dict of device properties.
```

--------------------------------

### Distributed Computing

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.linalg.companion

Utilities for initializing distributed training environments.

```APIDOC
## Distributed Computing

### Description
Provides functionality to initialize distributed communication backends, essential for multi-GPU and multi-node training.

### Endpoints
#### `cupyx.distributed.init_process_group`
- **Description**: Initializes a process group for distributed communication.

#### `cupyx.distributed.NCCLBackend`
- **Description**: Backend class that performs distributed communication through NCCL.
```

--------------------------------

### cupy.cuda.runtime.getDevice

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.linalg.eigvalsh

Describes the getDevice function to get the current device.

```APIDOC
## cupy.cuda.runtime.getDevice

### Description
Retrieves the ID of the currently selected device.

### Method
N/A (Python function)

### Endpoint
`cupy.cuda.runtime.getDevice`

### Parameters
N/A

### Request Example
N/A

### Response
Returns the current device ID as an int.
```

--------------------------------

### cupy.cuda.nvtx.RangePop

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.nvtx.RangePop

Ends a nested range started by a RangePush*() call.

```APIDOC
## cupy.cuda.nvtx.RangePop

### Description
Ends a nested range started by a `RangePush*()` call.

### Method
`cupy.cuda.nvtx.RangePop()`

### Endpoint
N/A (This is a Python function, not a REST endpoint)

### Parameters
This function does not take any parameters.

### Request Example
```python
from cupy.cuda import nvtx

nvtx.RangePush("work")  # Open a range so that there is one to close
nvtx.RangePop()         # End the most recently pushed range
```

### Response
This function does not return a value. Its effect is to mark the end of a range in NVTX profiling.
#### Success Response (200)
N/A

#### Response Example
N/A
```

--------------------------------

### cupyx.scipy.sparse Matrix Constructors and Utilities

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.signal.argrelmax

Documentation for creating and manipulating sparse matrices using CuPy.

```APIDOC
## cupyx.scipy.sparse Matrix Operations

### Description
This section covers the creation of various sparse matrix formats and utility functions for sparse matrices.

### Method
Various (primarily class instantiation and function calls)

### Endpoints
- `cupyx.scipy.sparse.coo_matrix`
- `cupyx.scipy.sparse.csc_matrix`
- `cupyx.scipy.sparse.csr_matrix`
- `cupyx.scipy.sparse.dia_matrix`
- `cupyx.scipy.sparse.spmatrix`
- `cupyx.scipy.sparse.eye`
- `cupyx.scipy.sparse.identity`
- `cupyx.scipy.sparse.kron`
- `cupyx.scipy.sparse.kronsum`
- `cupyx.scipy.sparse.diags`
- `cupyx.scipy.sparse.spdiags`
- `cupyx.scipy.sparse.tril`
- `cupyx.scipy.sparse.triu`
- `cupyx.scipy.sparse.bmat`
- `cupyx.scipy.sparse.hstack`
- `cupyx.scipy.sparse.vstack`
- `cupyx.scipy.sparse.rand`
- `cupyx.scipy.sparse.random`
- `cupyx.scipy.sparse.find`
- `cupyx.scipy.sparse.issparse`
- `cupyx.scipy.sparse.isspmatrix`
- `cupyx.scipy.sparse.isspmatrix_csc`
- `cupyx.scipy.sparse.isspmatrix_csr`
- `cupyx.scipy.sparse.isspmatrix_coo`
- `cupyx.scipy.sparse.isspmatrix_dia`

### Parameters
Parameters vary by class constructor or function. Common parameters include:
- `data` (cupy.ndarray): The non-zero elements of the matrix.
- `indices` (cupy.ndarray): Indices for the elements (column indices for CSR, row indices for CSC).
- `indptr` (cupy.ndarray): Offsets into `indices` and `data` marking where each row (CSR) or column (CSC) begins.
- `shape` (tuple): The dimensions of the sparse matrix.
- `format` (str): The desired sparse matrix format.
### Request Example
```python
import cupy as cp
from cupyx.scipy.sparse import csr_matrix

# Create a CSR sparse matrix from COO-style (data, (row, col)) input.
# cupyx sparse matrices need a floating-point (or bool/complex) dtype, so data is float64.
row = cp.array([0, 0, 1, 2, 2, 2])
col = cp.array([0, 2, 2, 0, 1, 2])
data = cp.array([1, 2, 3, 4, 5, 6], dtype=cp.float64)
s = csr_matrix((data, (row, col)), shape=(3, 3))
print(s)
print(s.toarray())  # Convert to dense array for printing
```

### Response
Returns a sparse matrix object of the specified format or a boolean for utility functions.

#### Success Response (200)
- **sparse_matrix** (cupyx.scipy.sparse.spmatrix subclass) - A sparse matrix object.
- **boolean** - For utility functions like `issparse`.
```

--------------------------------

### CUDA Profiler

Source: https://docs.cupy.dev/en/stable/reference/array_api

Functions to start and stop the CUDA profiler.

```APIDOC
## CUDA Profiler

### Description
Provides basic control for starting and stopping the CUDA profiler.

### Endpoints
- `cupy.cuda.runtime.profilerStart`
- `cupy.cuda.runtime.profilerStop`
```

--------------------------------

### Distributed Computing

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.signal.hilbert

Utilities for initializing distributed training environments using NCCL backend.

```APIDOC
## Distributed Computing

### Description
Tools for setting up and managing distributed training processes, leveraging the NCCL backend for communication.

### Endpoints
- `cupyx.distributed.init_process_group`: Initializes a distributed process group.
- `cupyx.distributed.NCCLBackend`: Represents the NCCL backend for distributed communication.
```

--------------------------------

### CuPy JIT (Just-In-Time) Compilation Helpers

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.fft.ifftn

Utilities for writing JIT-compiled kernels, including thread/block/grid dimensions and synchronization primitives.

```APIDOC
## cupyx.jit

### Description
Provides JIT compilation capabilities and CUDA-specific built-in functions for custom kernels.
### Endpoints
- `cupyx.jit.rawkernel`
- `cupyx.jit.threadIdx`
- `cupyx.jit.blockDim`
- `cupyx.jit.blockIdx`
- `cupyx.jit.gridDim`
- `cupyx.jit.grid`
- `cupyx.jit.gridsize`
- `cupyx.jit.laneid`
- `cupyx.jit.warpsize`
- `cupyx.jit.range`
- `cupyx.jit.syncthreads`
- `cupyx.jit.syncwarp`
- `cupyx.jit.shfl_sync`
- `cupyx.jit.shfl_up_sync`
- `cupyx.jit.shfl_down_sync`
- `cupyx.jit.shfl_xor_sync`
- `cupyx.jit.shared_memory`
- Atomic Operations:
  - `cupyx.jit.atomic_add`
  - `cupyx.jit.atomic_sub`
  - `cupyx.jit.atomic_exch`
  - `cupyx.jit.atomic_min`
  - `cupyx.jit.atomic_max`
  - `cupyx.jit.atomic_inc`
  - `cupyx.jit.atomic_dec`
  - `cupyx.jit.atomic_cas`
  - `cupyx.jit.atomic_and`
  - `cupyx.jit.atomic_or`
  - `cupyx.jit.atomic_xor`
- Cooperative Groups:
  - `cupyx.jit.cg.this_grid`
  - `cupyx.jit.cg.this_thread_block`
  - `cupyx.jit.cg.sync`
  - `cupyx.jit.cg.memcpy_async`
  - `cupyx.jit.cg.wait`
  - `cupyx.jit.cg.wait_prior`
```

--------------------------------

### CUDA Version

Source: https://docs.cupy.dev/en/stable/reference/cuda

Retrieves information about the installed CUDA runtime version.

```APIDOC
## cupy.cuda.get_local_runtime_version

### Description
Returns the version of the CUDA Runtime installed in the environment.

### Method
`cupy.cuda.get_local_runtime_version`

### Parameters
None

### Response
- **version** (int): The version of the CUDA Runtime, encoded as an integer.
```

--------------------------------

### CUDA Profiler Control

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.sparse.linalg.svds

APIs to start and stop the CUDA profiler.

```APIDOC
## CUDA Profiler Control API

### Description
APIs to control the CUDA profiler, enabling and disabling profiling of GPU execution.
### Endpoints
- `cupy.cuda.runtime.profilerStart`
- `cupy.cuda.runtime.profilerStop`
```

--------------------------------

### Optimize kernel launch parameters with cupyx.optimizing.optimize in Python

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.optimizing.optimize

This code snippet demonstrates how to use the `cupyx.optimizing.optimize` context manager to optimize kernel launch parameters for CuPy routines. It requires Optuna to be installed and currently supports reduction operations only. The manager searches for good thread and block counts, caches them, and reuses them for subsequent calls with similar array characteristics.

```python
import cupy
from cupyx import optimizing

x = cupy.arange(100)
with optimizing.optimize():
    cupy.sum(x)  # Reduction whose launch parameters are tuned inside the context
```

--------------------------------

### CuPy LineProfileHook Usage Example

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.memory_hooks.LineProfileHook

This example demonstrates how to use the LineProfileHook to profile CuPy memory usage. It creates the hook, wraps a block of CuPy code in a `with` statement, and then prints the profiling report. This hook is useful for identifying memory hotspots in CuPy applications.

```python
from cupy.cuda import memory_hooks

hook = memory_hooks.LineProfileHook()
with hook:
    # some CuPy code
    pass  # Replace with your CuPy operations
hook.print_report()
```

--------------------------------

### cupyx.scipy.sparse Matrix Constructors and Utilities

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.fft.fftshift

Documentation for creating and manipulating sparse matrices, including various formats and utility functions.

```APIDOC
## Sparse Matrix Operations

### Description
Provides tools for creating, manipulating, and querying sparse matrices, supporting formats like COO, CSC, CSR, and DIA.
### Constructors
- `coo_matrix`
- `csc_matrix`
- `csr_matrix`
- `dia_matrix`
- `spmatrix` (base class)

### Utility Functions
- `eye`
- `identity`
- `kron`
- `kronsum`
- `diags`
- `spdiags`
- `tril`
- `triu`
- `bmat`
- `hstack`
- `vstack`
- `rand`
- `random`
- `find`
- `issparse`
- `isspmatrix`
- `isspmatrix_csc`
- `isspmatrix_csr`
- `isspmatrix_coo`
- `isspmatrix_dia`

### Endpoints
All functions and constructors are accessed via `cupyx.scipy.sparse`.
```

--------------------------------

### Start Nested Range with NVTX

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.nvtx.RangePushC

Demonstrates how to start a nested profiling range using RangePushC and end it with RangePop. This is useful for marking specific sections of code for performance analysis, especially when using tools like Nsight Systems. The color parameter (ARGB) allows for visual differentiation of ranges.

```python
import cupy
from cupy.cuda.nvtx import RangePush, RangePushC, RangePop

# Placeholder inputs for illustration
N = 3
A = cupy.arange(4)

RangePush("Nested Powers of A")  # Outer range
for i in range(N):
    # Inner range with an explicit ARGB color (opaque green)
    RangePushC("Iter {}: Double A".format(i), 0xFF00FF00)
    A = 2 * A
    RangePop()  # Ends the inner range
RangePop()  # Ends the outer range
```

--------------------------------

### Distributed Computing

Source: https://docs.cupy.dev/en/stable/reference/scipy_linalg

Utilities for initializing distributed training environments using NCCL.

```APIDOC
## Distributed Computing

### Description
APIs for initializing distributed process groups, primarily for use with the NCCL backend for multi-node, multi-GPU training.

### Endpoints
- `cupyx.distributed.init_process_group`: Initializes a distributed process group.
- `cupyx.distributed.NCCLBackend`: Represents the NCCL backend for distributed communication.
```

--------------------------------

### STFT and iSTFT Example with CuPy

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.signal.istft

Demonstrates the process of generating a test signal, computing its Short Time Fourier Transform (STFT) using CuPy, and then reconstructing the original signal using the inverse STFT (iSTFT).
This example showcases the typical workflow for time-frequency analysis and signal reconstruction with CuPy.

```python
import cupy
from cupyx.scipy.signal import stft, istft

# Generate a test signal: a 2 Vrms sine wave at 50 Hz with white noise
fs = 1024
nperseg = 512
amp = 2 * cupy.sqrt(2)
noise_power = 0.001 * fs / 2
N = 10 * fs
time = cupy.arange(N) / float(fs)
carrier = amp * cupy.sin(2 * cupy.pi * 50 * time)
noise = cupy.random.normal(scale=cupy.sqrt(noise_power), size=time.shape)
x = carrier + noise

# Compute the STFT
# Note: choose nperseg (and noverlap) based on your analysis needs.
# Here nperseg is defined above and noverlap is left at its default.
freqs, times, Zxx = stft(x, fs=fs, nperseg=nperseg)

# Reconstruct the signal using iSTFT
t_reconstructed, x_reconstructed = istft(Zxx, fs=fs, nperseg=nperseg)

# Plotting (requires matplotlib; CuPy arrays must be moved to the CPU first)
# import matplotlib.pyplot as plt
# plt.plot(cupy.asnumpy(time), cupy.asnumpy(x), label='Original Signal')
# plt.plot(cupy.asnumpy(t_reconstructed), cupy.asnumpy(x_reconstructed), label='Reconstructed Signal')
# plt.xlabel('Time [s]')
# plt.ylabel('Amplitude')
# plt.title('Original vs Reconstructed Signal')
# plt.legend()
# plt.show()
```

--------------------------------

### CuPy Distributed Initialization and NCCL Backend

Source: https://docs.cupy.dev/en/stable/reference/generated/cupyx.scipy.stats.zmap

Supports distributed computing setups using CuPy, primarily with the NCCL backend. `cupyx.distributed.init_process_group` initializes the distributed environment and returns a communicator object. `cupyx.distributed.NCCLBackend` is the backend class that uses NCCL for collective operations.
```python
import cupy as cp
from cupyx.distributed import init_process_group, NCCLBackend

# Example of initializing a distributed process group.
# This typically runs with one process per GPU (e.g., launched with mpirun).
# rank = 0        # Rank of the current process
# n_devices = 2   # Total number of devices (processes) in the group
# comm = init_process_group(n_devices, rank, backend="nccl")

# After initialization, the returned communicator provides NCCL-based collective operations.
# print(f"Distributed process group initialized on rank {rank}.")
```

--------------------------------

### CuPy CUDA Pointer Attributes

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.tri

Function to get attributes of a pointer on the device.

```APIDOC
## CUDA Pointer Attributes API

### Description
Retrieves detailed attributes associated with a given device pointer.

### Endpoints
- `cupy.cuda.runtime.pointerGetAttributes`: Gets attributes of a pointer.
```

--------------------------------

### CuPy JIT (Just-In-Time) Compilation Helpers

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.tri

Helper functions and variables for writing JIT-compiled kernels.

```APIDOC
## JIT Compilation Helpers API

### Description
Provides intrinsic functions and variables usable within JIT-compiled kernels for thread and block coordination, synchronization, and shared memory access.

### Functions and Variables
- `cupyx.jit.threadIdx`: Thread index within a block.
- `cupyx.jit.blockDim`: Dimensions of a block.
- `cupyx.jit.blockIdx`: Block index within a grid.
- `cupyx.jit.gridDim`: Dimensions of a grid.
- `cupyx.jit.grid`: Computes the calling thread's index within the grid.
- `cupyx.jit.gridsize`: Computes the total grid size.
- `cupyx.jit.laneid`: Lane ID within a warp.
- `cupyx.jit.warpsize`: Warp size.
- `cupyx.jit.range`: Range for JIT kernel loops, with optional unrolling.
- `cupyx.jit.syncthreads`: Synchronizes all threads in a block.
- `cupyx.jit.syncwarp`: Synchronizes all threads in a warp.
- `cupyx.jit.shfl_sync`: Shuffle operation with synchronization.
- `cupyx.jit.shfl_up_sync`: Upward shuffle operation with synchronization.
- `cupyx.jit.shfl_down_sync`: Downward shuffle operation with synchronization.
- `cupyx.jit.shfl_xor_sync`: XOR shuffle operation with synchronization.
- `cupyx.jit.shared_memory`: Allocates shared memory and returns it as an array.
- `cupyx.jit.atomic_add`, `cupyx.jit.atomic_sub`, `cupyx.jit.atomic_exch`, `cupyx.jit.atomic_min`, `cupyx.jit.atomic_max`, `cupyx.jit.atomic_inc`, `cupyx.jit.atomic_dec`, `cupyx.jit.atomic_cas`, `cupyx.jit.atomic_and`, `cupyx.jit.atomic_or`, `cupyx.jit.atomic_xor`: Atomic operations.
- `cupyx.jit.cg.this_grid`: Returns the current grid group (Cooperative Groups).
- `cupyx.jit.cg.this_thread_block`: Returns the current thread block group.
- `cupyx.jit.cg.sync`: Synchronizes the threads of a group.
- `cupyx.jit.cg.memcpy_async`: Asynchronous memory copy performed by a group.
- `cupyx.jit.cg.wait`: Waits for a group's asynchronous copies to complete.
- `cupyx.jit.cg.wait_prior`: Waits for prior stages of a group's asynchronous copies.
```

--------------------------------

### CuPy Distributed Computing

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.runtime.deviceSetLimit

Documentation for initializing distributed environments and using communication backends in CuPy.

```APIDOC
## Distributed Computing API

### Description
Tools for setting up and managing distributed computations across multiple processes and nodes using CuPy.

### Endpoints
- **Initialization**
  - **cupyx.distributed.init_process_group**
    - Description: Initializes a distributed process group.
- **Communication Backends**
  - **cupyx.distributed.NCCLBackend**
    - Description: The NCCL backend for distributed communication.
```

--------------------------------

### cupy.cuda.runtime.deviceGetDefaultMemPool

Source: https://docs.cupy.dev/en/stable/reference/generated/cupy.linalg.eigvalsh

Describes the deviceGetDefaultMemPool function to get the default memory pool for a device.

```APIDOC
## cupy.cuda.runtime.deviceGetDefaultMemPool

### Description
Retrieves the default memory pool for a device.
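### Method
N/A (Python function)

### Endpoint
`cupy.cuda.runtime.deviceGetDefaultMemPool`

### Parameters
- **device** (int) - The ID of the device whose default memory pool is requested.

### Request Example
N/A

### Response
Returns a pointer (intptr_t) to the default memory pool.
```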
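
--------------------------------

### Decode CUDA Version Integers and List Devices (Python sketch)

A minimal sketch, not taken from the CuPy documentation: the version functions listed above (`cupy.cuda.runtime.driverGetVersion`, `cupy.cuda.get_local_runtime_version`) return a single integer. Assuming the usual CUDA encoding of `1000 * major + 10 * minor`, the hypothetical helper `decode_cuda_version` splits such a value into a `(major, minor)` pair. The hypothetical helper `list_devices` combines `getDeviceCount` and `getDeviceProperties`, and returns an empty list when CuPy or a usable GPU is not available.

```python
from typing import List, Tuple


def decode_cuda_version(version: int) -> Tuple[int, int]:
    """Split a CUDA version integer into (major, minor).

    Assumes the CUDA encoding 1000 * major + 10 * minor (hypothetical helper).
    """
    major, remainder = divmod(version, 1000)
    return major, remainder // 10


def list_devices() -> List[str]:
    """Return one summary line per visible CUDA device (hypothetical helper)."""
    try:
        from cupy.cuda import runtime

        summaries = []
        for device_id in range(runtime.getDeviceCount()):
            props = runtime.getDeviceProperties(device_id)
            name = props["name"]
            if isinstance(name, bytes):  # The name may be reported as bytes
                name = name.decode()
            summaries.append(f"{device_id}: {name}")
        return summaries
    except Exception:
        # CuPy is not installed, or no usable CUDA driver/device was found.
        return []


if __name__ == "__main__":
    print(decode_cuda_version(11070))  # (11, 7)
    for line in list_devices():
        print(line)
```

On a machine with CuPy and a GPU, `decode_cuda_version(cupy.cuda.runtime.driverGetVersion())` would give the driver's supported CUDA version in the same form; on other platforms (for example ROCm builds) the integer encoding may differ.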