### JAX Basic Training with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/index

Demonstrates how to train a neural network using DALI for data loading and JAX as the machine learning framework. This example covers basic integration and data pipeline setup.

```python
import jax
import jax.numpy as jnp
import nvidia.dali.fn as fn
from nvidia.dali.plugin.jax import data_iterator

# Placeholders to replace with real values
image_dir = "..."  # directory tree readable by fn.readers.file
num_epochs = 1

# The decorator turns a DALI pipeline definition into a JAX iterator factory
@data_iterator(output_map=["images", "labels"], reader_name="image_reader")
def iterator_fn():
    jpegs, labels = fn.readers.file(file_root=image_dir, name="image_reader")
    images = fn.decoders.image(jpegs)
    images = fn.resize(images, resize_x=256, resize_y=256)
    return images, labels

# DALI iterator for JAX
dali_iter = iterator_fn(batch_size=32, num_threads=4, device_id=0)

def train_step(params, batch):
    # Your JAX training logic here; it should return the updated params
    return params

params = None  # replace with initialized model parameters

# Example training loop
for epoch in range(num_epochs):
    for batch in dali_iter:
        # batch is a dict of JAX arrays keyed by the names in output_map
        params = train_step(params, batch)
```

--------------------------------

### Multi-GPU Data Loading Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.caffe2

An example showcasing how to read data in a multi-GPU setup using NVIDIA DALI.

```APIDOC
## Example: Multi-GPU Data Loading

### Description
This example demonstrates how to configure NVIDIA DALI for reading data efficiently across multiple GPUs in a distributed training environment.

### Method
N/A (Example Code)

### Endpoint
N/A (Example Code)

### Parameters
N/A (Example Code)

### Request Example
```python
# Example Python code snippet for multi-GPU setup
# (Actual code would be more detailed and involve distributed settings)
import os
import nvidia.dali.fn as fn
from nvidia.dali.pipeline import pipeline_def
from nvidia.dali.plugin.pytorch import DALIClassificationIterator

# In a real scenario these values come from the launcher (e.g., torch.distributed)
world_rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
local_rank = int(os.environ.get("LOCAL_RANK", 0))

# lmdb_path is assumed to be defined elsewhere
@pipeline_def(batch_size=32, num_threads=4, device_id=local_rank)
def sharded_pipeline():
    # shard_id/num_shards make each process read only its own part of the dataset
    data, labels = fn.readers.caffe2(
        path=lmdb_path, shard_id=world_rank, num_shards=world_size, name="Reader"
    )
    # ... further pipeline definition ...
    return data, labels

# DALI iterators are typically created per GPU
# dali_iter_gpu = DALIClassificationIterator(sharded_pipeline(), reader_name="Reader")

# ... training loop ...
```

### Response
#### Success Response (N/A)
N/A

#### Response Example
N/A
```

--------------------------------

### NumPy Reader Examples

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.numpy

Illustrative examples of using the NumPy Reader, including reading directly to GPU memory with GPUDirect Storage and multi-GPU setups.

```APIDOC
## NumPy Reader Usage Examples

### Description
Examples demonstrating how to use the NumPy Reader for various data loading scenarios.

### Method
N/A (Examples)

### Endpoint
N/A (Examples)

### Parameters
N/A (Examples)

### Request Example
```python
# Example of reading NumPy array files, including reading directly to GPU memory utilizing GPUDirect Storage
# Refer to: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/numpy_reader.html.md

# Example of reading the data in the multi-GPU setup.
# Refer to: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/multigpu.html.md
```

### Response
#### Success Response (N/A)
N/A

#### Response Example
N/A
```

--------------------------------

### Custom JAX Augmentations in DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/other_index

Demonstrates integrating custom data augmentations written in JAX into NVIDIA DALI pipelines. This makes JAX's JIT compilation and hardware acceleration available inside DALI workflows. It requires JAX to be installed and configured.
```python
import jax
import jax.numpy as jnp
import nvidia.dali.fn as fn
from nvidia.dali.pipeline import pipeline_def
from nvidia.dali.plugin.jax.fn import jax_function

# jax_function makes a JAX function callable from within a DALI pipeline
@jax_function
@jax.jit
def jax_augmentation(x):
    # Custom JAX augmentation logic; as a stand-in, reverse the channel order
    return x[..., ::-1]

# Example usage in a DALI pipeline:
@pipeline_def(batch_size=..., num_threads=..., device_id=...)
def pipeline():
    input_tensor = fn.external_source(source=..., device='gpu')
    output_tensor = jax_augmentation(input_tensor)
    return output_tensor
```

--------------------------------

### T5X Basic Training with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/index

Provides an example of using NVIDIA DALI with T5X for training large language models. This snippet demonstrates setting up a peekable iterator, which lets the training framework inspect the next batch without consuming it.

```python
import jax
import jax.numpy as jnp
import nvidia.dali.fn as fn
from nvidia.dali.plugin.jax.clu import peekable_data_iterator

image_dir = "..."  # placeholder: directory tree readable by fn.readers.file

# DALI iterator for T5X, using the peekable iterator from the CLU integration
@peekable_data_iterator(output_map=["images", "labels"], reader_name="image_reader")
def iterator_fn():
    jpegs, labels = fn.readers.file(file_root=image_dir, name="image_reader")
    images = fn.decoders.image(jpegs)
    images = fn.resize(images, resize_x=256, resize_y=256)
    return images, labels

dali_iter = iterator_fn(batch_size=32, num_threads=4, device_id=0)

# Look at the upcoming batch without advancing the iterator
first_batch = dali_iter.peek()

# Example training loop within T5X framework
# Assume 'state' is your T5X training state
# Assume 'train_step_fn' is your T5X training step function
for batch in dali_iter:
    # Process batch and update training state
    # state, loss = train_step_fn(state, batch)
    pass
```

--------------------------------

### JAX Multi-GPU Training with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/index

Illustrates how to use multiple GPUs for training neural networks with DALI and JAX. This includes examples of automatic parallelization and using pmapped iterators for distributed data loading.
```python
import jax
import jax.numpy as jnp
import nvidia.dali.fn as fn
from nvidia.dali.plugin.jax import data_iterator

image_dir = "..."  # placeholder: directory tree readable by fn.readers.file
num_epochs = 1

# With devices=..., one pipeline is created per device and each pipeline
# function call receives its own num_shards/shard_id
@data_iterator(
    output_map=["images", "labels"],
    reader_name="image_reader",
    devices=jax.local_devices(),
)
def iterator_fn(num_shards, shard_id):
    jpegs, labels = fn.readers.file(
        file_root=image_dir, num_shards=num_shards, shard_id=shard_id, name="image_reader"
    )
    images = fn.decoders.image(jpegs)
    images = fn.resize(images, resize_x=256, resize_y=256)
    return images, labels

dali_iter = iterator_fn(batch_size=32, num_threads=4)

@jax.pmap
def train_step_pmap(params, batch):
    # Your JAX training logic for pmapped execution; return the updated params
    return params

# Example training loop using pmap
for epoch in range(num_epochs):
    for batch in dali_iter:
        # params = train_step_pmap(params, batch)
        pass
```

--------------------------------

### Flax Basic Training with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/index

Shows how to integrate NVIDIA DALI with Flax for training neural networks. This example focuses on setting up the DALI data pipeline and feeding data into a Flax model.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn
from flax.training import train_state
import nvidia.dali.fn as fn
from nvidia.dali.plugin.jax import data_iterator

num_epochs = 1

# Assuming iterator_fn is a @data_iterator-decorated DALI pipeline defined elsewhere
# DALI iterator for Flax (yields dicts of JAX arrays)
dali_iter = iterator_fn(batch_size=32, num_threads=4, device_id=0)

# Define your Flax model
class SimpleCNN(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Conv(features=32, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))  # flatten; flax.linen has no Flatten layer
        x = nn.Dense(features=10)(x)
        return x

# Example training loop
key = jax.random.PRNGKey(0)
model = SimpleCNN()
params = model.init(key, jnp.ones([1, 28, 28, 1]))['params']

for epoch in range(num_epochs):
    for batch in dali_iter:
        # Process batch and train Flax model
        pass
```

--------------------------------

### DALI Augmentation Gallery Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/operations_index

Provides a consolidated view of image augmentation operators available in DALI, showing how they are configured and chained. This helps in selecting and combining augmentations for a training pipeline.
```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class AugmentationGalleryPipeline(Pipeline):
    def __init__(self, batch_size=2, num_threads=2, device_id=0):
        super().__init__(batch_size, num_threads, device_id)
        # Images (HWC, uint8) are fed from Python under the name 'input'
        self.input = ops.ExternalSource(name='input', device='gpu', layout='HWC')
        # Example augmentations:
        self.flip = ops.Flip(device='gpu', horizontal=1)
        self.rotate = ops.Rotate(device='gpu', angle=15.0)
        # Centered 50x50 crop; crop positions are normalized to [0, 1]
        self.crop = ops.Crop(device='gpu', crop=(50, 50), crop_pos_x=0.5, crop_pos_y=0.5)
        self.brightness_contrast = ops.BrightnessContrast(device='gpu', brightness=1.1, contrast=0.9)
        self.saturation = ops.Saturation(device='gpu', saturation=1.5)

    def define_graph(self):
        # Applying multiple augmentations sequentially
        images = self.input()
        flipped = self.flip(images)
        rotated = self.rotate(flipped)
        cropped = self.crop(rotated)
        bc = self.brightness_contrast(cropped)
        saturated = self.saturation(bc)
        return saturated

# This is a conceptual example. To run this, you would need actual image data.
# pipeline = AugmentationGalleryPipeline()
# pipeline.build()
# pipeline.feed_input('input', list_of_hwc_uint8_arrays)
# output = pipeline.run()
```

--------------------------------

### DALI Image Decoder Examples

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/operations_index

Demonstrates decoding different image formats (e.g., JPEG, PNG) using DALI's `ImageDecoder` and related operators. It covers options for output types and device placement.
```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class ImageDecoderPipeline(Pipeline):
    def __init__(self, files, batch_size=2, num_threads=2, device_id=0):
        super().__init__(batch_size, num_threads, device_id)
        # Reads the encoded bytes of each listed file (e.g., JPEGs)
        self.file_reader = ops.readers.File(files=files)
        # Decode as RGB; 'mixed' takes CPU input and places the output on the GPU
        self.jpeg_decoder = ops.decoders.Image(device='mixed', output_type=types.RGB)
        # Decode as grayscale, place on CPU
        self.png_decoder = ops.decoders.Image(device='cpu', output_type=types.GRAY)

    def define_graph(self):
        # The file reader returns encoded files and labels
        files, labels = self.file_reader()
        # Decode JPEG to RGB with GPU output
        jpegs_rgb = self.jpeg_decoder(files)
        # Decode PNG to grayscale on CPU (assuming files could be PNG)
        # pngs_gray = self.png_decoder(files)
        return jpegs_rgb  # or pngs_gray

# This is a conceptual example. To run this, you would need actual image files.
# filenames = ["/path/to/image1.jpg", "/path/to/image2.jpg"]
# pipeline = ImageDecoderPipeline(files=filenames)
# pipeline.build()
# output = pipeline.run()
```

--------------------------------

### Get Help for main.py

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command displays the help message for the main.py script, listing all available command-line options and their descriptions. Use it to see every configurable training parameter.

```bash
python main.py -h
```

--------------------------------

### Visualize Array Sharding with JAX in Python

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/t5x-basic_example

Uses JAX's `visualize_array_sharding` to display how a tensor is distributed across available devices, which is useful for verifying distributed training setups.
```python
import jax

# 'batch' is a dict of JAX arrays produced by a DALI JAX iterator
jax.debug.visualize_array_sharding(batch["images"].ravel())
```

--------------------------------

### DALI Pipeline Instantiation Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/webdataset-externalsource

Demonstrates how to instantiate the `webdataset_pipeline` with specific arguments. This example shows how to pass the dataset paths and configure parameters like shuffling and batch padding.

```python
pipeline = webdataset_pipeline(
    tar_dataset_paths,     # Paths for the sharded dataset
    random_shuffle=True,   # Random buffered shuffling on
    pad_last_batch=False,  # Last batch is filled to the full size
    read_ahead=False,
)
```

--------------------------------

### Read Video Frames as Images with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/index

This example demonstrates how to read video frames that are stored as individual image files using NVIDIA DALI. It covers the setup and usage of DALI's sequence reading capabilities, suitable for workflows where video data is pre-processed into image sequences.

```python
import nvidia.dali.fn as fn
from nvidia.dali.pipeline import pipeline_def

# Assuming 'frames/' holds one subdirectory per video stream, each containing
# image files whose sorted names give the frame order
# (e.g., frame_0000.png, frame_0001.png, etc.).
sequence_length = 8

@pipeline_def(batch_size=1, num_threads=2, device_id=0, seed=42)
def video_frames_pipeline(data_dir="frames/"):
    # Each sample is a sequence of consecutive frames decoded into one tensor
    return fn.readers.sequence(file_root=data_dir, sequence_length=sequence_length)

# Example usage:
pipeline = video_frames_pipeline()
pipeline.build()

# Fetch a batch of frame sequences
(sequences,) = pipeline.run()
print(f"Shape of the fetched batch of frames: {sequences.as_array().shape}")
```

--------------------------------

### Parallel External Source (Fork) with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/index

Illustrates the 'fork' variant of parallel external source in NVIDIA DALI, often used with Python's `multiprocessing.Process`. This method involves starting separate Python workers that manage their own DALI pipelines and feed data. It includes steps for starting workers and handling data.
```python
import multiprocessing
import queue
import time

import numpy as np
import nvidia.dali.ops as ops
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline

# Define the DALI pipeline that uses ExternalSource
class ForkPipeline(Pipeline):
    def __init__(self, batch_size, device_id):
        # prefetch_queue_depth=1 so that one feed_input() call matches one run()
        super().__init__(batch_size, num_threads=2, device_id=device_id,
                         prefetch_queue_depth=1)
        # The name lets feed_input() address this operator
        self.source = ops.ExternalSource(name='source')
        self.cast = ops.Cast(dtype=types.FLOAT)

    def define_graph(self):
        data = self.source()
        # Example DALI operations
        processed_data = self.cast(data)
        return processed_data

# Function executed by each worker process
def worker_process(shard_id, num_shards, batch_size, output_queue):
    # Each worker creates its own pipeline instance (one GPU per worker assumed)
    pipeline = ForkPipeline(batch_size=batch_size, device_id=shard_id)
    pipeline.build()

    # Simulate loading and feeding data
    for i in range(5):  # Number of batches per worker
        # Generate or load external data for this batch
        # Data must be in a format DALI ExternalSource can accept (e.g., numpy arrays)
        dummy_data = [np.random.rand(32, 32) * (shard_id + 1) for _ in range(batch_size)]

        # Feed the data to the pipeline using the name 'source'
        pipeline.feed_input('source', dummy_data)

        # Get processed data from the pipeline
        try:
            batch = pipeline.run()[0]  # Get the first output
            # Convert to a NumPy array so it can be pickled onto the queue
            output_queue.put((shard_id, batch.as_array()))
            time.sleep(0.1)  # Simulate work
        except Exception as e:
            print(f"Worker {shard_id} error: {e}")

# Main process setup
# if __name__ == '__main__':
#     num_workers = 4
#     batch_size = 16
#     output_queue = multiprocessing.Queue()
#     processes = []

#     print(f"Starting {num_workers} worker processes...")
#     for i in range(num_workers):
#         p = multiprocessing.Process(target=worker_process, args=(i, num_workers, batch_size, output_queue))
#         processes.append(p)
#         p.start()

#     # Collect results from workers
#     collected_batches = 0
#     total_batches_expected = num_workers * 5
#     while collected_batches < total_batches_expected:
#         try:
#             shard_id, batch = output_queue.get(timeout=10)
#             print(f"Main process received batch from shard {shard_id}. Batch shape: {batch.shape}")
#             collected_batches += 1
#         except queue.Empty:
#             print("Queue empty, waiting...")
#             break  # Timeout or all workers finished

#     # Wait for all processes to finish
#     print("Waiting for worker processes to join...")
#     for p in processes:
#         p.join()
#     print("All worker processes finished.")
```

--------------------------------

### Define DALI Preprocessing Pipeline

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/jax-getting_started

Defines a simple DALI processing pipeline using nvidia.dali.fn. This pipeline reads JPEG images and their labels, decodes the images, and resizes them. It serves as a basic example for image preprocessing tasks in computer vision.

```python
import nvidia.dali.fn as fn

# image_dir is assumed to be defined elsewhere (root directory of the image dataset)
def simple_pipeline():
    jpegs, labels = fn.readers.file(file_root=image_dir, name="image_reader")
    images = fn.decoders.image(jpegs)
    images = fn.resize(images, resize_x=256, resize_y=256)

    return images, labels
```

--------------------------------

### DALI Tutorials

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.video

Links to various DALI tutorials covering optical flow, multi-GPU processing, and video reading functionalities.

```APIDOC
## DALI Tutorials

### Description
This section provides links to several tutorials demonstrating advanced DALI features for deep learning workflows.

### Tutorials:
- **Optical Flow Calculation**: [Tutorial describing how to calculate optical flow from video inputs](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/optical_flow_example.html.md)
- **Multi-GPU Data Reading**: [Reading the data in the multi-GPU setup.](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/multigpu.html.md)
- **Simple Video Reader**: [Tutorial describing how to use video reader](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_reader_simple_example.html.md)
- **Video Reader with Labels**: [Tutorial describing how to use video reader to output frames with labels](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_reader_label_example.html.md)
- **Video File List Outputs**: [Tutorial describing how to output frames with labels assigned to dedicated ranges of frame numbers/timestamps](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_file_list_outputs.html.md)
- **Per-Frame Arguments**: [Examples of processing video in DALI](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_processing_per_frame_arguments.html.md)
```

--------------------------------

### Build and Run CPU BrightnessContrast Pipeline

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/brightness_contrast_example

Instantiates, builds, and runs the CPU DALI pipeline. It configures the batch size, number of threads, and device ID, then executes the pipeline and stores the output.
```python
# bc_cpu_pipeline and batch_size are assumed to be defined earlier
pipe_cpu = bc_cpu_pipeline(batch_size=batch_size, num_threads=1, device_id=0)
pipe_cpu.build()
cpu_output = pipe_cpu.run()
```

--------------------------------

### Run Training on DGX-A100 (Standard Config)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command initiates training on a DGX-A100 system with a standard configuration, employing multiple GPUs, AMP, and DALI with AutoAugment. It includes settings for batch size and the dataset path.

```bash
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 256 $PATH_TO_IMAGENET
```

--------------------------------

### Image Decoder (Hybrid) with Advanced Options (Python)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/index

Illustrates image decoding using DALI's hybrid ('mixed') backend, which takes encoded input on the CPU and produces decoded output on the GPU. This example explores variations with random cropping, fixed cropping, external window sizes, and anchors for flexible image region selection. It requires DALI to be installed with GPU support.

```python
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import pipeline_def

@pipeline_def(batch_size=2, num_threads=2, device_id=0)  # Assuming GPU device 0
def hybrid_decoder_pipeline():
    images, labels = fn.readers.file(file_root='')
    # Example: Hybrid decoder that decodes only a window given by its anchor and size (in pixels)
    decoded_images = fn.decoders.image_slice(
        images, device='mixed', output_type=types.RGB, start=[10, 10], shape=[100, 100]
    )
    # Other variations like random cropping or fixed cropping use different operators
    # e.g., random cropping: fn.decoders.image_random_crop(images, device='mixed', ...)
    # e.g., fixed cropping: fn.decoders.image_crop(images, device='mixed', crop=(100, 100), ...)
    return decoded_images
```

--------------------------------

### Run Training (AMP) on Single GPU

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command starts training with Automatic Mixed Precision (AMP) on a single GPU. It requires setting the batch size and AMP parameters, along with the dataset path. The batch size can be tuned for the specific hardware.

```bash
python ./main.py --batch-size 64 --amp --static-loss-scale 128 $PATH_TO_IMAGENET
```

--------------------------------

### Run Training (FP32) on Single GPU

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command initiates training using the main.py script for FP32 precision on a single GPU. It requires specifying the batch size and the path to the ImageNet dataset. The batch size can be adjusted based on the machine's capabilities.

```bash
python ./main.py --batch-size 64 $PATH_TO_IMAGENET
```

--------------------------------

### Numba Function Callback DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/other_index

Shows how to use Numba to compile Python functions for use as callbacks within NVIDIA DALI pipelines. This allows for efficient execution of custom Python logic on the CPU or GPU by leveraging Numba's Just-In-Time (JIT) compilation. It requires Numba to be installed.

```python
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali.pipeline import pipeline_def
from nvidia.dali.plugin.numba.fn.experimental import numba_function

# DALI compiles this function with Numba; results are written into the
# preallocated output (same shape as the input when no setup_fn is given)
def numba_callback(out0, in0):
    out0[:] = 255 - in0[:]  # stand-in computation: invert a uint8 image

# Example usage in a DALI pipeline:
@pipeline_def(batch_size=..., num_threads=..., device_id=...)
def pipeline():
    input_tensor = fn.external_source(source=...)
    output_tensor = numba_function(
        input_tensor,
        run_fn=numba_callback,
        in_types=[types.UINT8], ins_ndim=[3],
        out_types=[types.UINT8], outs_ndim=[3],
        batch_processing=False,
    )
    return output_tensor
```

--------------------------------

### Data Loading Examples

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.webdataset

Examples demonstrating how to load data using the WebDataset format and configure multi-GPU setups with NVIDIA DALI.

```APIDOC
## Data Loading Examples

### Description
Provides examples for reading data stored in the WebDataset format and configuring data loading in a multi-GPU setup using NVIDIA DALI.

### Method
N/A (Example Usage)

### Endpoint
N/A

### Parameters
N/A

### Request Example
N/A

### Response
N/A

#### Response Example
N/A
```

--------------------------------

### Run Training on DGX1V-16G (Standard Config)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command executes training in a standard configuration on a DGX1V-16G system, utilizing multiple GPUs, AMP, and DALI with AutoAugment. It specifies the batch size and dataset path.

```bash
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 $PATH_TO_IMAGENET
```

--------------------------------

### Get JAX Array Device Information

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/jax-getting_started

Prints the devices backing the 'images' JAX array. This helps verify whether the batch produced by the pipeline resides on the CPU or the GPU.

```python
# jax.Array.devices() returns the set of devices that hold the array
print(f'Images backing devices: {batch["images"].devices()}')
```

--------------------------------

### Instantiate and Build DALI Pipeline (Python)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/warp

Instantiates the defined DALI pipeline with specified batch size, number of threads, and device ID. It then builds the pipeline, preparing it for execution.
```python
batch_size = 32

pipe = example_pipeline(batch_size=batch_size, num_threads=2, device_id=0)
pipe.build()
```

--------------------------------

### Build and Run GPU BrightnessContrast Pipeline

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/brightness_contrast_example

Instantiates, builds, and runs the GPU DALI pipeline. It configures the batch size, number of threads, and device ID, then executes the pipeline and stores the output.

```python
pipe_gpu = bc_gpu_pipeline(batch_size=batch_size, num_threads=1, device_id=0)
pipe_gpu.build()
gpu_output = pipe_gpu.run()
```

--------------------------------

### DALI Pipeline Definition and Usage in Python

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index

This Python snippet shows the imports needed to define a DALI pipeline with the `pipeline_def` decorator and `fn` operators and to consume it from PyTorch through the generic iterator. Ensure DALI and PyTorch are installed. This code is intended as a starting point for building custom data pipelines.

```python
from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.types as types
import nvidia.dali.fn as fn
from nvidia.dali.plugin.pytorch import DALIGenericIterator
import os

# To run with different data, see documentation of nvidia.dali.fn.readers.file
```

--------------------------------

### DALI Geometric Transforms

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/operations_index

Provides examples of geometric transformations applicable to DALI tensors, such as scaling, rotation, and translation. These are fundamental for data augmentation in computer vision tasks.
```python
import numpy as np
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class GeometricPipeline(Pipeline):
    def __init__(self, batch_size=2, num_threads=2, device_id=0):
        # prefetch_queue_depth=1 so that one feed_input() call matches one run()
        super().__init__(batch_size, num_threads, device_id, prefetch_queue_depth=1)
        # Images (HWC, uint8) are fed from Python under the name 'input'
        self.input = ops.ExternalSource(name='input', device='gpu', layout='HWC')
        # Example: Resize operation
        self.resize = ops.Resize(device='gpu', size=[128, 128])
        # Example: Rotate operation (angle in degrees)
        # self.rotate = ops.Rotate(device='gpu', angle=30.0)

    def define_graph(self):
        resized_output = self.resize(self.input())
        # rotated_output = self.rotate(resized_output)
        return resized_output  # or rotated_output

pipeline = GeometricPipeline()
pipeline.build()

# Dummy image data
image_data = np.random.randint(0, 256, size=(2, 100, 100, 3), dtype=np.uint8)
data_batch = list(image_data)

pipeline.feed_input('input', data_batch)
output = pipeline.run()
# GPU outputs must be copied to the host before conversion to NumPy
print(output[0].as_cpu().as_array())
```

--------------------------------

### TensorListGPU Class Documentation

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/data_types

Documentation for the TensorListGPU class, including its constructor overloads and key methods.

```APIDOC
## Class TensorListGPU

### Description
List of tensors residing in the GPU memory.

### Methods

#### __init__ (Overloaded)
1. **__init__(self: TensorListGPU, object: CapsuleType, layout: str | None = None) -> None**
   * **object** (CapsuleType) - DLPack object representing TensorList.
   * **layout** (str, optional) - Layout of the data.
2. **__init__(self: TensorListGPU, tl: TensorListGPU, layout: str | None = None) -> None**
   * **tl** (TensorListGPU) - Another TensorListGPU object to copy from.
   * **layout** (str, optional) - Layout of the data.
3. **__init__(self: TensorListGPU, list_of_tensors: list, layout: str | None = None) -> None**
   * **list_of_tensors** (list[TensorGPU]) - Python list of TensorGPU objects.
   * **layout** (str, optional) - Layout of the data.
4. **__init__(self: TensorListGPU, object: object, layout: str | None = None, device_id: SupportsInt = -1) -> None**
   * **object** (object) - Python object that implements CUDA Array Interface.
   * **layout** (str, optional) - Layout of the data.
   * **device_id** (int, optional) - Device on which this tensor resides. If not provided, the current device is used.
5. **__init__(self: TensorListGPU) -> None**
   Initializes an empty TensorListGPU.

#### __getitem__
* **__getitem__(self: TensorListGPU, i: SupportsInt) -> TensorGPU**
  Returns a tensor at the given position `i` in the list.

#### as_cpu
* **as_cpu(self: TensorListGPU) -> TensorListCPU**
  Returns a `TensorListCPU` object being a copy of this `TensorListGPU`.

#### as_reshaped_tensor
* **as_reshaped_tensor(self: TensorListGPU, arg0: Sequence[SupportsInt]) -> TensorGPU**
  Returns a tensor that is a view of this `TensorList` cast to the given shape. This function can only be called if `TensorList` is contiguous in memory and the volumes of requested Tensor and TensorList match.

#### as_tensor
* **as_tensor(self: TensorListGPU) -> TensorGPU**
  Returns a tensor that is a view of this `TensorList`. This function can only be called if `is_dense_tensor` returns True.

#### at
* **at(self: TensorListGPU, arg0: SupportsInt) -> TensorGPU**
  Returns a tensor at the given position in the list. Deprecated in favor of `__getitem__()`.
```

--------------------------------

### Start DALI Pipeline Thread (Python)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/plugins/pytorch_dali_proxy

Starts the DALI pipeline thread. It is recommended to use the scope's __enter__/__exit__ (a `with` statement) for managing the thread lifecycle.
```python
from nvidia.dali.plugin.pytorch.experimental.proxy import DALIServer

# 'pipe' is a DALI pipeline defined elsewhere; the server runs it in a background thread
dali_server = DALIServer(pipe)
dali_server.start_thread()
```

--------------------------------

### Build and Run Hybrid Pipeline (Python)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/getting_started

This code snippet demonstrates how to instantiate, build, and run the hybrid decoding pipeline defined previously. It configures the pipeline with batch size, number of threads, and device ID. Finally, it executes the pipeline and displays the decoded images.

```python
pipe = hybrid_pipeline(
    batch_size=max_batch_size, num_threads=1, device_id=0, seed=1234
)
pipe.build()
pipe_out = pipe.run()
images, labels = pipe_out
# Copy the GPU output to the host before displaying it
show_images(images.as_cpu())
```

--------------------------------

### DALI WarpAffine Transformation Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/index

Illustrates the use of the DALI WarpAffine operator for applying affine transformations to images. This includes translation, scaling, rotation, and shearing. The example covers defining the transformation matrix and applying it to input images.

```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import numpy as np

class WarpAffinePipeline(Pipeline):
    def __init__(self, matrix, batch_size=1, device='cpu', device_id=0):
        super().__init__(batch_size, num_threads=1, device_id=device_id)
        # Images (HWC) are fed from Python under the name 'input'
        self.input = ops.ExternalSource(name='input', device=device, layout='HWC')
        # matrix holds the six values of the 2x3 affine matrix, row by row
        self.warp = ops.WarpAffine(device=device, matrix=matrix)

    def define_graph(self):
        images = self.input()
        transformed_images = self.warp(images)
        return transformed_images

# Example usage:
# Define an affine transformation matrix (e.g., for rotation and translation)
# Example: Rotate 30 degrees and translate by (10, 5)
angle = np.deg2rad(30)
cos_a, sin_a = np.cos(angle), np.sin(angle)
tx, ty = 10, 5

# DALI expects a 2x3 matrix for affine transformations
# The matrix format is [[a, b, tx], [c, d, ty]] where
# [a, c] is the first column of the rotation/scale part
# [b, d] is the second column of the rotation/scale part
# [tx, ty] is the translation part
# For rotation by 'angle' and scale 's':
# a = s * cos(angle), b = -s * sin(angle)
# c = s * sin(angle), d = s * cos(angle)
s = 1.0  # No scaling in this example
affine_matrix = np.array([
    [s * cos_a, -s * sin_a, tx],
    [s * sin_a, s * cos_a, ty]
], dtype=np.float32)

# If using GPU, instantiate pipeline with device='gpu'
pipe = WarpAffinePipeline(matrix=affine_matrix.flatten().tolist(), device='cpu')
pipe.build()
# ... feed image data with pipe.feed_input('input', ...) and call pipe.run()
```

--------------------------------

### Iterate Through Batches with JAX Iterator

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/jax-getting_started

Shows how to iterate through all batches provided by a DALI JAX iterator using a for loop. It also demonstrates how to get the total number of batches in the dataset using the `len()` function.

```python
iterator = iterator_fn(batch_size=1)
print(f"Iterator size: {len(iterator)}")

for batch_id, batch in enumerate(iterator):
    print(batch_id)
```

--------------------------------

### Caffe2 LMDB Data Loading Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.caffe2

An example demonstrating how to read data stored in LMDB in the Caffe2 format using NVIDIA DALI.
```APIDOC
## Example: Reading LMDB Data in Caffe2 Format

### Description
This example loads data from an LMDB database stored in the Caffe2 format using the NVIDIA DALI library.

### Method
N/A (Example Code)

### Endpoint
N/A (Example Code)

### Parameters
N/A (Example Code)

### Request Example
```python
# Example Python code snippet demonstrating LMDB reading
# (Actual code would be more detailed)
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
from nvidia.dali.plugin.pytorch import DALIClassificationIterator


# lmdb_path is assumed to be defined elsewhere; the batch size and
# thread count below are illustrative values.
@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def caffe2_pipeline():
    # fn.readers.caffe2 returns graph nodes (encoded images and labels), not a pipeline.
    jpegs, labels = fn.readers.caffe2(path=lmdb_path, random_shuffle=True, name="Reader")
    # ... further pipeline definition (decoding, resizing, augmentation) ...
    return jpegs, labels


pipe = caffe2_pipeline()
# reader_name lets the iterator take the dataset size from the reader.
dali_iter = DALIClassificationIterator(pipe, reader_name="Reader")

for data in dali_iter:
    # Process loaded data
    pass
```

### Response
#### Success Response (N/A)
N/A

#### Response Example
N/A
```

--------------------------------

### DALI Iterator with Sharding for Distributed Runs

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/custom_operations/index

Sketches a DALI iterator wrapper for sharded, distributed execution. The class records the process rank and world size. The data split itself, which gives each process in a multi-GPU setup a distinct subset of the data, is expected to come from the `shard_id` and `num_shards` arguments of the reader in the pipeline.

```python
# The PyTorch plugin's DALIGenericIterator is used here as the concrete base
# class; other framework plugins provide equivalent iterators.
from nvidia.dali.plugin.pytorch import DALIGenericIterator


class ShardedIterator(DALIGenericIterator):
    def __init__(self, pipelines, output_map, rank=0, world_size=1, **kwargs):
        # kwargs are forwarded to the base iterator
        # (e.g. size, reader_name, auto_reset, last_batch_padded).
        super().__init__(pipelines, output_map, **kwargs)
        # Sharding-specific bookkeeping, supplied by the caller.
        self.rank = rank
        self.world_size = world_size

    def __next__(self):
        # Fetch the next batch from the underlying pipeline(s).
        batch = super().__next__()
        # Apply sharding logic here if it is not handled by the DALI pipeline itself.
        return batch
```
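
To make the idea of a per-process subset concrete, the following plain-Python sketch splits a dataset of `dataset_size` samples into contiguous, non-overlapping index ranges, one per shard. The helper `shard_range` is hypothetical and not part of DALI. In a DALI pipeline the split is requested through the reader's `shard_id` and `num_shards` arguments, and the exact boundaries DALI uses should be checked against its documentation.

```python
def shard_range(dataset_size: int, shard_id: int, num_shards: int) -> range:
    """Return the sample indices assigned to one shard (hypothetical helper).

    Assumption: shards are contiguous and boundaries are rounded down, so
    every sample belongs to exactly one shard.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return range(start, end)


# Illustrative values: 10 samples split across 4 processes.
dataset_size, world_size = 10, 4
for rank in range(world_size):
    print(rank, list(shard_range(dataset_size, rank, world_size)))
```

When the dataset size is not divisible by the number of shards, the shards differ in length by at most one sample, which is why distributed iterators usually need a policy for the last batch.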