### JAX Basic Training with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/index

Demonstrates how to train a neural network using DALI for data loading and JAX as the machine learning framework. This example covers basic integration and data pipeline setup.

```python
import jax
import jax.numpy as jnp

import nvidia.dali.fn as fn
from nvidia.dali.plugin.jax import DALIGenericIterator

# Assuming a DALI pipeline returning (images, labels) is defined elsewhere
# pipeline = ...

# DALI iterator for JAX; output_map names the pipeline outputs in order
# (the names below are illustrative). Batch size, thread count, and device
# are configured on the pipeline itself.
dali_iter = DALIGenericIterator(pipelines=[pipeline],
                                output_map=["images", "labels"],
                                auto_reset=True)

def train_step(params, batch):
    # Your JAX training logic here; batch is a dict keyed by the output_map names
    return params

num_epochs = 1  # placeholder
params = None   # placeholder for the model parameters

# Example training loop
for epoch in range(num_epochs):
    for batch in dali_iter:
        params = train_step(params, batch)
```

--------------------------------

### Multi-GPU Data Loading Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.caffe2

An example showcasing how to read data in a multi-GPU setup using NVIDIA DALI.

```APIDOC
## Example: Multi-GPU Data Loading

### Description
This example demonstrates how to configure NVIDIA DALI for reading data efficiently across multiple GPUs in a distributed training environment.

### Method
N/A (Example Code)

### Endpoint
N/A (Example Code)

### Parameters
N/A (Example Code)

### Request Example
```python
# Example Python code snippet for multi-GPU setup
# (Actual code would be more detailed and involve distributed settings)
from nvidia.dali.plugin.pytorch import DALIClassificationIterator
import nvidia.dali.fn as fn
import os

# In a real scenario, world_rank and world_size would come from torch.distributed
world_rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])

# Readers accept shard_id/num_shards so that each process reads its own part of the data.
# Inside a pipeline definition (data_dir is a placeholder):
# jpegs, labels = fn.readers.file(file_root=data_dir, shard_id=world_rank,
#                                 num_shards=world_size, name="Reader")
# ... further pipeline definition ...

# DALI iterators are typically created per GPU
# dali_iter_gpu = DALIClassificationIterator(pipe_per_gpu, ...)

# ... training loop ...
```

### Response
#### Success Response (N/A)
N/A

#### Response Example
N/A
```
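To make the `shard_id`/`num_shards` idea concrete, here is a small pure-Python sketch that splits a dataset of `num_samples` items into contiguous shards, one per rank. The helper `shard_range` and its floor-based boundaries are illustrative assumptions, not a DALI API; DALI readers do their own partitioning internally.

```python
def shard_range(num_samples, shard_id, num_shards):
    """Return the [begin, end) sample range for one shard (hypothetical helper)."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    begin = num_samples * shard_id // num_shards
    end = num_samples * (shard_id + 1) // num_shards
    return begin, end

# Ten samples split across four ranks
ranges = [shard_range(10, rank, 4) for rank in range(4)]
print(ranges)
```

The shards are disjoint and together cover every sample, so each rank can read its own range without coordinating with the others.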

--------------------------------

### NumPy Reader Examples

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.numpy

Illustrative examples of using the NumPy Reader, including reading directly to GPU memory with GPUDirect storage and multi-GPU setups.

```APIDOC
## NumPy Reader Usage Examples

### Description
Examples demonstrating how to use the NumPy Reader for various data loading scenarios.

### Method
N/A (Examples)

### Endpoint
N/A (Examples)

### Parameters
N/A (Examples)

### Request Example
```python
# Example of reading NumPy array files, including reading directly to GPU memory utilizing the GPUDirect storage
# Refer to: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/numpy_reader.html.md

# Example of reading the data in the multi-GPU setup.
# Refer to: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/multigpu.html.md
```

### Response
#### Success Response (N/A)
N/A

#### Response Example
N/A
```

--------------------------------

### Custom JAX Augmentations in DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/other_index

Demonstrates integrating custom data augmentations written using JAX into NVIDIA DALI pipelines. This allows leveraging JAX's capabilities for automatic differentiation and hardware acceleration within DALI workflows. It requires JAX to be installed and configured.

```python
import jax
import jax.numpy as jnp
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def
from nvidia.dali.plugin.jax.fn import jax_function

# jax_function wraps a JAX function so it can be called like a DALI operator
@jax_function
@jax.jit
def jax_augmentation(x: jax.Array) -> jax.Array:
    # Implement custom JAX augmentation logic; flipping the last axis is a placeholder
    return jnp.flip(x, axis=-1)

# Example usage in a DALI pipeline:
@pipeline_def(batch_size=..., num_threads=..., device_id=...)
def augmentation_pipeline():
    input_tensor = fn.external_source(source=..., device='gpu')
    output_tensor = jax_augmentation(input_tensor)
    return output_tensor
```

--------------------------------

### T5X Basic Training with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/index

Provides an example of using NVIDIA DALI with T5X for training large language models. This snippet demonstrates setting up a peekable iterator for efficient data handling within the T5X framework.

```python
import jax
import jax.numpy as jnp

import nvidia.dali.fn as fn
from nvidia.dali.plugin.jax.clu import DALIGenericPeekableIterator

# Assuming a DALI pipeline is defined
# pipeline = ...

# Peekable DALI iterator for T5X; output_map names the pipeline outputs in order
# (the names below are illustrative). Batch size, thread count, and device
# are configured on the pipeline itself.
dali_iter = DALIGenericPeekableIterator(pipelines=[pipeline],
                                        output_map=["images", "labels"])

# peek() returns the next batch without advancing the iterator,
# e.g. to inspect shapes before training starts
first_batch = dali_iter.peek()

# Example training loop within T5X framework
# Assume 'state' is your T5X training state
# Assume 'train_step_fn' is your T5X training step function

for batch in dali_iter:
    # Process batch and update training state
    # state, loss = train_step_fn(state, batch)
    pass
```

--------------------------------

### JAX Multi-GPU Training with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/index

Illustrates how to leverage multiple GPUs for training neural networks with DALI and JAX. This includes examples of automatic parallelization and using pmapped iterators for efficient distributed data loading.

```python
import jax
import jax.numpy as jnp
from jax import pmap

import nvidia.dali.fn as fn
from nvidia.dali.plugin.jax import DALIGenericIterator

def create_dali_pipeline(batch_size, num_threads, device_id):
    # Define your DALI pipeline here
    pipeline = ...  # placeholder: build the pipeline from nvidia.dali.fn operators
    return pipeline

num_gpus = jax.local_device_count()

# Create one DALI iterator per GPU; output_map names are illustrative
dali_iters = []
for i in range(num_gpus):
    pipeline = create_dali_pipeline(batch_size=32, num_threads=4, device_id=i)
    dali_iter = DALIGenericIterator(pipelines=[pipeline],
                                    output_map=["images", "labels"])
    dali_iters.append(dali_iter)

@pmap
def train_step_pmap(params, batch):
    # Your JAX training logic for pmapped execution
    return params

num_epochs = 1  # placeholder

# Example training loop using pmap
for epoch in range(num_epochs):
    # Gather one batch from each iterator; pmap expects inputs stacked along a
    # leading device axis, so the per-GPU batches still need to be combined.
    # batches = [next(it) for it in dali_iters]
    # params = train_step_pmap(params, stacked_batch)
    pass
```

--------------------------------

### Flax Basic Training with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/index

Shows how to integrate NVIDIA DALI with Flax for training neural networks. This example focuses on setting up the DALI data pipeline and feeding data into a Flax model.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

import nvidia.dali.fn as fn
from nvidia.dali.plugin.jax import DALIGenericIterator

# Assuming a DALI pipeline is defined
# pipeline = ...

# DALI iterator for Flax; output_map names the pipeline outputs (illustrative names)
dali_iter = DALIGenericIterator(pipelines=[pipeline],
                                output_map=["images", "labels"],
                                auto_reset=True)

# Define your Flax model
class SimpleCNN(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.Conv(features=32, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))  # flatten; flax.linen has no Flatten module
        x = nn.Dense(features=10)(x)
        return x

# Example training loop
key = jax.random.PRNGKey(0)
model = SimpleCNN()
params = model.init(key, jnp.ones([1, 28, 28, 1]))['params']
num_epochs = 1  # placeholder

for epoch in range(num_epochs):
    for batch in dali_iter:
        # Process batch and train Flax model
        pass
```

--------------------------------

### DALI Augmentation Gallery Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/operations_index

Provides a consolidated view of various image augmentation techniques available in DALI, demonstrating their application and parameters. This helps in selecting and combining augmentations for robust models.

```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class AugmentationGalleryPipeline(Pipeline):
    def __init__(self, batch_size=2, num_threads=2, device_id=0):
        super().__init__(batch_size, num_threads, device_id)
        # Images (e.g. 100x100x3, HWC) are supplied at run time through feed_input
        self.input = ops.ExternalSource(name='input', layout='HWC')

        # Example augmentations:
        self.flip = ops.Flip(device='gpu', horizontal=1)
        self.rotate = ops.Rotate(device='gpu', angle=15.0)
        # crop is (height, width); crop_pos_x/crop_pos_y are normalized, 0.5 centers the window
        self.crop = ops.Crop(device='gpu', crop=(50, 50), crop_pos_x=0.5, crop_pos_y=0.5)
        self.brightness_contrast = ops.BrightnessContrast(device='gpu', brightness=1.1, contrast=0.9)
        self.saturation = ops.Saturation(device='gpu', saturation=1.5)

    def define_graph(self):
        # Move the input to the GPU, then apply the augmentations sequentially
        images = self.input().gpu()
        flipped = self.flip(images)
        rotated = self.rotate(flipped)
        cropped = self.crop(rotated)
        bc = self.brightness_contrast(cropped)
        saturated = self.saturation(bc)
        return saturated

# This is a conceptual example. To run this, you would need actual image data.
# pipeline = AugmentationGalleryPipeline()
# pipeline.build()
# pipeline.feed_input('input', batch_of_images)
# output = pipeline.run()
```

--------------------------------

### DALI Image Decoder Examples

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/operations_index

Demonstrates decoding image formats such as JPEG and PNG with DALI's image decoder operator (`ops.decoders.Image`, formerly `ImageDecoder`). It covers the output type and device placement options.

```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class ImageDecoderPipeline(Pipeline):
    def __init__(self, image_dir, batch_size=2, num_threads=2, device_id=0):
        super().__init__(batch_size, num_threads, device_id)
        # Read encoded image files (e.g., JPEGs) from a directory
        self.file_reader = ops.readers.File(file_root=image_dir)
        # Decode as RGB; 'mixed' takes CPU input and produces the output on the GPU
        self.rgb_decoder = ops.decoders.Image(device='mixed', output_type=types.RGB)
        # Decode as Grayscale, place on CPU
        self.gray_decoder = ops.decoders.Image(device='cpu', output_type=types.GRAY)

    def define_graph(self):
        # The reader returns encoded file contents and labels
        files, labels = self.file_reader()
        # Decode to RGB with the output on the GPU
        images_rgb = self.rgb_decoder(files)
        # Decode to Grayscale on CPU
        # images_gray = self.gray_decoder(files)
        return images_rgb # or images_gray

# This is a conceptual example. To run this, you would need actual image files.
# pipeline = ImageDecoderPipeline(image_dir="/path/to/images")
# pipeline.build()
# output = pipeline.run()
```

--------------------------------

### Get Help for main.py

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command displays the help message for the main.py script, listing all available command-line options and their descriptions. This is useful for understanding all configurable parameters for training.

```bash
python main.py -h
```

--------------------------------

### Visualize Array Sharding with JAX in Python

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/t5x-basic_example

Uses JAX's `visualize_array_sharding` to display how a tensor is distributed across available devices, useful for verifying distributed training setups.

```python
import jax

jax.debug.visualize_array_sharding(batch["images"].ravel())
```

--------------------------------

### DALI Pipeline Instantiation Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/webdataset-externalsource

Demonstrates how to instantiate the `webdataset_pipeline` with specific arguments. This example shows how to pass the dataset paths and configure parameters like shuffling and batch padding.

```python
pipeline = webdataset_pipeline(
    tar_dataset_paths,   # Paths for the sharded dataset
    random_shuffle=True, # Random buffered shuffling on
    pad_last_batch=False, # Last batch is filled to the full size
    read_ahead=False,
)
```
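The `random_shuffle=True` comment above refers to random buffered shuffling. As a rough illustration of that general technique (an assumption about the idea, not the code used by `webdataset_pipeline`), the pure-Python sketch below keeps a fixed-size buffer and emits a randomly chosen element each time the buffer is full. The name `buffered_shuffle` and its parameters are hypothetical.

```python
import random

def buffered_shuffle(samples, buffer_size, seed=0):
    """Yield samples in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) >= buffer_size:
            # Move a randomly chosen element to the end and emit it
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: emit the remaining elements in random order
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(10), buffer_size=4))
print(shuffled)
```

A larger buffer gives a more thorough shuffle at the cost of memory; with `buffer_size=1` the order is unchanged.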

--------------------------------

### Read Video Frames as Images with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/index

This example demonstrates how to read video frames that are stored as individual image files using NVIDIA DALI. It covers the setup and usage of DALI's sequence reading capabilities, suitable for workflows where video data is pre-processed into image sequences.

```python
import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

# Assuming 'frames/' contains one sub-directory per video, each holding
# frames that follow a naming convention like frame_0000.png, frame_0001.png, etc.
sequence_length = 8  # number of consecutive frames per sample

@pipeline_def(batch_size=1, num_threads=2, device_id=0, seed=42)
def video_frames_pipeline(data_dir="frames/"):
    # readers.sequence decodes consecutive image files into one sequence per sample
    frames = fn.readers.sequence(
        file_root=data_dir,
        sequence_length=sequence_length,
        name="FrameReader")
    return frames

# Example usage:
pipeline = video_frames_pipeline()
pipeline.build()

# Fetch a batch of frame sequences
frames = pipeline.run()[0]
print(f"Shape of the fetched batch of frames: {frames.shape()}")
```
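When frames are stored as individually numbered image files, a sequence is a run of consecutive files in sorted order. The pure-Python sketch below shows that grouping step on file names alone; `group_frames` and its `step` parameter are hypothetical names used for illustration and are not part of DALI.

```python
def group_frames(filenames, sequence_length, step=None):
    """Sort frame file names and split them into fixed-length sequences."""
    step = sequence_length if step is None else step
    ordered = sorted(filenames)  # zero-padded names sort in frame order
    return [
        ordered[start:start + sequence_length]
        for start in range(0, len(ordered) - sequence_length + 1, step)
    ]

names = [f"frame_{i:04d}.png" for i in (3, 0, 2, 1, 4)]
sequences = group_frames(names, sequence_length=2)
print(sequences)
```

Trailing frames that do not fill a whole sequence are dropped here; passing `step=1` would produce overlapping sequences instead.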

--------------------------------

### Parallel External Source (Fork) with DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/index

Illustrates the 'fork' variant of parallel external source in NVIDIA DALI. With `parallel=True`, the `source` callable runs in a pool of Python worker processes, and `py_start_method='fork'` starts those workers by forking the main process. The snippet covers defining a per-sample source callable, configuring the worker pool, and running the pipeline.

```python
import numpy as np
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def

batch_size = 16
num_batches = 5  # batches per epoch in this example

# Per-sample source: called in the worker processes with a SampleInfo object
def sample_source(sample_info):
    if sample_info.iteration >= num_batches:
        raise StopIteration  # signals the end of the epoch
    # Data must be in a format external_source can accept (e.g., NumPy arrays)
    return np.full((32, 32), sample_info.idx_in_epoch, dtype=np.uint8)

# py_num_workers sets the size of the worker pool; 'fork' is the start method
@pipeline_def(batch_size=batch_size, num_threads=2, device_id=0,
              py_num_workers=4, py_start_method='fork')
def fork_pipeline():
    data = fn.external_source(source=sample_source, parallel=True, batch=False)
    # Example DALI operation
    processed_data = fn.cast(data, dtype=types.FLOAT)
    return processed_data

if __name__ == '__main__':
    pipeline = fork_pipeline()
    # Building starts the Python workers; with 'fork' this should happen
    # before the process acquires a CUDA context
    pipeline.build()

    for _ in range(num_batches):
        (batch,) = pipeline.run()
        print(f"Batch shape: {batch.as_array().shape}")
```
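A parallel external source is typically driven by a per-sample callable that receives information about which sample is being requested. The pure-Python sketch below mimics that contract without DALI: `SampleInfo` here is a stand-in namedtuple (an assumption modeled on the fields DALI documents), and `ShardedSource` is a hypothetical class that serves one strided shard of an in-memory dataset and raises `StopIteration` at the end of the epoch.

```python
from collections import namedtuple

# Stand-in for DALI's SampleInfo, for illustration only
SampleInfo = namedtuple("SampleInfo", ["idx_in_epoch", "idx_in_batch", "iteration"])

class ShardedSource:
    """Per-sample callable that serves one shard of a dataset."""

    def __init__(self, data, batch_size, shard_id=0, num_shards=1):
        self.data = data[shard_id::num_shards]  # strided shard of the data
        self.full_iterations = len(self.data) // batch_size

    def __call__(self, sample_info):
        if sample_info.iteration >= self.full_iterations:
            raise StopIteration  # no more full batches in this epoch
        return self.data[sample_info.idx_in_epoch]

source = ShardedSource(list(range(10)), batch_size=2, shard_id=0, num_shards=2)
# Second sample of the second batch: index 3 within the epoch
sample = source(SampleInfo(idx_in_epoch=3, idx_in_batch=1, iteration=1))
print(sample)
```

Because the callable is stateless apart from its constructor arguments, any worker process can serve any sample, which is what makes this style suitable for parallel execution.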

--------------------------------

### Define DALI Preprocessing Pipeline

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/jax-getting_started

Defines a simple DALI processing pipeline using nvidia.dali.fn. This pipeline reads JPEG images and their labels, decodes the images, and resizes them. It serves as a basic example for image preprocessing tasks in computer vision, optimized for performance.

```python
import nvidia.dali.fn as fn

def simple_pipeline():
    jpegs, labels = fn.readers.file(file_root=image_dir, name="image_reader")
    images = fn.decoders.image(jpegs)
    images = fn.resize(images, resize_x=256, resize_y=256)

    return images, labels
```

--------------------------------

### DALI Tutorials

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.video

Links to various DALI tutorials covering optical flow, multi-GPU processing, and video reading functionalities.

```APIDOC
## DALI Tutorials

### Description
This section provides links to several tutorials demonstrating advanced DALI features for deep learning workflows.

### Tutorials:
- **Optical Flow Calculation**: [Tutorial describing how to calculate optical flow from video inputs](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/optical_flow_example.html.md)
- **Multi-GPU Data Reading**: [Reading the data in the multi-GPU setup.](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/multigpu.html.md)
- **Simple Video Reader**: [Tutorial describing how to use video reader](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_reader_simple_example.html.md)
- **Video Reader with Labels**: [Tutorial describing how to use video reader to output frames with labels](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_reader_label_example.html.md)
- **Video File List Outputs**: [Tutorial describing how to output frames with labels assigned to dedicated ranges of frame numbers/timestamps](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_file_list_outputs.html.md)
- **Per-Frame Arguments**: [Examples of processing video in DALI](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/sequence_processing/video/video_processing_per_frame_arguments.html.md)
```

--------------------------------

### Build and Run CPU BrightnessContrast Pipeline

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/brightness_contrast_example

Instantiates, builds, and runs the CPU DALI pipeline. It configures the batch size, number of threads, and device ID, then executes the pipeline and stores the output.

```python
pipe_cpu = bc_cpu_pipeline(batch_size=batch_size, num_threads=1, device_id=0)
pipe_cpu.build()
cpu_output = pipe_cpu.run()
```
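For intuition about what the pipeline computes per pixel, here is a pure-Python sketch, assuming the formula documented for DALI's brightness/contrast operator: `out = brightness_shift * output_range + brightness * (contrast_center + contrast * (in - contrast_center))`. The function name, the `contrast_center` of 128, and the clamping to the `uint8` range are illustrative assumptions for 8-bit images.

```python
def brightness_contrast(pixel, brightness=1.0, contrast=1.0,
                        brightness_shift=0.0, contrast_center=128.0,
                        output_range=255.0):
    """Apply brightness and contrast to one uint8 pixel value (sketch)."""
    value = (brightness_shift * output_range
             + brightness * (contrast_center + contrast * (pixel - contrast_center)))
    # Round and clamp to the uint8 range
    return int(min(max(round(value), 0), 255))

print(brightness_contrast(100))                  # identity parameters
print(brightness_contrast(100, contrast=2.0))    # pushed away from the center
print(brightness_contrast(200, brightness=1.5))  # saturates at the upper bound
```

With the default arguments the pixel is unchanged; raising `contrast` moves values away from `contrast_center`, and raising `brightness` scales the result.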

--------------------------------

### Run Training on DGX-A100 (Standard Config)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command initiates training on a DGX-A100 system with a standard configuration, employing multiple GPUs, AMP, and DALI with AutoAugment. It includes settings for batch size and the dataset path.

```bash
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 256 $PATH_TO_IMAGENET
```
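The `--amp --static-loss-scale 128` flags select mixed precision with a fixed loss scale. As a hedged illustration of why a loss scale helps in general (this is not code from `main.py`), the sketch below uses the standard library's half-precision `struct` format to show a small gradient underflowing to zero in FP16 and surviving once it is scaled by 128. The helper `to_fp16` and the example gradient value are assumptions chosen for the demonstration.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

loss_scale = 128.0
grad = 1e-8  # a gradient too small for FP16

unscaled_fp16 = to_fp16(grad)              # underflows to zero
scaled_fp16 = to_fp16(grad * loss_scale)   # representable after scaling
recovered = scaled_fp16 / loss_scale       # unscale in higher precision

print(unscaled_fp16, scaled_fp16, recovered)
```

Scaling the loss scales every gradient by the same factor, so dividing by the scale before the optimizer step restores the original magnitude up to FP16 rounding error.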

--------------------------------

### Image Decoder (Hybrid) with Advanced Options (Python)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/index

Illustrates image decoding using DALI's hybrid backend, combining CPU and GPU acceleration. This example explores variations with random cropping, fixed cropping, external window sizes, and anchors for flexible image region selection. It requires DALI to be installed with GPU support.

```python
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@pipeline_def(batch_size=2, num_threads=2, device_id=0) # Assuming GPU device 0
def hybrid_decoder_pipeline():
    images, labels = fn.readers.file(file_root='<path_to_images>')
    # Example: hybrid decoder with a fixed 100x100 crop window;
    # crop_pos_x/crop_pos_y are normalized positions of the window
    decoded_images = fn.decoders.image_crop(images, device='mixed', output_type=types.RGB,
                                            crop=(100, 100), crop_pos_x=0.1, crop_pos_y=0.1)
    # Other variations use different operators, e.g.
    # fn.decoders.image_random_crop(images, device='mixed', ...) for random cropping, or
    # fn.decoders.image_slice(images, anchor, shape, device='mixed', ...) for an external anchor and window size
    return decoded_images
```

--------------------------------

### Run Training (AMP) on Single GPU

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command starts training with Automatic Mixed Precision (AMP) on a single GPU. It requires setting the batch size and AMP parameters, along with the dataset path. The batch size can be tuned for optimal performance on the specific hardware.

```bash
python ./main.py --batch-size 64 --amp --static-loss-scale 128 $PATH_TO_IMAGENET
```

--------------------------------

### Run Training (FP32) on Single GPU

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command initiates training using the main.py script for FP32 precision on a single GPU. It requires specifying the batch size and the path to the ImageNet dataset. The batch size can be adjusted based on the machine's capabilities.

```bash
python ./main.py --batch-size 64 $PATH_TO_IMAGENET
```

--------------------------------

### Numba Function Callback DALI

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/other_index

Shows how to use Numba to compile Python functions for use as callbacks within NVIDIA DALI pipelines. This allows for efficient execution of custom Python logic on the CPU or GPU by leveraging Numba's Just-In-Time (JIT) compilation. It requires Numba to be installed.

```python
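import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def
from nvidia.dali.plugin.numba.fn.experimental import numba_function

# DALI compiles run_fn with Numba; the function writes its result
# into the preallocated output instead of returning it
def numba_callback(out0, in0):
    # Perform computations using Numba; inverting uint8 values is a placeholder
    out0[:] = 255 - in0

# Example usage in a DALI pipeline:
@pipeline_def(batch_size=..., num_threads=..., device_id=...)
def numba_pipeline():
    input_tensor = fn.external_source(source=...)
    # Types and dimensionality of inputs and outputs are declared explicitly
    output_tensor = numba_function(input_tensor, run_fn=numba_callback,
                                   in_types=[types.UINT8], ins_ndim=[3],
                                   out_types=[types.UINT8], outs_ndim=[3])
    return output_tensor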
```

--------------------------------

### Data Loading Examples

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.webdataset

Examples demonstrating how to load data using the WebDataset format and configure multi-GPU setups with NVIDIA DALI.

```APIDOC
## Data Loading Examples

### Description
Provides examples for reading data stored in the Webdataset format and configuring data loading in a multi-GPU setup using NVIDIA DALI.

### Method
N/A (Example Usage)

### Endpoint
N/A

### Parameters
N/A

### Request Example
N/A

### Response
N/A

#### Response Example
N/A
```

--------------------------------

### Run Training on DGX1V-16G (Standard Config)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/use_cases/pytorch/efficientnet/readme

This command executes training in a standard configuration on a DGX1V-16G system, utilizing multiple GPUs, AMP, and DALI with AutoAugment. It specifies the batch size and dataset path.

```bash
python multiproc.py --nproc_per_node 8 ./main.py --amp --static-loss-scale 128 --batch-size 128 $PATH_TO_IMAGENET
```

--------------------------------

### Get JAX Array Device Information

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/jax-getting_started

Prints the device backing the 'images' JAX array. This helps in verifying if preprocessing operations are utilizing the CPU or GPU.

```python
# devices() returns the set of devices that hold the array
print(f'Images backing device: {batch["images"].devices()}')
```

--------------------------------

### Instantiate and Build DALI Pipeline (Python)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/warp

Instantiates the defined DALI pipeline with specified batch size, number of threads, and device ID. It then builds the pipeline, preparing it for execution.

```python
batch_size = 32
pipe = example_pipeline(batch_size=batch_size, num_threads=2, device_id=0)
pipe.build()
```

--------------------------------

### Build and Run GPU BrightnessContrast Pipeline

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/brightness_contrast_example

Instantiates, builds, and runs the GPU DALI pipeline. It configures the batch size, number of threads, and device ID, then executes the pipeline and stores the output.

```python
pipe_gpu = bc_gpu_pipeline(batch_size=batch_size, num_threads=1, device_id=0)
pipe_gpu.build()
gpu_output = pipe_gpu.run()
```

--------------------------------

### DALI Pipeline Definition and Usage in Python

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index

This Python snippet lists the imports typically needed to define a DALI pipeline with the `pipeline_def` decorator and `fn` operators and to feed it to PyTorch through `DALIGenericIterator`. Ensure DALI and PyTorch are installed. The pipeline definition itself is not shown here; the code is intended as a starting point for building custom data pipelines.

```python
from nvidia.dali.pipeline import pipeline_def
import nvidia.dali.types as types
import nvidia.dali.fn as fn
from nvidia.dali.plugin.pytorch import DALIGenericIterator
import os

# To run with different data, see documentation of nvidia.dali.fn.readers.file
```

--------------------------------

### DALI Geometric Transforms

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/operations_index

Provides examples of geometric transformations applicable to DALI tensors, such as scaling, rotation, and translation. These are fundamental for data augmentation in computer vision tasks.

```python
import numpy as np
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class GeometricPipeline(Pipeline):
    def __init__(self, batch_size=2, num_threads=2, device_id=0):
        super().__init__(batch_size, num_threads, device_id)
        # Images (e.g. 100x100x3, HWC) are supplied at run time through feed_input
        self.input = ops.ExternalSource(name='input', layout='HWC')
        # Example: Resize operation
        self.resize = ops.Resize(device='gpu', size=[128, 128])
        # Example: Rotate operation (takes an angle in degrees)
        # self.rotate = ops.Rotate(device='gpu', angle=30.0)
        # An explicit matrix such as [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]
        # belongs to an affine warp (ops.WarpAffine) rather than to Rotate

    def define_graph(self):
        images = self.input().gpu()
        resized_output = self.resize(images)
        # rotated_output = self.rotate(resized_output)
        return resized_output # or rotated_output

pipeline = GeometricPipeline()
pipeline.build()

# Dummy image data
image_data = np.random.randint(0, 256, size=(2, 100, 100, 3), dtype=np.uint8)
data_batch = list(image_data)

pipeline.feed_input('input', data_batch)
output = pipeline.run()
# The result lives on the GPU; copy it to the CPU before converting to NumPy
print(output[0].as_cpu().as_array())
```
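The rotation matrix mentioned in the code comments can be written out explicitly. The pure-Python sketch below builds a 2x3 affine matrix that rotates around a chosen center and applies it to a point; `rotation_matrix` and `apply_affine` are hypothetical helper names. An affine warp operator consumes a matrix of this shape, but whether it maps source to destination coordinates or the reverse should be checked in the operator's documentation.

```python
import math

def rotation_matrix(angle_deg, center=(0.0, 0.0)):
    """Return a 2x3 affine matrix rotating by angle_deg around center."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    cx, cy = center
    # Equivalent to: translate center to origin, rotate, translate back
    return [
        [c, -s, cx - c * cx + s * cy],
        [s,  c, cy - s * cx - c * cy],
    ]

def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a 2D point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )

quarter_turn = apply_affine(rotation_matrix(90.0), (1.0, 0.0))
fixed_point = apply_affine(rotation_matrix(30.0, center=(50.0, 50.0)), (50.0, 50.0))
print(quarter_turn, fixed_point)
```

The third column carries the translation that keeps the chosen center in place, which is why the center maps to itself for any angle.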

--------------------------------

### TensorListGPU Class Documentation

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/data_types

Documentation for the TensorListGPU class, including its constructor overloads and key methods.

```APIDOC
## Class TensorListGPU

### Description
List of tensors residing in the GPU memory.

### Methods

#### __init__ (Overloaded)

1.  **__init__(self: TensorListGPU, object: CapsuleType, layout: str | None = None) -> None**
    *   **object** (CapsuleType) - DLPack object representing TensorList.
    *   **layout** (str, optional) - Layout of the data.

2.  **__init__(self: TensorListGPU, tl: TensorListGPU, layout: str | None = None) -> None**
    *   **tl** (TensorListGPU) - Another TensorListGPU object to copy from.
    *   **layout** (str, optional) - Layout of the data.

3.  **__init__(self: TensorListGPU, list_of_tensors: list, layout: str | None = None) -> None**
    *   **list_of_tensors** (list[TensorGPU]) - Python list of TensorGPU objects.
    *   **layout** (str, optional) - Layout of the data.

4.  **__init__(self: TensorListGPU, object: object, layout: str | None = None, device_id: SupportsInt = -1) -> None**
    *   **object** (object) - Python object that implements CUDA Array Interface.
    *   **layout** (str, optional) - Layout of the data.
    *   **device_id** (int, optional) - Device of where this tensor resides. If not provided, the current device is used.

5.  **__init__(self: TensorListGPU) -> None**
    Initializes an empty TensorListGPU.

#### __getitem__

*   **__getitem__(self: TensorListGPU, i: SupportsInt) -> TensorGPU**
    Returns a tensor at the given position `i` in the list.

#### as_cpu

*   **as_cpu(self: TensorListGPU) -> TensorListCPU**
    Returns a `TensorListCPU` object being a copy of this `TensorListGPU`.

#### as_reshaped_tensor

*   **as_reshaped_tensor(self: TensorListGPU, arg0: Sequence[SupportsInt]) -> TensorGPU**
    Returns a tensor that is a view of this `TensorList` cast to the given shape.
    This function can only be called if `TensorList` is contiguous in memory and the volumes of requested Tensor and TensorList match.

#### as_tensor

*   **as_tensor(self: TensorListGPU) -> TensorGPU**
    Returns a tensor that is a view of this `TensorList`.
    This function can only be called if `is_dense_tensor` returns True.

#### at

*   **at(self: TensorListGPU, arg0: SupportsInt) -> TensorGPU**
    Returns a tensor at the given position in the list. Deprecated in favor of `__getitem__()`.

```
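The constraints on `as_tensor` and `as_reshaped_tensor` can be illustrated with a NumPy analogy. This is only a sketch of the shape rules, not DALI code: the helper names below are hypothetical stand-ins, and NumPy copies the data here, whereas the DALI methods are documented as returning views.

```python
import numpy as np

def is_dense(samples):
    """Return True when all samples share one shape (hypothetical helper)."""
    return len({s.shape for s in samples}) <= 1

def as_tensor(samples):
    """Stack uniformly shaped samples into one array with a leading batch axis."""
    if not is_dense(samples):
        raise ValueError("samples have different shapes; cannot form one dense tensor")
    return np.stack(samples)

def as_reshaped_tensor(samples, shape):
    """Reinterpret all sample elements, in order, as one array of the given shape."""
    total = sum(s.size for s in samples)
    requested = int(np.prod(shape))
    if requested != total:
        raise ValueError(f"requested volume {requested} != list volume {total}")
    return np.concatenate([s.ravel() for s in samples]).reshape(shape)

uniform = [np.zeros((2, 3)), np.ones((2, 3))]   # same shape: dense
ragged = [np.zeros((2, 3)), np.ones((4,))]      # different shapes: not dense

dense_view = as_tensor(uniform)                 # batch axis added in front
flat_view = as_reshaped_tensor(ragged, (2, 5))  # 6 + 4 = 10 elements in total
```

In this analogy, `as_tensor(ragged)` raises because the shapes differ, while `as_reshaped_tensor` only requires that the total number of elements matches the requested shape.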

--------------------------------

### Start DALI Pipeline Thread (Python)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/plugins/pytorch_dali_proxy

Starts the DALI pipeline thread. Prefer using the server as a context manager (a `with` block), whose `__enter__`/`__exit__` manage the thread lifecycle for you.

```python
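from nvidia.dali.plugin.pytorch.experimental.proxy import DALIServer

# `pipeline` is assumed to be a DALI pipeline defined elsewhere;
# DALIServer wraps it and runs it in a background thread.
dali_server = DALIServer(pipeline)
dali_server.start_thread()
# ... use the server, then stop the thread explicitly ...
dali_server.stop_thread()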
```

--------------------------------

### Build and Run Hybrid Pipeline (Python)

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/getting_started

This code snippet demonstrates how to instantiate, build, and run the hybrid decoding pipeline defined previously. It configures the pipeline with batch size, number of threads, and device ID. Finally, it executes the pipeline and displays the decoded images.

```python
pipe = hybrid_pipeline(
    batch_size=max_batch_size, num_threads=1, device_id=0, seed=1234
)
pipe.build()

pipe_out = pipe.run()
images, labels = pipe_out
show_images(images.as_cpu())
```

--------------------------------

### DALI WarpAffine Transformation Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/image_processing/index

Illustrates the use of the DALI WarpAffine operator for applying affine transformations to images. This includes translation, scaling, rotation, and shearing. The example covers defining the transformation matrix and applying it to input images.

```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import numpy as np

class WarpAffinePipeline(Pipeline):
    def __init__(self, batch_size=1, device='cpu', matrix=None):
        # Pipeline takes device_id rather than device; None requests a CPU-only pipeline.
        device_id = 0 if device == 'gpu' else None
        super().__init__(batch_size, num_threads=1, device_id=device_id)
        self.warp_device = device
        # Images are fed from Python through an external source.
        self.input = ops.ExternalSource()
        # Fall back to the identity transform when no matrix is given.
        if matrix is None:
            matrix = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.float32)
        # The 2x3 matrix is passed as a flat, row-major list of 6 floats.
        flat_matrix = np.asarray(matrix, dtype=np.float32).flatten().tolist()
        self.warp = ops.WarpAffine(device=device, matrix=flat_matrix)

    def define_graph(self):
        images = self.input(name='input')
        if self.warp_device == 'gpu':
            images = images.gpu()  # copy the CPU input to the GPU
        return self.warp(images)

# Example: rotate by 30 degrees and translate by (10, 5).
angle = np.deg2rad(30)
cos_a, sin_a = np.cos(angle), np.sin(angle)
tx, ty = 10, 5
s = 1.0  # No scaling in this example

# The 2x3 affine matrix has the form [[a, b, tx], [c, d, ty]], where
# [a, c] and [b, d] are the columns of the rotation/scale part
# and [tx, ty] is the translation part.
# For rotation by 'angle' and scale 's':
# a = s * cos(angle), b = -s * sin(angle)
# c = s * sin(angle), d = s * cos(angle)
affine_matrix = np.array([
    [s * cos_a, -s * sin_a, tx],
    [s * sin_a,  s * cos_a, ty]
], dtype=np.float32)

# For GPU execution, instantiate the pipeline with device='gpu'.
pipe = WarpAffinePipeline(matrix=affine_matrix, device='cpu')
pipe.build()

# Feed a batch (a list of HWC numpy images) before running, e.g.:
# pipe.feed_input('input', [image])
# (warped,) = pipe.run()
```
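To make the `[[a, b, tx], [c, d, ty]]` layout concrete, the following NumPy-only sketch applies a 2x3 affine matrix to a few `(x, y)` points. The helper `apply_affine` is a hypothetical name introduced for illustration, not a DALI function. Whether the operator interprets the matrix as a source-to-destination or destination-to-source mapping depends on its `inverse_map` setting, so this shows only the arithmetic.

```python
import numpy as np

def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    matrix = np.asarray(matrix, dtype=np.float64)
    points = np.asarray(points, dtype=np.float64)
    # Append a 1 to every point so the translation column takes part in the product.
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return homogeneous @ matrix.T

angle = np.deg2rad(30)
cos_a, sin_a = np.cos(angle), np.sin(angle)
affine_matrix = np.array([
    [cos_a, -sin_a, 10.0],
    [sin_a,  cos_a,  5.0],
])

# The origin is moved by the translation only.
origin_mapped = apply_affine(affine_matrix, [[0.0, 0.0]])
# The unit x vector is rotated by 30 degrees, then translated.
unit_x_mapped = apply_affine(affine_matrix, [[1.0, 0.0]])
```

The origin lands on `(tx, ty)`, and the unit x vector lands on `(cos(angle) + tx, sin(angle) + ty)`, which matches reading the first column `[a, c]` as the image of the x axis.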

--------------------------------

### Iterate Through Batches with JAX Iterator

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/frameworks/jax/jax-getting_started

Shows how to iterate through all batches provided by a DALI JAX iterator using a for loop. It also demonstrates how to get the total number of batches in the dataset using the `len()` function.

```python
iterator = iterator_fn(batch_size=1)
print(f"Iterator size: {len(iterator)}")

for batch_id, batch in enumerate(iterator):
    print(batch_id)
```

--------------------------------

### Caffe2 LMDB Data Loading Example

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/operations/nvidia.dali.fn.readers.caffe2

An example demonstrating how to read data stored in LMDB in the Caffe 2 format using NVIDIA DALI.

```APIDOC
## Example: Reading LMDB Data in Caffe 2 Format

### Description
This example illustrates the process of loading data from an LMDB database that is structured in the Caffe 2 format using the NVIDIA DALI library.

### Method
N/A (Example Code)

### Endpoint
N/A (Example Code)

### Parameters
N/A (Example Code)

### Request Example
```python
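# Example Python code snippet demonstrating LMDB reading
# (batch size, thread count, and image size are placeholder values)
from nvidia.dali import pipeline_def
from nvidia.dali.plugin.pytorch import DALIClassificationIterator
import nvidia.dali.fn as fn

# Readers must be called inside a pipeline definition.
@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def caffe2_pipeline(path):
    # With the default label type, the reader returns encoded images and labels.
    jpegs, labels = fn.readers.caffe2(path=path, random_shuffle=True, name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")
    # Resize so that all samples in a batch share one shape.
    images = fn.resize(images, size=(224, 224))
    return images, labels

# Assuming lmdb_path is defined
pipe = caffe2_pipeline(lmdb_path)
pipe.build()

# reader_name lets the iterator query the dataset size from the reader.
dali_iter = DALIClassificationIterator(pipe, reader_name="Reader")

for data in dali_iter:
    # Process loaded data
    pass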
```

### Response
#### Success Response (N/A)
N/A

#### Response Example
N/A
```

--------------------------------

### DALI Iterator with Sharding for Distributed Runs

Source: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/custom_operations/index

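This snippet sketches a DALI iterator wrapper for distributed execution, where each process in a multi-GPU setup should receive a unique subset (shard) of the data. In DALI, the split itself is normally configured on the reader through `shard_id` and `num_shards`; the wrapper only records the rank and world size.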

```python
from nvidia.dali.plugin.base_iterator import LastBatchPolicy
from nvidia.dali.plugin.pytorch import DALIGenericIterator

class ShardedIterator(DALIGenericIterator):
    def __init__(self, pipelines, output_map, reader_name, rank=0, world_size=1,
                 auto_reset=False, last_batch_policy=LastBatchPolicy.FILL):
        super().__init__(pipelines, output_map, reader_name=reader_name,
                         auto_reset=auto_reset, last_batch_policy=last_batch_policy)
        # Sharding-specific bookkeeping; the values are set by the caller.
        self.rank = rank
        self.world_size = world_size

    def __next__(self):
        # Fetch the next batch from the underlying pipeline(s).
        batch = super().__next__()
        # Apply extra sharding logic here only if the reader in the pipeline
        # was not already configured with shard_id / num_shards.
        return batch
```
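The idea of each process receiving a unique subset can be sketched in plain Python. The helper `shard_bounds` is a hypothetical name, and the contiguous floor-based split shown here is an assumption chosen for illustration; the exact boundaries used by a particular DALI reader may differ in detail.

```python
def shard_bounds(dataset_size, shard_id, num_shards):
    """Return the [start, end) sample range assigned to one shard."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in the range [0, num_shards)")
    # Integer division floors the boundaries, so the shards are contiguous,
    # do not overlap, and together cover the whole dataset.
    start = shard_id * dataset_size // num_shards
    end = (shard_id + 1) * dataset_size // num_shards
    return start, end

samples = list(range(10))
num_shards = 3

# One slice of the dataset per rank.
shards = [
    samples[slice(*shard_bounds(len(samples), rank, num_shards))]
    for rank in range(num_shards)
]
```

When the dataset size is not divisible by the number of shards, the shards differ in length by at most one sample, which is why distributed iterators need a policy for the last batch.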