========================
CODE SNIPPETS
========================
TITLE: Running the DeviceMesh 2D Setup with TorchRun
DESCRIPTION: Command-line instruction to run the 2D parallel setup with DeviceMesh using TorchRun, launching 8 processes per node.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/distributed_device_mesh.rst#2025-04-22_snippet_3

LANGUAGE: bash
CODE:
```
torchrun --nproc_per_node=8 2d_setup_with_device_mesh.py
```

----------------------------------------

TITLE: Running the HSDP Setup with TorchRun
DESCRIPTION: Command-line instruction to run the Hybrid Sharding Data Parallel (HSDP) setup with TorchRun, launching 8 processes per node.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/distributed_device_mesh.rst#2025-04-22_snippet_5

LANGUAGE: bash
CODE:
```
torchrun --nproc_per_node=8 hsdp.py
```

----------------------------------------

TITLE: Building Tutorial Documentation
DESCRIPTION: Command for building HTML version of the tutorial website without executing code examples.

SOURCE: https://github.com/pytorch/tutorials/blob/main/CONTRIBUTING.md#2025-04-22_snippet_2

LANGUAGE: bash
CODE:
```
make html-noplot
```

----------------------------------------

TITLE: Installing NUMA Control Tools
DESCRIPTION: Commands to install numactl and util-linux (which provides taskset) on Ubuntu and CentOS.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/xeon_run_cpu.rst#2025-04-22_snippet_1

LANGUAGE: console
CODE:
```
$ apt-get install numactl
```

LANGUAGE: console
CODE:
```
$ yum install numactl
```

LANGUAGE: console
CODE:
```
$ apt-get install util-linux
```

LANGUAGE: console
CODE:
```
$ yum install util-linux
```

----------------------------------------

TITLE: Installing Memory Allocators
DESCRIPTION: Commands to install TCMalloc and JeMalloc memory allocators on different platforms

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/xeon_run_cpu.rst#2025-04-22_snippet_3

LANGUAGE: console
CODE:
```
$ apt-get install google-perftools
```

LANGUAGE: console
CODE:
```
$ yum install gperftools
```

LANGUAGE: console
CODE:
```
$ conda install conda-forge::gperftools
```

LANGUAGE: console
CODE:
```
$ apt-get install libjemalloc2
```

LANGUAGE: console
CODE:
```
$ yum install jemalloc
```

LANGUAGE: console
CODE:
```
$ conda install conda-forge::jemalloc
```

----------------------------------------

TITLE: Installing Intel OpenMP Runtime Library
DESCRIPTION: Commands to install Intel OpenMP Runtime Library using pip or conda

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/xeon_run_cpu.rst#2025-04-22_snippet_2

LANGUAGE: console
CODE:
```
$ pip install intel-openmp
```

LANGUAGE: console
CODE:
```
$ conda install mkl
```

----------------------------------------

TITLE: Setting up 2D Parallel Pattern With DeviceMesh in PyTorch
DESCRIPTION: This code demonstrates how to use DeviceMesh to simplify the setup of a 2D parallel pattern. It shows how to initialize a device mesh and access the underlying process groups.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/distributed_device_mesh.rst#2025-04-22_snippet_2

LANGUAGE: python
CODE:
```
from torch.distributed.device_mesh import init_device_mesh
mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("replicate", "shard"))

# Users can access the underlying process groups through the `get_group` API.
replicate_group = mesh_2d.get_group(mesh_dim="replicate")
shard_group = mesh_2d.get_group(mesh_dim="shard")
```
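
As a rough illustration of what a (2, 4) mesh means for 8 ranks, the pure-Python sketch below lists which ranks would share a group along each mesh dimension. The helper `mesh_groups` is hypothetical and does not call PyTorch. It assumes ranks are laid out in row-major order, which is worth confirming against your PyTorch version.

```python
def mesh_groups(mesh_shape):
    """Return (replicate_groups, shard_groups) for a 2D mesh of ranks.

    Hypothetical helper for illustration only; assumes a row-major rank layout.
    """
    rows, cols = mesh_shape
    mesh = [[r * cols + c for c in range(cols)] for r in range(rows)]
    # Ranks in the same row form a group along the second ("shard") dimension.
    shard_groups = [list(row) for row in mesh]
    # Ranks in the same column form a group along the first ("replicate") dimension.
    replicate_groups = [[mesh[r][c] for r in range(rows)] for c in range(cols)]
    return replicate_groups, shard_groups


replicate_groups, shard_groups = mesh_groups((2, 4))
print("shard groups:", shard_groups)
print("replicate groups:", replicate_groups)
```

Under that assumption, each rank belongs to exactly one shard group of 4 ranks and one replicate group of 2 ranks.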

----------------------------------------

TITLE: Installing System and Python Prerequisites for WSI Analysis - Bash
DESCRIPTION: This snippet gives shell commands to install required system-level libraries (OpenJpeg, OpenSlide, Pixman) via apt-get for Linux, and the main Python dependencies (TIAToolbox <1.5, HistoEncoder) via pip. Successful installation is echoed in the terminal. Optional alternate homebrew instructions for macOS are mentioned in the surrounding text, but not directly included as a snippet. These commands are prerequisites for running subsequent code blocks that rely on TIAToolbox or handle .svs/.tif WSI data. No inputs/outputs other than terminal installation logs and confirmation message.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/tiatoolbox_tutorial.rst#2025-04-22_snippet_4

LANGUAGE: bash
CODE:
```
apt-get -y -qq install libopenjp2-7-dev libopenjp2-tools openslide-tools libpixman-1-dev
pip install -q 'tiatoolbox<1.5' histoencoder && echo "Installation is done."

```

----------------------------------------

TITLE: Installing PyTorch Dependencies
DESCRIPTION: Commands for installing NumPy and the nightly builds of PyTorch and torchvision. The snippet shown uses the CPU wheel index.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/quantized_transfer_learning_tutorial.rst#2025-04-22_snippet_0

LANGUAGE: shell
CODE:
```
pip install numpy
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
```

----------------------------------------

TITLE: Shell Commands to Build the Example LibTorch C++ Application
DESCRIPTION: This shell script demonstrates the common commands to build the C++ example application using CMake. It requires that CMake and LibTorch are properly installed. /path/to/libtorch must be the full path to the LibTorch directory. The commands create a build directory, run cmake to configure with the LibTorch prefix, and build the application in Release mode.

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/cpp_export.rst#2025-04-22_snippet_6

LANGUAGE: sh
CODE:
```
mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
cmake --build . --config Release
```

----------------------------------------

TITLE: Installing PyTorch Dependencies
DESCRIPTION: Command to install the latest version of PyTorch and related packages using pip.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/FSDP_advanced_tutorial.rst#2025-04-22_snippet_0

LANGUAGE: bash
CODE:
```
pip3 install torch torchvision torchaudio
```

----------------------------------------

TITLE: Adding PyTorch Entry Point in torch_npu setup (Diff)
DESCRIPTION: Shows a diff for torch_npu's setup.py to add the entry point required for device backend autoloading, specifying torch_npu's _autoload function. This ensures the extension can be discovered and loaded automatically by PyTorch at runtime. Requires setup function from setuptools and the _autoload function defined in torch_npu. The snippet documents only the modifications relevant to extension registration.

SOURCE: https://github.com/pytorch/tutorials/blob/main/prototype_source/python_extension_autoload.rst#2025-04-22_snippet_6

LANGUAGE: diff
CODE:
```
setup(
    name="torch_npu",
    version="2.5",
+   entry_points={
+       'torch.backends': [
+           'torch_npu = torch_npu:_autoload',
+       ],
+   }
)
```

----------------------------------------

TITLE: Running the 2D Setup with TorchRun
DESCRIPTION: Command-line instruction to run the 2D parallel setup using TorchRun (PyTorch Elastic). It launches 8 processes per node and specifies a rendezvous ID and endpoint.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/distributed_device_mesh.rst#2025-04-22_snippet_1

LANGUAGE: bash
CODE:
```
torchrun --nproc_per_node=8 --rdzv_id=100 --rdzv_endpoint=localhost:29400 2d_setup.py
```

----------------------------------------

TITLE: Building Single Tutorial with Gallery Pattern
DESCRIPTION: Commands demonstrating how to build a specific tutorial using the GALLERY_PATTERN environment variable.

SOURCE: https://github.com/pytorch/tutorials/blob/main/README.md#2025-04-22_snippet_1

LANGUAGE: bash
CODE:
```
GALLERY_PATTERN="neural_style_transfer_tutorial.py" make html
```

LANGUAGE: bash
CODE:
```
GALLERY_PATTERN="neural_style_transfer_tutorial.py" sphinx-build . _build
```

----------------------------------------

TITLE: Running PyTorch Inference with run_cpu Script
DESCRIPTION: Example commands for running PyTorch inference in different configurations using the run_cpu script

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/xeon_run_cpu.rst#2025-04-22_snippet_4

LANGUAGE: console
CODE:
```
$ python -m torch.backends.xeon.run_cpu --ninstances 1 --ncores-per-instance 1 <program.py> [program_args]
```

LANGUAGE: console
CODE:
```
$ python -m torch.backends.xeon.run_cpu --node-id 0 <program.py> [program_args]
```

LANGUAGE: console
CODE:
```
$ python -m torch.backends.xeon.run_cpu --ninstances 8 --ncores-per-instance 14 <program.py> [program_args]
```

LANGUAGE: console
CODE:
```
$ python -m torch.backends.xeon.run_cpu --throughput-mode <program.py> [program_args]
```

LANGUAGE: console
CODE:
```
$ python -m torch.backends.xeon.run_cpu -h
usage: run_cpu.py [-h] [--multi-instance] [-m] [--no-python] [--enable-tcmalloc] [--enable-jemalloc] [--use-default-allocator] [--disable-iomp] [--ncores-per-instance] [--ninstances] [--skip-cross-node-cores] [--rank] [--latency-mode] [--throughput-mode] [--node-id] [--use-logical-core] [--disable-numactl] [--disable-taskset] [--core-list] [--log-path] [--log-file-prefix] <program> [program_args]
```

----------------------------------------

TITLE: Specifying the Entry Point in setup.py for Autoloading (Python)
DESCRIPTION: Shows how to add an entry_points section to the setup() function in setup.py so that the package registers itself as a PyTorch backend. This informs PyTorch to call the specified function (here, _autoload) via the entry point mechanism. Requires setuptools and PyTorch installation. The 'torch.backends' entry ensures the specified module:function is discoverable by PyTorch's autoload machinery. The parameters include the package name, version, and entry_points dictionary.

SOURCE: https://github.com/pytorch/tutorials/blob/main/prototype_source/python_extension_autoload.rst#2025-04-22_snippet_1

LANGUAGE: python
CODE:
```
setup(
    name="torch_foo",
    version="1.0",
    entry_points={
        "torch.backends": [
            "torch_foo = torch_foo:_autoload",
        ],
    }
)
```

----------------------------------------

TITLE: Building and Running the PyTorch C++ DCGAN Example
DESCRIPTION: Contains shell commands demonstrating how to build the DCGAN C++ example using `make` and then run the resulting executable (`./dcgan`). The included sample output verifies that the program runs and successfully loads data batches from the MNIST dataset, printing batch sizes and labels as defined in the iteration loop. Requires `make`, a C++ toolchain, and a configured build system (likely CMake).

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/cpp_frontend.rst#2025-04-22_snippet_22

LANGUAGE: shell
CODE:
```
  root@fa350df05ecf:/home/build# make
  Scanning dependencies of target dcgan
  [ 50%] Building CXX object CMakeFiles/dcgan.dir/dcgan.cpp.o
  [100%] Linking CXX executable dcgan
  [100%] Built target dcgan
  root@fa350df05ecf:/home/build# make
  [100%] Built target dcgan
  root@fa350df05ecf:/home/build# ./dcgan
  Batch size: 64 | Labels: 5 2 6 7 2 1 6 7 0 1 6 2 3 6 9 1 8 4 0 6 5 3 3 0 4 6 6 6 4 0 8 6 0 6 9 2 4 0 2 8 6 3 3 2 9 2 0 1 4 2 3 4 8 2 9 9 3 5 8 0 0 7 9 9
  Batch size: 64 | Labels: 2 2 4 7 1 2 8 8 6 9 0 2 2 9 3 6 1 3 8 0 4 4 8 8 8 9 2 6 4 7 1 5 0 9 7 5 4 3 5 4 1 2 8 0 7 1 9 6 1 6 5 3 4 4 1 2 3 2 3 5 0 1 6 2
  Batch size: 64 | Labels: 4 5 4 2 1 4 8 3 8 3 6 1 5 4 3 6 2 2 5 1 3 1 5 0 8 2 1 5 3 2 4 4 5 9 7 2 8 9 2 0 6 7 4 3 8 3 5 8 8 3 0 5 8 0 8 7 8 5 5 6 1 7 8 0
  Batch size: 64 | Labels: 3 3 7 1 4 1 6 1 0 3 6 4 0 2 5 4 0 4 2 8 1 9 6 5 1 6 3 2 8 9 2 3 8 7 4 5 9 6 0 8 3 0 0 6 4 8 2 5 4 1 8 3 7 8 0 0 8 9 6 7 2 1 4 7
  Batch size: 64 | Labels: 3 0 5 5 9 8 3 9 8 9 5 9 5 0 4 1 2 7 7 2 0 0 5 4 8 7 7 6 1 0 7 9 3 0 6 3 2 6 2 7 6 3 3 4 0 5 8 8 9 1 9 2 1 9 4 4 9 2 4 6 2 9 4 0
  Batch size: 64 | Labels: 9 6 7 5 3 5 9 0 8 6 6 7 8 2 1 9 8 8 1 1 8 2 0 7 1 4 1 6 7 5 1 7 7 4 0 3 2 9 0 6 6 3 4 4 8 1 2 8 6 9 2 0 3 1 2 8 5 6 4 8 5 8 6 2
  Batch size: 64 | Labels: 9 3 0 3 6 5 1 8 6 0 1 9 9 1 6 1 7 7 4 4 4 7 8 8 6 7 8 2 6 0 4 6 8 2 5 3 9 8 4 0 9 9 3 7 0 5 8 2 4 5 6 2 8 2 5 3 7 1 9 1 8 2 2 7
  Batch size: 64 | Labels: 9 1 9 2 7 2 6 0 8 6 8 7 7 4 8 6 1 1 6 8 5 7 9 1 3 2 0 5 1 7 3 1 6 1 0 8 6 0 8 1 0 5 4 9 3 8 5 8 4 8 0 1 2 6 2 4 2 7 7 3 7 4 5 3
  Batch size: 64 | Labels: 8 8 3 1 8 6 4 2 9 5 8 0 2 8 6 6 7 0 9 8 3 8 7 1 6 6 2 7 7 4 5 5 2 1 7 9 5 4 9 1 0 3 1 9 3 9 8 8 5 3 7 5 3 6 8 9 4 2 0 1 2 5 4 7
  Batch size: 64 | Labels: 9 2 7 0 8 4 4 2 7 5 0 0 6 2 0 5 9 5 9 8 8 9 3 5 7 5 4 7 3 0 5 7 6 5 7 1 6 2 8 7 6 3 2 6 5 6 1 2 7 7 0 0 5 9 0 0 9 1 7 8 3 2 9 4
  Batch size: 64 | Labels: 7 6 5 7 7 5 2 2 4 9 9 4 8 7 4 8 9 4 5 7 1 2 6 9 8 5 1 2 3 6 7 8 1 1 3 9 8 7 9 5 0 8 5 1 8 7 2 6 5 1 2 0 9 7 4 0 9 0 4 6 0 0 8 6
  ...
```

----------------------------------------

TITLE: Installing Vulkan SDK on macOS (Shell)
DESCRIPTION: Shell commands to navigate to the Vulkan SDK root directory, source the environment setup script, and run the Python installation script for the Vulkan SDK on macOS. Requires the Vulkan SDK to be downloaded and unpacked, and the `VULKAN_SDK_ROOT` environment variable to be set.

SOURCE: https://github.com/pytorch/tutorials/blob/main/prototype_source/vulkan_workflow.rst#2025-04-22_snippet_0

LANGUAGE: shell
CODE:
```
cd $VULKAN_SDK_ROOT
source setup-env.sh
sudo python install_vulkan.py
```

----------------------------------------

TITLE: FSDP Main Training Setup
DESCRIPTION: Main function that sets up FSDP training environment, including data loading, model wrapping, and training initialization.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/FSDP_tutorial.rst#2025-04-22_snippet_5

LANGUAGE: python
CODE:
```
def fsdp_main(rank, world_size, args):
    setup(rank, world_size)

    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])

    dataset1 = datasets.MNIST('../data', train=True, download=True,
                        transform=transform)
    dataset2 = datasets.MNIST('../data', train=False,
                        transform=transform)

    sampler1 = DistributedSampler(dataset1, rank=rank, num_replicas=world_size, shuffle=True)
    sampler2 = DistributedSampler(dataset2, rank=rank, num_replicas=world_size)

    train_kwargs = {'batch_size': args.batch_size, 'sampler': sampler1}
    test_kwargs = {'batch_size': args.test_batch_size, 'sampler': sampler2}
    cuda_kwargs = {'num_workers': 2,
                    'pin_memory': True,
                    'shuffle': False}
    train_kwargs.update(cuda_kwargs)
    test_kwargs.update(cuda_kwargs)

    train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)
    my_auto_wrap_policy = functools.partial(
        size_based_auto_wrap_policy, min_num_params=100
    )
    torch.cuda.set_device(rank)
    init_start_event = torch.cuda.Event(enable_timing=True)
```
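
The `DistributedSampler` calls above give each rank its own slice of the dataset. The pure-Python sketch below approximates that partitioning for the non-shuffled case. `shard_indices` is a hypothetical helper, not the library implementation, and it assumes padding by repeating the first indices so that every rank receives the same number of samples.

```python
import math


def shard_indices(dataset_len, num_replicas, rank):
    """Indices one rank would see, without shuffling (illustrative sketch)."""
    # Pad so every rank gets the same number of samples.
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    indices += indices[: total_size - dataset_len]
    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank:total_size:num_replicas]


for rank in range(4):
    print(rank, shard_indices(10, num_replicas=4, rank=rank))
```

With 10 samples and 4 ranks, two indices are repeated so that each rank gets 3 samples.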

----------------------------------------

TITLE: Adding PyTorch Entry Point to setup() for habana_frameworks Extension (Diff)
DESCRIPTION: Presents a diff showing how to add an entry_points section to setup.py in the habana_frameworks package. This registers 'device_backend' as an entry point, mapped to the __autoload function, allowing PyTorch to autoload the Intel Gaudi HPU backend. The snippet is written in diff format to reflect the required changes. It requires the setup function from setuptools and that the entry point module and function are available. This modification is essential for making the extension compatible with the PyTorch autoload feature.

SOURCE: https://github.com/pytorch/tutorials/blob/main/prototype_source/python_extension_autoload.rst#2025-04-22_snippet_3

LANGUAGE: diff
CODE:
```
setup(
    name="habana_frameworks",
    version="2.5",
+   entry_points={
+       'torch.backends': [
+           "device_backend = habana_frameworks:__autoload",
+       ],
+   }
)
```

----------------------------------------

TITLE: Installing TorchVision (Shell/Pip)
DESCRIPTION: Command to install the `torchvision` library using pip. TorchVision provides access to popular datasets, model architectures, and common image transformations for computer vision, needed here to get a pretrained model.

SOURCE: https://github.com/pytorch/tutorials/blob/main/prototype_source/vulkan_workflow.rst#2025-04-22_snippet_6

LANGUAGE: shell
CODE:
```
pip install torchvision
```

----------------------------------------

TITLE: Python Setup Configuration
DESCRIPTION: Python setup script for building the C++ extension using PyTorch's cpp_extension module.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/process_group_cpp_extension_tutorial.rst#2025-04-22_snippet_3

LANGUAGE: Python
CODE:
```
# file name: setup.py
import os
```

----------------------------------------

TITLE: Simplified Process Group Initialization with torchrun
DESCRIPTION: Demonstrates the simplified process group initialization using torchrun compared to manual setup.

SOURCE: https://github.com/pytorch/tutorials/blob/main/beginner_source/ddp_series_fault_tolerance.rst#2025-04-22_snippet_1

LANGUAGE: diff
CODE:
```
- def ddp_setup(rank, world_size):
+ def ddp_setup():
-     """
-     Args:
-         rank: Unique identifier of each process
-         world_size: Total number of processes
-     """
-     os.environ["MASTER_ADDR"] = "localhost"
-     os.environ["MASTER_PORT"] = "12355"
-     init_process_group(backend="nccl", rank=rank, world_size=world_size)
+     init_process_group(backend="nccl")
     torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
```
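
The simplified `ddp_setup` works because torchrun exports the rank information as environment variables. A minimal sketch of reading them is shown below. `read_torchrun_env` is a hypothetical helper, and the single-process defaults are an assumption added so the sketch also runs outside torchrun.

```python
import os


def read_torchrun_env(environ=None):
    """Collect rank information from torchrun-style environment variables."""
    env = os.environ if environ is None else environ
    # Defaults describe a single-process run; they are an assumption for illustration.
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


info = read_torchrun_env({"RANK": "5", "LOCAL_RANK": "1", "WORLD_SIZE": "8"})
print(info)
```

Passing a plain dictionary, as in the usage line, makes the helper easy to try without launching a distributed job.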

----------------------------------------

TITLE: Folder Layouts for LibTorch and Example Application (Shell)
DESCRIPTION: These shell code listings present typical directory layouts for a LibTorch installation and an example C++ application project. They aid in understanding file/folder placement and are useful for configuring build systems or referencing header/library locations. The structure distinguishes between library files and the application source.

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/cpp_export.rst#2025-04-22_snippet_5

LANGUAGE: sh
CODE:
```
libtorch/
  bin/
  include/
  lib/
  share/
```

LANGUAGE: sh
CODE:
```
example-app/
  CMakeLists.txt
  example-app.cpp
```

----------------------------------------

TITLE: Building a PyTorch C++ Extension with setuptools in Python
DESCRIPTION: This Python snippet provides an example setup.py script for building a PyTorch C++ extension (out-of-tree backend) using setuptools and torch.utils.cpp_extension. It specifies the package name, C++ source files, include directories, compiler and linker flags, and custom build extensions. The example assumes existence of variables such as torch_xla_sources, include_dirs, and extra_compile_args, which must be defined as required by your backend.

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/extend_dispatcher.rst#2025-04-22_snippet_6

LANGUAGE: Python
CODE:
```
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='torch_xla',
    ext_modules=[
        CppExtension(
            '_XLAC',
            torch_xla_sources,
            include_dirs=include_dirs,
            extra_compile_args=extra_compile_args,
            library_dirs=library_dirs,
            extra_link_args=extra_link_args + \
                [make_relative_rpath('torch_xla/lib')],
        ),
    ],
    cmdclass={
        'build_ext': Build,  # Build is a derived class of BuildExtension
    }
    # more configs...
)

```

----------------------------------------

TITLE: Distributed Training Setup Functions
DESCRIPTION: Helper functions to initialize and cleanup distributed training process groups for FSDP implementation.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/FSDP_tutorial.rst#2025-04-22_snippet_1

LANGUAGE: python
CODE:
```
def setup(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12355'
    # initialize the process group
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

def cleanup():
    dist.destroy_process_group()
```

----------------------------------------

TITLE: Installing PyTorch and Intel GPU Backend Dependencies (bash)
DESCRIPTION: This command installs the required PyTorch stack and Triton backend for Intel GPUs from the official index. It is a prerequisite for running all subsequent Python code in the tutorial. Dependencies installed include torch, torchvision, torchaudio, and pytorch-triton-xpu.

SOURCE: https://github.com/pytorch/tutorials/blob/main/prototype_source/pt2e_quant_xpu_inductor.rst#2025-04-22_snippet_0

LANGUAGE: bash
CODE:
```
pip3 install torch torchvision torchaudio pytorch-triton-xpu --index-url https://download.pytorch.org/whl/xpu
```

----------------------------------------

TITLE: Running QAT Example with Inductor Freezing Enabled
DESCRIPTION: Example command to run the Quantization-Aware Training example with the Inductor freezing feature enabled. This is necessary since the freezing feature is not enabled by default in PyTorch.

SOURCE: https://github.com/pytorch/tutorials/blob/main/prototype_source/pt2e_quant_x86_inductor.rst#2025-04-22_snippet_11

LANGUAGE: bash
CODE:
```
TORCHINDUCTOR_FREEZING=1 python example_x86inductorquantizer_qat.py
```

----------------------------------------

TITLE: Installing PyTorch for AWS Graviton
DESCRIPTION: Command to install PyTorch, which supports AWS Graviton3 optimizations starting with version 2.0.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/inference_tuning_on_aws_graviton.rst#2025-04-22_snippet_0

LANGUAGE: bash
CODE:
```
python3 -m pip install torch
```

----------------------------------------

TITLE: Training ResNet50 with FP32 using Intel Extension for PyTorch
DESCRIPTION: Demonstrates how to train a ResNet50 model on CIFAR10 dataset using FP32 precision with Intel Extension for PyTorch backend. Includes data loading, model setup, optimization and training loop implementation.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/torch_compile_backend_ipex.rst#2025-04-22_snippet_0

LANGUAGE: python
CODE:
```
import torch
import torchvision

LR = 0.001
DOWNLOAD = True
DATA = 'datasets/cifar10/'

transform = torchvision.transforms.Compose([
  torchvision.transforms.Resize((224, 224)),
  torchvision.transforms.ToTensor(),
  torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_dataset = torchvision.datasets.CIFAR10(
  root=DATA,
  train=True,
  transform=transform,
  download=DOWNLOAD,
)
train_loader = torch.utils.data.DataLoader(
  dataset=train_dataset,
  batch_size=128
)

model = torchvision.models.resnet50()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = LR, momentum=0.9)
model.train()

import intel_extension_for_pytorch as ipex

# Invoke the following API optionally, to apply frontend optimizations
model, optimizer = ipex.optimize(model, optimizer=optimizer)

compile_model = torch.compile(model, backend="ipex")

for batch_idx, (data, target) in enumerate(train_loader):
    optimizer.zero_grad()
    output = compile_model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
```

----------------------------------------

TITLE: Example Output of CommDebugMode with MLPModule
DESCRIPTION: This shows sample output from CommDebugMode when applied to an MLPModule at noise level 0. It displays the collective operation counts at module level, showing where operations like all_reduce occur in the forward pass of the model.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/distributed_comm_debug_mode.rst#2025-04-22_snippet_1

LANGUAGE: text
CODE:
```
Expected Output:
    Global
      FORWARD PASS
        *c10d_functional.all_reduce: 1
        MLPModule
          FORWARD PASS
            *c10d_functional.all_reduce: 1
            MLPModule.net1
            MLPModule.relu
            MLPModule.net2
              FORWARD PASS
                *c10d_functional.all_reduce: 1
```

----------------------------------------

TITLE: Verifying PyTorch Installation (Python)
DESCRIPTION: A simple Python script to import the `torch` library and print its version number. This is used to verify that PyTorch has been successfully built and installed.

SOURCE: https://github.com/pytorch/tutorials/blob/main/prototype_source/vulkan_workflow.rst#2025-04-22_snippet_3

LANGUAGE: python
CODE:
```
import torch
print(torch.__version__)
```

----------------------------------------

TITLE: Data Loading and Transformation Setup
DESCRIPTION: Configures data loading pipelines with transformations for training and validation datasets using torchvision.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/quantized_transfer_learning_tutorial.rst#2025-04-22_snippet_1

LANGUAGE: python
CODE:
```
import os  # needed for os.path.join below
import torch
from torchvision import transforms, datasets

data_transforms = {
    'train': transforms.Compose([
        transforms.Resize(224),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'data/hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=16,
                                              shuffle=True, num_workers=8)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
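
The `Normalize` step above rescales each channel as (value - mean) / std. The sketch below applies that arithmetic to a single RGB pixel in plain Python, using the same mean and standard deviation values as the snippet. `normalize_pixel` is a hypothetical helper for illustration and does not use torchvision.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# A pixel equal to the per-channel mean maps to zeros.
print(normalize_pixel(MEAN))
```

A pixel one standard deviation above the mean in a channel maps to roughly 1.0 in that channel.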

----------------------------------------

TITLE: FSDP Training Setup and Imports
DESCRIPTION: Imports required packages for FSDP training including PyTorch core libraries, transformers, and distributed training utilities.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/FSDP_advanced_tutorial.rst#2025-04-22_snippet_1

LANGUAGE: python
CODE:
```
import os
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from transformers import AutoTokenizer, GPT2TokenizerFast
from transformers import T5Tokenizer, T5ForConditionalGeneration
import functools
from torch.optim.lr_scheduler import StepLR
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data.distributed import DistributedSampler
from transformers.models.t5.modeling_t5 import T5Block

from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
 checkpoint_wrapper,
 CheckpointImpl,
 apply_activation_checkpointing_wrapper)

from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    BackwardPrefetch,
    ShardingStrategy,
    FullStateDictConfig,
    StateDictType,
)
from torch.distributed.fsdp.wrap import (
    transformer_auto_wrap_policy,
    enable_wrap,
    wrap,
)
from functools import partial
from torch.utils.data import DataLoader
from pathlib import Path
from summarization_dataset import *
from typing import Type
import time
import tqdm
from datetime import datetime
```

----------------------------------------

TITLE: Installing LibTorch Dependencies in Shell
DESCRIPTION: Downloads and extracts the LibTorch distribution for CPU usage on Ubuntu Linux.

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/cpp_frontend.rst#2025-04-22_snippet_0

LANGUAGE: shell
CODE:
```
wget https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip
unzip libtorch-shared-with-deps-latest.zip
```

----------------------------------------

TITLE: HuggingFace T5 Model Setup
DESCRIPTION: Function to initialize the T5 model and tokenizer from HuggingFace pretrained models.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/FSDP_advanced_tutorial.rst#2025-04-22_snippet_3

LANGUAGE: python
CODE:
```
def setup_model(model_name):
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    tokenizer = T5Tokenizer.from_pretrained(model_name)
    return model, tokenizer
```

----------------------------------------

TITLE: Referencing C++ Custom Operator Example in PyTorch
DESCRIPTION: Example of how to reference the C++ custom operator tutorial in PyTorch documentation. This shows the syntax for referencing other documentation pages.

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/custom_ops_landing_page.rst#2025-04-22_snippet_1

LANGUAGE: rst
CODE:
```
:ref:`cpp-custom-ops-tutorial`
```

----------------------------------------

TITLE: Setup Script for C++ Extension
DESCRIPTION: Python setup.py script to build the C++ extension using setuptools and torch.utils.cpp_extension.

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/cpp_extension.rst#2025-04-22_snippet_2

LANGUAGE: python
CODE:
```
from setuptools import setup, Extension
from torch.utils import cpp_extension

setup(name='lltm_cpp',
      ext_modules=[cpp_extension.CppExtension('lltm_cpp', ['lltm.cpp'])],
      cmdclass={'build_ext': cpp_extension.BuildExtension})
```

----------------------------------------

TITLE: Referencing Python Custom Operator Example in PyTorch
DESCRIPTION: Example of how to reference the Python custom operator tutorial in PyTorch documentation. This shows the syntax for referencing other documentation pages.

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/custom_ops_landing_page.rst#2025-04-22_snippet_0

LANGUAGE: rst
CODE:
```
:ref:`python-custom-ops-tutorial`
```

----------------------------------------

TITLE: Implementing Dynamic Neural Network with Control Flow in PyTorch
DESCRIPTION: A PyTorch implementation that demonstrates control flow and weight sharing in neural networks. This example creates a model that dynamically chooses between 3rd, 4th, or 5th order polynomials during each forward pass.

SOURCE: https://github.com/pytorch/tutorials/blob/main/beginner_source/pytorch_with_examples.rst#2025-04-22_snippet_7

LANGUAGE: python
CODE:
```
# -*- coding: utf-8 -*-
import random
import torch
import math


class DynamicNet(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate five parameters and assign them as members.
        """
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose a polynomial of
        order 3, 4, or 5 and reuse the e parameter for the 4th- and 5th-order terms.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same parameter many
        times when defining a computational graph.
        """
        y = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3
        for exp in range(4, random.randint(4, 6)):
            y = y + self.e * x ** exp
        return y

    def string(self):
        """
        Just like any class in Python, you can also define custom methods on PyTorch modules.
        """
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()} x^5 ?'


# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = DynamicNet()

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)
for t in range(30000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    if t % 2000 == 1999:
        print(t, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
```
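
The loop bound in `forward` is easy to misread, so here is a minimal sketch, using only the standard library, that enumerates which extra exponents `range(4, random.randint(4, 6))` can produce. The helper name `extra_exponents` is hypothetical and is not part of the tutorial.

```python
import random


def extra_exponents(upper):
    """Exponents visited by `range(4, upper)`, as in DynamicNet.forward."""
    return list(range(4, upper))


# random.randint(4, 6) is inclusive on both ends, so upper is 4, 5, or 6.
possible = {upper: extra_exponents(upper) for upper in (4, 5, 6)}
for upper, exps in possible.items():
    # With no extra exponents the model stays a 3rd-order polynomial.
    order = exps[-1] if exps else 3
    print(f"upper={upper}: extra exponents {exps}, polynomial order {order}")

# Sample the real expression repeatedly to confirm no other case appears.
random.seed(0)
sampled = {tuple(range(4, random.randint(4, 6))) for _ in range(1000)}
```

An upper bound of 4 gives an empty range, which is why the model is sometimes a plain cubic.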

----------------------------------------

TITLE: Adding Gallery Items for PyTorch Neural Network Examples
DESCRIPTION: This code snippet adds gallery items for two PyTorch tutorial examples: a polynomial module and a dynamic network. It uses reStructuredText directives to include these examples in a gallery display.

SOURCE: https://github.com/pytorch/tutorials/blob/main/beginner_source/pytorch_with_examples.rst#2025-04-22_snippet_8

LANGUAGE: reStructuredText
CODE:
```
.. galleryitem:: /beginner/examples_nn/polynomial_module.py

.. galleryitem:: /beginner/examples_nn/dynamic_net.py

.. raw:: html

    <div style='clear:both'></div>
```

----------------------------------------

TITLE: Implementing Polynomial Fitting with PyTorch Optimizer
DESCRIPTION: A PyTorch implementation that uses the optim package to update model parameters. This example demonstrates how to use built-in optimizers like RMSprop instead of manually implementing gradient descent.

SOURCE: https://github.com/pytorch/tutorials/blob/main/beginner_source/pytorch_with_examples.rst#2025-04-22_snippet_5

LANGUAGE: python
CODE:
```
# -*- coding: utf-8 -*-
import torch
import math

# Create Tensors to hold input and outputs.
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Prepare the input tensor (x, x^2, x^3).
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use RMSprop; the optim package contains many other
# optimization algorithms. The first argument to the RMSprop constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()


linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
```
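
To make the optimizer less of a black box, the following is a minimal pure-Python sketch of an RMSprop-style update for a single scalar parameter. It assumes the commonly documented form of the rule (a running average of squared gradients with smoothing constant `alpha` and a small `eps`) and leaves out momentum, centering, and weight decay. `rmsprop_step` is a hypothetical helper for illustration, not the `torch.optim.RMSprop` implementation, and the toy objective is an assumption.

```python
import math


def rmsprop_step(param, grad, square_avg, lr=1e-3, alpha=0.99, eps=1e-8):
    """One RMSprop-style update for a scalar; returns (new_param, new_square_avg)."""
    # Exponential moving average of the squared gradient.
    square_avg = alpha * square_avg + (1 - alpha) * grad * grad
    # Scale the step by the root of that average.
    param = param - lr * grad / (math.sqrt(square_avg) + eps)
    return param, square_avg


# Toy objective (assumed for illustration): f(w) = (w - 3)^2, gradient 2 * (w - 3).
w, square_avg = 0.0, 0.0
for _ in range(2000):
    grad = 2.0 * (w - 3.0)
    w, square_avg = rmsprop_step(w, grad, square_avg, lr=1e-2)
print(w)
```

Because each step is divided by the running gradient magnitude, the step size depends far less on the raw gradient scale than it does in plain gradient descent.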

----------------------------------------

TITLE: Starting the Flask Server for the Image Classifier API - Shell
DESCRIPTION: Command for launching the Flask app from the shell. FLASK_APP is set to 'app.py', and flask run starts the web server (default on port 5000). Requires Flask to be installed. Expects 'app.py' to be present in the working directory.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/deployment_with_flask.rst#2025-04-22_snippet_6

LANGUAGE: shell
CODE:
```
FLASK_APP=app.py flask run
```

----------------------------------------

TITLE: Installing Holistic Trace Analysis using pip
DESCRIPTION: Command to install the HolisticTraceAnalysis package using pip. This is the primary installation method for HTA.

SOURCE: https://github.com/pytorch/tutorials/blob/main/beginner_source/hta_intro_tutorial.rst#2025-04-22_snippet_0

LANGUAGE: shell
CODE:
```
pip install HolisticTraceAnalysis
```

----------------------------------------

TITLE: Bundling Example Inputs to Scripted Model - PyTorch - Python
DESCRIPTION: Demonstrates how to create a list of example inputs (for 'forward') and attach them to a TorchScript module using the bundle_inputs utility. The sample input tuple must match the model input signature. This step creates a bundled_model with embedded sample inputs, suitable for later retrieval or testing. bundle_inputs comes from torch.utils.bundled_inputs; ensure it is imported.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/bundled_inputs.rst#2025-04-22_snippet_1

LANGUAGE: python
CODE:
```
# For each method create a list of inputs and each input is a tuple of arguments
sample_input = [(torch.zeros(1,10),)]

# Create model with bundled inputs, if type(input) is list then the input is bundled to 'forward'
bundled_model = bundle_inputs(scripted_module, sample_input)
```

----------------------------------------

TITLE: Installing Intel Neural Compressor
DESCRIPTION: Commands for installing Intel Neural Compressor from pip or conda. Supports Python versions 3.6, 3.7, 3.8, and 3.9.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/intel_neural_compressor_for_pytorch.rst#2025-04-22_snippet_0

LANGUAGE: bash
CODE:
```
# install stable version from pip
pip install neural-compressor

# install nightly version from pip
pip install -i https://test.pypi.org/simple/ neural-compressor

# install stable version from conda
conda install neural-compressor -c conda-forge -c intel
```

----------------------------------------

TITLE: Inference with ResNet50 in FP32 using Intel Extension for PyTorch
DESCRIPTION: Shows how to perform inference using a pre-trained ResNet50 model with FP32 precision using Intel Extension for PyTorch backend. Includes model optimization and inference setup.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/torch_compile_backend_ipex.rst#2025-04-22_snippet_2

LANGUAGE: python
CODE:
```
import torch
import torchvision.models as models

model = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model.eval()
data = torch.rand(1, 3, 224, 224)

import intel_extension_for_pytorch as ipex

# Invoke the following API optionally, to apply frontend optimizations
model = ipex.optimize(model, weights_prepack=False)

compile_model = torch.compile(model, backend="ipex")

with torch.no_grad():
    compile_model(data)
```

----------------------------------------

TITLE: Implementing Polynomial Fitting with NumPy
DESCRIPTION: A numpy implementation of fitting a third-order polynomial to a sine function. This example manually implements both the forward and backward passes through the network using numpy operations.

SOURCE: https://github.com/pytorch/tutorials/blob/main/beginner_source/pytorch_with_examples.rst#2025-04-22_snippet_0

LANGUAGE: python
CODE:
```
# -*- coding: utf-8 -*-
import numpy as np
import math

# Create input and output data: 2000 evenly spaced points on [-pi, pi]
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# Randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    # y = a + b * x + c * x^2 + d * x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')
```
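
One way to sanity-check hand-written gradients like these is to compare them against central finite differences. The sketch below does that in plain Python on a small, fixed data set. The helper name `loss_and_grads`, the 20-point grid, and the weight values are illustrative assumptions, not part of the tutorial.

```python
import math


def loss_and_grads(a, b, c, d, xs, ys):
    """Sum-of-squares loss and its analytic gradients for a cubic model."""
    loss = 0.0
    grad_a = grad_b = grad_c = grad_d = 0.0
    for x, y in zip(xs, ys):
        y_pred = a + b * x + c * x ** 2 + d * x ** 3
        loss += (y_pred - y) ** 2
        grad_y_pred = 2.0 * (y_pred - y)
        grad_a += grad_y_pred
        grad_b += grad_y_pred * x
        grad_c += grad_y_pred * x ** 2
        grad_d += grad_y_pred * x ** 3
    return loss, (grad_a, grad_b, grad_c, grad_d)


# A small grid on [-pi, pi] and arbitrary weights (illustrative values only).
xs = [-math.pi + i * (2 * math.pi / 19) for i in range(20)]
ys = [math.sin(x) for x in xs]
weights = [0.1, -0.2, 0.3, -0.4]

_, analytic = loss_and_grads(*weights, xs, ys)

# Central finite differences: (L(w + h) - L(w - h)) / (2h) for each weight.
h = 1e-5
numeric = []
for i in range(4):
    plus = list(weights)
    plus[i] += h
    minus = list(weights)
    minus[i] -= h
    loss_plus = loss_and_grads(*plus, xs, ys)[0]
    loss_minus = loss_and_grads(*minus, xs, ys)[0]
    numeric.append((loss_plus - loss_minus) / (2 * h))

max_rel_err = max(abs(n - g) / max(1.0, abs(g)) for n, g in zip(numeric, analytic))
print(f"max relative error: {max_rel_err:.2e}")
```

The loss is quadratic in each weight, so the two estimates should agree up to floating-point rounding.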

----------------------------------------

TITLE: Running DDP with a Model Parallel Architecture Example - PyTorch - Python
DESCRIPTION: Demonstrates initialization and training of a DDP-wrapped model-parallel network. Sets up device allocation per process, wraps the custom multi-GPU ToyMpModel in DistributedDataParallel, and walks through optimizer and loss setup, forward and backward passes, and process group cleanup. Assumes a distributed context with known rank and world_size and two GPUs per process. The inputs and targets are randomly generated. Dependencies: torch, torch.nn, torch.optim, setup, cleanup functions.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/ddp_tutorial.rst#2025-04-22_snippet_5

LANGUAGE: python
CODE:
```
def demo_model_parallel(rank, world_size):
    print(f"Running DDP with model parallel example on rank {rank}.")
    setup(rank, world_size)

    # setup mp_model and devices for this process
    dev0 = rank * 2
    dev1 = rank * 2 + 1
    mp_model = ToyMpModel(dev0, dev1)
    ddp_mp_model = DDP(mp_model)

    loss_fn = nn.MSELoss()
    optimizer = optim.SGD(ddp_mp_model.parameters(), lr=0.001)

    optimizer.zero_grad()
    # outputs will be on dev1
    outputs = ddp_mp_model(torch.randn(20, 10))
    labels = torch.randn(20, 5).to(dev1)
    loss_fn(outputs, labels).backward()
    optimizer.step()

    cleanup()
    print(f"Finished running DDP with model parallel example on rank {rank}.")
```
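
The device assignment above implies that each rank owns two consecutive GPU indices, so a run needs `2 * world_size` GPUs. The sketch below shows that mapping in plain Python; `devices_for_rank` is a hypothetical helper and the `world_size` value is illustrative.

```python
def devices_for_rank(rank):
    """Return the (dev0, dev1) GPU indices a rank uses, as in demo_model_parallel."""
    return rank * 2, rank * 2 + 1


world_size = 2  # illustrative value
mapping = {rank: devices_for_rank(rank) for rank in range(world_size)}
required_gpus = 2 * world_size
print(mapping, required_gpus)
```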

----------------------------------------

TITLE: Building PyTorch Extension with setuptools
DESCRIPTION: Terminal output showing the build process of the custom operator using setup.py. Demonstrates compilation, linking and installation steps.

SOURCE: https://github.com/pytorch/tutorials/blob/main/advanced_source/torch_script_custom_ops.rst#2025-04-22_snippet_26

LANGUAGE: shell
CODE:
```
$ python setup.py build develop
running build
running build_ext
building 'warp_perspective' extension
creating build
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /root/local/miniconda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/root/local/miniconda/lib/python3.7/site-packages/torch/lib/include -I/root/local/miniconda/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/root/local/miniconda/lib/python3.7/site-packages/torch/lib/include/TH -I/root/local/miniconda/lib/python3.7/site-packages/torch/lib/include/THC -I/root/local/miniconda/include/python3.7m -c op.cpp -o build/temp.linux-x86_64-3.7/op.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=warp_perspective -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
creating build/lib.linux-x86_64-3.7
g++ -pthread -shared -B /root/local/miniconda/compiler_compat -L/root/local/miniconda/lib -Wl,-rpath=/root/local/miniconda/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/op.o -lopencv_core -lopencv_imgproc -o build/lib.linux-x86_64-3.7/warp_perspective.so
running develop
running egg_info
creating warp_perspective.egg-info
writing warp_perspective.egg-info/PKG-INFO
writing dependency_links to warp_perspective.egg-info/dependency_links.txt
writing top-level names to warp_perspective.egg-info/top_level.txt
writing manifest file 'warp_perspective.egg-info/SOURCES.txt'
reading manifest file 'warp_perspective.egg-info/SOURCES.txt'
writing manifest file 'warp_perspective.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.7/warp_perspective.so ->
Creating /root/local/miniconda/lib/python3.7/site-packages/warp-perspective.egg-link (link to .)
Adding warp-perspective 0.0.0 to easy-install.pth file

Installed /warp_perspective
Processing dependencies for warp-perspective==0.0.0
Finished processing dependencies for warp-perspective==0.0.0
```

----------------------------------------

TITLE: Setting up 2D Parallel Pattern Without DeviceMesh in PyTorch
DESCRIPTION: This code demonstrates the manual setup of process groups for a 2D parallel pattern in PyTorch without using DeviceMesh. It involves calculating shard groups and replicate groups, then assigning them to each rank.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/distributed_device_mesh.rst#2025-04-22_snippet_0

LANGUAGE: python
CODE:
```
import os

import torch
import torch.distributed as dist

# Understand world topology
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
print(f"Running example on {rank=} in a world with {world_size=}")

# Create process groups to manage 2-D like parallel pattern
dist.init_process_group("nccl")
torch.cuda.set_device(rank)

# Create shard groups (e.g. (0, 1, 2, 3), (4, 5, 6, 7))
# and assign the correct shard group to each rank
num_node_devices = torch.cuda.device_count()
shard_rank_lists = list(range(0, num_node_devices // 2)), list(range(num_node_devices // 2, num_node_devices))
shard_groups = (
    dist.new_group(shard_rank_lists[0]),
    dist.new_group(shard_rank_lists[1]),
)
current_shard_group = (
    shard_groups[0] if rank in shard_rank_lists[0] else shard_groups[1]
)

# Create replicate groups (for example, (0, 4), (1, 5), (2, 6), (3, 7))
# and assign the correct replicate group to each rank
current_replicate_group = None
shard_factor = len(shard_rank_lists[0])
for i in range(num_node_devices // 2):
    replicate_group_ranks = list(range(i, num_node_devices, shard_factor))
    replicate_group = dist.new_group(replicate_group_ranks)
    if rank in replicate_group_ranks:
        current_replicate_group = replicate_group
```
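
The rank arithmetic can be checked without launching any processes. The sketch below reproduces only the list computations from the snippet above for an assumed 8-device node. `shard_and_replicate_ranks` is a hypothetical helper, and it makes no `torch.distributed` calls.

```python
def shard_and_replicate_ranks(num_devices):
    """Compute rank lists for two shard groups and their replicate groups."""
    half = num_devices // 2
    shard_rank_lists = [list(range(0, half)), list(range(half, num_devices))]
    shard_factor = len(shard_rank_lists[0])
    # Each replicate group pairs the ranks at the same position in each shard group.
    replicate_rank_lists = [
        list(range(i, num_devices, shard_factor)) for i in range(half)
    ]
    return shard_rank_lists, replicate_rank_lists


shards, replicas = shard_and_replicate_ranks(8)
print(shards)
print(replicas)
```

Every rank appears in exactly one shard group and exactly one replicate group, which is the 2D layout the manual setup is building.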

----------------------------------------

TITLE: Implementing HTML Meta Redirect to ExecuTorch Documentation
DESCRIPTION: This HTML meta tag creates an automatic redirect to the ExecuTorch documentation page after a 3-second delay, directing users from the deprecated PyTorch Mobile documentation to the currently supported alternative.

SOURCE: https://github.com/pytorch/tutorials/blob/main/recipes_source/model_preparation_ios.rst#2025-04-22_snippet_0

LANGUAGE: html
CODE:
```
<meta http-equiv="Refresh" content="3; url='https://pytorch.org/executorch/stable/index.html'" />
```

----------------------------------------

TITLE: Initializing Command-Line Arguments for PyTorch RPC Parameter Server
DESCRIPTION: Sets up argparse to handle command-line arguments for configuring the distributed training setup, including world size, rank, number of GPUs, master address, and port.

SOURCE: https://github.com/pytorch/tutorials/blob/main/intermediate_source/rpc_param_server_tutorial.rst#2025-04-22_snippet_12

LANGUAGE: python
CODE:
```
import argparse
import os

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Parameter-Server RPC based training")
    parser.add_argument(
        "--world_size",
        type=int,
        default=4,
        help="""Total number of participating processes. Should be the sum of
        master node and all training nodes.""")
    parser.add_argument(
        "--rank",
        type=int,
        default=None,
        help="Global rank of this process. Pass in 0 for master.")
    parser.add_argument(
        "--num_gpus",
        type=int,
        default=0,
        help="""Number of GPUs to use for training. Currently supports between 0
         and 2 GPUs. Note that this argument will be passed to the parameter servers.""")
    parser.add_argument(
        "--master_addr",
        type=str,
        default="localhost",
        help="""Address of master, will default to localhost if not provided.
        Master must be able to accept network traffic on the address + port.""")
    parser.add_argument(
        "--master_port",
        type=str,
        default="29500",
        help="""Port that master is listening on, will default to 29500 if not
        provided. Master must be able to accept network traffic on the host and port.""")

    args = parser.parse_args()
    assert args.rank is not None, "must provide rank argument."
    assert args.num_gpus <= 2, f"Only 0-2 GPUs currently supported (got {args.num_gpus})."
    os.environ['MASTER_ADDR'] = args.master_addr
    os.environ["MASTER_PORT"] = args.master_port
```