### Downloading and Preparing PyTorch Distributed Examples

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/slurm/setup_pcluster_slurm.md

This snippet navigates to `/shared`, makes a shallow clone of the PyTorch examples repository, filters it to retain only the `distributed/ddp-tutorial-series` subdirectory, pins `setuptools` to version 59.5.0, and then installs the project dependencies from `requirements.txt`.

```Shell
cd /shared
git clone --depth 1 https://github.com/pytorch/examples;
cd /shared/examples
git filter-branch --prune-empty --subdirectory-filter distributed/ddp-tutorial-series
python3 -m pip install setuptools==59.5.0
pip install -r requirements.txt
```

--------------------------------

### Downloading Training Code and Installing Requirements (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/setup_pcluster_slurm.md

These commands navigate to the `/shared` directory, make a shallow clone of the PyTorch examples repository, and filter it down to the `distributed/minGPT-ddp` example. They then pin `setuptools` to version 59.5.0 and install the dependencies listed in `requirements.txt` to prepare the environment for training.

```Shell
cd /shared
git clone --depth 1 https://github.com/pytorch/examples;
cd /shared/examples
git filter-branch --prune-empty --subdirectory-filter distributed/minGPT-ddp
python3 -m pip install setuptools==59.5.0
pip install -r requirements.txt
```

--------------------------------

### Installing Python Dependencies (Bash)

Source: https://github.com/pytorch/examples/blob/main/distributed/FSDP/README.md

This command installs all required Python packages listed in the `requirements.txt` file using pip. It's essential for setting up the Python environment needed to run the T5 training script.

```bash
pip install -r requirements.txt
```

--------------------------------

### Installing Dependencies and Running MNIST Hogwild Example (Bash)

Source: https://github.com/pytorch/examples/blob/main/mnist_hogwild/README.md

This snippet provides the commands to install the necessary Python dependencies from `requirements.txt` and then execute the `main.py` script to start the MNIST Hogwild training example.

```bash
pip install -r requirements.txt
python main.py
```

--------------------------------

### Installing Dependencies and Running Main Script (Bash)

Source: https://github.com/pytorch/examples/blob/main/mnist_forward_forward/README.md

This snippet provides instructions to set up the project environment by installing required Python packages from `requirements.txt` and then executing the main training script `main.py`. This is the standard way to prepare and run the Forward-Forward algorithm example.

```Bash
pip install -r requirements.txt
python main.py
```

--------------------------------

### Starting FSDP T5 Training with Torchrun (Bash)

Source: https://github.com/pytorch/examples/blob/main/distributed/FSDP/README.md

This command initiates the distributed training of the T5 model using Torchrun. It specifies one node and four processes per node, which should be adjusted based on the available GPU count. The `T5_training.py` script contains the core training logic.

```bash
torchrun --nnodes 1 --nproc_per_node 4  T5_training.py
```
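As a minimal sketch of what a script launched this way can rely on: `torchrun` exports per-process environment variables such as `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The names `DistEnv` and `read_dist_env`, and the single-process fallbacks, are illustrative assumptions and are not taken from `T5_training.py`.

```python
import os
from typing import Mapping, NamedTuple


class DistEnv(NamedTuple):
    rank: int
    local_rank: int
    world_size: int


def read_dist_env(env: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the rank information that torchrun exports for each worker.

    Falls back to a single-process layout when the variables are absent
    (an assumption made here so the sketch also runs without torchrun).
    """
    return DistEnv(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
    )


# With --nnodes 1 --nproc_per_node 4, the last worker would see values like these.
example = read_dist_env({"RANK": "3", "LOCAL_RANK": "3", "WORLD_SIZE": "4"})
print(example)
```

Passing the mapping explicitly keeps the helper easy to exercise without launching any worker processes.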

--------------------------------

### Installing Dependencies for PyTorch RL Examples (Bash)

Source: https://github.com/pytorch/examples/blob/main/reinforcement_learning/README.md

This command installs all necessary Python packages listed in the `requirements.txt` file. It is a prerequisite for running any of the reinforcement learning examples.

```bash
pip install -r requirements.txt
```

--------------------------------

### Installing Python Dependencies and Virtual Environment

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/slurm/setup_pcluster_slurm.md

This sequence of commands updates package lists, installs `python3-venv`, creates a Python virtual environment at `/shared/venv/`, activates it, installs the `wheel` package, and adds the activation command to `.bashrc` for persistent environment loading.

```Shell
sudo apt-get update
sudo apt-get install -y python3-venv
python3 -m venv /shared/venv/
source /shared/venv/bin/activate
pip install wheel
echo 'source /shared/venv/bin/activate' >> ~/.bashrc
```
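To confirm from Python that the activated environment is actually in use, one option is to compare `sys.prefix` with `sys.base_prefix`. This is a small sketch; the helper name `in_virtualenv` is hypothetical and the check assumes an environment created with `python3 -m venv`, as in the commands above.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

After sourcing `/shared/venv/bin/activate`, the printed prefix would be expected to point at `/shared/venv`.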

--------------------------------

### Downloading Wikihow Dataset (Bash)

Source: https://github.com/pytorch/examples/blob/main/distributed/FSDP/README.md

This snippet executes a shell script to download the 'wikihow' dataset, which is a prerequisite for the T5 text summarization example. It ensures the necessary data is available before training.

```bash
sh download_dataset.sh
```

--------------------------------

### Running Distributed Pipeline Parallel Example

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/pipeline/README.md

This snippet provides the commands to set up the environment and execute the distributed pipeline parallel example. It first installs the necessary dependencies from `requirements.txt` and then runs the main script `main.py` to start the distributed application.

```Shell
pip install -r requirements.txt
python main.py
```

--------------------------------

### Installing Dependencies and Running the Main Script (Bash)

Source: https://github.com/pytorch/examples/blob/main/vae/README.md

This snippet installs the dependencies listed in `requirements.txt` and then runs `main.py`, the main training script for the VAE example.

```bash
pip install -r requirements.txt
python main.py
```

--------------------------------

### Building Documentation for New PyTorch Examples (Shell)

Source: https://github.com/pytorch/examples/blob/main/CONTRIBUTING.md

This script navigates to the `docs` directory, sets up a Python virtual environment, installs documentation dependencies from `requirements.txt`, and builds the HTML documentation. It's a prerequisite for verifying documentation changes for new examples.

```Shell
cd docs
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
make html
```

--------------------------------

### Running Distributed DataParallel and RPC Example

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/ddp_rpc/README.md

This snippet provides the commands to set up the environment and run the PyTorch distributed example. It first installs necessary dependencies from `requirements.txt` and then executes the main training script `main.py`.

```Bash
pip install -r requirements.txt
python main.py
```

--------------------------------

### Running the Distributed Reinforcement Learning Example (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/rl/README.md

This snippet provides the commands to set up and run the distributed reinforcement learning example. It first installs the necessary Python dependencies listed in `requirements.txt` and then executes the main application script `main.py`.

```Shell
pip install -r requirements.txt
python main.py
```

--------------------------------

### Installing Python Dependencies on Cluster (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/setup_pcluster_slurm.md

This sequence of commands updates the package lists, installs `python3-venv`, creates a Python virtual environment at `/shared/venv/`, activates it, installs the `wheel` package, and configures the virtual environment to activate automatically upon shell login for persistent access.

```Shell
sudo apt-get update
sudo apt-get install -y python3-venv
python3 -m venv /shared/venv/
source /shared/venv/bin/activate
pip install wheel
echo 'source /shared/venv/bin/activate' >> ~/.bashrc
```

--------------------------------

### Installing AWS CLI and ParallelCluster with pip

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/slurm/setup_pcluster_slurm.md

This snippet installs the AWS Command Line Interface (CLI) and AWS ParallelCluster using pip3. The `--user` flag ensures installation into the user's home directory, avoiding system-wide changes, and `-U` or `--upgrade` ensures the latest versions are installed.

```Shell
pip3 install awscli -U --user
pip3 install "aws-parallelcluster" --upgrade --user
```

--------------------------------

### Installing AWS CLI and ParallelCluster (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/setup_pcluster_slurm.md

This snippet installs the AWS Command Line Interface (CLI) and AWS ParallelCluster using pip3. It ensures the latest versions are installed and configured for the current user, which is a prerequisite for managing AWS resources and clusters.

```Shell
pip3 install awscli -U --user
pip3 install "aws-parallelcluster" --upgrade --user
```

--------------------------------

### Previewing Sphinx Documentation Locally (Shell)

Source: https://github.com/pytorch/examples/blob/main/CONTRIBUTING.md

This command uses `sphinx-serve` to host the built Sphinx documentation locally, allowing contributors to preview their changes in a web browser. `sphinx-serve` must be installed separately.

```Shell
sphinx-serve -b build
```

--------------------------------

### Installing Dependencies and Running Main Script (Bash)

Source: https://github.com/pytorch/examples/blob/main/siamese_network/README.md

This snippet provides commands to install necessary Python dependencies from `requirements.txt` and to execute the main `main.py` script. It also includes an optional command to specify a GPU ID using `CUDA_VISIBLE_DEVICES` for execution on a specific device.

```bash
pip install -r requirements.txt
python main.py
# CUDA_VISIBLE_DEVICES=2 python main.py  # optional: run on a specific GPU, e.g. GPU 2
```
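The same GPU selection can be done from a Python launcher by setting `CUDA_VISIBLE_DEVICES` in the child process environment. The sketch below is illustrative only: `run_with_visible_gpus` is a hypothetical helper, and the child command merely echoes the variable so the example runs without a GPU.

```python
import os
import subprocess
import sys


def run_with_visible_gpus(command, gpu_ids):
    """Run a command with CUDA_VISIBLE_DEVICES restricted to gpu_ids."""
    env = dict(os.environ)  # copy, so the parent environment stays untouched
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Stand-in child process that only echoes the variable; in practice the
# command would be something like [sys.executable, "main.py"].
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_visible_gpus(child, [2])
print(result.stdout.strip())
```

Because the variable is set only for the child process, other programs started from the same shell are unaffected.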

--------------------------------

### Installing Dependencies and Running Example

Source: https://github.com/pytorch/examples/blob/main/distributed/tensor_parallelism/README.md

This snippet outlines the necessary steps to prepare the environment by installing all required Python packages from 'requirements.txt' and then executing the main 'example.py' script to run the PyTorch Tensor Parallel demonstration.

```Shell
pip install -r requirements.txt
python example.py
```

--------------------------------

### Installing GCN Dependencies and Running Main Script (Bash)

Source: https://github.com/pytorch/examples/blob/main/gcn/README.md

This snippet sets up the environment and runs the GCN example. It first installs the Python packages listed in `requirements.txt` using pip, and then runs `main.py`, the example's entry point.

```bash
pip install -r requirements.txt
python main.py
```

--------------------------------

### Running the PyTorch MNIST RNN Example (Bash)

Source: https://github.com/pytorch/examples/blob/main/mnist_rnn/README.md

This snippet provides commands to set up the environment and run the PyTorch MNIST RNN example. It includes installing dependencies and executing the main script, with an optional command to specify a GPU ID.

```bash
pip install -r requirements.txt
python main.py
# CUDA_VISIBLE_DEVICES=2 python main.py  # optional: run on a specific GPU, e.g. GPU 2
```

--------------------------------

### Running Synchronized Batch Update Parameter Server Example (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/batch/README.md

This snippet provides the commands to set up and run the Synchronized Batch Update Parameter Server example. It installs necessary dependencies from `requirements.txt` and then executes the `parameter_server.py` script, which utilizes `@rpc.functions.async_execution` for parameter updates and retrieval.

```Shell
pip install -r requirements.txt
python parameter_server.py
```

--------------------------------

### Running Multi-Observer with Batch-Processing Agent Example (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/batch/README.md

This snippet provides the commands to set up and run the Multi-Observer with Batch-Processing Agent example. It installs necessary dependencies from `requirements.txt` and then executes the `reinforce.py` script, which uses `@rpc.functions.async_execution` to process multiple observed states through a policy.

```Shell
pip install -r requirements.txt
python reinforce.py
```

--------------------------------

### Building the DCGAN Example with CMake and Make

Source: https://github.com/pytorch/examples/blob/main/cpp/dcgan/README.md

This snippet provides the shell commands required to build the DCGAN example using CMake and Make. It navigates into the `dcgan` directory, creates a build directory, configures the project with CMake by specifying the `LibTorch` installation path, and then compiles the project using Make. Prerequisites include a C++ compiler, CMake, and the PyTorch LibTorch distribution.

```shell
$ cd dcgan
$ mkdir build
$ cd build
$ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

--------------------------------

### Building PyTorch C++ Frontend Example

Source: https://github.com/pytorch/examples/blob/main/cpp/custom-dataset/README.md

This snippet provides the shell commands required to build the custom dataset example using CMake and Make. It assumes `libtorch` is installed and its path is provided via `CMAKE_PREFIX_PATH`. Troubleshooting tips for OpenCV compatibility are also mentioned.

```Shell
$ cd custom-dataset
$ mkdir build
$ cd build
$ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

--------------------------------

### Running the Distributed RNN Example

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/rnn/README.md

This snippet provides the commands to install necessary Python dependencies and then execute the main application script for the distributed RNN model example.

```Shell
pip install -r requirements.txt
python main.py
```

--------------------------------

### Setting Up Project Dependencies

Source: https://github.com/pytorch/examples/blob/main/language_translation/README.md

These commands install the project dependencies listed in `requirements.txt` and download the spaCy language model you specify, providing a quick start for the language translation project.

```bash
pip install -r requirements.txt
python3 -m spacy download <language-you-want>
```

--------------------------------

### Installing Torchtext Library

Source: https://github.com/pytorch/examples/blob/main/language_translation/README.md

This command installs the Torchtext library, a dependency for handling text processing and datasets in the PyTorch language translation example.

```bash
pip install torchtext
```

--------------------------------

### Configuring AWS ParallelCluster

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/slurm/setup_pcluster_slurm.md

This command initiates the configuration process for AWS ParallelCluster, using `config.yaml` as the target configuration file. It guides the user through setting up cluster parameters, which can then be reviewed and modified in the specified YAML file.

```Shell
pcluster configure --config config.yaml
```

--------------------------------

### Running MNIST Example with PyTorch (Bash)

Source: https://github.com/pytorch/examples/blob/main/mnist/README.md

This snippet provides the necessary bash commands to install project dependencies from 'requirements.txt' and execute the main PyTorch MNIST script. It also includes an optional command to specify a particular GPU ID using the 'CUDA_VISIBLE_DEVICES' environment variable for execution.

```bash
pip install -r requirements.txt
python main.py
# CUDA_VISIBLE_DEVICES=2 python main.py  # optional: run on a specific GPU, e.g. GPU 2
```

--------------------------------

### Building and Installing OpenCV from Source on Linux

Source: https://github.com/pytorch/examples/blob/main/cpp/tools/InstallingOpenCV.md

This sequence of commands clones the OpenCV repositories, configures the build using CMake, compiles the project with parallel jobs, and installs it to the specified prefix. This is a standard procedure for building large C++ projects from source.

```shell
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git

cd opencv && mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..
make -j8 # runs 8 jobs in parallel
sudo make install
```

--------------------------------

### Installing Spacy Language Models

Source: https://github.com/pytorch/examples/blob/main/language_translation/README.md

These commands download spaCy language models, which the language translation example requires for tokenization. The first line shows the general form with a `<language>` placeholder; the next two download the English (`en`) and German (`de`) models.

```bash
python3 -m spacy download <language>
python3 -m spacy download en
python3 -m spacy download de
```

--------------------------------

### Cloning PyTorch GAT Example Repository (Bash)

Source: https://github.com/pytorch/examples/blob/main/gat/README.md

This command sequence clones the PyTorch examples repository from GitHub and navigates into the `examples/gat` directory, which contains the GAT model implementation. This is the initial step to set up the project locally.

```bash
git clone https://github.com/pytorch/examples.git
cd examples/gat
```

--------------------------------

### Example Output of PyTorch DDP Application Launch in Shell

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp/README.md

This shell output shows the console logs produced when launching a PyTorch DDP application with eight processes. Each process (identified by the PID in brackets) reports the environment it uses to join the process group, with its own `RANK` and a shared `WORLD_SIZE` of 8, then confirms the backend (NCCL) and its device assignment. Lines from different processes interleave in no fixed order. Use this output to verify that the distributed setup is correct.

```Shell
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
[238627] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '0', 'WORLD_SIZE': '8'}
[238630] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '3', 'WORLD_SIZE': '8'}
[238628] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '1', 'WORLD_SIZE': '8'}
[238634] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '7', 'WORLD_SIZE': '8'}
[238631] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '4', 'WORLD_SIZE': '8'}
[238632] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '5', 'WORLD_SIZE': '8'}
[238629] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '2', 'WORLD_SIZE': '8'}
[238633] Initializing process group with: {'MASTER_ADDR': '127.0.0.1', 'MASTER_PORT': '29500', 'RANK': '6', 'WORLD_SIZE': '8'}
[238633] world_size = 8, rank = 6, backend=nccl
[238628] world_size = 8, rank = 1, backend=nccl
[238629] world_size = 8, rank = 2, backend=nccl
[238631] world_size = 8, rank = 4, backend=nccl
[238630] world_size = 8, rank = 3, backend=nccl
[238632] world_size = 8, rank = 5, backend=nccl
[238634] world_size = 8, rank = 7, backend=nccl
[238627] world_size = 8, rank = 0, backend=nccl
[238633] rank = 6, world_size = 8, n = 1, device_ids = [6]
[238628] rank = 1, world_size = 8, n = 1, device_ids = [1]
[238632] rank = 5, world_size = 8, n = 1, device_ids = [5]
[238634] rank = 7, world_size = 8, n = 1, device_ids = [7]
[238629] rank = 2, world_size = 8, n = 1, device_ids = [2]
[238630] rank = 3, world_size = 8, n = 1, device_ids = [3]
[238631] rank = 4, world_size = 8, n = 1, device_ids = [4]
[238627] rank = 0, world_size = 8, n = 1, device_ids = [0]
```
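The `n` and `device_ids` values in the last group of log lines are consistent with splitting the visible GPUs evenly among the local processes. The following sketch reproduces that arithmetic, assuming 8 GPUs shared by 8 local processes; the function name and the explicit `gpu_count` parameter are illustrative.

```python
from typing import List


def device_ids_for(local_rank: int, local_world_size: int, gpu_count: int) -> List[int]:
    """Return the GPU indices a local process would own under an even split."""
    n = gpu_count // local_world_size  # GPUs per process
    return list(range(local_rank * n, (local_rank + 1) * n))


# 8 GPUs shared by 8 processes: each rank gets exactly one device.
for rank in range(8):
    print(f"rank = {rank}, device_ids = {device_ids_for(rank, 8, 8)}")
```

With fewer processes than GPUs, the same split would hand each process a contiguous block of devices rather than a single one.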

--------------------------------

### Building PyTorch C++ MNIST Example (Shell)

Source: https://github.com/pytorch/examples/blob/main/cpp/mnist/README.md

These shell commands navigate into the `mnist` directory, create a `build` directory and change into it, configure the build with CMake (passing the path to the LibTorch distribution via `CMAKE_PREFIX_PATH`), and then compile the project using `make`.

```Shell
$ cd mnist
$ mkdir build
$ cd build
$ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

--------------------------------

### Building PyTorch C++ Autograd Example

Source: https://github.com/pytorch/examples/blob/main/cpp/autograd/README.md

This snippet provides the shell commands required to build the PyTorch C++ autograd example. It involves navigating to the `autograd` directory, creating a build directory, configuring the project with CMake, and compiling it using `make`. The `CMAKE_PREFIX_PATH` must point to the unzipped LibTorch distribution.

```shell
cd autograd
mkdir build
cd build
cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
make
```

--------------------------------

### Launching PyTorch Distributed Training with `launch.py`

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp/README.md

This command runs a PyTorch distributed training script (`example.py`) through a launcher utility (`launch.py`). It configures a single node (`--nnode=1`, `--node_rank=0`) with one process per node and passes `--local_world_size=1` to the script, giving a minimal one-process configuration for checking that the distributed setup works.

```Shell
python /path/to/launch.py --nnode=1 --node_rank=0 --nproc_per_node=1 example.py --local_world_size=1
```

--------------------------------

### Building Linear Regression Example with CMake (Shell)

Source: https://github.com/pytorch/examples/blob/main/cpp/regression/README.md

This snippet provides the shell commands to navigate into the regression example directory, create a build directory, configure the project with CMake against the LibTorch distribution, and compile it using `make`. Replace the `/path/to/libtorch` placeholder with the actual path to your unzipped LibTorch distribution.

```shell
$ cd regression
$ mkdir build
$ cd build
$ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

--------------------------------

### Launching RPC Master Node (Rank 0)

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/parameter_server/README.md

This specific command launches the master node (server) for the RPC-based training example with a `WORLD_SIZE` of 2 and a `RANK` of 0. It should be run in a separate terminal window to initiate the distributed environment.

```Shell
python rpc_parameter_server.py --world_size=2 --rank=0
```

--------------------------------

### Executing Script on GPU with Accelerator (Bash)

Source: https://github.com/pytorch/examples/blob/main/siamese_network/README.md

This command demonstrates how to execute the `main.py` script utilizing a detected GPU by adding the `--accel` argument. This enables accelerated computation for the Siamese network example.

```bash
python main.py --accel
```

--------------------------------

### Command-Line Arguments for main.py (Bash)

Source: https://github.com/pytorch/examples/blob/main/vae/README.md

This snippet lists the optional command-line arguments available for customizing the execution of the 'main.py' script. These arguments allow users to control various training parameters such as batch size, number of epochs, hardware acceleration, random seed, and logging frequency.

```bash
--batch-size            input batch size for training (default: 128)
--epochs                number of epochs to train (default: 10)
--accel                 use accelerator
--seed                  random seed (default: 1)
--log-interval          how many batches to wait before logging training status
```
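For orientation, options like these are typically declared with `argparse`. The sketch below mirrors the listed flags and defaults; it is an approximation written for illustration, not the actual parser from the VAE example's `main.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="VAE example options (sketch)")
    parser.add_argument("--batch-size", type=int, default=128,
                        help="input batch size for training (default: 128)")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of epochs to train (default: 10)")
    parser.add_argument("--accel", action="store_true",
                        help="use accelerator")
    parser.add_argument("--seed", type=int, default=1,
                        help="random seed (default: 1)")
    # The listing does not state a default for --log-interval, so none is set here.
    parser.add_argument("--log-interval", type=int,
                        help="how many batches to wait before logging training status")
    return parser


# Equivalent to: python main.py --epochs 5 --accel
args = build_parser().parse_args(["--epochs", "5", "--accel"])
print(args.batch_size, args.epochs, args.accel, args.seed)
```

Note that `argparse` converts dashes in option names to underscores, so `--batch-size` is read as `args.batch_size`.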

--------------------------------

### Launching RPC Trainer Node (Rank 1)

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/parameter_server/README.md

This command launches a trainer node for the RPC-based training example with a `WORLD_SIZE` of 2 and a `RANK` of 1. It should be run in a separate terminal window to begin training with the server launched by the master node.

```Shell
python rpc_parameter_server.py --world_size=2 --rank=1
```
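The rank 0 and rank 1 launch commands differ only in the `--rank` value, which selects the role each process plays. The sketch below illustrates that dispatch under the assumption that rank 0 hosts the parameter server and every other rank is a trainer; `role_for` is a hypothetical helper, and none of this is copied from `rpc_parameter_server.py`.

```python
import argparse


def role_for(rank: int) -> str:
    """Rank 0 hosts the parameter server; every other rank is a trainer."""
    return "parameter server" if rank == 0 else "trainer"


parser = argparse.ArgumentParser(description="rank/role sketch")
parser.add_argument("--world_size", type=int, default=2)
parser.add_argument("--rank", type=int, required=True)

# Parse the same flags used by the two launch commands.
for argv in (["--world_size=2", "--rank=0"], ["--world_size=2", "--rank=1"]):
    args = parser.parse_args(argv)
    print(f"rank {args.rank} of {args.world_size}: {role_for(args.rank)}")
```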

--------------------------------

### Running PyTorch Python Examples Locally (Shell)

Source: https://github.com/pytorch/examples/blob/main/CONTRIBUTING.md

This command executes the `run_python_examples.sh` script, ensuring it runs within a specified virtual environment (`.venv`). This is crucial for verifying bug fixes and ensuring all tests pass locally before submitting a pull request.

```Shell
VIRTUAL_ENV=.venv ./run_python_examples.sh
```

--------------------------------

### Executing PyTorch C++ MNIST Model Training (Shell)

Source: https://github.com/pytorch/examples/blob/main/cpp/mnist/README.md

This shell command executes the compiled `mnist` binary to start the model training process. The output shows the training progress across multiple epochs, including loss and accuracy metrics for both training and test sets.

```Shell
$ ./mnist
Train Epoch: 1 [59584/60000] Loss: 0.4232
Test set: Average loss: 0.1989 | Accuracy: 0.940
Train Epoch: 2 [59584/60000] Loss: 0.1926
Test set: Average loss: 0.1338 | Accuracy: 0.959
Train Epoch: 3 [59584/60000] Loss: 0.1390
Test set: Average loss: 0.0997 | Accuracy: 0.969
Train Epoch: 4 [59584/60000] Loss: 0.1239
Test set: Average loss: 0.0875 | Accuracy: 0.972
...
```
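If this output is captured to a file, the per-epoch test metrics can be pulled out with a short script. This is a sketch that assumes the line format shown in the sample output; the regular expression and the name `parse_test_metrics` are illustrative.

```python
import re

SAMPLE_OUTPUT = """\
Train Epoch: 1 [59584/60000] Loss: 0.4232
Test set: Average loss: 0.1989 | Accuracy: 0.940
Train Epoch: 2 [59584/60000] Loss: 0.1926
Test set: Average loss: 0.1338 | Accuracy: 0.959
"""

# Matches lines such as "Test set: Average loss: 0.1989 | Accuracy: 0.940".
TEST_LINE = re.compile(r"Test set: Average loss: ([\d.]+) \| Accuracy: ([\d.]+)")


def parse_test_metrics(text):
    """Return (average_loss, accuracy) pairs, one per 'Test set' line."""
    return [(float(loss), float(acc)) for loss, acc in TEST_LINE.findall(text)]


metrics = parse_test_metrics(SAMPLE_OUTPUT)
print(metrics)
```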

--------------------------------

### Listing AWS ParallelClusters

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/slurm/setup_pcluster_slurm.md

This command lists all AWS ParallelClusters associated with the current AWS account. It is used to track the status and details of existing clusters, including those currently being created or updated.

```Shell
pcluster list-clusters
```

--------------------------------

### Creating an AWS ParallelCluster

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/slurm/setup_pcluster_slurm.md

This command creates a new AWS ParallelCluster named `dist-ml` based on the settings defined in `config.yaml`. It provisions the necessary AWS resources, including compute instances and networking, to form the cluster.

```Shell
pcluster create-cluster --cluster-name dist-ml --cluster-configuration config.yaml
```

--------------------------------

### Executing DCGAN Training with Default Epochs

Source: https://github.com/pytorch/examples/blob/main/cpp/dcgan/README.md

This command executes the compiled DCGAN binary to start the training process. By default, it trains for 30 epochs, displaying loss values for the discriminator (D_loss) and generator (G_loss) at regular intervals, along with checkpoint indicators. This requires the `dcgan` binary to be successfully built and located in the current directory.

```shell
$ ./dcgan
[ 1/30][200/938] D_loss: 0.4953 | G_loss: 4.0195
-> checkpoint 1
[ 1/30][400/938] D_loss: 0.3610 | G_loss: 4.8148
-> checkpoint 2
[ 1/30][600/938] D_loss: 0.4072 | G_loss: 4.36760
-> checkpoint 3
[ 1/30][800/938] D_loss: 0.4444 | G_loss: 4.0250
-> checkpoint 4
[ 2/30][200/938] D_loss: 0.3761 | G_loss: 3.8790
-> checkpoint 5
[ 2/30][400/938] D_loss: 0.3977 | G_loss: 3.3315
-> checkpoint 6
[ 2/30][600/938] D_loss: 0.3815 | G_loss: 3.5696
-> checkpoint 7
[ 2/30][800/938] D_loss: 0.4039 | G_loss: 3.2759
-> checkpoint 8
[ 3/30][200/938] D_loss: 0.4236 | G_loss: 4.5132
-> checkpoint 9
[ 3/30][400/938] D_loss: 0.3645 | G_loss: 3.9759
-> checkpoint 10
...
```
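The counters in this log can be cross-checked with a little arithmetic. The sketch below assumes a training set of 60,000 images and a batch size of 64, values chosen because they reproduce the 938 batches per epoch shown in the log; neither is stated in the output itself. The 200-batch checkpoint interval is read directly from the log.

```python
import math

DATASET_SIZE = 60_000   # assumption: number of training images
BATCH_SIZE = 64         # assumption: reproduces the 938 batches in the log
CHECKPOINT_EVERY = 200  # batches between checkpoints, as seen in the log

batches_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
checkpoints_per_epoch = batches_per_epoch // CHECKPOINT_EVERY


def checkpoint_number(epoch: int, batch: int) -> int:
    """Running checkpoint count at a logged (epoch, batch) position."""
    return (epoch - 1) * checkpoints_per_epoch + batch // CHECKPOINT_EVERY


print(batches_per_epoch, checkpoints_per_epoch, checkpoint_number(3, 400))
```

Under these assumptions, the position `[ 3/30][400/938]` lines up with `checkpoint 10` in the log.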

--------------------------------

### Executing Linear Regression Example and Observing Output (Shell)

Source: https://github.com/pytorch/examples/blob/main/cpp/regression/README.md

This snippet shows how to execute the compiled linear regression binary and provides an example of the expected output. The output includes the final loss after a certain number of batches and a comparison between the learned polynomial function and the actual target function, demonstrating the model's accuracy.

```shell
$ ./regression
Loss: 0.000301158 after 584 batches
==> Learned function:	y = 11.6441 x^4 -3.10164 x^3 2.19786 x^2 -3.83606 x^1 + 4.37066
==> Actual function:	y = 11.669 x^4 -3.16023 x^3 2.19182 x^2 -3.81505 x^1 + 4.38219
...
```
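To quantify how close the learned function is to the target, the coefficients from the sample output can be compared directly. This is a small sketch using the numbers printed above; the helper `evaluate` and the choice of evaluation point are illustrative.

```python
# Coefficients copied from the sample output, highest power first.
learned = [11.6441, -3.10164, 2.19786, -3.83606, 4.37066]
actual = [11.669, -3.16023, 2.19182, -3.81505, 4.38219]


def evaluate(coeffs, x):
    """Evaluate a polynomial with Horner's rule (coeffs: highest power first)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


max_coeff_gap = max(abs(a - b) for a, b in zip(learned, actual))
print(f"largest coefficient gap: {max_coeff_gap:.5f}")
print(f"gap at x = 1.0: {abs(evaluate(learned, 1.0) - evaluate(actual, 1.0)):.5f}")
```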

--------------------------------

### Installing Optional Dependencies for OpenCV on Linux

Source: https://github.com/pytorch/examples/blob/main/cpp/tools/InstallingOpenCV.md

This command installs optional libraries that enhance OpenCV's functionality, such as Python bindings, TBB for parallel processing, and support for various image formats (JPEG, PNG, TIFF). These are highly recommended for a full-featured OpenCV installation.

```shell
sudo apt-get install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev
```

--------------------------------

### Executing PyTorch C++ Frontend Training

Source: https://github.com/pytorch/examples/blob/main/cpp/custom-dataset/README.md

This snippet shows how to execute the compiled binary for training the model and provides an example of the console output during the training process, including loss and accuracy metrics. The output indicates the device being used (e.g., CUDA) and progress per epoch.

```Shell
./custom-dataset
Running on: CUDA
Train Epoch: 1 16/7281	Loss: 0.314655	Acc: 0
Train Epoch: 1 176/7281	Loss: 0.532111	Acc: 0.0681818
Train Epoch: 1 336/7281	Loss: 0.538482	Acc: 0.0714286
Train Epoch: 1 496/7281	Loss: 0.535302	Acc: 0.0705645
Train Epoch: 1 656/7281	Loss: 0.536113	Acc: 0.0716463
Train Epoch: 1 816/7281	Loss: 0.537626	Acc: 0.0784314
Train Epoch: 1 976/7281	Loss: 0.537055	Acc: 0.079918
...
```

--------------------------------

### Command-Line Arguments for main.py (Bash)

Source: https://github.com/pytorch/examples/blob/main/mnist_forward_forward/README.md

This section lists the optional command-line arguments accepted by the `main.py` script, allowing users to customize training parameters such as epochs, learning rate, random seed, dataset sizes, and logging intervals. These arguments control the behavior and performance of the Forward-Forward algorithm training process.

```Bash
optional arguments:
  -h, --help            show this help message and exit
  --epochs EPOCHS       number of epochs to train (default: 1000)
  --lr LR               learning rate (default: 0.03)
  --no_accel            disables accelerator
  --seed SEED           random seed (default: 1)
  --save_model          For saving the current Model
  --train_size TRAIN_SIZE
                        size of training set
  --threshold THRESHOLD
                        threshold for training
  --test_size TEST_SIZE
                        size of test set
  --save-model          For Saving the current Model
  --log-interval LOG_INTERVAL
                        logging training status interval
```

--------------------------------

### Configuring AWS ParallelCluster (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/setup_pcluster_slurm.md

This command initiates the configuration process for AWS ParallelCluster, prompting the user to define cluster settings and generating a `config.yaml` file. It's crucial to have a valid EC2 key-pair file for secure access to the cluster.

```Shell
pcluster configure --config config.yaml
```

--------------------------------

### Listing AWS ParallelClusters (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/setup_pcluster_slurm.md

This command lists all active AWS ParallelClusters associated with the current AWS account. It is useful for monitoring the status of cluster creation or verifying the existence and state of deployed clusters.

```Shell
pcluster list-clusters
```

--------------------------------

### Creating AWS ParallelCluster (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/setup_pcluster_slurm.md

This command creates an AWS ParallelCluster named 'dist-ml' based on the specifications in the `config.yaml` file. It provisions all necessary AWS resources, including compute instances and networking, to form the cluster.

```Shell
pcluster create-cluster --cluster-name dist-ml --cluster-configuration config.yaml
```

--------------------------------

### SSH into ParallelCluster Head Node

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/slurm/setup_pcluster_slurm.md

This command establishes an SSH connection to the head node of the `dist-ml` cluster. The `-i` flag specifies the private key file (`your-keyname-file`) required for authentication, allowing remote access to the cluster's primary control instance.

```Shell
pcluster ssh --cluster-name dist-ml -i your-keyname-file
```

--------------------------------

### Installing PyTorch GAT Dependencies (Bash)

Source: https://github.com/pytorch/examples/blob/main/gat/README.md

This command installs all necessary Python packages and libraries required to run the PyTorch GAT model. Dependencies are listed in the `requirements.txt` file, ensuring the correct environment for model execution.

```bash
pip install -r requirements.txt
```

--------------------------------

### Showcasing DCP API with FSDP2 - Bash

Source: https://github.com/pytorch/examples/blob/main/distributed/FSDP2/README.md

This command runs the FSDP2 training script to demonstrate the Distributed Checkpointing (DCP) API. It uses `torchrun` with 2 processes per node and includes the `--dcp-api` flag to activate and showcase the functionality of the DCP API.

```Bash
torchrun --nproc_per_node 2 train.py --dcp-api
```

--------------------------------

### Running REINFORCE Algorithm - Bash

Source: https://github.com/pytorch/examples/blob/main/reinforcement_learning/README.md

Runs the `reinforce.py` script, which trains a policy with the REINFORCE policy-gradient algorithm.

```bash
python reinforce.py
```
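
As a rough illustration of the quantity REINFORCE is built around, the sketch below computes discounted returns for a single episode in plain Python. The function name `discounted_returns`, the sample rewards, and the `gamma` value are illustrative assumptions and are not taken from `reinforce.py`.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # Hypothetical three-step episode with a reward of 1 per step.
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

In policy-gradient training, returns like these weight the log-probabilities of the actions that were taken.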

--------------------------------

### Running FSDP2 on Transformer Model - Bash

Source: https://github.com/pytorch/examples/blob/main/distributed/FSDP2/README.md

This command sequence navigates to the FSDP2 example directory and then executes the `train.py` script using `torchrun` with 2 processes per node. The first run creates and saves state dictionaries to a 'checkpoints' folder, while subsequent runs load from these checkpoints.

```Bash
cd distributed/FSDP2
torchrun --nproc_per_node 2 train.py
```

--------------------------------

### Installing OpenCV on Arch Linux using Pacman

Source: https://github.com/pytorch/examples/blob/main/cpp/tools/InstallingOpenCV.md

This command installs OpenCV and essential development tools on Arch Linux using the `pacman` package manager. It ensures all necessary dependencies for building and running applications with OpenCV are met.

```shell
pacman -Syu base-devel opencv
```

--------------------------------

### Building Distributed MNIST Example with CMake and LibTorch (Shell)

Source: https://github.com/pytorch/examples/blob/main/cpp/distributed/README.md

These shell commands compile the `dist-mnist.cpp` example. They change to the `distributed` directory, create a `build` directory, configure CMake with the LibTorch path, and build the project with `make`. A custom-compiled LibTorch with MPI headers is required for this example.

```Shell
$ cd distributed
$ mkdir build
$ cd build
$ cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
$ make
```

--------------------------------

### Running RPC Parameter Server Worker

Source: https://github.com/pytorch/examples/blob/main/distributed/rpc/parameter_server/README.md

This command launches a worker for the RPC-based distributed training example. It requires specifying the total `WORLD_SIZE` and the unique `RANK` of the current worker. This command is used for both server and trainer processes.

```Shell
python rpc_parameter_server.py --world_size=WORLD_SIZE --rank=RANK
```
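
A small sketch of how the `WORLD_SIZE` and `RANK` placeholders might be filled in: the helper below builds one command line per rank. The helper name `rpc_commands` and the assumption that ranks run from 0 to `world_size - 1` (with rank 0 acting as the parameter server) are illustrative and should be checked against the example's README.

```python
import shlex


def rpc_commands(world_size, script="rpc_parameter_server.py"):
    """Build one launch command per rank for the given world size."""
    if world_size < 2:
        raise ValueError("need at least one server and one trainer process")
    commands = []
    for rank in range(world_size):
        argv = ["python", script, f"--world_size={world_size}", f"--rank={rank}"]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Assumed layout: rank 0 is the server, rank 1 is a trainer.
    for command in rpc_commands(2):
        print(command)
```

Each printed command would be run in its own terminal or on its own machine.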

--------------------------------

### Installing OpenCV on Fedora using DNF

Source: https://github.com/pytorch/examples/blob/main/cpp/tools/InstallingOpenCV.md

This command installs OpenCV and its development files on Fedora using the `dnf` package manager. The `opencv-devel` package provides header files and libraries required for compiling applications against OpenCV.

```shell
sudo dnf install opencv opencv-devel
```

--------------------------------

### Deactivating Python Virtual Environment (Shell)

Source: https://github.com/pytorch/examples/blob/main/CONTRIBUTING.md

This command deactivates the currently active Python virtual environment, returning the shell to its system-wide Python installation. It should be run after completing work within the virtual environment.

```Shell
deactivate
```

--------------------------------

### SSH into Cluster Headnode (Shell)

Source: https://github.com/pytorch/examples/blob/main/distributed/minGPT-ddp/mingpt/slurm/setup_pcluster_slurm.md

This command establishes an SSH connection to the head node of the specified AWS ParallelCluster. It requires the cluster name and the path to your EC2 key-pair file for secure authentication and access.

```Shell
pcluster ssh --cluster-name dist-ml -i your-keypair-file
```

--------------------------------

### Running Actor-Critic Algorithm - Bash

Source: https://github.com/pytorch/examples/blob/main/reinforcement_learning/README.md

Runs the `actor_critic.py` script, which trains an agent with the Actor-Critic algorithm for reinforcement learning.

```bash
python actor_critic.py
```

--------------------------------

### Executing PyTorch C++ Autograd Examples

Source: https://github.com/pytorch/examples/blob/main/cpp/autograd/README.md

This snippet shows the command to execute the compiled PyTorch C++ autograd binary and its expected output. The output demonstrates various autograd functionalities, including basic operations, higher-order gradient computations, and the use of custom autograd functions, showcasing tensor values and their gradients.

```shell
./autograd
====== Running: "Basic autograd operations" ======
 1  1
 1  1
[ CPUFloatType{2,2} ]
 3  3
 3  3
[ CPUFloatType{2,2} ]
AddBackward1
 27  27
 27  27
[ CPUFloatType{2,2} ]
MulBackward1
27
[ CPUFloatType{} ]
MeanBackward0
false
true
SumBackward0
 4.5000  4.5000
 4.5000  4.5000
[ CPUFloatType{2,2} ]
  813.6625
 1015.0142
 -664.8849
[ CPUFloatType{3} ]
MulBackward1
  204.8000
 2048.0000
    0.2048
[ CPUFloatType{3} ]
true
true
false
true
false
true

====== Running "Computing higher-order gradients in C++" ======
 0.0025  0.0946  0.1474  0.1387
 0.0238 -0.0018  0.0259  0.0094
 0.0513 -0.0549 -0.0604  0.0210
[ CPUFloatType{3,4} ]

====== Running "Using custom autograd function in C++" ======
-3.5513  3.7160  3.6477
-3.5513  3.7160  3.6477
[ CPUFloatType{2,3} ]
 0.3095  1.4035 -0.0349
 0.3095  1.4035 -0.0349
 0.3095  1.4035 -0.0349
 0.3095  1.4035 -0.0349
[ CPUFloatType{4,3} ]
 5.5000
 5.5000
[ CPUFloatType{2} ]
```

--------------------------------

### Training and Generation Commands for Language Models

Source: https://github.com/pytorch/examples/blob/main/word_language_model/README.md

This snippet provides common command-line examples for training language models using `main.py` and generating text using `generate.py`. It demonstrates how to specify model type (LSTM, Transformer), enable CUDA, set epochs, and tie weights for training, as well as how to run the text generation script.

```Bash
python main.py --cuda --epochs 6           # Train an LSTM on Wikitext-2 with CUDA.
python main.py --cuda --epochs 6 --tied    # Train a tied LSTM on Wikitext-2 with CUDA.
python main.py --cuda --tied               # Train a tied LSTM on Wikitext-2 with CUDA for 40 epochs.
python main.py --cuda --epochs 6 --model Transformer --lr 5
                                           # Train a Transformer model on Wikitext-2 with CUDA.

python generate.py                         # Generate samples from the default model checkpoint.
```

--------------------------------

### Installing Required Build Dependencies for OpenCV on Linux

Source: https://github.com/pytorch/examples/blob/main/cpp/tools/InstallingOpenCV.md

This command installs essential build tools and libraries required to compile OpenCV from source on Debian/Ubuntu-based systems. It includes compilers, build systems, and multimedia libraries necessary for the build process.

```shell
sudo apt-get install build-essential cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
```

--------------------------------

### Optimized Training Configurations for Language Models

Source: https://github.com/pytorch/examples/blob/main/word_language_model/README.md

This snippet provides examples of command-line arguments for `main.py` that are known to produce slower but better-performing language models. It demonstrates configurations with increased embedding and hidden unit sizes, higher dropout rates, and extended training epochs, both with and without tied weights, all utilizing CUDA.

```Bash
python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40
python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied
python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40
python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied
```
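
To give a feel for what `--emsize`, `--nhid`, and `--tied` change, the sketch below counts the parameters of an embedding + multi-layer LSTM + linear decoder language model. It is a back-of-the-envelope calculation, not code from `main.py`: the function name, the default of two layers, and the vocabulary size used in the demo are assumptions, and tying is modeled as the decoder reusing the embedding matrix (which requires `emsize == nhid`).

```python
def lstm_lm_param_count(ntoken, emsize, nhid, nlayers=2, tied=False):
    """Count parameters of an embedding -> LSTM -> linear decoder model."""
    if tied and emsize != nhid:
        raise ValueError("tied weights need emsize == nhid")
    embedding = ntoken * emsize
    lstm = 0
    for layer in range(nlayers):
        in_size = emsize if layer == 0 else nhid
        # Four gates, each with input weights, recurrent weights and two biases.
        lstm += 4 * nhid * (in_size + nhid) + 2 * 4 * nhid
    decoder_bias = ntoken
    # With tying, the decoder weight matrix is shared with the embedding.
    decoder_weight = 0 if tied else nhid * ntoken
    return embedding + lstm + decoder_weight + decoder_bias


if __name__ == "__main__":
    vocab = 33278  # assumed Wikitext-2 vocabulary size
    for size in (650, 1500):
        untied = lstm_lm_param_count(vocab, size, size)
        tied = lstm_lm_param_count(vocab, size, size, tied=True)
        print(f"emsize=nhid={size}: untied={untied:,} tied={tied:,}")
```

Under these assumptions, tying removes exactly `ntoken * nhid` parameters, which is why the saving grows with the hidden size.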

--------------------------------

### Command-Line Arguments for PyTorch MNIST Hogwild Script (Bash)

Source: https://github.com/pytorch/examples/blob/main/mnist_hogwild/README.md

This snippet lists the optional command-line arguments available for the `main.py` script, allowing users to customize training parameters such as batch size, epochs, learning rate, and enable CUDA or MPS training.

```bash
optional arguments:
  -h, --help            show this help message and exit
  --batch_size          input batch_size for training (default:64)
  --testing_batch_size  input batch size for testing (default: 1000)
  --epochs EPOCHS       number of epochs to train (default: 1000)
  --lr LR               learning rate (default: 0.03)
  --momentum            SGD momentum (default: 0.5)
  --seed SEED           random seed (default: 1)
  --mps                 enables macos GPU training
  --save_model          For saving the current Model
  --log_interval        how many batches to wait before logging training status
  --num_process         how many training processes to use (default: 2)
  --cuda                enables CUDA training
  --dry-run             quickly check a single pass
  --save-model          For Saving the current Model
```

--------------------------------

### Launching PyTorch DDP Application with launch.py (Multi-process) in Shell

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp/README.md

This shell command demonstrates how to launch a PyTorch DDP application using `launch.py`. It configures a single node with 8 GPUs, running one process per GPU, and explicitly passes `local_world_size=8` to the `example.py` script. This setup is typical for distributing training across multiple GPUs on a single machine.

```Shell
python /path/to/launch.py --nnode=1 --node_rank=0 --nproc_per_node=8 example.py --local_world_size=8
```
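
The relationship between the launcher flags and each process's rank can be sketched as follows. This assumes the common convention that global ranks are numbered contiguously node by node; the helper name `global_ranks` is hypothetical and is not part of `launch.py`.

```python
def global_ranks(nnodes, node_rank, nproc_per_node):
    """Map each local rank on one node to (global rank, world size)."""
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in [0, nnodes)")
    world_size = nnodes * nproc_per_node
    return {
        local_rank: (node_rank * nproc_per_node + local_rank, world_size)
        for local_rank in range(nproc_per_node)
    }


if __name__ == "__main__":
    # Mirrors the flags in the command above: one node, eight processes.
    for local_rank, (rank, world) in global_ranks(1, 0, 8).items():
        print(f"local_rank={local_rank} rank={rank} world_size={world}")
```

On a single node the local and global ranks coincide; on a second node (`node_rank=1`) with the same settings, the global ranks would start at 8.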

--------------------------------

### Customizing Execution with Command-Line Arguments (Bash)

Source: https://github.com/pytorch/examples/blob/main/siamese_network/README.md

This snippet lists various command-line arguments available for customizing the execution of the `main.py` script. These arguments control parameters such as batch sizes for training and testing, number of epochs, learning rate, gamma for learning rate step, use of an accelerator, dry-run mode, random seed, logging interval, and model saving.

```bash
--batch-size            input batch size for training (default: 64)
--test-batch-size       input batch size for testing (default: 1000)
--epochs                number of epochs to train (default: 14)
--lr                    learning rate (default: 1.0)
--gamma                 learning rate step gamma (default: 0.7)
--accel                 use accelerator
--dry-run               quickly check a single pass
--seed                  random seed (default: 1)
--log-interval          how many batches to wait before logging training status
--save-model            Saving the current Model
```
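
A minimal `argparse` sketch of how options like these are typically declared and read. It is an illustrative parser that reuses the defaults listed above, not the script's actual code; the point it shows is that `argparse` exposes a hyphenated flag such as `--batch-size` as the attribute `args.batch_size`.

```python
import argparse


def build_parser():
    """Illustrative parser covering a subset of the options listed above."""
    parser = argparse.ArgumentParser(description="illustrative parser only")
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--test-batch-size", type=int, default=1000)
    parser.add_argument("--epochs", type=int, default=14)
    parser.add_argument("--lr", type=float, default=1.0)
    parser.add_argument("--gamma", type=float, default=0.7)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--seed", type=int, default=1)
    return parser


if __name__ == "__main__":
    # Parse an explicit list instead of sys.argv to keep the demo reproducible.
    args = build_parser().parse_args(["--batch-size", "128", "--dry-run"])
    print(args.batch_size, args.dry_run, args.epochs)  # 128 True 14
```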

--------------------------------

### Command-Line Arguments for PyTorch MNIST RNN (Bash)

Source: https://github.com/pytorch/examples/blob/main/mnist_rnn/README.md

This snippet lists the available command-line arguments for configuring the PyTorch MNIST RNN training and testing process. It details parameters such as batch size, epochs, learning rate, and options for saving the model or enabling accelerators.

```bash
optional arguments:
  -h, --help            show this help message and exit
  --batch_size          input batch_size for training (default:64)
  --testing_batch_size  input batch size for testing (default: 1000)
  --epochs EPOCHS       number of epochs to train (default: 14)
  --lr LR               learning rate (default: 0.1)
  --gamma               learning rate step gamma (default: 0.7)
  --accel               enables accelerator
  --seed SEED           random seed (default: 1)
  --save_model          For saving the current Model
  --log_interval        how many batches to wait before logging training status
  --dry-run             quickly check a single pass
```

--------------------------------

### Enabling Explicit Prefetching for FSDP2 - Bash

Source: https://github.com/pytorch/examples/blob/main/distributed/FSDP2/README.md

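This command runs the FSDP2 training script with explicit prefetching enabled. It uses `torchrun` with 2 processes per node and passes the `--explicit-prefetch` flag, which lets the script specify which upcoming modules have their sharded parameters gathered ahead of time instead of relying on FSDP2's default prefetching behavior.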

```Bash
torchrun --nproc_per_node 2 train.py --explicit-prefetch
```
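
The idea behind an explicit prefetch schedule can be sketched at the level of layer indices. The helper below is an illustration only: in FSDP2 the hint is given with module objects through `FSDPModule.set_modules_to_forward_prefetch`, and how `train.py` chooses those modules is not shown in this document. The function name and the sample sizes are assumptions.

```python
def forward_prefetch_schedule(num_layers, num_to_prefetch):
    """For each layer index, list the later layers to prefetch while it runs."""
    schedule = {}
    for i in range(num_layers):
        # Clamp at the last layer: there is nothing beyond it to fetch.
        upcoming = range(i + 1, min(i + 1 + num_to_prefetch, num_layers))
        schedule[i] = list(upcoming)
    return schedule


if __name__ == "__main__":
    # Hypothetical model with four layers, prefetching two layers ahead.
    print(forward_prefetch_schedule(4, 2))
    # {0: [1, 2], 1: [2, 3], 2: [3], 3: []}
```

Prefetching further ahead can hide more communication latency at the cost of holding more unsharded parameters in memory at once.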

--------------------------------

### Training with Dummy Data for Benchmarking in PyTorch

Source: https://github.com/pytorch/examples/blob/main/imagenet/README.md

This command runs the training script using dummy data instead of the full ImageNet dataset. This is useful for quick setup, testing, and benchmarking training speed, though the resulting loss and accuracy will not be meaningful.

```bash
python main.py -a resnet18 --dummy
```

--------------------------------

### Executing DCGAN Training with Custom Epochs

Source: https://github.com/pytorch/examples/blob/main/cpp/dcgan/README.md

This snippet demonstrates how to run the DCGAN training script and specify a custom number of training epochs using the `--epochs` flag. In this example, the model will train for 10 epochs instead of the default 30. This command requires the `dcgan` binary to be compiled and accessible.

```shell
$ ./dcgan --epochs 10
```

--------------------------------

### Running Distributed MNIST Example with MPI (Shell)

Source: https://github.com/pytorch/examples/blob/main/cpp/distributed/README.md

This shell command executes the compiled `dist-mnist` program using `mpirun`. The `{NUM-PROCS}` placeholder should be replaced with the desired number of processes for distributed training, allowing the application to leverage multiple MPI ranks.

```Shell
mpirun -np {NUM-PROCS} ./dist-mnist
```

--------------------------------

### Displaying Generated DCGAN Samples with Python

Source: https://github.com/pytorch/examples/blob/main/cpp/dcgan/README.md

This command uses the `display_samples.py` Python script to visualize the image samples generated during the DCGAN training. It takes the path to a saved sample tensor file (e.g., `dcgan-sample-10.pt`) as input using the `-i` flag and outputs a plot image named `out.png`. This requires Python and the necessary libraries for `display_samples.py`.

```shell
$ python display_samples.py -i dcgan-sample-10.pt
Saved out.png
```

--------------------------------

### Parsing DDP Command-Line Arguments in Python

Source: https://github.com/pytorch/examples/blob/main/distributed/ddp/README.md

This Python snippet demonstrates how a PyTorch DDP application parses command-line arguments using `argparse`. It specifically handles `--local_rank` and `--local_world_size`, which are crucial for distributed training setup. These arguments are then passed to the `spmd_main` entrypoint.

```Python
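import argparse

# `spmd_main` is defined elsewhere in the example script.
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Index of this process on the local node.
    parser.add_argument("--local_rank", type=int, default=0)
    # Number of processes launched on the local node.
    parser.add_argument("--local_world_size", type=int, default=1)
    args = parser.parse_args()
    spmd_main(args.local_world_size, args.local_rank)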
```
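
One plausible use of these two values is to decide which GPUs each process owns. The sketch below splits the visible devices evenly between the local processes; the helper name `device_ids_for` and the explicit `device_count` parameter (which a real script would obtain from `torch.cuda.device_count()`) are assumptions made to keep the example free of a PyTorch dependency.

```python
def device_ids_for(local_rank, local_world_size, device_count):
    """Return the contiguous block of device indices owned by one process."""
    if local_world_size <= 0 or not 0 <= local_rank < local_world_size:
        raise ValueError("local_rank must be in [0, local_world_size)")
    per_process = device_count // local_world_size
    start = local_rank * per_process
    return list(range(start, start + per_process))


if __name__ == "__main__":
    # Eight GPUs, one process per GPU: process 3 owns device 3.
    print(device_ids_for(3, 8, 8))  # [3]
    # Eight GPUs shared by two processes: process 1 owns devices 4-7.
    print(device_ids_for(1, 2, 8))  # [4, 5, 6, 7]
```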

--------------------------------

### Enabling Mixed Precision for FSDP2 - Bash

Source: https://github.com/pytorch/examples/blob/main/distributed/FSDP2/README.md

This command executes the FSDP2 training script with mixed precision enabled. It utilizes `torchrun` with 2 processes per node and includes the `--mixed-precision` flag to leverage lower precision data types for potentially faster training and reduced memory usage.

```Bash
torchrun --nproc_per_node 2 train.py --mixed-precision
```
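
The memory side of that trade-off comes down to bytes per element. The arithmetic below is a rough sketch: it covers only the storage of a tensor in a given dtype and ignores optimizer state, activations, and any full-precision copies that a mixed-precision setup may keep. Which dtypes `--mixed-precision` selects is not stated here, so the dtype names and the parameter count are assumptions.

```python
# Storage size in bytes for one element of each dtype.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def tensor_megabytes(num_elements, dtype):
    """Approximate storage for num_elements values of dtype, in megabytes."""
    return num_elements * BYTES_PER_ELEMENT[dtype] / 1e6


if __name__ == "__main__":
    params = 100_000_000  # hypothetical parameter count
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: {tensor_megabytes(params, dtype):.0f} MB")
```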