### Install Basic Dependencies for GPT-NeoX (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Installs the core Python dependencies listed in `requirements/requirements.txt` with pip and, optionally, compiles the fused kernels with the provided setup script. Run this step after setting up the Python environment.

```Bash
pip install -r requirements/requirements.txt
python ./megatron/fused_kernels/setup.py install # optional; skip if you are not using fused kernels
```

--------------------------------

### Installing Cog CLI (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/docs/cog_setup.md

This script downloads the latest Cog CLI binary for the current system architecture and makes it executable. It requires `curl` and `sudo` to be installed on the system.

```bash
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`

sudo chmod +x /usr/local/bin/cog
```
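
The backticked `uname -s` and `uname -m` calls fill in the operating system and CPU architecture parts of the release asset name. As a rough illustration, the Python sketch below builds the same URL. The helper name `cog_download_url` is hypothetical, and the sketch assumes `platform.system()` and `platform.machine()` report the same values as `uname` on your machine.

```python
import platform

COG_RELEASE_BASE = "https://github.com/replicate/cog/releases/latest/download"


def cog_download_url(system=None, machine=None):
    """Build the release asset URL that the curl command above requests."""
    system = system or platform.system()     # e.g. "Linux", like `uname -s`
    machine = machine or platform.machine()  # e.g. "x86_64", like `uname -m`
    return f"{COG_RELEASE_BASE}/cog_{system}_{machine}"


print(cog_download_url("Linux", "x86_64"))
```

Printing the URL first is a cheap way to confirm that a matching asset name exists before writing anything to `/usr/local/bin`.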

--------------------------------

### Example Command for Pretokenizing Data

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Provides a concrete example of how to execute the `preprocess_data.py` script to pretokenize a custom dataset. It demonstrates specifying input/output files, vocab/merge files, dataset implementation (`mmap`), tokenizer type (`GPT2BPETokenizer`), and appending the end-of-document token.

```Bash
python tools/preprocess_data.py \
            --input ./data/mydataset.jsonl.zst \
            --output-prefix ./data/mydataset \
            --vocab ./data/gpt2-vocab.json \
            --merge-file gpt2-merges.txt \
            --dataset-impl mmap \
            --tokenizer-type GPT2BPETokenizer \
            --append-eod
```
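
The `--input` file is a JSON Lines dataset. As a minimal sketch, assuming each line is a JSON object that holds the document under a `text` key (an assumption about the expected record layout, so check the script's arguments), an uncompressed file could be produced like this. The helper name, file name, and sample documents are placeholders, and compressing the result to `.zst` would be a separate step.

```python
import json
import os
import tempfile


def write_jsonl(path, documents):
    """Write one JSON object per line, each holding a document under "text"."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            f.write(json.dumps({"text": doc}) + "\n")


docs = ["First example document.", "Second example document."]
out_path = os.path.join(tempfile.mkdtemp(), "mydataset.jsonl")
write_jsonl(out_path, docs)

# Read the file back to confirm that every line parses on its own.
with open(out_path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(len(records), records[0]["text"])
```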

--------------------------------

### Verify AITemplate Dependencies (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This interpreter session imports `transformers`, `diffusers`, and `torch` and prints their installed versions. The values shown are the versions the AITemplate Stable Diffusion example was tested with; compare your own output against them.

```Python
>>> import transformers
>>> transformers.__version__
'4.21.2'
>>> import diffusers
>>> diffusers.__version__
'0.3.0'
>>> import torch
>>> torch.__version__
'1.12.1+cu116'
```
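
The same comparison can be scripted. The sketch below is one possible approach rather than part of the example itself: it reads installed versions with `importlib.metadata` and reports any package that differs from the versions shown above. The helper names are hypothetical, and requiring an exact string match is a deliberately strict assumption.

```python
from importlib import metadata

# Versions shown in the interpreter session above.
TESTED_VERSIONS = {
    "transformers": "4.21.2",
    "diffusers": "0.3.0",
    "torch": "1.12.1+cu116",
}


def installed_versions(packages):
    """Return {package: version or None} for each requested package."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # package is not installed
    return found


def version_mismatches(installed, tested=TESTED_VERSIONS):
    """List packages whose installed version differs from the tested one."""
    return sorted(
        name for name, want in tested.items() if installed.get(name) != want
    )


print(version_mismatches(installed_versions(TESTED_VERSIONS)))
```

An empty list means every package matches; any name that is printed is either missing or at a different version.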

--------------------------------

### Installing Docker and NVIDIA Container Toolkit on Ubuntu (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/docs/cog_setup.md

This script provides commands to install Docker Engine, Docker CLI, containerd, and the Docker Compose plugin on an Ubuntu system. It also includes steps to install the NVIDIA Container Toolkit required for GPU acceleration within containers. Requires `sudo` access and internet connectivity.

```bash
sudo apt-get update

sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

sudo mkdir -p /etc/apt/keyrings

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
```

--------------------------------

### Download Stable Diffusion Weights using Cog (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/StableDiffusion-v2/cog_example/README.md

This command uses the `cog run` utility to execute a script that downloads the pre-trained weights required for the Stable Diffusion v2 model. This is a necessary setup step before running predictions.

```Shell
cog run script/download-weights
```

--------------------------------

### Run AITemplate Img2Img Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command executes the image-to-image demo using the compiled AITemplate Stable Diffusion models. It requires a Hugging Face access token and processes an example input image to generate an output.

```Shell
python3 examples/05_stable_diffusion/demo_img2img.py --token ACCESS_TOKEN
```

--------------------------------

### Example Optimizer Config (YAML, YAML-compatible)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

An example optimizer configuration in a standard YAML format, including a comment and a boolean value ('False') that is valid YAML but requires modification to be strictly JSON-compatible.

```yaml
    # optimizer settings
   "optimizer": {
     "type": "OneBitAdam",
     "params": {
       "lr": 0.0001,
       "freeze_step": 23000,
       "betas": [0.9, 0.95],
       "cuda_aware": False,
       "comm_backend_name": "nccl"
     }
   }
```

--------------------------------

### Verifying Library Versions (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Checks the installed versions of `transformers`, `diffusers`, and `torch` to ensure compatibility with the AITemplate Stable Diffusion example. Requires these libraries to be installed.

```Python
import transformers
print(transformers.__version__)
import diffusers
print(diffusers.__version__)
import torch
print(torch.__version__)
```

--------------------------------

### Run Human Pose-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Starts a Gradio application that uses a ControlNet model with human pose estimation. Users provide an input image, and the Openpose model detects the pose skeleton, which is then used by Stable Diffusion 1.5 to generate an image based on the pose and a text prompt.

```Shell
python gradio_pose2image.py
```

--------------------------------

### Verifying Library Versions (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Checks the installed versions of `transformers`, `diffusers`, and `torch` libraries to ensure compatibility with the AITemplate Stable Diffusion example. Requires these libraries to be installed.

```Python
import transformers
print(transformers.__version__)
import diffusers
print(diffusers.__version__)
import torch
print(torch.__version__)
```

--------------------------------

### Run AITemplate Stable Diffusion Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command executes the text-to-image demo using the compiled AITemplate Stable Diffusion models. It requires a Hugging Face access token and generates an output image file (`example_ait.png`).

```Shell
python3 examples/05_stable_diffusion/demo.py --token ACCESS_TOKEN
```

--------------------------------

### Example deepy.py Command with Config Directory

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Demonstrates using the `-d` option with `deepy.py` to specify a directory (`configs`) where configuration files are located. This command launches `train.py` using merged settings from `125M.yml` and `local_setup.yml` found within the specified directory.

```Bash
python ./deepy.py train.py -d configs 125M.yml local_setup.yml
```
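
To make "merged settings" concrete, the sketch below combines several configuration dictionaries into one. It is an illustration only: the function name `merge_configs` is hypothetical, the dictionaries are stand-ins for the two files, and rejecting a key that appears in more than one file is an assumption about conflict handling, not a description of what `deepy.py` does.

```python
def merge_configs(*configs):
    """Combine config dicts into one, refusing keys defined more than once."""
    merged = {}
    for config in configs:
        duplicates = merged.keys() & config.keys()
        if duplicates:
            raise ValueError(f"duplicate keys: {sorted(duplicates)}")
        merged.update(config)
    return merged


# Stand-ins for the contents of 125M.yml and local_setup.yml.
model_settings = {"num-layers": 12, "hidden-size": 768}
local_settings = {"data-path": "./data/enwik8/enwik8_text_document"}

print(merge_configs(model_settings, local_settings))
```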

--------------------------------

### Configuring OneBitAdam Optimizer in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Example configuration for the DeepSpeed OneBitAdam optimizer, including required fields such as `freeze_step`, `cuda_aware`, and `comm_backend_name` in addition to the standard optimizer parameters.

```YAML
   "optimizer": {
     "type": "OneBitAdam",
     "params": {
       "lr": 0.0001,
       "freeze_step": 23000,
       "betas": [0.9, 0.95],
       "cuda_aware": false,
       "comm_backend_name": "nccl"
     }
   }
```

--------------------------------

### Example GPT-NeoX Configuration (GPT3 Small)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Illustrates a complete YAML configuration file used to train a small ~160M parameter GPT model with GPT-NeoX, including parallelism, model, optimizer, zero optimization, data, activation checkpointing, regularization, precision, LR decay, and miscellaneous settings.

```YAML
# GPT-3 pretraining setup
{
   # parallelism settings ( you will want to change these based on your cluster setup, ideally scheduling pipeline stages
   # across the node boundaries )
   "pipe-parallel-size": 1,
   "model-parallel-size": 1,

   # model settings
   "num-layers": 12,
   "hidden-size": 768,
   "num-attention-heads": 12,
   "seq-length": 2048,
   "max-position-embeddings": 2048,
   "norm": "rmsnorm",
   "pos-emb": "none",
   "no-weight-tying": true,
    # this should provide some speedup but takes a while to build, set to true if desired
   "scaled-upper-triang-masked-softmax-fusion": false,
   "train-iters": 320000,

   # optimizer settings
   "optimizer": {
     "type": "Adam",
     "params": {
       "lr": 0.0006,
       "max_grad_norm": 1.0,
       "betas": [0.9, 0.95]
     }
   },
   # for all zero_optimization options, see https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training
   "zero_optimization": {
    "stage": 0,
    "allgather_partitions": True,
    "allgather_bucket_size": 500000000,
    "overlap_comm": True,
    "reduce_scatter": True,
    "reduce_bucket_size": 500000000,
    "contiguous_gradients": True,
  },

   # batch / data settings
   "train_micro_batch_size_per_gpu": 4,
   "gradient_accumulation_steps": 1,
   "data-impl": "mmap",
   "split": "949,50,1",

   # activation checkpointing
   "checkpoint-activations": true,
   "checkpoint-num-layers": 1,
   "partition-activations": true,
   "synchronize-each-layer": true,

   # regularization
   "gradient_clipping": 1.0,
   "weight-decay": 0,
   "hidden-dropout": 0,
   "attention-dropout": 0,

   # precision settings
   "fp16": {
     "enabled": true,
     "loss_scale": 0,
     "loss_scale_window": 1000,
     "hysteresis": 2,
     "min_loss_scale": 1
   },

   # lr decay settings
   "lr-decay-iters": 320000,
   "lr-decay-style": "cosine",
   "warmup": 0.01,

   # misc. training settings
   "distributed-backend": "nccl",
   "save-interval": 10000,
   "eval-interval": 1000,
   "eval-iters": 10,

   # logging
   "log-interval": 100,
   "steps_per_print": 10,
   "keep-last-n-checkpoints": 4,
   "wall_clock_breakdown": true,
}
```
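
The ~160M figure can be sanity-checked from the model settings. The sketch below uses the common approximation of 12·h² weights per transformer layer plus the embedding matrices. The vocabulary size of 50257 (GPT-2 BPE) is an assumption, since the config does not state it, and biases and normalization parameters are ignored.

```python
def approx_param_count(num_layers, hidden_size, vocab_size, tied_embeddings):
    """Rough transformer parameter count: 12*h^2 per layer plus embeddings."""
    per_layer = 12 * hidden_size ** 2     # attention (4h^2) + MLP (8h^2)
    embedding = vocab_size * hidden_size  # input token embedding
    # With untied weights the output projection is a second vocab x hidden matrix.
    embeddings_total = embedding if tied_embeddings else 2 * embedding
    return num_layers * per_layer + embeddings_total


# "num-layers": 12, "hidden-size": 768, "no-weight-tying": true, "pos-emb": "none"
total = approx_param_count(12, 768, vocab_size=50257, tied_embeddings=False)
print(f"{total / 1e6:.0f}M parameters")
```

Under these assumptions roughly half of the total comes from the two untied embedding matrices, which is why the count lands above the 125M usually quoted for a tied-embedding model of this shape.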

--------------------------------

### Running AIT Img2Img Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Executes the image-to-image demo script using the compiled AITemplate models. Requires a Hugging Face access token (`ACCESS_TOKEN`).

```Shell
python3 examples/05_stable_diffusion/demo_img2img.py --token ACCESS_TOKEN
```

--------------------------------

### Run Depth Map-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Starts a Gradio application that utilizes a ControlNet model guided by depth maps. This script uses Stable Diffusion 1.5 and processes full 512x512 depth maps (unlike some other models that use 64x64) to preserve more detail when generating images based on depth information and a text prompt.

```Shell
python gradio_depth2image.py
```

--------------------------------

### Running AIT Text-to-Image Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Executes the text-to-image demo script using the compiled AITemplate models. Requires a Hugging Face access token (`ACCESS_TOKEN`). Generates an output image file named `example_ait.png`.

```Shell
python3 examples/05_stable_diffusion/demo.py --token ACCESS_TOKEN
```

--------------------------------

### Building and Tagging AITemplate Docker Container (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/README.md

This snippet provides the bash commands to clone the AITemplate repository, navigate into the directory, build the Docker image using the provided build script with the CUDA backend, and then tag the resulting image with a specific name for RunPod deployment.

```bash
git clone --recursive https://github.com/facebookincubator/AITemplate
cd AITemplate
./docker/build.sh cuda

docker tag ait:latest merrell/ait-sd-1-runpod:latest
```

--------------------------------

### Run Semantic Segmentation-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Launches a Gradio application for image generation using semantic segmentation maps. The application takes an input image, uses the Uniformer model to detect segmentations based on the ADE20K protocol, and then uses this information with Stable Diffusion 1.5 ControlNet to generate an output image guided by the segmentation.

```Shell
python gradio_seg2image.py
```

--------------------------------

### Benchmark AITemplate Stable Diffusion (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command runs a benchmark for the compiled AITemplate Stable Diffusion models. It initializes the model weights and measures performance metrics. An access token is required to load the models.

```Shell
python3 examples/05_stable_diffusion/benchmark.py --token ACCESS_TOKEN
```

--------------------------------

### Configuring Mixed Precision Training (YAML)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Example configuration for enabling fp16 mixed precision training in GPT-NeoX, following DeepSpeed's configuration style. Set 'enabled' to false for fp32 training.

```yaml
   "fp16": {
     "enabled": true,
     "loss_scale": 0,
     "loss_scale_window": 1000,
     "hysteresis": 2,
     "min_loss_scale": 1
   },
```
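
In DeepSpeed's fp16 settings, a `loss_scale` of 0 selects dynamic loss scaling, and the remaining fields tune how the scale moves. The sketch below is a simplified illustration of that idea, not DeepSpeed's actual implementation: the class name, the halving and doubling factor of 2, and the initial scale are all assumptions.

```python
class DynamicLossScaler:
    """Simplified dynamic loss scaling (illustrative, not DeepSpeed's code)."""

    def __init__(self, initial_scale=2.0 ** 16, window=1000, hysteresis=2,
                 min_scale=1.0):
        self.scale = initial_scale
        self.window = window          # clean steps needed before raising the scale
        self.hysteresis = hysteresis  # overflows needed before lowering the scale
        self.min_scale = min_scale    # the scale never drops below this value
        self._overflows_left = hysteresis
        self._clean_steps = 0

    def update(self, overflow):
        """Adjust the scale after a step; return True if the step is skipped."""
        if overflow:
            self._clean_steps = 0
            self._overflows_left -= 1
            if self._overflows_left <= 0:
                self.scale = max(self.scale / 2.0, self.min_scale)
                self._overflows_left = self.hysteresis
            return True
        self._clean_steps += 1
        if self._clean_steps % self.window == 0:
            self.scale *= 2.0
        return False


scaler = DynamicLossScaler(initial_scale=8.0, window=3, hysteresis=2)
for overflow in [True, True, False, False, False]:
    scaler.update(overflow)
print(scaler.scale)
```

In this run the second overflow halves the scale, and three clean steps in a row double it again.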

--------------------------------

### Configuring Adam Optimizer in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Example configuration for the Adam optimizer, specifying learning rate, gradient norm clipping, and beta values. This structure is similar to DeepSpeed's optimizer configuration.

```YAML
  # optimizer settings
   "optimizer": {
     "type": "Adam",
     "params": {
       "lr": 0.0006,
       "max_grad_norm": 1.0,
       "betas": [0.9, 0.95]
     }
   }
```

--------------------------------

### Running Image-to-Image Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Executes the `demo_img2img.py` script to perform an image-to-image generation task using the compiled AIT modules. Requires a Hugging Face access token (`ACCESS_TOKEN`).

```Shell
python3 examples/05_stable_diffusion/demo_img2img.py --token ACCESS_TOKEN
```

--------------------------------

### Running Text-to-Image Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Runs the `demo.py` script to generate an image from a text prompt using the compiled AIT modules. Requires a Hugging Face access token (`ACCESS_TOKEN`). The output image is saved as `example_ait.png`.

```Shell
python3 examples/05_stable_diffusion/demo.py --token ACCESS_TOKEN
```

--------------------------------

### Install Development Dependencies with Pip

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/tests/README.md

Installs the required development dependencies for running tests and coverage analysis using pip and the provided requirements file.

```bash
pip install -r requirements/requirements-dev.txt
```

--------------------------------

### Compiling AIT Modules for Img2Img (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Compiles the AITemplate modules specifically for the Stable Diffusion img2img pipeline. Requires a Hugging Face access token (`ACCESS_TOKEN`).

```Shell
python3 examples/05_stable_diffusion/compile.py --img2img True --token ACCESS_TOKEN
```

--------------------------------

### Example Optimizer Config (YAML, JSON-compatible)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

The OneBitAdam optimizer configuration from "Example Optimizer Config (YAML, YAML-compatible)", modified to be strictly JSON-compatible by removing the comment and changing the boolean value to lowercase (`false`). This format is required when using SLURM with `srun`.

```yaml
   "optimizer": {
     "type": "OneBitAdam",
     "params": {
       "lr": 0.0001,
       "freeze_step": 23000,
       "betas": [0.9, 0.95],
       "cuda_aware": false,
       "comm_backend_name": "nccl"
     }
   }
```
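
The two changes described here (dropping comments and lowercasing booleans) are mechanical, so they can be scripted. The sketch below is a minimal illustration with a hypothetical helper name. It only handles whole-line `#` comments and `True`/`False` values that follow a colon, and it would also rewrite a quoted string that happens to contain `: True`, so treat it as a starting point rather than a general converter.

```python
import json
import re


def to_strict_json(text):
    """Convert a loosely written config snippet into strict JSON text."""
    # Drop whole-line comments (lines whose first non-blank character is '#').
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    cleaned = "\n".join(lines)
    # Lowercase Python-style booleans that appear as values after a colon.
    cleaned = re.sub(r"(:\s*)True\b", r"\1true", cleaned)
    cleaned = re.sub(r"(:\s*)False\b", r"\1false", cleaned)
    # Round-trip through the JSON parser to prove the result is valid.
    return json.dumps(json.loads(cleaned), indent=2)


loose = """
{
  # optimizer settings
  "optimizer": {
    "type": "OneBitAdam",
    "params": {
      "lr": 0.0001,
      "freeze_step": 23000,
      "betas": [0.9, 0.95],
      "cuda_aware": False,
      "comm_backend_name": "nccl"
    }
  }
}
"""
strict = to_strict_json(loose)
print(strict)
```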

--------------------------------

### Building and Pushing Docker Image with Cog (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/docs/cog_setup.md

These commands build a Docker image using Cog, tag it with a specific name and version, and then push it to a Docker registry (e.g., Docker Hub). Requires Cog and Docker installed. Placeholder names `{model-name}` and `runpod` should be replaced with your specific details.

```bash
cog build -t ai-api-{model-name}
docker tag ai-api-{model-name} runpod/ai-api-{model-name}:latest
docker push runpod/ai-api-{model-name}:latest
```

--------------------------------

### Compiling AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Compiles the AITemplate modules for the CLIP, UNet, and VAE models used in the Stable Diffusion pipeline. Requires a Hugging Face access token (`ACCESS_TOKEN`) to download model weights. Generates `.so` files in `./tmp/` subdirectories.

```Shell
python3 examples/05_stable_diffusion/compile.py --token ACCESS_TOKEN
```

--------------------------------

### BibTeX Citation for ControlNet Paper

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Provides the standard BibTeX format citation for the research paper 'Adding Conditional Control to Text-to-Image Diffusion Models' by Lvmin Zhang and Maneesh Agrawala.

```BibTeX
@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models}, 
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

--------------------------------

### Example Input Payload for RealESRGAN Function

Source: https://github.com/runpod/serverless-workers/blob/main/workers/Real-ESRGAN/README.md

This JSON object demonstrates the structure of the input payload required to invoke the RealESRGAN serverless function. It specifies the input image URL and the desired output format (zip in this case).

```JSON
{
  "input": {
    "data_url": "LINKTOZIP/LINKTOIMAGE",
    "output_type": "zip"
  }
}
```
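
When the request is assembled in code, building the payload from a dictionary avoids hand-written JSON mistakes. The sketch below only constructs the body. The helper name `build_payload` is hypothetical, and sending the request is left out because the endpoint URL and authentication are not covered here.

```python
import json


def build_payload(data_url, output_type="zip"):
    """Return the request body as a JSON string in the shape shown above."""
    return json.dumps({"input": {"data_url": data_url, "output_type": output_type}})


# "LINKTOZIP/LINKTOIMAGE" is the placeholder from the example payload.
body = build_payload("LINKTOZIP/LINKTOIMAGE")
print(body)
```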

--------------------------------

### Benchmarking AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Runs a benchmark script to evaluate the performance of the compiled AITemplate modules. Requires a Hugging Face access token (`ACCESS_TOKEN`) to initialize weights. This step is optional.

```Shell
python3 examples/05_stable_diffusion/benchmark.py --token ACCESS_TOKEN
```

--------------------------------

### Create Conda Environment for ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

This command creates a new Conda environment named 'control' using the specifications provided in the 'environment.yaml' file. This is the first step in setting up the required dependencies for ControlNet.

```Shell
conda env create -f environment.yaml
```

--------------------------------

### Compiling AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Runs the `compile.py` script to build AITemplate modules for the CLIP, UNet, and VAE models. Requires a Hugging Face access token (`ACCESS_TOKEN`) to download model weights. Generates `.so` files in `./tmp` folders.

```Shell
python3 examples/05_stable_diffusion/compile.py --token ACCESS_TOKEN
```

--------------------------------

### Benchmarking AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Executes the `benchmark.py` script to evaluate the performance of the compiled AIT modules. Requires a Hugging Face access token (`ACCESS_TOKEN`) to initialize weights. This step is optional.

```Shell
python3 examples/05_stable_diffusion/benchmark.py --token ACCESS_TOKEN
```

--------------------------------

### Serve HTML Coverage Report Locally

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/tests/README.md

Starts a simple Python HTTP server to serve the generated HTML coverage report from the `htmlcov` directory on port 8000, allowing it to be viewed in a web browser.

```bash
python -m http.server --directory htmlcov 8000
```

--------------------------------

### Compile AITemplate Stable Diffusion Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command compiles the AITemplate modules for the CLIP, UNet, and VAE models used in the Stable Diffusion pipeline. It requires a Hugging Face access token to download the necessary model weights. The compilation generates `.so` files in specific temporary directories.

```Shell
python3 examples/05_stable_diffusion/compile.py --token ACCESS_TOKEN
```

--------------------------------

### Train ControlNet Model (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/docs/train.md

Main script for training the ControlNet model using PyTorch Lightning. Configures model parameters, loads data, initializes the trainer, and starts the training process.

```Python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from tutorial_dataset import MyDataset
from cldm.logger import ImageLogger
from cldm.model import create_model, load_state_dict


# Configs
resume_path = './models/control_sd15_ini.ckpt'
batch_size = 4
logger_freq = 300
learning_rate = 1e-5
sd_locked = True
only_mid_control = False


# First use cpu to load models. Pytorch Lightning will automatically move it to GPUs.
model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict(resume_path, location='cpu'))
model.learning_rate = learning_rate
model.sd_locked = sd_locked
model.only_mid_control = only_mid_control


# Misc
dataset = MyDataset()
dataloader = DataLoader(dataset, num_workers=0, batch_size=batch_size, shuffle=True)
logger = ImageLogger(batch_frequency=logger_freq)
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger])


# Train!
trainer.fit(model, dataloader)
```

--------------------------------

### Configure Gradient Accumulation (PyTorch Lightning)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/docs/train.md

Example configuration for the PyTorch Lightning Trainer to enable gradient accumulation, allowing larger effective batch sizes with limited GPU memory.

```Python
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger], accumulate_grad_batches=4)
```
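
The effect on batch size is simple arithmetic: gradients from several small batches are summed before each optimizer step. A short sketch, with a hypothetical helper name and `batch_size = 4` taken from the training script's configuration:

```python
def effective_batch_size(batch_size, accumulate_grad_batches, num_devices=1):
    """Number of samples that contribute to each optimizer step."""
    return batch_size * accumulate_grad_batches * num_devices


# batch_size = 4, accumulate_grad_batches = 4, one GPU
print(effective_batch_size(4, 4, 1))
```

Memory use stays at the level of a single batch of 4 while each update reflects 16 samples, at the cost of fewer optimizer steps per epoch.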

--------------------------------

### Run Stable Diffusion Prediction using Cog (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/StableDiffusion-v2/cog_example/README.md

This command executes a prediction using the configured Cog model. It takes the input prompt via the `-i` flag, demonstrating how to generate an image based on a text description.

```Shell
cog predict -i prompt="monkey scuba diving"
```

--------------------------------

### Project Dependencies - Python Requirements

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/requirements/requirements.txt

Specifies the required Python packages and their minimum versions or source locations for the project to run correctly. These dependencies are typically installed using pip.

```Python Requirements
best_download
deepspeed
ftfy>=6.0.1
git+https://github.com/EleutherAI/lm_dataformat.git@4eec05349977071bf67fc072290b95e31c8dd836
huggingface_hub>=0.11.0
lm_eval>=0.3.0
mpi4py>=3.0.3
numpy>=1.22.0
pybind11>=2.6.2
regex
sentencepiece
six
tiktoken>=0.1.2
tokenizers>=0.12.1
transformers>=4.24.0
```

--------------------------------

### Compiling Img2img AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Runs the `compile.py` script with the `--img2img True` flag to build AITemplate modules optimized for image-to-image tasks. Requires a Hugging Face access token (`ACCESS_TOKEN`).

```Shell
python3 examples/05_stable_diffusion/compile.py --img2img True --token ACCESS_TOKEN
```

--------------------------------

### Compile AITemplate Img2Img Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command compiles the AITemplate modules specifically for the Stable Diffusion image-to-image (img2img) pipeline. It uses the `--img2img True` flag and requires a Hugging Face access token to access the model weights.

```Shell
python3 examples/05_stable_diffusion/compile.py --img2img True --token ACCESS_TOKEN
```

--------------------------------

### Run Gradio App with Scribbles ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes the Python script 'gradio_scribble2image.py' to launch a Gradio web interface. This app allows users to generate images with Stable Diffusion controlled by user-drawn scribble inputs.

```Shell
python gradio_scribble2image.py
```

--------------------------------

### Run Gradio App with Canny Edge ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes the Python script 'gradio_canny2image.py' to launch a Gradio web interface. This app allows users to generate images using Stable Diffusion controlled by Canny edge detection input.

```Shell
python gradio_canny2image.py
```

--------------------------------

### Run Interactive Scribble-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Launches a Gradio-based interactive application for generating images from scribbles using a Stable Diffusion 1.5 ControlNet model. It allows users to draw on a canvas and generate an image based on the drawing and a text prompt.

```Shell
python gradio_scribble2image_interactive.py
```

--------------------------------

### Setting Data Path in GPT-NeoX Configuration

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This is a snippet from a YAML configuration file used by GPT-NeoX. It sets the `data-path` parameter, which specifies the prefix for the tokenized data files (`.bin` and `.idx`) that the model should use for training or evaluation. The example shows the path for the tokenized `enwik8` dataset.

```YAML
"data-path": "./data/enwik8/enwik8_text_document",
```
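
Because `data-path` is a prefix rather than a file name, a quick check that both files exist can catch a wrong path before training starts. The sketch below is a small illustration with hypothetical helper names; it only appends the two extensions named above.

```python
import os


def tokenized_files(data_path):
    """Return the .bin and .idx file names that share the given prefix."""
    return [data_path + ".bin", data_path + ".idx"]


def missing_files(data_path):
    """List the expected files that are not present on disk."""
    return [p for p in tokenized_files(data_path) if not os.path.exists(p)]


prefix = "./data/enwik8/enwik8_text_document"
print(tokenized_files(prefix))
print(missing_files(prefix))
```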

--------------------------------

### Parallelism Settings in GPT-NeoX Config

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Defines the pipeline and model parallelism sizes within the GPT-NeoX configuration, typically set based on the cluster setup and network topology. These values determine how the model is distributed across GPUs.

```YAML
   "pipe-parallel-size": 1,
   "model-parallel-size": 1,
```
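
In the usual 3D-parallelism bookkeeping, the GPUs not consumed by pipeline and model parallelism are used for data parallelism. The sketch below shows that arithmetic. The function name is hypothetical, and the assumption that the total GPU count must divide evenly is a property of the scheme in general, not something stated in this config.

```python
def data_parallel_size(world_size, pipe_parallel_size, model_parallel_size):
    """Number of data-parallel replicas for a given GPU count."""
    group = pipe_parallel_size * model_parallel_size  # GPUs per model replica
    if world_size % group != 0:
        raise ValueError("world size must be divisible by pipe * model parallel size")
    return world_size // group


print(data_parallel_size(8, 1, 1))  # settings above on 8 GPUs
print(data_parallel_size(8, 2, 2))  # 2 pipeline stages x 2-way model parallelism
```

With both sizes set to 1, every GPU holds a full copy of the model and all parallelism is data parallelism.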

--------------------------------

### Defining Python Dependencies

Source: https://github.com/runpod/serverless-workers/blob/main/requirements.txt

This list specifies the exact versions of Python packages required for the project to run correctly. It is typically used with package managers like pip to install dependencies, ensuring a consistent environment.

```Python
accelerate==0.15.0
bitsandbytes==0.36.0
cog==0.6.1
diffusers==0.12.1
runpod==0.8.4
scipy==1.10.0
transformers==4.26.0
torch==1.13.1
torchvision==0.14.1
```

--------------------------------

### Run Gradio App with M-LSD Lines ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes the Python script 'gradio_hough2image.py' to launch a Gradio web interface. This app enables image generation with Stable Diffusion controlled by M-LSD (Mobile Line Segment Detection) straight line inputs.

```Shell
python gradio_hough2image.py
```

--------------------------------

### Activate ControlNet Conda Environment

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

This command activates the newly created Conda environment named 'control'. All subsequent commands related to running ControlNet scripts should be executed within this activated environment to ensure correct dependencies are used.

```Shell
conda activate control
```

--------------------------------

### Building Docker Image for Model (BASH)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/huggingface-transformers/README.md

This BASH command builds a Docker image. It uses the '--build-arg MODEL_NAME' to specify which model to include in the image, tags the image with a repository name, image name, and tag, and uses the current directory '.' as the build context. Requires Docker installed.

```BASH
docker build --build-arg MODEL_NAME={model name} -t repo/image_name:tag .
```

--------------------------------

### Run Fake Scribble-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes a Python script that runs a Gradio application for generating images from *synthesized* scribbles. This uses a ControlNet model based on scribbles but automatically generates the scribble input from an uploaded image instead of requiring manual drawing.

```Shell
python gradio_fake_scribble2image.py
```

--------------------------------

### Loading Fill50K Dataset with PyTorch Dataset (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/docs/train.md

This Python class `MyDataset` inherits from `torch.utils.data.Dataset` to load the Fill50K training data. It reads the `prompt.json` file to get image paths and prompts, then uses OpenCV (`cv2`) to load source and target images. Images are converted from BGR to RGB, and normalized to specific ranges ([0, 1] for source, [-1, 1] for target) before being returned as a dictionary item.

```python
import json
import cv2
import numpy as np

from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self):
        self.data = []
        with open('./training/fill50k/prompt.json', 'rt') as f:
            for line in f:
                self.data.append(json.loads(line))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = self.data[idx]

        source_filename = item['source']
        target_filename = item['target']
        prompt = item['prompt']

        source = cv2.imread('./training/fill50k/' + source_filename)
        target = cv2.imread('./training/fill50k/' + target_filename)

        # Do not forget that OpenCV reads images in BGR order.
        source = cv2.cvtColor(source, cv2.COLOR_BGR2RGB)
        target = cv2.cvtColor(target, cv2.COLOR_BGR2RGB)

        # Normalize source images to [0, 1].
        source = source.astype(np.float32) / 255.0

        # Normalize target images to [-1, 1].
        target = (target.astype(np.float32) / 127.5) - 1.0

        return dict(jpg=target, txt=prompt, hint=source)
```
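The two normalization ranges can be checked with plain arithmetic. The following is a minimal sketch using only the standard library; the helper names are hypothetical and not part of the ControlNet code, and `from_signed_range` is an assumed inverse added only to illustrate how a [-1, 1] value maps back to an 8-bit pixel.

```python
def to_unit_range(pixel: float) -> float:
    """Map an 8-bit pixel value in [0, 255] to [0, 1] (as done for the source image)."""
    return pixel / 255.0


def to_signed_range(pixel: float) -> float:
    """Map an 8-bit pixel value in [0, 255] to [-1, 1] (as done for the target image)."""
    return pixel / 127.5 - 1.0


def from_signed_range(value: float) -> int:
    """Hypothetical inverse: map a value in [-1, 1] back to an 8-bit pixel value."""
    return round((value + 1.0) * 127.5)


if __name__ == "__main__":
    for pixel in (0, 128, 255):
        print(pixel, to_unit_range(pixel), to_signed_range(pixel))
```

Black (0) maps to 0.0 in the source range and -1.0 in the target range, while white (255) maps to 1.0 in both.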

--------------------------------

### Run Gradio App with HED Boundary ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes the Python script 'gradio_hed2image.py' to launch a Gradio web interface. This app facilitates image generation using Stable Diffusion controlled by soft HED (Holistically-Nested Edge Detection) boundaries, suitable for tasks like recoloring.

```Shell
python gradio_hed2image.py
```

--------------------------------

### Run Normal Map-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes a Python script to run a Gradio application for image generation using normal maps. The application computes the normal map from a MiDaS depth map and a user-defined threshold, then uses this with Stable Diffusion 1.5 ControlNet to generate images that preserve geometric details based on the normal map and a text prompt.

```Shell
python gradio_normal2image.py
```

--------------------------------

### Downloading GPT-NeoX Slim Weights with Wget (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command uses `wget` to recursively download the slim weights for the GPT-NeoX-20B model from a public URL. It cuts directory levels, avoids creating host directories, rejects index files, and saves the files into a specified local directory (`20B_checkpoints`). These weights are suitable for inference or finetuning.

```bash
wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" https://the-eye.eu/public/AI/models/GPT-NeoX-20B/slim_weights/ -P 20B_checkpoints
```
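To see why `--cut-dirs=5` together with `-nH` places files directly under `20B_checkpoints`, the path mapping can be approximated in a few lines. This is a rough sketch of the documented behavior of these flags, not wget's actual implementation; `local_path` and the file name `example.bin` are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path(url: str, prefix: str, cut_dirs: int, no_host: bool = True) -> str:
    """Approximate where a recursive wget download would save the file at `url`.

    Expects a URL that points at a file, not at a directory listing.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    *dirs, filename = parts
    kept = dirs[cut_dirs:]              # --cut-dirs drops leading remote directories
    root = PurePosixPath(prefix)        # -P sets the local directory prefix
    if not no_host:
        root = root / parsed.netloc     # without -nH, a host directory is created
    return str(root.joinpath(*kept, filename))


if __name__ == "__main__":
    base = "https://the-eye.eu/public/AI/models/GPT-NeoX-20B/slim_weights/"
    print(local_path(base + "example.bin", "20B_checkpoints", cut_dirs=5))
```

The remote path has five directory components (`public`, `AI`, `models`, `GPT-NeoX-20B`, `slim_weights`), so cutting five leaves only what sits below `slim_weights/`.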

--------------------------------

### Locking MI-250 GCD Frequency (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

This shell command locks the frequency of a specific Graphics Compute Die (GCD) on an AMD MI-250 GPU for performance benchmarking. The `-d x` flag selects the device, where `x` is a placeholder for the GCD index, and `--setperfdeterminism 1700` enables performance determinism mode with a 1700 MHz clock limit to reduce run-to-run variation.

```shell
rocm-smi -d x --setperfdeterminism 1700
```

--------------------------------

### Downloading GPT-NeoX Full Weights with Wget (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command uses `wget` to recursively download the full weights for the GPT-NeoX-20B model from a public URL. It cuts directory levels, avoids creating host directories, rejects index files, and saves the files into a specified local directory (`20B_checkpoints`). These weights include optimizer states and are significantly larger.

```bash
wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" https://the-eye.eu/public/AI/models/GPT-NeoX-20B/full_weights/ -P 20B_checkpoints
```

--------------------------------

### General Usage of deepy.py for Training

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Shows the basic command structure for launching the `train.py` script using the `deepy.py` wrapper. This wrapper utilizes DeepSpeed's launcher to distribute training across multiple GPUs or nodes, accepting one or more configuration files.

```Bash
python ./deepy.py train.py [path/to/config1.yml] [path/to/config2.yml] ...
```

--------------------------------

### Launching GPT-NeoX Scripts with deepy.py

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This is the general command structure for launching GPT-NeoX functionality (training, evaluation, generation) using the `deepy.py` wrapper around the `deepspeed` launcher. It requires specifying the main script to run (e.g., `train.py`, `evaluate.py`, `generate.py`) followed by one or more paths to YAML configuration files.

```Bash
./deepy.py [script.py] [./path/to/config_1.yml] [./path/to/config_2.yml] ... [./path/to/config_n.yml]
```
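When the launch is scripted from Python, the command structure above can be assembled as an argument list. The sketch below only builds the list and does not run anything; `build_deepy_command` and the config paths are hypothetical placeholders.

```python
import shlex
from typing import List, Sequence


def build_deepy_command(script: str, configs: Sequence[str]) -> List[str]:
    """Build the argv list: python ./deepy.py <script> <config_1> ... <config_n>."""
    if not configs:
        raise ValueError("at least one YAML config path is required")
    return ["python", "./deepy.py", script, *configs]


if __name__ == "__main__":
    # Placeholder config paths; substitute your own files.
    cmd = build_deepy_command("train.py", ["configs/model.yml", "configs/setup.yml"])
    print(shlex.join(cmd))
    # The list could then be passed to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems if a config path contains spaces.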

--------------------------------

### Configuring Data, Checkpoint, and Logging Paths/Intervals in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Configuration for data implementation, split ratios, data paths, vocabulary/merge files, save/load directories, tensorboard/log directories, and save/evaluation intervals.

```YAML
   "data-impl": "mmap",
   "split": "949,50,1",
   # Suggested data paths when using GPT-NeoX locally
   "data-path": "data/enwik8/enwik8_text_document",
   #"train-data-path": "data/enwik8/enwik8_text_document",
   #"test-data-path": "data/enwik8/enwik8_text_document",
   #"valid-data-path": "data/enwik8/enwik8_text_document",
   "vocab-file": "data/gpt2-vocab.json",
   "merge-file": "data/gpt2-merges.txt",
   "save": "checkpoints",
   "load": "checkpoints",
   "tensorboard-dir": "tensorboard",
   "log-dir": "logs",
   "save-interval": 10000,
   "eval-interval": 1000,
   "eval-iters": 10,

```
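The `"split"` value is a comma-separated list of relative weights. Assuming the Megatron-style convention of train, validation, and test in that order, the weights can be turned into fractions as in this sketch; `split_fractions` is a hypothetical helper, not a GPT-NeoX function.

```python
from typing import Tuple


def split_fractions(split: str) -> Tuple[float, ...]:
    """Normalize a comma-separated weight string such as "949,50,1" to fractions."""
    weights = [float(w) for w in split.split(",")]
    total = sum(weights)
    if total <= 0:
        raise ValueError("split weights must sum to a positive number")
    return tuple(w / total for w in weights)


if __name__ == "__main__":
    train, valid, test = split_fractions("949,50,1")
    print(f"train={train:.3f} valid={valid:.3f} test={test:.3f}")
```

With `"949,50,1"` the weights sum to 1000, so roughly 94.9% of the data goes to training, 5% to validation, and 0.1% to testing.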

--------------------------------

### Usage for preprocess_data.py Script

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Displays the command-line arguments and options available for the `preprocess_data.py` script, which is used to pretokenize custom datasets. It shows options for input/output paths, tokenizer type, vocab/merge files, dataset implementation, and runtime parameters.

```Shell
usage: preprocess_data.py [-h] --input INPUT [--jsonl-keys JSONL_KEYS [JSONL_KEYS ...]] [--num-docs NUM_DOCS] --tokenizer-type {HFGPT2Tokenizer,HFTokenizer,GPT2BPETokenizer,CharLevelTokenizer} [--vocab-file VOCAB_FILE] [--merge-file MERGE_FILE] [--append-eod] [--ftfy] --output-prefix OUTPUT_PREFIX
                          [--dataset-impl {lazy,cached,mmap}] [--workers WORKERS] [--log-interval LOG_INTERVAL]

optional arguments:
  -h, --help            show this help message and exit

input data:
  --input INPUT         Path to input jsonl files or lmd archive(s) - if using multiple archives, put them in a comma separated list
  --jsonl-keys JSONL_KEYS [JSONL_KEYS ...]
                        space separate listed of keys to extract from jsonl. Defa
  --num-docs NUM_DOCS   Optional: Number of documents in the input data (if known) for an accurate progress bar.

tokenizer:
  --tokenizer-type {HFGPT2Tokenizer,HFTokenizer,GPT2BPETokenizer,CharLevelTokenizer}
                        What type of tokenizer to use.
  --vocab-file VOCAB_FILE
                        Path to the vocab file
  --merge-file MERGE_FILE
                        Path to the BPE merge file (if necessary).
  --append-eod          Append an <eod> token to the end of a document.
  --ftfy                Use ftfy to clean text

output data:
  --output-prefix OUTPUT_PREFIX
                        Path to binary output file without suffix
  --dataset-impl {lazy,cached,mmap}
                        Dataset implementation to use. Default: mmap

runtimes:
  --workers WORKERS     Number of worker processes to launch
  --log-interval LOG_INTERVAL
                        Interval between progress updates
```

--------------------------------

### Cite GPT-NeoX-20B Model (BibTeX)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

BibTeX entry for citing the GPT-NeoX-20B model paper. Includes authors, title, booktitle, URL, and year.

```bibtex
@inproceedings{gpt-neox-20b,
  title={{GPT-NeoX-20B}: An Open-Source Autoregressive Language Model},
  author={Black, Sid and Biderman, Stella and Hallahan, Eric and Anthony, Quentin and Gao, Leo and Golding, Laurence and He, Horace and Leahy, Connor and McDonell, Kyle and Phang, Jason and Pieler, Michael and Prashanth, USVSN Sai and Purohit, Shivanshu and Reynolds, Laria and Tow, Jonathan and Wang, Ben and Weinbach, Samuel},
  booktitle={Proceedings of the ACL Workshop on Challenges \& Perspectives in Creating Large Language Models},
  url={https://arxiv.org/abs/2204.06745},
  year={2022}
}
```

--------------------------------

### Evaluating GPT-NeoX Model with Evaluation Harness (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command executes the `evaluate.py` script using `deepy.py` to run model evaluation on specified downstream tasks. It requires configuration files (`your_configs.yml`) and a list of evaluation tasks provided via the `--eval_tasks` argument, referencing tasks available in the `lm-evaluation-harness` repository.

```bash
python ./deepy.py evaluate.py -d configs your_configs.yml --eval_tasks task1 task2 ... taskn
```

--------------------------------

### Configuring Learning Rate Scheduler in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Settings for controlling the learning rate decay over time, including decay iterations, decay style, and warmup percentage.

```YAML
   "lr-decay-iters": 320000,
   "lr-decay-style": "cosine",
   "warmup": 0.01,

```
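As an illustration of how these three settings interact, the sketch below computes a learning rate with linear warmup followed by cosine decay. It is a simplified model under stated assumptions, not the GPT-NeoX scheduler itself: `max_lr` and `min_lr` are hypothetical values that would come from the optimizer configuration, and warmup is assumed to cover the first `warmup * lr-decay-iters` steps.

```python
import math


def learning_rate(step: int, max_lr: float, min_lr: float = 0.0,
                  decay_iters: int = 320000, warmup: float = 0.01) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr at decay_iters."""
    warmup_iters = round(warmup * decay_iters)
    if warmup_iters > 0 and step <= warmup_iters:
        return max_lr * step / warmup_iters
    # Fraction of the decay phase completed, clamped to 1.0 after decay_iters.
    done = min(step - warmup_iters, decay_iters - warmup_iters)
    progress = done / (decay_iters - warmup_iters)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 3200, 161600, 320000):
        print(step, learning_rate(step, max_lr=6e-4, min_lr=6e-5))
```

With the values above, warmup lasts 3,200 steps (1% of 320,000), after which the rate follows a half cosine down to `min_lr`.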

--------------------------------

### Tokenizing enwik8 Dataset

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command runs the `prepare_data.py` script to download and tokenize the `enwik8` dataset. The `-d ./data` argument saves the tokenized data to the `./data` directory. With no tokenizer specified, the default GPT-2 tokenizer is used.

```Bash
python prepare_data.py -d ./data
```

--------------------------------

### Cite GPT-NeoX Library (BibTeX)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

BibTeX entry for citing the GPT-NeoX library repository. Includes authors, title, URL, DOI, month, year, and version.

```bibtex
@software{gpt-neox-library,
  title = {{GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch}},
  author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
  url = {https://www.github.com/eleutherai/gpt-neox},
  doi = {10.5281/zenodo.5879544},
  month = {8},
  year = {2021},
  version = {0.0.1},
}
```

--------------------------------

### Upload Model to Hugging Face Hub (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Commands to log in to the Hugging Face CLI and upload a model using the provided upload script. Requires a Hugging Face Hub user token for authentication.

```bash
huggingface-cli login
python ./tools/upload.py
```

--------------------------------

### Configuring Batch Size and Gradient Accumulation in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Settings for the per-GPU micro-batch size and the number of gradient accumulation steps, which together determine the effective training batch size, following DeepSpeed's configuration approach.

```YAML
   # batch / data settings
   "train_micro_batch_size_per_gpu": 4,
   "gradient_accumulation_steps": 1,

```
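In DeepSpeed's convention, the global batch size is the product of the micro-batch size per GPU, the gradient accumulation steps, and the number of data-parallel GPUs. A minimal sketch of that arithmetic follows; the GPU count of 8 is an assumed example, not a value from the configuration above.

```python
def global_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int,
                      data_parallel_gpus: int) -> int:
    """Samples consumed per optimizer step across all data-parallel GPUs."""
    return micro_batch_per_gpu * grad_accum_steps * data_parallel_gpus


if __name__ == "__main__":
    # 4 samples per GPU, 1 accumulation step, 8 data-parallel GPUs (assumed).
    print(global_batch_size(4, 1, 8))
```

Raising `gradient_accumulation_steps` increases the effective batch size without increasing per-GPU memory use, at the cost of more forward and backward passes per optimizer step.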

--------------------------------

### Testing PyTorch Dataset Loading (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/docs/train.md

This script demonstrates how to use the `MyDataset` class. It creates an instance of the dataset, prints its total number of items using `len()`, and then accesses a specific item by index (1234). It prints the prompt (`txt`) and the shapes of the target image (`jpg`) and source image (`hint`) to verify successful loading and processing.

```python
from tutorial_dataset import MyDataset

dataset = MyDataset()
print(len(dataset))

item = dataset[1234]
jpg = item['jpg']
txt = item['txt']
hint = item['hint']
print(txt)
print(jpg.shape)
print(hint.shape)
```

--------------------------------

### Configuring SLURM Integration (YAML)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Specifies the necessary configuration settings to enable SLURM as the launcher and coordinate nodes when running GPT-NeoX on a SLURM cluster.

```yaml
    "launcher": "slurm",
    "deepspeed_slurm": true
```

--------------------------------

### Convert GPT-NeoX Checkpoint to Hugging Face Format (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Command to convert a GPT-NeoX model checkpoint to the Hugging Face Transformers GPTNeoXModel format using the provided conversion script. Requires specifying the input checkpoint directory, configuration file, and output directory.

```bash
python ./tools/convert_to_hf.py --input_dir /path/to/model/global_stepXXX --config_file your_config.yml --output_dir hf_model/save/location
```

--------------------------------

### Tokenizing Pile Subset with Custom Tokenizer

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command runs the `prepare_data.py` script to download and tokenize a single shard of the Pile dataset (`pile_subset`). It saves the output to `./data` (`-d ./data`), specifies the tokenizer type as `HFTokenizer` (`-t HFTokenizer`), and provides the path to a custom vocabulary file (`--vocab-file ./20B_checkpoints/20B_tokenizer.json`), likely for the GPT-NeoX-20B tokenizer.

```Bash
python prepare_data.py -d ./data -t HFTokenizer --vocab-file ./20B_checkpoints/20B_tokenizer.json pile_subset
```

--------------------------------

### Running GPT-NeoX Docker Container with GPUs

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command launches a Docker container based on the `gpt-neox` image. It uses `nvidia-docker` to expose GPUs 0-3, sets the shared memory size to 1 GB (`--shm-size=1g`), and removes the locked-memory limit (`--ulimit memlock=-1`), both of which matter for NCCL. It also bind-mounts the current directory (`$PWD`) to `/gpt-neox` inside the container. The `--rm` flag removes the container after exit, and `-it` provides an interactive terminal.

```Bash
nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 --shm-size=1g --ulimit memlock=-1 --mount type=bind,src=$PWD,dst=/gpt-neox gpt-neox
```

--------------------------------

### Evaluating Model with deepy.py and lm-evaluation-harness

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command runs the `evaluate.py` script using `deepy.py` and the `./configs/20B.yml` configuration. It uses the `--eval_tasks` argument to specify a list of evaluation tasks (e.g., `triviaqa`, `piqa`) from the `lm-evaluation-harness`.

```Bash
./deepy.py evaluate.py ./configs/20B.yml --eval_tasks triviaqa piqa
```

--------------------------------

### Configuring ZeRO Optimization in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Configuration settings for DeepSpeed's ZeRO optimization, including the stage, allgather and reduce settings, and contiguous gradients. Note that `zero_allow_untested_optimizer` is a separate flag set outside the `zero_optimization` block.

```YAML
# for all zero_optimization options, see https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training
  "zero_optimization": {
        "stage": 0,
        "allgather_partitions": True,
        "allgather_bucket_size": 500000000,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 500000000,
        "contiguous_gradients": True,
  },
  "zero_allow_untested_optimizer": false,

```

--------------------------------

### Prepare Test Data with Python Script

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/tests/README.md

Executes the `prepare_data.py` script to download or generate the necessary data required for running the project's tests.

```bash
python prepare_data.py
```

--------------------------------

### Listing Python Dependencies

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/requirements/requirements-dev.txt

This snippet lists the Python packages and version constraints used for development of the project (formatting, pre-commit hooks, and testing). It comes from the `requirements-dev.txt` file.

```text
autopep8>=1.5.6
clang-format>=13.0.1
pre-commit>=2.17.0
pytest>=6.2.3
pytest-cov>=2.11.1
pytest-forked>=1.3.0
pytest-xdist
```

--------------------------------

### Configure Mup Settings in gpt-neox

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README-MUP.md

This configuration block defines settings for enabling and controlling Mup (μP) behavior within the gpt-neox framework. It includes options for enabling Mup, saving base shapes for initialization, specifying the base shapes file path, enabling coordinate checks for verification, and setting hyperparameters for Mup tuning.

```Configuration
# mup

"use-mup": true,

"save-base-shapes": false, # this only needs to be enabled once in order to generate the base-shapes-file on each rank

"base-shapes-file": "base-shapes", # load base shapes from this file

"coord-check": false, # generate coord check plots to verify mup's implementation in neox

# mup hp search

"mup-init-scale": 1.0,

"mup-attn-temp": 1.0,

"mup-output-temp": 1.0,

"mup-embedding-mult": 1.0,

"mup-rp-embedding-mult": 1.0,
```