### Install Basic Dependencies for GPT-NeoX (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Installs the core Python dependencies listed in `requirements.txt` using pip and optionally compiles the fused kernels using the provided setup script. This step is required after setting up the Python environment.

```Bash
pip install -r requirements/requirements.txt
python ./megatron/fused_kernels/setup.py install # optional if not using fused kernels
```

--------------------------------

### Installing Cog CLI (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/docs/cog_setup.md

This script downloads the latest Cog CLI binary for the current system architecture and makes it executable. It requires `curl` and `sudo` to be installed on the system.

```bash
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
```

--------------------------------

### Example Command for Pretokenizing Data

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Provides a concrete example of how to execute the `preprocess_data.py` script to pretokenize a custom dataset. It demonstrates specifying input/output files, vocab/merge files, dataset implementation (`mmap`), tokenizer type (`GPT2BPETokenizer`), and appending the end-of-document token.
```Bash
python tools/preprocess_data.py \
    --input ./data/mydataset.jsonl.zst \
    --output-prefix ./data/mydataset \
    --vocab ./data/gpt2-vocab.json \
    --merge-file gpt2-merges.txt \
    --dataset-impl mmap \
    --tokenizer-type GPT2BPETokenizer \
    --append-eod
```

--------------------------------

### Verify AITemplate Dependencies Python

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This snippet shows how to import and check the installed versions of the required Python libraries: `transformers`, `diffusers`, and `torch`. It verifies that the installed versions are compatible with the tested versions for the AITemplate Stable Diffusion example.

```Python
>>> import transformers
>>> transformers.__version__
'4.21.2'
>>> import diffusers
>>> diffusers.__version__
'0.3.0'
>>> import torch
>>> torch.__version__
'1.12.1+cu116'
```

--------------------------------

### Installing Docker and NVIDIA Container Toolkit on Ubuntu (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/docs/cog_setup.md

This script provides commands to install Docker Engine, Docker CLI, containerd, and the Docker Compose plugin on an Ubuntu system. It also includes steps to install the NVIDIA Container Toolkit required for GPU acceleration within containers. Requires `sudo` access and internet connectivity.

```bash
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Install NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
```

--------------------------------

### Download Stable Diffusion Weights using Cog (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/StableDiffusion-v2/cog_example/README.md

This command uses the `cog run` utility to execute a script that downloads the pre-trained weights required for the Stable Diffusion v2 model. This is a necessary setup step before running predictions.

```Shell
cog run script/download-weights
```

--------------------------------

### Run AITemplate Img2Img Demo Shell

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command executes the image-to-image demo using the compiled AITemplate Stable Diffusion models. It requires a Hugging Face access token and processes an example input image to generate an output.

```Shell
python3 examples/05_stable_diffusion/demo_img2img.py --token ACCESS_TOKEN
```

--------------------------------

### Example Optimizer Config (YAML, YAML-compatible)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

An example optimizer configuration in a standard YAML format, including a comment and a boolean value ('False') that is valid YAML but requires modification to be strictly JSON-compatible.
```yaml
# optimizer settings
"optimizer": {
  "type": "OneBitAdam",
  "params": {
    "lr": 0.0001,
    "freeze_step": 23000,
    "betas": [0.9, 0.95],
    "cuda_aware": False,
    "comm_backend_name": "nccl"
  }
}
```

--------------------------------

### Verifying Library Versions (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Checks the installed versions of `transformers`, `diffusers`, and `torch` to ensure compatibility with the AITemplate Stable Diffusion example. Requires these libraries to be installed.

```Python
import transformers
print(transformers.__version__)
import diffusers
print(diffusers.__version__)
import torch
print(torch.__version__)
```

--------------------------------

### Run Human Pose-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Starts a Gradio application that uses a ControlNet model with human pose estimation. Users provide an input image, and the Openpose model detects the pose skeleton, which is then used by Stable Diffusion 1.5 to generate an image based on the pose and a text prompt.

```python
python gradio_pose2image.py
```

--------------------------------

### Verifying Library Versions (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Checks the installed versions of `transformers`, `diffusers`, and `torch` libraries to ensure compatibility with the AITemplate Stable Diffusion example. Requires these libraries to be installed.
```Python
import transformers
print(transformers.__version__)
import diffusers
print(diffusers.__version__)
import torch
print(torch.__version__)
```

--------------------------------

### Run AITemplate Stable Diffusion Demo Shell

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command executes the text-to-image demo using the compiled AITemplate Stable Diffusion models. It requires a Hugging Face access token and generates an output image file (`example_ait.png`).

```Shell
python3 examples/05_stable_diffusion/demo.py --token ACCESS_TOKEN
```

--------------------------------

### Example deepy.py Command with Config Directory

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Demonstrates using the `-d` option with `deepy.py` to specify a directory (`configs`) where configuration files are located. This command launches `train.py` using merged settings from `125M.yml` and `local_setup.yml` found within the specified directory.

```Bash
python ./deepy.py train.py -d configs 125M.yml local_setup.yml
```

--------------------------------

### Configuring OneBitAdam Optimizer in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Example configuration for the DeepSpeed OneBitAdam optimizer, including required fields like `freeze_step`, `cuda_aware`, and `comm_backend_name` in addition to standard optimizer parameters.
```YAML
"optimizer": {
  "type": "OneBitAdam",
  "params": {
    "lr": 0.0001,
    "freeze_step": 23000,
    "betas": [0.9, 0.95],
    "cuda_aware": false,
    "comm_backend_name": "nccl"
  }
}
```

--------------------------------

### Example GPT-NeoX Configuration (GPT3 Small)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Illustrates a complete YAML configuration file used to train a small ~160M parameter GPT model with GPT-NeoX, including parallelism, model, optimizer, zero optimization, data, activation checkpointing, regularization, precision, LR decay, and miscellaneous settings.

```YAML
# GPT-3 pretraining setup
{
  # parallelism settings ( you will want to change these based on your cluster setup, ideally scheduling pipeline stages
  # across the node boundaries )
  "pipe-parallel-size": 1,
  "model-parallel-size": 1,

  # model settings
  "num-layers": 12,
  "hidden-size": 768,
  "num-attention-heads": 12,
  "seq-length": 2048,
  "max-position-embeddings": 2048,
  "norm": "rmsnorm",
  "pos-emb": "none",
  "no-weight-tying": true,
  # this should provide some speedup but takes a while to build, set to true if desired
  "scaled-upper-triang-masked-softmax-fusion": false,
  "train-iters": 320000,

  # optimizer settings
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0006,
      "max_grad_norm": 1.0,
      "betas": [0.9, 0.95]
    }
  },
  # for all zero_optimization options, see https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training
  "zero_optimization": {
    "stage": 0,
    "allgather_partitions": True,
    "allgather_bucket_size": 500000000,
    "overlap_comm": True,
    "reduce_scatter": True,
    "reduce_bucket_size": 500000000,
    "contiguous_gradients": True,
  },

  # batch / data settings
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 1,
  "data-impl": "mmap",
  "split": "949,50,1",

  # activation checkpointing
  "checkpoint-activations": true,
  "checkpoint-num-layers": 1,
  "partition-activations": true,
  "synchronize-each-layer": true,

  # regularization
  "gradient_clipping": 1.0,
  "weight-decay": 0,
  "hidden-dropout": 0,
  "attention-dropout": 0,

  # precision settings
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },

  # lr decay settings
  "lr-decay-iters": 320000,
  "lr-decay-style": "cosine",
  "warmup": 0.01,

  # misc. training settings
  "distributed-backend": "nccl",
  "save-interval": 10000,
  "eval-interval": 1000,
  "eval-iters": 10,

  # logging
  "log-interval": 100,
  "steps_per_print": 10,
  "keep-last-n-checkpoints": 4,
  "wall_clock_breakdown": true,
}
```

--------------------------------

### Running AIT Img2Img Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Executes the image-to-image demo script using the compiled AITemplate models. Requires a Hugging Face access token (`ACCESS_TOKEN`).

```Shell
python3 examples/05_stable_diffusion/demo_img2img.py --token ACCESS_TOKEN
```

--------------------------------

### Run Depth Map-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Starts a Gradio application that utilizes a ControlNet model guided by depth maps. This script uses Stable Diffusion 1.5 and processes full 512x512 depth maps (unlike some other models that use 64x64) to preserve more detail when generating images based on depth information and a text prompt.

```python
python gradio_depth2image.py
```

--------------------------------

### Running AIT Text-to-Image Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Executes the text-to-image demo script using the compiled AITemplate models. Requires a Hugging Face access token (`ACCESS_TOKEN`). Generates an output image file named `example_ait.png`.
```Shell
python3 examples/05_stable_diffusion/demo.py --token ACCESS_TOKEN
```

--------------------------------

### Building and Tagging AITemplate Docker Container (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/README.md

This snippet provides the bash commands to clone the AITemplate repository, navigate into the directory, build the Docker image using the provided build script with the CUDA backend, and then tag the resulting image with a specific name for RunPod deployment.

```bash
git clone --recursive https://github.com/facebookincubator/AITemplate
cd AITemplate
./docker/build.sh cuda
docker tag ait:latest merrell/ait-sd-1-runpod:latest
```

--------------------------------

### Run Semantic Segmentation-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Launches a Gradio application for image generation using semantic segmentation maps. The application takes an input image, uses the Uniformer model to detect segmentations based on the ADE20K protocol, and then uses this information with Stable Diffusion 1.5 ControlNet to generate an output image guided by the segmentation.

```python
python gradio_seg2image.py
```

--------------------------------

### Benchmark AITemplate Stable Diffusion Shell

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command runs a benchmark for the compiled AITemplate Stable Diffusion models. It initializes the model weights and measures performance metrics. An access token is required to load the models.
```Shell
python3 examples/05_stable_diffusion/benchmark.py --token ACCESS_TOKEN
```

--------------------------------

### Configuring Mixed Precision Training (YAML)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Example configuration for enabling fp16 mixed precision training in GPT-NeoX, following DeepSpeed's configuration style. Set 'enabled' to false for fp32 training.

```yaml
"fp16": {
  "enabled": true,
  "loss_scale": 0,
  "loss_scale_window": 1000,
  "hysteresis": 2,
  "min_loss_scale": 1
},
```

--------------------------------

### Configuring Adam Optimizer in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Example configuration for the Adam optimizer, specifying learning rate, gradient norm clipping, and beta values. This structure is similar to DeepSpeed's optimizer configuration.

```YAML
# optimizer settings
"optimizer": {
  "type": "Adam",
  "params": {
    "lr": 0.0006,
    "max_grad_norm": 1.0,
    "betas": [0.9, 0.95]
  }
}
```

--------------------------------

### Running Image-to-Image Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Executes the `demo_img2img.py` script to perform an image-to-image generation task using the compiled AIT modules. Requires a Hugging Face access token (`ACCESS_TOKEN`).

```Shell
python3 examples/05_stable_diffusion/demo_img2img.py --token ACCESS_TOKEN
```

--------------------------------

### Running Text-to-Image Demo (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Runs the `demo.py` script to generate an image from a text prompt using the compiled AIT modules. Requires a Hugging Face access token (`ACCESS_TOKEN`). The output image is saved as `example_ait.png`.
```Shell
python3 examples/05_stable_diffusion/demo.py --token ACCESS_TOKEN
```

--------------------------------

### Install Development Dependencies with Pip

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/tests/README.md

Installs the required development dependencies for running tests and coverage analysis using pip and the provided requirements file.

```bash
pip install -r requirements/requirements-dev.txt
```

--------------------------------

### Compiling AIT Modules for Img2Img (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Compiles the AITemplate modules specifically for the Stable Diffusion img2img pipeline. Requires a Hugging Face access token (`ACCESS_TOKEN`).

```Shell
python3 examples/05_stable_diffusion/compile.py --img2img True --token ACCESS_TOKEN
```

--------------------------------

### Example Optimizer Config (YAML, JSON-compatible)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

The same optimizer configuration as the previous snippet, modified to be strictly JSON-compatible by removing the comment and changing the boolean value to lowercase ('false'). This format is required when using SLURM with 'srun'.

```yaml
"optimizer": {
  "type": "OneBitAdam",
  "params": {
    "lr": 0.0001,
    "freeze_step": 23000,
    "betas": [0.9, 0.95],
    "cuda_aware": false,
    "comm_backend_name": "nccl"
  }
}
```

--------------------------------

### Building and Pushing Docker Image with Cog (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/docs/cog_setup.md

These commands build a Docker image using Cog, tag it with a specific name and version, and then push it to a Docker registry (e.g., Docker Hub). Requires Cog and Docker installed. Placeholder names `{model-name}` and `runpod` should be replaced with your specific details.
```bash
cog build -t ai-api-{model-name}
docker tag ai-api-{model-name} runpod/ai-api-{model-name}:latest
docker push runpod/ai-api-{model-name}:latest
```

--------------------------------

### Compiling AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Compiles the AITemplate modules for the CLIP, UNet, and VAE models used in the Stable Diffusion pipeline. Requires a Hugging Face access token (`ACCESS_TOKEN`) to download model weights. Generates `.so` files in `./tmp/` subdirectories.

```Shell
python3 examples/05_stable_diffusion/compile.py --token ACCESS_TOKEN
```

--------------------------------

### BibTeX Citation for ControlNet Paper

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Provides the standard BibTeX format citation for the research paper 'Adding Conditional Control to Text-to-Image Diffusion Models' by Lvmin Zhang and Maneesh Agrawala.

```BibTeX
@misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models},
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

--------------------------------

### Example Input Payload for RealESRGAN Function

Source: https://github.com/runpod/serverless-workers/blob/main/workers/Real-ESRGAN/README.md

This JSON object demonstrates the structure of the input payload required to invoke the RealESRGAN serverless function. It specifies the input image URL and the desired output format (zip in this case).

```JSON
{
  "input": {
    "data_url": "LINKTOZIP/LINKTOIMAGE",
    "output_type": "zip"
  }
}
```

--------------------------------

### Benchmarking AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion/README.md

Runs a benchmark script to evaluate the performance of the compiled AITemplate modules.
Requires a Hugging Face access token (`ACCESS_TOKEN`) to initialize weights. This step is optional.

```Shell
python3 examples/05_stable_diffusion/benchmark.py --token ACCESS_TOKEN
```

--------------------------------

### Create Conda Environment for ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

This command creates a new Conda environment named 'control' using the specifications provided in the 'environment.yaml' file. This is the first step in setting up the required dependencies for ControlNet.

```Shell
conda env create -f environment.yaml
```

--------------------------------

### Compiling AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Runs the `compile.py` script to build AITemplate modules for the CLIP, UNet, and VAE models. Requires a Hugging Face access token (`ACCESS_TOKEN`) to download model weights. Generates `.so` files in `./tmp` folders.

```Shell
python3 examples/05_stable_diffusion/compile.py --token ACCESS_TOKEN
```

--------------------------------

### Benchmarking AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Executes the `benchmark.py` script to evaluate the performance of the compiled AIT modules. Requires a Hugging Face access token (`ACCESS_TOKEN`) to initialize weights. This step is optional.

```Shell
python3 examples/05_stable_diffusion/benchmark.py --token ACCESS_TOKEN
```

--------------------------------

### Serve HTML Coverage Report Locally

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/tests/README.md

Starts a simple Python HTTP server to serve the generated HTML coverage report from the `htmlcov` directory on port 8000, allowing it to be viewed in a web browser.
```bash
python -m http.server --directory htmlcov 8000
```

--------------------------------

### Compile AITemplate Stable Diffusion Modules Shell

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command compiles the AITemplate modules for the CLIP, UNet, and VAE models used in the Stable Diffusion pipeline. It requires a Hugging Face access token to download the necessary model weights. The compilation generates `.so` files in specific temporary directories.

```Shell
python3 examples/05_stable_diffusion/compile.py --token ACCESS_TOKEN
```

--------------------------------

### Train ControlNet Model (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/docs/train.md

Main script for training the ControlNet model using PyTorch Lightning. Configures model parameters, loads data, initializes the trainer, and starts the training process.

```Python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from tutorial_dataset import MyDataset
from cldm.logger import ImageLogger
from cldm.model import create_model, load_state_dict


# Configs
resume_path = './models/control_sd15_ini.ckpt'
batch_size = 4
logger_freq = 300
learning_rate = 1e-5
sd_locked = True
only_mid_control = False


# First use cpu to load models. Pytorch Lightning will automatically move it to GPUs.
model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict(resume_path, location='cpu'))
model.learning_rate = learning_rate
model.sd_locked = sd_locked
model.only_mid_control = only_mid_control


# Misc
dataset = MyDataset()
dataloader = DataLoader(dataset, num_workers=0, batch_size=batch_size, shuffle=True)
logger = ImageLogger(batch_frequency=logger_freq)
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger])


# Train!
trainer.fit(model, dataloader)
```

--------------------------------

### Configure Gradient Accumulation (PyTorch Lightning)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/docs/train.md

Example configuration for the PyTorch Lightning Trainer to enable gradient accumulation, allowing larger effective batch sizes with limited GPU memory.

```Python
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger], accumulate_grad_batches=4)
```

--------------------------------

### Run Stable Diffusion Prediction using Cog (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/StableDiffusion-v2/cog_example/README.md

This command executes a prediction using the configured Cog model. It takes the input prompt via the `-i` flag, demonstrating how to generate an image based on a text description.

```Shell
cog predict -i prompt="monkey scuba diving"
```

--------------------------------

### Project Dependencies - Python Requirements

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/requirements/requirements.txt

Specifies the required Python packages and their minimum versions or source locations for the project to run correctly. These dependencies are typically installed using pip.

```Python Requirements
best_download
deepspeed
ftfy>=6.0.1
git+https://github.com/EleutherAI/lm_dataformat.git@4eec05349977071bf67fc072290b95e31c8dd836
huggingface_hub>=0.11.0
lm_eval>=0.3.0
mpi4py>=3.0.3
numpy>=1.22.0
pybind11>=2.6.2
regex
sentencepiece
six
tiktoken>=0.1.2
tokenizers>=0.12.1
transformers>=4.24.0
```

--------------------------------

### Compiling Img2img AIT Modules (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

Runs the `compile.py` script with the `--img2img True` flag to build AITemplate modules optimized for image-to-image tasks. Requires a Hugging Face access token (`ACCESS_TOKEN`).
```Shell
python3 examples/05_stable_diffusion/compile.py --img2img True --token ACCESS_TOKEN
```

--------------------------------

### Compile AITemplate Img2Img Modules Shell

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-anything-v3/README.md

This command compiles the AITemplate modules specifically for the Stable Diffusion image-to-image (img2img) pipeline. It uses the `--img2img True` flag and requires a Hugging Face access token to access the model weights.

```Shell
python3 examples/05_stable_diffusion/compile.py --img2img True --token ACCESS_TOKEN
```

--------------------------------

### Run Gradio App with Scribbles ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes the Python script 'gradio_scribble2image.py' to launch a Gradio web interface. This app allows users to generate images with Stable Diffusion controlled by user-drawn scribble inputs.

```Shell
python gradio_scribble2image.py
```

--------------------------------

### Run Gradio App with Canny Edge ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes the Python script 'gradio_canny2image.py' to launch a Gradio web interface. This app allows users to generate images using Stable Diffusion controlled by Canny edge detection input.

```Shell
python gradio_canny2image.py
```

--------------------------------

### Run Interactive Scribble-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Launches a Gradio-based interactive application for generating images from scribbles using a Stable Diffusion 1.5 ControlNet model. It allows users to draw on a canvas and generate an image based on the drawing and a text prompt.
```python
python gradio_scribble2image_interactive.py
```

--------------------------------

### Setting Data Path in GPT-NeoX Configuration

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This is a snippet from a YAML configuration file used by GPT-NeoX. It sets the `data-path` parameter, which specifies the prefix for the tokenized data files (`.bin` and `.idx`) that the model should use for training or evaluation. The example shows the path for the tokenized `enwik8` dataset.

```YAML
"data-path": "./data/enwik8/enwik8_text_document",
```

--------------------------------

### Parallelism Settings in GPT-NeoX Config

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Defines the pipeline and model parallelism sizes within the GPT-NeoX configuration, typically set based on the cluster setup and network topology. These values determine how the model is distributed across GPUs.

```YAML
"pipe-parallel-size": 1,
"model-parallel-size": 1,
```

--------------------------------

### Defining Python Dependencies

Source: https://github.com/runpod/serverless-workers/blob/main/requirements.txt

This list specifies the exact versions of Python packages required for the project to run correctly. It is typically used with package managers like pip to install dependencies, ensuring a consistent environment.

```Python
accelerate==0.15.0
bitsandbytes==0.36.0
cog==0.6.1
diffusers==0.12.1
runpod==0.8.4
scipy==1.10.0
transformers==4.26.0
torch==1.13.1
torchvision==0.14.1
```

--------------------------------

### Run Gradio App with M-LSD Lines ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes the Python script 'gradio_hough2image.py' to launch a Gradio web interface. This app enables image generation with Stable Diffusion controlled by M-LSD (Mobile-Line Segment Detection) straight line inputs.
```Shell
python gradio_hough2image.py
```

--------------------------------

### Activate ControlNet Conda Environment

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

This command activates the newly created Conda environment named 'control'. All subsequent commands related to running ControlNet scripts should be executed within this activated environment to ensure correct dependencies are used.

```Shell
conda activate control
```

--------------------------------

### Building Docker Image for Model (BASH)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/huggingface-transformers/README.md

This BASH command builds a Docker image. It uses the '--build-arg MODEL_NAME' to specify which model to include in the image, tags the image with a repository name, image name, and tag, and uses the current directory '.' as the build context. Requires Docker installed.

```BASH
docker build --build-arg MODEL_NAME={model name} -t repo/image_name:tag .
```

--------------------------------

### Run Fake Scribble-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes a Python script that runs a Gradio application for generating images from *synthesized* scribbles. This uses a ControlNet model based on scribbles but automatically generates the scribble input from an uploaded image instead of requiring manual drawing.

```python
python gradio_fake_scribble2image.py
```

--------------------------------

### Loading Fill50K Dataset with PyTorch Dataset (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/docs/train.md

This Python class `MyDataset` inherits from `torch.utils.data.Dataset` to load the Fill50K training data. It reads the `prompt.json` file to get image paths and prompts, then uses OpenCV (`cv2`) to load source and target images.
Images are converted from BGR to RGB and normalized to specific ranges ([0, 1] for source, [-1, 1] for target) before being returned as a dictionary item.

```python
import json
import cv2
import numpy as np

from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self):
        self.data = []
        with open('./training/fill50k/prompt.json', 'rt') as f:
            for line in f:
                self.data.append(json.loads(line))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = self.data[idx]

        source_filename = item['source']
        target_filename = item['target']
        prompt = item['prompt']

        source = cv2.imread('./training/fill50k/' + source_filename)
        target = cv2.imread('./training/fill50k/' + target_filename)

        # Do not forget that OpenCV reads images in BGR order.
        source = cv2.cvtColor(source, cv2.COLOR_BGR2RGB)
        target = cv2.cvtColor(target, cv2.COLOR_BGR2RGB)

        # Normalize source images to [0, 1].
        source = source.astype(np.float32) / 255.0

        # Normalize target images to [-1, 1].
        target = (target.astype(np.float32) / 127.5) - 1.0

        return dict(jpg=target, txt=prompt, hint=source)
```

--------------------------------

### Run Gradio App with HED Boundary ControlNet

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes the Python script 'gradio_hed2image.py' to launch a Gradio web interface. This app facilitates image generation using Stable Diffusion controlled by soft HED (Holistically-Nested Edge Detection) boundaries, suitable for tasks like recoloring.

```Shell
python gradio_hed2image.py
```

--------------------------------

### Run Normal Map-to-Image Gradio App (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/README.md

Executes a Python script to run a Gradio application for image generation using normal maps.
The application computes the normal map from a MiDaS depth map and a user-defined threshold, then uses this with Stable Diffusion 1.5 ControlNet to generate images that preserve geometric details based on the normal map and a text prompt.

```Shell
python gradio_normal2image.py
```

--------------------------------

### Downloading GPT-NeoX Slim Weights with Wget (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command uses `wget` to recursively download the slim weights for the GPT-NeoX-20B model from a public URL. It cuts directory levels, avoids creating host directories, rejects index files, and saves the files into a specified local directory (`20B_checkpoints`). These weights are suitable for inference or finetuning.

```bash
wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" https://the-eye.eu/public/AI/models/GPT-NeoX-20B/slim_weights/ -P 20B_checkpoints
```

--------------------------------

### Locking MI-250 GCD Frequency (Shell)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/AIT-StableDiffusion/05_stable_diffusion-v1.5/README.md

This shell command is used to lock the frequency of a specific Graphics Compute Die (GCD) on an AMD MI-250 GPU for performance benchmarking. The `-d x` flag selects the device, where `x` stands for the GPU ID, and `--setperfdeterminism 1700` sets the performance state to a deterministic mode, likely fixing the clock speed.

```shell
rocm-smi -d x --setperfdeterminism 1700
```

--------------------------------

### Downloading GPT-NeoX Full Weights with Wget (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command uses `wget` to recursively download the full weights for the GPT-NeoX-20B model from a public URL. It cuts directory levels, avoids creating host directories, rejects index files, and saves the files into a specified local directory (`20B_checkpoints`). These weights include optimizer states and are significantly larger.
```bash
wget --cut-dirs=5 -nH -r --no-parent --reject "index.html*" https://the-eye.eu/public/AI/models/GPT-NeoX-20B/full_weights/ -P 20B_checkpoints
```

--------------------------------

### General Usage of deepy.py for Training

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Shows the basic command structure for launching the `train.py` script using the `deepy.py` wrapper. This wrapper utilizes DeepSpeed's launcher to distribute training across multiple GPUs or nodes, accepting one or more configuration files.

```Bash
python ./deepy.py train.py [path/to/config1.yml] [path/to/config2.yml] ...
```

--------------------------------

### Launching GPT-NeoX Scripts with deepy.py

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This is the general command structure for launching GPT-NeoX functionality (training, evaluation, generation) using the `deepy.py` wrapper around the `deepspeed` launcher. It requires specifying the main script to run (e.g., `train.py`, `evaluate.py`, `generate.py`) followed by one or more paths to YAML configuration files.

```Bash
./deepy.py [script.py] [./path/to/config_1.yml] [./path/to/config_2.yml] ... [./path/to/config_n.yml]
```

--------------------------------

### Configuring Data, Checkpoint, and Logging Paths/Intervals in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Configuration for data implementation, split ratios, data paths, vocabulary/merge files, save/load directories, tensorboard/log directories, and save/evaluation intervals.
```YAML
"data-impl": "mmap",
"split": "949,50,1",

# Suggested data paths when using GPT-NeoX locally
"data-path": "data/enwik8/enwik8_text_document",
#"train-data-path": "data/enwik8/enwik8_text_document",
#"test-data-path": "data/enwik8/enwik8_text_document",
#"valid-data-path": "data/enwik8/enwik8_text_document",
"vocab-file": "data/gpt2-vocab.json",
"merge-file": "data/gpt2-merges.txt",

"save": "checkpoints",
"load": "checkpoints",
"tensorboard-dir": "tensorboard",
"log-dir": "logs",
"save-interval": 10000,
"eval-interval": 1000,
"eval-iters": 10,
```

--------------------------------

### Usage for preprocess_data.py Script

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Displays the command-line arguments and options available for the `preprocess_data.py` script, which is used to pretokenize custom datasets. It shows options for input/output paths, tokenizer type, vocab/merge files, dataset implementation, and runtime parameters.

```Shell
usage: preprocess_data.py [-h] --input INPUT [--jsonl-keys JSONL_KEYS [JSONL_KEYS ...]] [--num-docs NUM_DOCS] --tokenizer-type {HFGPT2Tokenizer,HFTokenizer,GPT2BPETokenizer,CharLevelTokenizer} [--vocab-file VOCAB_FILE] [--merge-file MERGE_FILE] [--append-eod] [--ftfy] --output-prefix OUTPUT_PREFIX
                          [--dataset-impl {lazy,cached,mmap}] [--workers WORKERS] [--log-interval LOG_INTERVAL]

optional arguments:
  -h, --help            show this help message and exit

input data:
  --input INPUT         Path to input jsonl files or lmd archive(s) - if using multiple archives, put them in a comma separated list
  --jsonl-keys JSONL_KEYS [JSONL_KEYS ...]
                        space separate listed of keys to extract from jsonl. Defa
  --num-docs NUM_DOCS   Optional: Number of documents in the input data (if known) for an accurate progress bar.

tokenizer:
  --tokenizer-type {HFGPT2Tokenizer,HFTokenizer,GPT2BPETokenizer,CharLevelTokenizer}
                        What type of tokenizer to use.
  --vocab-file VOCAB_FILE
                        Path to the vocab file
  --merge-file MERGE_FILE
                        Path to the BPE merge file (if necessary).
  --append-eod          Append an <eod> token to the end of a document.
  --ftfy                Use ftfy to clean text

output data:
  --output-prefix OUTPUT_PREFIX
                        Path to binary output file without suffix
  --dataset-impl {lazy,cached,mmap}
                        Dataset implementation to use. Default: mmap

runtimes:
  --workers WORKERS     Number of worker processes to launch
  --log-interval LOG_INTERVAL
                        Interval between progress updates
```

--------------------------------

### Cite GPT-NeoX-20B Model (BibTeX)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

BibTeX entry for citing the GPT-NeoX-20B model paper. Includes authors, title, booktitle, URL, and year.

```bibtex
@inproceedings{gpt-neox-20b,
  title={{GPT-NeoX-20B}: An Open-Source Autoregressive Language Model},
  author={Black, Sid and Biderman, Stella and Hallahan, Eric and Anthony, Quentin and Gao, Leo and Golding, Laurence and He, Horace and Leahy, Connor and McDonell, Kyle and Phang, Jason and Pieler, Michael and Prashanth, USVSN Sai and Purohit, Shivanshu and Reynolds, Laria and Tow, Jonathan and Wang, Ben and Weinbach, Samuel},
  booktitle={Proceedings of the ACL Workshop on Challenges \& Perspectives in Creating Large Language Models},
  url={https://arxiv.org/abs/2204.06745},
  year={2022}
}
```

--------------------------------

### Evaluating GPT-NeoX Model with Evaluation Harness (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command executes the `evaluate.py` script using `deepy.py` to run model evaluation on specified downstream tasks. It requires configuration files (`your_configs.yml`) and a list of evaluation tasks provided via the `--eval_tasks` argument, referencing tasks available in the `lm-evaluation-harness` repository.

```bash
python ./deepy.py evaluate.py -d configs your_configs.yml --eval_tasks task1 task2 ...
taskn
```

--------------------------------

### Configuring Learning Rate Scheduler in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Settings for controlling the learning rate decay over time, including decay iterations, decay style, and warmup percentage.

```YAML
"lr-decay-iters": 320000,
"lr-decay-style": "cosine",
"warmup": 0.01,
```

--------------------------------

### Tokenizing enwik8 Dataset

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command executes the `prepare_data.py` script using Python to download and tokenize the `enwik8` dataset. The `-d ./data` argument specifies that the tokenized data should be saved to the `./data` directory. It uses the default tokenizer (the GPT2 tokenizer, per the README).

```Bash
python prepare_data.py -d ./data
```

--------------------------------

### Cite GPT-NeoX Library (BibTeX)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

BibTeX entry for citing the GPT-NeoX library repository. Includes authors, title, URL, DOI, month, year, and version.

```bibtex
@software{gpt-neox-library,
  title = {{GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch}},
  author = {Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
  url = {https://www.github.com/eleutherai/gpt-neox},
  doi = {10.5281/zenodo.5879544},
  month = {8},
  year = {2021},
  version = {0.0.1},
}
```

--------------------------------

### Upload Model to Hugging Face Hub (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Commands to log in to the Hugging Face CLI and upload a model using the provided upload script.
Requires a Hugging Face Hub user token for authentication.

```bash
huggingface-cli login
python ./tools/upload.py
```

--------------------------------

### Configuring Batch Size and Gradient Accumulation in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Settings for controlling the effective training batch size and gradient accumulation steps per GPU, following DeepSpeed's configuration approach.

```YAML
# batch / data settings
"train_micro_batch_size_per_gpu": 4,
"gradient_accumulation_steps": 1,
```

--------------------------------

### Testing PyTorch Dataset Loading (Python)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/ControlNet/docs/train.md

This script demonstrates how to use the `MyDataset` class. It creates an instance of the dataset, prints its total number of items using `len()`, and then accesses a specific item by index (1234). It prints the prompt (`txt`) and the shapes of the target image (`jpg`) and source image (`hint`) to verify successful loading and processing.

```python
from tutorial_dataset import MyDataset

dataset = MyDataset()
print(len(dataset))

item = dataset[1234]
jpg = item['jpg']
txt = item['txt']
hint = item['hint']
print(txt)
print(jpg.shape)
print(hint.shape)
```

--------------------------------

### Configuring SLURM Integration (YAML)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Specifies the necessary configuration settings to enable SLURM as the launcher and coordinate nodes when running GPT-NeoX on a SLURM cluster.

```yaml
"launcher": "slurm",
"deepspeed_slurm": true
```

--------------------------------

### Convert GPT-NeoX Checkpoint to Hugging Face Format (Bash)

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

Command to convert a GPT-NeoX model checkpoint to the Hugging Face Transformers GPTNeoXModel format using the provided conversion script.
Requires specifying the input checkpoint directory, configuration file, and output directory.

```bash
python ./tools/convert_to_hf.py --input_dir /path/to/model/global_stepXXX --config_file your_config.yml --output_dir hf_model/save/location
```

--------------------------------

### Tokenizing Pile Subset with Custom Tokenizer

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command runs the `prepare_data.py` script to download and tokenize a single shard of the Pile dataset (`pile_subset`). It saves the output to `./data` (`-d ./data`), specifies the tokenizer type as `HFTokenizer` (`-t HFTokenizer`), and provides the path to a custom vocabulary file (`--vocab-file ./20B_checkpoints/20B_tokenizer.json`), likely for the GPT-NeoX-20B tokenizer.

```Bash
python prepare_data.py -d ./data -t HFTokenizer --vocab-file ./20B_checkpoints/20B_tokenizer.json pile_subset
```

--------------------------------

### Running GPT-NeoX Docker Container with GPUs

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command launches a Docker container based on the `gpt-neox` image. It uses `nvidia-docker` to expose GPUs 0-3, sets shared memory size to 1GB (`--shm-size=1g`), and disables memory locking limits (`--ulimit memlock=-1`), both of which are important for NCCL. It also mounts the current directory (`$PWD`) to `/gpt-neox` inside the container. The `--rm` flag removes the container after exit, and `-it` provides an interactive terminal.

```Bash
nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 --shm-size=1g --ulimit memlock=-1 --mount type=bind,src=$PWD,dst=/gpt-neox gpt-neox
```

--------------------------------

### Evaluating Model with deepy.py and lm-evaluation-harness

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README.md

This command runs the `evaluate.py` script using `deepy.py` and the `./configs/20B.yml` configuration.
It uses the `--eval_tasks` argument to specify a list of evaluation tasks (e.g., `triviaqa`, `piqa`) from the `lm-evaluation-harness`.

```Bash
./deepy.py evaluate.py ./configs/20B.yml --eval_tasks triviaqa piqa
```

--------------------------------

### Configuring ZeRO Optimization in YAML

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/configs/README.md

Configuration settings for DeepSpeed's ZeRO optimization, including stage, allgather/reduce settings, and contiguous gradients. Note the separate `zero_allow_untested_optimizer` flag.

```YAML
# for all zero_optimization options, see https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training
"zero_optimization": {
  "stage": 0,
  "allgather_partitions": True,
  "allgather_bucket_size": 500000000,
  "overlap_comm": True,
  "reduce_scatter": True,
  "reduce_bucket_size": 500000000,
  "contiguous_gradients": True,
},
"zero_allow_untested_optimizer": false,
```

--------------------------------

### Prepare Test Data with Python Script

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/tests/README.md

Executes the `prepare_data.py` script to download or generate the necessary data required for running the project's tests.

```bash
python prepare_data.py
```

--------------------------------

### Listing Python Dependencies

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/requirements/requirements-dev.txt

This snippet lists the Python development dependencies (formatting, pre-commit hooks, and testing tools) and their version constraints, as given in `requirements-dev.txt`.
```text
autopep8>=1.5.6
clang-format>=13.0.1
pre-commit>=2.17.0
pytest>=6.2.3
pytest-cov>=2.11.1
pytest-forked>=1.3.0
pytest-xdist
```

--------------------------------

### Configure Mup Settings in gpt-neox

Source: https://github.com/runpod/serverless-workers/blob/main/workers/gpt-neox/README-MUP.md

This configuration block defines settings for enabling and controlling Mup (μP) behavior within the gpt-neox framework. It includes options for enabling Mup, saving base shapes for initialization, specifying the base shapes file path, enabling coordinate checks for verification, and setting hyperparameters for Mup tuning.

```Configuration
# mup
"use-mup": true,
"save-base-shapes": false, # this only needs to be enabled once in order to generate the base-shapes-file on each rank
"base-shapes-file": "base-shapes", # load base shapes from this file
"coord-check": false, # generate coord check plots to verify mup's implementation in neox

# mup hp search
"mup-init-scale": 1.0,
"mup-attn-temp": 1.0,
"mup-output-temp": 1.0,
"mup-embedding-mult": 1.0,
"mup-rp-embedding-mult": 1.0,
```
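
The keys under `# mup hp search` are the values one would vary in a μP hyperparameter sweep. As a minimal sketch, and not part of gpt-neox itself, the following Python enumerates a small grid over those keys. The candidate values and the `mup_grid` helper are hypothetical and chosen only for illustration; only the key names come from the configuration block above.

```python
import itertools
import json

# Hypothetical search space: the key names come from the mup config block,
# the candidate values are illustrative assumptions, not recommendations.
SEARCH_SPACE = {
    "mup-init-scale": [0.5, 1.0, 2.0],
    "mup-attn-temp": [1.0],
    "mup-output-temp": [0.5, 1.0],
    "mup-embedding-mult": [1.0],
    "mup-rp-embedding-mult": [1.0],
}


def mup_grid(space):
    """Yield one dict of mup settings per point in the Cartesian product."""
    keys = list(space)
    for values in itertools.product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


candidates = list(mup_grid(SEARCH_SPACE))
print(len(candidates))  # 3 * 1 * 2 * 1 * 1 = 6 combinations
print(json.dumps(candidates[0], indent=2))
```

Each resulting dictionary could be written out as its own small settings fragment for one trial run. How such a fragment is merged with a base configuration depends on the setup and is not shown here.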