### Run Sample CUDA Container with Podman

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

Use this command to run a sample CUDA container with Podman. This verifies the installation and configuration, similar to the Docker example.

```bash
podman run --rm --security-opt=label=disable \
 --device=nvidia.com/gpu=all \
 ubuntu nvidia-smi
```

--------------------------------

### Install NVIDIA Container Toolkit with apt

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Installs the NVIDIA Container Toolkit packages on Ubuntu/Debian systems after configuring the repository. Ensure the NVIDIA GPU driver is installed prior to this step.

```bash
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo apt-get update
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Install NVIDIA Container Toolkit Packages with zypper

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Install the NVIDIA Container Toolkit and its dependencies using zypper, specifying a version.

```bash
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo zypper --gpg-auto-import-keys install -y \
    nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Run Sample CUDA Container with Docker

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

Use this command to run a sample CUDA container with Docker. Ensure the NVIDIA Container Toolkit and drivers are installed.

```bash
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

--------------------------------

### Install NVIDIA Container Toolkit with dnf

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Installs the NVIDIA Container Toolkit packages on RHEL/CentOS/Fedora/Amazon Linux systems after configuring the repository. Ensure the NVIDIA GPU driver is installed prior to this step.

```bash
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo dnf install -y \
    nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```
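
The install commands in this document differ mainly in how a version is attached to a package name: apt uses `name=version`, while dnf and zypper use `name-version`. The sketch below makes that difference explicit. `pinned_specs` is a hypothetical helper name, not part of the toolkit, and the package list is copied from the commands in this document.

```shell
# Hypothetical helper: print one pinned package spec per line.
#   $1 = separator between name and version ("=" for apt, "-" for dnf and zypper)
#   $2 = version string, e.g. 1.17.8-1
pinned_specs() {
    for pkg in nvidia-container-toolkit nvidia-container-toolkit-base \
               libnvidia-container-tools libnvidia-container1; do
        printf '%s%s%s\n' "$pkg" "$1" "$2"
    done
}

pinned_specs "=" "1.17.8-1"   # apt form
pinned_specs "-" "1.17.8-1"   # dnf / zypper form
```

Under these assumptions, the output could be passed to the package manager, for example `sudo apt-get install -y $(pinned_specs = "$NVIDIA_CONTAINER_TOOLKIT_VERSION")`.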

--------------------------------

### Pin All NVIDIA Container Stack Components

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

To ensure compatibility when using a specific version of `nvidia-container-runtime` (e.g., 3.5.0), it is recommended to pin all related NVIDIA container stack components to their corresponding versions. This command installs the runtime, toolkit, tools, and library at the specified versions.

```bash
sudo apt-get install \
    nvidia-container-runtime=3.5.0-1 \
    nvidia-container-toolkit=1.5.1-1 \
    libnvidia-container-tools=1.5.1-1 \
    libnvidia-container1=1.5.1-1
```

--------------------------------

### Run CUDA Container on Two GPUs

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

To run a container on a specific number of GPUs, use the `--gpus` option followed by the count of GPUs. This example starts a container with access to two GPUs.

```bash
docker run --rm --gpus 2 nvidia/cuda nvidia-smi
```

--------------------------------

### Install Specific nvidia-container-runtime Version

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

When installing older versions of `nvidia-container-runtime` on Debian-based systems, explicitly specify the `nvidia-container-toolkit` version to avoid unmet dependency errors. This command pins both packages to compatible versions.

```bash
sudo apt-get install \
 nvidia-container-runtime=3.5.0-1 \
 nvidia-container-toolkit=1.5.1-1
```

--------------------------------

### Run CUDA Container with All GPUs

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Use the `--gpus all` option to make all available GPUs accessible within the container. This is a common way to start a GPU-enabled CUDA container.

```bash
docker run --rm --gpus all nvidia/cuda nvidia-smi
```

--------------------------------

### Run Container with Specific GPU

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

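Example of running a CUDA container with a single GPU selected by its CDI name and listing that GPU with `nvidia-smi -L`. The run can fail with the error shown below if the selected device does not have `/dev/dri` or `/dev/nvidia-caps` nodes.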

```bash
$ docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=0 nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi -L
 docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: chmod: cannot access '/var/lib/docker/overlay2/9069fafcb6e39ccf704fa47b52ca92a1d48ca5ccfedd381f407456fb6cd3f9f0/merged/dev/dri': No such file or directory: unknown.
 ERRO[0000] error waiting for container: context canceled
```

--------------------------------

### Run Container with All GPUs using Podman

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Launches an Ubuntu container with access to all NVIDIA GPUs using Podman. Requires Podman v4.1.0 or later, which supports CDI device names in the `--device` argument. Ensure the NVIDIA Container Runtime hook is not also active.

```bash
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
```

--------------------------------

### Configure apt Repository for NVIDIA Container Toolkit

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configures the production repository for NVIDIA Container Toolkit on Debian-based systems. Optionally enables experimental packages.

```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```

```bash
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
```

--------------------------------

### Enable Experimental Repository with zypper

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Optionally enable the experimental repository for NVIDIA Container Toolkit packages using zypper.

```bash
sudo zypper modifyrepo --enable nvidia-container-toolkit-experimental
```

--------------------------------

### Run Container with Specific GPUs using Podman

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Launches an Ubuntu container requesting specific GPUs (e.g., GPU 0 and the first MIG device on GPU 1) using Podman. The output will show only the UUIDs of the requested devices.

```bash
podman run --rm \
    --device nvidia.com/gpu=0 \
    --device nvidia.com/gpu=1:0 \
    --security-opt=label=disable \
    ubuntu nvidia-smi -L
```
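
When the device list is not known in advance, the repeated `--device` arguments can be generated. This is a minimal sketch: `cdi_device_args` is a hypothetical helper, and it assumes every entry is a device name under the `nvidia.com/gpu` CDI kind, such as `0` or `1:0`.

```shell
# Hypothetical helper: print one --device argument per requested CDI device.
# Each argument is a device name under nvidia.com/gpu, e.g. 0, 1:0, or all.
cdi_device_args() {
    for dev in "$@"; do
        # "%s" keeps printf from reading the leading dashes as an option
        printf '%s\n' "--device=nvidia.com/gpu=$dev"
    done
}

cdi_device_args 0 1:0
```

The result could then be expanded on the Podman command line, for example `podman run --rm $(cdi_device_args 0 1:0) --security-opt=label=disable ubuntu nvidia-smi -L`.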

--------------------------------

### Identify Conflicting apt Sources

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

Use this command to find repository configuration files that might be causing conflicts with the NVIDIA Container Toolkit's signed-by directive.

```bash
grep "nvidia.github.io" /etc/apt/sources.list.d/*
```

--------------------------------

### Configure Containerd Runtime with nvidia-ctk

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the containerd container runtime to use the NVIDIA Container Runtime.

```bash
sudo nvidia-ctk runtime configure --runtime=containerd
```

--------------------------------

### Run Container with Compute and Utility Capabilities

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Use this command to run a container with specific NVIDIA driver capabilities enabled, such as compute and utility, allowing CUDA and NVML usage. It also specifies visible devices.

```bash
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=2,3 \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    nvidia/cuda nvidia-smi
```

```bash
docker run --rm --gpus 'all,"capabilities=compute,utility"' \
    nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```

--------------------------------

### Configure Rootless Docker Runtime with nvidia-ctk

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the container runtime for Docker running in rootless mode, specifying a custom daemon.json path.

```bash
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
```

--------------------------------

### Configure CRI-O Runtime with nvidia-ctk

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the CRI-O container runtime to use the NVIDIA Container Runtime.

```bash
sudo nvidia-ctk runtime configure --runtime=crio
```

--------------------------------

### List Potentially Conflicting apt Files

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

This command helps identify specific files in the sources.list.d directory that reference the NVIDIA repository and do not use the expected toolkit list file, aiding in resolving apt update errors.

```bash
grep -l "nvidia.github.io" /etc/apt/sources.list.d/* | grep -vE "/nvidia-container-toolkit.list$"
```
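
To see what the filter keeps and drops without touching the real system, the same pipeline can be run against a scratch directory. This is an illustrative sketch only: the file names and repository lines below are made up for the demonstration, and `demo_dir` stands in for `/etc/apt/sources.list.d`.

```shell
# Scratch directory standing in for /etc/apt/sources.list.d (hypothetical contents)
demo_dir=$(mktemp -d)
printf '%s\n' 'deb https://nvidia.github.io/libnvidia-container/stable/deb/ /' \
    > "$demo_dir/nvidia-container-toolkit.list"
printf '%s\n' 'deb https://nvidia.github.io/nvidia-docker/example /' \
    > "$demo_dir/nvidia-docker.list"
printf '%s\n' 'deb http://archive.example.org/ubuntu stable main' \
    > "$demo_dir/ubuntu.list"

# Same pipeline as above, pointed at the scratch directory:
# list files that mention the NVIDIA repository, then drop the expected list file
conflicts=$(grep -l "nvidia.github.io" "$demo_dir"/* | grep -vE "/nvidia-container-toolkit.list$")
echo "$conflicts"

rm -rf "$demo_dir"
```

Only the `nvidia-docker.list` path is printed here, because it references the NVIDIA repository but is not the toolkit's own list file.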

--------------------------------

### Add NVIDIA Container Toolkit Repository with zypper

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the production repository for NVIDIA Container Toolkit packages on OpenSUSE and SLE systems using zypper.

```bash
sudo zypper ar https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
```

--------------------------------

### Configure nvidia-container-cli for Rootless Docker

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Modify the nvidia-container-runtime configuration for rootless Docker to disable cgroups.

```bash
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
```

--------------------------------

### Configure dnf Repository for NVIDIA Container Toolkit

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configures the production repository for NVIDIA Container Toolkit on RPM-based systems. Use `dnf-config-manager` to enable experimental packages.

```bash
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
```

```bash
sudo dnf-config-manager --enable nvidia-container-toolkit-experimental
```

--------------------------------

### Pin Dependencies to Container Toolkit 1.17.6 (apt)

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

Use this command to pin dependencies to version 1.17.6 when using apt on Ubuntu or Debian systems. This is a workaround for a known issue in v1.17.7.

```bash
NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.6-1
sudo apt-get install -y --allow-downgrades \
 nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
 nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
 libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
 libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Generate CDI Specification

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Use this command to generate a CDI specification file. The `--output` argument specifies the file path; omitting it prints to STDOUT. Requires sudo for file creation.

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

--------------------------------

### Configure Docker Daemon for NVIDIA Runtime

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

Command to configure the Docker daemon's config file for use with the NVIDIA Container Runtime.

```bash
sudo nvidia-ctk runtime configure --runtime=docker
```

--------------------------------

### Run Container with CDI Mode Enabled

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Demonstrates running a container with CDI mode explicitly enabled, using 'all' for NVIDIA_VISIBLE_DEVICES. This is equivalent to specifying 'nvidia.com/gpu=all' when CDI mode is active.

```bash
docker run --rm -ti --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    ubuntu nvidia-smi -L
```

--------------------------------

### List Available CDI Devices

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Lists the CDI devices detected on the system. Useful for verifying generated devices and understanding naming conventions.

```bash
nvidia-ctk cdi list
```

--------------------------------

### Configure Docker Runtime with nvidia-ctk

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the Docker container runtime to use the NVIDIA Container Runtime by modifying the daemon.json file.

```bash
sudo nvidia-ctk runtime configure --runtime=docker
```

--------------------------------

### Generate SELinux Policy for nvidia-docker

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

Generate a local SELinux policy to allow nvidia-docker to function correctly. This involves capturing audit logs and creating a policy module.

```bash
ausearch -c 'nvidia-docker' --raw | audit2allow -M my-nvidiadocker
```

```bash
semodule -X 300 -i my-nvidiadocker.pp
```

--------------------------------

### Run CUDA Container with All GPUs using NVIDIA_VISIBLE_DEVICES

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Alternatively, specify all GPUs using the `NVIDIA_VISIBLE_DEVICES` environment variable and the `--runtime=nvidia` flag. This method is useful when the default runtime is not set to NVIDIA.

```bash
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda nvidia-smi
```

--------------------------------

### Generate SELinux Policy Module

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

When encountering permission denied errors with SELinux and nvidia-docker, this command generates a local policy module to allow the necessary access, helping to resolve the issue.

```bash
ausearch -c 'nvidia-docker' --raw | audit2allow -M my-nvidiadocker
```

--------------------------------

### Pin Dependencies to Container Toolkit 1.17.6 (dnf)

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

Use this command to pin dependencies to version 1.17.6 when using dnf on RHEL/CentOS, Fedora, or Amazon Linux systems. This is a workaround for a known issue in v1.17.7.

```bash
NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.6-1
sudo dnf install -y \
 nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
 nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
 libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
 libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Disable SELinux for Docker/Podman

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

Use the `--security-opt=label=disable` option on the Docker or Podman command line to bypass SELinux restrictions when encountering 'Failed to initialize NVML: Insufficient Permissions'. Note that this disables SELinux separation.

```bash
docker run --security-opt=label=disable ...
podman run --security-opt=label=disable ...
```

--------------------------------

### Run Container with Specific GPU UUID

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Query the UUID of a specific GPU using `nvidia-smi` and then use this UUID with the `--gpus device=` option to launch a container with access to only that GPU.

```bash
nvidia-smi -i 3 --query-gpu=uuid --format=csv
```

```bash
docker run --gpus device=GPU-18a3e86f-4c0e-cd9f-59c3-55488c4b0c24 \
     nvidia/cuda nvidia-smi
```

--------------------------------

### Run CUDA Container on Specific GPUs using --gpus

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

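Select exact GPUs by index or UUID with the `device` parameter of the `--gpus` option. When listing more than one device, wrap the value in double quotes inside single quotes (`'"device=1,2"'`) so that the shell passes the double quotes through to Docker and the comma-separated list is read as a single value.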

```bash
docker run --gpus '"device=1,2"' \
    nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv
```
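
The quoting requirement can be sketched with a small helper that builds the value Docker should receive. This is an illustration rather than part of the NVIDIA documentation: `gpus_device_arg` is a hypothetical function name, and the device identifiers are placeholders.

```shell
# Hypothetical helper: build the value for --gpus from a list of GPU indices or UUIDs.
# The literal double quotes in the output are what Docker needs to see
# around a comma-separated device list.
gpus_device_arg() {
    # Join all arguments with commas ("$*" uses the first character of IFS)
    ids=$(IFS=,; printf '%s' "$*")
    printf '"device=%s"' "$ids"
}

gpus_device_arg 1 2
echo
```

With this helper, `docker run --rm --gpus "$(gpus_device_arg 1 2)" nvidia/cuda nvidia-smi` passes the same argument as the hand-quoted form above.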

--------------------------------

### Configure Docker cgroup driver to cgroupfs

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

Update Docker's configuration to use `cgroupfs` as the cgroup driver to prevent containers from losing GPU access during `systemctl daemon-reload`. This setting is applied to `/etc/docker/daemon.json`.

```json
{
 "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
```
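
A quick way to confirm that a `daemon.json` carries this setting is a text search. The sketch below is an assumption-laden check, not a JSON parser: `uses_cgroupfs` is a hypothetical helper, and it runs against a scratch file rather than the live `/etc/docker/daemon.json`.

```shell
# Hypothetical helper: succeed if the given daemon.json mentions the cgroupfs driver.
# This is a plain text match, so it does not validate the JSON structure.
uses_cgroupfs() {
    grep -q '"native.cgroupdriver=cgroupfs"' "$1"
}

# Scratch copy of the configuration shown above
demo_json=$(mktemp)
cat > "$demo_json" <<'EOF'
{
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF

if uses_cgroupfs "$demo_json"; then
    echo "cgroupfs driver configured"
else
    echo "cgroupfs driver not configured"
fi
```

To check a real host, the same function could be pointed at `/etc/docker/daemon.json`.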

--------------------------------

### Restart Rootless Docker Daemon

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Restart the rootless Docker daemon using the --user flag.

```bash
systemctl --user restart docker
```

--------------------------------

### Run CUDA Container on Specific GPUs using NVIDIA_VISIBLE_DEVICES

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Use the `NVIDIA_VISIBLE_DEVICES` environment variable to specify a comma-separated list of GPU indices or UUIDs. Ensure the NVIDIA runtime is selected.

```bash
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=1,2 \
    nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv
```

--------------------------------

### Run Container with CDI using Docker

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Injects requested CDI devices into a Docker container by setting the `NVIDIA_VISIBLE_DEVICES` environment variable. This method is used with non-CDI-enabled runtimes by configuring the NVIDIA Container Runtime in `cdi` mode.

```bash
docker run --rm -ti --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all \
    ubuntu nvidia-smi -L
```

--------------------------------

### Set NVIDIA Driver Capabilities in Dockerfile

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Set environment variables within a Dockerfile to pre-configure NVIDIA driver capabilities and visible devices for containers. This avoids needing to set them on the `docker run` command line.

```dockerfile
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
```