### Run Sample CUDA Container with Podman

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

Use this command to run a sample CUDA container with Podman. This verifies the installation and configuration, similar to the Docker example.

```bash
podman run --rm --security-opt=label=disable \
    --device=nvidia.com/gpu=all \
    ubuntu nvidia-smi
```

--------------------------------

### Install NVIDIA Container Toolkit with apt

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Installs the NVIDIA Container Toolkit packages on Ubuntu/Debian systems after configuring the repository. Ensure the NVIDIA GPU driver is installed prior to this step.

```bash
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo apt-get update
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Install NVIDIA Container Toolkit Packages with zypper

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Install the NVIDIA Container Toolkit and its dependencies using zypper, specifying a version.

```bash
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo zypper --gpg-auto-import-keys install -y \
    nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Run Sample CUDA Container with Docker

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

Use this command to run a sample CUDA container with Docker.
Ensure the NVIDIA Container Toolkit and drivers are installed.

```bash
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

--------------------------------

### Install NVIDIA Container Toolkit with dnf

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Installs the NVIDIA Container Toolkit packages on RHEL/CentOS/Fedora/Amazon Linux systems after configuring the repository. Ensure the NVIDIA GPU driver is installed prior to this step.

```bash
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo dnf install -y \
    nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Pin All NVIDIA Container Stack Components

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

To ensure compatibility when using a specific version of `nvidia-container-runtime` (e.g., 3.5.0), pin all related NVIDIA container stack components to their corresponding versions. This command installs the runtime, toolkit, tools, and library at the specified versions.

```bash
sudo apt-get install \
    nvidia-container-runtime=3.5.0-1 \
    nvidia-container-toolkit=1.5.1-1 \
    libnvidia-container-tools=1.5.1-1 \
    libnvidia-container1=1.5.1-1
```

--------------------------------

### Run CUDA Container on Two GPUs

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

To run a container on a specific number of GPUs, pass the GPU count to the `--gpus` option. This example starts a container with access to two GPUs.

```bash
docker run --rm --gpus 2 nvidia/cuda nvidia-smi
```

--------------------------------

### Install Specific nvidia-container-runtime Version

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

When installing older versions of `nvidia-container-runtime` on Debian-based systems, explicitly specify the `nvidia-container-toolkit` version to avoid unmet dependency errors. This command pins both packages to compatible versions.

```bash
sudo apt-get install \
    nvidia-container-runtime=3.5.0-1 \
    nvidia-container-toolkit=1.5.1-1
```

--------------------------------

### Run CUDA Container with All GPUs

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Use the `--gpus all` option to make all available GPUs accessible within the container. This is a common way to start a GPU-enabled CUDA container.

```bash
docker run --rm --gpus all nvidia/cuda nvidia-smi
```

--------------------------------

### Run Container with Specific GPU

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

Example of running a CUDA container and listing GPUs. This may fail, as shown in the output below, if the selected device does not have `/dev/dri` or `/dev/nvidia-caps` nodes.

```bash
$ docker run -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=0 nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi -L
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: chmod: cannot access '/var/lib/docker/overlay2/9069fafcb6e39ccf704fa47b52ca92a1d48ca5ccfedd381f407456fb6cd3f9f0/merged/dev/dri': No such file or directory: unknown.
ERRO[0000] error waiting for container: context canceled
```

--------------------------------

### Run Container with All GPUs using Podman

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Launches an Ubuntu container with access to all NVIDIA GPUs using Podman. Requires Podman v4.1.0 or later for `--device` argument support. Ensure no NVIDIA Container Runtime hook is active.

```bash
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
```

--------------------------------

### Configure apt Repository for NVIDIA Container Toolkit

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configures the production repository for NVIDIA Container Toolkit on Debian-based systems. The second command optionally enables the experimental packages.

```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```

```bash
# Editing a file under /etc requires root privileges.
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
```

--------------------------------

### Enable Experimental Repository with zypper

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Optionally enable the experimental repository for NVIDIA Container Toolkit packages using zypper.

```bash
sudo zypper modifyrepo --enable nvidia-container-toolkit-experimental
```

--------------------------------

### Run Container with Specific GPUs using Podman

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Launches an Ubuntu container requesting specific GPUs (e.g., GPU 0 and the first MIG device on GPU 1) using Podman. The output will show only the UUIDs of the requested devices.

```bash
podman run --rm \
    --device nvidia.com/gpu=0 \
    --device nvidia.com/gpu=1:0 \
    --security-opt=label=disable \
    ubuntu nvidia-smi -L
```

--------------------------------

### Identify Conflicting apt Sources

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

Use this command to find repository configuration files that might be causing conflicts with the NVIDIA Container Toolkit's signed-by directive.

```bash
grep "nvidia.github.io" /etc/apt/sources.list.d/*
```

--------------------------------

### Configure Containerd Runtime with nvidia-ctk

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the containerd container runtime to use the NVIDIA Container Runtime.

```bash
sudo nvidia-ctk runtime configure --runtime=containerd
```

--------------------------------

### Run Container with Compute and Utility Capabilities

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Use this command to run a container with specific NVIDIA driver capabilities enabled, such as compute and utility, allowing CUDA and NVML usage. It also specifies visible devices.

```bash
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=2,3 \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    nvidia/cuda nvidia-smi
```

```bash
docker run --rm --gpus 'all,"capabilities=compute,utility"' \
    nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
```

--------------------------------

### Configure Rootless Docker Runtime with nvidia-ctk

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the container runtime for Docker running in rootless mode, specifying a custom daemon.json path.

```bash
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
```

--------------------------------

### Configure CRI-O Runtime with nvidia-ctk

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the CRI-O container runtime to use the NVIDIA Container Runtime.

```bash
sudo nvidia-ctk runtime configure --runtime=crio
```

--------------------------------

### List Potentially Conflicting apt Files

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

This command lists files in the `sources.list.d` directory that reference the NVIDIA repository but are not the expected toolkit list file, which helps when resolving apt update errors.

```bash
grep -l "nvidia.github.io" /etc/apt/sources.list.d/* | grep -vE "/nvidia-container-toolkit.list$"
```

--------------------------------

### Add NVIDIA Container Toolkit Repository with zypper

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the production repository for NVIDIA Container Toolkit packages on OpenSUSE and SLE systems using zypper.

```bash
sudo zypper ar https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
```

--------------------------------

### Configure nvidia-container-cli for Rootless Docker

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Modify the nvidia-container-runtime configuration for rootless Docker to disable cgroups.

```bash
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
```

--------------------------------

### Configure dnf Repository for NVIDIA Container Toolkit

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configures the production repository for NVIDIA Container Toolkit on RPM-based systems. Use `dnf-config-manager` to enable experimental packages.

```bash
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
    sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
```

```bash
sudo dnf-config-manager --enable nvidia-container-toolkit-experimental
```

--------------------------------

### Pin Dependencies to Container Toolkit 1.17.6 (apt)

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

Use this command to pin dependencies to version 1.17.6 when using apt on Ubuntu or Debian systems. This is a workaround for a known issue in v1.17.7.

```bash
NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.6-1
sudo apt-get install -y --allow-downgrades \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Generate CDI Specification

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Use this command to generate a CDI specification file.
The `--output` argument specifies the file path; omitting it prints to STDOUT. Writing under `/etc/cdi` requires sudo.

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

--------------------------------

### Configure Docker Daemon for NVIDIA Runtime

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

Command to configure the Docker daemon's config file for use with the NVIDIA Container Runtime.

```bash
nvidia-ctk runtime configure --runtime=docker
```

--------------------------------

### Run Container with CDI Mode Enabled

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Demonstrates running a container with CDI mode explicitly enabled, using `all` for `NVIDIA_VISIBLE_DEVICES`. This is equivalent to specifying `nvidia.com/gpu=all` when CDI mode is active.

```bash
docker run --rm -ti --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    ubuntu nvidia-smi -L
```

--------------------------------

### List Available CDI Devices

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Lists the CDI devices detected on the system. Useful for verifying generated devices and understanding naming conventions.

```bash
nvidia-ctk cdi list
```

--------------------------------

### Configure Docker Runtime with nvidia-ctk

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Configure the Docker container runtime to use the NVIDIA Container Runtime by modifying the daemon.json file.

```bash
sudo nvidia-ctk runtime configure --runtime=docker
```

--------------------------------

### Generate SELinux Policy for nvidia-docker

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

Generate a local SELinux policy to allow nvidia-docker to function correctly. The first command builds a policy module from the audit log entries; the second installs it.

```bash
ausearch -c 'nvidia-docker' --raw | audit2allow -M my-nvidiadocker
```

```bash
semodule -X 300 -i my-nvidiadocker.pp
```

--------------------------------

### Run CUDA Container with All GPUs using NVIDIA_VISIBLE_DEVICES

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Alternatively, specify all GPUs using the `NVIDIA_VISIBLE_DEVICES` environment variable and the `--runtime=nvidia` flag. This method is useful when the default runtime is not set to NVIDIA.

```bash
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda nvidia-smi
```

--------------------------------

### Generate SELinux Policy Module

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

When encountering permission denied errors with SELinux and nvidia-docker, this command generates a local policy module to allow the necessary access, helping to resolve the issue.

```bash
ausearch -c 'nvidia-docker' --raw | audit2allow -M my-nvidiadocker
```

--------------------------------

### Pin Dependencies to Container Toolkit 1.17.6 (dnf)

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/release-notes.html

Use this command to pin dependencies to version 1.17.6 when using dnf on RHEL/CentOS, Fedora, or Amazon Linux systems. This is a workaround for a known issue in v1.17.7.

```bash
NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.6-1
sudo dnf install -y \
    nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```

--------------------------------

### Disable SELinux for Docker/Podman

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

Use the `--security-opt=label=disable` option on the Docker or Podman command line to bypass SELinux restrictions when encountering 'Failed to initialize NVML: Insufficient Permissions'. Note that this disables SELinux separation for the container.

```bash
docker run --security-opt=label=disable ...
podman run --security-opt=label=disable ...
```

--------------------------------

### Run Container with Specific GPU UUID

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Query the UUID of a specific GPU using `nvidia-smi` and then use this UUID with the `--gpus device=` option to launch a container with access to only that GPU.

```bash
# The CSV output starts with the header line: uuid
nvidia-smi -i 3 --query-gpu=uuid --format=csv
```

```bash
docker run --gpus device=GPU-18a3e86f-4c0e-cd9f-59c3-55488c4b0c24 \
    nvidia/cuda nvidia-smi
```

--------------------------------

### Run CUDA Container on Specific GPUs using --gpus

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Specify exact GPUs by their indices or UUIDs using the `--gpus` option with the `device` parameter. Wrap the device list in double quotes nested inside single quotes, as in `'"device=1,2"'`, so that the inner quotes reach Docker intact.

```bash
# The CSV output starts with the header line: uuid
docker run --gpus '"device=1,2"' \
    nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv
```

--------------------------------

### Configure Docker cgroup driver to cgroupfs

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html

Update Docker's configuration to use `cgroupfs` as the cgroup driver to prevent containers from losing GPU access during `systemctl daemon-reload`. This setting is applied to `/etc/docker/daemon.json`.

```json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
```

--------------------------------

### Restart Rootless Docker Daemon

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

Restart the rootless Docker daemon using the `--user` flag.

```bash
systemctl --user restart docker
```

--------------------------------

### Run CUDA Container on Specific GPUs using NVIDIA_VISIBLE_DEVICES

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Use the `NVIDIA_VISIBLE_DEVICES` environment variable to specify a comma-separated list of GPU indices or UUIDs. Ensure the NVIDIA runtime is selected.

```bash
# The CSV output starts with the header line: uuid
docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=1,2 \
    nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv
```

--------------------------------

### Run Container with CDI using Docker

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Injects requested CDI devices into a Docker container by setting the `NVIDIA_VISIBLE_DEVICES` environment variable. This method is used with non-CDI-enabled runtimes by configuring the NVIDIA Container Runtime in `cdi` mode.

```bash
docker run --rm -ti --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all \
    ubuntu nvidia-smi -L
```

--------------------------------

### Set NVIDIA Driver Capabilities in Dockerfile

Source: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html

Set environment variables within a Dockerfile to pre-configure NVIDIA driver capabilities and visible devices for containers. This avoids needing to set them on the `docker run` command line.

```dockerfile
# KEY=value is the current ENV syntax; the space-separated form is legacy.
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
```
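--------------------------------

### Build a Quoted --gpus Device Argument (Sketch)

The `--gpus` device-list quoting described in "Run CUDA Container on Specific GPUs using --gpus" is easy to get wrong when the GPU list comes from a script. The following is a minimal sketch, not part of the NVIDIA Container Toolkit: `gpu_device_arg` is a hypothetical helper name, and the sketch assumes only that Docker expects the literal form `"device=1,2"`, double quotes included. It prints the resulting command instead of running it, so it works on a machine without Docker or GPUs.

```bash
# Hypothetical helper (an assumption for illustration, not an NVIDIA tool):
# joins GPU indices or UUIDs with commas and wraps the result in the
# literal double quotes that `docker run --gpus` expects for device lists.
# The function body runs in a subshell so the IFS change does not leak.
gpu_device_arg() (
  IFS=','
  printf '"device=%s"\n' "$*"
)

# Print the command rather than executing it.
echo docker run --rm --gpus "$(gpu_device_arg 1 2)" nvidia/cuda nvidia-smi
```

To run the container for real, drop the leading `echo`; the command substitution passes the same argument as the hand-written `'"device=1,2"'`.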