### Install and Run Codebase Embeddings Demo

Source: https://github.com/docker/model-runner/blob/main/demos/embeddings/README.md

Install dependencies and start the demo server. The server automatically downloads a pre-generated embeddings index on first run.

```bash
docker model pull ai/qwen3-embedding:0.6B-F16

cd demos/embeddings

npm install

npm start
```

--------------------------------

### Start the Demo Server

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/README.md

Initiate the demo application's server process. This command starts the backend that serves the demo interface.

```bash
npm start
```

--------------------------------

### Handle Start Button Click

Source: https://github.com/docker/model-runner/blob/main/demos/multimodal/demo.html

Handles a click on the start button. If no camera stream is available, it reports the problem in the response field and in an alert.

```javascript
function handleStart() {
  // Without a camera stream there is nothing to capture, so stop here.
  if (!stream) {
    responseText.value = "Camera not available. Cannot start.";
    alert("Camera not available. ");
    return;
  }
}
```

--------------------------------

### Install Node.js Dependencies

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/README.md

Install all necessary packages for the demo application using npm. This command should be run after navigating to the demo directory.

```bash
npm install
```

--------------------------------

### Start Docker Model Gateway

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway using a specified configuration file. The gateway will be accessible on `http://0.0.0.0:4000` by default.

```console
$ docker model gateway --config config.yaml
```

--------------------------------

### Install OpenAI Python Package

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Install the `openai` Python package. This is required for interacting with the gateway using the OpenAI SDK in the demo.

```bash
pip install openai
```

--------------------------------

### Check Docker Installation

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Verify that Docker is installed and accessible. Ensure you are using Docker from official repositories for compatibility.

```bash
# Check if Docker is from official repositories
docker version
```

--------------------------------

### Install Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Install the Docker Model Runner. Use the `--gpu` flag to request GPU support (the example selects `cuda`) or auto-detection.

```bash
./model-cli install-runner --gpu cuda
```

--------------------------------

### Install Open WebUI with Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Add the Open WebUI Helm repository, update it, and install the chart with specific configurations to connect to Docker Model Runner.

```bash
# Add the Open WebUI Helm repository
helm repo add open-webui https://helm.openwebui.com/
helm repo update

# Install Open WebUI with auth disabled
# See the open-webui Helm chart for
# connecting to your auth provider.
helm upgrade --install --wait open-webui open-webui/open-webui \
  --set ollama.enabled=false \
  --set pipelines.enabled=false \
  --set extraEnvVars[0].name="WEBUI_AUTH" \
  --set-string extraEnvVars[0].value=false \
  --set openaiBaseApiUrl="http://docker-model-runner/engines/v1"
```

--------------------------------

### Start Gateway with Master API Key

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway with a master API key set via an environment variable. Clients must include this key in their requests.

```console
$ GATEWAY_API_KEY=my-secret docker model gateway --config config.yaml
```

--------------------------------

### Test Docker Model Runner Installation

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Set up port-forwarding to the service and then test the model runner by running a sample model.

```bash
kubectl port-forward service/docker-model-runner-nodeport 31245:80
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
```
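
When the same call is scripted, the `MODEL_RUNNER_HOST` variable has to reach the child process. The sketch below only builds the argument list and environment; `runner_command` is a hypothetical helper name, and the default host assumes the port-forward shown above is active.

```python
import os


def runner_command(model, host="http://localhost:31245", prompt=None):
    """Return (argv, env) for `docker model run` against a specific host."""
    argv = ["docker", "model", "run", model]
    if prompt is not None:
        argv.append(prompt)
    # Copy the current environment and point the CLI at the forwarded port.
    env = {**os.environ, "MODEL_RUNNER_HOST": host}
    return argv, env
```

The result can be passed to `subprocess.run(argv, env=env, check=True)` once the port-forward is running.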

--------------------------------

### Install Docker Engine on Linux

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Installs Docker Engine on a Linux system and adds the current user to the docker group.

```bash
curl -fsSL https://get.docker.com | sudo bash
sudo usermod -aG docker $USER
```

--------------------------------

### Start Gateway with Custom Port

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway on port 8080, overriding the default port 4000. Ensure the specified port is not in use.

```console
$ docker model gateway --config config.yaml --port 8080
```
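
A quick way to check that a port is not in use before starting the gateway is to try binding to it. This is a minimal sketch, not part of Docker Model Runner: `port_is_free` is a hypothetical helper, and it assumes that a successful TCP bind on the given address means no other process is listening there.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
        return True
```

For example, `port_is_free(8080)` could guard the `--port 8080` invocation above. The check is inherently racy, since another process may claim the port between the check and the gateway start.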

--------------------------------

### List available models using model-cli

Source: https://github.com/docker/model-runner/blob/main/README.md

Connects to a manually started model-runner server and lists the available models using the `model-cli` tool.

```bash
# List available models
MODEL_RUNNER_HOST=http://localhost:13434 ./cmd/cli/model-cli list
```

--------------------------------

### Enable Debug Logging

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway with verbose logging enabled. This is helpful for diagnosing issues by providing detailed output.

```console
$ docker model gateway --config config.yaml --verbose
```

--------------------------------

### Model Distribution Client Usage (Go)

Source: https://github.com/docker/model-runner/blob/main/pkg/distribution/README.md

Demonstrates the core functionalities of the Model Distribution client, including creating a client, pulling, getting, bundling, listing, deleting, tagging, and pushing models. Ensure the client is initialized with a store root path for local caching.

```go
import (
    "context"
    "fmt"
    "os"

    "github.com/docker/model-runner/pkg/distribution/distribution"
)

// Create a new client
client, err := distribution.NewClient(
    distribution.WithStoreRootPath("/path/to/cache"),
)
if err != nil {
    // Handle error
}

// Pull a model
err = client.PullModel(context.Background(), "registry.example.com/models/llama:v1.0", os.Stdout)
if err != nil {
    // Handle error
}

// Get a model
model, err := client.GetModel("registry.example.com/models/llama:v1.0")
if err != nil {
    // Handle error
}

// Create a bundle
bundle, err := client.GetBundle("registry.example.com/models/llama:v1.0")
if err != nil {
    // Handle error
}

// Get the GGUF file path within the bundle
modelPath, err := bundle.GGUFPath()
if err != nil {
    // Handle error
}

fmt.Println("Model path:", modelPath)

// List all models
models, err := client.ListModels()
if err != nil {
    // Handle error
}

// Delete a model
_, err = client.DeleteModel("registry.example.com/models/llama:v1.0", false)
if err != nil {
    // Handle error
}

// Tag a model
err = client.Tag("registry.example.com/models/llama:v1.0", "registry.example.com/models/llama:latest")
if err != nil {
    // Handle error
}

// Push a model
err = client.PushModel(context.Background(), "registry.example.com/models/llama:v1.0", nil)
if err != nil {
    // Handle error
}
```

--------------------------------

### Run Gateway Demo Script

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Execute the end-to-end demo script for the `model-cli gateway`. This script starts the gateway, tests its features, and then shuts it down.

```bash
./demos/gateway/demo.sh
```

--------------------------------

### Basic Helm Configuration for Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Example `values.yaml` for basic Docker Model Runner Helm deployment, including storage, image, and node port settings.

```yaml
# Storage configuration
storage:
  size: 100Gi
  storageClass: ""  # Set this to the storage class of your cloud provider.

# Model pre-pull configuration
modelInit:
  enabled: false
  models:
    - "ai/smollm2:latest"

# Image configuration
image:
  repository: docker/model-runner
  tag: "latest"  # Use 'latest-cuda' for NVIDIA or 'latest-rocm' for AMD GPUs
  pullPolicy: IfNotPresent

# GPU configuration
gpu:
  enabled: false
  vendor: nvidia  # or amd
  count: 1
  # For AMD GPUs, use 'latest-rocm' image tag

# NodePort configuration
nodePort:
  enabled: false
  port: 31245
```
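
The comments in this `values.yaml` pair each GPU vendor with an image tag. As a small sketch of that mapping (the function name `image_tag` is hypothetical and not part of the chart), the choice could be expressed as:

```python
def image_tag(gpu_enabled, vendor="nvidia"):
    """Pick the image tag suggested by the comments in values.yaml."""
    if not gpu_enabled:
        return "latest"
    # 'latest-cuda' for NVIDIA, 'latest-rocm' for AMD, per the comments above.
    tags = {"nvidia": "latest-cuda", "amd": "latest-rocm"}
    if vendor not in tags:
        raise ValueError(f"unsupported GPU vendor: {vendor!r}")
    return tags[vendor]
```

Keeping `image.tag` and `gpu.vendor` consistent this way avoids deploying a CPU-only image with GPU resources requested.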

--------------------------------

### Use Custom Host and Port

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway binding to a specific host address (`127.0.0.1`) and port (`9000`). This is useful for network isolation or avoiding port conflicts.

```console
$ docker model gateway --config config.yaml --host 127.0.0.1 --port 9000
```

--------------------------------

### Verify dmrlet Installation

Source: https://github.com/docker/model-runner/blob/main/README.md

Check if the dmrlet binary was built successfully and is executable by running its help command.

```bash
# Verify it works
./dmrlet --help
```

--------------------------------

### Check Docker Version

Source: https://github.com/docker/model-runner/blob/main/README.md

Displays the installed Docker version.

```bash
# Check Docker version
docker version
```

--------------------------------

### Serve a Model with dmrlet

Source: https://github.com/docker/model-runner/blob/main/README.md

Start serving an AI model using dmrlet. It automatically detects the backend and available GPUs for seamless deployment.

```bash
# Auto-detect backend and GPUs
dmrlet serve gemma3
```

--------------------------------

### Basic Gateway Configuration

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Example of a basic gateway configuration file. It defines a single provider with two models and sets up bearer-token authentication.

```yaml
model_list:
  # Alias the client uses       Provider / actual model on DMR
  - model_name: fast-model
    params:
      model: docker_model_runner/ai/smollm2

  # Second entry with same alias → round-robin load balancing
  - model_name: fast-model
    params:
      model: docker_model_runner/ai/qwen3:0.6B-Q4_0

  - model_name: big-model
    params:
      model: docker_model_runner/ai/gemma3

general_settings:
  master_key: demo-secret   # Bearer token required on all requests
  num_retries: 2            # retry up to 2 times before fallback
  fallbacks:
    - fast-model: [big-model]   # automatic fallback chain
```
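
To make the round-robin and fallback comments concrete, here is a toy model of the routing they describe. It is an illustration only, not the gateway's implementation: the `Router` class is hypothetical, and the exact order in which the real gateway applies retries and fallbacks is an assumption.

```python
from itertools import cycle

# Mirrors the model_list above: alias -> backing models.
MODEL_LIST = {
    "fast-model": [
        "docker_model_runner/ai/smollm2",
        "docker_model_runner/ai/qwen3:0.6B-Q4_0",
    ],
    "big-model": ["docker_model_runner/ai/gemma3"],
}
FALLBACKS = {"fast-model": ["big-model"]}


class Router:
    """Toy router: round-robin within an alias, then fall back."""

    def __init__(self, model_list, fallbacks):
        self._cycles = {alias: cycle(models) for alias, models in model_list.items()}
        self._fallbacks = fallbacks

    def pick(self, alias):
        """Return the next backing model for an alias."""
        return next(self._cycles[alias])

    def candidates(self, alias):
        """Aliases to try in order: the requested one, then its fallbacks."""
        return [alias, *self._fallbacks.get(alias, [])]
```

Successive calls to `pick("fast-model")` alternate between the two backing models, and `candidates("fast-model")` lists `big-model` as the alias to try once `fast-model` is exhausted.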

--------------------------------

### Reinstall Docker

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Reinstall Docker using the official installation script. This can resolve issues related to incorrect installations or missing components.

```bash
# Reinstall from official repository if needed
curl -fsSL https://get.docker.com | sudo bash
```

--------------------------------

### Troubleshoot Docker Installation Source

Source: https://github.com/docker/model-runner/blob/main/README.md

Check the Docker and Docker Model Runner versions to determine whether Docker was installed from the distribution's packages or from Docker's official repository.

```bash
# Check Docker version
docker version

# Check Docker Model Runner version
docker model version
```

--------------------------------

### Navigate to Demo Directory

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/README.md

Change your current directory to the extractor demo's location within the project. This is a prerequisite for installing dependencies.

```bash
cd demos/extractor
```

--------------------------------

### Verify Docker Model Runner Installation

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Checks if the Docker Model Runner CLI is installed and accessible by displaying its help information and version.

```bash
docker model --help
docker model version
```
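
The same check can be scripted. The following is a sketch under stated assumptions: `cli_version` is a hypothetical helper, and it treats a zero exit code from `docker model version` as proof that the plugin is installed.

```python
import shutil
import subprocess


def cli_version(argv=("docker", "model", "version")):
    """Return the command's stdout, or None if it is missing or fails."""
    # shutil.which avoids spawning a process when the binary is absent.
    if shutil.which(argv[0]) is None:
        return None
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None
```

A `None` result means either the `docker` binary is not on `PATH` or the `model` subcommand failed, which is the point where the troubleshooting commands apply.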

--------------------------------

### Start model-runner server manually

Source: https://github.com/docker/model-runner/blob/main/README.md

Starts the model-runner server in a terminal, specifying a custom port to avoid conflicts with Docker Desktop's default port.

```bash
MODEL_RUNNER_PORT=13434 ./model-runner
```

--------------------------------

### Run a Model using `dmr` convenience wrapper

Source: https://github.com/docker/model-runner/blob/main/README.md

Executes an AI model using the `dmr` convenience wrapper, which starts the server, runs the command, and then shuts down the server.

```bash
./dmr run ai/smollm2 "Hello, how are you?"
```

--------------------------------

### Run Docker Model for One-Time Prompt

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Execute a model to process a single prompt and get a response.

```bash
docker model run ai/smollm2 "Your prompt here"
```

--------------------------------

### Example Aggregated Metrics Output

Source: https://github.com/docker/model-runner/blob/main/METRICS.md

This is an example of the Prometheus-compatible metrics output from the aggregated /metrics endpoint. It includes metrics like total prompt tokens, generation tokens, and requests, each labeled with backend, model, and mode.

```prometheus
# HELP llama_prompt_tokens_total Total number of prompt tokens processed
# TYPE llama_prompt_tokens_total counter
llama_prompt_tokens_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 4934
llama_prompt_tokens_total{backend="llama.cpp",model="ai/mxbai-embed-large:335M-F16",mode="embedding"} 4525

# HELP llama_generation_tokens_total Total number of tokens generated
# TYPE llama_generation_tokens_total counter
llama_generation_tokens_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 2156

# HELP llama_requests_total Total number of requests processed
# TYPE llama_requests_total counter
llama_requests_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 127
llama_requests_total{backend="llama.cpp",model="ai/mxbai-embed-large:335M-F16",mode="embedding"} 89
```
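
Output in this format is straightforward to post-process. The sketch below parses sample lines into `(name, labels, value)` tuples. It is deliberately simplified and is not a full Prometheus parser: it skips `# HELP` and `# TYPE` lines, does not unescape label values, and ignores samples that carry a timestamp. The name `parse_metrics` is hypothetical.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as # HELP and # TYPE.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

Grouping the parsed tuples by the `model` or `mode` label then gives per-model totals without a Prometheus server.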

--------------------------------

### Run Docker Container with Custom Port and Model Path

Source: https://github.com/docker/model-runner/blob/main/README.md

Starts the application in a Docker container, allowing customization of the TCP port and the host path for persistent model storage. The specified `MODELS_PATH` will be mounted into the container.

```sh
# Customize port and model storage location
make docker-run PORT=3000 MODELS_PATH=/path/to/your/models
```

--------------------------------

### Run model for interactive chat with Docker

Source: https://github.com/docker/model-runner/blob/main/README.md

Start an interactive chat session with a model using the `docker model run` command. Type `/bye` to exit the session.

```bash
docker model run nvcr.io/nim/google/gemma-3-1b-it:latest
> Tell me a joke
...
> /bye
```

--------------------------------

### Start an interactive chat session with a model

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_run.md

Initiate an interactive chat session with a specified model. This allows for multi-turn conversations until the session is terminated.

```console
$ docker model run ai/smollm2
```

--------------------------------

### Run an AI Model with Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/README.md

Tests the full Docker Model Runner setup by running a specified AI model with a given input string.

```bash
# Run a model to test the full setup
docker model run ai/gemma3 "Hello"
```

--------------------------------

### Pull a Model using Docker

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/demo.html

Use this command to download AI models from Docker Hub. Ensure you have Docker installed and configured.

```bash
docker model pull <model-name>
```

--------------------------------

### Pull and run a model using model-cli

Source: https://github.com/docker/model-runner/blob/main/README.md

Connects to a manually started model-runner server and pulls/runs a specified AI model with an input string using the `model-cli` tool.

```bash
# Pull and run a model
MODEL_RUNNER_HOST=http://localhost:13434 ./cmd/cli/model-cli run ai/smollm2 "Hello, how are you?"
```

--------------------------------

### List all available models via API

Source: https://github.com/docker/model-runner/blob/main/README.md

Use this curl command to get a list of all models accessible through the Model Runner API.

```sh
# List all available models
curl http://localhost:8080/models
```

--------------------------------

### Send Request with API Key

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Example of sending a request to a secured gateway. The API key must be provided either as a Bearer token or in the `x-api-key` header.

```console
$ curl http://localhost:4000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer my-secret" \
    -d '{"model": "smollm2", "messages": [{"role": "user", "content": "Hi"}]}'
```
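
The two ways of passing the key described above can be sketched with the Python standard library. This builds the request without sending it; `build_chat_request` is a hypothetical helper, and the key `my-secret`, the model alias `smollm2`, and the default port are taken from the examples in this document rather than from any real deployment.

```python
import json
import urllib.request


def build_chat_request(prompt, model="smollm2", api_key="my-secret",
                       base_url="http://localhost:4000", use_bearer=True):
    """Build (but do not send) a chat completion request for the gateway."""
    headers = {"Content-Type": "application/json"}
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["x-api-key"] = api_key
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions", data=body, headers=headers, method="POST"
    )
```

With the gateway running, the request can be sent with `urllib.request.urlopen(build_chat_request("Hi"))`.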

--------------------------------

### Verify Docker CLI Plugin Availability

Source: https://github.com/docker/model-runner/blob/main/README.md

Checks if the Docker Model Runner CLI plugin is installed and available by running the `docker model --help` command.

```bash
# Check if the Docker CLI plugin is available
docker model --help
```

--------------------------------

### Use OpenAI Python Library for Chat Completions

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Programmatically interact with the Docker Model Runner's OpenAI-compatible API using the Python client library. Ensure the `openai` library is installed.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="not-needed"  # API key not required for local inference
)

response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

--------------------------------

### JavaScript/TypeScript OpenAI Client

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Example of making a chat completion request using the OpenAI library in JavaScript or TypeScript. Set the `baseURL` to point to your Docker Model Runner instance.

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:12434/engines/llama.cpp/v1',
  apiKey: 'not-needed'
});

const response = await client.chat.completions.create({
  model: 'ai/smollm2',
  messages: [{ role: 'user', content: 'Hello!' }]
});

console.log(response.choices[0].message.content);
```

--------------------------------

### Build model-cli

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Build the `model-cli` binary in release mode. This is a prerequisite for running the gateway demo.

```bash
cd model-cli && cargo build --release
```

--------------------------------

### Python OpenAI Library Chat Completion

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Example of performing a chat completion using the Python OpenAI library. Configure the client with the base URL of the Docker Model Runner.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="not-needed"
)

# Chat completion
response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

--------------------------------

### Initialize Camera Access

Source: https://github.com/docker/model-runner/blob/main/demos/multimodal/demo.html

Requests camera permissions and sets up the video stream. Displays success or error messages to the user.

```javascript
// 1. Ask for camera permission on load
async function initCamera() {
  try {
    stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: false });
    video.srcObject = stream;
    responseText.value = "Camera access granted. Ready to start.";
  } catch (err) {
    console.error("Error accessing camera:", err);
    responseText.value = `Error accessing camera: ${err.name} - ${err.message}. Please ensure permissions are granted and you are on HTTPS or localhost.`;
    alert(`Error accessing camera: ${err.name}. Make sure you've granted permission and are on HTTPS or localhost.`);
  }
}
```

--------------------------------

### Model Runner API Response Example

Source: https://github.com/docker/model-runner/blob/main/README.md

This is an example of the JSON response you might receive from the Model Runner API, detailing a chat completion.

```json
{
  "id": "chat-12345",
  "object": "chat.completion",
  "created": 1682456789,
  "model": "ai/smollm2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 16,
    "total_tokens": 40
  }
}
```
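
A client usually needs only a few fields from this response. As a minimal sketch, assuming the response has the shape shown above (`summarize_completion` is a hypothetical helper name):

```python
import json


def summarize_completion(raw):
    """Return the first choice's text and token usage from a chat completion."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```

For the example response, `total_tokens` is 40, the sum of the 24 prompt tokens and 16 completion tokens.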

--------------------------------

### Run llama-server

Source: https://github.com/docker/model-runner/blob/main/llamacpp/native/README.md

Execute the compiled llama-server binary, providing the path to the model file.

```bash
./build/bin/com.docker.llama-server --model <path to model>
```

--------------------------------

### Get metrics

Source: https://github.com/docker/model-runner/blob/main/README.md

Retrieves operational metrics for the Model Runner.

```APIDOC
## GET /metrics

### Description
Retrieves operational metrics for the Model Runner service.

### Method
GET

### Endpoint
/metrics

### Response
#### Success Response (200)
- (text) - Prometheus-compatible metrics for the Model Runner.
```

--------------------------------

### Get Model Info

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Retrieves detailed information about a specific model.

```APIDOC
## GET /models/{model}

### Description
Retrieves detailed information about a specific model.

### Method
GET

### Endpoint
/models/{model}

### Parameters
#### Path Parameters
- **model** (string) - Required - The ID of the model to retrieve.

### Response
#### Success Response (200)
- **id** (string) - The unique identifier of the model.
- **object** (string) - The type of object (e.g., 'model').
- **owned_by** (string) - The owner of the model.

#### Response Example
{
  "id": "ai/smollm2",
  "object": "model",
  "owned_by": "local"
}
```

--------------------------------

### Clone Repository and Build CLI

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Clone the model-cli repository and build the command-line interface.

```bash
git clone https://github.com/docker/model-cli.git
cd model-cli
make build
```

--------------------------------

### Build llama-server with CMake

Source: https://github.com/docker/model-runner/blob/main/llamacpp/native/README.md

Use CMake to configure and build the llama-server binary. Specify parallel build jobs for faster compilation.

```bash
cmake -B build
cmake --build build --parallel 8 --config Release
```

--------------------------------

### Access Open WebUI

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Set up port-forwarding for the Open WebUI service and access it via your browser.

```bash
kubectl port-forward service/open-webui 8080:80
```

--------------------------------

### Get metrics via API

Source: https://github.com/docker/model-runner/blob/main/README.md

Use this curl command to retrieve performance metrics from the Model Runner.

```sh
# Get metrics
curl http://localhost:8080/metrics
```

--------------------------------

### Discover Docker Models: Search Hub, HuggingFace, Specific Source

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Shows how to search for AI models available on Docker Hub and HuggingFace, with options to specify the search source.

```bash
# Search Docker Hub
docker model search llama

# Search HuggingFace
docker model search hf.co/bartowski

# Search with specific source
docker model search --source dockerhub llama
```

--------------------------------

### Check Docker Model Runner Version

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Verify if Docker Model Runner is installed and accessible by checking its version.

```bash
docker model version
```

--------------------------------

### Build Docker Image with vLLM

Source: https://github.com/docker/model-runner/blob/main/README.md

Builds the Docker image with vLLM support using default settings. Ensure you have the necessary build environment set up.

```sh
# Build with default settings (vLLM 0.19.1)
make docker-build DOCKER_TARGET=final-vllm BASE_IMAGE=nvidia/cuda:13.0.2-runtime-ubuntu24.04 LLAMA_SERVER_VARIANT=cuda
```

--------------------------------

### Gateway Health Check

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Check the health status of the `model-cli` gateway. This is a simple GET request to the `/health` endpoint.

```bash
GW="http://localhost:4000"
KEY="demo-secret"

# Health
curl "${GW}/health"
```

--------------------------------

### Generate Documentation

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Generate documentation for the model-cli project.

```bash
make docs
```

--------------------------------

### Helm Configuration for Model Pre-pulling

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Configure models to be pre-pulled during pod initialization by enabling `modelInit` and listing the desired models.

```yaml
modelInit:
  enabled: true
  models:
    - "ai/smollm2:latest"
    - "ai/llama3.2:latest"
    - "ai/mistral:latest"
```

--------------------------------

### Get information about a specific model via API

Source: https://github.com/docker/model-runner/blob/main/README.md

Use this curl command to retrieve detailed information about a particular model.

```sh
# Get information about a specific model
curl http://localhost:8080/models/ai/smollm2
```
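
Model names such as `ai/smollm2` contain a slash and appear in the path as-is in the command above. A small sketch for composing these URLs follows; `model_runner_url` is a hypothetical helper, and the base URL assumes the server listens on `localhost:8080` as in the curl examples.

```python
def model_runner_url(base_url, *parts):
    """Join a base URL and path parts without doubling or dropping slashes."""
    cleaned = [p.strip("/") for p in parts]
    return "/".join([base_url.rstrip("/"), *cleaned])
```

For example, `model_runner_url("http://localhost:8080", "models", "ai/smollm2")` yields the URL used in the curl command above.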

--------------------------------

### Deploy Docker Model Runner on Docker Desktop

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Apply the desktop-specific manifest and wait for the deployment to become available. Then, run a model using the specified host.

```bash
kubectl apply -f static/docker-model-runner-desktop.yaml
kubectl wait --for=condition=Available deployment/docker-model-runner --timeout=5m
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
```

--------------------------------

### List Available Docker Models

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

View a list of models that have already been downloaded and are available for use with Docker Model Runner.

```bash
docker model list
```

--------------------------------

### List Models using `dmr` convenience wrapper

Source: https://github.com/docker/model-runner/blob/main/README.md

Lists available models using the `dmr` convenience wrapper.

```bash
./dmr ls
```

--------------------------------

### Manage Docker Model Runner Service

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Commands to control the lifecycle of the Docker Model Runner service, including starting, stopping, and restarting it.

```bash
# Start the runner
docker model start-runner

# Stop the runner
docker model stop-runner

# Restart the runner
docker model restart-runner
```

--------------------------------

### Initialize DOM Elements and Variables

Source: https://github.com/docker/model-runner/blob/main/demos/multimodal/demo.html

Selects all necessary DOM elements and initializes global variables for the application's state, including default instruction text and the recommended model.

```javascript
const video = document.getElementById('videoFeed');
const canvas = document.getElementById('canvas');
const baseURL = document.getElementById('baseURL');
const modelSelect = document.getElementById('modelSelect');
const modelWarning = document.getElementById('modelWarning');
const modelInfo = document.getElementById('modelInfo');
const instructionText = document.getElementById('instructionText');
const responseText = document.getElementById('responseText');
const intervalSelect = document.getElementById('intervalSelect');
const startButton = document.getElementById('startButton');

instructionText.value = "What do you see?"; // default instruction

let stream;
let intervalId;
let isProcessing = false;
let isWaitingForResponse = false;

const RECOMMENDED_MODEL = 'ai/smolvlm:500M-Q8_0'; // Default model
```

--------------------------------

### Remove Distro Docker Version (Ubuntu/Debian)

Source: https://github.com/docker/model-runner/blob/main/README.md

Removes the Docker, containerd, and runc packages that might have been installed from a Linux distribution's repositories.

```bash
# Remove distro version (Ubuntu/Debian example)
sudo apt-get purge docker docker.io containerd runc
```

--------------------------------

### Inspect Model Configuration

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

View the current configuration settings for a specific model. This is useful for verifying settings after configuration.

```bash
# View configuration
docker model inspect ai/smollm2
```

--------------------------------

### Pull a General-Purpose AI Model

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/README.md

Use this command to download a suitable AI model for text extraction from Docker Hub. Ensure you have Docker installed and configured.

```bash
docker model pull ai/gemma3
```

--------------------------------

### Package and Push Model

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Package a GGUF model and push it to a target registry. Options for license and context size are available.

```bash
./model-cli package --gguf <path> --push <target>
```

--------------------------------

### Send Request to Gateway

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Example of sending a chat completion request to the gateway's OpenAI-compatible endpoint with curl. The `model` field should match a `model_name` defined in your configuration.

```console
$ curl http://localhost:4000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "smollm2",
      "messages": [{"role": "user", "content": "Hello"}]
    }'
```
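
The same request can be issued from JavaScript. The sketch below assumes Node.js 18 or later (for the built-in `fetch`) and a gateway at `http://localhost:4000`, as in the curl example. `buildChatRequest` and `sendChat` are hypothetical helper names, not part of Docker Model Runner, and the response shape is assumed to follow the OpenAI chat completion format.

```javascript
// Hypothetical helper: builds the URL and fetch options for a chat completion call.
// Assumes the gateway listens on http://localhost:4000, as in the curl example.
function buildChatRequest(model, prompt, baseUrl = 'http://localhost:4000') {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model, // should match a model_name from the gateway configuration
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  };
}

// Hypothetical helper: sends the request and returns the first choice's text.
// Assumes an OpenAI-style response body with a `choices` array.
async function sendChat(model, prompt) {
  const { url, options } = buildChatRequest(model, prompt);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Gateway returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.choices[0].message.content;
}
```

With the gateway running, `sendChat('smollm2', 'Hello')` should resolve to the reply text.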

--------------------------------

### List Available Models

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_search.md

Lists all available models from Docker Hub when no search term is provided.

```bash
docker model search
```

--------------------------------

### Build nv-gpu-info Executable

Source: https://github.com/docker/model-runner/blob/main/llamacpp/native/src/nv-gpu-info/CMakeLists.txt

Adds an executable target named 'com.docker.nv-gpu-info' and links it with the previously defined nvapi library. This compiles the native GPU information utility.

```cmake
set(TARGET com.docker.nv-gpu-info)

add_executable(${TARGET} nv-gpu-info.c)
install(TARGETS ${TARGET} RUNTIME)

target_link_libraries(${TARGET} nvapi)
```

--------------------------------

### Build Docker Model Runner from Source

Source: https://github.com/docker/model-runner/blob/main/README.md

Builds the complete Docker Model Runner stack, including the server, CLI plugin, and a `dmr` convenience wrapper, using the provided Makefile.

```bash
make
```

--------------------------------

### Check Docker Model Runner Pod Logs

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Stream the logs of the Docker Model Runner deployment to troubleshoot startup issues.

```bash
kubectl logs -f deployment/docker-model-runner
```

--------------------------------

### Build Docker Image for Multi-Architecture Support with vLLM

Source: https://github.com/docker/model-runner/blob/main/README.md

Builds the Docker image with vLLM support for multiple architectures (amd64 and arm64), automatically selecting appropriate prebuilt wheels.

```sh
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --target final-vllm \
  --build-arg BASE_IMAGE=nvidia/cuda:13.0.2-runtime-ubuntu24.04 \
  --build-arg LLAMA_SERVER_VARIANT=cuda \
  -t docker/model-runner:vllm .
```

--------------------------------

### Run model for single prompt with Docker

Source: https://github.com/docker/model-runner/blob/main/README.md

Run a model with a single prompt using the `docker model run` command. The model is given by its full image reference, here one hosted on `nvcr.io`.

```bash
docker model run nvcr.io/nim/google/gemma-3-1b-it:latest "Explain quantum computing"
```

--------------------------------

### Run a model with a one-time prompt

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_run.md

Use this command to send a single prompt to a model and receive a response. The model is loaded, the prompt is processed, and the output is displayed.

```console
docker model run ai/smollm2 "Hi"
```

--------------------------------

### Makefile Commands for Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/CONTRIBUTING.md

Common commands for managing the Docker Model Runner chart using the Makefile. These include rendering YAML, installing, upgrading, and uninstalling the chart.

```bash
# Render to plain Kubernetes YAML
make render
```

```bash
# Install the chart
make install
```

```bash
# Upgrade the chart
make upgrade
```

```bash
# Uninstall the chart
make uninstall
```

--------------------------------

### Configure Gateway for Multiple Providers with Fallback

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

This configuration defines three models backed by different providers (Groq, OpenAI, and Docker Model Runner) and sets fallbacks between them, with `num_retries` set to 2. If `fast` fails, requests are routed to `local`; if `smart` fails, they are routed to `fast` and then `local`.

```yaml
model_list:
  - model_name: fast
    params:
      model: groq/llama-3.1-8b-instant
      api_key: os.environ/GROQ_API_KEY
  - model_name: smart
    params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: local
    params:
      model: docker_model_runner/ai/smollm2
      api_base: http://localhost:12434/engines/llama.cpp/v1

general_settings:
  num_retries: 2
  fallbacks:
    - fast: [local]
    - smart: [fast, local]
```
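
To make the fallback lists concrete, the sketch below computes the order in which model names would be attempted under one plausible reading of this configuration: the requested model first, then its fallbacks in the listed order. `attemptOrder` is a hypothetical helper for illustration only. It does not reflect the gateway's actual implementation, and retries (`num_retries`) are not modelled.

```javascript
// Fallback lists mirroring the YAML above: each entry maps a model_name to its fallbacks.
const fallbacks = [
  { fast: ['local'] },
  { smart: ['fast', 'local'] },
];

// Hypothetical helper: returns the requested model followed by its fallbacks, in order.
function attemptOrder(modelName, fallbackList) {
  const entry = fallbackList.find((item) =>
    Object.prototype.hasOwnProperty.call(item, modelName)
  );
  return [modelName, ...(entry ? entry[modelName] : [])];
}

console.log(attemptOrder('smart', fallbacks)); // [ 'smart', 'fast', 'local' ]
console.log(attemptOrder('local', fallbacks)); // [ 'local' ]
```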

--------------------------------

### Inspect Docker Model Details

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Retrieve detailed information and metadata about a specific downloaded model.

```bash
docker model inspect <model>
```

--------------------------------

### Rebuild Search Index

Source: https://github.com/docker/model-runner/blob/main/demos/embeddings/index.html

Initiates the process of rebuilding the search index. It prompts the user for confirmation due to the potentially long duration of the operation and updates the UI to indicate the indexing process has started.

```javascript
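async function rebuildIndex() {
    // Rebuilding is slow, so ask for confirmation before starting.
    if (!confirm('Rebuilding the index may take several minutes. Continue?')) {
        return;
    }

    // Disable the button and show progress while indexing runs.
    const rebuildBtn = document.getElementById('rebuildBtn');
    rebuildBtn.disabled = true;
    rebuildBtn.textContent = 'Indexing...';
    showInfo('Indexing started. This may take several minutes.');
    // The source excerpt is truncated here; the remainder of the function is not shown.
}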
```

--------------------------------

### Run Docker Model with Optional Prompt

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Execute a specified model, optionally providing a prompt for immediate inference.

```bash
docker model run <model> [prompt]
```

--------------------------------

### Run Docker Models: Interactive, Single Prompt, Detached, Debug

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Demonstrates various ways to run AI models using Docker Model Runner, including interactive chat, single-line prompts, pre-loading for faster requests, and enabling debug logging.

```bash
# Interactive chat mode
docker model run ai/smollm2

# Single prompt
docker model run ai/smollm2 "Explain Docker in one sentence"

# Pre-load model for faster subsequent requests
docker model run --detach ai/smollm2

# With debug logging
docker model run --debug ai/smollm2 "Hello"
```
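
When these commands are scripted from Node.js, it can help to build the argument list in one place. The following is a sketch only: `buildRunArgs` is a hypothetical helper, and the only CLI flags it uses are the `--detach` and `--debug` flags shown above.

```javascript
// Hypothetical helper: assembles the arguments for `docker model run`.
// Flags come first, then the model reference, then the optional prompt.
function buildRunArgs({ model, prompt, detach = false, debug = false }) {
  const args = ['model', 'run'];
  if (detach) args.push('--detach');
  if (debug) args.push('--debug');
  args.push(model);
  if (prompt !== undefined) args.push(prompt);
  return args;
}

console.log(buildRunArgs({ model: 'ai/smollm2', prompt: 'Hello', debug: true }).join(' '));
// model run --debug ai/smollm2 Hello
```

The resulting array can be passed to Node's `child_process.execFile('docker', args, callback)`, which avoids shell quoting problems with prompts that contain spaces.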

--------------------------------

### JavaScript Event Listeners for Model Runner

Source: https://github.com/docker/model-runner/blob/main/demos/multimodal/demo.html

Handles starting and stopping model processing based on button clicks. Initializes camera and fetches models on page load. Also includes cleanup for the camera stream and intervals on page unload.

```javascript
function handleStart() {
  if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
    alert("getUserMedia not supported on your browser!");
    return;
  }
  // Check if permission was already granted
  if (!hasPermission) { // Assuming hasPermission is a boolean flag set elsewhere
    alert("Please grant permission first.");
    return;
  }
  isProcessing = true;
  startButton.textContent = "Stop";
  startButton.classList.remove('start');
  startButton.classList.add('stop');
  instructionText.disabled = true;
  intervalSelect.disabled = true;
  responseText.value = "Processing started...";
  const intervalMs = parseInt(intervalSelect.value, 10);
  // Initial immediate call
  sendData();
  // Then set interval
  intervalId = setInterval(sendData, intervalMs);
}

function handleStop() {
  isProcessing = false;
  if (intervalId) {
    clearInterval(intervalId);
    intervalId = null;
  }
  startButton.textContent = "Start";
  startButton.classList.remove('stop');
  startButton.classList.add('start');
  instructionText.disabled = false;
  intervalSelect.disabled = false;
  if (responseText.value.startsWith("Processing started...")) {
    responseText.value = "Processing stopped.";
  }
}

startButton.addEventListener('click', () => {
  if (isProcessing) {
    handleStop();
  } else {
    handleStart();
  }
});

// Initialize camera and fetch models when the page loads
window.addEventListener('DOMContentLoaded', () => {
  initCamera();
  fetchModels();
});

// Optional: Stop stream when page is closed/navigated away to release camera
window.addEventListener('beforeunload', () => {
  if (stream) {
    stream.getTracks().forEach(track => track.stop());
  }
  if (intervalId) {
    clearInterval(intervalId);
  }
});
```
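
The start/stop logic above boils down to a small toggle around `setInterval`. The sketch below isolates that pattern without any DOM access so it can be run in Node.js. `createPoller` is a hypothetical helper, not code from the demo.

```javascript
// Hypothetical helper: runs `task` once immediately, then every `intervalMs` until stopped.
function createPoller(task, intervalMs) {
  let intervalId = null;

  const isRunning = () => intervalId !== null;

  function start() {
    if (isRunning()) return; // already running
    task(); // initial immediate call, as in handleStart
    intervalId = setInterval(task, intervalMs);
  }

  function stop() {
    if (!isRunning()) return;
    clearInterval(intervalId); // mirrors handleStop
    intervalId = null;
  }

  // Switches between the two states, like the click handler above.
  const toggle = () => (isRunning() ? stop() : start());

  return { start, stop, toggle, isRunning };
}
```

In the demo page, a poller created this way could be wired up with `startButton.addEventListener('click', poller.toggle)`, leaving the button and text updates to a separate function.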

--------------------------------

### Run Unit Tests

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Execute unit tests for the model-cli project.

```bash
make unit-tests
```

--------------------------------

### Direct Docker Build with Resolved Upstream Image

Source: https://github.com/docker/model-runner/blob/main/README.md

Demonstrates how to use `docker buildx build` directly, passing a fully resolved upstream image for llama.cpp. This is an alternative to using the `make docker-build` target for advanced customization.

```sh
docker buildx build \
  --target final-llamacpp \
  --build-arg LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b8840 \
  -t docker/model-runner:llama-b8840 .
```

--------------------------------

### List Available Models via Gateway

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Retrieve a list of available models exposed by the gateway. Requires a valid Authorization header.

```bash
GW="http://localhost:4000"
KEY="demo-secret"

# List models
curl -H "Authorization: Bearer ${KEY}" "${GW}/v1/models"
```
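
From JavaScript, the same call needs only the bearer token header. This sketch assumes Node.js 18 or later and that the gateway returns the usual OpenAI-style list body (`{ "data": [{ "id": ... }] }`). `authHeaders`, `extractModelIds`, and `listModels` are hypothetical names.

```javascript
const GW = 'http://localhost:4000';
const KEY = 'demo-secret';

// Builds the Authorization header used in the curl example.
function authHeaders(key) {
  return { Authorization: `Bearer ${key}` };
}

// Assumes an OpenAI-style list response: { data: [{ id: '...' }, ...] }.
function extractModelIds(body) {
  return (body.data ?? []).map((model) => model.id);
}

// Hypothetical helper: fetches the model list and returns only the model ids.
async function listModels(baseUrl = GW, key = KEY) {
  const res = await fetch(`${baseUrl}/v1/models`, { headers: authHeaders(key) });
  if (!res.ok) {
    throw new Error(`Gateway returned HTTP ${res.status}`);
  }
  return extractModelIds(await res.json());
}
```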

--------------------------------

### Push a model to a registry

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_push.md

Use this command to push your model to a specified namespace in a container registry. Ensure you have authenticated with the registry beforehand.

```console
docker model push <namespace>/<model>
```

--------------------------------

### Build model-runner Docker image

Source: https://github.com/docker/model-runner/blob/main/README.md

Builds the Docker image for the model-runner service.

```bash
cd model-runner
make docker-build
```

--------------------------------

### Go OpenAI Client Chat Completion

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Create a chat completion request with the `go-openai` library. Set the client's `BaseURL` to the Docker Model Runner endpoint; the API key passed to `DefaultConfig` is only a placeholder.

```go
package main

import (
    "context"
    "fmt"
    "github.com/sashabaranov/go-openai"
)

func main() {
    config := openai.DefaultConfig("not-needed")
    config.BaseURL = "http://localhost:12434/engines/llama.cpp/v1"
    client := openai.NewClientWithConfig(config)

    resp, err := client.CreateChatCompletion(
        context.Background(),
        openai.ChatCompletionRequest{
            Model: "ai/smollm2",
            Messages: []openai.ChatCompletionMessage{
                {Role: "user", Content: "Hello!"},
            },
        },
    )
    if err != nil {
        panic(err)
    }
    fmt.Println(resp.Choices[0].Message.Content)
}
```

--------------------------------

### JavaScript Event Listeners for Initialization

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/demo.html

Sets up event listeners for DOMContentLoaded to initialize the application and for changes in the base URL to refresh available models.

```javascript
// Initialize on page load
window.addEventListener('DOMContentLoaded', () => {
  loadInvoiceSchema();
  fetchModels();
});

// Refresh models when base URL changes
document.getElementById('baseUrl').addEventListener('change', fetchModels);
```

--------------------------------

### Search for Docker Models

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Find models available on Docker Hub or Hugging Face using a search query.

```bash
docker model search <query>
```

--------------------------------

### Initialize llama.cpp Submodule

Source: https://github.com/docker/model-runner/blob/main/llamacpp/native/README.md

Ensure the llama.cpp git submodule is initialized and updated. This command must be run from the project root directory.

```bash
git submodule update --init --recursive
```

--------------------------------

### Pull a Docker Model

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Download a specific AI model to your local machine before running it.

```bash
docker model pull <model>
```

--------------------------------

### Configure Model Parameters

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Configure specific parameters for a model, such as context size. Use this to fine-tune model behavior for your tasks.

```bash
# Configure model parameters
docker model configure ai/smollm2 --ctx-size 4096
```

--------------------------------

### Run Model with Prompt

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Execute a model with a specific prompt using the Docker Model CLI.

```bash
./model-cli run llama.cpp "What is the capital of France?"
```

--------------------------------

### Show Running Docker Models

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

View a list of models that are currently active and running.

```bash
docker model ps
```

--------------------------------

### Run Docker Container with Default Settings

Source: https://github.com/docker/model-runner/blob/main/README.md

Executes the application within a Docker container using default configurations for port and model storage. The `models` directory will be created in the current working directory and mounted into the container.

```sh
make docker-run
```