### Install and Run Codebase Embeddings Demo

Source: https://github.com/docker/model-runner/blob/main/demos/embeddings/README.md

Install dependencies and start the demo server. The server automatically downloads a pre-generated embeddings index on first run.

```bash
docker model pull ai/qwen3-embedding:0.6B-F16
cd demos/embeddings
npm install
npm start
```

--------------------------------

### Start the Demo Server

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/README.md

Initiate the demo application's server process. This command starts the backend that serves the demo interface.

```bash
npm start
```

--------------------------------

### Handle Start Button Click

Source: https://github.com/docker/model-runner/blob/main/demos/multimodal/demo.html

Initiates the process when the start button is clicked, checking for camera availability.

```javascript
function handleStart() {
  if (!stream) {
    responseText.value = "Camera not available. Cannot start.";
    alert("Camera not available. ");
  }
}
```

--------------------------------

### Install Node.js Dependencies

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/README.md

Install all necessary packages for the demo application using npm. This command should be run after navigating to the demo directory.

```bash
npm install
```

--------------------------------

### Start Docker Model Gateway

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway using a specified configuration file. The gateway will be accessible on `http://0.0.0.0:4000` by default.

```console
$ docker model gateway --config config.yaml
```

--------------------------------

### Install OpenAI Python Package

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Install the `openai` Python package. This is required for interacting with the gateway using the OpenAI SDK in the demo.
```bash
pip install openai
```

--------------------------------

### Check Docker Installation

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Verify that Docker is installed and accessible. Ensure you are using Docker from official repositories for compatibility.

```bash
# Check if Docker is from official repositories
docker version
```

--------------------------------

### Install Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Install the Docker Model Runner. Use `--gpu` flags for GPU support or auto-detection.

```bash
./model-cli install-runner --gpu cuda
```

--------------------------------

### Install Open WebUI with Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Add the Open WebUI Helm repository, update it, and install the chart with specific configurations to connect to Docker Model Runner.

```bash
# Add the Open WebUI Helm repository
helm repo add open-webui https://helm.openwebui.com/
helm repo update

# Install Open WebUI with auth disabled
# See the open-webui Helm chart for
# connecting to your auth provider.
helm upgrade --install --wait open-webui open-webui/open-webui \
  --set ollama.enabled=false \
  --set pipelines.enabled=false \
  --set extraEnvVars[0].name="WEBUI_AUTH" \
  --set-string extraEnvVars[0].value=false \
  --set openaiBaseApiUrl="http://docker-model-runner/engines/v1"
```

--------------------------------

### Start Gateway with Master API Key

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway with a master API key set via an environment variable. Clients must include this key in their requests.
```console
$ GATEWAY_API_KEY=my-secret docker model gateway --config config.yaml
```

--------------------------------

### Test Docker Model Runner Installation

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Set up port-forwarding to the service and then test the model runner by running a sample model.

```bash
kubectl port-forward service/docker-model-runner-nodeport 31245:80
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
```

--------------------------------

### Install Docker Engine on Linux

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Installs Docker Engine on a Linux system and adds the current user to the docker group.

```bash
curl -fsSL https://get.docker.com | sudo bash
sudo usermod -aG docker $USER
```

--------------------------------

### Start Gateway with Custom Port

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway on port 8080, overriding the default port 4000. Ensure the specified port is not in use.

```console
$ docker model gateway --config config.yaml --port 8080
```

--------------------------------

### List available models using model-cli

Source: https://github.com/docker/model-runner/blob/main/README.md

Connects to a manually started model-runner server and lists the available models using the `model-cli` tool.

```bash
# List available models
MODEL_RUNNER_HOST=http://localhost:13434 ./cmd/cli/model-cli list
```

--------------------------------

### Enable Debug Logging

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway with verbose logging enabled. This is helpful for diagnosing issues by providing detailed output.
```console
$ docker model gateway --config config.yaml --verbose
```

--------------------------------

### Model Distribution Client Usage (Go)

Source: https://github.com/docker/model-runner/blob/main/pkg/distribution/README.md

Demonstrates the core functionalities of the Model Distribution client, including creating a client, pulling, getting, bundling, listing, deleting, tagging, and pushing models. Ensure the client is initialized with a store root path for local caching.

```go
import (
    "context"
    "fmt"
    "os"

    "github.com/docker/model-runner/pkg/distribution/distribution"
)

// Create a new client
client, err := distribution.NewClient(
    distribution.WithStoreRootPath("/path/to/cache"),
)
if err != nil {
    // Handle error
}

// Pull a model, streaming progress to stdout
err = client.PullModel(context.Background(), "registry.example.com/models/llama:v1.0", os.Stdout)
if err != nil {
    // Handle error
}

// Get a model
model, err := client.GetModel("registry.example.com/models/llama:v1.0")
if err != nil {
    // Handle error
}

// Create a bundle
bundle, err := client.GetBundle("registry.example.com/models/llama:v1.0")
if err != nil {
    // Handle error
}

// Get the GGUF file path within the bundle
modelPath, err := bundle.GGUFPath()
if err != nil {
    // Handle error
}
fmt.Println("Model path:", modelPath)

// List all models
models, err := client.ListModels()
if err != nil {
    // Handle error
}

// Delete a model
_, err = client.DeleteModel("registry.example.com/models/llama:v1.0", false)
if err != nil {
    // Handle error
}

// Tag a model
err = client.Tag("registry.example.com/models/llama:v1.0", "registry.example.com/models/llama:latest")
if err != nil {
    // Handle error
}

// Push a model
err = client.PushModel(context.Background(), "registry.example.com/models/llama:v1.0", nil)
if err != nil {
    // Handle error
}
```

--------------------------------

### Run Gateway Demo Script

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Execute the end-to-end demo script for the `model-cli gateway`.
This script starts the gateway, tests its features, and then shuts it down.

```bash
./demos/gateway/demo.sh
```

--------------------------------

### Basic Helm Configuration for Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Example `values.yaml` for basic Docker Model Runner Helm deployment, including storage, image, and node port settings.

```yaml
# Storage configuration
storage:
  size: 100Gi
  storageClass: ""  # Set this to the storage class of your cloud provider.

# Model pre-pull configuration
modelInit:
  enabled: false
  models:
    - "ai/smollm2:latest"

# Image configuration
image:
  repository: docker/model-runner
  tag: "latest"  # Use 'latest-cuda' for NVIDIA or 'latest-rocm' for AMD GPUs
  pullPolicy: IfNotPresent

# GPU configuration
gpu:
  enabled: false
  vendor: nvidia  # or amd
  count: 1

# For AMD GPUs, use 'latest-rocm' image tag

# NodePort configuration
nodePort:
  enabled: false
  port: 31245
```

--------------------------------

### Use Custom Host and Port

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Starts the gateway binding to a specific host address (`127.0.0.1`) and port (`9000`). This is useful for network isolation or avoiding port conflicts.

```console
$ docker model gateway --config config.yaml --host 127.0.0.1 --port 9000
```

--------------------------------

### Verify dmrlet Installation

Source: https://github.com/docker/model-runner/blob/main/README.md

Check if the dmrlet binary was built successfully and is executable by running its help command.

```bash
# Verify it works
./dmrlet --help
```

--------------------------------

### Check Docker Version

Source: https://github.com/docker/model-runner/blob/main/README.md

Displays the installed Docker version.
```bash
# Check Docker version
docker version
```

--------------------------------

### Serve a Model with dmrlet

Source: https://github.com/docker/model-runner/blob/main/README.md

Start serving an AI model using dmrlet. It automatically detects the backend and available GPUs for seamless deployment.

```bash
# Auto-detect backend and GPUs
dmrlet serve gemma3
```

--------------------------------

### Basic Gateway Configuration

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Example of a basic gateway configuration file. It defines a single provider with two models and sets up bearer-token authentication.

```yaml
model_list:
  # model_name: alias the client uses; model: provider / actual model on DMR
  - model_name: fast-model
    params:
      model: docker_model_runner/ai/smollm2

  # Second entry with same alias → round-robin load balancing
  - model_name: fast-model
    params:
      model: docker_model_runner/ai/qwen3:0.6B-Q4_0

  - model_name: big-model
    params:
      model: docker_model_runner/ai/gemma3

general_settings:
  master_key: demo-secret  # Bearer token required on all requests
  num_retries: 2           # retry up to 2 times before fallback
  fallbacks:
    - fast-model: [big-model]  # automatic fallback chain
```

--------------------------------

### Reinstall Docker

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Reinstall Docker using the official installation script. This can resolve issues related to incorrect installations or missing components.

```bash
# Reinstall from official repository if needed
curl -fsSL https://get.docker.com | sudo bash
```

--------------------------------

### Troubleshoot Docker Installation Source

Source: https://github.com/docker/model-runner/blob/main/README.md

Commands to check Docker and Docker Model Runner versions to identify if the installation source is from the distribution or Docker's official repository.
```bash
# Check Docker version
docker version

# Check Docker Model Runner version
docker model version
```

--------------------------------

### Navigate to Demo Directory

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/README.md

Change your current directory to the extractor demo's location within the project. This is a prerequisite for installing dependencies.

```bash
cd demos/extractor
```

--------------------------------

### Verify Docker Model Runner Installation

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Checks if the Docker Model Runner CLI is installed and accessible by displaying its help information and version.

```bash
docker model --help
docker model version
```

--------------------------------

### Start model-runner server manually

Source: https://github.com/docker/model-runner/blob/main/README.md

Starts the model-runner server in a terminal, specifying a custom port to avoid conflicts with Docker Desktop's default port.

```bash
MODEL_RUNNER_PORT=13434 ./model-runner
```

--------------------------------

### Run a Model using `dmr` convenience wrapper

Source: https://github.com/docker/model-runner/blob/main/README.md

Executes an AI model using the `dmr` convenience wrapper, which starts the server, runs the command, and then shuts down the server.

```bash
./dmr run ai/smollm2 "Hello, how are you?"
```

--------------------------------

### Run Docker Model for One-Time Prompt

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Execute a model to process a single prompt and get a response.

```bash
docker model run ai/smollm2 "Your prompt here"
```

--------------------------------

### Example Aggregated Metrics Output

Source: https://github.com/docker/model-runner/blob/main/METRICS.md

This is an example of the Prometheus-compatible metrics output from the aggregated /metrics endpoint.
It includes metrics like total prompt tokens, generation tokens, and requests, each labeled with backend, model, and mode.

```prometheus
# HELP llama_prompt_tokens_total Total number of prompt tokens processed
# TYPE llama_prompt_tokens_total counter
llama_prompt_tokens_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 4934
llama_prompt_tokens_total{backend="llama.cpp",model="ai/mxbai-embed-large:335M-F16",mode="embedding"} 4525

# HELP llama_generation_tokens_total Total number of tokens generated
# TYPE llama_generation_tokens_total counter
llama_generation_tokens_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 2156

# HELP llama_requests_total Total number of requests processed
# TYPE llama_requests_total counter
llama_requests_total{backend="llama.cpp",model="llama3.2:latest",mode="completion"} 127
llama_requests_total{backend="llama.cpp",model="ai/mxbai-embed-large:335M-F16",mode="embedding"} 89
```

--------------------------------

### Run Docker Container with Custom Port and Model Path

Source: https://github.com/docker/model-runner/blob/main/README.md

Starts the application in a Docker container, allowing customization of the TCP port and the host path for persistent model storage. The specified `MODELS_PATH` will be mounted into the container.

```sh
# Customize port and model storage location
make docker-run PORT=3000 MODELS_PATH=/path/to/your/models
```

--------------------------------

### Run model for interactive chat with Docker

Source: https://github.com/docker/model-runner/blob/main/README.md

Start an interactive chat session with a model using the `docker model run` command. Type `/bye` to exit the session.

```bash
docker model run nvcr.io/nim/google/gemma-3-1b-it:latest
> Tell me a joke
...
> /bye
```

--------------------------------

### Start an interactive chat session with a model

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_run.md

Initiate an interactive chat session with a specified model. This allows for multi-turn conversations until the session is terminated.

```console
docker model run ai/smollm2
```

--------------------------------

### Run an AI Model with Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/README.md

Tests the full Docker Model Runner setup by running a specified AI model with a given input string.

```bash
# Run a model to test the full setup
docker model run ai/gemma3 "Hello"
```

--------------------------------

### Pull a Model using Docker

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/demo.html

Use this command to download AI models from Docker Hub. Ensure you have Docker installed and configured.

```bash
docker model pull
```

--------------------------------

### Pull and run a model using model-cli

Source: https://github.com/docker/model-runner/blob/main/README.md

Connects to a manually started model-runner server and pulls/runs a specified AI model with an input string using the `model-cli` tool.

```bash
# Pull and run a model
MODEL_RUNNER_HOST=http://localhost:13434 ./cmd/cli/model-cli run ai/smollm2 "Hello, how are you?"
```

--------------------------------

### List all available models via API

Source: https://github.com/docker/model-runner/blob/main/README.md

Use this curl command to get a list of all models accessible through the Model Runner API.

```sh
# List all available models
curl http://localhost:8080/models
```

--------------------------------

### Send Request with API Key

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Example of sending a request to a secured gateway. The API key must be provided either as a Bearer token or in the `x-api-key` header.
```console
$ curl http://localhost:4000/v1/chat/completions \
    -H "Authorization: Bearer my-secret" \
    -H "Content-Type: application/json" \
    -d '{"model": "smollm2", "messages": [{"role": "user", "content": "Hi"}]}'
```

--------------------------------

### Verify Docker CLI Plugin Availability

Source: https://github.com/docker/model-runner/blob/main/README.md

Checks if the Docker Model Runner CLI plugin is installed and available by running the `docker model --help` command.

```bash
# Check if the Docker CLI plugin is available
docker model --help
```

--------------------------------

### Use OpenAI Python Library for Chat Completions

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Programmatically interact with the Docker Model Runner's OpenAI-compatible API using the Python client library. Ensure the `openai` library is installed.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="not-needed"  # API key not required for local inference
)

response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

--------------------------------

### JavaScript/TypeScript OpenAI Client

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Example of making a chat completion request using the OpenAI library in JavaScript or TypeScript. Set the `baseURL` to point to your Docker Model Runner instance.

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:12434/engines/llama.cpp/v1',
  apiKey: 'not-needed'
});

const response = await client.chat.completions.create({
  model: 'ai/smollm2',
  messages: [{ role: 'user', content: 'Hello!'
  }]
});

console.log(response.choices[0].message.content);
```

--------------------------------

### Build model-cli

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Build the `model-cli` binary in release mode. This is a prerequisite for running the gateway demo.

```bash
cd model-cli && cargo build --release
```

--------------------------------

### Python OpenAI Library Chat Completion

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Example of performing a chat completion using the Python OpenAI library. Configure the client with the base URL of the Docker Model Runner.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="not-needed"
)

# Chat completion
response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

--------------------------------

### Initialize Camera Access

Source: https://github.com/docker/model-runner/blob/main/demos/multimodal/demo.html

Requests camera permissions and sets up the video stream. Displays success or error messages to the user.

```javascript
// 1. Ask for camera permission on load
async function initCamera() {
  try {
    stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: false });
    video.srcObject = stream;
    responseText.value = "Camera access granted. Ready to start.";
  } catch (err) {
    console.error("Error accessing camera:", err);
    responseText.value = `Error accessing camera: ${err.name} - ${err.message}. Please ensure permissions are granted and you are on HTTPS or localhost.`;
    alert(`Error accessing camera: ${err.name}. 
Make sure you've granted permission and are on HTTPS or localhost.`);
  }
}
```

--------------------------------

### Model Runner API Response Example

Source: https://github.com/docker/model-runner/blob/main/README.md

This is an example of the JSON response you might receive from the Model Runner API, detailing a chat completion.

```json
{
  "id": "chat-12345",
  "object": "chat.completion",
  "created": 1682456789,
  "model": "ai/smollm2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 16,
    "total_tokens": 40
  }
}
```

--------------------------------

### Run llama-server

Source: https://github.com/docker/model-runner/blob/main/llamacpp/native/README.md

Execute the compiled llama-server binary, providing the path to the model file.

```bash
./build/bin/com.docker.llama-server --model
```

--------------------------------

### Get metrics

Source: https://github.com/docker/model-runner/blob/main/README.md

Retrieves operational metrics for the Model Runner.

```APIDOC
## GET /metrics

### Description
Retrieves operational metrics for the Model Runner service.

### Method
GET

### Endpoint
/metrics

### Response
#### Success Response (200)
- (type) - Description of the response body containing various metrics.
```

--------------------------------

### Get Model Info

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Retrieves detailed information about a specific model.

```APIDOC
## GET /models/{model}

### Description
Retrieves detailed information about a specific model.

### Method
GET

### Endpoint
/models/{model}

### Parameters
#### Path Parameters
- **model** (string) - Required - The ID of the model to retrieve.

### Response
#### Success Response (200)
- **id** (string) - The unique identifier of the model.
- **object** (string) - The type of object (e.g., 'model').
- **owned_by** (string) - The owner of the model.

#### Response Example
```json
{
  "id": "ai/smollm2",
  "object": "model",
  "owned_by": "local"
}
```
```

--------------------------------

### Clone Repository and Build CLI

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Clone the model-cli repository and build the command-line interface.

```bash
git clone https://github.com/docker/model-cli.git
cd model-cli
make build
```

--------------------------------

### Build llama-server with CMake

Source: https://github.com/docker/model-runner/blob/main/llamacpp/native/README.md

Use CMake to configure and build the llama-server binary. Specify parallel build jobs for faster compilation.

```bash
cmake -B build
cmake --build build --parallel 8 --config Release
```

--------------------------------

### Access Open WebUI

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Set up port-forwarding for the Open WebUI service and access it via your browser.

```bash
kubectl port-forward service/open-webui 8080:80
```

--------------------------------

### Get metrics via API

Source: https://github.com/docker/model-runner/blob/main/README.md

Use this curl command to retrieve performance metrics from the Model Runner.

```sh
# Get metrics
curl http://localhost:8080/metrics
```

--------------------------------

### Discover Docker Models: Search Hub, HuggingFace, Specific Source

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Shows how to search for AI models available on Docker Hub and HuggingFace, with options to specify the search source.
```bash
# Search Docker Hub
docker model search llama

# Search HuggingFace
docker model search hf.co/bartowski

# Search with specific source
docker model search --source dockerhub llama
```

--------------------------------

### Check Docker Model Runner Version

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Verify if Docker Model Runner is installed and accessible by checking its version.

```bash
docker model version
```

--------------------------------

### Build Docker Image with vLLM

Source: https://github.com/docker/model-runner/blob/main/README.md

Builds the Docker image with vLLM support using default settings. Ensure you have the necessary build environment set up.

```sh
# Build with default settings (vLLM 0.19.1)
make docker-build DOCKER_TARGET=final-vllm BASE_IMAGE=nvidia/cuda:13.0.2-runtime-ubuntu24.04 LLAMA_SERVER_VARIANT=cuda
```

--------------------------------

### Gateway Health Check

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Check the health status of the `model-cli` gateway. This is a simple GET request to the `/health` endpoint.

```bash
GW="http://localhost:4000"
KEY="demo-secret"

# Health
curl "${GW}/health"
```

--------------------------------

### Generate Documentation

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Generate documentation for the model-cli project.

```bash
make docs
```

--------------------------------

### Helm Configuration for Model Pre-pulling

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Configure models to be pre-pulled during pod initialization by enabling `modelInit` and listing the desired models.
```yaml
modelInit:
  enabled: true
  models:
    - "ai/smollm2:latest"
    - "ai/llama3.2:latest"
    - "ai/mistral:latest"
```

--------------------------------

### Get information about a specific model via API

Source: https://github.com/docker/model-runner/blob/main/README.md

Use this curl command to retrieve detailed information about a particular model.

```sh
# Get information about a specific model
curl http://localhost:8080/models/ai/smollm2
```

--------------------------------

### Deploy Docker Model Runner on Docker Desktop

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Apply the desktop-specific manifest and wait for the deployment to become available. Then, run a model using the specified host.

```bash
kubectl apply -f static/docker-model-runner-desktop.yaml
kubectl wait --for=condition=Available deployment/docker-model-runner --timeout=5m
MODEL_RUNNER_HOST=http://localhost:31245 docker model run ai/smollm2:latest
```

--------------------------------

### List Available Docker Models

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

View a list of models that have already been downloaded and are available for use with Docker Model Runner.

```bash
docker model list
```

--------------------------------

### List Models using `dmr` convenience wrapper

Source: https://github.com/docker/model-runner/blob/main/README.md

Lists available models using the `dmr` convenience wrapper.

```bash
./dmr ls
```

--------------------------------

### Manage Docker Model Runner Service

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Commands to control the lifecycle of the Docker Model Runner service, including starting, stopping, and restarting it.
```bash
# Start the runner
docker model start-runner

# Stop the runner
docker model stop-runner

# Restart the runner
docker model restart-runner
```

--------------------------------

### Initialize DOM Elements and Variables

Source: https://github.com/docker/model-runner/blob/main/demos/multimodal/demo.html

Selects all necessary DOM elements and initializes global variables for the application's state, including default instruction text and the recommended model.

```javascript
const video = document.getElementById('videoFeed');
const canvas = document.getElementById('canvas');
const baseURL = document.getElementById('baseURL');
const modelSelect = document.getElementById('modelSelect');
const modelWarning = document.getElementById('modelWarning');
const modelInfo = document.getElementById('modelInfo');
const instructionText = document.getElementById('instructionText');
const responseText = document.getElementById('responseText');
const intervalSelect = document.getElementById('intervalSelect');
const startButton = document.getElementById('startButton');

instructionText.value = "What do you see?"; // default instruction

let stream;
let intervalId;
let isProcessing = false;
let isWaitingForResponse = false;

const RECOMMENDED_MODEL = 'ai/smolvlm:500M-Q8_0'; // Default model
```

--------------------------------

### Remove Distro Docker Version (Ubuntu/Debian)

Source: https://github.com/docker/model-runner/blob/main/README.md

Removes the Docker, containerd, and runc packages that might have been installed from a Linux distribution's repositories.

```bash
# Remove distro version (Ubuntu/Debian example)
sudo apt-get purge docker docker.io containerd runc
```

--------------------------------

### Inspect Model Configuration

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

View the current configuration settings for a specific model. This is useful for verifying settings after configuration.
```bash
# View configuration
docker model inspect ai/smollm2
```

--------------------------------

### Pull a General-Purpose AI Model

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/README.md

Use this command to download a suitable AI model for text extraction from Docker Hub. Ensure you have Docker installed and configured.

```bash
docker model pull ai/gemma3
```

--------------------------------

### Package and Push Model

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Package a GGUF model and push it to a target registry. Options for license and context size are available.

```bash
./model-cli package --gguf --push
```

--------------------------------

### Send Request to Gateway

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

Example of sending a chat completion request to the gateway using an OpenAI-compatible client. The `model` field should match a `model_name` defined in your configuration.

```console
$ curl http://localhost:4000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "smollm2",
      "messages": [{"role": "user", "content": "Hello"}]
    }'
```

--------------------------------

### List Available Models

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_search.md

Lists all available models from Docker Hub when no search term is provided.

```bash
docker model search
```

--------------------------------

### Build nv-gpu-info Executable

Source: https://github.com/docker/model-runner/blob/main/llamacpp/native/src/nv-gpu-info/CMakeLists.txt

Adds an executable target named 'com.docker.nv-gpu-info' and links it with the previously defined nvapi library. This compiles the native GPU information utility.
```cmake
set(TARGET com.docker.nv-gpu-info)
add_executable(${TARGET} nv-gpu-info.c)
install(TARGETS ${TARGET} RUNTIME)
target_link_libraries(${TARGET} nvapi)
```

--------------------------------

### Build Docker Model Runner from Source

Source: https://github.com/docker/model-runner/blob/main/README.md

Builds the complete Docker Model Runner stack, including the server, CLI plugin, and a `dmr` convenience wrapper, using the provided Makefile.

```bash
make
```

--------------------------------

### Check Docker Model Runner Pod Logs

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/README.md

Stream the logs of the Docker Model Runner deployment to troubleshoot startup issues.

```bash
kubectl logs -f deployment/docker-model-runner
```

--------------------------------

### Build Docker Image for Multi-Architecture Support with vLLM

Source: https://github.com/docker/model-runner/blob/main/README.md

Builds the Docker image with vLLM support for multiple architectures (amd64 and arm64), automatically selecting appropriate prebuilt wheels.

```sh
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --target final-vllm \
  --build-arg BASE_IMAGE=nvidia/cuda:13.0.2-runtime-ubuntu24.04 \
  --build-arg LLAMA_SERVER_VARIANT=cuda \
  -t docker/model-runner:vllm .
```

--------------------------------

### Run model for single prompt with Docker

Source: https://github.com/docker/model-runner/blob/main/README.md

Execute a model with a single prompt using the `docker model run` command. Ensure the model image is correctly specified.

```bash
docker model run nvcr.io/nim/google/gemma-3-1b-it:latest "Explain quantum computing"
```

--------------------------------

### Run a model with a one-time prompt

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_run.md

Use this command to send a single prompt to a model and receive a response. The model is loaded, the prompt is processed, and the output is displayed.
```console
docker model run ai/smollm2 "Hi"
```

--------------------------------

### Makefile Commands for Docker Model Runner

Source: https://github.com/docker/model-runner/blob/main/charts/docker-model-runner/CONTRIBUTING.md

Common commands for managing the Docker Model Runner chart using the Makefile. These include rendering YAML, installing, upgrading, and uninstalling the chart.

```bash
# Render to plain Kubernetes YAML
make render
```

```bash
# Install the chart
make install
```

```bash
# Upgrade the chart
make upgrade
```

```bash
# Uninstall the chart
make uninstall
```

--------------------------------

### Configure Gateway for Multiple Providers with Fallback

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_gateway.md

This configuration sets up multiple LLM providers (Groq, OpenAI, Docker Model Runner) and defines fallback strategies. If the primary provider fails, requests will be routed to the specified fallbacks.

```yaml
model_list:
  - model_name: fast
    params:
      model: groq/llama-3.1-8b-instant
      api_key: os.environ/GROQ_API_KEY

  - model_name: smart
    params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  - model_name: local
    params:
      model: docker_model_runner/ai/smollm2
      api_base: http://localhost:12434/engines/llama.cpp/v1

general_settings:
  num_retries: 2
  fallbacks:
    - fast: [local]
    - smart: [fast, local]
```

--------------------------------

### Inspect Docker Model Details

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Retrieve detailed information and metadata about a specific downloaded model.

```bash
docker model inspect
```

--------------------------------

### Rebuild Search Index

Source: https://github.com/docker/model-runner/blob/main/demos/embeddings/index.html

Initiates the process of rebuilding the search index.
It asks the user for confirmation, because the operation can take a long time, and updates the UI to show that indexing has started.

```javascript
async function rebuildIndex() {
  // Ask before starting, since a rebuild can take several minutes
  if (!confirm('Rebuilding the index may take several minutes. Continue?')) {
    return;
  }

  // Disable the button and relabel it while indexing runs
  const rebuildBtn = document.getElementById('rebuildBtn');
  rebuildBtn.disabled = true;
  rebuildBtn.textContent = 'Indexing...';

  showInfo('Indexing started. This may take several minutes.');
  // The rest of this function is cut off in the source excerpt.
}
```

--------------------------------

### Run Docker Model with Optional Prompt

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Execute a specified model, optionally providing a prompt for immediate inference.

```bash
docker model run [prompt]
```

--------------------------------

### Run Docker Models: Interactive, Single Prompt, Detached, Debug

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Demonstrates various ways to run AI models using Docker Model Runner, including interactive chat, single-line prompts, pre-loading for faster requests, and enabling debug logging.

```bash
# Interactive chat mode
docker model run ai/smollm2

# Single prompt
docker model run ai/smollm2 "Explain Docker in one sentence"

# Pre-load model for faster subsequent requests
docker model run --detach ai/smollm2

# With debug logging
docker model run --debug ai/smollm2 "Hello"
```

--------------------------------

### JavaScript Event Listeners for Model Runner

Source: https://github.com/docker/model-runner/blob/main/demos/multimodal/demo.html

Handles starting and stopping model processing based on button clicks. Initializes camera and fetches models on page load. Also includes cleanup for the camera stream and intervals on page unload.
```javascript
function handleStart() {
  if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
    alert("getUserMedia not supported on your browser!");
    return;
  }

  // Check if permission was already granted
  if (!hasPermission) { // Assuming hasPermission is a boolean flag set elsewhere
    alert("Please grant permission first.");
    return;
  }

  isProcessing = true;
  startButton.textContent = "Stop";
  startButton.classList.remove('start');
  startButton.classList.add('stop');

  instructionText.disabled = true;
  intervalSelect.disabled = true;

  responseText.value = "Processing started...";

  const intervalMs = parseInt(intervalSelect.value, 10);

  // Initial immediate call
  sendData();

  // Then set interval
  intervalId = setInterval(sendData, intervalMs);
}

function handleStop() {
  isProcessing = false;
  if (intervalId) {
    clearInterval(intervalId);
    intervalId = null;
  }
  startButton.textContent = "Start";
  startButton.classList.remove('stop');
  startButton.classList.add('start');

  instructionText.disabled = false;
  intervalSelect.disabled = false;
  if (responseText.value.startsWith("Processing started...")) {
    responseText.value = "Processing stopped.";
  }
}

startButton.addEventListener('click', () => {
  if (isProcessing) {
    handleStop();
  } else {
    handleStart();
  }
});

// Initialize camera and fetch models when the page loads
window.addEventListener('DOMContentLoaded', () => {
  initCamera();
  fetchModels();
});

// Optional: Stop stream when page is closed/navigated away to release camera
window.addEventListener('beforeunload', () => {
  if (stream) {
    stream.getTracks().forEach(track => track.stop());
  }
  if (intervalId) {
    clearInterval(intervalId);
  }
});
```

--------------------------------

### Run Unit Tests

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Execute unit tests for the model-cli project.
```bash
make unit-tests
```

--------------------------------

### Direct Docker Build with Resolved Upstream Image

Source: https://github.com/docker/model-runner/blob/main/README.md

Demonstrates how to use `docker buildx build` directly, passing a fully resolved upstream image for llama.cpp. This is an alternative to using the `make docker-build` target for advanced customization.

```sh
docker buildx build \
  --target final-llamacpp \
  --build-arg LLAMA_UPSTREAM_IMAGE=ghcr.io/ggml-org/llama.cpp:server-vulkan-b8840 \
  -t docker/model-runner:llama-b8840 .
```

--------------------------------

### List Available Models via Gateway

Source: https://github.com/docker/model-runner/blob/main/demos/gateway/README.md

Retrieve a list of available models exposed by the gateway. Requires a valid Authorization header.

```bash
GW="http://localhost:4000"
KEY="demo-secret"

# List models
curl -H "Authorization: Bearer ${KEY}" "${GW}/v1/models"
```

--------------------------------

### Push a model to a registry

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/docs/reference/model_push.md

Use this command to push your model to a specified namespace in a container registry. Ensure you have authenticated with the registry beforehand.

```console
docker model push /
```

--------------------------------

### Build model-runner Docker image

Source: https://github.com/docker/model-runner/blob/main/README.md

Builds the Docker image for the model-runner service.

```bash
cd model-runner
make docker-build
```

--------------------------------

### Go OpenAI Client Chat Completion

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

This Go code snippet shows how to create a chat completion request using the `go-openai` library. Configure the client's `BaseURL` to connect to the Docker Model Runner.
```go
package main

import (
	"context"
	"fmt"

	"github.com/sashabaranov/go-openai"
)

func main() {
	config := openai.DefaultConfig("not-needed")
	config.BaseURL = "http://localhost:12434/engines/llama.cpp/v1"
	client := openai.NewClientWithConfig(config)

	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: "ai/smollm2",
			Messages: []openai.ChatCompletionMessage{
				{Role: "user", Content: "Hello!"},
			},
		},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```

--------------------------------

### JavaScript Event Listeners for Initialization

Source: https://github.com/docker/model-runner/blob/main/demos/extractor/demo.html

Sets up event listeners for DOMContentLoaded to initialize the application and for changes in the base URL to refresh available models.

```javascript
// Initialize on page load
window.addEventListener('DOMContentLoaded', () => {
  loadInvoiceSchema();
  fetchModels();
});

// Refresh models when base URL changes
document.getElementById('baseUrl').addEventListener('change', fetchModels);
```

--------------------------------

### Search for Docker Models

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Find models available on Docker Hub or HuggingFace using a search query.

```bash
docker model search
```

--------------------------------

### Initialize llama.cpp Submodule

Source: https://github.com/docker/model-runner/blob/main/llamacpp/native/README.md

Ensure the llama.cpp git submodule is initialized and updated. This command must be run from the project root directory.

```bash
git submodule update --init --recursive
```

--------------------------------

### Pull a Docker Model

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

Download a specific AI model to your local machine before running it.
```bash
docker model pull
```

--------------------------------

### Configure Model Parameters

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/references/docker-model-guide.md

Configure specific parameters for a model, such as context size. Use this to fine-tune model behavior for your tasks.

```bash
# Configure model parameters
docker model configure ai/smollm2 --ctx-size 4096
```

--------------------------------

### Run Model with Prompt

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/README.md

Execute a model with a specific prompt using the Docker Model CLI.

```bash
./model-cli run llama.cpp "What is the capital of France?"
```

--------------------------------

### Show Running Docker Models

Source: https://github.com/docker/model-runner/blob/main/cmd/cli/commands/skills/docker-model-runner/SKILL.md

View a list of models that are currently active and running.

```bash
docker model ps
```

--------------------------------

### Run Docker Container with Default Settings

Source: https://github.com/docker/model-runner/blob/main/README.md

Executes the application within a Docker container using default configurations for port and model storage. The `models` directory will be created in the current working directory and mounted into the container.

```sh
make docker-run
```
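
--------------------------------

### Send a Gateway Chat Request from Go (Sketch)

The curl call under "Send Request to Gateway" can also be issued from Go using only the standard library. The following is a minimal sketch, not the project's own client. It assumes the gateway listens on `http://localhost:4000` and that a model named `smollm2` is configured, as in the earlier examples. The `GATEWAY_KEY` environment variable and the helper names `buildChatRequest` and `sendChat` are hypothetical and introduced here only for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// message and chatRequest mirror the JSON body used in the curl example.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

// buildChatRequest (hypothetical helper) encodes a single-turn user prompt.
func buildChatRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
}

// sendChat (hypothetical helper) posts the body to the chat completions
// endpoint and returns the raw JSON response as a string.
func sendChat(baseURL, apiKey string, body []byte) (string, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" {
		// Only needed when the gateway was started with an API key.
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}

	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("gateway returned %s: %s", resp.Status, raw)
	}
	return string(raw), nil
}

func main() {
	body, err := buildChatRequest("smollm2", "Hello")
	if err != nil {
		fmt.Fprintln(os.Stderr, "encoding failed:", err)
		return
	}

	// GATEWAY_KEY is an assumed variable name; leave it unset for an open gateway.
	out, err := sendChat("http://localhost:4000", os.Getenv("GATEWAY_KEY"), body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	fmt.Println(out)
}
```

Run it with `go run main.go` while the gateway is up. If nothing is listening on port 4000, the program reports the connection error instead of a completion.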