### Setup Virtual Environment and Install Requirements

Source: https://github.com/openvinotoolkit/model_server/blob/main/tests/performance/README.md

Enable a virtual environment and install the necessary Python packages for performance testing.

```bash
virtualenv .venv
. .venv/bin/activate
pip3 install -r ../requirements.txt
```

--------------------------------

### Example: Serve Phi-3-mini Model on Baremetal

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Example command to serve the 'Phi-3-mini-FastDraft-50M-int8-ov' model on a baremetal host. Requires OpenVINO Model Server installation and specifies GPU as the target device for text generation.

```bat
ovms --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path models/ --model_name Phi-3-mini-FastDraft-50M-int8-ov --target_device GPU --task text_generation --port 8000 --rest_port 9000
```

--------------------------------

### Get Server Metadata Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

This example demonstrates how to run the client to retrieve server metadata. Specify the gRPC port and address to connect to the server.

```Bash
python ./grpc_server_metadata.py --grpc_port 9000 --grpc_address localhost
```

--------------------------------

### Start Model Server on Baremetal Host

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Execute this command to start the OpenVINO Model Server directly on a baremetal host. Ensure the OpenVINO Model Server package is installed.

```text
ovms --model_path <path_to_model> --model_name <model_name> --port 9000 --rest_port 8000 --log_level DEBUG
```

--------------------------------

### Device Plugin Configuration Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/parameters.md

Specifies parameters for device plugins to optimize inference. For a full list of parameters, refer to the OpenVINO documentation and performance tuning guide. Example shown sets latency optimization.

```json
{"PERFORMANCE_HINT": "LATENCY"}
```

--------------------------------

### Download ResNet Model Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

This command downloads a ResNet model and its weights for demonstration purposes. Ensure you have `wget` installed.

```bash
mkdir -p models/resnet/1
wget -P models/resnet/1 https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin
wget -P models/resnet/1 https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml
```

--------------------------------

### Install Necessary Packages

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/cpp/kserve-api/README.md

Installs required packages for building and running the client library and samples.

```bash
apt-get update && apt-get install cmake build-essential libssl-dev zlib1g-dev git rapidjson-dev python3
```

--------------------------------

### gRPC Server Readiness Check Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/cpp/kserve-api/README.md

Example of how to run the grpc_server_ready client to verify if the server is ready.

```bash
./grpc_server_ready --grpc_port 9000 --grpc_address localhost
Server Ready: True
```

--------------------------------

### Run Async Inference Client Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

This command demonstrates how to run the asynchronous inference client with specific parameters, including image paths, input/output names, and transpose settings. It shows sample output including image data range, processing start, and classification results.

```Bash
python http_async_infer_resnet.py --http_port 8000 --images_numpy_path ../../imgs_nhwc.npy --labels_numpy_path ../../lbs.npy --input_name 0 --output_name 1463 --transpose_input False --model_name resnet
```

```text
Image data range: 0.0 : 255.0
Start processing:
        Model name: resnet
        Iterations: 10
        Images numpy path: ../../imgs_nhwc.npy
        Numpy file shape: (10, 3, 224, 224)

imagenet top results in a single batch:
         0 airliner 404 ; Correct match.
imagenet top results in a single batch:
         0 Arctic fox, white fox, Alopex lagopus 279 ; Correct match.
imagenet top results in a single batch:
         0 bee 309 ; Correct match.
imagenet top results in a single batch:
         0 golden retriever 207 ; Correct match.
imagenet top results in a single batch:
         0 gorilla, Gorilla gorilla 366 ; Correct match.
imagenet top results in a single batch:
         0 magnetic compass 635 ; Correct match.
imagenet top results in a single batch:
         0 peacock 84 ; Correct match.
imagenet top results in a single batch:
         0 pelican 144 ; Correct match.
imagenet top results in a single batch:
         0 snail 113 ; Correct match.
imagenet top results in a single batch:
         0 zebra 340 ; Correct match.
Classification accuracy: 100.00
```

--------------------------------

### Install vLLM and Download Dataset

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/scaling/README.md

Installs the vLLM repository and downloads the necessary dataset for benchmarking. Ensure you are using the correct branch and have PyTorch with CPU support installed.

```bash
git clone --branch v0.7.3 --depth 1 https://github.com/vllm-project/vllm
cd vllm
pip3 install -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
cd benchmarks
curl -L https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json -o ShareGPT_V3_unfiltered_cleaned_split.json
```

--------------------------------

### Start GenAI Model from Hugging Face (Baremetal)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Use this command to serve a Hugging Face model with OVMS on a baremetal host. Ensure the OpenVINO Model Server package is installed.

```text
ovms --source_model <model_name_in_HF> --model_repository_path /models --model_name <ovms_servable_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_OPTIONS]
```

--------------------------------

### Get Model Readiness Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

Run this client to check if a specific model is ready for inference. Provide the gRPC connection details and the model name.

```Bash
python ./grpc_model_ready.py --grpc_port 9000 --grpc_address localhost --model_name resnet
```

--------------------------------

### Start OpenVINO Model Server with GPU

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_cache.md

Use this command to start the model server with GPU support. Ensure you have a 'model' directory with your models and a 'cache' directory. The first start populates the cache, making subsequent starts faster.

```bash
$ mkdir cache
$ chmod -R 755 model
$ docker run -p 9000:9000 -d -u $(id -u):$(id -g) --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v ${PWD}/model/fdsample:/model:ro -v ${PWD}/cache:/opt/cache:rw openvino/model_server:latest-gpu --model_name model --model_path /model --target_device GPU --port 9000
```

--------------------------------

### Start Model Server with GPU Acceleration (Binary)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/accelerators.md

Launch the OpenVINO Model Server using the binary package with GPU acceleration enabled. Ensure necessary drivers and packages are installed.

```bat
ovms --model_path model --model_name resnet --port 9000 --target_device GPU
```

--------------------------------

### Install Dependencies and Download Models

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/mediapipe/object_detection/README.md

Install required Python packages and download the necessary models for the demo.

```console
pip install -r requirements.txt

python mediapipe_object_detection.py --download_models
```

--------------------------------

### gRPC Server Liveness Check Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/cpp/kserve-api/README.md

Example of how to run the grpc_server_live client to verify if the server is alive.

```bash
./grpc_server_live --grpc_port 9000 --grpc_address localhost
Server Live: True
```

--------------------------------

### Install Triton Client Package

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Install the Triton client library with all dependencies using pip.

```bash
pip3 install tritonclient[all]
```

--------------------------------

### Install Requests and Download Image

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/vlm_npu/README.md

Installs the necessary Python requests library and downloads a sample image for testing.

```bash
pip3 install requests
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/static/images/zebra.jpeg -o zebra.jpeg
```

--------------------------------

### Install virtualenv Package

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/developer_guide.md

Install the 'virtualenv' package using pip3. This is a prerequisite for running functional tests in the guide.

```bash
pip3 install virtualenv
```

--------------------------------

### Prepare Virtual Environment and Install Dependencies

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

Navigate to the client samples directory, create a virtual environment, activate it, and install required Python packages.

```bash
cd model_server/client/python/kserve-api/samples
virtualenv .venv
. .venv/bin/activate
pip install -r requirements.txt
```

--------------------------------

### Download and Install Dependencies

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/integration_with_OpenWebUI/README.md

Download the export script and install necessary Python packages. This is a prerequisite for model preparation.

```bash
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
```

--------------------------------

### Install Client Requirements

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/python_demos/clip_image_classification/README.md

Clone the repository, navigate to the demo directory, set up a virtual environment, and install necessary Python packages.

```bash
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server/demos/python_demos/clip_image_classification/
virtualenv .venv
. .venv/bin/activate
pip3 install -r requirements.txt
```

--------------------------------

### Start Mediamtx RTSP Server (Standalone)

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/real_time_stream_analysis/python/README.md

Starts the Mediamtx RTSP server after it has been installed as a standalone binary or via winget.

```bat
mediamtx 
```

--------------------------------

### gRPC Server Metadata Check Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/cpp/kserve-api/README.md

Example of how to run the grpc_server_metadata client to retrieve server information.

```bash
./grpc_server_metadata --grpc_port 9000 --grpc_address localhost
Name: OpenVINO Model Server
Version: 2022.2.c290da85
```

--------------------------------

### Get Model Metadata Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md

Example of using the `rest_get_model_metadata.py` script to retrieve metadata for a model. The output includes model specifications and signature definitions.

```bash
python rest_get_model_metadata.py --rest_port 8000
```

```json
{
 "modelSpec": {
  "name": "resnet",
  "signatureName": "",
  "version": "1"
 },
 "metadata": {
  "signature_def": {
   "@type": "type.googleapis.com/tensorflow.serving.SignatureDefMap",
   "signatureDef": {
    "serving_default": {
     "inputs": {
      "0": {
       "dtype": "DT_FLOAT",
       "tensorShape": {
        "dim": [
         {
          "size": "1",
          "name": ""
         },
         {
          "size": "3",
          "name": ""
         },
         {
          "size": "224",
          "name": ""
         },
         {
          "size": "224",
          "name": ""
         }
        ],
        "unknownRank": false
       },
       "name": "0"
      }
     },
     "outputs": {
      "1463": {
       "dtype": "DT_FLOAT",
       "tensorShape": {
        "dim": [
         {
          "size": "1",
          "name": ""
         },
         {
          "size": "1000",
          "name": ""
         }
        ],
        "unknownRank": false
       },
       "name": "1463"
      }
     },
     "methodName": "",
     "defaults": {}
    }
   }
  }
 }
}
```

--------------------------------

### Install Dependencies and Prepare Audio Samples

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/audio/README.md

Installs necessary Python packages, creates a directory for audio samples, and downloads a sample audio file. This is a prerequisite for preparing speaker embeddings.

```console
pip install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/audio/requirements.txt
mkdir audio_samples
curl --create-dirs "https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0032_8k.wav" -o audio_samples/audio.wav
curl --create-dirs https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/audio/create_speaker_embedding.py -o models/speakers/create_speaker_embedding.py
python models/speakers/create_speaker_embedding.py audio_samples/audio.wav models/speakers/voice1.bin
```

--------------------------------

### Get Model Status Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md

Example of how to use the `rest_get_model_status.py` script to retrieve the status of a specific model version. The output shows the version, state, and status of the model.

```bash
python rest_get_model_status.py --rest_port 8000 --model_version 1
```

```json
{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": "OK"
   }
  }
 ]
}
```

--------------------------------

### Build Client Library and Samples

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/go/kserve-api/README.md

Build the Go client library and associated samples for the KServe API. This step is necessary before running the client-side examples.

```Bash
cd client/go/kserve-api
bash build.sh
cd build
```

--------------------------------

### Get Model Status API Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_rest_api_tfs.md

Use this command to retrieve the status of a specific model version served by OpenVINO Model Server. The version can be omitted to get the status of all available versions.

```bash
$ curl http://localhost:8001/v1/models/person-detection/versions/1
```

--------------------------------

### Start Model Server with Configuration File

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/metrics.md

This option uses a JSON configuration file to define model settings and enable monitoring. Ensure the configuration file is correctly formatted.

```bash
mkdir workspace
wget -N https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P workspace/models/resnet50/1
echo '{
 "model_config_list": [
     {
        "config": {
             "name": "resnet",
             "base_path": "/workspace/models/resnet50"
        }
     }
 ],
 "monitoring":
     {
         "metrics":
         {
             "enable" : true
         }
     }
}' >> workspace/config.json
```

```bash
docker run -d -u $(id -u) -v ${PWD}/workspace:/workspace -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
       --config_path /workspace/config.json \
       --port 9000 --rest_port 8000
```

--------------------------------

### Get Model Metadata Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

This example shows how to fetch metadata for a specific model using the gRPC client. Ensure you provide the correct gRPC address, port, and model name.

```Bash
python ./grpc_model_metadata.py --grpc_port 9000 --grpc_address localhost --model_name resnet
```

--------------------------------

### Prepare Python Client for String Input

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/universal-sentence-encoder/README.md

Sets up the Python environment by installing dependencies and cloning the model server repository. This is a prerequisite for running the string input client script.

```bash
git clone https://github.com/openvinotoolkit/model_server
```

```bash
pip install --upgrade pip
```

```bash
pip install -r model_server/demos/universal-sentence-encoder/requirements.txt
```

--------------------------------

### Start Model Server Container

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/dynamic_shape_custom_node.md

This command starts the OpenVINO Model Server Docker container, mounting the local models directory and exposing the gRPC port. Ensure Docker is installed and the image is pulled.

```bash
docker run --rm -d -v ${PWD}:/models -p 9000:9000 openvino/model_server:latest --config_path /models/config.json --port 9000
```

--------------------------------

### Example: Serve Phi-3-mini Model with Docker

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Example command to serve the 'Phi-3-mini-FastDraft-50M-int8-ov' model using Docker. Requires Docker Engine and specifies CPU as the target device for text generation.

```text
docker run --user $(id -u):$(id -g) -p 9000:9000 -p 8000:8000 --rm -v <model_repository_path>:/models openvino/model_server:latest \
--port 8000 --rest_port 9000 --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models/ --model_name Phi-3-mini-FastDraft-50M-int8-ov --target_device CPU --task text_generation
```

--------------------------------

### Example: Serve All Versions

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_version_policy.md

This example configures the server to serve all available versions of a model. This can be useful for load balancing across all deployed versions.

```json
{"all": {}}
```

--------------------------------

### Download BERT Model and Start OVMS

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/bert_question_answering/python/README.md

Downloads the BERT model files and starts the OpenVINO Model Server with the model configured for dynamic shapes. Ensure the model path and name match your setup.

```bash
curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/bert-small-uncased-whole-word-masking-squad-int8-0002/FP32-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.bin https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/bert-small-uncased-whole-word-masking-squad-int8-0002/FP32-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.xml -o model/1/bert-small-uncased-whole-word-masking-squad-int8-0002.bin -o model/1/bert-small-uncased-whole-word-masking-squad-int8-0002.xml
chmod -R 755 model
docker run -d -v $(pwd)/model:/models -p 9000:9000 openvino/model_server:latest  --model_path /models --model_name bert --port 9000 --shape '{"attention_mask": "(1,-1)", "input_ids": "(1,-1)", "position_ids": "(1,-1)", "token_type_ids": "(1,-1)"}'
```

--------------------------------

### Install Client Dependencies and Run Demo

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/using_onnx_model/python/README.md

Install the required Python packages for the client and run the ONNX model demo script, specifying the service URL. This client sends raw image data for inference.

```bash
pip3 install -r requirements.txt
python onnx_model_demo.py --service_url localhost:9001
```

--------------------------------

### Start Model Server with Docker

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Use this command to start the OpenVINO Model Server using Docker. Ensure Docker Engine is installed and models are prepared. Ports 9000 (gRPC) and 8000 (REST) are exposed.

```text
docker run -d --rm -v <models_repository>:/models -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
--model_path <path_to_model> --model_name <model_name> --port 9000 --rest_port 8000 --log_level DEBUG
```

--------------------------------

### Download Example Client Components

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/ovms_quickstart.md

Downloads Python scripts and requirements for an example client application to interact with the OpenVINO Model Server, along with a COCO class names file.

```bash
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/object_detection/python/object_detection.py
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/object_detection/python/requirements.txt
wget https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/master/data/dataset_classes/coco_91cl.txt
```

--------------------------------

### Run Qwen3-30B-A3B-Instruct Model Server on GPU

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/agentic_ai/README.md

Starts the OpenVINO Model Server with the Qwen3-30B-A3B-Instruct model, enabling tool-guided generation on GPU. This command is for a larger, more capable model.

```bash
mkdir -p ${HOME}/models
docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v ${HOME}/models:/models --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \
--rest_port 8000 --source_model OpenVINO/Qwen3-30B-A3B-Instruct-2507-int4-ov --model_repository_path /models --tool_parser hermes3 --target_device GPU --task text_generation --enable_tool_guided_generation true
```

--------------------------------

### Setup for Agentic Model Function Call Tests

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/accuracy/README.md

Initial setup steps for running agentic model tests using the Berkeley function call leaderboard. This includes cloning the repository, applying a patch, and installing dependencies.

```text
git clone https://github.com/ShishirPatil/gorilla
cd gorilla/berkeley-function-call-leaderboard
git checkout 9b8a5202544f49a846aced185a340361231ef3e1
curl -s https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/continuous_batching/accuracy/gorilla.patch | git apply -v
pip install -e . --extra-index-url "https://download.pytorch.org/whl/cpu"
```

--------------------------------

### gRPC Client Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md

This example demonstrates how to run the gRPC client with specific parameters, including image and label paths, gRPC address, input/output names, and transpose settings. It also shows the expected output format, including processing times and classification accuracy.

```bash
python grpc_predict_resnet.py --grpc_port 9000 --images_numpy_path ../../imgs.npy --input_name 0 --output_name 1463 --transpose_input False --labels_numpy_path ../../lbs.npy
```

--------------------------------

### Start OpenVINO Model Server on Bare Metal (Linux/Windows)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/ovms_quickstart.md

Starts the OpenVINO Model Server using the binary package. Ensure environment variables like LD_LIBRARY_PATH and PATH are set correctly on Linux, or run setupvars.bat on Windows.

```bat
ovms --model_name faster_rcnn --model_path model --port 9000
```

--------------------------------

### Deploy OVMS Binary with GPU Target on Windows

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/README.md

This command deploys the OpenVINO Model Server using a binary installation on Windows 11, specifically targeting a GPU for inference. Ensure OVMS is installed according to the baremetal deployment guide.

```bat
ovms.exe --model_repository_path c:\models --source_model OpenVINO/Qwen3-30B-A3B-Instruct-2507-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --rest_port 8000 --model_name Qwen3-30B-A3B-Instruct-2507-int4-ov
```

--------------------------------

### Stop Sleep Process in Docker Container

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/developer_guide.md

Install psmisc and use killall to stop the sleep process, allowing tests to start.

```bash
yum install psmisc; killall sleep
```

--------------------------------

### Run and Modify Example Applications

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/c_api_minimal_app/README.md

Run the pre-built demo applications within the Docker container or modify their source code. Access the source files and rebuild as needed.

```bash
docker run -it openvino/model_server-capi:latest
cat main_capi.c
cat main_capi.cpp
```

--------------------------------

### Get Server Metadata (Java gRPC)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Retrieve server metadata using Java with gRPC. This example uses Netty for channel management.

```java
public static void main(String[] args) {
    ManagedChannel channel = ManagedChannelBuilder
                    .forAddress("localhost", 9000)
                    .usePlaintext().build();
    GRPCInferenceServiceBlockingStub grpc_stub = GRPCInferenceServiceGrpc.newBlockingStub(channel);

    ServerMetadataRequest.Builder request = ServerMetadataRequest.newBuilder();
    ServerMetadataResponse response = grpc_stub.serverMetadata(request.build());
    
    channel.shutdownNow();
}
```

--------------------------------

### Get Model Metadata API Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_rest_api_tfs.md

This command fetches the metadata for a specific model version. If the version is not specified, metadata for the latest version will be returned.

```bash
$ curl http://localhost:8001/v1/models/person-detection/versions/1/metadata
```

--------------------------------

### Get Model Metadata

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_rest_api_kfs.md

This example demonstrates how to use curl to request metadata for a specific model. Ensure the REST_URL and REST_PORT are correctly set in your environment.

```bash
$ curl http://localhost:8000/v2/models/resnet
{"name":"resnet","versions":["1"],"platform":"OpenVINO","inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]}
```

--------------------------------

### Example: Pull Qwen3-4B model on Baremetal Host

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/pull_optimum_cli.md

Example command to pull the Qwen/Qwen3-4B model on a baremetal host, preparing it for text generation with int8 weight format.

```bat
ovms --pull --source_model "Qwen/Qwen3-4B" --model_repository_path /models --model_name Qwen3-4B --task text_generation --weight-format int8
```

--------------------------------

### Start OpenVINO Model Server (Bare Metal Windows)

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/image_generation/README.md

This command starts the OpenVINO Model Server on Windows for image generation tasks. It requires the `setupvars` script to be run first. Ensure the model repository path and task are correctly specified.

```bat
mkdir models

ovms --rest_port 8000 ^
  --model_repository_path ./models/ ^
  --task image_generation ^
  --source_model OpenVINO/stable-diffusion-v1-5-int8-ov
```

--------------------------------

### Deploy Whisper Model on Bare Metal

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/audio/README.md

Starts the OpenVINO Model Server directly on the host machine. Ensure GPU drivers are installed if targeting GPU.

```bat
ovms --rest_port 8000 --model_path /models/openai/whisper-large-v3-turbo --model_name openai/whisper-large-v3-turbo
```

--------------------------------

### Get Server Metadata (Python REST)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Use this snippet to retrieve server metadata via REST in Python. Ensure the Triton client library is installed.

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:9000")
server_metadata = client.get_server_metadata()
```

--------------------------------

### Download Model and Prepare Directory

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/benchmark/python/README.md

Downloads a sample model (resnet50-binary-0001) and sets up the required directory structure for OVMS deployment.

```bash
mkdir workspace workspace/resnet50-binary-0001 workspace/resnet50-binary-0001/1
cd workspace/resnet50-binary-0001/1
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin
cd ../../..
```

--------------------------------

### Get Server Metadata (Python gRPC)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Use this snippet to retrieve server metadata via gRPC in Python. Ensure the Triton client library is installed.

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:9000")
server_metadata = client.get_server_metadata()
```

--------------------------------

### Complete Graph Configuration with Python Nodes

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/python_support/reference.md

Example of a graph configuration with three sequential Python nodes. This setup defines graph inputs/outputs and node connections.

```protobuf
input_stream: "OVMS_PY_TENSOR:first_number"
output_stream: "OVMS_PY_TENSOR:last_number"

node {
  name: "first_python_node"
  calculator: "PythonExecutorCalculator"
  input_side_packet: "PYTHON_NODE_RESOURCES:py"
  input_stream: "INPUT:first_number"
  output_stream: "OUTPUT:second_number"
  node_options: {
    [type.googleapis.com / mediapipe.PythonExecutorCalculatorOptions]: {
      handler_path: "/ovms/workspace/incrementer.py"
    }
  }
}

node {
  name: "second_python_node"
  calculator: "PythonExecutorCalculator"
  input_side_packet: "PYTHON_NODE_RESOURCES:py"
  input_stream: "INPUT:second_number"
  output_stream: "OUTPUT:third_number"
  node_options: {
    [type.googleapis.com / mediapipe.PythonExecutorCalculatorOptions]: {
      handler_path: "/ovms/workspace/incrementer.py"
    }
  }
}

node {
  name: "third_python_node"
  calculator: "PythonExecutorCalculator"
  input_side_packet: "PYTHON_NODE_RESOURCES:py"
  input_stream: "INPUT:third_number"
  output_stream: "OUTPUT:last_number"
  node_options: {
    [type.googleapis.com / mediapipe.PythonExecutorCalculatorOptions]: {
      handler_path: "/ovms/workspace/incrementer.py"
    }
  }
}
```

--------------------------------

### Run OpenVINO Model Server with Docker

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/image_classification_using_tf_model/python/README.md

Start the OpenVINO Model Server using Docker, mounting the local model directory to the container. Ensure Docker is installed and running.

```bash
chmod -R 755 model
docker run -d -v $PWD/model:/models -p 9000:9000 openvino/model_server:latest --model_path /models --model_name resnet --port 9000
```

--------------------------------

### Install Client Dependencies and Run Demo

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/mediapipe/iris_tracking/README.md

Install required Python packages using pip, download a sample image, and then launch the MediaPipe Iris tracking client application. The client connects to the OVMS instance via gRPC.

```console
pip install -r requirements.txt
# download a sample image for analysis
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/people/people2.jpeg
echo people2.jpeg>input_images.txt
# launch the client
python mediapipe_iris_tracking.py --grpc_port 9000 --images_list input_images.txt
```

--------------------------------

### Start Model Server with CLI

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/metrics.md

Use this command to start the model server with a specific model and enable metrics collection.

```bash
wget -N https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1
docker run -d -u $(id -u) -v $(pwd)/models:/models -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
       --model_name resnet --model_path /models/resnet50 --port 9000 \
       --rest_port 8000 \
       --metrics_enable
```

--------------------------------

### gRPC Server Readiness Check (Help)

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

Displays the help message for the grpc_server_ready.py script, detailing arguments for checking server readiness via gRPC.

```bash
python ./grpc_server_ready.py --help
```

--------------------------------

### Download and Prepare Model Export Script

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/agentic_ai/README.md

Download the model export script, install its dependencies, and create a directory for storing models. This is the initial setup for exporting models to OpenVINO format.

```text
# Download export script, install its dependencies and create directory for the models
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
mkdir models
```

--------------------------------

### Start GenAI Model from Hugging Face (Docker)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Use this command to serve a Hugging Face model with OVMS using Docker. Ensure Docker Engine is installed and the model repository path is accessible.

```text
docker run --user $(id -u):$(id -g) -p 9000:9000 -p 8000:8000 --rm -v <model_repository_path>:/models openvino/model_server:latest \
--port 8000 --rest_port 9000 --source_model <model_name_in_HF> --model_repository_path /models --model_name <ovms_servable_name> --target_device <DEVICE> --task <task> [TASK_SPECIFIC_OPTIONS]
```

--------------------------------

### Start OpenVINO Model Server

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_c_api.md

Initialize and start the OpenVINO Model Server using a configuration file. Ensure to call OVMS_ServerDelete to release resources when the server is no longer needed.

```c
OVMS_Server* server;
OVMS_ServerSettings settings;
OVMS_ModelsSettings modelsSettings;
OVMS_ServerNew(&server, &settings, &modelsSettings);
OVMS_ServerStartFromConfigurationFile(server, "path/to/config.json");
// ... schedule inferences ...
OVMS_ServerDelete(server);
```

--------------------------------

### Graph Configuration with Sparse Attention

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md

An example of a graph.pbtxt configuration file demonstrating how to enable and configure sparse attention for the LLMExecutor calculator. This setup is used for optimizing attention mechanisms in large language models.

```protobuf
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: 'LOOPBACK:0',
    back_edge: true
  }
  node_options: {
      [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
          models_path: "./",
          sparse_attention_config: {
            mode: TRISHAPE
            num_last_dense_tokens_in_prefill: 100
            num_retained_start_tokens_in_cache: 128
            num_retained_recent_tokens_in_cache: 1920
            xattention_threshold: 0.8
            xattention_block_size: 64
            xattention_stride: 8
          }
      }
  }
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler",
    options {
      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
        sync_set {
          tag_index: "LOOPBACK:0"
        }
      }
    }
  }
}
```

--------------------------------

### Run Holistic Tracking Client Application

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/mediapipe/holistic_tracking/README.md

Install dependencies, download a sample image, and launch the Python client application to perform holistic tracking. Ensure the gRPC port matches the server configuration.

```bash
pip install -r requirements.txt
# download a sample image for analysis
curl -kL -o girl.jpeg https://cdn.pixabay.com/photo/2019/03/12/20/39/girl-4051811_960_720.jpg
echo girl.jpeg>input_images.txt
# launch the client
python mediapipe_holistic_tracking.py --grpc_port 9000 --images_list input_images.txt
```

--------------------------------

### Deploy Model Server on Bare Metal (Linux/Windows)

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/speculative_decoding/README.md

Starts the OpenVINO Model Server from the command line. Ensure environment variables are set correctly for your OS (LD_LIBRARY_PATH/PATH for Linux, setupvars for Windows). This command assumes models are in the ./models directory.

```bash
ovms --rest_port 8000 --rest_workers 2 --config_path ./models/config.json
```

--------------------------------

### C++ gRPC Request Prediction

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Example of sending a prediction request using the C++ gRPC client. This requires the triton client library for C++ and includes setup for input data and inference options.

```cpp
#include "grpc_client.h"

namespace tc = triton::client;
int main() {
    std::unique_ptr<tc::InferenceServerGrpcClient> client;
    tc::InferenceServerGrpcClient::Create(&client, "localhost:9000");

    std::vector<int64_t> shape{1, 10};
    tc::InferInput* input;
    tc::InferInput::Create(&input, "input_name", shape, "FP32");
    std::shared_ptr<tc::InferInput> input_ptr;
    input_ptr.reset(input);

    std::vector<float> input_data(10);
    for (size_t i = 0; i < 10; ++i) {
        input_data[i] = i;
    }
    std::vector<tc::InferInput*> inputs = {input_ptr.get()};
    tc::InferOptions options("model_name");
    tc::InferResult* result;
    input_ptr->AppendRaw(input_data);
    client->Infer(&result, options, inputs);
    input->Reset();
}
```

--------------------------------

### Deploy OVMS Container with CPU Target on Linux

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/README.md

Use this command to start the OpenVINO Model Server container on a Linux system, targeting the CPU for inference. Ensure you have Docker installed and a directory for models mounted.

```bash
mkdir -p ${HOME}/models
docker run -it -p 8000:8000 --rm --user $(id -u):$(id -g) -v ${HOME}/models:/models/:rw openvino/model_server:weekly --model_repository_path /models --source_model OpenVINO/Qwen3-30B-A3B-Instruct-2507-int4-ov --task text_generation --target_device CPU --tool_parser hermes3 --rest_port 8000 --model_name Qwen3-30B-A3B-Instruct-2507-int4-ov
```

--------------------------------

### Install Dependencies and Run Inference Client

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/ovms_quickstart.md

Install project dependencies using pip and then run the object detection client script. Ensure requirements.txt is present.

```bash
pip install --upgrade pip
pip install -r requirements.txt

python object_detection.py --image coco_bike.jpg --output output.jpg --service_url localhost:9000
```

--------------------------------

### Start OpenVINO Model Server

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md

Download a sample model and start the OpenVINO Model Server using Docker. Ensure ports 8000 (REST) and 9000 (gRPC) are exposed.

```bash
wget -N https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1
docker run -d -u $(id -u) -v $(pwd)/models:/models -p 8000:8000 -p 9000:9000 openvino/model_server:latest --model_name resnet --model_path /models/resnet50 --port 9000 --rest_port 8000
```