### Setup Virtual Environment and Install Requirements

Source: https://github.com/openvinotoolkit/model_server/blob/main/tests/performance/README.md

Enable a virtual environment and install the necessary Python packages for performance testing.

```bash
virtualenv .venv
. .venv/bin/activate
pip3 install -r ../requirements.txt
```

--------------------------------

### Example: Serve Phi-3-mini Model on Baremetal

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Example command to serve the 'Phi-3-mini-FastDraft-50M-int8-ov' model on a baremetal host. Requires OpenVINO Model Server installation and specifies GPU as the target device for text generation.

```bat
ovms --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path models/ --model_name Phi-3-mini-FastDraft-50M-int8-ov --target_device GPU --task text_generation --port 8000 --rest_port 9000
```

--------------------------------

### Get Server Metadata Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

This example demonstrates how to run the client to retrieve server metadata. Specify the gRPC port and address to connect to the server.

```Bash
python ./grpc_server_metadata.py --grpc_port 9000 --grpc_address localhost
```

--------------------------------

### Start Model Server on Baremetal Host

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Execute this command to start the OpenVINO Model Server directly on a baremetal host. Ensure the OpenVINO Model Server package is installed.

```text
ovms --model_path --model_name --port 9000 --rest_port 8000 --log_level DEBUG
```

--------------------------------

### Device Plugin Configuration Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/parameters.md

Specifies parameters for device plugins to optimize inference.
For a full list of parameters, refer to the OpenVINO documentation and performance tuning guide. Example shown sets latency optimization.

```json
{"PERFORMANCE_HINT": "LATENCY"}
```

--------------------------------

### Download ResNet Model Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

This command downloads a ResNet model and its weights for demonstration purposes. Ensure you have `wget` installed.

```bash
mkdir -p models/resnet/1
wget -P models/resnet/1 https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin
wget -P models/resnet/1 https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml
```

--------------------------------

### Install Necessary Packages

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/cpp/kserve-api/README.md

Installs required packages for building and running the client library and samples.

```bash
apt-get update && apt-get install cmake build-essential libssl-dev zlib1g-dev git rapidjson-dev python3
```

--------------------------------

### gRPC Server Readiness Check Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/cpp/kserve-api/README.md

Example of how to run the grpc_server_ready client to verify if the server is ready.

```bash
./grpc_server_ready --grpc_port 9000 --grpc_address localhost
Server Ready: True
```

--------------------------------

### Run Async Inference Client Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

This command demonstrates how to run the asynchronous inference client with specific parameters, including image paths, input/output names, and transpose settings. It shows sample output including image data range, processing start, and classification results.
```Bash
python http_async_infer_resnet.py --http_port 8000 --images_numpy_path ../../imgs_nhwc.npy --labels_numpy_path ../../lbs.npy --input_name 0 --output_name 1463 --transpose_input False --model_name resnet
```

```text
Image data range: 0.0 : 255.0
Start processing:
    Model name: resnet
    Iterations: 10
    Images numpy path: ../../imgs_nhwc.npy
    Numpy file shape: (10, 3, 224, 224)

imagenet top results in a single batch:
    0 airliner 404 ; Correct match.
imagenet top results in a single batch:
    0 Arctic fox, white fox, Alopex lagopus 279 ; Correct match.
imagenet top results in a single batch:
    0 bee 309 ; Correct match.
imagenet top results in a single batch:
    0 golden retriever 207 ; Correct match.
imagenet top results in a single batch:
    0 gorilla, Gorilla gorilla 366 ; Correct match.
imagenet top results in a single batch:
    0 magnetic compass 635 ; Correct match.
imagenet top results in a single batch:
    0 peacock 84 ; Correct match.
imagenet top results in a single batch:
    0 pelican 144 ; Correct match.
imagenet top results in a single batch:
    0 snail 113 ; Correct match.
imagenet top results in a single batch:
    0 zebra 340 ; Correct match.
Classification accuracy: 100.00
```

--------------------------------

### Install vLLM and Download Dataset

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/scaling/README.md

Clones the vLLM repository, installs its CPU requirements, and downloads the dataset needed for benchmarking. Ensure you are using the correct branch and have PyTorch with CPU support installed.
```bash
git clone --branch v0.7.3 --depth 1 https://github.com/vllm-project/vllm
cd vllm
pip3 install -r requirements-cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
cd benchmarks
curl -L https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json -o ShareGPT_V3_unfiltered_cleaned_split.json
```

--------------------------------

### Start GenAI Model from Hugging Face (Baremetal)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Use this command to serve a Hugging Face model with OVMS on a baremetal host. Ensure the OpenVINO Model Server package is installed.

```text
ovms --source_model --model_repository_path /models --model_name --target_device --task [TASK_SPECIFIC_OPTIONS]
```

--------------------------------

### Get Model Readiness Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

Run this client to check if a specific model is ready for inference. Provide the gRPC connection details and the model name.

```Bash
python ./grpc_model_ready.py --grpc_port 9000 --grpc_address localhost --model_name resnet
```

--------------------------------

### Start OpenVINO Model Server with GPU

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_cache.md

Use this command to start the model server with GPU support. Ensure you have a 'model' directory with your models and a 'cache' directory. The first start populates the cache, making subsequent starts faster.
```bash
$ mkdir cache
$ chmod -R 755 model
$ docker run -p 9000:9000 -d -u $(id -u):$(id -g) --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v ${PWD}/model/fdsample:/model:ro -v ${PWD}/cache:/opt/cache:rw openvino/model_server:latest-gpu --model_name model --model_path /model --target_device GPU --port 9000
```

--------------------------------

### Start Model Server with GPU Acceleration (Binary)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/accelerators.md

Launch the OpenVINO Model Server using the binary package with GPU acceleration enabled. Ensure necessary drivers and packages are installed.

```bat
ovms --model_path model --model_name resnet --port 9000 --target_device GPU
```

--------------------------------

### Install Dependencies and Download Models

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/mediapipe/object_detection/README.md

Install required Python packages and download the necessary models for the demo.

```console
pip install -r requirements.txt
python mediapipe_object_detection.py --download_models
```

--------------------------------

### gRPC Server Liveness Check Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/cpp/kserve-api/README.md

Example of how to run the grpc_server_live client to verify if the server is alive.

```bash
./grpc_server_live --grpc_port 9000 --grpc_address localhost
Server Live: True
```

--------------------------------

### Install Triton Client Package

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Install the Triton client library with all dependencies using pip.

```bash
pip3 install tritonclient[all]
```

--------------------------------

### Install Requests and Download Image

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/vlm_npu/README.md

Installs the necessary Python requests library and downloads a sample image for testing.
```bash
pip3 install requests
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/static/images/zebra.jpeg -o zebra.jpeg
```

--------------------------------

### Install virtualenv Package

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/developer_guide.md

Install the 'virtualenv' package using pip3. This is a prerequisite for running functional tests in the guide.

```bash
pip3 install virtualenv
```

--------------------------------

### Prepare Virtual Environment and Install Dependencies

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

Navigate to the client samples directory, create a virtual environment, activate it, and install required Python packages.

```bash
cd model_server/client/python/kserve-api/samples
virtualenv .venv
. .venv/bin/activate
pip install -r requirements.txt
```

--------------------------------

### Download and Install Dependencies

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/integration_with_OpenWebUI/README.md

Download the export script and install necessary Python packages. This is a prerequisite for model preparation.

```bash
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
```

--------------------------------

### Install Client Requirements

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/python_demos/clip_image_classification/README.md

Clone the repository, navigate to the demo directory, set up a virtual environment, and install necessary Python packages.

```bash
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server/demos/python_demos/clip_image_classification/
virtualenv .venv
. .venv/bin/activate
pip3 install -r requirements.txt
```

--------------------------------

### Start Mediamtx RTSP Server (Standalone)

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/real_time_stream_analysis/python/README.md

Starts the Mediamtx RTSP server after it has been installed as a standalone binary or via winget.

```bat
mediamtx
```

--------------------------------

### gRPC Server Metadata Check Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/cpp/kserve-api/README.md

Example of how to run the grpc_server_metadata client to retrieve server information.

```bash
./grpc_server_metadata --grpc_port 9000 --grpc_address localhost
Name: OpenVINO Model Server
Version: 2022.2.c290da85
```

--------------------------------

### Get Model Metadata Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md

Example of using the `rest_get_model_metadata.py` script to retrieve metadata for a model. The output includes model specifications and signature definitions.
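
As a rough illustration of how a client might consume such a response, the sketch below pulls the input and output tensor shapes out of a metadata payload. The helper `tensor_shapes` and the abbreviated `SAMPLE_RESPONSE` are hypothetical and only mirror the structure of the sample output in this section; the command itself follows the sketch.

```python
import json

# Abbreviated payload with the same nesting as the sample output in this section.
SAMPLE_RESPONSE = """
{
  "modelSpec": {"name": "resnet", "signatureName": "", "version": "1"},
  "metadata": {"signature_def": {"signatureDef": {"serving_default": {
    "inputs": {"0": {"dtype": "DT_FLOAT", "name": "0", "tensorShape": {
      "dim": [{"size": "1"}, {"size": "3"}, {"size": "224"}, {"size": "224"}]}}},
    "outputs": {"1463": {"dtype": "DT_FLOAT", "name": "1463", "tensorShape": {
      "dim": [{"size": "1"}, {"size": "1000"}]}}}
  }}}}
}
"""


def tensor_shapes(metadata, signature="serving_default"):
    """Return ({input_name: shape}, {output_name: shape}) for one signature."""
    sig = metadata["metadata"]["signature_def"]["signatureDef"][signature]

    def shapes(section):
        # Sizes arrive as strings in the JSON payload, so convert them to int.
        return {
            name: [int(dim["size"]) for dim in spec["tensorShape"]["dim"]]
            for name, spec in sig[section].items()
        }

    return shapes("inputs"), shapes("outputs")


inputs, outputs = tensor_shapes(json.loads(SAMPLE_RESPONSE))
print(inputs)
print(outputs)
```
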
```bash
python rest_get_model_metadata.py --rest_port 8000
```

```json
{
  "modelSpec": {
    "name": "resnet",
    "signatureName": "",
    "version": "1"
  },
  "metadata": {
    "signature_def": {
      "@type": "type.googleapis.com/tensorflow.serving.SignatureDefMap",
      "signatureDef": {
        "serving_default": {
          "inputs": {
            "0": {
              "dtype": "DT_FLOAT",
              "tensorShape": {
                "dim": [
                  { "size": "1", "name": "" },
                  { "size": "3", "name": "" },
                  { "size": "224", "name": "" },
                  { "size": "224", "name": "" }
                ],
                "unknownRank": false
              },
              "name": "0"
            }
          },
          "outputs": {
            "1463": {
              "dtype": "DT_FLOAT",
              "tensorShape": {
                "dim": [
                  { "size": "1", "name": "" },
                  { "size": "1000", "name": "" }
                ],
                "unknownRank": false
              },
              "name": "1463"
            }
          },
          "methodName": "",
          "defaults": {}
        }
      }
    }
  }
}
```

--------------------------------

### Install Dependencies and Prepare Audio Samples

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/audio/README.md

Installs necessary Python packages, creates a directory for audio samples, and downloads a sample audio file. This is a prerequisite for preparing speaker embeddings.

```console
pip install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/audio/requirements.txt
mkdir audio_samples
curl --create-dirs "https://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0032_8k.wav" -o audio_samples/audio.wav
curl --create-dirs https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/audio/create_speaker_embedding.py -o models/speakers/create_speaker_embedding.py
python models/speakers/create_speaker_embedding.py audio_samples/audio.wav models/speakers/voice1.bin
```

--------------------------------

### Get Model Status Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md

Example of how to use the `rest_get_model_status.py` script to retrieve the status of a specific model version.
The output shows the version, state, and status of the model.

```bash
python rest_get_model_status.py --rest_port 8000 --model_version 1
```

```json
{
  "model_version_status": [
    {
      "version": "1",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": "OK"
      }
    }
  ]
}
```

--------------------------------

### Build Client Library and Samples

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/go/kserve-api/README.md

Build the Go client library and associated samples for the KServe API. This step is necessary before running the client-side examples.

```Bash
cd client/go/kserve-api
bash build.sh
cd build
```

--------------------------------

### Get Model Status API Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_rest_api_tfs.md

Use this command to retrieve the status of a specific model version served by OpenVINO Model Server. The version can be omitted to get the status of all available versions.

```bash
$ curl http://localhost:8001/v1/models/person-detection/versions/1
```

--------------------------------

### Start Model Server with Configuration File

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/metrics.md

This option uses a JSON configuration file to define model settings and enable monitoring. Ensure the configuration file is correctly formatted.
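
One way to check the file before starting the server is sketched below, ahead of the commands themselves. `check_config` is a hypothetical helper, not part of OVMS, and it only inspects the fields used in this section's example (`model_config_list`, `name`, `base_path`, `monitoring.metrics.enable`); the real configuration schema accepts more than this. The check is worth running because the command in this section appends with `>>`, so running it twice leaves two JSON objects in the file, which no longer parses.

```python
import json


def check_config(text):
    """Return a list of problems found in a config.json string (empty if none)."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(cfg, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    models = cfg.get("model_config_list")
    if not isinstance(models, list):
        problems.append("model_config_list must be a list")
        models = []
    for i, entry in enumerate(models):
        conf = entry.get("config", {}) if isinstance(entry, dict) else {}
        for key in ("name", "base_path"):
            if not conf.get(key):
                problems.append(f"model_config_list[{i}].config.{key} is missing")

    # Assumption: this example expects metrics to be switched on.
    metrics = cfg.get("monitoring", {}).get("metrics", {})
    if metrics.get("enable") is not True:
        problems.append("monitoring.metrics.enable is not true")
    return problems


GOOD = json.dumps({
    "model_config_list": [
        {"config": {"name": "resnet", "base_path": "/workspace/models/resnet50"}}
    ],
    "monitoring": {"metrics": {"enable": True}},
})

print(check_config(GOOD))         # no problems expected
print(check_config(GOOD + GOOD))  # two concatenated objects fail to parse
```
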
```bash
mkdir workspace
wget -N https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P workspace/models/resnet50/1
echo '{
 "model_config_list": [
     {
        "config": {
             "name": "resnet",
             "base_path": "/workspace/models/resnet50"
        }
     }
 ],
 "monitoring": {
     "metrics": {
         "enable" : true
     }
 }
}' >> workspace/config.json
```

```bash
docker run -d -u $(id -u) -v ${PWD}/workspace:/workspace -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
--config_path /workspace/config.json \
--port 9000 --rest_port 8000
```

--------------------------------

### Get Model Metadata Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

This example shows how to fetch metadata for a specific model using the gRPC client. Ensure you provide the correct gRPC address, port, and model name.

```Bash
python ./grpc_model_metadata.py --grpc_port 9000 --grpc_address localhost --model_name resnet
```

--------------------------------

### Prepare Python Client for String Input

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/universal-sentence-encoder/README.md

Sets up the Python environment by installing dependencies and cloning the model server repository. This is a prerequisite for running the string input client script.

```bash
git clone https://github.com/openvinotoolkit/model_server
```

```bash
pip install --upgrade pip
```

```bash
pip install -r model_server/demos/universal-sentence-encoder/requirements.txt
```

--------------------------------

### Start Model Server Container

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/dynamic_shape_custom_node.md

This command starts the OpenVINO Model Server Docker container, mounting the local models directory and exposing the gRPC port. Ensure Docker is installed and the image is pulled.
```bash
docker run --rm -d -v ${PWD}:/models -p 9000:9000 openvino/model_server:latest --config_path /models/config.json --port 9000
```

--------------------------------

### Example: Serve Phi-3-mini Model with Docker

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Example command to serve the 'Phi-3-mini-FastDraft-50M-int8-ov' model using Docker. Requires Docker Engine and specifies CPU as the target device for text generation.

```text
docker run --user $(id -u):$(id -g) -p 9000:9000 -p 8000:8000 --rm -v :/models openvino/model_server:latest \
--port 8000 --rest_port 9000 --source_model "OpenVINO/Phi-3-mini-FastDraft-50M-int8-ov" --model_repository_path /models/ --model_name Phi-3-mini-FastDraft-50M-int8-ov --target_device CPU --task text_generation
```

--------------------------------

### Example: Serve All Versions

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_version_policy.md

This example configures the server to serve all available versions of a model. This can be useful for load balancing across all deployed versions.

```json
{"all": {}}
```

--------------------------------

### Download BERT Model and Start OVMS

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/bert_question_answering/python/README.md

Downloads the BERT model files and starts the OpenVINO Model Server with the model configured for dynamic shapes. Ensure the model path and name match your setup.
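
The dynamic-shape setting is passed as a JSON object that maps each input name to a shape string, where `-1` marks a dimension that may vary between requests. As a small sketch of how that argument could be assembled rather than typed by hand, the hypothetical helper `shape_argument` below rebuilds the value used in this section's `docker run` command, which follows.

```python
import json


def shape_argument(input_names, shape="(1,-1)"):
    """Build the JSON value for --shape, giving every input the same shape string."""
    return json.dumps({name: shape for name in sorted(input_names)})


# Input names taken from the BERT command in this section.
bert_inputs = ["input_ids", "attention_mask", "token_type_ids", "position_ids"]
print(shape_argument(bert_inputs))
```

On a shell command line the resulting string needs to be wrapped in single quotes, as the command in this section does.
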
```bash
curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/bert-small-uncased-whole-word-masking-squad-int8-0002/FP32-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.bin https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/bert-small-uncased-whole-word-masking-squad-int8-0002/FP32-INT8/bert-small-uncased-whole-word-masking-squad-int8-0002.xml -o model/1/bert-small-uncased-whole-word-masking-squad-int8-0002.bin -o model/1/bert-small-uncased-whole-word-masking-squad-int8-0002.xml
chmod -R 755 model
docker run -d -v $(pwd)/model:/models -p 9000:9000 openvino/model_server:latest --model_path /models --model_name bert --port 9000 --shape '{"attention_mask": "(1,-1)", "input_ids": "(1,-1)", "position_ids": "(1,-1)", "token_type_ids": "(1,-1)"}'
```

--------------------------------

### Install Client Dependencies and Run Demo

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/using_onnx_model/python/README.md

Install the required Python packages for the client and run the ONNX model demo script, specifying the service URL. This client sends raw image data for inference.

```bash
pip3 install -r requirements.txt
python onnx_model_demo.py --service_url localhost:9001
```

--------------------------------

### Start Model Server with Docker

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Use this command to start the OpenVINO Model Server using Docker. Ensure Docker Engine is installed and models are prepared. Ports 9000 (gRPC) and 8000 (REST) are exposed.
```text
docker run -d --rm -v :/models -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
--model_path --model_name --port 9000 --rest_port 8000 --log_level DEBUG
```

--------------------------------

### Download Example Client Components

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/ovms_quickstart.md

Downloads Python scripts and requirements for an example client application to interact with the OpenVINO Model Server, along with a COCO class names file.

```bash
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/object_detection/python/object_detection.py
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/object_detection/python/requirements.txt
wget https://raw.githubusercontent.com/openvinotoolkit/open_model_zoo/master/data/dataset_classes/coco_91cl.txt
```

--------------------------------

### Run Qwen3-30B-A3B-Instruct Model Server on GPU

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/agentic_ai/README.md

Starts the OpenVINO Model Server with the Qwen3-30B-A3B-Instruct model, enabling tool-guided generation on GPU. This command is for a larger, more capable model.

```bash
mkdir -p ${HOME}/models
docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v ${HOME}/models:/models --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \
--rest_port 8000 --source_model OpenVINO/Qwen3-30B-A3B-Instruct-2507-int4-ov --model_repository_path /models --tool_parser hermes3 --target_device GPU --task text_generation --enable_tool_guided_generation true
```

--------------------------------

### Setup for Agentic Model Function Call Tests

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/accuracy/README.md

Initial setup steps for running agentic model tests using the Berkeley function call leaderboard.
This includes cloning the repository, applying a patch, and installing dependencies.

```text
git clone https://github.com/ShishirPatil/gorilla
cd gorilla/berkeley-function-call-leaderboard
git checkout 9b8a5202544f49a846aced185a340361231ef3e1
curl -s https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/continuous_batching/accuracy/gorilla.patch | git apply -v
pip install -e . --extra-index-url "https://download.pytorch.org/whl/cpu"
```

--------------------------------

### gRPC Client Usage Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md

This example demonstrates how to run the gRPC client with specific parameters, including image and label paths, gRPC address, input/output names, and transpose settings. It also shows the expected output format, including processing times and classification accuracy.

```bash
python grpc_predict_resnet.py --grpc_port 9000 --images_numpy_path ../../imgs.npy --input_name 0 --output_name 1463 --transpose_input False --labels_numpy_path ../../lbs.npy
```

--------------------------------

### Start OpenVINO Model Server on Bare Metal (Linux/Windows)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/ovms_quickstart.md

Starts the OpenVINO Model Server using the binary package. Ensure environment variables like LD_LIBRARY_PATH and PATH are set correctly on Linux, or run setupvars.bat on Windows.

```bat
ovms --model_name faster_rcnn --model_path model --port 9000
```

--------------------------------

### Deploy OVMS Binary with GPU Target on Windows

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/README.md

This command deploys the OpenVINO Model Server using a binary installation on Windows 11, specifically targeting a GPU for inference. Ensure OVMS is installed according to the baremetal deployment guide.
```bat
ovms.exe --model_repository_path c:\models --source_model OpenVINO/Qwen3-30B-A3B-Instruct-2507-int4-ov --task text_generation --target_device GPU --tool_parser hermes3 --rest_port 8000 --model_name Qwen3-30B-A3B-Instruct-2507-int4-ov
```

--------------------------------

### Stop Sleep Process in Docker Container

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/developer_guide.md

Install psmisc and use killall to stop the sleep process, allowing tests to start.

```bash
yum install psmisc; killall sleep
```

--------------------------------

### Run and Modify Example Applications

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/c_api_minimal_app/README.md

Run the pre-built demo applications within the Docker container or modify their source code. Access the source files and rebuild as needed.

```bash
docker run -it openvino/model_server-capi:latest
cat main_capi.c
cat main_capi.cpp
```

--------------------------------

### Get Server Metadata (Java gRPC)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Retrieve server metadata using Java with gRPC. This example uses Netty for channel management.

```java
public static void main(String[] args) {
    ManagedChannel channel = ManagedChannelBuilder
            .forAddress("localhost", 9000)
            .usePlaintext().build();
    GRPCInferenceServiceBlockingStub grpc_stub = GRPCInferenceServiceGrpc.newBlockingStub(channel);

    ServerMetadataRequest.Builder request = ServerMetadataRequest.newBuilder();
    ServerMetadataResponse response = grpc_stub.serverMetadata(request.build());

    channel.shutdownNow();
}
```

--------------------------------

### Get Model Metadata API Example

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_rest_api_tfs.md

This command fetches the metadata for a specific model version. If the version is not specified, metadata for the latest version will be returned.
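
To make the optional version segment concrete, here is a minimal Python sketch that builds the request URL with and without a version. `model_metadata_url` is a hypothetical helper; its default host and port are taken from the `curl` command in this section, which follows.

```python
from urllib.parse import quote


def model_metadata_url(model_name, version=None, host="localhost", port=8001):
    """Build the TensorFlow Serving style metadata URL for a model."""
    path = f"/v1/models/{quote(model_name, safe='')}"
    if version is not None:
        # The version segment is optional; leaving it out targets the latest version.
        path += f"/versions/{int(version)}"
    return f"http://{host}:{port}{path}/metadata"


print(model_metadata_url("person-detection", 1))
print(model_metadata_url("person-detection"))
```
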
```bash
$ curl http://localhost:8001/v1/models/person-detection/versions/1/metadata
```

--------------------------------

### Get Model Metadata

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_rest_api_kfs.md

This example demonstrates how to use curl to request metadata for a specific model. Ensure the REST_URL and REST_PORT are correctly set in your environment.

```bash
$ curl http://localhost:8000/v2/models/resnet
{"name":"resnet","versions":["1"],"platform":"OpenVINO","inputs":[{"name":"0","datatype":"FP32","shape":[1,224,224,3]}],"outputs":[{"name":"1463","datatype":"FP32","shape":[1,1000]}]}
```

--------------------------------

### Example: Pull Qwen3-4B model on Baremetal Host

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/pull_optimum_cli.md

Example command to pull the Qwen/Qwen3-4B model on a baremetal host, preparing it for text generation with int8 weight format.

```bat
ovms --pull --source_model "Qwen/Qwen3-4B" --model_repository_path /models --model_name Qwen3-4B --task text_generation --weight-format int8
```

--------------------------------

### Start OpenVINO Model Server (Bare Metal Windows)

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/image_generation/README.md

This command starts the OpenVINO Model Server on Windows for image generation tasks. It requires the `setupvars` script to be run first. Ensure the model repository path and task are correctly specified.

```bat
mkdir models
ovms --rest_port 8000 ^
  --model_repository_path ./models/ ^
  --task image_generation ^
  --source_model OpenVINO/stable-diffusion-v1-5-int8-ov
```

--------------------------------

### Deploy Whisper Model on Bare Metal

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/audio/README.md

Starts the OpenVINO Model Server directly on the host machine. Ensure GPU drivers are installed if targeting GPU.
```bat
ovms --rest_port 8000 --model_path /models/openai/whisper-large-v3-turbo --model_name openai/whisper-large-v3-turbo
```

--------------------------------

### Get Server Metadata (Python REST)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Use this snippet to retrieve server metadata via REST in Python. Ensure the Triton client library is installed.

```python
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:9000")
server_metadata = client.get_server_metadata()
```

--------------------------------

### Download Model and Prepare Directory

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/benchmark/python/README.md

Downloads a sample model (resnet50-binary-0001) and sets up the required directory structure for OVMS deployment.

```bash
mkdir workspace workspace/resnet50-binary-0001 workspace/resnet50-binary-0001/1
cd workspace/resnet50-binary-0001/1
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin
cd ../../..
```

--------------------------------

### Get Server Metadata (Python gRPC)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Use this snippet to retrieve server metadata via gRPC in Python. Ensure the Triton client library is installed.

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:9000")
server_metadata = client.get_server_metadata()
```

--------------------------------

### Complete Graph Configuration with Python Nodes

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/python_support/reference.md

Example of a graph configuration with three sequential Python nodes.
This setup defines graph inputs/outputs and node connections.

```protobuf
input_stream: "OVMS_PY_TENSOR:first_number"
output_stream: "OVMS_PY_TENSOR:last_number"

node {
  name: "first_python_node"
  calculator: "PythonExecutorCalculator"
  input_side_packet: "PYTHON_NODE_RESOURCES:py"
  input_stream: "INPUT:first_number"
  output_stream: "OUTPUT:second_number"
  node_options: {
    [type.googleapis.com / mediapipe.PythonExecutorCalculatorOptions]: {
      handler_path: "/ovms/workspace/incrementer.py"
    }
  }
}

node {
  name: "second_python_node"
  calculator: "PythonExecutorCalculator"
  input_side_packet: "PYTHON_NODE_RESOURCES:py"
  input_stream: "INPUT:second_number"
  output_stream: "OUTPUT:third_number"
  node_options: {
    [type.googleapis.com / mediapipe.PythonExecutorCalculatorOptions]: {
      handler_path: "/ovms/workspace/incrementer.py"
    }
  }
}

node {
  name: "third_python_node"
  calculator: "PythonExecutorCalculator"
  input_side_packet: "PYTHON_NODE_RESOURCES:py"
  input_stream: "INPUT:third_number"
  output_stream: "OUTPUT:last_number"
  node_options: {
    [type.googleapis.com / mediapipe.PythonExecutorCalculatorOptions]: {
      handler_path: "/ovms/workspace/incrementer.py"
    }
  }
}
```

--------------------------------

### Run OpenVINO Model Server with Docker

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/image_classification_using_tf_model/python/README.md

Start the OpenVINO Model Server using Docker, mounting the local model directory to the container. Ensure Docker is installed and running.

```bash
chmod -R 755 model
docker run -d -v $PWD/model:/models -p 9000:9000 openvino/model_server:latest --model_path /models --model_name resnet --port 9000
```

--------------------------------

### Install Client Dependencies and Run Demo

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/mediapipe/iris_tracking/README.md

Install required Python packages using pip, download a sample image, and then launch the MediaPipe Iris tracking client application.
The client connects to the OVMS instance via gRPC.

```console
pip install -r requirements.txt
# download a sample image for analysis
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/people/people2.jpeg
echo people2.jpeg>input_images.txt
# launch the client
python mediapipe_iris_tracking.py --grpc_port 9000 --images_list input_images.txt
```

--------------------------------

### Start Model Server with CLI

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/metrics.md

Use this command to start the model server with a specific model and enable metrics collection.

```bash
wget -N https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1
docker run -d -u $(id -u) -v $(pwd)/models:/models -p 9000:9000 -p 8000:8000 openvino/model_server:latest \
--model_name resnet --model_path /models/resnet50 --port 9000 \
--rest_port 8000 \
--metrics_enable
```

--------------------------------

### gRPC Server Readiness Check (Help)

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

Displays the help message for the grpc_server_ready.py script, detailing arguments for checking server readiness via gRPC.

```bash
python ./grpc_server_ready.py --help
```

--------------------------------

### Download and Prepare Model Export Script

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/agentic_ai/README.md

Download the model export script, install its dependencies, and create a directory for storing models. This is the initial setup for exporting models to OpenVINO format.
```text
# Download export script, install its dependencies and create directory for the models
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/export_model.py -o export_model.py
pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/main/demos/common/export_models/requirements.txt
mkdir models
```

--------------------------------

### Start GenAI Model from Hugging Face (Docker)

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/starting_server.md

Use this command to serve a Hugging Face model with OVMS using Docker. Ensure Docker Engine is installed and the model repository path is accessible.

```text
docker run --user $(id -u):$(id -g) -p 9000:9000 -p 8000:8000 --rm -v :/models openvino/model_server:latest \
--port 8000 --rest_port 9000 --source_model --model_repository_path /models --model_name --target_device --task [TASK_SPECIFIC_OPTIONS]
```

--------------------------------

### Start OpenVINO Model Server

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/model_server_c_api.md

Initialize and start the OpenVINO Model Server using a configuration file. The settings objects are opaque handles created through the API, and the configuration path is set on the models settings before the server starts. Call OVMS_ServerDelete to release resources when the server is no longer needed.

```c
// Settings and server handles are opaque and created by the API.
OVMS_ServerSettings* serverSettings = 0;
OVMS_ModelsSettings* modelsSettings = 0;
OVMS_Server* server = 0;
OVMS_ServerSettingsNew(&serverSettings);
OVMS_ModelsSettingsNew(&modelsSettings);
OVMS_ServerNew(&server);
OVMS_ModelsSettingsSetConfigPath(modelsSettings, "path/to/config.json");
OVMS_ServerStartFromConfigurationFile(server, serverSettings, modelsSettings);
// ... schedule inferences ...
OVMS_ServerDelete(server);
OVMS_ModelsSettingsDelete(modelsSettings);
OVMS_ServerSettingsDelete(serverSettings);
```

--------------------------------

### Graph Configuration with Sparse Attention

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md

An example of a graph.pbtxt configuration file demonstrating how to enable and configure sparse attention for the LLMExecutor calculator. This setup is used for optimizing attention mechanisms in large language models.
```protobuf
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: 'LOOPBACK:0',
    back_edge: true
  }
  node_options: {
    [type.googleapis.com/mediapipe.LLMCalculatorOptions]: {
      models_path: "./",
      sparse_attention_config: {
        mode: TRISHAPE
        num_last_dense_tokens_in_prefill: 100
        num_retained_start_tokens_in_cache: 128
        num_retained_recent_tokens_in_cache: 1920
        xattention_threshold: 0.8
        xattention_block_size: 64
        xattention_stride: 8
      }
    }
  }
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler",
    options {
      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
        sync_set {
          tag_index: "LOOPBACK:0"
        }
      }
    }
  }
}
```

--------------------------------

### Run Holistic Tracking Client Application

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/mediapipe/holistic_tracking/README.md

Install dependencies, download a sample image, and launch the Python client application to perform holistic tracking. Ensure the gRPC port matches the server configuration.

```bash
pip install -r requirements.txt
# download a sample image for analysis
curl -kL -o girl.jpeg https://cdn.pixabay.com/photo/2019/03/12/20/39/girl-4051811_960_720.jpg
echo girl.jpeg>input_images.txt
# launch the client
python mediapipe_holistic_tracking.py --grpc_port 9000 --images_list input_images.txt
```

--------------------------------

### Deploy Model Server on Bare Metal (Linux/Windows)

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/speculative_decoding/README.md

Starts the OpenVINO Model Server from the command line.
Ensure environment variables are set correctly for your OS (LD_LIBRARY_PATH/PATH for Linux, setupvars for Windows). This command assumes models are in the ./models directory.

```bash
ovms --rest_port 8000 --rest_workers 2 --config_path ./models/config.json
```

--------------------------------

### C++ gRPC Request Prediction

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/clients_kfs.md

Example of sending a prediction request using the C++ gRPC client. This requires the Triton client library for C++ and includes setup for input data and inference options.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

#include "grpc_client.h"

namespace tc = triton::client;

int main() {
    // Connect to the gRPC endpoint of the model server.
    std::unique_ptr<tc::InferenceServerGrpcClient> client;
    tc::InferenceServerGrpcClient::Create(&client, "localhost:9000");

    // Describe a single FP32 input of shape [1, 10].
    std::vector<int64_t> shape{1, 10};
    tc::InferInput* input;
    tc::InferInput::Create(&input, "input_name", shape, "FP32");
    std::shared_ptr<tc::InferInput> input_ptr;
    input_ptr.reset(input);

    std::vector<float> input_data(10);
    for (size_t i = 0; i < 10; ++i) {
        input_data[i] = i;
    }

    std::vector<tc::InferInput*> inputs = {input_ptr.get()};
    tc::InferOptions options("model_name");
    tc::InferResult* result;
    // AppendRaw takes raw bytes, so the float buffer is passed as uint8_t.
    input_ptr->AppendRaw(reinterpret_cast<const uint8_t*>(input_data.data()), input_data.size() * sizeof(float));
    client->Infer(&result, options, inputs);
    input->Reset();
}
```

--------------------------------

### Deploy OVMS Container with CPU Target on Linux

Source: https://github.com/openvinotoolkit/model_server/blob/main/demos/continuous_batching/README.md

Use this command to start the OpenVINO Model Server container on a Linux system, targeting the CPU for inference. Ensure you have Docker installed and a directory for models mounted.
```bash
mkdir -p ${HOME}/models
docker run -it -p 8000:8000 --rm --user $(id -u):$(id -g) -v ${HOME}/models:/models/:rw openvino/model_server:weekly --model_repository_path /models --source_model OpenVINO/Qwen3-30B-A3B-Instruct-2507-int4-ov --task text_generation --target_device CPU --tool_parser hermes3 --rest_port 8000 --model_name Qwen3-30B-A3B-Instruct-2507-int4-ov
```

--------------------------------

### Install Dependencies and Run Inference Client

Source: https://github.com/openvinotoolkit/model_server/blob/main/docs/ovms_quickstart.md

Install project dependencies using pip and then run the object detection client script. Ensure requirements.txt is present.

```bash
pip install --upgrade pip
pip install -r requirements.txt
python object_detection.py --image coco_bike.jpg --output output.jpg --service_url localhost:9000
```

--------------------------------

### Start OpenVINO Model Server

Source: https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md

Download a sample model and start the OpenVINO Model Server using Docker. Ensure ports 8000 (REST) and 9000 (gRPC) are exposed.

```bash
wget -N https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.{xml,bin} -P models/resnet50/1
docker run -d -u $(id -u) -v $(pwd)/models:/models -p 8000:8000 -p 9000:9000 openvino/model_server:latest --model_name resnet --model_path /models/resnet50 --port 9000 --rest_port 8000
```
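
Once the container from the last command is running, one way to confirm that it is serving is to poll the KServe-compatible REST readiness endpoint. The sketch below is not part of the original samples. It assumes the REST port 8000 published in that command and the `/v2/health/ready` path of the KServe API, and the helper names `readiness_url`, `is_ready`, and `wait_until_ready` are hypothetical.

```python
import time
import urllib.error
import urllib.request


def readiness_url(host: str = "localhost", rest_port: int = 8000) -> str:
    """Build the KServe v2 readiness URL (host and port are assumptions)."""
    return f"http://{host}:{rest_port}/v2/health/ready"


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready.
        return False


def wait_until_ready(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the readiness endpoint up to `attempts` times, sleeping between tries."""
    for _ in range(attempts):
        if is_ready(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready(readiness_url())` before launching a client avoids racing the container startup. Adjust the host and port if the REST port is published differently.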