KServe API usage samples

March 3, 2025 · View on GitHub

OpenVINO Model Server 2022.2 release introduced support for KServe API.

This guide shows how to interact with KServe API endpoints on both gRPC and HTTP interfaces. It covers following topics:

GRPC API Examples
HTTP API Example

Note: Some of the samples will use ResNet50.

Before you run the samples

Clone OpenVINO™ Model Server GitHub repository and enter model_server directory.

git clone https://github.com/openvinotoolkit/model_server.git

Prepare virtualenv

cd model_server/client/python/kserve-api/samples
virtualenv .venv
. .venv/bin/activate
pip install -r requirements.txt

Download the Pretrained Model

Download the model files and store them in the models directory

mkdir -p models/resnet/1
curl https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.bin https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/2/resnet50-binary-0001/FP32-INT1/resnet50-binary-0001.xml -o models/resnet/1/resnet50-binary-0001.bin -o models/resnet/1/resnet50-binary-0001.xml

Pull the Latest Model Server Image

Pull the latest version of OpenVINO™ Model Server from Docker Hub :

docker pull openvino/model_server:latest

Start the Model Server Container with Downloaded Model

Start the server container with the image pulled in the previous step and mount the models directory :

docker run --rm -d -v $(pwd)/models:/models -p 9000:9000 -p 8000:8000 openvino/model_server:latest --model_name resnet --model_path /models/resnet --port 9000 --rest_port 8000 --layout NHWC:NCHW

Note: The model default setting is to accept inputs in layout NCHW, but we change it to NHWC to make it work with samples using either regular, array-like input data or JPEG encoded images.

Once you finish above steps, you are ready to run the samples.

GRPC Examples

Run the Client to get server liveness

Command

python ./grpc_server_live.py --help
usage: grpc_server_live.py [-h] [--grpc_address GRPC_ADDRESS] [--grpc_port GRPC_PORT]

Sends request via KServe gRPC API to check if server is alive.

optional arguments:
  -h, --help            show this help message and exit
  --grpc_address GRPC_ADDRESS
                        Specify url to grpc service. default:localhost
  --grpc_port GRPC_PORT
                        Specify port to grpc service. default: 9000

Usage Example

python ./grpc_server_live.py --grpc_port 9000 --grpc_address localhost
Server Live: True

Run the Client to get server readiness

Command

python ./grpc_server_ready.py --help
usage: grpc_server_ready.py [-h] [--grpc_address GRPC_ADDRESS] [--grpc_port GRPC_PORT]

Sends request via KServe gRPC API to check if server is ready.

optional arguments:
  -h, --help            show this help message and exit
  --grpc_address GRPC_ADDRESS
                        Specify url to grpc service. default:localhost
  --grpc_port GRPC_PORT
                        Specify port to grpc service. default: 9000

Usage Example

python ./grpc_server_ready.py --grpc_port 9000 --grpc_address localhost
Server Ready: True

Run the Client to get server metadata

Command

python ./grpc_server_metadata.py --help
usage: grpc_server_metadata.py [-h] [--grpc_address GRPC_ADDRESS] [--grpc_port GRPC_PORT]

Sends request via KServe gRPC API to get server metadata.

optional arguments:
  -h, --help            show this help message and exit
  --grpc_address GRPC_ADDRESS
                        Specify url to gRPC service. default:localhost
  --grpc_port GRPC_PORT
                        Specify port to gRPC service. default: 9000

Usage Example

python ./grpc_server_metadata.py --grpc_port 9000 --grpc_address localhost
name: "OpenVINO Model Server"
version: "2022.2.c290da85"

Run the Client to get model readiness

Command

python ./grpc_model_ready.py --help
usage: grpc_model_ready.py [-h] [--grpc_address GRPC_ADDRESS] [--grpc_port GRPC_PORT] [--model_name MODEL_NAME] [--model_version MODEL_VERSION]

Sends requests via KServe gRPC API to check if model is ready for inference.

optional arguments:
  -h, --help            show this help message and exit
  --grpc_address GRPC_ADDRESS
                        Specify url to grpc service. default:localhost
  --grpc_port GRPC_PORT
                        Specify port to grpc service. default: 9000
  --model_name MODEL_NAME
                        Define model name, must be same as is in service. default: resnet
  --model_version MODEL_VERSION
                        Define model version. If not specified, the default version will be taken from model server

Usage Example

python ./grpc_model_ready.py --grpc_port 9000 --grpc_address localhost --model_name resnet
Model Ready: True

Run the Client to get metadata

Command

python ./grpc_model_metadata.py --help
usage: grpc_model_metadata.py [-h] [--grpc_address GRPC_ADDRESS] [--grpc_port GRPC_PORT] [--model_name MODEL_NAME] [--model_version MODEL_VERSION]

Sends requests via KServe gRPC API to get model metadata

optional arguments:
  -h, --help            show this help message and exit
  --grpc_address GRPC_ADDRESS
                        Specify url to grpc service. default:localhost
  --grpc_port GRPC_PORT
                        Specify port to grpc service. default: 9000
  --model_name MODEL_NAME
                        Define model name, must be same as is in service. default: resnet
  --model_version MODEL_VERSION
                        Define model version. If not specified, the default version will be taken from model server

Usage Example

python ./grpc_model_metadata.py --grpc_port 9000 --grpc_address localhost --model_name resnet
model metadata:
name: "resnet"
versions: "1"
platform: "OpenVINO"
inputs {
  name: "0"
  datatype: "FP32"
  shape: 1
  shape: 224
  shape: 224
  shape: 3
}
outputs {
  name: "1463"
  datatype: "FP32"
  shape: 1
  shape: 1000
}

Run the Client to perform inference

Command

python grpc_infer_resnet.py --help
usage: grpc_infer_resnet.py [-h] --images_numpy_path IMAGES_NUMPY_PATH [--labels_numpy_path LABELS_NUMPY_PATH] [--grpc_address GRPC_ADDRESS]
                            [--grpc_port GRPC_PORT] [--input_name INPUT_NAME] [--output_name OUTPUT_NAME] [--transpose_input {False,True}]
                            [--transpose_method {nchw2nhwc,nhwc2nchw}] [--iterations ITERATIONS] [--batchsize BATCHSIZE] [--model_name MODEL_NAME]
                            [--pipeline_name PIPELINE_NAME] [--dag-batch-size-auto] [--tls] [--server_cert SERVER_CERT] [--client_cert CLIENT_CERT] [--client_key CLIENT_KEY]

Sends requests via KServe gRPC API using images in numpy format. It displays performance statistics and optionally the model accuracy

optional arguments:
  -h, --help            show this help message and exit
  --images_numpy_path IMAGES_NUMPY_PATH
                        numpy in shape [n,w,h,c] or [n,c,h,w]
  --labels_numpy_path LABELS_NUMPY_PATH
                        numpy in shape [n,1] - can be used to check model accuracy
  --grpc_address GRPC_ADDRESS
                        Specify url to grpc service. default:localhost
  --grpc_port GRPC_PORT
                        Specify port to grpc service. default: 9000
  --input_name INPUT_NAME
                        Specify input tensor name. default: input
  --output_name OUTPUT_NAME
                        Specify output name. default: resnet_v1_50/predictions/Reshape_1
  --transpose_input {False,True}
                        Set to False to skip NHWC>NCHW or NCHW>NHWC input transposing. default: True
  --transpose_method {nchw2nhwc,nhwc2nchw}
                        How the input transposition should be executed: nhwc2nchw or nchw2nhwc
  --iterations ITERATIONS
                        Number of requests iterations, as default use number of images in numpy memmap. default: 0 (consume all frames)
  --batchsize BATCHSIZE
                        Number of images in a single request. default: 1
  --model_name MODEL_NAME
                        Define model name, must be same as is in service. default: resnet
  --pipeline_name PIPELINE_NAME
                        Define pipeline name, must be same as is in service
  --dag-batch-size-auto
                        Add demultiplexer dimension at front
  --tls                 use TLS communication with gRPC endpoint
  --server_cert SERVER_CERT
                        Path to server certificate
  --client_cert CLIENT_CERT
                        Path to client certificate
  --client_key CLIENT_KEY
                        Path to client key

Usage Example

python grpc_infer_resnet.py --grpc_port 9000 --images_numpy_path ../../imgs_nhwc.npy --labels_numpy_path ../../lbs.npy --input_name 0 --output_name 1463 --model_name resnet --transpose_input False
Image data range: 0.0 : 255.0
Start processing:
        Model name: resnet
        Iterations: 10
        Images numpy path: ../../imgs_nhwc.npy
        Numpy file shape: (10, 224, 224, 3)

Iteration 1; Processing time: 29.98 ms; speed 33.36 fps
imagenet top results in a single batch:
         0 airliner 404 ; Correct match.
Iteration 2; Processing time: 23.50 ms; speed 42.56 fps
imagenet top results in a single batch:
         0 Arctic fox, white fox, Alopex lagopus 279 ; Correct match.
Iteration 3; Processing time: 23.42 ms; speed 42.71 fps
imagenet top results in a single batch:
         0 bee 309 ; Correct match.
Iteration 4; Processing time: 23.25 ms; speed 43.01 fps
imagenet top results in a single batch:
         0 golden retriever 207 ; Correct match.
Iteration 5; Processing time: 24.08 ms; speed 41.53 fps
imagenet top results in a single batch:
         0 gorilla, Gorilla gorilla 366 ; Correct match.
Iteration 6; Processing time: 26.38 ms; speed 37.91 fps
imagenet top results in a single batch:
         0 magnetic compass 635 ; Correct match.
Iteration 7; Processing time: 25.46 ms; speed 39.27 fps
imagenet top results in a single batch:
         0 peacock 84 ; Correct match.
Iteration 8; Processing time: 24.60 ms; speed 40.65 fps
imagenet top results in a single batch:
         0 pelican 144 ; Correct match.
Iteration 9; Processing time: 23.52 ms; speed 42.52 fps
imagenet top results in a single batch:
         0 snail 113 ; Correct match.
Iteration 10; Processing time: 22.40 ms; speed 44.64 fps
imagenet top results in a single batch:
         0 zebra 340 ; Correct match.

processing time for all iterations
average time: 24.20 ms; average speed: 41.32 fps
median time: 23.50 ms; median speed: 42.55 fps
max time: 29.00 ms; min speed: 34.48 fps
min time: 22.00 ms; max speed: 45.45 fps
time percentile 90: 26.30 ms; speed percentile 90: 38.02 fps
time percentile 50: 23.50 ms; speed percentile 50: 42.55 fps
time standard deviation: 1.94
time variance: 3.76
Classification accuracy: 100.00

Run the Client to perform inference with binary encoded image

Command

python grpc_infer_binary_resnet.py --help
usage: grpc_infer_binary_resnet.py [-h] [--images_list IMAGES_LIST] [--grpc_address GRPC_ADDRESS] [--grpc_port GRPC_PORT] [--input_name INPUT_NAME] [--output_name OUTPUT_NAME] [--batchsize BATCHSIZE]
                                   [--model_name MODEL_NAME] [--pipeline_name PIPELINE_NAME] [--tls]

Sends requests via KServe gRPC API using images in format supported by OpenCV. It displays performance statistics and optionally the model accuracy

optional arguments:
  -h, --help            show this help message and exit
  --images_list IMAGES_LIST
                        path to a file with a list of labeled images
  --grpc_address GRPC_ADDRESS
                        Specify url to grpc service. default:localhost
  --grpc_port GRPC_PORT
                        Specify port to grpc service. default: 9000
  --input_name INPUT_NAME
                        Specify input tensor name. default: input
  --output_name OUTPUT_NAME
                        Specify output name. default: resnet_v1_50/predictions/Reshape_1
  --batchsize BATCHSIZE
                        Number of images in a single request. default: 1
  --model_name MODEL_NAME
                        Define model name, must be same as is in service. default: resnet
  --pipeline_name PIPELINE_NAME
                        Define pipeline name, must be same as is in service
  --tls                 use TLS communication with GRPC endpoint

Usage Example

python grpc_infer_binary_resnet.py --grpc_port 9000 --images_list ../../resnet_input_images.txt --input_name 0 --output_name 1463 --model_name resnet
Start processing:
        Model name: resnet
Iteration 0; Processing time: 27.09 ms; speed 36.92 fps
imagenet top results in a single batch:
         0 airliner 404 ; Correct match.
Iteration 1; Processing time: 27.62 ms; speed 36.20 fps
imagenet top results in a single batch:
         0 Arctic fox, white fox, Alopex lagopus 279 ; Correct match.
Iteration 2; Processing time: 25.33 ms; speed 39.48 fps
imagenet top results in a single batch:
         0 bee 309 ; Correct match.
Iteration 3; Processing time: 23.54 ms; speed 42.47 fps
imagenet top results in a single batch:
         0 golden retriever 207 ; Correct match.
Iteration 4; Processing time: 21.62 ms; speed 46.25 fps
imagenet top results in a single batch:
         0 gorilla, Gorilla gorilla 366 ; Correct match.
Iteration 5; Processing time: 24.28 ms; speed 41.18 fps
imagenet top results in a single batch:
         0 magnetic compass 635 ; Correct match.
Iteration 6; Processing time: 23.52 ms; speed 42.51 fps
imagenet top results in a single batch:
         0 peacock 84 ; Correct match.
Iteration 7; Processing time: 24.10 ms; speed 41.49 fps
imagenet top results in a single batch:
         0 pelican 144 ; Correct match.
Iteration 8; Processing time: 25.36 ms; speed 39.44 fps
imagenet top results in a single batch:
         0 snail 113 ; Correct match.
Iteration 9; Processing time: 32.67 ms; speed 30.61 fps
imagenet top results in a single batch:
         0 zebra 340 ; Correct match.

processing time for all iterations
average time: 25.10 ms; average speed: 39.84 fps
median time: 24.50 ms; median speed: 40.82 fps
max time: 32.00 ms; min speed: 31.25 fps
min time: 21.00 ms; max speed: 47.62 fps
time percentile 90: 27.50 ms; speed percentile 90: 36.36 fps
time percentile 50: 24.50 ms; speed percentile 50: 40.82 fps
time standard deviation: 2.88
time variance: 8.29
Classification accuracy: 100.00

Run the Client to perform asynchronous inference

Command

python grpc_async_infer_resnet.py --help
usage: grpc_async_infer_resnet.py [-h] --images_numpy_path IMAGES_NUMPY_PATH [--labels_numpy_path LABELS_NUMPY_PATH] [--grpc_address GRPC_ADDRESS] [--grpc_port GRPC_PORT] [--input_name INPUT_NAME]
                                  [--output_name OUTPUT_NAME] [--transpose_input {False,True}] [--transpose_method {nchw2nhwc,nhwc2nchw}] [--iterations ITERATIONS] [--batchsize BATCHSIZE] [--model_name MODEL_NAME]
                                  [--pipeline_name PIPELINE_NAME] [--dag-batch-size-auto] [--tls] [--server_cert SERVER_CERT] [--client_cert CLIENT_CERT] [--client_key CLIENT_KEY] [--timeout TIMEOUT]

Sends requests via KServe gRPC API using images in numpy format. It displays performance statistics and optionally the model accuracy

optional arguments:
  -h, --help            show this help message and exit
  --images_numpy_path IMAGES_NUMPY_PATH
                        numpy in shape [n,w,h,c] or [n,c,h,w]
  --labels_numpy_path LABELS_NUMPY_PATH
                        numpy in shape [n,1] - can be used to check model accuracy
  --grpc_address GRPC_ADDRESS
                        Specify url to grpc service. default:localhost
  --grpc_port GRPC_PORT
                        Specify port to grpc service. default: 9000
  --input_name INPUT_NAME
                        Specify input tensor name. default: input
  --output_name OUTPUT_NAME
                        Specify output name. default: resnet_v1_50/predictions/Reshape_1
  --transpose_input {False,True}
                        Set to False to skip NHWC>NCHW or NCHW>NHWC input transposing. default: True
  --transpose_method {nchw2nhwc,nhwc2nchw}
                        How the input transposition should be executed: nhwc2nchw or nchw2nhwc
  --iterations ITERATIONS
                        Number of requests iterations, as default use number of images in numpy memmap. default: 0 (consume all frames)
  --batchsize BATCHSIZE
                        Number of images in a single request. default: 1
  --model_name MODEL_NAME
                        Define model name, must be same as is in service. default: resnet
  --pipeline_name PIPELINE_NAME
                        Define pipeline name, must be same as is in service
  --dag-batch-size-auto
                        Add demultiplexer dimension at front
  --tls                 use TLS communication with gRPC endpoint
  --server_cert SERVER_CERT
                        Path to server certificate
  --client_cert CLIENT_CERT
                        Path to client certificate
  --client_key CLIENT_KEY
                        Path to client key
  --timeout TIMEOUT     Request timeout

Usage Example

python grpc_async_infer_resnet.py --grpc_port 9000 --images_numpy_path ../../imgs_nhwc.npy --labels_numpy_path ../../lbs.npy --input_name 0 --output_name 1463 --transpose_input False --model_name resnet
Image data range: 0.0 : 255.0
Start processing:
        Model name: resnet
        Iterations: 10
        Images numpy path: ../../imgs_nhwc.npy
        Numpy file shape: (10, 3, 224, 224)

imagenet top results in a single batch:
         0 peacock 84 ; Correct match.
imagenet top results in a single batch:
         0 gorilla, Gorilla gorilla 366 ; Correct match.
imagenet top results in a single batch:
         0 airliner 404 ; Correct match.
imagenet top results in a single batch:
         0 snail 113 ; Correct match.
imagenet top results in a single batch:
         0 bee 309 ; Correct match.
imagenet top results in a single batch:
         0 golden retriever 207 ; Correct match.
imagenet top results in a single batch:
         0 magnetic compass 635 ; Correct match.
imagenet top results in a single batch:
         0 Arctic fox, white fox, Alopex lagopus 279 ; Correct match.
imagenet top results in a single batch:
         0 pelican 144 ; Correct match.
imagenet top results in a single batch:
         0 zebra 340 ; Correct match.
Classification accuracy: 100.00

HTTP Examples