TensorFlow Serving API {#ovmsdocsclients_tfs}
September 5, 2025 ยท View on GitHub
---
maxdepth: 1
hidden:
---
gRPC API <ovms_docs_grpc_api_tfs>
RESTful API <ovms_docs_rest_api_tfs>
Examples <https://github.com/openvinotoolkit/model_server/blob/main/client/python/tensorflow-serving-api/samples/README.md>
Python Client
When creating a Python-based client application, you can use tensorflow-serving-api package that can be used with OpenVINO Model Server.
Install the Package
::::{tab-set} :::{tab-item} tensorflow-serving-api :sync: tensorflow-serving-api
pip3 install tensorflow-serving-api
::: ::::
Request Model Status
::::{tab-set} :::{tab-item} tensorflow-serving-api :sync: tensorflow-serving-api
import grpc
from tensorflow_serving.apis import model_service_pb2_grpc, get_model_status_pb2
from tensorflow_serving.apis.get_model_status_pb2 import ModelVersionStatus
channel = grpc.insecure_channel("localhost:9000")
model_service_stub = model_service_pb2_grpc.ModelServiceStub(channel)
status_request = get_model_status_pb2.GetModelStatusRequest()
status_request.model_spec.name = "my_model"
status_response = model_service_stub.GetModelStatus(status_request, 10.0)
model_status = {}
model_version_status = status_response.model_version_status
for model_version in model_version_status:
model_status[model_version.version] = dict([
('state', ModelVersionStatus.State.Name(model_version.state)),
('error_code', model_version.status.error_code),
('error_message', model_version.status.error_message),
])
::: :::{tab-item} curl :sync: curl
curl http://localhost:8000/v1/models/my_model
::: ::::
Request Model Metadata
::::{tab-set} :::{tab-item} tensorflow-serving-api :sync: tensorflow-serving-api
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc, get_model_metadata_pb2
from tensorflow.core.framework.types_pb2 import DataType
channel = grpc.insecure_channel("localhost:9000")
prediction_service_stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
metadata_request = get_model_metadata_pb2.GetModelMetadataRequest()
metadata_request.model_spec.name = "my_model"
metadata_response = prediction_service_stub.GetModelMetadata(metadata_request, 10.0)
model_metadata = {}
signature_def = metadata_response.metadata['signature_def']
signature_map = get_model_metadata_pb2.SignatureDefMap()
signature_map.ParseFromString(signature_def.value)
model_signature = signature_map.ListFields()[0][1]['serving_default']
inputs_metadata = {}
for input_name, input_info in model_signature.inputs.items():
input_shape = [d.size for d in input_info.tensor_shape.dim]
inputs_metadata[input_name] = dict([
("shape", input_shape),
("dtype", DataType.Name(input_info.dtype))
])
outputs_metadata = {}
for output_name, output_info in model_signature.outputs.items():
output_shape = [d.size for d in output_info.tensor_shape.dim]
outputs_metadata[output_name] = dict([
("shape", output_shape),
("dtype", DataType.Name(output_info.dtype))
])
version = metadata_response.model_spec.version.value
model_metadata = dict([
("model_version", version),
("inputs", inputs_metadata),
("outputs", outputs_metadata)
])
::: :::{tab-item} curl :sync: curl
curl http://localhost:8000/v1/models/my_model/metadata
::: ::::
Request Prediction on a Binary Input
::::{tab-set} :::{tab-item} tensorflow-serving-api :sync: tensorflow-serving-api
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc, predict_pb2
from tensorflow import make_tensor_proto, make_ndarray
channel = grpc.insecure_channel("localhost:9000")
prediction_service_stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
with open("img.jpeg", "rb") as f:
data = f.read()
predict_request = predict_pb2.PredictRequest()
predict_request.model_spec.name = "my_model"
predict_request.inputs["input_name"].CopyFrom(make_tensor_proto(data))
predict_response = prediction_service_stub.Predict(predict_request, 10.0)
results = make_ndarray(predict_response.outputs["output_name"])
::: :::{tab-item} curl :sync: curl
curl -X POST http://localhost:8000/v1/models/my_model:predict
-H 'Content-Type: application/json'
-d '{"instances": [{"input_name": {"b64":"YXdlc29tZSBpbWFnZSBieXRlcw=="}}]}'
::: ::::
Request Prediction on a Numpy Array
::::{tab-set} :::{tab-item} tensorflow-serving-api :sync: tensorflow-serving-api
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc, predict_pb2
from tensorflow import make_tensor_proto
channel = grpc.insecure_channel("localhost:9000")
prediction_service_stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
data = np.array([1.0, 2.0, ..., 1000.0])
predict_request = predict_pb2.PredictRequest()
predict_request.model_spec.name = "my_model"
predict_request.inputs["input_name"].CopyFrom(make_tensor_proto(data))
predict_response = prediction_service_stub.Predict(predict_request, 10.0)
results = make_ndarray(predict_response.outputs["output_name"])
::: :::{tab-item} curl :sync: curl
curl -X POST http://localhost:8000/v1/models/my_model:predict
-H 'Content-Type: application/json'
-d '{"instances": [{"input_name": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}]}'
::: ::::
Request Prediction on a string
::::{tab-set} :::{tab-item} tensorflow-serving-api :sync: tensorflow-serving-api
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc, predict_pb2
from tensorflow import make_tensor_proto
channel = grpc.insecure_channel("localhost:9000")
prediction_service_stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
data = ["<string>"]
predict_request = predict_pb2.PredictRequest()
predict_request.model_spec.name = "my_model"
predict_request.inputs["input_name"].CopyFrom(make_tensor_proto(data))
predict_response = prediction_service_stub.Predict(predict_request, 1)
results = predict_response.outputs["output_name"]
::: :::{tab-item} curl :sync: curl
curl -X POST http://localhost:8000/v1/models/my_model:predict
-H 'Content-Type: application/json'
-d '{"instances": [{"input_name": "<string>"}]}'
::: ::::
C++ and Go Clients
Creating a client application in C++ or Go follows the same principles as Python, but using them adds some complexity. There is no package or library available for them with convenient functions to interact with OpenVINO Model Server.
To successfully set up communication with the model server, you need to implement the logic to communicate with endpoints specified in the API. For gRPC, download and compile protos, then link and use them in your application according to the gRPC API specification. For REST, prepare your data and pack it into the appropriate JSON structure according to the REST API specification.
See our Go demo to learn how to build a sample Go-based client application in a Docker container and get predictions via the gRPC API.