Model server parameters

August 29, 2024

| Parameter | Description |
| --- | --- |
| `image_name` | Model server Docker image. The default is the latest public Docker image. |
| `deployment_parameters.replicas` | Number of model server replicas to use. If autoscaling is enabled, this sets the initial number of replicas. |
| `deployment_parameters.openshift_service_mesh` | When set to true, adds the annotations that enable the model server deployment for OpenShift Service Mesh. |
| `deployment_parameters.extra_envs_secret` | Name of a secret with extra environment variables to apply in the deployed pods. Create it with `oc create secret generic env_secret --from-file envfile.txt`. |
| `deployment_parameters.extra_envs_configmap` | Name of a config map with extra environment variables to apply in the deployed pods. Create it with `oc create configmap env_configmap --from-literal=ENVNAME=VALUE`. |
| `service_parameters.grpc_port` | gRPC service port; the default value is 8080. |
| `service_parameters.rest_port` | REST API service port; the default value is 8081. |
| `service_parameters.service_type` | Service type; the default value is ClusterIP. |
| `models_settings.single_model_mode` | Set to true if only one model should be deployed. The value false indicates that a config.json file is used to configure multiple models. |
| `models_settings.config_configmap_name` | Config map hosting the config.json file. |
| `models_settings.config_path` | Path to the config file, in case it was mounted in the container via a persistent volume claim. |
| `models_settings.model_name` | Model name used on the client side in remote calls. |
| `models_settings.model_path` | Path to the model folder in the model repository; for example `gs://<bucket_name>/<model_dir>`. |
| `models_settings.nireq` | Size of the internal request queue. When set to 0 or left unset, the value is calculated automatically based on available resources. |
| `models_settings.plugin_config` | Adds OpenVINO plugin configuration for performance tuning. The value `{"PERFORMANCE_HINT":"LATENCY"}` optimizes inference latency in a single-client scenario. |
| `models_settings.batch_size` | Changes the model batch size. |
| `models_settings.shape` | Optional; takes precedence over `batch_size`. Changes the model that is enabled in the model server to fit the given parameters. Accepted forms include a tuple, such as `(-1,3,100-200,224)`, which defines the shape to use for all incoming requests for models with a single input; each dimension can be a static value such as `3`, a range such as `100-200`, or `-1` for an undefined value. A dictionary of shapes, such as `{"input1":"(1,3,224,224)","input2":"(1,3,50,50)", "input3":"auto"}`, sets the shape for multiple inputs. |
| `models_settings.model_version_policy` | Model version policy; for example `'{"latest": { "num_versions":1 }}'`. |
| `models_settings.layout` | Changes the layout of a model input or output that carries image data; `NCHW:NHWC` changes the layout from NCHW to NHWC. |
| `models_settings.target_device` | Any supported OpenVINO target device, such as CPU/GPU/HDDL/MULTI/HETERO/AUTO. |
| `models_settings.is_stateful` | Set to true if the model is stateful. |
| `models_settings.idle_sequence_cleanup` | If set to true, the model is subject to periodic sequence cleaner scans. See idle sequence cleanup. |
| `models_settings.low_latency_transformation` | If set to true, the model server applies the low latency transformation on model load. |
| `models_settings.max_sequence_number` | Determines how many sequences can be handled concurrently by a model instance. |
| `server_settings.file_system_poll_wait_seconds` | Time interval, in seconds, between checks for config and model version changes. The default value is 1. A value of zero disables change monitoring. |
| `server_settings.log_level` | One of ERROR/WARNING/INFO/DEBUG. |
| `server_settings.grpc_workers` | Number of gRPC servers; the default is 1. |
| `server_settings.rest_workers` | Number of REST server threads; the default is calculated automatically. |
| `models_repository.https_proxy` | HTTPS proxy used to pull models from cloud storage. |
| `models_repository.http_proxy` | HTTP proxy used to pull models from cloud storage. |
| `models_repository.storage_type` | One of google storage, s3, azure blob or cluster. |
| `models_repository.models_host_path` | Mounts a node-local path in the container as the /models folder. |
| `models_repository.models_volume_claim` | Mounts a persistent volume claim in the container as /models. The persistent volume claim should be created in the same namespace and populated with the model repository content. |
| `models_repository.runAsUser` | User account for the security context. |
| `models_repository.runAsGroup` | Group for the security context. |
| `models_repository.aws_secret_access_key` | S3 storage secret key; use it with S3 storage for models. |
| `models_repository.aws_access_key_id` | S3 storage access key ID; use it with S3 storage for models. |
| `models_repository.aws_region` | S3 storage region; use it with S3 storage for models. |
| `models_repository.s3_compat_api_endpoint` | S3-compatible API endpoint; use it with MinIO storage for models. |
| `models_repository.gcp_creds_secret_name` | Secret resource with GCP credentials; use it with Google storage for models. Create it with `kubectl create secret generic <secret name> --from-file gcp-creds.json`. |
| `models_repository.azure_storage_connection_string` | Connection string for the Azure Storage account; use it with Azure storage for models. |
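
Several of the settings above (`plugin_config`, `shape`, `model_version_policy`) take JSON text as their value, which is easy to mistype by hand. The following is a minimal Python sketch, not part of the operator, that generates those strings with the standard `json` module and checks that they parse. The model name `resnet` is a hypothetical placeholder, and passing the strings on to the resource exactly as generated is an assumption.

```python
import json

# The JSON payloads mirror the examples given in the parameter descriptions.
# Compact separators keep the strings free of extra whitespace.
plugin_config = json.dumps({"PERFORMANCE_HINT": "LATENCY"}, separators=(",", ":"))
shape = json.dumps(
    {"input1": "(1,3,224,224)", "input2": "(1,3,50,50)", "input3": "auto"},
    separators=(",", ":"),
)
model_version_policy = json.dumps({"latest": {"num_versions": 1}}, separators=(",", ":"))

models_settings = {
    "single_model_mode": True,
    "model_name": "resnet",  # hypothetical model name
    "model_path": "gs://<bucket_name>/<model_dir>",  # placeholder path from the table
    "plugin_config": plugin_config,
    "shape": shape,
    "model_version_policy": model_version_policy,
}

# Confirm that each JSON-valued setting parses back into a dictionary.
for key in ("plugin_config", "shape", "model_version_policy"):
    assert isinstance(json.loads(models_settings[key]), dict)

print(json.dumps(models_settings, indent=2))
```

Generating the strings this way avoids unbalanced quotes or braces, which would otherwise only surface when the model server tries to load the model.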

Check an example of the fully functional ModelServer resource
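
As a rough companion to that example, the sketch below assembles a small ModelServer manifest as a Python dictionary and prints it as JSON. The parameter names and the default ports come from the table; the `apiVersion` value, the placement of parameters under `spec`, the helper `build_model_server`, and all names passed to it are assumptions made for illustration and should be checked against the installed custom resource definition.

```python
import json


def build_model_server(name, model_name, model_path, volume_claim):
    """Return a ModelServer manifest as a plain dictionary (illustrative only)."""
    return {
        "apiVersion": "intel.com/v1alpha1",  # assumption: verify against the installed CRD
        "kind": "ModelServer",
        "metadata": {"name": name},
        "spec": {  # assumption: parameters are nested under spec
            "deployment_parameters": {"replicas": 1},
            "service_parameters": {
                "grpc_port": 8080,  # default from the table
                "rest_port": 8081,  # default from the table
                "service_type": "ClusterIP",
            },
            "models_settings": {
                "single_model_mode": True,
                "model_name": model_name,
                "model_path": model_path,
            },
            "server_settings": {
                "log_level": "INFO",
                "file_system_poll_wait_seconds": 1,
            },
            "models_repository": {
                "storage_type": "cluster",
                "models_volume_claim": volume_claim,
            },
        },
    }


# All four arguments are hypothetical names.
manifest = build_model_server(
    "model-server-sample", "resnet", "/models/resnet", "models-pvc"
)
print(json.dumps(manifest, indent=2))
```

Because `kubectl` and `oc` accept JSON manifests as well as YAML, the printed output could be saved to a file and applied once the assumed fields have been verified.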