Triton Local Cache

July 11, 2023

This repo contains an example TRITONCACHE API implementation for caching data locally in-memory.

Ask questions or report problems on the main Triton issues page.

Build the Cache

Use a recent version of CMake to build. First, install the required dependencies:

$ apt-get install libboost-dev rapidjson-dev

To build the cache:

$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
$ make install

The following required Triton repositories will be pulled and used in the build. By default, the "main" branch/tag is used for each repo, but the following CMake arguments can be used to override it:

  • triton-inference-server/core: -D TRITON_CORE_REPO_TAG=[tag]
  • triton-inference-server/common: -D TRITON_COMMON_REPO_TAG=[tag]
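As a sketch of the override described above, both tags can be set once in a shell variable and passed to the same cmake invocation used in the build steps. The tag value r23.06 is only a placeholder assumption; substitute a branch or tag that actually exists in both repositories.

```shell
# Placeholder tag (an assumption for illustration); replace it with a
# branch or tag that exists in both the core and common repositories.
triton_repo_tag="r23.06"

# Build both override flags from the same tag so the repos stay in sync.
CMAKE_TAG_FLAGS="-DTRITON_CORE_REPO_TAG=${triton_repo_tag} -DTRITON_COMMON_REPO_TAG=${triton_repo_tag}"
echo "${CMAKE_TAG_FLAGS}"

# Then, from the build directory, add the flags to the usual commands:
#   cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ${CMAKE_TAG_FLAGS} ..
#   make install
```

Keeping the two tags identical is a design choice of this sketch, not a requirement stated by the build; each flag can be set independently.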

Configuring the Cache

Like other TRITONCACHE implementations, this cache is configured through the tritonserver --cache-config CLI argument or through the TRITONSERVER_SetCacheConfig API.

Currently, the following config fields are supported:

  • size: The fixed size, in bytes, of CPU memory allocated to the cache up front. If this value is too large (e.g., greater than the available memory) or too small (e.g., smaller than the required overhead of roughly 1-2 KB), initialization may fail.
    • example: tritonserver --cache-config local,size=1048576
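Because size is a raw byte count, it can be less error-prone to derive it from a more readable unit. The following is a minimal sketch that computes the value for a 1 MiB cache and prints the resulting command line; the variable names are illustrative, and a real tritonserver launch would need its other usual arguments as well.

```shell
# Desired cache size in MiB (illustrative variable name).
cache_size_mib=1

# Convert to bytes: 1 MiB = 1024 * 1024 = 1048576 bytes.
cache_size_bytes=$((cache_size_mib * 1024 * 1024))

# Assemble the value expected by --cache-config for the local cache.
cache_config="local,size=${cache_size_bytes}"

# Print the command instead of running it, so the sketch is safe to try.
echo "tritonserver --cache-config ${cache_config}"
# prints: tritonserver --cache-config local,size=1048576
```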

Metrics

When TRITON_ENABLE_METRICS is enabled in this cache (the default), the cache checks whether the running Triton server also has metrics enabled. If so, the cache publishes additional cache-specific metrics to Triton's metrics endpoint through the Custom Metrics API.

Cache Metrics

The following metrics are reported by this cache implementation:

| Category | Metric | Metric Name | Description | Granularity | Frequency |
|---|---|---|---|---|---|
| Utilization | Total Cache Utilization | nv_cache_util | Total cache utilization rate (0.0 - 1.0) | Server-wide | Per interval |
| Count | Total Cache Entry Count | nv_cache_num_entries | Total number of entries stored in cache | Server-wide | Per interval |
| | Total Cache Lookup Count | nv_cache_num_lookups | Total number of cache lookups done by Triton | Server-wide | Per interval |
| | Total Cache Hit Count | nv_cache_num_hits | Total number of cache hits | Server-wide | Per interval |
| | Total Cache Miss Count | nv_cache_num_misses | Total number of cache misses | Server-wide | Per interval |
| | Total Cache Eviction Count | nv_cache_num_evictions | Total number of cache evictions | Server-wide | Per interval |
| Latency | Total Cache Lookup Time | nv_cache_lookup_duration | Cumulative time spent doing cache lookups (microseconds) | Server-wide | Per interval |
| | Total Cache Insertion Time | nv_cache_insertion_duration | Cumulative time spent doing cache insertions (microseconds) | Server-wide | Per interval |
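
The counters listed above can be combined into derived figures such as a hit rate or an average lookup time. The sketch below works on a hypothetical sample of metrics text in Prometheus format; the sample values and the metric_value helper are assumptions for illustration. It also assumes the counters are cumulative, so the results are lifetime averages rather than per-interval values.

```shell
# Hypothetical sample values. On a live server the text would come from
# Triton's metrics endpoint, for example: curl -s localhost:8002/metrics
# (8002 is Triton's default metrics port; adjust if configured otherwise).
sample_metrics='nv_cache_num_lookups 40
nv_cache_num_hits 30
nv_cache_num_misses 10
nv_cache_lookup_duration 2000'

# Print the value of one metric from the sample. The name must match
# exactly, optionally followed by a {label="..."} block.
metric_value() {
  printf '%s\n' "$sample_metrics" | awk -v name="$1" '
    index($1, name) == 1 {
      rest = substr($1, length(name) + 1)
      if (rest == "" || substr(rest, 1, 1) == "{") { print $NF; exit }
    }'
}

lookups=$(metric_value nv_cache_num_lookups)
hits=$(metric_value nv_cache_num_hits)
lookup_us=$(metric_value nv_cache_lookup_duration)

# Hit rate = hits / lookups, guarding against division by zero.
hit_rate=$(awk -v h="$hits" -v l="$lookups" \
  'BEGIN { if (l > 0) printf "%.2f", h / l; else printf "0.00" }')

# Average lookup time = cumulative lookup time / lookups (microseconds).
avg_lookup_us=$(awk -v t="$lookup_us" -v l="$lookups" \
  'BEGIN { if (l > 0) printf "%.1f", t / l; else printf "0.0" }')

echo "hit rate: ${hit_rate}"
echo "average lookup time: ${avg_lookup_us} us"
```

With the sample values, this prints a hit rate of 0.75 and an average lookup time of 50.0 us. To see how the cache behaves over a recent window instead, take two scrapes and apply the same ratios to the differences between them.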