Changelog
November 27, 2025
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
[1.2.0] - 2025-11-25
Feature
Universal:
- Add support for distributed sharing of the KV Cache, to support KV Cache sharing between CPU and SSD, as well as distributed sharing of PCFS (#17)
- Add GDS (GPU Direct Storage) Support (#25)
- TP16 support (#26)
- Support more KV cache layouts; supported layouts now include vLLM, SGLang, and TensorRT-LLM (#27)
- GDS refactor & gtensor support (#42)
- Support constructing TensorSharedHandle directly from a CUDA IPC handle (#44)
Targeting vllm:
- Support dp > 1 when integrated with vLLM (#18)
- Add launch scripts for vLLM adaptation (#47)
- Support TP16 for vLLM+FlexKV (#59)
Targeting TensorRT-LLM:
Optimization
- MLA D2H transfer optimization (#19)
- Optimize SSD I/O (#33)
- Enhance cache eviction with frequency-aware grace time mechanism (#38)
- Replace std::map with std::unordered_map in RadixTree (#41)
Bugfix
- Fix wrong head number for DeepSeek for vllm integration (#23)
- Fix an error on put when the CPU match length is greater than the SSD match length (#24)
- Fix benchmark_worker (#31)
- Fix segfault caused by radix tree array out-of-bounds access (#39)
- Fix cache_info (#40)
- Fix port for GPU registration (#45)
- Fix SSD allocator (#46)
- Fix vLLM init num_kv_heads bug (#67)
- Fix model_config for non-MLA models (#68)
Misc
- Add doc for: FlexKV + TensorRT-LLM (#52)
- Config: simplify user configuration (#37) and other minor updates (#43)
[1.1.0] - 2025-09-15
- Add op-level callback for local get/put (#13)
- Add doc for: FlexKV + Dynamo (#14), flexkv_config.json (#15)
[1.0.0] - 2025-09-11
Added
- C++ radix tree for fast matching; requires setting "index_accel": true in cache_config
- Sync kernel launch
- A major change that moves the cache engine into a library for accelerators (e.g., vLLM) to use instead of server-client mode. This accelerates get and put when no KVCache is matched. This version includes breaking API changes and is not backward compatible.
- Add evict_ratio; requires setting "evict_ratio": 0.05 in cache_config
- Reduce the bubble within the launch kernel
- Add vLLM 0.10.1.1 adapter
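
The two cache_config settings named in the entries above could be written roughly as follows. This is a minimal sketch, not the project's actual configuration schema: only the keys "index_accel" and "evict_ratio" and their values come from this changelog, while the flat dictionary shape and the variable name are assumptions made for illustration.

```python
import json

# Hypothetical cache_config fragment. Only the two keys and their values
# are taken from the changelog; the surrounding structure is assumed.
cache_config = {
    "index_accel": True,   # value the changelog gives for the C++ radix tree entry
    "evict_ratio": 0.05,   # value quoted in the evict_ratio entry
}

# Serialize to JSON, e.g. for pasting into a JSON-based config file.
print(json.dumps(cache_config, indent=2))
```

Check the project's configuration documentation for the authoritative key placement before relying on this layout.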
Fixed
- Cython release package
[0.1.0] - 2025-08-29
Init
- init version
- add license