GliDe with a CaPE (ICML 2024)

August 13, 2024

Official code for GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding.

The codebase is currently a bit messy, and I plan to rebuild it.

TODO

  • Triton-based Tree Attention
  • Copy-based Tree KV Cache
  • Training with a clean codebase (remove the ugly DeepSpeed setup)
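The first two TODO items refer to the usual tree-based verification step in speculative decoding. As a rough illustration only (this is not the repository's implementation, which is not shown here), the pure-Python sketch below builds an ancestor-only attention mask for a tree of draft tokens, assigns depth-based position ids, and compacts the KV cache after verification by copying the accepted path's entries into contiguous slots. The function names, the `parents` encoding (parent index, -1 for nodes attached to the last verified token, parents listed before children), and the list-based cache are all assumptions; a Triton kernel would operate on GPU tensors instead.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def tree_attention_mask(parents: Sequence[int]) -> List[List[bool]]:
    """Ancestor-only mask for a draft token tree (hypothetical helper).

    parents[i] is the index of node i's parent, or -1 if node i hangs directly
    off the last verified token. Parents must be listed before their children.
    mask[i][j] is True iff node j is node i itself or one of its ancestors.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        if not -1 <= parents[i] < i:
            raise ValueError("parents must precede their children")
        j = i
        while j != -1:  # walk up to the root, marking every node on the way
            mask[i][j] = True
            j = parents[j]
    return mask


def tree_position_ids(parents: Sequence[int], prefix_len: int) -> List[int]:
    """Siblings share a position: position = prefix_len + depth in the tree."""
    depth: List[int] = []
    for p in parents:
        depth.append(0 if p == -1 else depth[p] + 1)
    return [prefix_len + d for d in depth]


def compact_kv_cache(cache: List[T], prefix_len: int,
                     accepted: Sequence[int]) -> List[T]:
    """Copy-based compaction of the accepted path (hypothetical helper).

    cache holds one entry per position: prefix_len verified entries followed
    by one entry per tree node (node k lives at prefix_len + k). accepted
    lists the accepted nodes root-to-leaf, so its indices strictly increase
    and accepted[k] >= k; copying front to back therefore never overwrites a
    source entry that is still needed.
    """
    for k, node in enumerate(accepted):
        cache[prefix_len + k] = cache[prefix_len + node]
    del cache[prefix_len + len(accepted):]  # drop rejected branches
    return cache
```

For the four-node tree `parents = [-1, 0, 0, 1]`, node 3 attends only to nodes 0, 1 and itself, nodes 1 and 2 share a position, and accepting the path `[0, 1, 3]` shrinks the cache `["p0", "p1", "t0", "t1", "t2", "t3"]` to `["p0", "p1", "t0", "t1", "t3"]`. Copying in place keeps the accepted entries contiguous so the next decoding step can treat them like an ordinary prefix.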