Roadmap
January 22, 2024
Functionality
- Batched inference
- Fine-grained KV cache management
- Explore tree sparsity
- Fine-tune Medusa heads together with LM head from scratch
- Distill from any model without access to the original training data