SubTrack++

November 11, 2025

SubTrack++ is a memory- and time-efficient training framework for large language models (LLMs), designed to make high-performance LLM training more accessible. It combines Grassmannian gradient subspace tracking, projection-aware optimization, and gradient recovery scaling to improve convergence and reduce wall-time while keeping memory overhead low, without compromising accuracy.

🚀 What Makes SubTrack++ Different?

Grassmannian Subspace Tracking: Tracks low-rank gradient subspaces using geometry-aware updates, avoiding costly SVD computations and providing robust adaptation throughout training.
Projection-Aware Optimizer: Extends the Adam optimizer to reflect changes in gradient subspaces, maintaining accurate momentum updates even as the subspace evolves.
Recovery Scaling: Recovers and scales discarded gradient components to boost training performance and generalization.
Full-Parameter Training with Low Memory: Achieves state-of-the-art evaluation loss while maintaining the memory efficiency of GaLore and other low-rank methods.
Faster Convergence: Reduces pre-training wall-time by up to 43% compared to previous best methods on LLaMA models up to 7B parameters.
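
To make the subspace-tracking idea concrete, here is a minimal NumPy sketch of a single geodesic update on the Grassmannian. It is an illustration under stated assumptions, not the repository's implementation: the function names `grassmann_step` and `residual_norm`, the step size, and the matrix sizes are all hypothetical, and the sketch uses a thin SVD of the small m×r tangent matrix for clarity, which the actual code may handle differently.

```python
import numpy as np

def residual_norm(S, G):
    """Frobenius norm of the part of G that lies outside span(S)."""
    return np.linalg.norm(G - S @ (S.T @ G))

def grassmann_step(S, G, step_size):
    """One geodesic step that rotates the basis S toward the gradient G.

    S: (m, r) matrix with orthonormal columns (current subspace basis).
    G: (m, n) gradient matrix.
    """
    A = S.T @ G      # (r, n) gradient expressed in the current subspace
    R = G - S @ A    # (m, n) component of G that the subspace misses
    H = R @ A.T      # (m, r) descent direction; tangent because S.T @ H == 0
    # Thin SVD of the small tangent matrix (illustrative choice, see lead-in).
    U, sigma, Vt = np.linalg.svd(H, full_matrices=False)
    cos = np.diag(np.cos(sigma * step_size))
    sin = np.diag(np.sin(sigma * step_size))
    # Standard Grassmannian geodesic starting at S in direction H.
    return S @ Vt.T @ cos @ Vt + U @ sin @ Vt

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4                                # hypothetical sizes
G = rng.standard_normal((m, n))                    # stand-in for a gradient
S, _ = np.linalg.qr(rng.standard_normal((m, r)))   # random orthonormal start

S_new = grassmann_step(S, G, step_size=1e-3)
before, after = residual_norm(S, G), residual_norm(S_new, G)
```

The update keeps the basis orthonormal by construction, so no re-orthogonalization is needed, and for a small enough step the residual outside the subspace shrinks (`after < before`).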

📦 Installation

pip install -r requirements.txt

🧪 Running SubTrack++

Example pre-training command (LLaMA 1B on the C4 dataset):

torchrun --standalone --nproc_per_node 1 torchrun_main.py \
    --model_config configs/llama_1b.json \
    --single_gpu \
    --lr 0.0001 \
    --low_rank_scale 0.25 \
    --rank 512 \
    --subspace_update_interval 200 \
    --batch_size 8 \
    --total_batch_size 16 \
    --num_training_steps 10000 \
    --warmup_steps 1000 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 10000 \
    --optimizer low_rank_adamw \
    --st_init_step_size 10000 \
    --subspace_update_method subtrack \
    --adaptive_optimizer \
    --recovery_scaling
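
The batch-size and rank flags in the command above interact in ways that a small calculation can clarify. The sketch below rests on two assumptions that are not stated in this README: that `--total_batch_size` equals `--batch_size` times the number of processes times the number of gradient-accumulation steps, and that a low-rank optimizer stores one m×r projection matrix plus two r×n Adam moment buffers per weight matrix. The helper names and the 2048×2048 layer shape are hypothetical.

```python
def grad_accum_steps(total_batch_size, batch_size, world_size=1):
    """Micro-batches accumulated per optimizer step (assumed relationship)."""
    per_pass = batch_size * world_size
    if total_batch_size % per_pass != 0:
        raise ValueError(
            "total_batch_size must be a multiple of batch_size * world_size"
        )
    return total_batch_size // per_pass

def adam_state_elements(m, n, rank=None):
    """Number of optimizer-state elements for one (m, n) weight matrix.

    rank=None: full Adam, two (m, n) moment buffers.
    rank=r:    assumed low-rank layout, two (r, n) moment buffers
               plus one (m, r) projection matrix.
    """
    if rank is None:
        return 2 * m * n
    return 2 * rank * n + m * rank

# Values taken from the example command: batch 8, total batch 16, one process.
accum = grad_accum_steps(total_batch_size=16, batch_size=8, world_size=1)

# Hypothetical 2048 x 2048 layer with the command's --rank 512.
full = adam_state_elements(2048, 2048)
low_rank = adam_state_elements(2048, 2048, rank=512)
ratio = low_rank / full
```

Under these assumptions, the command accumulates 2 micro-batches per optimizer step, and the hypothetical layer needs 37.5% of the optimizer state that full Adam would store.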

Example scripts are provided in the script folder. Configure dataset paths and checkpoint locations as needed before running them.

The code is built on top of the GaLore repository.

📚 Citation

If you use this work, please cite:

@inproceedings{
rajabi2025subtrack,
title={SubTrack++ : Gradient Subspace Tracking for Scalable {LLM} Training},
author={Sahar Rajabi and Nayeema Nonta and Sirisha Rambhatla},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=6geRIdlFWJ}
}