Value Residual Learning

March 20, 2025

This is the official repository for ResFormer and SVFormer, the models introduced in the paper Value Residual Learning. It contains instructions for running both.
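As a rough illustration of the idea in the title: the paper describes ResFormer as adding a residual connection from the first layer's values to the values of later layers, and SVFormer as having all layers share the first layer's values. The sketch below shows only that value-mixing step on plain Python lists. The function names and the 0.5 weights are illustrative assumptions, not this repository's code, which is built on PyTorch and may weight or learn the mix differently.

```python
# Illustrative sketch only: names and weights are assumptions,
# not taken from this repository.
Matrix = list[list[float]]  # one row of value features per token


def mix_values(v_first: Matrix, v_current: Matrix,
               lam_first: float = 0.5, lam_current: float = 0.5) -> Matrix:
    """ResFormer-style step: weighted sum of first-layer and current-layer values."""
    return [
        [lam_first * a + lam_current * b for a, b in zip(row_first, row_cur)]
        for row_first, row_cur in zip(v_first, v_current)
    ]


def share_values(v_first: Matrix) -> Matrix:
    """SVFormer-style step: a later layer reuses the first layer's values as-is."""
    return [row[:] for row in v_first]


v1 = [[1.0, 2.0]]   # values computed in layer 1 (one token, two features)
vn = [[3.0, 4.0]]   # values computed in some later layer n
mixed = mix_values(v1, vn)
shared = share_values(v1)
```

In a full model, the attention weights of layer n would then be applied to `mixed` (or `shared`) in place of that layer's own values.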

Requirement

pip install transformers==4.44.2
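To confirm that the pinned version is the one actually installed in the active environment, a small check like the following may help. It uses only the standard library; the helper name `has_required_version` is hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "4.44.2"  # version pinned in the Requirement section


def has_required_version(package: str, required: str) -> bool:
    """Return True only if `package` is installed at exactly `required`."""
    try:
        return version(package) == required
    except PackageNotFoundError:
        # Package is not installed in this environment.
        return False


print("transformers", REQUIRED, "installed:",
      has_required_version("transformers", REQUIRED))
```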

Data

Download the tokenizer and place it in "data/tokenizer/RedPajama-INCITE-Base-7B".
Follow the instructions in "src_data/README.md" to prepare "processed_slimpajama_20B", then place it in "data/".
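Before training, it can be useful to verify that both items ended up where the scripts expect them. The following is a minimal sketch, assuming it is run from the repository root and that "processed_slimpajama_20B" sits directly under "data/"; the helper `missing_paths` is hypothetical.

```python
from pathlib import Path

# Paths taken from the Data section, relative to the repository root.
EXPECTED = (
    "data/tokenizer/RedPajama-INCITE-Base-7B",
    "data/processed_slimpajama_20B",
)


def missing_paths(root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED if not (base / p).exists()]


for path in missing_paths():
    print("missing:", path)
```

An empty result means both the tokenizer and the processed dataset were found.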

Analysis

The code for entropy analysis and token similarity analysis is in "analyze/get_entropy.py" and "analyze/get_simlarity.py", respectively.
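For readers unfamiliar with the two quantities, the sketch below shows common definitions: Shannon entropy of a single attention distribution, and mean pairwise cosine similarity between token vectors. This is an assumption about what the analysis measures, written on plain Python lists for clarity; the scripts in "analyze/" may compute, normalize, or aggregate these differently.

```python
import math


def attention_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of one attention distribution."""
    # Zero-probability entries contribute nothing and are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(tokens: list[list[float]]) -> float:
    """Average cosine similarity over all distinct token pairs (needs >= 2 tokens)."""
    pairs = [(i, j) for i in range(len(tokens)) for j in range(i + 1, len(tokens))]
    return sum(cosine_similarity(tokens[i], tokens[j]) for i, j in pairs) / len(pairs)
```

Under these definitions, low entropy means attention is concentrated on a few tokens, and high mean similarity means the token representations have become close to one another.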

Train

mkdir logs output

Set "CACHE" and "CODE_DIR" in each "*.sh" file you plan to run, then run bash scripts/run_llama_baseline_82M.sh and bash scripts/run_llama_resformer_82M.sh.

Relative Loss Analysis

Run analyze/plot_relative_loss.py.

Notable attempts and variants:

modded-nanogpt project: twitter, github, U-Net form of value residual
rwkv7: code, paper
BS-RoFormer: github, model
x-formers: github, code