Reproducing LLQR Paper Results

May 27, 2026

This guide collects the commands for the paper-facing LLQR runs.

Run commands from the repository root. Replace placeholder paths such as <aim_repo>, <imagenet2012_location>, and <iwslt14_tokenized_de_en_location> with local cluster paths before launching.

Use the commands below to reproduce the main experiment families and compare your runs with the corresponding paper figures and tables.

Conventions

Use uv run python run.py ... from this repository root.
For your own runs, save the exact command, Git commit, host, accelerator type, seed, dataset path, Aim repository, Aim run hash, final metric, best metric, and checkpoint path.
Treat full CIFAR architecture sweeps, ImageNet, and IWSLT14 translation as GPU or cluster jobs.
For multi-seed runs, add init_key=<seed> and save the seed list next to the resulting table or figure.
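The record-keeping and multi-seed conventions above can be scripted. The following is a minimal sketch, not part of the repository: the `record_run` helper, the `RUN_LOG` variable, the `runs.tsv` file name, and the seed list are all hypothetical. It captures only what is known at launch time (timestamp, host, Git commit, seed, command); the Aim run hash, metrics, and checkpoint path still have to be added once the run finishes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one provenance line per launch to a TSV file.
# Usage: record_run <log_file> <seed> <command...>
record_run() {
  local log_file="$1" seed="$2"
  shift 2
  local commit
  # Fall back to "unknown" when not inside a Git checkout.
  commit="$(git rev-parse --verify HEAD 2>/dev/null || echo unknown)"
  printf '%s\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(uname -n)" "${commit}" "${seed}" "$*" \
    >> "${log_file}"
}

# Example seed list (an assumption); keep it next to the resulting table or figure.
seeds=(0 1 2)
for seed in "${seeds[@]}"; do
  # Other overrides are omitted here for brevity.
  cmd=(uv run python run.py experiment=resnet18-cifar100 "init_key=${seed}")
  record_run "${RUN_LOG:-runs.tsv}" "${seed}" "${cmd[@]}"
  # Print the command for review; run "${cmd[@]}" instead to launch it.
  echo "${cmd[*]}"
done
```

Each launch adds one tab-separated line, so the log can be joined against the final table by seed.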

Reproduction Matrix

Paper surface	Family	Presets	Main sweep	Data requirement
Layerwise LQR	CIFAR architecture runs	`resnet18-`, `vgg16bn-`, `wide-resnet28x10-`, `pyramidnet110-`	`sam_mode=null` x `(divergence, ema_decay)`	CIFAR via configured loader
Layerwise LQR	ImageNet ResNet-50	`resnet50-imagenet`, `short-resnet50-imagenet`	`sam_mode=null` x `(divergence, ema_decay)`	ImageNet 2012 directory
Layerwise LQR	IWSLT14 De-En Transformer	`transformer-iwslt14-de-en`	`sam_mode=null`, fixed `ngd/0.925`	tokenized fairseq-style IWSLT14 directory
LLQR + SAM	CIFAR architecture runs	`resnet18-`, `vgg16bn-`, `wide-resnet28x10-`, `pyramidnet110-`	`base_sam`, `base_fsam`, `fisher_sam` x `(divergence, ema_decay)` x LLQR toggle	CIFAR via configured loader
LLQR + SAM	ImageNet ResNet-50	`resnet50-imagenet`, `short-resnet50-imagenet`	`base_sam`, `base_fsam`, `fisher_sam` x `(divergence, ema_decay)` x LLQR toggle	ImageNet 2012 directory
LLQR + SAM	IWSLT14 De-En Transformer	`transformer-iwslt14-de-en`	`base_sam`, `base_fsam`, `fisher_sam`, fixed `ngd/0.925`, LLQR toggle	tokenized fairseq-style IWSLT14 directory
LLQR large-batch route	ImageNet ResNet-50	`resnet50-imagenet`	chunked grouped LLQR update route	ImageNet 2012 directory

LLQR Toggle Convention

For base_sam and base_fsam ablations, set use_preconditioner and sam_use_preconditioner_on_update together, along with the matching perturb_mode.

LLQR-disabled:

use_preconditioner=false sam_use_preconditioner_on_update=false perturb_mode=ema_grad

LLQR-enabled:

use_preconditioner=true sam_use_preconditioner_on_update=true perturb_mode=ema_precond_grad

For fisher_sam, use perturbation_rho=0.1 perturb_mode=ema_grad. Canonical Fisher-SAM uses the vanilla outer update, so sam_use_preconditioner_on_update is intentionally inert for that mode.
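To avoid retyping the override blocks, the convention can be wrapped in a small function. This is a sketch only: `llqr_toggle_args` is a hypothetical helper, not something the repository provides, and it simply prints the override strings listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the override block for a SAM mode and LLQR toggle.
# Usage: llqr_toggle_args <sam_mode> <enabled|disabled>
llqr_toggle_args() {
  local sam_mode="$1" toggle="$2"
  case "${sam_mode}" in
    fisher_sam)
      # Fisher-SAM uses its own settings; the LLQR toggle does not apply.
      echo "perturbation_rho=0.1 perturb_mode=ema_grad"
      ;;
    base_sam|base_fsam)
      case "${toggle}" in
        enabled)
          echo "use_preconditioner=true sam_use_preconditioner_on_update=true perturb_mode=ema_precond_grad"
          ;;
        disabled)
          echo "use_preconditioner=false sam_use_preconditioner_on_update=false perturb_mode=ema_grad"
          ;;
        *)
          echo "unknown toggle: ${toggle}" >&2
          return 1
          ;;
      esac
      ;;
    *)
      echo "unknown sam_mode: ${sam_mode}" >&2
      return 1
      ;;
  esac
}

# Example: split the printed overrides into an array and show the full command.
read -r -a toggle_args <<< "$(llqr_toggle_args base_sam enabled)"
echo uv run python run.py experiment=resnet18-cifar100 sam_mode=base_sam "${toggle_args[@]}"
```

Returning a non-zero status for an unknown mode or toggle keeps a typo from silently launching a run with the wrong overrides.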

LLQR Paper Runs Without SAM

The Layerwise LQR paper runs use the same experiment presets as the overlapping SAM sections, with sam_mode=null and LLQR preconditioning enabled.

For CIFAR, use:

llqr_experiments=(
  resnet18-cifar100
  vgg16bn-cifar100
  wide-resnet28x10-cifar100
  pyramidnet110-cifar100
  resnet18-cifar10
  vgg16bn-cifar10
  wide-resnet28x10-cifar10
  pyramidnet110-cifar10
)

divergence_pairs=("ngd 0.95" "newton 0.9")

for experiment in "${llqr_experiments[@]}"; do
  for pair in "${divergence_pairs[@]}"; do
    read -r divergence ema_decay <<< "${pair}"

    uv run python run.py \
      experiment="${experiment}" \
      sam_mode=null \
      use_preconditioner=true \
      divergence="${divergence}" \
      ema_decay="${ema_decay}" \
      "logging.aim_repo=<aim_repo>"
  done
done

For ImageNet ResNet-50, use:

llqr_imagenet_experiments=(
  resnet50-imagenet
  short-resnet50-imagenet
)

divergence_pairs=("ngd 0.95" "newton 0.9")

for experiment in "${llqr_imagenet_experiments[@]}"; do
  for pair in "${divergence_pairs[@]}"; do
    read -r divergence ema_decay <<< "${pair}"

    uv run python run.py \
      experiment="${experiment}" \
      "dataset.dataset_dir=<imagenet2012_location>" \
      sam_mode=null \
      use_preconditioner=true \
      divergence="${divergence}" \
      ema_decay="${ema_decay}" \
      "logging.aim_repo=<aim_repo>"
  done
done

For IWSLT14 German-to-English translation, first follow the data preparation flow in the IWSLT14 Transformer LLQR + SAM Paper Runs section, then launch:

uv run python run.py \
  experiment=transformer-iwslt14-de-en \
  "dataset.dataset_dir=<iwslt14_tokenized_de_en_location>" \
  sam_mode=null \
  use_preconditioner=true \
  divergence=ngd \
  ema_decay=0.925 \
  "logging.aim_repo=<aim_repo>"

CIFAR LLQR + SAM Paper Runs

The CIFAR paper sweep uses these experiment presets:

resnet18-cifar100
vgg16bn-cifar100
wide-resnet28x10-cifar100
pyramidnet110-cifar100
resnet18-cifar10
vgg16bn-cifar10
wide-resnet28x10-cifar10
pyramidnet110-cifar10

The CIFAR sweep varies:

sam_mode: base_sam, base_fsam, fisher_sam
paired divergence and EMA settings: divergence=ngd ema_decay=0.95 and divergence=newton ema_decay=0.9
the LLQR toggle pair for base_sam and base_fsam
init_key, when running additional seeds

Launch the main variants:

experiments=(
  resnet18-cifar100
  vgg16bn-cifar100
  wide-resnet28x10-cifar100
  pyramidnet110-cifar100
  resnet18-cifar10
  vgg16bn-cifar10
  wide-resnet28x10-cifar10
  pyramidnet110-cifar10
)

sam_modes=(base_sam base_fsam fisher_sam)
divergence_pairs=("ngd 0.95" "newton 0.9")

for experiment in "${experiments[@]}"; do
  for sam_mode in "${sam_modes[@]}"; do
    for pair in "${divergence_pairs[@]}"; do
      read -r divergence ema_decay <<< "${pair}"
      # fisher_sam needs its own perturbation settings; other modes add none here.
      fisher_args=()
      if [ "${sam_mode}" = "fisher_sam" ]; then
        fisher_args=(perturbation_rho=0.1 perturb_mode=ema_grad)
      fi

      uv run python run.py \
        experiment="${experiment}" \
        sam_mode="${sam_mode}" \
        divergence="${divergence}" \
        ema_decay="${ema_decay}" \
        "logging.aim_repo=<aim_repo>" \
        "${fisher_args[@]}"
    done
  done
done

Add the LLQR-disabled or LLQR-enabled override block from LLQR Toggle Convention to each base_sam and base_fsam command as needed.
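The sketch below shows one way to enumerate the full CIFAR grid with the toggle applied, printing each command for review instead of launching it. It assumes both toggle states are wanted for every base_sam and base_fsam combination, which gives 8 experiments x 2 divergence pairs x (2 modes x 2 toggles + 1 fisher_sam) = 80 commands. `print_cifar_variants` and the `AIM_REPO` environment variable are hypothetical names.

```shell
#!/usr/bin/env bash
experiments=(
  resnet18-cifar100 vgg16bn-cifar100 wide-resnet28x10-cifar100 pyramidnet110-cifar100
  resnet18-cifar10 vgg16bn-cifar10 wide-resnet28x10-cifar10 pyramidnet110-cifar10
)
divergence_pairs=("ngd 0.95" "newton 0.9")

# Override blocks copied from LLQR Toggle Convention.
llqr_off="use_preconditioner=false sam_use_preconditioner_on_update=false perturb_mode=ema_grad"
llqr_on="use_preconditioner=true sam_use_preconditioner_on_update=true perturb_mode=ema_precond_grad"
fisher="perturbation_rho=0.1 perturb_mode=ema_grad"

# Hypothetical: read the Aim repository from the environment, else keep the placeholder.
aim_repo="${AIM_REPO:-<aim_repo>}"

# Print one command line per variant (a dry run; nothing is launched).
print_cifar_variants() {
  local experiment pair divergence ema_decay sam_mode extra base
  for experiment in "${experiments[@]}"; do
    for pair in "${divergence_pairs[@]}"; do
      read -r divergence ema_decay <<< "${pair}"
      base="uv run python run.py experiment=${experiment} divergence=${divergence} ema_decay=${ema_decay} logging.aim_repo=${aim_repo}"
      for sam_mode in base_sam base_fsam; do
        for extra in "${llqr_off}" "${llqr_on}"; do
          echo "${base} sam_mode=${sam_mode} ${extra}"
        done
      done
      # fisher_sam has no LLQR toggle, so it appears once per divergence pair.
      echo "${base} sam_mode=fisher_sam ${fisher}"
    done
  done
}

print_cifar_variants
```

Replace the `<aim_repo>` placeholder (or set `AIM_REPO`) before turning any printed line into a real launch.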

ImageNet ResNet-50 LLQR + SAM Paper Runs

The ImageNet reproduction presets are:

resnet50-imagenet: current config uses total_epochs=100 and precond_batch_size=256.
short-resnet50-imagenet: current config uses total_epochs=50 and precond_batch_size=128.

Provide the ImageNet 2012 data location at launch time with dataset.dataset_dir=<imagenet2012_location>.

The ImageNet sweep uses the same sam_mode and divergence pairs as CIFAR. The LLQR-enabled SAM setting is perturbation_rho=0.075 gbar_beta=0.6 perturb_mode=ema_precond_grad; keep those as explicit overrides for this experiment family.

Launch the main ImageNet variants:

imagenet_experiments=(
  resnet50-imagenet
  short-resnet50-imagenet
)

sam_modes=(base_sam base_fsam fisher_sam)
divergence_pairs=("ngd 0.95" "newton 0.9")

for experiment in "${imagenet_experiments[@]}"; do
  for sam_mode in "${sam_modes[@]}"; do
    for pair in "${divergence_pairs[@]}"; do
      read -r divergence ema_decay <<< "${pair}"
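      # Defaults for base_sam and base_fsam: the LLQR-enabled SAM setting.
      paper_args=(perturbation_rho=0.075 gbar_beta=0.6)
      llqr_args=(
        use_preconditioner=true
        sam_use_preconditioner_on_update=true
        perturb_mode=ema_precond_grad
      )
      # fisher_sam uses its own perturbation settings and no LLQR toggle.
      if [ "${sam_mode}" = "fisher_sam" ]; then
        paper_args=(perturbation_rho=0.1 perturb_mode=ema_grad)
        llqr_args=()
      fi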

      uv run python run.py \
        experiment="${experiment}" \
        "dataset.dataset_dir=<imagenet2012_location>" \
        sam_mode="${sam_mode}" \
        divergence="${divergence}" \
        ema_decay="${ema_decay}" \
        "logging.aim_repo=<aim_repo>" \
        "${paper_args[@]}" \
        "${llqr_args[@]}"
    done
  done
done

For base_sam and base_fsam, replace llqr_args with the LLQR-disabled override block when running the non-LLQR ablation.

IWSLT14 Transformer LLQR + SAM Paper Runs

The IWSLT14 reproduction preset is transformer-iwslt14-de-en. It expects the fairseq-style tokenized IWSLT14 German-to-English directory. Either set IWSLT14_DATA_DIR or pass dataset.dataset_dir=<iwslt14_tokenized_de_en_location> at launch time.

Prepare the tokenized text dataset once on a login or prep node:

SCRATCH=/path/to/scratch bash scripts/prepare_iwslt14_de_en_scratch.sh

The helper downloads the public IWSLT14 De-En archive, uses Moses and subword-nmt, and writes:

$SCRATCH/iwslt14_de_en_cache/iwslt14.tokenized.de-en/
$SCRATCH/iwslt14_de_en_cache/iwslt14.tokenized.de-en.tar.gz

If Moses or subword-nmt already exist on the cluster, point the helper at the shared installs:

SCRATCH=/path/to/scratch \
MOSES_ROOT=/path/to/mosesdecoder \
SUBWORD_NMT_ROOT=/path/to/subword-nmt \
bash scripts/prepare_iwslt14_de_en_scratch.sh

Then stage the packed dataset into node-local storage inside each SLURM job:

cd "$SLURM_TMPDIR"
tar -xzf "$SCRATCH/iwslt14_de_en_cache/iwslt14.tokenized.de-en.tar.gz"
export IWSLT14_DATA_DIR="$SLURM_TMPDIR/iwslt14.tokenized.de-en"
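A job script may want to fail early if staging did not produce the expected directory. The following is a minimal sketch under that assumption; `stage_iwslt14` is a hypothetical helper, and it relies on the archive unpacking to a top-level `iwslt14.tokenized.de-en/` directory, as the commands above imply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the packed dataset and verify the result.
# Usage: stage_iwslt14 <tarball> <destination_dir>; prints the dataset path.
stage_iwslt14() {
  local tarball="$1" dest="$2"
  local data_dir="${dest}/iwslt14.tokenized.de-en"
  if [ ! -f "${tarball}" ]; then
    echo "missing tarball: ${tarball}" >&2
    return 1
  fi
  # Skip extraction if an earlier step in the same job already staged the data.
  if [ ! -d "${data_dir}" ]; then
    tar -xzf "${tarball}" -C "${dest}" || return 1
  fi
  if [ ! -d "${data_dir}" ]; then
    echo "archive did not contain iwslt14.tokenized.de-en/" >&2
    return 1
  fi
  echo "${data_dir}"
}

# Inside a SLURM job, stage the data and export the path for the loader.
if [ -n "${SLURM_TMPDIR:-}" ] && [ -n "${SCRATCH:-}" ]; then
  IWSLT14_DATA_DIR="$(stage_iwslt14 \
    "${SCRATCH}/iwslt14_de_en_cache/iwslt14.tokenized.de-en.tar.gz" \
    "${SLURM_TMPDIR}")" || exit 1
  export IWSLT14_DATA_DIR
fi
```

Outside a SLURM job the final block is skipped, so the function can be sourced and tested on a login node.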

The training loader builds .llqr_numeric_cache/ inside the extracted tokenized directory on first use.

The explored IWSLT14 setup uses divergence=ngd ema_decay=0.925. Do not use the CIFAR/ImageNet Newton pair unless a new reproduction note validates it for translation. The preset keeps start_sam_after_step=4000, so active SAM begins after inverse-sqrt warmup.

Launch the documented IWSLT14 variants:

iwslt_sam_modes=(base_sam base_fsam fisher_sam)

for sam_mode in "${iwslt_sam_modes[@]}"; do
  fisher_args=()
  if [ "${sam_mode}" = "fisher_sam" ]; then
    fisher_args=(perturbation_rho=0.1 perturb_mode=ema_grad)
  fi

  uv run python run.py \
    experiment=transformer-iwslt14-de-en \
    "dataset.dataset_dir=<iwslt14_tokenized_de_en_location>" \
    sam_mode="${sam_mode}" \
    divergence=ngd \
    ema_decay=0.925 \
    "logging.aim_repo=<aim_repo>" \
    "${fisher_args[@]}"
done

For base_sam and base_fsam, add the LLQR-disabled or LLQR-enabled override block from LLQR Toggle Convention as needed.

Large-Batch ResNet-50 Route

For ResNet-50/ImageNet runs that need precond_batch_size=256 on the validated A100 surface, prefer grouped outer chunking with the default exact mixed-term second-order route:

uv run python run.py \
  experiment=resnet50-imagenet \
  "dataset.dataset_dir=<imagenet2012_location>" \
  llqr_batch_update_mode=chunked_lqr_segment \
  llqr_batch_update_chunk_size=128 \
  llqr_use_fast_paths=true \
  "logging.aim_repo=<aim_repo>"

Keep llqr_second_order_mode=batched_exact and llqr_second_order_chunk_size=null for this route. The opt-in llqr_second_order_mode=sample_separable_exact route is an exact fallback for eligible grouped LLQR segments; it is not the recommended compute path when grouped chunked batched_exact already fits.
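If you prefer to pin the second-order settings rather than rely on the preset defaults, the overrides can be collected in one array and printed before launch. This sketch assumes that llqr_second_order_mode and llqr_second_order_chunk_size are accepted as command-line overrides in the same way as the other keys in this guide; `IMAGENET_DIR` and `AIM_REPO` are hypothetical environment variables.

```shell
#!/usr/bin/env bash
# Assumption: the two llqr_second_order_* keys can be passed as overrides
# like the other keys; they restate the values recommended for this route.
large_batch_args=(
  experiment=resnet50-imagenet
  "dataset.dataset_dir=${IMAGENET_DIR:-<imagenet2012_location>}"
  llqr_batch_update_mode=chunked_lqr_segment
  llqr_batch_update_chunk_size=128
  llqr_use_fast_paths=true
  llqr_second_order_mode=batched_exact
  llqr_second_order_chunk_size=null
  "logging.aim_repo=${AIM_REPO:-<aim_repo>}"
)

# Dry run: print the command and one override per line for review.
# To launch, run: uv run python run.py "${large_batch_args[@]}"
printf '%s\n' "uv run python run.py" "${large_batch_args[@]}"
```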