AKiRa: Augmentation Kit on Rays for Optical Video Generation

June 30, 2025

Xi Wang, Robin Courant, Marc Christie, Vicky Kalogeiton

🌐 Project Webpage


Teaser

Overview

TL;DR: We introduce AKiRa (Augmentation Kit on Rays), a ray-space augmentation framework built on Plücker coordinates. It lets video generation models directly control camera and lens parameters, including focal length, lens distortion, aperture, and focus point (for bokeh effects).
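To make the ray-space idea concrete, here is a minimal sketch of how a single pixel can be mapped to a Plücker ray, and how changing the focal length changes that ray. It assumes a pinhole camera with identity rotation and a (moment, direction) ordering. The function name `plucker_ray` and all numeric values are hypothetical and are not taken from the AKiRa codebase.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def plucker_ray(u, v, focal, cx, cy, origin=(0.0, 0.0, 0.0)):
    """Return the 6D Plücker coordinates (moment, direction) of the ray
    through pixel (u, v).

    Assumption: pinhole camera, identity rotation, looking along +z,
    with its optical center at `origin`.
    """
    # Back-project the pixel to a unit direction using the intrinsics.
    d = normalize(((u - cx) / focal, (v - cy) / focal, 1.0))
    # The moment is the cross product of the ray origin and its direction.
    m = cross(origin, d)
    return m + d


# Same pixel on the right edge of a 576x320 frame, two focal lengths.
wide = plucker_ray(576, 160, focal=300.0, cx=288.0, cy=160.0)
tele = plucker_ray(576, 160, focal=600.0, cx=288.0, cy=160.0)
print("wide direction:", wide[3:])
print("tele direction:", tele[3:])
```

With the shorter focal length the ray leans further away from the optical axis, so a lens change shows up directly in the per-pixel ray representation without touching the camera pose.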


🔬 Benchmark: FlowSim

Evaluating camera-to-video models can be challenging, especially when traditional pose estimation metrics like Absolute Pose Error (APE) and Relative Pose Error (RPE) are unreliable due to short baselines or shaky camera motions.

FlowSim offers a robust alternative by computing optical flow similarity between videos, providing a scalable metric to assess camera motion consistency in generated videos.

Key Features of FlowSim:

  • Pose-Free Evaluation: Eliminates the need for explicit camera pose estimation.
  • Robustness: Effective in scenarios with small translations or unstable camera movements.
  • Scalability: Suitable for large-scale evaluations of synthetic-to-real generalization.
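As an illustration of what comparing optical flow between two videos can look like, the sketch below averages the per-pixel cosine similarity of two flow fields. This is an assumed, simplified formulation for intuition only: the actual FlowSim metric is defined in its repository and may differ. The function name `flow_cosine_similarity` and the flow representation (a flat list of `(dx, dy)` vectors) are hypothetical.

```python
import math


def flow_cosine_similarity(flow_a, flow_b, eps=1e-8):
    """Mean per-pixel cosine similarity between two optical flow fields.

    Each flow is a flat list of (dx, dy) displacement vectors, one per pixel.
    Returns a value in roughly [-1, 1]; higher means more similar motion.
    """
    if len(flow_a) != len(flow_b) or not flow_a:
        raise ValueError("flow fields must be non-empty and the same size")
    total = 0.0
    for (ax, ay), (bx, by) in zip(flow_a, flow_b):
        dot = ax * bx + ay * by
        norm = math.hypot(ax, ay) * math.hypot(bx, by)
        # eps guards against division by zero for static pixels.
        total += dot / (norm + eps)
    return total / len(flow_a)


# Toy example: a rightward pan compared with itself and with a leftward pan.
pan_right = [(1.0, 0.0)] * 4
pan_left = [(-1.0, 0.0)] * 4
same = flow_cosine_similarity(pan_right, pan_right)
opposite = flow_cosine_similarity(pan_right, pan_left)
print(same, opposite)
```

A score of this kind needs only the flow fields of the reference and generated videos, which is why no camera pose estimation is involved.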

👉 Explore the FlowSim repository:
Triocrossing/FlowSim

🚀 Training

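Launch distributed training with the command below, replacing N_GPU with the number of GPUs to use:

bash ./dist_run.sh configs/train_akira/svd_320_576.yaml N_GPU train_akira.py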

📦 Pretrained Checkpoints:
🤗 akira checkpoint on Hugging Face

πŸ™ Acknowledgment
Part of the codebase is adapted from CameraCtrl β€” many thanks to the authors for their excellent work and their project!


License

This project is licensed under the terms of the MIT License.