AKiRa: Augmentation Kit on Rays for Optical Video Generation
June 30, 2025

Overview
TL;DR:
We introduce AKiRa (Augmentation Kit on Rays), a ray-space augmentation framework built on Plücker coordinates that lets video generation models directly control camera and lens parameters, including:
focal length, lens distortion, aperture, and focus point (for bokeh effects).
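To make the ray-space idea concrete, here is a minimal sketch of how a per-pixel Plücker ray map can be computed for a pinhole camera. It is an illustration only, not the AKiRa implementation: the function name `plucker_rays`, the world-to-camera convention `x_cam = R @ x_world + t`, and the (moment, direction) channel ordering are all assumptions made for this example.

```python
import numpy as np

def plucker_rays(K, R, t, height, width):
    """Return an (H, W, 6) map of Plücker coordinates, one ray per pixel.

    Assumptions (hypothetical, for illustration): pinhole intrinsics K,
    world-to-camera extrinsics x_cam = R @ x_world + t, and channels
    ordered as (moment, direction).
    """
    # Pixel centres in homogeneous image coordinates, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)

    # Back-project to camera-frame directions: d_cam = K^-1 p (row-vector form).
    dirs_cam = pix @ np.linalg.inv(K).T
    # Rotate into the world frame: d_world = R^T d_cam (row-vector form).
    dirs_world = dirs_cam @ R
    dirs_world /= np.linalg.norm(dirs_world, axis=-1, keepdims=True)

    # Camera centre in world coordinates.
    origin = -R.T @ t
    # Plücker moment m = o x d, broadcast over all pixels.
    moment = np.cross(origin, dirs_world)
    return np.concatenate([moment, dirs_world], axis=-1)

# Example: editing the intrinsics changes the ray map the model is conditioned on.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
K_zoom = K.copy()
K_zoom[0, 0] *= 2.0  # double the focal length along x
K_zoom[1, 1] *= 2.0  # double the focal length along y
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
rays = plucker_rays(K, R, t, height=48, width=64)
rays_zoom = plucker_rays(K_zoom, R, t, height=48, width=64)
```

In this sketch a longer focal length narrows the bundle of ray directions, which is one way a lens parameter can show up directly in the ray representation.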
Benchmark: FlowSim
Evaluating camera-to-video models can be challenging, especially when traditional pose estimation metrics like Absolute Pose Error (APE) and Relative Pose Error (RPE) are unreliable due to short baselines or shaky camera motions.
FlowSim offers a robust alternative by computing optical flow similarity between videos, providing a scalable metric to assess camera motion consistency in generated videos.
Key Features of FlowSim:
- Pose-Free Evaluation: Eliminates the need for explicit camera pose estimation.
- Robustness: Effective in scenarios with small translations or unstable camera movements.
- Scalability: Suitable for large-scale evaluations of synthetic-to-real generalization.
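As a rough illustration of a pose-free, flow-based comparison, the sketch below scores two precomputed optical flow sequences by the mean cosine similarity of their per-pixel flow vectors. This is an assumed formulation chosen for the example, not necessarily the metric defined in the FlowSim repository; the function name `flow_cosine_similarity` and the `(T, H, W, 2)` array layout are hypothetical, and the flow fields are assumed to come from any off-the-shelf estimator.

```python
import numpy as np

def flow_cosine_similarity(flow_a, flow_b, eps=1e-8):
    """Mean cosine similarity between two optical flow sequences.

    Both inputs are assumed to have shape (T, H, W, 2), holding (dx, dy)
    per pixel for each frame pair. Returns a value in roughly [-1, 1].
    """
    flow_a = np.asarray(flow_a, dtype=np.float64)
    flow_b = np.asarray(flow_b, dtype=np.float64)
    if flow_a.shape != flow_b.shape:
        raise ValueError("flow sequences must have the same shape")

    # Per-pixel dot product and product of magnitudes.
    dot = (flow_a * flow_b).sum(axis=-1)
    norms = np.linalg.norm(flow_a, axis=-1) * np.linalg.norm(flow_b, axis=-1)
    # eps guards against division by zero where the flow is static.
    return float(np.mean(dot / (norms + eps)))

# Example: a pan to the right compared with itself and with a pan to the left.
pan_right = np.zeros((4, 8, 8, 2))
pan_right[..., 0] = 1.0
same = flow_cosine_similarity(pan_right, pan_right)
opposite = flow_cosine_similarity(pan_right, -pan_right)
```

Because the score depends only on flow fields, it needs no camera pose estimate, which matches the motivation given above for short baselines and shaky motion.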
Explore the FlowSim repository:
Triocrossing/FlowSim
Training
bash ./dist_run.sh configs/train_akira/svd_320_576.yaml N_GPU train_akira.py
Pretrained Checkpoints:
AKiRa checkpoint on Hugging Face
Acknowledgment
Part of the codebase is adapted from CameraCtrl. Many thanks to the authors for their excellent work and their project!
License
This project is licensed under the terms of the MIT License.