README.md

April 1, 2026 · View on GitHub

Fine-tuning the MicroVerse video generation model on microscopic biological simulation data, supporting both LoRA and full parameter training with automatic checkpoint resume.

Quick Start

1. Installation

cd train
pip install -r requirements.txt

2. Prepare Dataset

Create a directory with your videos and a metadata.json file:

data/your_dataset/
├── metadata.json
└── videos/
    ├── video_001.mp4
    ├── video_002.mp4
    └── ...

The metadata.json should be a JSON array:

[
  {
    "id": "video_001",
    "video": "videos/video_001.mp4",
    "prompt": "A detailed description of the microscopic process..."
  }
]

Fields:

Field	Type	Required	Description
`id`	string	No	Unique identifier for the sample
`video`	string	Yes	Relative path to the video file
`prompt`	string	Yes	Text description of the microscopic process

Video requirements:

Format: MP4 (H.264 recommended)
Resolution: Any (will be resized to training resolution)
Duration: 2–10 seconds recommended
FPS: Any (frames will be sampled automatically)

3. Training

LoRA Training (Recommended for getting started)

# 1.3B model — works on 1× 24 GB GPU
bash scripts/train_lora_1.3B.sh

# 14B model — works on 2× 24 GB GPUs
bash scripts/train_lora_14B.sh

Full Parameter Training

# 1.3B model — requires 2× 40 GB GPUs
bash scripts/train_full_1.3B.sh

# 14B model — requires 2× 80 GB GPUs with DeepSpeed
bash scripts/train_full_14B.sh

Edit the scripts to set:

--dataset_base_path — path to your dataset directory
--dataset_metadata_path — path to your metadata.json
--output_path — where to save checkpoints
--learning_rate — learning rate (1e-4 for LoRA, 1e-5 for full)
--num_epochs — number of training epochs
--save_steps — checkpoint save interval

4. Inference

Single video generation

python inference.py \
  --model_size 1.3B \
  --mode lora \
  --checkpoint ./outputs/lora_1.3B \
  --prompt "DNA replication inside a eukaryotic cell nucleus" \
  --output ./results/output.mp4

Batch generation from JSON

python inference.py \
  --model_size 14B \
  --mode full \
  --checkpoint ./outputs/full_14B \
  --metadata ./data/test_samples.json \
  --output ./results/batch_output/

Training Configurations

GPU Requirements

Mode	Model	Min GPUs	Min VRAM per GPU	Accelerate Config
LoRA	1.3B	1	24 GB	`accelerate_1gpu.yaml`
LoRA	14B	2	24 GB	`accelerate_2gpu.yaml`
Full	1.3B	2	40 GB	`accelerate_2gpu.yaml`
Full	14B	2	80 GB	`accelerate_deepspeed_2gpu.yaml`

LoRA vs Full Fine-tuning

Aspect	LoRA	Full
VRAM	Low	High
Speed	Fast	Slow
Checkpoint size	~100–500 MB	~3–27 GB
Quality	Good	Best
Recommended LR	1e-4	1e-5
Recommended epochs	3–5	1–2

Key Training Arguments

Argument	Default	Description
`--learning_rate`	1e-4	Learning rate
`--num_epochs`	1	Number of training epochs
`--height`	480	Video height (pixels)
`--width`	832	Video width (pixels)
`--num_frames`	81	Number of frames per video
`--save_steps`	None	Save checkpoint every N steps
`--lora_rank`	32	LoRA rank
`--lora_target_modules`	q,k,v,o,ffn.0,ffn.2	LoRA target modules
`--use_gradient_checkpointing_offload`	False	Offload gradient checkpoints to CPU
`--dataset_repeat`	1	Repeat dataset N times per epoch

Training automatically resumes from the latest checkpoint if interrupted. Checkpoints are saved as step-{N}.safetensors in the output directory.

Project Structure

train/
├── train.py                 # Training entry point
├── inference.py             # Inference entry point
├── requirements.txt         # Python dependencies
├── configs/                 # Accelerate/DeepSpeed configs
│   ├── accelerate_1gpu.yaml
│   ├── accelerate_2gpu.yaml
│   ├── accelerate_deepspeed_2gpu.yaml
│   └── accelerate_deepspeed_4gpu.yaml
├── scripts/                 # Ready-to-use shell scripts
│   ├── train_lora_1.3B.sh
│   ├── train_lora_14B.sh
│   ├── train_full_1.3B.sh
│   ├── train_full_14B.sh
│   ├── infer_lora_1.3B.sh
│   ├── infer_lora_14B.sh
│   ├── infer_full_1.3B.sh
│   ├── infer_full_14B.sh
│   └── infer_batch.sh
├── data/                    # Dataset directory
│   └── example/
│       ├── metadata.json    # Example metadata
│       └── videos/
└── diffsynth/               # Core training/inference engine

Acknowledgments

This training framework builds upon Wan2.1 and is adapted for MicroVerse microscale biological simulation.