ExploreGS-GS-DataGen
May 17, 2026
Bulk 3D Gaussian Splatting dataset generation. Given a multi-view image
collection (DL3DV-10K or MipNeRF-360), this repo trains a Gaussian Splatting
model per scene, renders train/test images, visibility masks and depth maps,
and emits per-scene metadata (partition.json, cam_extrinsics.npy,
cam_intrinsics.npy) for downstream training.
It is a fork of the Inria 3D Gaussian Splatting
codebase (train.py, render.py, scene/, gaussian_renderer/, submodules/),
with the dataset-generation orchestration layered on top. See LICENSE.md —
the original code is for non-commercial research and evaluation use.
Setup
The environment is almost identical to that of the official Inria 3D Gaussian Splatting repo; if you already have that set up, you can reuse it directly. Then install GPUtil on top of it:
pip install GPUtil
The CUDA rasterizer and KNN submodules under submodules/ are installed by the
conda environment. SIBR_viewers is the optional interactive viewer (build
separately, see the upstream 3DGS instructions). For hardware requirements,
CUDA/compiler setup, and troubleshooting, refer to the
upstream 3DGS README.
Expected input layout
The code primarily targets the DL3DV-10K dataset (we used the 960P release).
Each scene directory needs decimated images (images_4/) plus camera data:
<dataset-root>/
    <subset>/                 # e.g. "1K" for DL3DV
        <scene>/
            images_4/         # required: 4x-downscaled images
            transforms.json   # required for DL3DV
            sparse/           # required for DL3DV
Notes
Some values are hardcoded, such as the images_4 image folder name and other directory paths. Change them to match your setup.
Usage
dataset_generation.py scans the dataset root, builds one job per
(scene, split_ratio), and dispatches them across idle GPUs (via GPUtil +
a thread pool). Each job runs the requested --stages in order.
python dataset_generation.py --dataset dl3dv \
--dataset-root /path/to/DL3DV-10K-960P \
--output-dir /path/to/DL3DV_GS \
--subset 1K --split-ratio 0.1 0.3 0.5 0.7 0.9 \
--stages train render visibility
Alternatively, see run.sh.
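The job-building and dispatch step described above can be sketched roughly as follows. This is a minimal illustration, not the code in dataset_generation.py: the function names (`build_jobs`, `idle_gpu_ids`, `dispatch`), the `run_job` callback, and the load/memory thresholds are all assumptions made for the example. Only `GPUtil.getAvailable` is a real GPUtil call.

```python
import itertools
import queue
from concurrent.futures import ThreadPoolExecutor


def build_jobs(scenes, split_ratios):
    """One job per (scene, split_ratio) pair, in scene-major order."""
    return list(itertools.product(scenes, split_ratios))


def idle_gpu_ids(excluded=()):
    """Ask GPUtil for lightly loaded GPUs (thresholds are assumed values)."""
    import GPUtil  # imported lazily so the rest of the sketch runs without it

    return GPUtil.getAvailable(
        order="memory", limit=64, maxLoad=0.1, maxMemory=0.1,
        excludeID=list(excluded),
    )


def dispatch(jobs, gpu_ids, run_job):
    """Run every job on a thread pool, at most one job per GPU at a time."""
    if not gpu_ids:
        raise RuntimeError("no idle GPU available")

    free = queue.Queue()
    for gpu in gpu_ids:
        free.put(gpu)

    def worker(job):
        gpu = free.get()       # block until some GPU is free
        try:
            return run_job(job, gpu)
        finally:
            free.put(gpu)      # hand the GPU back for the next job

    with ThreadPoolExecutor(max_workers=len(gpu_ids)) as pool:
        # pool.map returns results in the same order as `jobs`
        return list(pool.map(worker, jobs))
```

In this sketch, `run_job(job, gpu)` is where a real orchestrator would launch the stage commands for one scene, for instance through `subprocess.run` with `CUDA_VISIBLE_DEVICES` set to the chosen GPU id.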
Stages
| Stage | Action |
|---|---|
| train | optimize Gaussians (train.py) |
| render | render train/test RGB + GT images (render.py) |
| visibility | render visibility masks (render.py --override_visibility_mask) |
| depth | render depth maps only (render.py --depth_only) |
| postprocess | write partition.json / cam_extrinsics.npy without training (train.py --save_mode); backfills metadata for already-trained scenes |
| partition_fix | validate partition.json against cam_extrinsics.npy; rewrite it, or delete the scene if unrecoverable |
| clean | rm -rf the scene's output directory |
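
As an illustration of the kind of check the partition_fix stage performs, the sketch below validates a partition dictionary against a camera count. It is not the repo's implementation. The function name `check_partition` is hypothetical, and it assumes that indices are zero-based, that the two splits should be disjoint, and that `num_cameras` would come from the first axis of cam_extrinsics.npy.

```python
import json


def check_partition(partition, num_cameras):
    """Return a list of problems found in a partition dict (empty if none)."""
    problems = []
    for key in ("train", "test"):
        if not isinstance(partition.get(key), list):
            problems.append(f"missing or non-list '{key}'")
    if problems:
        return problems

    train, test = partition["train"], partition["test"]

    # Assumption: a camera index should not appear in both splits.
    overlap = set(train) & set(test)
    if overlap:
        problems.append(f"indices in both splits: {sorted(overlap, key=repr)}")

    # Assumption: indices are zero-based rows of the extrinsics array.
    bad = [i for i in train + test
           if not (isinstance(i, int) and 0 <= i < num_cameras)]
    if bad:
        problems.append(f"indices out of range: {bad}")
    return problems


def check_partition_file(path, num_cameras):
    """Load partition.json from disk and run the same checks."""
    with open(path) as f:
        return check_partition(json.load(f), num_cameras)
```

With NumPy available, `num_cameras` could be taken as `numpy.load("cam_extrinsics.npy").shape[0]`, assuming the array stores one entry per camera along its first axis.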
Key options
| Flag | Description |
|---|---|
| --dataset {dl3dv,mipnerf360} | dataset preset (readiness checks, OMP threads, default subset) |
| --subset | sub-directories under the dataset root to process |
| --split-ratio | train/test split ratios; one output model per ratio |
| --cut-start / --cut-end | scene-list slice; give each server a disjoint range to split work across machines |
| --skip-existing | skip scenes that already have point_cloud.ply + cam_extrinsics.npy + partition.json |
| --max-workers | concurrent GPU jobs |
| --excluded-gpus | GPU ids to leave alone |
| --dry-run | print commands without running them |
Run python dataset_generation.py --help for the full list.
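To make --skip-existing and --cut-start / --cut-end concrete, here is a small sketch of how such checks could look. The helper names (`is_complete`, `select_scenes`) are hypothetical. The sketch assumes the iteration_30000 path from the output layout, a sorted scene list, and Python slice semantics (end index exclusive); the actual script may differ on any of these points.

```python
from pathlib import Path

# Files that mark a scene as finished, relative to its output directory.
# The iteration number is an assumption taken from the output layout.
REQUIRED = (
    "point_cloud/iteration_30000/point_cloud.ply",
    "cam_extrinsics.npy",
    "partition.json",
)


def is_complete(model_dir):
    """True if every required output file exists under model_dir."""
    model_dir = Path(model_dir)
    return all((model_dir / rel).is_file() for rel in REQUIRED)


def select_scenes(scenes, cut_start=None, cut_end=None):
    """Sort the scene names and keep the [cut_start, cut_end) slice."""
    return sorted(scenes)[cut_start:cut_end]
```

Under these assumptions, two machines could split 1000 scenes by running one with the range 0 to 500 and the other with 500 to 1000, as long as both see the same scene list.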
Output layout
<output-dir>/<subset>/<scene>_<split>/
    point_cloud/iteration_30000/point_cloud.ply
    cameras.json
    cam_extrinsics.npy / cam_intrinsics.npy
    partition.json        # {"train": [...], "test": [...]} camera indices
    train/ours_30000/{renders,gt}/
    test/ours_30000/{renders,gt}/
    train|test/ours_30000/visibility_mask_global_2ndpick_th0.5/
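For downstream code that consumes this layout, a helper along the following lines can pair each rendered image with its ground-truth counterpart. This is a sketch under assumptions: the function name `paired_images` is hypothetical, and it assumes that renders/ and gt/ hold PNG files with matching names, which this README does not state.

```python
from pathlib import Path


def paired_images(model_dir, split="test", iteration=30000):
    """Return (render_path, gt_path) pairs that share a file name."""
    base = Path(model_dir) / split / f"ours_{iteration}"
    # Assumption: both folders contain PNG files named identically per view.
    renders = {p.name: p for p in (base / "renders").glob("*.png")}
    gts = {p.name: p for p in (base / "gt").glob("*.png")}
    common = sorted(renders.keys() & gts.keys())
    return [(renders[name], gts[name]) for name in common]
```

Images present in only one of the two folders are silently dropped here; a stricter variant could raise an error instead.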
Single-scene scripts
The orchestrator wraps the standard 3DGS entry points, which can also be run directly on one scene:
python train.py -s <scene> -m <model_dir> -r 4 -i images_4 --split 0.8 --seed 0
python render.py -m <model_dir> [--override_visibility_mask | --depth_only]
python metrics.py -m <model_dir>
License
See LICENSE.md. The Gaussian Splatting code is © 2023 Inria & MPII,
provided for non-commercial research and evaluation use.