ExploreGS-GS-DataGen

May 17, 2026

Bulk 3D Gaussian Splatting dataset generation. Given a multi-view image collection (DL3DV-10K or MipNeRF-360), this repo trains a Gaussian Splatting model per scene, renders train/test images, visibility masks and depth maps, and emits per-scene metadata (partition.json, cam_extrinsics.npy, cam_intrinsics.npy) for downstream training.

It is a fork of the Inria 3D Gaussian Splatting codebase (train.py, render.py, scene/, gaussian_renderer/, submodules/), with the dataset-generation orchestration layered on top. See LICENSE.md — the original code is for non-commercial research and evaluation use.

Setup

The environment is nearly identical to that of the official Inria 3D Gaussian Splatting repo, so if you already have it set up you can reuse it directly. Then install GPUtil, which the orchestrator uses to find idle GPUs:

pip install GPUtil

The CUDA rasterizer and KNN submodules under submodules/ are installed by the conda environment. SIBR_viewers is the optional interactive viewer (build separately, see the upstream 3DGS instructions). For hardware requirements, CUDA/compiler setup, and troubleshooting, refer to the upstream 3DGS README.

Expected input layout

Each scene directory needs 4x-downscaled images (images_4/) plus camera data. The code primarily targets the DL3DV-10K dataset; we used its 960P release.

<dataset-root>/
  <subset>/                # e.g. "1K" for DL3DV
    <scene>/
      images_4/            # required: 4x-downscaled images
      transforms.json      # required for DL3DV
      sparse/              # required for DL3DV
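The readiness checks behind the --dataset preset are not documented in detail. As an assumption, a minimal DL3DV check might only test that each scene contains the three entries listed above. The helpers missing_inputs and ready_scenes below are hypothetical, not functions from dataset_generation.py.

```python
from pathlib import Path

# Entries each DL3DV scene must contain, per the layout above.
# "images_4" mirrors the hardcoded folder name; change it if your data differs.
REQUIRED_DL3DV = ("images_4", "transforms.json", "sparse")


def missing_inputs(scene_dir, required=REQUIRED_DL3DV):
    """Return the required entries that are absent from scene_dir."""
    scene_dir = Path(scene_dir)
    return [name for name in required if not (scene_dir / name).exists()]


def ready_scenes(dataset_root, subset, required=REQUIRED_DL3DV):
    """List scene names under <dataset_root>/<subset> that have every required input."""
    subset_dir = Path(dataset_root) / subset
    return sorted(
        p.name
        for p in subset_dir.iterdir()
        if p.is_dir() and not missing_inputs(p, required)
    )
```

Scenes missing any entry are left out of ready_scenes, so a partially downloaded subset can still be processed.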

Notes

Some values are hardcoded, such as the images_4 folder name and other image directory paths. Adjust them to match your setup.

Usage

dataset_generation.py scans the dataset root, builds one job per (scene, split_ratio), and dispatches them across idle GPUs (via GPUtil + a thread pool). Each job runs the requested --stages in order.

python dataset_generation.py --dataset dl3dv \
    --dataset-root /path/to/DL3DV-10K-960P \
    --output-dir   /path/to/DL3DV_GS \
    --subset 1K --split-ratio 0.1 0.3 0.5 0.7 0.9 \
    --stages train render visibility

Alternatively, see run.sh for an example invocation.
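The fan-out described in this section (one job per (scene, split_ratio), stages run in order, jobs spread over GPUs by a thread pool) can be sketched as follows. This is a minimal sketch under stated assumptions: build_jobs, run_jobs and run_stage are hypothetical names, and a queue of fixed GPU ids stands in for the GPUtil idle-GPU query so the sketch runs without any GPU.

```python
import itertools
import queue
from concurrent.futures import ThreadPoolExecutor


def build_jobs(scenes, split_ratios, stages):
    """One job per (scene, split_ratio); every job keeps the given stage order."""
    return [
        {"scene": scene, "split": ratio, "stages": list(stages)}
        for scene, ratio in itertools.product(scenes, split_ratios)
    ]


def run_jobs(jobs, gpu_ids, max_workers, run_stage):
    """Dispatch jobs over a thread pool, leasing one GPU id per job.

    The queue of free GPU ids is a stand-in for GPUtil (assumption).
    run_stage(job, stage, gpu_id) performs the work of a single stage.
    """
    free_gpus = queue.Queue()
    for gpu in gpu_ids:
        free_gpus.put(gpu)

    def worker(job):
        gpu = free_gpus.get()  # blocks until some GPU is free
        try:
            for stage in job["stages"]:  # stages run strictly in order
                run_stage(job, stage, gpu)
        finally:
            free_gpus.put(gpu)  # hand the GPU back for the next job
        return job["scene"], job["split"]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in job order, regardless of finish order
        return list(pool.map(worker, jobs))
```

Excluded GPUs would simply be filtered out of gpu_ids. In a GPUtil-based variant, the pool could be seeded with something like GPUtil.getAvailable(limit=8, maxLoad=0.1, maxMemory=0.1, excludeID=excluded), though how dataset_generation.py actually queries GPUtil is not shown here.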

Stages

| Stage | Action |
|---|---|
| train | optimize Gaussians (train.py) |
| render | render train/test RGB + GT images (render.py) |
| visibility | render visibility masks (render.py --override_visibility_mask) |
| depth | render depth maps only (render.py --depth_only) |
| postprocess | write partition.json / cam_extrinsics.npy without training (train.py --save_mode); backfills metadata for already-trained scenes |
| partition_fix | validate partition.json against cam_extrinsics.npy; rewrite it, or delete the scene if unrecoverable |
| clean | rm -rf the scene's output directory |
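The exact checks partition_fix applies are not spelled out. The sketch below assumes a partition is valid when every index is an integer in [0, N), where N is the number of cameras in cam_extrinsics.npy, neither split contains duplicates, the splits are disjoint, and together they cover every camera. The coverage rule in particular is an assumption, and validate_partition is a hypothetical helper.

```python
def validate_partition(partition, num_cameras):
    """Return a list of problems with a {"train": [...], "test": [...]} partition.

    An empty list means the partition passes these (assumed) checks.
    """
    train, test = partition.get("train"), partition.get("test")
    if not isinstance(train, list) or not isinstance(test, list):
        return ["'train' and 'test' must both be lists"]
    if not all(isinstance(i, int) for i in train + test):
        return ["camera indices must be integers"]

    problems = []
    for name, idx in (("train", train), ("test", test)):
        out_of_range = sorted(i for i in idx if not 0 <= i < num_cameras)
        if out_of_range:
            problems.append(f"{name}: out-of-range indices {out_of_range}")
        if len(set(idx)) != len(idx):
            problems.append(f"{name}: duplicate indices")

    overlap = sorted(set(train) & set(test))
    if overlap:
        problems.append(f"train/test overlap {overlap}")

    # Assumed rule: every camera belongs to exactly one split.
    missing = sorted(set(range(num_cameras)) - set(train) - set(test))
    if missing:
        problems.append(f"cameras in neither split {missing}")
    return problems
```

With NumPy, num_cameras could come from np.load(<model_dir>/cam_extrinsics.npy).shape[0], assuming the array stores one extrinsic matrix per camera along its first axis.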

Key options

| Flag | Description |
|---|---|
| --dataset {dl3dv,mipnerf360} | dataset preset (readiness checks, OMP threads, default subset) |
| --subset | sub-directories under the dataset root to process |
| --split-ratio | train/test split ratios; one output model per ratio |
| --cut-start / --cut-end | scene-list slice; give each server a disjoint range to split work across machines |
| --skip-existing | skip scenes that already have point_cloud.ply + cam_extrinsics.npy + partition.json |
| --max-workers | number of concurrent GPU jobs |
| --excluded-gpus | GPU ids to leave alone |
| --dry-run | print commands without running them |

Run python dataset_generation.py --help for the full list.
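--cut-start and --cut-end slice the scene list so several servers can share one dataset. Assuming Python slice semantics (end exclusive) and that every machine enumerates scenes in the same sorted order, the sketch below computes one balanced, disjoint (start, end) pair per server; cut_ranges is a hypothetical helper.

```python
def cut_ranges(num_scenes, num_servers):
    """Split range(num_scenes) into num_servers contiguous, disjoint slices.

    Returns (cut_start, cut_end) pairs with Python slice semantics (end is
    exclusive); the first num_scenes % num_servers slices get one extra scene.
    """
    base, extra = divmod(num_scenes, num_servers)
    ranges, start = [], 0
    for k in range(num_servers):
        end = start + base + (1 if k < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

For example, cut_ranges(10, 3) gives [(0, 4), (4, 7), (7, 10)], so the second server would run with --cut-start 4 --cut-end 7.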

Output layout

<output-dir>/<subset>/<scene>_<split>/
  point_cloud/iteration_30000/point_cloud.ply
  cameras.json
  cam_extrinsics.npy / cam_intrinsics.npy
  partition.json                 # {"train": [...], "test": [...]} camera indices
  train/ours_30000/{renders,gt}/
  test/ours_30000/{renders,gt}/
  train|test/ours_30000/visibility_mask_global_2ndpick_th0.5/
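The --skip-existing rule and the partition.json format can be exercised against the layout above with a small sketch. It assumes the iteration_30000 checkpoint folder shown there; is_complete and load_partition are hypothetical helpers, not part of this repo.

```python
import json
from pathlib import Path

ITERATION = 30000  # matches point_cloud/iteration_30000 in the layout above


def is_complete(model_dir, iteration=ITERATION):
    """True if model_dir has the three files --skip-existing looks for."""
    model_dir = Path(model_dir)
    required = [
        model_dir / "point_cloud" / f"iteration_{iteration}" / "point_cloud.ply",
        model_dir / "cam_extrinsics.npy",
        model_dir / "partition.json",
    ]
    return all(p.is_file() for p in required)


def load_partition(model_dir):
    """Read partition.json and return (train_indices, test_indices)."""
    with open(Path(model_dir) / "partition.json") as f:
        part = json.load(f)
    return part["train"], part["test"]
```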

Single-scene scripts

The orchestrator wraps the standard 3DGS entry points, which can also be run directly on one scene:

python train.py  -s <scene> -m <model_dir> -r 4 -i images_4 --split 0.8 --seed 0
python render.py -m <model_dir> [--override_visibility_mask | --depth_only]
python metrics.py -m <model_dir>
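To chain these entry points for one scene from Python, a sketch like the following reuses exactly the flags shown above. The stage order and the inclusion of a separate visibility pass are assumptions, and single_scene_commands and run_single_scene are hypothetical helpers; with dry_run=True it only prints the commands, mirroring --dry-run.

```python
import subprocess
import sys


def single_scene_commands(scene, model_dir, split=0.8, seed=0):
    """Build train -> render -> visibility -> metrics command lines for one scene."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "train.py", "-s", scene, "-m", model_dir,
         "-r", "4", "-i", "images_4", "--split", str(split), "--seed", str(seed)],
        [py, "render.py", "-m", model_dir],
        [py, "render.py", "-m", model_dir, "--override_visibility_mask"],
        [py, "metrics.py", "-m", model_dir],
    ]


def run_single_scene(scene, model_dir, dry_run=True, **kwargs):
    """Print (dry run) or execute each command, stopping at the first failure."""
    cmds = single_scene_commands(scene, model_dir, **kwargs)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds
```

Using check=True means a failed training run stops the chain before render.py tries to load a missing model.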

License

See LICENSE.md. The Gaussian Splatting code is © 2023 Inria & MPII, provided for non-commercial research and evaluation use.