Benchmarking PySceneDetect
May 28, 2026
Benchmarks PySceneDetect's detection accuracy and latency against public shot-boundary-detection corpora. Scoring follows the TRECVID-SBD convention (greedy 1-to-1 nearest-neighbor matching with a configurable frame tolerance for hard cuts; point-in-interval matching for fade transitions; mean absolute frame offset on matched events) so numbers are comparable to published SBD results.
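To make the hard-cut scoring concrete, here is a minimal Python sketch of greedy 1-to-1 nearest-neighbor matching with a frame tolerance, together with the recall, precision, F1, and mean absolute frame offset derived from it. The names `match_cuts` and `score_cuts` are hypothetical and this is not the benchmark's own code; tie-breaking and edge-case handling in the real implementation may differ.

```python
from typing import Dict, List, Tuple


def match_cuts(predicted: List[int], truth: List[int],
               tolerance: int = 0) -> List[Tuple[int, int]]:
    """Pair predicted and ground-truth cut frames one-to-one, closest first.

    Hypothetical helper: assumes cuts are given as integer frame numbers.
    """
    # Every (offset, predicted index, truth index) candidate within tolerance.
    candidates = sorted(
        (abs(p - t), i, j)
        for i, p in enumerate(predicted)
        for j, t in enumerate(truth)
        if abs(p - t) <= tolerance
    )
    used_pred, used_truth = set(), set()
    pairs = []
    for _, i, j in candidates:
        # Greedy step: accept the closest remaining pair; each event is used once.
        if i not in used_pred and j not in used_truth:
            used_pred.add(i)
            used_truth.add(j)
            pairs.append((predicted[i], truth[j]))
    return pairs


def score_cuts(predicted: List[int], truth: List[int],
               tolerance: int = 0) -> Dict[str, float]:
    """Compute recall, precision, F1, and mean absolute offset of matched cuts."""
    pairs = match_cuts(predicted, truth, tolerance)
    matched = len(pairs)
    recall = matched / len(truth) if truth else 0.0
    precision = matched / len(predicted) if predicted else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    offset = sum(abs(p - t) for p, t in pairs) / matched if matched else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mean_abs_offset": offset,
    }
```

With `predicted=[100, 205, 400]` and `truth=[100, 200, 300]`, a tolerance of 5 matches two of the three cuts, while a tolerance of 0 accepts only the frame-exact match at frame 100.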
Supported datasets:
- BBC Planet Earth: 11 long-form broadcast clips; hard cuts only
- AutoShot: Short-form web clips; hard cuts only
- ClipShots: Short-form web clips; hard cuts and typed gradual transitions (fades/dissolves)
Usage
# Single detector x single dataset:
python -m benchmark --detector detect-content --dataset BBC
Pass --help for --dataset-root, --backend, --tolerance, and --out options.
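To run several detectors against one dataset, the single-run command can be wrapped in a small loop. The sketch below only assembles and prints the commands unless `dry_run=False` is passed. Note the assumptions: only `detect-content` appears in the usage above, the other detector names are borrowed from PySceneDetect's CLI command names and may not match what the benchmark accepts, and `build_command` and `run_all` are hypothetical helpers.

```python
import subprocess
import sys
from typing import List, Optional

# Only "detect-content" is confirmed by the usage above; the remaining names
# are assumptions based on PySceneDetect's CLI command names.
DETECTORS = [
    "detect-adaptive",
    "detect-content",
    "detect-hash",
    "detect-hist",
    "detect-threshold",
]


def build_command(detector: str, dataset: str,
                  dataset_root: Optional[str] = None) -> List[str]:
    """Assemble one benchmark invocation as an argument list."""
    cmd = [sys.executable, "-m", "benchmark",
           "--detector", detector, "--dataset", dataset]
    if dataset_root is not None:
        cmd += ["--dataset-root", dataset_root]
    return cmd


def run_all(dataset: str, dry_run: bool = True) -> List[List[str]]:
    """Print (and optionally execute) one command per detector."""
    commands = [build_command(d, dataset) for d in DETECTORS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_all("BBC")` prints the five commands without executing them, which is a cheap way to confirm the detector names against `--help` before starting a long run.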
Dataset Download
BBC
# create the target directory first (wget -O does not create it)
mkdir -p BBC
# annotations
wget -O BBC/fixed.zip https://zenodo.org/records/14873790/files/fixed.zip
unzip BBC/fixed.zip -d BBC
rm BBC/fixed.zip
# videos
wget -O BBC/videos.zip https://zenodo.org/records/14873790/files/videos.zip
unzip BBC/videos.zip -d BBC
rm BBC/videos.zip
AutoShot
Download AutoShot_test.tar.gz from Google Drive, then extract it and delete the archive:
tar -zxvf AutoShot_test.tar.gz
rm AutoShot_test.tar.gz
ClipShots
ClipShots is gated behind a dataset request form; direct wget-style download links are not
published. See the download instructions to
obtain the annotations and videos. The expected on-disk layout is:
ClipShots/
annotations/{train,test,only_gradual}.json
video_lists/{train,test,only_gradual}.txt
videos/*.mp4
The loader defaults to the test split (500 videos). The full corpus is ~46 GB.
By default, the datasets are expected inside the benchmark folder (e.g. benchmark/BBC,
benchmark/AutoShot, benchmark/ClipShots). Pass --dataset-root /path/to/datasets to use a different location.
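Before starting a long run, it can help to confirm that the ClipShots layout described above is in place. The following is a minimal sketch, assuming the dataset root contains a `ClipShots` directory; `missing_clipshots_paths` is a hypothetical helper and not part of the benchmark.

```python
from pathlib import Path
from typing import List

SPLITS = ("train", "test", "only_gradual")


def missing_clipshots_paths(dataset_root: str) -> List[Path]:
    """Return the expected ClipShots paths that are absent under dataset_root."""
    root = Path(dataset_root) / "ClipShots"
    expected = [root / "annotations" / f"{split}.json" for split in SPLITS]
    expected += [root / "video_lists" / f"{split}.txt" for split in SPLITS]
    missing = [path for path in expected if not path.is_file()]

    # The videos directory must exist and hold at least one .mp4 file.
    videos = root / "videos"
    if not (videos.is_dir() and any(videos.glob("*.mp4"))):
        missing.append(videos / "*.mp4")
    return missing
```

An empty return value means every expected file was found. For example, `missing_clipshots_paths("benchmark")` checks the default location.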
Results (defaults)
Generated by scripts/benchmark_defaults.sh at tolerance=0 (strict frame-exact matching).
Recall, Precision, and F1 are percentages. Mean s/video is the mean wall-clock time in seconds per video.
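As an illustration of the timing column, mean wall-clock seconds per video can be measured along the following lines. This is a sketch under assumptions: `run_detector` stands in for whatever callable processes one video, and the benchmark's own timing code may include or exclude different steps (such as video decoding or result scoring).

```python
import time
from typing import Callable, Iterable


def mean_seconds_per_video(run_detector: Callable[[str], object],
                           video_paths: Iterable[str]) -> float:
    """Time run_detector on each video and return the mean elapsed seconds."""
    elapsed = []
    for path in video_paths:
        start = time.perf_counter()  # monotonic wall-clock timer
        run_detector(path)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed) if elapsed else 0.0
```

Because this is a per-video mean, datasets with long clips (such as BBC) naturally show much larger values than datasets of short web clips.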
BBC
| Detector | Recall | Precision | F1 | Mean s/video |
|---|---|---|---|---|
| AdaptiveDetector | 87.12 | 96.55 | 91.59 | 36.12 |
| ContentDetector | 84.70 | 88.77 | 86.69 | 37.02 |
| HashDetector | 92.30 | 75.56 | 83.10 | 25.51 |
| HistogramDetector | 89.84 | 72.03 | 79.96 | 22.29 |
| ThresholdDetector | 0.06 | 0.70 | 0.11 | 16.05 |
AutoShot
| Detector | Recall | Precision | F1 | Mean s/video |
|---|---|---|---|---|
| AdaptiveDetector | 70.59 | 77.46 | 73.86 | 3.52 |
| ContentDetector | 63.49 | 76.19 | 69.26 | 4.80 |
| HashDetector | 56.48 | 76.11 | 64.84 | 4.14 |
| HistogramDetector | 63.27 | 53.23 | 57.82 | 3.76 |
| ThresholdDetector | 0.75 | 38.64 | 1.47 | 3.28 |
ClipShots (hard cuts)
| Detector | Recall | Precision | F1 | Mean s/video |
|---|---|---|---|---|
| AdaptiveDetector | 85.97 | 41.25 | 55.75 | 1.81 |
| ContentDetector | 81.93 | 42.36 | 55.84 | 2.52 |
| HashDetector | 81.34 | 30.14 | 43.98 | 1.04 |
| HistogramDetector | 72.20 | 11.47 | 19.80 | 0.71 |
| ThresholdDetector | 0.08 | 0.58 | 0.14 | 0.64 |
ClipShots (fades)
| Detector | Recall | Precision | F1 |
|---|---|---|---|
| AdaptiveDetector | 13.65 | 98.12 | 23.96 |
| ContentDetector | 26.03 | 98.04 | 41.14 |
| HashDetector | 18.77 | 94.53 | 31.33 |
| HistogramDetector | 69.67 | 81.99 | 75.33 |
| ThresholdDetector | 5.69 | 99.24 | 10.77 |
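Fade scoring uses point-in-interval matching rather than a frame tolerance. The sketch below shows one way to do it, assuming each ground-truth fade is an inclusive `(start, end)` frame interval and each predicted cut can satisfy at most one fade. `match_fades` is a hypothetical helper, and how the benchmark resolves overlapping intervals or counts precision for fades is not covered here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def match_fades(predicted: List[int],
                fades: List[Interval]) -> List[Tuple[int, Interval]]:
    """Match each fade to the first unused predicted frame inside it."""
    used = set()
    matches = []
    for start, end in sorted(fades):
        for index, frame in enumerate(predicted):
            # A prediction counts if it falls anywhere inside the interval.
            if index not in used and start <= frame <= end:
                used.add(index)
                matches.append((frame, (start, end)))
                break
    return matches


def fade_recall(predicted: List[int], fades: List[Interval]) -> float:
    """Fraction of ground-truth fades hit by at least one prediction."""
    return len(match_fades(predicted, fades)) / len(fades) if fades else 0.0
```

For `predicted=[50, 120, 125, 300]` and `fades=[(100, 130), (290, 310), (400, 420)]`, two of the three fades are matched; the second prediction inside the first fade (frame 125) is left unmatched.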
Citations
BBC
@InProceedings{bbc_dataset,
author = {Lorenzo Baraldi and Costantino Grana and Rita Cucchiara},
title = {A Deep Siamese Network for Scene Detection in Broadcast Videos},
booktitle = {Proceedings of the 23rd ACM International Conference on Multimedia},
year = {2015},
}
AutoShot
@InProceedings{autoshot_dataset,
author = {Wentao Zhu and Yufang Huang and Xiufeng Xie and Wenxian Liu and Jincan Deng and Debing Zhang and Zhangyang Wang and Ji Liu},
title = {AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
year = {2023},
}
ClipShots
@InProceedings{clipshots_dataset,
author = {Shitao Tang and Litong Feng and Zhanghui Kuang and Yimin Chen and Wei Zhang},
title = {Fast Video Shot Transition Localization with Deep Structured Models},
booktitle = {Asian Conference on Computer Vision (ACCV)},
year = {2018},
}