October 13, 2025
Diffusion Models For Low-Light Image Enhancement: A Multi-Perspective Taxonomy And Performance Analysis
A structured and visual companion repository summarizing the paper
- Authors: Eashan Adhikarla, Yixin Liu and Brian D. Davison
- Affiliation: Lehigh University
- Published: 2025 (arXiv preprint)
Why this repo
This README is the fast path through the survey. It gives a taxonomy cheat sheet, dataset and metric lookup, a decision guide for selecting LLIE diffusion methods, and links back to exact sections of the paper for detail. The paper presents a six-part taxonomy, a benchmark and metrics view, challenges, and forward directions.
TL;DR of the survey
- Six-part taxonomy for diffusion models in LLIE: Intrinsic Decomposition, Spectral and Latent, Accelerated, Guided, Multimodal, Autonomous. The taxonomy is grounded in model mechanism and conditioning signals.
- Benchmarks and analysis cover datasets, metrics, and a cross-benchmark performance landscape, with deployment tradeoffs among fidelity, perception, and efficiency.
- Practical challenges include latency, generalization, data scarcity, interpretability, and ethics. The paper outlines directions for on-device use and foundation model adaptation.
Quick navigator
Use the paper’s HTML view for jump links: HTML | arXiv
- Key Observations
- Background (LLIE problem framing and diffusion fundamentals)
- Taxonomy (six categories with representative methods and tradeoffs)
- Datasets and Metrics (full-reference, no-reference, distribution, and task-based)
- Cross-Benchmark Landscape
- Challenges (latency, generalization, data dependence, fidelity vs perception vs efficiency, interpretability, ethics)
- Future Directions (foundation models, on-device real-time use, self-supervised and zero-shot learning, controllability)
Taxonomy cheat sheet
Short descriptions to help you decide what to read or build next. See Section 4 for details.
- Intrinsic Decomposition: Retinex or physics-grounded formulations where diffusion operates with priors on illumination or reflectance. Best when interpretability and physical plausibility matter.
- Spectral and Latent: Operate in Fourier, wavelet, or latent spaces to reduce compute while preserving structure. Good for high resolution and faster sampling.
- Accelerated: Fewer steps via trajectory optimization, distillation, or latent shortcuts. The place to look for real-time systems.
- Guided: Spatial masks, exposure controls, prompts, or instructions steer the enhancement. Useful for controllable brightness and region-aware edits.
- Multimodal: Fuse RGB with other sensors or align enhancement to downstream tasks. Robust in extreme darkness when sensors or auxiliary signals exist.
- Autonomous: Self-supervised, zero-shot, or unsupervised domain adaptation (UDA) setups that reduce reliance on paired data and improve scalability across domains.
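To make the Intrinsic Decomposition entry concrete, here is a minimal sketch of the classical Retinex model, which treats an image as the element-wise product of reflectance and illumination (I = R · L). It is not taken from any method in the survey: the max-over-channels illumination estimate, the function name `retinex_decompose`, and the gamma value 0.5 are illustrative assumptions. Methods in this category would typically learn or refine these components instead of computing them in closed form.

```python
import numpy as np


def retinex_decompose(image, eps=1e-4):
    """Split an RGB image in [0, 1] into reflectance and illumination.

    Illustrative assumption: illumination is the per-pixel maximum over
    the colour channels, a simple hand-crafted estimate.
    """
    image = np.asarray(image, dtype=np.float64)
    # Illumination map L: one value per pixel, floored to avoid division by zero.
    illumination = np.maximum(image.max(axis=-1, keepdims=True), eps)
    # Reflectance R follows from the Retinex model I = R * L.
    reflectance = image / illumination
    return reflectance, illumination


rng = np.random.default_rng(0)
dark = rng.uniform(0.0, 0.1, size=(4, 4, 3))  # synthetic low-light image
R, L = retinex_decompose(dark)
# Brighten by gamma-adjusting only the illumination, leaving reflectance untouched.
brightened = np.clip(R * L ** 0.5, 0.0, 1.0)
```

Adjusting only the illumination map is what makes this family easy to interpret: colour ratios stored in the reflectance are left unchanged.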
If you are here to choose a method
- Need speed → read Accelerated, then Spectral and Latent. Combine step reduction or distillation with latent or frequency spaces.
- Need control → read Guided for exposure control, spatial masks, or instruction guidance.
- Deploy on-device → see Challenges 6.1 and Future 7.2 on latency, memory, and energy, then pair with Accelerated.
- Unpaired data → see Autonomous for zero shot and self supervised routes.
- Downstream tasks → see Multimodal and task-aligned evaluation in 5.2.4.
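The "step reduction" in the speed route can be pictured as sampling on a strided subset of the training timesteps, as done in DDIM-style samplers. The sketch below only builds such a schedule. It is a generic illustration, not a method from the survey, and the function name `strided_timesteps` and the 1000-to-10 step counts are assumptions.

```python
import numpy as np


def strided_timesteps(num_train_steps, num_sample_steps):
    """Return a descending, evenly spaced subset of timesteps."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("num_sample_steps must be in [1, num_train_steps]")
    # Evenly spaced points from the noisiest step down to step 0.
    steps = np.linspace(num_train_steps - 1, 0, num_sample_steps)
    # Round to integer indices; np.unique guards against duplicates from
    # rounding and sorts ascending, so reverse to get a descending schedule.
    return np.unique(steps.round().astype(int))[::-1]


# A model trained with 1000 steps, sampled with only 10 denoiser calls.
schedule = strided_timesteps(1000, 10)
```

A sampler would then call the denoiser once per entry in `schedule`, so latency scales with the number of entries, not with the training step count.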
Datasets at a glance
Common LLIE datasets referenced in the survey. Use this like a lookup card. The sizes and notes below match Table 2 in Section 5.1.
| Dataset | Type | Size | Summary |
|---|---|---|---|
| LOL | Paired | 500 pairs | Mostly indoor, real capture with varying exposure and ISO. |
| LOLv2 | Paired | Real 789, Synth 1000 | Indoor and outdoor, real capture and synthesis. |
| LSRW | Paired | 5,650 pairs | DSLR and smartphone, mild misalignment, diverse scenes. |
| SID | Paired RAW | 5,094 pairs | Extreme low light RAW to RGB, Sony and Fuji subsets. |
| ExDark | Unpaired | 7,363 | Object labels for task-based checks, 12 classes. |
| SICE | Multi exposure | 589 sequences | MEF and HDR style evaluation with under and over exposed content. |
| VE-LOL | Mixed | VE-LOL-L: 2.5k, VE-LOL-H: 11k | Diverse human-centric content, face annotations in VE-LOL-H. |
| NTIRE 2024 | Challenge (RAW) | 230 train, 70 val/test | High resolution night scenes, real capture. |
| MIT FiveK | Paired RAW | 5,000 | Expert retouch supervision for tonal edits. |
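If you prefer to query the lookup card programmatically, the table can be mirrored as a small dictionary. This is only a convenience sketch: the values are copied from the table above, while the names `DATASETS` and `datasets_of_type` are hypothetical and not part of this repository.

```python
# Dataset name -> (type, size), copied from the table above.
DATASETS = {
    "LOL": ("Paired", "500 pairs"),
    "LOLv2": ("Paired", "Real 789, Synth 1000"),
    "LSRW": ("Paired", "5,650 pairs"),
    "SID": ("Paired RAW", "5,094 pairs"),
    "ExDark": ("Unpaired", "7,363"),
    "SICE": ("Multi exposure", "589 sequences"),
    "VE-LOL": ("Mixed", "L: 2.5k, H: 11k"),
    "NTIRE 2024": ("Challenge (RAW)", "230 train, 70 val/test"),
    "MIT FiveK": ("Paired RAW", "5,000"),
}


def datasets_of_type(prefix):
    """List datasets whose type starts with `prefix` (case-insensitive).

    A prefix match is used so that "Paired" does not also match "Unpaired".
    """
    prefix = prefix.lower()
    return [name for name, (kind, _) in DATASETS.items()
            if kind.lower().startswith(prefix)]


paired = datasets_of_type("Paired")
paired_raw = datasets_of_type("Paired RAW")
```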
Metrics guide
See Section 5.2 for pros, cons, and tradeoffs.
- Full-reference: PSNR, SSIM, LPIPS
- No-reference: NIQE, PI, BRISQUE family
- Distribution: FID, KID, DISTS
- Task-based: impact on detection or recognition (mAP, accuracy)
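As a reminder of what the simplest full-reference metric measures, here is a minimal PSNR sketch using the standard definition, 10 · log10(MAX² / MSE). The function name and the assumption that images are floats in [0, 1] are ours. Published numbers usually come from established library implementations, so treat this as an illustration and not as an evaluation script.

```python
import numpy as np


def psnr(reference, enhanced, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    if reference.shape != enhanced.shape:
        raise ValueError("images must have the same shape")
    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - enhanced) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


reference = np.zeros((8, 8, 3))
enhanced = np.full((8, 8, 3), 0.1)
score = psnr(reference, enhanced)  # MSE is 0.01, so the score is about 20 dB
```

PSNR rewards pixel-wise agreement only, which is why it is usually reported alongside perceptual and no-reference metrics.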
Cross-benchmark landscape
The paper compares methods across datasets and shows how conclusions shift with metrics and domains. Read Section 5.3 before claiming a universal win. It may save you a revision cycle.
Practical challenges to plan for
- Latency and compute: first constraint for real time and mobile targets. See 6.1 and 7.2.
- Generalization across scenes and sensors: real darkness is not one distribution. See 6.2 and 6.3.
- Fidelity vs perception vs efficiency: do not optimize one in isolation. See 6.4.
- Interpretability and XAI: useful for safety and failure triage. See 6.5.
- Ethics: strong enhancement can hallucinate plausible but false details. See 6.6.
Representative themes inside the taxonomy
Examples that appear in the survey narrative: guided exposure control and region-aware edits, multimodal fusion for robustness, and accelerated sampling or distillation for speed.
For a broader external index of diffusion papers in low-level vision, see this curated list.
How to use this repo
- Use the Taxonomy to pick the right design axis for your application.
- Use the Datasets table to choose training and evaluation splits that match your target domain.
- Use the Metrics guide to decide how to balance fidelity against perceptual quality.
- Jump to the HTML view for details behind each choice.
Future directions worth tracking
- Foundation model adaptation: steer large pretrained diffusion models toward LLIE with minimal fine-tuning.
- On-device pipelines: combine step reduction with latent or spectral operations for real-time use on phones and edge cameras.
- Principled self-supervised and zero-shot methods: transfer to new domains without paired data.
- Better controllability and interpretability: important for professional workflows and safety contexts.
Contributing
If you spot a missing dataset quirk, a metric pitfall, or a new LLIE diffusion paper that fits the taxonomy, open a pull request. Include a short note on which taxonomy category it fits and the evaluation setting it uses.
📚 Citation
If you find this repository useful, please cite our paper:
```bibtex
@article{adhikarla2025diffusion,
  title={Diffusion Models for Low-Light Image Enhancement: A Multi-Perspective Taxonomy and Performance Analysis},
  author={Adhikarla, Eashan and Liu, Yixin and Davison, Brian D},
  journal={arXiv preprint arXiv:2510.05976},
  year={2025}
}
```