October 13, 2025

Diffusion Models For Low-Light Image Enhancement: A Multi-Perspective Taxonomy And Performance Analysis

A structured and visual companion repository summarizing the paper

Authors: Eashan Adhikarla, Yixin Liu and Brian D. Davison
Affiliation: Lehigh University
Published: 2025 (arXiv preprint)

Why this repo

This README is the fast path through the survey. It gives a taxonomy cheat sheet, dataset and metric lookup, a decision guide for selecting LLIE diffusion methods, and links back to exact sections of the paper for detail. The paper presents a six-part taxonomy, a benchmark and metrics view, challenges, and forward directions.

TL;DR of the survey

Six-part taxonomy for diffusion models in LLIE: Intrinsic Decomposition, Spectral and Latent, Accelerated, Guided, Multimodal, Autonomous. The taxonomy is grounded in model mechanism and conditioning signals.
Benchmarks and analysis cover datasets, metrics, and a cross-benchmark performance landscape, with deployment tradeoffs among fidelity, perception, and efficiency.
Practical challenges include latency, generalization, data scarcity, interpretability, and ethics. The paper outlines directions for on-device use and foundation model adaptation.

Quick navigator

Use the paper’s HTML view for jump links. HTML | arXiv

Key Observations
Background (LLIE problem framing and diffusion fundamentals)
Taxonomy (six categories with representative methods and tradeoffs)
Datasets and Metrics (FR, NR, distribution, and task-based)
Cross-Benchmark Landscape
Challenges (latency, generalization, data dependence, fidelity vs perception vs efficiency, interpretability, ethics)
Future Directions (foundation models, real-time on-device use, self-supervised and zero-shot learning, controllability)

Taxonomy cheat sheet

Short descriptions to help you decide what to read or build next. See Section 4 for details.

Intrinsic Decomposition
Retinex- or physics-grounded formulations in which diffusion operates with priors on illumination or reflectance. Best when interpretability and physical plausibility matter.
Spectral and Latent
Operate in Fourier, wavelet, or latent spaces to reduce compute while preserving structure. Good for high-resolution inputs and faster sampling.
Accelerated
Fewer steps via trajectory optimization, distillation, or latent shortcuts. The place to look for real-time systems.
Guided
Spatial masks, exposure controls, prompts, or instructions steer the enhancement. Useful for controllable brightness and region-aware edits.
Multimodal
Fuse RGB with other sensors or align enhancement to downstream tasks. Robust in extreme darkness when sensors or auxiliary signals exist.
Autonomous
Self-supervised, zero-shot, or unsupervised domain adaptation (UDA) setups that reduce reliance on paired data and improve scalability across domains.
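
To make the Intrinsic Decomposition entry concrete: Retinex-style formulations model a low-light image as the element-wise product of reflectance and illumination, I = R * L. The sketch below is a minimal NumPy illustration of that split, assuming illumination is approximated by the per-pixel maximum over color channels. It is not a method from the survey, and `retinex_decompose` and `eps` are hypothetical names chosen for this example.

```python
import numpy as np

def retinex_decompose(image, eps=1e-4):
    """Split an RGB image in [0, 1] into reflectance and illumination.

    Assumption: illumination is the per-pixel maximum over the color
    channels, a simple initial estimate rather than a learned prior.
    """
    image = np.asarray(image, dtype=np.float64)
    # Illumination map of shape (H, W, 1); clipped to avoid division by zero.
    illumination = np.clip(image.max(axis=-1, keepdims=True), eps, 1.0)
    # Reflectance follows from I = R * L.
    reflectance = image / illumination
    return reflectance, illumination

# Toy 2x2 dark image (values are illustrative only).
dark = np.array([[[0.10, 0.05, 0.02], [0.04, 0.04, 0.04]],
                 [[0.00, 0.20, 0.10], [0.08, 0.02, 0.06]]])
R, L = retinex_decompose(dark)
recon = R * L  # multiplying the parts back recovers the input
```

A diffusion-based method in this category would typically place its generative prior on `R`, on `L`, or on both, instead of using a fixed estimate like the one above.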

If you are here to choose a method

Need speed → read Accelerated and Spectral and Latent. Combine step reduction or distillation with latent or frequency spaces.
Need control → read Guided for exposure control, spatial masks, or instruction guidance.
Deploy on-device → see Challenges 6.1 and Future 7.2 on latency, memory, and energy, then pair with Accelerated.
Unpaired data → see Autonomous for zero-shot and self-supervised routes.
Downstream tasks → see Multimodal and task-aligned evaluation in 5.2.4.
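
If you want the decision guide above in script form, for example to build a reading list for a project kickoff, a plain dictionary is enough. This is a hypothetical sketch: the key labels and the `what_to_read` helper are made up for this example and do not come from the paper.

```python
# Hypothetical lookup mirroring the decision guide; keys are illustrative
# labels, values are taxonomy categories or section references from this README.
READING_GUIDE = {
    "speed": ["Accelerated", "Spectral and Latent"],
    "control": ["Guided"],
    "on-device": ["Challenges 6.1", "Future 7.2", "Accelerated"],
    "unpaired data": ["Autonomous"],
    "downstream tasks": ["Multimodal", "Section 5.2.4"],
}

def what_to_read(*needs):
    """Return a de-duplicated, order-preserving reading list for the given needs."""
    reading_list = []
    for need in needs:
        for item in READING_GUIDE[need.lower()]:
            if item not in reading_list:
                reading_list.append(item)
    return reading_list

print(what_to_read("speed", "on-device"))
# ['Accelerated', 'Spectral and Latent', 'Challenges 6.1', 'Future 7.2']
```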

Datasets at a glance

Common LLIE datasets referenced in the survey. Use this like a lookup card. The sizes and notes below match Table 2 in Section 5.1.

Dataset	Type	Size	Summary
LOL	Paired	500 pairs	Mostly indoor, real capture with varying exposure and ISO.
LOLv2	Paired	Real 789, Synth 1000	Indoor and outdoor, real capture and synthesis.
LSRW	Paired	5,650 pairs	DSLR and smartphone, mild misalignment, diverse scenes.
SID	Paired RAW	5,094 pairs	Extreme low-light RAW-to-RGB, Sony and Fuji subsets.
ExDark	Unpaired	7,363	Object labels for task-based checks, 12 classes.
SICE	Multi-exposure	589 sequences	MEF- and HDR-style evaluation with under- and over-exposed content.
VE-LOL	Mixed	L: 2.5k, H: 11k	Diverse human-centric content, face annotations in H.
NTIRE 2024	Challenge (RAW)	230 train, 70 val/test	High-resolution night scenes, real capture.
MIT FiveK	Paired RAW	5,000	Expert retouch supervision for tonal edits.

Metrics guide

See Section 5.2 for pros, cons, and tradeoffs.

Full reference: PSNR, SSIM, LPIPS
No reference: NIQE, PI, BRISQUE family
Distribution: FID, KID, DISTS
Task based: impact on detection or recognition (mAP, accuracy)
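
As a reminder of what the simplest full-reference metric measures, PSNR is 10 * log10(MAX^2 / MSE) between the reference and the enhanced image. Below is a minimal NumPy sketch, assuming both images are float arrays scaled to [0, 1]. The function name and the `max_value` parameter are choices made for this example, and published numbers usually come from established toolkits whose color-space and cropping conventions may differ.

```python
import numpy as np

def psnr(reference, enhanced, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    mse = np.mean((reference - enhanced) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy example: a uniform error of 0.1 gives MSE = 0.01, so PSNR = 20 dB.
reference = np.full((4, 4, 3), 0.5)
enhanced = reference + 0.1
print(round(psnr(reference, enhanced), 2))  # 20.0
```

PSNR rewards pixel-wise agreement only, which is why it is worth reading alongside perceptual and no-reference metrics.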

Cross-benchmark landscape

The paper compares methods across datasets and shows how conclusions shift with metrics and domains. Read Section 5.3 before claiming a universal win. It may save you a revision cycle.

Practical challenges to plan for

Latency and compute: the first constraint for real-time and mobile targets. See 6.1 and 7.2.
Generalization across scenes and sensors: real darkness is not one distribution. See 6.2 and 6.3.
Fidelity vs perception vs efficiency: do not optimize one in isolation. See 6.4.
Interpretability and XAI: useful for safety and failure triage. See 6.5.
Ethics: strong enhancement can hallucinate plausible but false details. See 6.6.

Representative themes inside the taxonomy

Examples that appear in the survey narrative: guided exposure control and region-aware edits, multimodal fusion for robustness, and accelerated sampling or distillation for speed.

For a broader external index of diffusion papers in low-level vision, see this curated list.

How to use this repo

Use the Taxonomy to pick the right design axis for your application.
Use the Datasets table to choose training and evaluation splits that match your target domain.
Use the Metrics guide to decide the correct fidelity and perception balance.
Jump to the HTML view for details behind each choice.

Future directions worth tracking

Foundation model adaptation: steer large pretrained diffusion models toward LLIE with minimal fine-tuning.
On-device pipelines: combine step reduction with latent or spectral operations for real-time use on phones and edge cameras.
Principled self-supervised and zero-shot learning: transfer to new domains without paired data.
Better controllability and interpretability: important for professional workflows and safety contexts.
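
One common form of step reduction is to sample on an evenly strided subset of the training timesteps, as DDIM-style samplers do. The sketch below only builds such a schedule. It assumes a model trained with 1000 steps and a sampler that supports skipping steps; `subsample_timesteps` and its defaults are hypothetical and not taken from the survey.

```python
import numpy as np

def subsample_timesteps(num_train_steps=1000, num_sample_steps=20):
    """Return an evenly spaced, descending subset of the training timesteps."""
    steps = np.linspace(0, num_train_steps - 1, num_sample_steps)
    # Round to valid integer indices and reverse: sampling runs from noisy to clean.
    return np.round(steps).astype(int)[::-1]

schedule = subsample_timesteps(1000, 20)
# The schedule starts at timestep 999, ends at timestep 0, and has 20 entries,
# so the denoiser is evaluated 20 times instead of 1000.
```

Distillation and trajectory optimization go further than this by changing the model or the path itself, but a strided schedule is a cheap first baseline for a latency budget.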

Contributing

If you spot a missing dataset quirk, a metric pitfall, or a new LLIE diffusion paper that fits the taxonomy, open a pull request. Include a short note on which taxonomy category it fits and the evaluation setting it uses.

📚 Citation

If you find this repository useful, please cite our paper:

@article{adhikarla2025diffusion,
  title={Diffusion Models for Low-Light Image Enhancement: A Multi-Perspective Taxonomy and Performance Analysis},
  author={Adhikarla, Eashan and Liu, Yixin and Davison, Brian D},
  journal={arXiv preprint arXiv:2510.05976},
  year={2025}
}