CloudTran
November 11, 2025 · View on GitHub
:artificial_satellite: Source code accompanying the paper CloudTran++: Improved cloud removal from multi-temporal satellite images using axial transformer networks.
:bulb: Inspired by Colorization Transformer
:black_nib: Authors: Dionysis Christopoulos, Valsamis Ntouskos, Konstantinos Karantzalos
:classical_building: Remote Sensing Lab, National Technical University of Athens, Greece
![]()
Paper summary
We propose a novel cloud removal method for multi-temporal sattelite images, using Axial Transformer networks. Axial attention offers efficiency, with drastically reduced computational cost compared to the standard self-attention blocks, by applying attention independently on each tensor axis, without sacrificing the global receptive field.
Network Architecture
CloudTran consists of two components, the Core and the Upsampler:
- The
corenetwork is an encoder-decoder model based on the Axial Transformer architecture. It performs cloud removal to a downsampled version of the original inputs, in an autoregressive fashion. - The
corenetwork has an auxiliary parallel cloud removal model considering only the encoder outputs. - The
upsampleris a parallel, deterministic self-attention based network. It superesolves the downsampled cloud-free image back to the original resolution.
Our models for both individual components have been trained on two NVIDIA Quadro RTX 6000 GPUs with 24GB of VRAM each. Training with fewer resources can be achieved by adjusting the model size. Additional details regarding hyperparameter tuning on Implementation section on paper!!.
To evaluate the performance of our models and compare the resulted cloud-free images with baselines, we use metrics like PSNR, SSIM, MSE and FID/KID.
Requirements
A conda environment named cloudtran can be created and activated with:
conda env create -f environment.yaml
conda activate cloudtran
Alternatively:
pip install -r requirements.txt
In particular, ensure that the version of TensorFlow is >= 2.6.0
Dataset
In the paper we introduced two new RGB datasets and also adapted the original SEN12MS-CR-TS dataset into the desired format. More details regarding the construction methodologies and specific properties of these datasets can be found in the corresponding section.
Dataset download links:
- In-house dataset: here
- SEN12-MS-CR-TS L1C version (rgb): here
- SEN12-MS-CR-TS L2A version (rgb, America & Europe): here
Dataset Format
For the method to run successfully, the dataset should follow the specified structure shown below:
path/to/dataset/
┣ masks_final_testset
┃ ┣ 00000
┃ ┃ ┣ mask_of_patch0.png
┃ ┃ ┣ ...
┃ ┃ ┗ mask_of_patchT.png # T: number of available dates
┃ ┣ ...
┃ ┗ N_testset # Total number of patches in the testset
┣ masks_final_trainset
┃ ┣ 00000
┃ ┣ ...
┃ ┗ N_trainset # Total number of patches in the trainset
┣ videos_final_testset
┃ ┣ 00000
┃ ┃ ┣ patch0.png
┃ ┃ ┣ ...
┃ ┃ ┗ patchT.png
┃ ┣ ...
┃ ┗ N_testset
┣ videos_final_trainset
┃ ┣ 00000
┃ ┣ ...
┃ ┗ N_trainset
Notes:
- The images can be either .png or .jpeg.
- The images in the three datasets mentioned, are in RGB format with pixel intensities in range [0, 255], represented as unsigned 8-bit integers.
- The associated cloud masks are binary.
- The time-series patches ans their cloud masks are in complete correspondence based on the set (trainset or testset).
- Each indexed folder contains T patches of the same region, where T is the number of available dates in the specified dataset.
Patch name variations:
There could be some variations, regarding the patch names, based on the selected dataset:
- In-house dataset: YYYYMMDD_IDX.jpeg (IDX is a padded 3-digit index)
- SEN12MS-CR-TS (L2A RGB version): TILE_YYYYMMDD_IDX.png (IDX is a padded 5-digit index)
- SEN12MS-CR-TS (L1C RGB version): original-patch-name_rgb_time-index.png
Custom dataset creation
To integrate a new custom dataset, it is recommended to follow one of the formats: either the SEN12MS-CR-TS (L2A version) or the in-house dataset (if there is no info about the tile name).
(OPTIONAL) To implement custom json files for adaptive indexing of the dataset then they should have the following format:
-
JSON file used for trainset:
{ "tile-name": { "index":"reference-index", "region":"roi-(optional)" }, ... }where
indexrefers to the, cloud-free target image, reference time index. -
JSON file used for testset:
{ "tile-name": { "val_index":"validation-reference-index", "test_index":"test-reference-index", "region":"roi-(optional)" }, ... }where
val_indexcorresponds to the reference time index of the cloud-free target image andtest_indexcorresponds to the reference time index of the cloudy image.
Configuration Setup
Ensure proper configuration adjustments are made, according to the desired dataset version and the base network to be utilized. Some exemplary configurations considering configs/cloudtran_core_train.py and configs/cloudtran_core_test.py are provided below for reference.
In-house dataset
# Training
config.random_channel = True
config.resolution = [256, 256]
config.timeline = 6
config.ref_index = 55
config.max_coverage = 30
config.flip_masks = False
config.use_discriminator = True
config.use_augmentation = False
# Validation/Sampling
config.random_channel = False
config.resolution = [256, 256]
config.timeline = 6
config.ref_index = 30
config.max_coverage = 30
config.flip_masks = False
config.use_discriminator = False
SEN12-MS-CR-TS dataset
# Training
config.random_channel = True
config.resolution = [224, 224]
config.timeline = 6
config.ref_index = 16
config.jsonf = './path/to/trainset.json'
config.max_coverage = 50
config.flip_masks = False # True for L1C version
config.use_discriminator = True
config.use_augmentation = True
# Validation/Sampling
config.random_channel = True
config.resolution = [224, 224]
config.timeline = 6
config.ref_index = 21
config.jsonf = './path/to/testset.json'
config.max_coverage = 30
config.flip_masks = False # True for L1C version
config.use_discriminator = False
where config.ref_index in training and validation corresponds to the time index of the target, clear from clouds, image. However, during testing, it corresponds to the time index of the cloudy image that needs to be restored.
If config.jsonf is defined, then config.ref_index is ignored. This adaptive indexing method is only supported for SEN12MS-CR-TS L2A RGB version.
:exclamation: NOTE: Configuration parameters for the upsampler network,
configs/upsampler_train.pyandconfigs/upsampler_test.py, follow a similar approach as the core network configs shown above.
Training
Run the following command to train the core network
MODEL_SIZE=128 python run.py --config ./configs/cloudtran_core_train.py --mode=train --logdir $LOGDIR
To train the Upsampler, replace configs/cloudtran_core_train.py with configs/upsampler_train.py:
MODEL_SIZE=512 python run.py --config ./configs/upsampler_train.py --mode=train --logdir $LOGDIR
:exclamation: NOTE: Before running the training commands make sure to set
config.data_dirandconfig.mask_dirfound in bothconfigs/cloudtran_core_train.pyandconfigs/upsampler_train.pywith the videos_final_trainset and masks_final_trainset paths as shown in the Dataset Format section. Upsampler and Core network can be trained independenlty in parallel.
Evaluation/Validation
For evaluation, which can be concurrent with the training process,
MODEL_SIZE=128 python run.py --config ./configs/cloudtran_core_test.py --mode=eval_valid --logdir /core_ckpt_dir
For validation, after each network has trained,
MODEL_SIZE=128 python sample.py --config ./configs/cloudtran_core_test.py --mode=sample_valid --logdir /core_ckpt_dir
During validation if config.sample.only_parallel = True metrics and the respective resulted images consider only the auxilliary Transformer Encoder outputs.
In order to run these processes for the Upsampler network, replace configs/cloudtran_core_test.py with configs/upsampler_test.py and core_ckpt_dir with up_ckpt_dir.
:exclamation: NOTE: Before running the evaluation/validation and sampling commands make sure to set
config.data_dirandconfig.mask_dirfound in bothconfigs/cloudtran_core_test.pyandconfigs/upsampler_test.pywith the videos_final_testset and masks_final_testset paths as shown in the Dataset Format section.
Sampling
Sampling configurations for each model are described by config.sample ConfigDict at configs/{network}_test.py
- sample.num_outputs - Total number of time-series patches
- sample.logdir - Sample write directory.
- sample.skip_batches - The first
skip_batches*batch_sizeimages are skipped.
Ensure that num_outputs and skip_batches are the same between both components (core & upsampler).
The generated samples are written as TFRecords and as images if config.sample.im_outputs = True to $logdir/${config.sample.logdir}
Core Network
The following command samples low resolution cloud-free 64x64 images.
MODEL_SIZE=128 python sample.py --config ./configs/cloudtran_core_test.py --mode=sample_test --logdir /core_ckpt_dir
Upsampler
The following command superresolves the previous output to high resolution 256x256 output.
MODEL_SIZE=512 python sample.py --config ./configs/upsampler_test.py --mode=sample_test --logdir /up_ckpt_dir
:exclamation: NOTE: Please set
config.sample.gen_data_dirof theconfigs/upsampler_test.pyconfig to$/core_ckpt_dir/${config.sample.logdir}
Multi GPU Sampling
Sampling can be parallelized across batches in a multi-GPU setup with the flag config.sample.skip_batches. For example, in a setup with 2 machines and a batch-size of 20, to generate 100 cloud-free images per-machine set config.sample.skip_batches of the first and second machine to 0 and 5 respectively.
Citation
If you find this work useful, please cite our paper:
@Article{rs17010086,
AUTHOR = {Christopoulos, Dionysis and Ntouskos, Valsamis and Karantzalos, Konstantinos},
TITLE = {CloudTran++: Improved Cloud Removal from Multi-Temporal Satellite Images Using Axial Transformer Networks},
JOURNAL = {Remote Sensing},
VOLUME = {17},
YEAR = {2025},
NUMBER = {1},
ARTICLE-NUMBER = {86},
URL = {https://www.mdpi.com/2072-4292/17/1/86},
ISSN = {2072-4292},
DOI = {10.3390/rs17010086}
}
Extended work from:
@Article{isprs-archives-XLIII-B2-2022-1125-2022,
AUTHOR = {Christopoulos, D. and Ntouskos, V. and Karantzalos, K.},
TITLE = {CLOUDTRAN: CLOUD REMOVAL FROM MULTITEMPORAL SATELLITE IMAGES USING AXIAL TRANSFORMER NETWORKS},
JOURNAL = {The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {XLIII-B2-2022},
YEAR = {2022},
PAGES = {1125--1132},
URL = {https://isprs-archives.copernicus.org/articles/XLIII-B2-2022/1125/2022/},
DOI = {10.5194/isprs-archives-XLIII-B2-2022-1125-2022}
}