MBCTD

May 25, 2026

A deep learning model for per-pixel, multi-label detection of building changes in bi-temporal satellite and aerial imagery. MBCTD classifies each pixel independently into three change categories, allowing overlapping labels (e.g., a replacement site marked as both demolished and new simultaneously).


Overview

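Given a before image and an after image of the same geographic area, MBCTD produces per-pixel predictions for three classes (unchanged, demolished, new). The table also lists the replacement case, in which two labels overlap: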

| Class | Color | Meaning |
| --- | --- | --- |
| Unchanged | 🔵 | Building present in both images |
| Demolished | 🔴 | Building present in before, absent in after |
| New | 🟢 | Building absent in before, present in after |
| Replacement (demolished + new) | 🟡 | Both demolished and new labels active simultaneously |

Because the model is multi-label, a single pixel can belong to more than one class. This makes it possible to represent complex urban transitions that single-label models cannot express.
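
To illustrate the multi-label behaviour, here is a minimal sketch. It assumes the model emits one logit per class per pixel in the order unchanged, demolished, new, and that each logit goes through its own sigmoid and is thresholded independently. The function name `decode_multilabel`, the `>=` comparison, and the example logits are hypothetical and are not taken from the MBCTD code.

```python
import numpy as np

CLASSES = ("unchanged", "demolished", "new")

def decode_multilabel(logits: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """Turn (3, H, W) logits into (3, H, W) binary masks, one sigmoid per class."""
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per class and pixel
    return (probs >= threshold).astype(np.uint8)

# Hypothetical logits for a 1x2 image:
# pixel 0 is a replacement site, pixel 1 is an unchanged building.
logits = np.array([
    [[-4.0, 3.0]],   # unchanged
    [[2.5, -5.0]],   # demolished
    [[3.0, -4.0]],   # new
])

masks = decode_multilabel(logits)
replacement = masks[1] & masks[2]  # demolished AND new active on the same pixel

for name, mask in zip(CLASSES, masks):
    print(name, mask.tolist())
print("replacement", replacement.tolist())
```

A softmax head would force the three scores of pixel 0 to compete, so only one label could win; with independent sigmoids both demolished and new can exceed the threshold at once.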

Visualization samples

Inference samples

LEVIR-CD+
FOTBCD

Architecture

MBCTD uses a Siamese ConvNeXt-Base encoder paired with a full-resolution U-Net decoder:

Before image ──┐
               ├─► Shared ConvNeXt-Base encoder ──► Change fusion at each scale
After image  ──┘                                           │
                                                           ▼
                                               Full-resolution U-Net decoder
                                               (PixelShuffle upsampling)
                                                           │
                                                           ▼
                                          3 independent sigmoid heads per pixel

Key design decisions:

  • Shared encoder weights: the same ConvNeXt trunk processes both images, making the feature space comparable by construction.
  • Change fusion: at each encoder scale, before/after features are combined as [before, after, before−after, |before−after|] and projected through 1×1→3×3 convolutions.
  • High-resolution skip connections: in addition to encoder skips (1/32 → 1/4), raw input images are injected at 1/2 and 1/1 resolution to preserve fine-grained boundary information.
  • PixelShuffle upsampling: learned upsampling at every decoder stage helps avoid checkerboard artifacts.
  • Pre-trained backbone: ConvNeXt-Base initialised with DINOv3 LVD1689M weights.
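
The change-fusion step in the list above can be sketched as follows. This is an illustration only: it uses NumPy instead of PyTorch to stay dependency-light, it covers just the four-way concatenation, and it omits the 1×1→3×3 projection. The function name `fuse_change_features` and the feature shapes are assumptions, not the contents of model.py.

```python
import numpy as np

def fuse_change_features(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Stack [before, after, before - after, |before - after|] along channels.

    Inputs are (C, H, W) feature maps from the same encoder scale;
    the output has shape (4C, H, W).
    """
    if before.shape != after.shape:
        raise ValueError("before and after features must have the same shape")
    diff = before - after
    return np.concatenate([before, after, diff, np.abs(diff)], axis=0)

# Random stand-ins for encoder features at one scale (8 channels, 16x16).
rng = np.random.default_rng(0)
feat_before = rng.standard_normal((8, 16, 16)).astype(np.float32)
feat_after = rng.standard_normal((8, 16, 16)).astype(np.float32)

fused = fuse_change_features(feat_before, feat_after)
print(fused.shape)  # (32, 16, 16)
```

The signed difference flips sign when the two images are swapped, while the absolute difference does not, so the stacked tensor carries both the direction and the magnitude of a change.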

Project Structure

MBCTD/
├── model.py               # Model definition (encoder, fusion, decoder)
├── config.py              # MBCTDConfig dataclass
├── inference.py           # load_model, predict_patch, visualisation helpers
├── demo.py                # Interactive Gradio web demo
└── environment.yml        # Conda environment spec

Installation

1. Clone the repository

git clone git@github.com:abdelpy/MBCTD
cd MBCTD

2. Create the Conda environment

conda env create -f environment.yml
conda activate mbctd

3. Install PyTorch

Follow the official instructions to install PyTorch matching your CUDA version.

4. Download model weights

Pre-trained weights are available on Google Drive.

Usage

Interactive demo

The fastest way to try the model is the Gradio web interface:

python demo.py path/to/model.pth

Open the URL printed in your terminal. The UI lets you:

  • Upload a before and after image pair
  • Adjust the confidence threshold (0.1–0.9, default 0.7)
  • Choose an inference mode:
    • patch: tile the image into 256 px patches and stitch predictions back (handles large images)
    • full: run at the image's original resolution
  • Inspect the overlay on the after image, the colour mask, and per-class pixel coverage statistics
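
The patch mode described above can be pictured with a short sketch. It is one plausible tiling scheme, not the demo's actual implementation: the last tile in each direction is shifted back so it ends at the image border, and overlapping regions are simply overwritten by the later tile. `tile_origins`, `predict_tiled`, and `dummy_predict` are hypothetical names, and `dummy_predict` stands in for the real model.

```python
import numpy as np

def tile_origins(size: int, patch: int = 256) -> list[int]:
    """Start offsets of patches covering [0, size); the last one is shifted to fit."""
    if size <= patch:
        return [0]
    starts = list(range(0, size - patch, patch))
    starts.append(size - patch)  # final patch ends exactly at the border
    return starts

def predict_tiled(before, after, predict_fn, patch: int = 256) -> np.ndarray:
    """Run predict_fn on each tile pair and stitch (3, h, w) outputs into (3, H, W)."""
    height, width = before.shape[:2]
    out = np.zeros((3, height, width), dtype=np.float32)
    for y in tile_origins(height, patch):
        for x in tile_origins(width, patch):
            tile_b = before[y:y + patch, x:x + patch]
            tile_a = after[y:y + patch, x:x + patch]
            out[:, y:y + patch, x:x + patch] = predict_fn(tile_b, tile_a)
    return out

def dummy_predict(tile_b, tile_a):
    # Stand-in for the model: mean absolute RGB difference scaled to [0, 1].
    diff = np.abs(tile_a.astype(np.float32) - tile_b.astype(np.float32)).mean(axis=-1) / 255.0
    return np.stack([1.0 - diff, diff, diff])

before = np.zeros((300, 600, 3), dtype=np.uint8)
after = before.copy()
after[100:150, 500:580] = 255  # synthetic "change" region

probs = predict_tiled(before, after, dummy_predict)
print(tile_origins(600), probs.shape)
```

Images smaller than the patch size are passed through as a single tile here; a real pipeline might pad them instead.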

Programmatic inference

from PIL import Image
import numpy as np
import torch
from inference import load_model, predict_patch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = load_model("model.pth", device)

# Load images as uint8 RGB numpy arrays
before = np.array(Image.open("before.png").convert("RGB"))
after  = np.array(Image.open("after.png").convert("RGB"))

result = predict_patch(before, after, model, threshold=0.7)

predict_patch returns a dictionary:

| Key | Shape | dtype | Description |
| --- | --- | --- | --- |
| binary | (3, H, W) | uint8 | Per-class binary masks (unchanged / demolished / new) |
| class_map | (H, W) | uint8 | Collapsed single-label class ID (0 = background, 1–4 = see colour table above) |
| overlay | (H, W, 3) | uint8 | Semi-transparent colour overlay drawn on the after image |
| mask_rgb | (H, W, 3) | uint8 | Solid-colour mask visualisation |
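
As an example of working with the returned masks, the sketch below computes per-class pixel coverage from an array shaped like `binary`. It assumes the channel order unchanged, demolished, new given in the description above. The helper `coverage_stats` is hypothetical, and the array is a synthetic stand-in, so the snippet runs without model weights.

```python
import numpy as np

def coverage_stats(binary: np.ndarray) -> dict[str, float]:
    """Fraction of pixels flagged per class, plus replacement (demolished AND new)."""
    unchanged, demolished, new = (binary[i].astype(bool) for i in range(3))
    return {
        "unchanged": float(unchanged.mean()),
        "demolished": float(demolished.mean()),
        "new": float(new.mean()),
        "replacement": float((demolished & new).mean()),
    }

# Synthetic stand-in for result["binary"] on a 4x4 image.
binary = np.zeros((3, 4, 4), dtype=np.uint8)
binary[0, :2, :] = 1     # top half unchanged: 8 px
binary[1, 2:, :2] = 1    # bottom-left demolished: 4 px
binary[2, 2:, 1:3] = 1   # new, overlapping demolished in one column: 4 px

stats = coverage_stats(binary)
print(stats)
```

With a real prediction, the same helper would be called as `coverage_stats(result["binary"])`. In this sketch, replacement pixels are also counted under demolished and new; whether the demo's own statistics do the same is not stated here.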

Training

MBCTD was trained exclusively on FOTBCD, a large-scale, multi-label building change dataset, using 256 × 256 px patches drawn from over 220,000 before/after aerial image pairs across France.

FOTBCD: the dataset behind the model

FOTBCD is the first dataset with multi-label building change annotations as vector polygons covering demolished, new, and unchanged structures simultaneously. At 220k+ georeferenced pairs spanning diverse urban environments, it is an order of magnitude larger than existing change detection benchmarks, and the richness of its labels is what makes a model like MBCTD possible.

The dataset is available for licensing.
Whether you are building an urban monitoring platform, a real-estate analytics product, or a geospatial AI pipeline, FOTBCD gives you the ground truth that generic benchmarks cannot provide. Get in touch to discuss licensing terms.


Results

Binary change detection metrics (demolished OR new → "changed") are reported for both benchmarks to enable comparison. LEVIR-CD+ provides only binary ground truth, so no per-class breakdown is available for it. For FOTBCD, which supports multi-label annotations, per-class IoU is also reported.
The inference threshold is selected by best F1 on each benchmark.
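
The evaluation protocol described above can be sketched as follows, assuming per-pixel probabilities for the demolished and new classes and a boolean ground-truth "changed" mask. The function names, the candidate thresholds, and the toy arrays are hypothetical. This is not the project's evaluation script.

```python
import numpy as np

def binary_change_metrics(pred, target) -> dict[str, float]:
    """Precision, recall, F1 and IoU of the 'changed' class from boolean masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    iou = tp / (tp + fp + fn) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

def best_f1_threshold(prob_demolished, prob_new, target_changed, candidates):
    """Return the candidate threshold with the highest F1 for 'demolished OR new'."""
    def f1_at(threshold):
        pred_changed = (prob_demolished >= threshold) | (prob_new >= threshold)
        return binary_change_metrics(pred_changed, target_changed)["f1"]
    return max(candidates, key=f1_at)

# Toy example with four pixels.
prob_demolished = np.array([0.9, 0.75, 0.2, 0.1])
prob_new = np.array([0.1, 0.2, 0.8, 0.6])
target_changed = np.array([True, True, True, False])

best = best_f1_threshold(prob_demolished, prob_new, target_changed, [0.5, 0.7, 0.9])
pred_at_best = (prob_demolished >= best) | (prob_new >= best)
metrics = binary_change_metrics(pred_at_best, target_changed)
print(best, metrics)
```

For a single class, IoU and F1 are linked by IoU = F1 / (2 − F1), which is consistent with the reported pairs (for example, F1 = 0.7909 gives IoU ≈ 0.6541).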

LEVIR-CD+ (full-resolution inference, threshold = 0.75)

| Metric | Value |
| --- | --- |
| Precision | 0.7694 |
| Recall | 0.8137 |
| F1 | 0.7909 |
| IoU (change) | 0.6541 |
| mIoU | 0.8180 |
| OA (overall accuracy) | 0.9825 |

FOTBCD (full-resolution inference, threshold = 0.70)

Binary change detection

| Metric | Value |
| --- | --- |
| Precision | 0.8948 |
| Recall | 0.9201 |
| F1 | 0.9073 |
| IoU (change) | 0.8303 |
| mIoU | 0.9094 |
| OA (overall accuracy) | 0.9891 |

Per-class IoU

| Class | IoU |
| --- | --- |
| Unchanged | 0.7774 |
| Demolished | 0.8166 |
| New | 0.8198 |

License

This project is licensed under CC BY-NC 4.0