Emb2Heights: Urban Structure and Land Cover Prediction

April 1, 2026

This repository provides a baseline for the Emb2Heights challenge: training and inference for a model that predicts sub-pixel land-cover percentages (Building, Vegetation, Water) and continuous structure heights (nDSM) directly from Earth Observation embeddings. Predictions are saved as .npy files with 4 output channels: [% Building, % Vegetation, % Water, Height (m)].

Project Overview

Predicting urban morphology from satellite imagery is challenging: building footprints are sparse, and height values operate on a different scale than land-cover probabilities. This project addresses these challenges through a composite loss with 4 terms:

  • MAE (with background/foreground split): direct pixel-level regression.
  • SSIM + Gradient Loss: enforces sharp structural boundaries on land-cover channels.
  • Tversky Loss: penalizes false negatives heavily, forcing the model to capture sparse building footprints (α=0.3, β=0.7).
  • Structure-Boosted Height Loss: height errors on building pixels are penalized 2x more than background pixels.

Training is further stabilized with AdamW (weight decay) and gradient clipping to prevent collapse on complex urban patches.
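To make the two less-familiar terms concrete, here is a minimal NumPy sketch of the Tversky term (α=0.3, β=0.7) and the structure-boosted height term (2x weight on building pixels). This is an illustration of the math, not the repository's implementation; the function names are hypothetical, and the actual `core/losses.py` operates on PyTorch tensors.

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Tversky loss on soft masks; beta > alpha penalizes false negatives
    more heavily, pushing the model to capture sparse building footprints."""
    tp = np.sum(pred * target)          # true positives
    fp = np.sum(pred * (1 - target))    # false positives, weighted by alpha
    fn = np.sum((1 - pred) * target)    # false negatives, weighted by beta
    return 1.0 - tp / (tp + alpha * fp + beta * fn + eps)

def structure_boosted_mae(pred_h, true_h, building_mask, boost=2.0):
    """Height MAE where errors on building pixels count `boost` times as much."""
    err = np.abs(pred_h - true_h)
    weights = np.where(building_mask, boost, 1.0)
    return np.sum(weights * err) / np.sum(weights)
```

With β > α, missing a building pixel (a false negative) costs more than predicting a spurious one, which is exactly the asymmetry the sparse-footprint setting calls for.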


Repository Structure

emb2heights_baselines/
├── core/
│   ├── __init__.py
│   ├── model.py        # LightUNet + Decoder model factory
│   ├── dataset.py      # Dataset classes + embedding/label pairing utilities
│   └── losses.py       # ImprovedCompositeLoss (MAE, SSIM, Gradient, Tversky)
├── train.py            # Training entrypoint (fully CLI-configurable)
├── predict.py          # Inference entrypoint (loads checkpoint, saves .npy predictions)
├── environment.yml     # Conda environment definition
├── readme.md
└── runs/               # Auto-generated experiment outputs
    └── <experiment_name>/
        ├── model_best.pth
        ├── model_last.pth
        ├── loss_curve.png
        ├── training_params.txt
        ├── visualizations/
        └── predictions/

Setup

Create and activate the conda environment:

conda env create -f environment.yml
conda activate emb2heights

Model Architecture

Architecture is selected via --model-type:

Value              Description
lightunet          Lightweight encoder-decoder with skip connections
decoder            Transposed-convolution decoder
decoder_residual   Deeper decoder with residual blocks + global embedding skip fusion (recommended for high-channel embeddings)
auto               Selects decoder when input channels = 768, otherwise lightunet
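The auto rule can be expressed in a couple of lines. The helper below is hypothetical (it mirrors the documented behavior, and is not code from the repo):

```python
def resolve_model_type(model_type: str, in_channels: int) -> str:
    """Mirror the documented --model-type 'auto' selection rule."""
    if model_type == "auto":
        # 768-channel embeddings get the decoder; everything else the LightUNet
        return "decoder" if in_channels == 768 else "lightunet"
    return model_type  # explicit choices pass through unchanged
```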

Output: a 4-channel tensor — [0: % Building, 1: % Vegetation, 2: % Water, 3: Height (m)].

Loss function: ImprovedCompositeLoss with 4 terms — see Project Overview.


Training

Run training from the CLI — no file edits needed.

python train.py \
    --model-type decoder_residual \
    --train-embeddings-dir /path/to/train/embeddings \
    --train-targets-dir /path/to/train/labels \
    --test-embeddings-dir /path/to/test/embeddings \
    --test-targets-dir /path/to/test/labels \
    --experiment-name my_run \
    --epochs 30 \
    --batch-size 8 \
    --patch-size 256

Arguments

Argument                 Default          Description
--model-type             auto             Architecture: auto, lightunet, decoder, decoder_residual
--train-embeddings-dir   required         Path to training embedding .tif files
--train-targets-dir      required         Path to training label .tif files
--test-embeddings-dir    required         Path to test embeddings (used for post-training visualization)
--test-targets-dir       required         Path to test labels (used for post-training visualization)
--experiment-name        terramid_run02   Subfolder name under ./runs/
--epochs                 30               Number of training epochs
--batch-size             32               Batch size
--patch-size             256              Spatial crop size for dataset loader

Outputs are written to ./runs/<experiment_name>/: hyperparameter log, model_best.pth, model_last.pth, loss curve, and sample visualizations.


Inference

Load a trained checkpoint and save predictions as .npy files (shape [4, H, W], channels: building %, vegetation %, water %, height in meters).

python predict.py \
    --experiment-name my_run \
    --model-type decoder_residual \
    --test-embeddings-dir /path/to/test/embeddings \
    --test-targets-dir /path/to/test/labels

Arguments

Argument                Default                                        Description
--experiment-name       terramind_decoder_run01                        Experiment folder under --base-dir
--base-dir              ./runs                                         Root directory of experiment folders
--model-type            decoder_residual                               Architecture (must match training)
--model-path            <base-dir>/<experiment-name>/model_best.pth    Path to .pth checkpoint
--test-embeddings-dir   required                                       Directory with embedding .tif files
--test-targets-dir      required                                       Directory with label .tif files (used only for file pairing)
--predictions-dir       <base-dir>/<experiment-name>/predictions       Output directory for .npy files
--patch-size            256                                            Spatial crop size
--max-samples           0 (all)                                        Limit inference to N samples

Each output file is named pred_<core_id>.npy and contains a float32 array of shape [4, H, W]:

  • Channel 0: Building coverage (0–1)
  • Channel 1: Vegetation coverage (0–1)
  • Channel 2: Water coverage (0–1)
  • Channel 3: Normalized surface height in meters
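A prediction file can be loaded and split into named channels in a few lines of NumPy. The helper below is hypothetical (not part of the repo), and the synthetic array stands in for np.load on a real pred_<core_id>.npy:

```python
import numpy as np

CHANNELS = ("building", "vegetation", "water", "height_m")

def unpack_prediction(pred: np.ndarray) -> dict:
    """Split a float32 [4, H, W] prediction into a dict of named 2-D channels."""
    assert pred.ndim == 3 and pred.shape[0] == 4, "expected shape [4, H, W]"
    return dict(zip(CHANNELS, pred))

# Synthetic stand-in for: pred = np.load("runs/my_run/predictions/pred_<core_id>.npy")
pred = np.zeros((4, 256, 256), dtype=np.float32)
channels = unpack_prediction(pred)
```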