pseudo-fields

September 18, 2024

Generating pseudo labels for satellite-based crop field delineation

This repository contains Jupyter notebooks for generating pseudo labels from the pre-trained field delineation model described in the corresponding paper.

Requirements:

The pseudo label selection routine is based on model predictions for unlabeled data. It applies a set of quality criteria (semantic confidence, instance confidence, and object size) to retain only high-quality predictions, which can then be used for weakly supervised fine-tuning of the pre-trained model. The notebooks require access to Google Drive, where the model weights, the sample data, and the DECODE repository are stored.
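The selection step described above can be sketched roughly as follows. This is a minimal illustration and not the code from the notebooks: the function name, the default thresholds, and in particular the definition of instance confidence (one minus the mean boundary probability inside a field) are assumptions made for the example.

```python
import numpy as np


def select_pseudo_labels(instances, extent_prob, boundary_prob,
                         min_semantic=0.9, min_instance=0.5, min_pixels=20):
    """Keep predicted fields that pass all three quality criteria.

    instances     : 2-D int array, 0 = background, k > 0 = predicted field id
    extent_prob   : 2-D float array in [0, 1], cropland extent probability
    boundary_prob : 2-D float array in [0, 1], field boundary probability
    The default thresholds are hypothetical placeholders.
    """
    selected = np.zeros_like(instances)
    scores = {}
    for field_id in np.unique(instances):
        if field_id == 0:
            continue  # skip background
        mask = instances == field_id
        size = int(mask.sum())
        # Semantic confidence: mean extent probability inside the field
        semantic = float(extent_prob[mask].mean())
        # Assumption: instance confidence is high when little boundary
        # probability falls inside the field
        instance = float(1.0 - boundary_prob[mask].mean())
        scores[int(field_id)] = (semantic, instance, size)
        if (semantic >= min_semantic and instance >= min_instance
                and size >= min_pixels):
            selected[mask] = field_id
    return selected, scores
```

The function returns the per-field scores alongside the filtered raster, so that thresholds can be inspected and adjusted before any pseudo labels are written out.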


Our example builds on a model pre-trained in France and India, but the approach can be used with any field delineation architecture that yields pixel-level probabilities of cropland extent (used for the semantic confidence score) and of field boundaries (the basis for the instance confidence score). We use the model to identify fields in Mozambique, arguably a more complex and heterogeneous region. In our study, we find that using the 99th percentile of the semantic confidence score yields the best results, and that combining pseudo labels with human annotations results in the best performance. Users can define custom thresholds for testing purposes. Pseudo labels can optionally be stored in the format needed to train the FracTAL ResUNet as described in Wang et al. 2022, i.e., as three-band raster files containing (1) cropland extent, (2) field boundaries, and (3) the normalized within-field distance to the nearest boundary.
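To make the three-band format more concrete, the sketch below derives the three bands from a raster of field ids. It is an illustration under stated assumptions, not the routine used in the notebooks: the function name is hypothetical, boundaries are defined through 4-neighbourhoods, the area outside the tile is treated as background, and the distance is normalized by its maximum within each field. The distance is computed by brute force to keep the example short; for full-size tiles a distance transform such as scipy.ndimage.distance_transform_edt would be the more usual choice.

```python
import numpy as np


def three_band_target(instances):
    """Build (extent, boundary, distance) bands from an instance raster.

    instances : 2-D int array, 0 = background, k > 0 = field id
    Returns a float32 array of shape (3, H, W).
    """
    extent = instances > 0

    # A field pixel is a boundary pixel if any 4-neighbour has a different id.
    # Zero padding treats everything outside the tile as background.
    padded = np.pad(instances, 1, mode="constant", constant_values=0)
    center = padded[1:-1, 1:-1]
    differs = (
        (center != padded[:-2, 1:-1]) | (center != padded[2:, 1:-1])
        | (center != padded[1:-1, :-2]) | (center != padded[1:-1, 2:])
    )
    boundary = extent & differs

    distance = np.zeros(instances.shape, dtype=np.float32)
    for field_id in np.unique(instances[extent]):
        field = instances == field_id
        inside = np.argwhere(field)            # (n, 2) pixel coordinates
        edge = np.argwhere(boundary & field)   # (m, 2) boundary coordinates
        # Euclidean distance from each field pixel to its nearest boundary pixel
        diff = inside[:, None, :] - edge[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
        if d.max() > 0:
            d = d / d.max()  # normalize per field to [0, 1]
        distance[inside[:, 0], inside[:, 1]] = d

    return np.stack([extent, boundary, distance]).astype(np.float32)
```

Writing the result to a georeferenced raster file is left out here, since it depends on the I/O library and on the metadata of the source imagery.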

To run the code, use Google Colab to execute all cells in pseudo-fields-setip.ipynb. This will create a folder in your Google Drive (modify the path as needed), clone the repository with our sample data, clone the DECODE repository, and download the pre-trained model weights. The routine for generating pseudo labels is provided in pseudo_fields_generate_labels.ipynb. Running all cells sequentially will install and import the dependencies (this may take a few minutes), set all parameters, and run the selection routine. By default, results are written to the output folder.