PyCropYieldPrediction-withTransfer

February 12, 2024

An extension of Gabriel Tseng's PyTorch implementation of Jiaxuan You's Deep Gaussian Process built on a CNN for soybean crop forecasting in Argentina. In addition, code components from the work "Deep Transfer Learning for Crop Yield Prediction with Remote Sensing Data" are used to export Argentine satellite data.

The code was used to produce the results published in "Leveraging Remote Sensing Data for Yield Prediction with Deep Transfer Learning". The article is open access and available at https://www.mdpi.com/1424-8220/24/3/770. If you find our code helpful, please cite our work as follows:

@Article{s24030770,
AUTHOR = {Huber, Florian and Inderka, Alvin and Steinhage, Volker},
TITLE = {Leveraging Remote Sensing Data for Yield Prediction with Deep Transfer Learning},
JOURNAL = {Sensors},
VOLUME = {24},
YEAR = {2024},
NUMBER = {3},
ARTICLE-NUMBER = {770},
URL = {https://www.mdpi.com/1424-8220/24/3/770},
ISSN = {1424-8220},
DOI = {10.3390/s24030770}
}

Pipeline

USA

Exporting

Run

python run.py export

to export the US satellite data to your Google Drive. You will need up to 165 GB of storage. The export class supports checkpointing, and the Earth Engine Task Manager shows your ongoing tasks. The export can take a long time. Once all the data has been exported to your Google Drive, move the folders crop_yield-data_image, crop_yield-data_mask and crop_yield-data_temperature into your local data folder (Google Drive Desktop is recommended; otherwise the data is downloaded as many ZIP files). The yield data can be downloaded from the USDA. Examples of the format can be found in the data directory.

(Optional) Data Cleansing

If you want to apply our data cleansing (more than 2000 cropland pixels) to your own data, run

python run.py data_cleansing

and

python cyp/data/merge_yield_pix-count_usa.py

Note that the scripts address the columns of the corresponding CSV files by position, so the column order matters. The formatting of our data can be found in the data directory.
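As a rough illustration of the cleansing criterion, the following minimal sketch keeps only rows with more than 2000 cropland pixels. It is not the repository's implementation: the column names (county, year, yield, pixel_count) and the sample values are hypothetical, and the sketch uses column names for readability, whereas the repository's scripts address columns by position.

```python
import csv
import io

MIN_CROPLAND_PIXELS = 2000  # threshold named in this README


def filter_by_pixel_count(rows, min_pixels=MIN_CROPLAND_PIXELS):
    """Keep rows whose cropland pixel count is strictly above min_pixels."""
    return [row for row in rows if int(row["pixel_count"]) > min_pixels]


# Hypothetical sample data; the real files live in the data directory.
sample_csv = io.StringIO(
    "county,year,yield,pixel_count\n"
    "A,2015,45.1,3500\n"
    "B,2015,38.7,1200\n"
    "C,2015,50.2,2000\n"
)
rows = list(csv.DictReader(sample_csv))
kept = filter_by_pixel_count(rows)
print([row["county"] for row in kept])
```

Whether a county with exactly 2000 pixels is kept depends on the comparison used in the actual script; the sketch uses a strict comparison.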

Preprocessing

python run.py process

Merges the data and splits it by year. Saves the results as .npy files.

Feature Engineering

python run.py engineer

Generates histograms from the processed .npy files.
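The histogram step can be pictured with the minimal NumPy sketch below, which turns one image time series into normalized per-band, per-timestep pixel histograms. The array layout (bands, height, width, time), the number of bins, and the value range are assumptions chosen for illustration; the repository's engineer step may use different settings.

```python
import numpy as np


def image_to_histograms(image, num_bins=32, value_range=(0, 5000)):
    """Convert (bands, height, width, time) into (bands, time, num_bins).

    Each histogram is normalized so that its bins sum to 1.
    The layout, bin count, and value range are illustrative assumptions.
    """
    bands, _, _, timesteps = image.shape
    bin_edges = np.linspace(value_range[0], value_range[1], num_bins + 1)
    hist = np.zeros((bands, timesteps, num_bins))
    for b in range(bands):
        for t in range(timesteps):
            pixels = image[b, :, :, t].ravel()
            counts, _ = np.histogram(pixels, bins=bin_edges)
            total = counts.sum()
            if total > 0:  # leave all-zero histograms for empty inputs
                hist[b, t] = counts / total
    return hist


# Synthetic demo data standing in for a processed .npy array.
rng = np.random.default_rng(0)
demo = rng.integers(0, 5000, size=(2, 8, 8, 3))
hists = image_to_histograms(demo)
print(hists.shape)
```

Discarding the spatial arrangement in favor of pixel-value distributions keeps the model input small and of fixed size regardless of the region's shape.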

(Optional) Hyperparameter tuning

python run.py run_optuna_usa

Runs a hyperparameter search without cross-validation (run hyp_multi_trans_cnn_usa for ten-fold cross-validation, but its runtime is very long). Results are saved in the data folder under the name given by out_hyp_csv.
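To see why the ten-fold variant is so much more expensive, consider the minimal sketch below: every hyperparameter trial has to train and evaluate one model per fold instead of a single model. The function k_fold_indices is a hypothetical helper written for this illustration; how the repository actually forms its folds is not specified here.

```python
def k_fold_indices(num_samples, num_folds=10):
    """Yield (train_indices, val_indices) pairs for contiguous folds."""
    indices = list(range(num_samples))
    # Distribute the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [num_samples // num_folds] * num_folds
    for i in range(num_samples % num_folds):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


folds = list(k_fold_indices(25, num_folds=10))
# One training run per fold and per trial: ten runs instead of one.
print(len(folds))
```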

Model Training

python run.py train_cnn

Trains the CNN and saves the model and the results in data/models/<new_model>. Additional information is logged to your Weights and Biases account.

Argentina

The basic procedure for Argentina is the same, but paths or names need to be adjusted in some places. The step descriptions from the US pipeline apply here as well and are not repeated.

Exporting

python cyp/data/argentina_export.py

The yield data can be downloaded from the Ministerio de Agricultura. Examples of the format can be found in the data directory.

(Optional) Data Cleansing

python run.py data_cleansing

Adjust the names and paths inside run.py to the Argentinian values, as indicated by the comments there.

python cyp/data/yield-csv_to-utf_with-buxacre.py

This removes Spanish characters, converts tons per acre to bushels per acre, and applies data cleansing of at least 2000 cropland pixels. The variable YIELDFILE at the top of the script can be changed to the name of your yield data file.
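The two text and unit transformations can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes that "removing Spanish characters" means replacing accented letters with their ASCII base letters, that the tons are metric tons, and that a soybean bushel weighs 60 lb (the US standard). The area unit is left unchanged, as in the description above.

```python
import unicodedata

POUNDS_PER_METRIC_TON = 2204.62    # assumption: input is in metric tons
POUNDS_PER_SOYBEAN_BUSHEL = 60.0   # US standard bushel weight for soybeans


def strip_accents(text):
    """Replace accented letters with their base letters (e.g. ó -> o, Ñ -> N)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tons_to_bushels(tons):
    """Convert a soybean mass from metric tons to bushels."""
    return tons * POUNDS_PER_METRIC_TON / POUNDS_PER_SOYBEAN_BUSHEL


print(strip_accents("Córdoba"))
print(round(tons_to_bushels(1.0), 2))
```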

Preprocessing

python run.py process_argentina

Feature Engineering

python run.py arg_engineer

(Optional) Hyperparameter tuning

python run.py run_optuna

Model Training

python run.py train_trans_cnn

To change the referenced US Model, the paths within models/transfer_base.py and models/transfer_convnet.py must be adjusted.

Setup

To set up the environment, the package manager Anaconda with Python 3.7 is required. Run

conda env create -f crop_yield_prediction.yml

to create an environment named crop_yield_prediction and run

conda activate crop_yield_prediction

to activate the environment.
Additionally, you need to sign up for Google Earth Engine and authenticate within the crop_yield_prediction environment by running

earthengine authenticate

and following the instructions.
Weights and Biases is used to track experiments. Run

wandb login

and follow the instructions to activate wandb. You can also disable it by running

wandb disabled