PyCropYieldPrediction-withTransfer
February 12, 2024
An extension of Gabriel Tseng's PyTorch implementation of Jiaxuan You's Deep Gaussian Process built on a CNN for soybean crop forecasting in Argentina. In addition, code components from the work "Deep Transfer Learning for Crop Yield Prediction with Remote Sensing Data" are used to export Argentine satellite data.
This code was used to produce the results published in the paper "Leveraging Remote Sensing Data for Yield Prediction with Deep Transfer Learning". The paper is open access and can be found at https://www.mdpi.com/1424-8220/24/3/770 . If you find our code helpful, please cite our work as follows:
@Article{s24030770,
AUTHOR = {Huber, Florian and Inderka, Alvin and Steinhage, Volker},
TITLE = {Leveraging Remote Sensing Data for Yield Prediction with Deep Transfer Learning},
JOURNAL = {Sensors},
VOLUME = {24},
YEAR = {2024},
NUMBER = {3},
ARTICLE-NUMBER = {770},
URL = {https://www.mdpi.com/1424-8220/24/3/770},
ISSN = {1424-8220},
DOI = {10.3390/s24030770}
}
Pipeline
USA
Exporting
Run
python run.py export
to export the US satellite data to your Google Drive. You will need up to 165 GB of storage. The export class supports checkpointing.
The Earth Engine Task Manager shows your ongoing tasks. The export may take a long time.
Once all the data has been exported to your Google Drive, you can drag the folders crop_yield-data_image, crop_yield-data_mask and crop_yield-data_temperature into your local data folder (Google Drive Desktop is recommended, otherwise the data will be downloaded in a lot of ZIP files).
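The checkpointing mentioned above is not documented further here. As a rough illustration of the general idea only, the following sketch records finished items in a JSON file and skips them on the next run. The function names, the JSON file format and the `export_one` callback are all hypothetical and are not taken from the repository's export class.

```python
import json
from pathlib import Path


def load_done(checkpoint_path):
    """Return the set of item keys recorded as finished in an earlier run."""
    path = Path(checkpoint_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def export_with_checkpoint(items, export_one, checkpoint_path):
    """Call export_one(item) for every item not yet recorded as done.

    Progress is written to disk after each item, so an interrupted run
    can be restarted without repeating finished exports.
    """
    done = load_done(checkpoint_path)
    for item in items:
        if item in done:
            continue  # already exported in a previous run
        export_one(item)  # hypothetical callback that starts one export
        done.add(item)
        Path(checkpoint_path).write_text(json.dumps(sorted(done)))
    return done
```

Restarting the function with the same checkpoint file only processes the items that are still missing.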
The yield data can be downloaded from the USDA. Examples of the format can be found in the data directory.
(Optional) Data Cleansing
If you want to apply our data cleansing (more than 2000 cropland pixels) to your own data, run
python run.py data_cleansing
and
python cyp/data/merge_yield_pix-count_usa.py
Note that the columns of the corresponding CSV files are addressed by position, so the column order matters. The formatting of our data can be found in the data directory.
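The cleansing rule itself is a simple threshold on the cropland pixel count. A minimal sketch of that filter, assuming rows are dictionaries with hypothetical `state` and `county` keys and a separate lookup of pixel counts (the repository scripts address columns by position instead, and their exact comparison may differ):

```python
MIN_CROPLAND_PIXELS = 2000  # threshold named in this README


def filter_by_pixel_count(rows, pixel_counts, min_pixels=MIN_CROPLAND_PIXELS):
    """Keep yield rows whose county has more than min_pixels cropland pixels.

    rows:         list of dicts with hypothetical keys "state" and "county"
    pixel_counts: dict mapping (state, county) to the cropland pixel count
    """
    kept = []
    for row in rows:
        key = (row["state"], row["county"])
        # Counties without a recorded pixel count are treated as 0 and dropped.
        if pixel_counts.get(key, 0) > min_pixels:
            kept.append(row)
    return kept
```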
Preprocessing
python run.py process
Merges the data and splits it by year. The results are saved as .npy files.
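How the split is performed is not described here. As an illustration only, assume one exported file holds its layers for several years in chronological order with a fixed number of layers per year; the split is then a matter of chunking. The function and parameter names below are hypothetical, and plain lists stand in for the image arrays.

```python
def split_layers_by_year(layers, first_year, layers_per_year):
    """Chunk a chronologically ordered list of layers into {year: layers}.

    An incomplete trailing year is dropped.
    """
    if layers_per_year <= 0:
        raise ValueError("layers_per_year must be positive")
    n_years = len(layers) // layers_per_year
    return {
        first_year + i: layers[i * layers_per_year:(i + 1) * layers_per_year]
        for i in range(n_years)
    }
```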
Feature Engineering
python run.py engineer
Generates histograms from the processed .npy files.
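The idea behind this step is to replace each band image with the distribution of its pixel values. A minimal sketch of such a histogram transform follows, written in plain Python so that it runs without dependencies. The bin count, the value range and the nested-list layout `image[t][b]` are assumptions made for illustration, not the settings used by `run.py engineer`.

```python
def band_histogram(pixels, num_bins, value_range):
    """Return a normalized histogram (fractions summing to 1) of pixel values."""
    low, high = value_range
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for value in pixels:
        if value < low or value > high:
            continue  # ignore pixels outside the chosen range
        # A value exactly at the upper edge is placed in the last bin.
        index = min(int((value - low) / width), num_bins - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * num_bins
    return [count / total for count in counts]


def image_to_histograms(image, num_bins, value_range):
    """Convert image[t][b] (flat pixel list per time step t and band b)
    into histograms[t][b] (list of num_bins fractions)."""
    return [
        [band_histogram(band, num_bins, value_range) for band in step]
        for step in image
    ]
```

The output has a fixed size per time step and band, independent of how many pixels the county covers.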
(Optional) Hyperparameter tuning
python run.py run_optuna_usa
Non-cross-validated hyperparameter search (run hyp_multi_trans_cnn_usa for a ten-fold cross-validation, but its runtime is immense).
Results are saved in the data folder with the name given by out_hyp_csv.
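The runtime difference follows from the procedure: with ten-fold cross-validation, every hyperparameter candidate is trained and evaluated ten times instead of once. A generic sketch of k-fold splitting and score averaging is shown below. It uses contiguous folds and a hypothetical `evaluate` callback; how the repository forms its folds is not stated here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Distribute the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


def cross_validated_score(evaluate, n_samples, k=10):
    """Average evaluate(train, validation) over k folds.

    evaluate is a hypothetical callback that trains on the train indices
    and returns a score on the validation indices.
    """
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```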
Model Training
python run.py train_cnn
Trains the CNN and saves the model and the results in data/models/<new_model>. Additional information is saved to your Weights and Biases account.
Argentina
The basic procedure for Argentina is the same, but some paths and names need to be adjusted. The step descriptions from the US pipeline apply here as well and are not repeated.
Exporting
python cyp/data/argentina_export.py
The yield data can be downloaded from the Ministerio de Agricultura. Examples of the format can be found in the data directory.
(Optional) Data Cleansing
python run.py data_cleansing
Adjust the names and paths inside run.py to the Argentinian values as described in the comments there.
python cyp/data/yield-csv_to-utf_with-buxacre.py
This removes Spanish characters, converts tons per acre to bushels per acre, and applies data cleansing of at least 2000 cropland pixels.
The variable YIELDFILE in the head of the script can be changed to the name of your yield data file.
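The two text and unit transformations can be illustrated as follows. This is a sketch under stated assumptions, not the code of the script: it assumes that "removing Spanish characters" means stripping diacritics, that "tons" are metric tonnes, and it uses the standard soybean bushel weight of 60 lb. The function names are hypothetical.

```python
import unicodedata

KG_PER_LB = 0.45359237
KG_PER_SOYBEAN_BUSHEL = 60 * KG_PER_LB  # a soybean bushel weighs 60 lb


def strip_accents(text):
    """Replace accented letters by their base letters, e.g. for department names."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tons_to_bushels(tons, kg_per_ton=1000.0):
    """Convert a soybean mass from tons to bushels.

    kg_per_ton defaults to a metric tonne (ASSUMPTION). Because only the
    mass unit changes, the same factor converts tons per acre to bushels
    per acre.
    """
    return tons * kg_per_ton / KG_PER_SOYBEAN_BUSHEL
```

If the yield file uses a different ton definition, pass the corresponding `kg_per_ton` value.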
Preprocessing
python run.py process_argentina
Feature Engineering
python run.py arg_engineer
(Optional) Hyperparameter tuning
python run.py run_optuna
Model Training
python run.py train_trans_cnn
To change the referenced US Model, the paths within models/transfer_base.py and models/transfer_convnet.py must be adjusted.
Setup
To set up the environment, the package manager Anaconda with Python 3.7 is required. Run
conda env create -f crop_yield_prediction.yml
to create an environment named crop_yield_prediction and run
conda activate crop_yield_prediction
to activate the environment.
Additionally, you need to sign up for Google Earth Engine
and authenticate yourself within the crop_yield_prediction environment by running
earthengine authenticate
and following the instructions.
Weights and Biases is used to track experiments. Run
wandb login
and follow the instructions to activate wandb. You can also disable it by running
wandb disabled
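As an alternative to the command above, wandb can also be switched off for a single process through the `WANDB_MODE` environment variable. The helper below is a small sketch and not part of this repository; it has to be called before wandb is initialised.

```python
import os


def disable_wandb(env=None):
    """Set WANDB_MODE=disabled in the given environment mapping.

    Defaults to os.environ, which affects the current process and any
    child processes started afterwards.
    """
    env = os.environ if env is None else env
    env["WANDB_MODE"] = "disabled"
    return env
```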