modis2scidb

February 8, 2016

Python scripts for uploading MODIS images to SciDB. These scripts provide a way to load several MODIS HDF files into a 3-dimensional SciDB array by calling SciDB's data loading tools.

Loading MODIS data into SciDB is a three-step process:

  1. Export the HDF image file to SciDB's binary format. MODIS images are distributed as HDF files, whereas SciDB loads data from its own binary format.
  2. Load the binary image into a 1-dimensional SciDB array.
  3. Redimension the array from 1 to 3 dimensions.
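
As an illustration of steps 2 and 3, the following sketch builds the kind of AFL statements such a load could use and optionally sends them to iquery. It is not the code in load2scidb.py: the helper names, the 1D chunk size, the temporary array name, and the assumption that the binary file stores col_id, row_id and time_id as int64 values ahead of the data attributes are all hypothetical, and the operator syntax should be checked against the documentation of the SciDB version in use.

```python
import subprocess


def build_load_queries(binary_path, dest_array, attributes,
                       tmp_array="load_000000001"):
    """Return AFL statements for a 1D load followed by a redimension.

    `attributes` is a list of (name, type) pairs for the data attributes.
    ASSUMPTION: the binary file holds col_id, row_id and time_id as int64
    values before the data attributes of each cell.
    """
    fields = [("col_id", "int64"), ("row_id", "int64"), ("time_id", "int64")]
    fields += list(attributes)
    attr_schema = ", ".join("%s:%s" % (name, kind) for name, kind in fields)
    bin_format = "(" + ", ".join(kind for _, kind in fields) + ")"
    return [
        # Temporary 1D array (chunk size chosen arbitrarily for the sketch).
        "create array %s <%s> [i=0:*,1000000,0]" % (tmp_array, attr_schema),
        # Step 2: load the binary file; -2 asks for the coordinator instance.
        "load(%s, '%s', -2, '%s')" % (tmp_array, binary_path, bin_format),
        # Step 3: redimension from 1D to 3D and add the cells to the target.
        "insert(redimension(%s, %s), %s)" % (tmp_array, dest_array, dest_array),
        # Drop the temporary 1D array once the redimension is done.
        "remove(%s)" % tmp_array,
    ]


def run_queries(queries):
    """Send each statement to iquery (must be on PATH) without fetching results."""
    for query in queries:
        subprocess.check_call(["iquery", "-naq", query])
```

Calling build_load_queries("/home/scidb/toLoad/x.sdbbin", "MOD09Q1", [("red", "int16"), ("nir", "int16"), ("quality", "uint16")]) returns four strings that can be inspected before anything is executed; run_queries is only needed once the statements look right.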

The script checkFolder.py monitors a folder for SciDB binary files. Each time it finds a new file, it calls the script load2scidb.py, which loads the data into a 3D SciDB array (steps 2 and 3). Loading data directly into a 3D array is not straightforward; instead, the binary data is first loaded into a temporary 1D array, which is then redimensioned into the 3D array. These temporary 1D arrays are named following the pattern load_XXXXXXXXX, and the script deletes them once the redimension is done.
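
The folder-monitoring idea can be sketched as a simple polling loop. This is a minimal illustration, not the contents of checkFolder.py: the function names, the .sdbbin suffix filter, the polling interval and the max_polls parameter are assumptions made for the example, and the handler is left generic because the arguments of load2scidb.py are not described here.

```python
import os
import time


def find_new_files(folder, seen, suffix=".sdbbin"):
    """Return sorted paths of files in `folder` with `suffix` that are not in `seen`."""
    found = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.endswith(suffix) and os.path.isfile(path) and path not in seen:
            found.append(path)
    return found


def watch(folder, handle, interval=60, max_polls=None):
    """Poll `folder` and call `handle(path)` once for each new file.

    `max_polls=None` keeps polling forever; a number stops after that many
    passes, which is convenient for trying the loop out.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(folder, seen):
            handle(path)      # e.g. start the loader for this file
            seen.add(path)    # remember it so it is handled only once
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
    return seen
```

For a quick try, watch("/home/scidb/toLoad/", print, max_polls=1) prints the binary files currently waiting in the folder. Files are handled in name order, one at a time, so loads into the destination array never overlap.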

The script hdfs2sdbbin.py exports MODIS data to SciDB binary format and writes it to a specific folder. To do this, it calls modis2scidb, the binary tool for exporting HDF to SciDB binary.

Since exporting is independent of loading, the HDF-to-binary script can run on several servers simultaneously, while loading is done only by the SciDB coordinator instance.

Pre-requisites

  • git.
  • Python.
  • SciDB 14.3. SciDB must be installed in the default location.
  • These scripts must be installed on the SciDB coordinator instance and must be run by a user allowed to execute iquery.
  • modis2scidb, the binary tool for exporting HDF to SciDB binary.

Files:

  • LICENSE - License file.
  • README.md - This file.
  • addHdfs2bin.py - Script that exports an HDF file to SciDB's binary format.
  • checkFolder.py - Script that checks a folder for SciDB's binary files.
  • load2scidb.py - Script that loads a binary file to a SciDB database.
  • install_pyhdf.sh - Script for installing pyhdf.
  • run.py - Script that builds the paths to the MODIS files and then calls addHdfs2bin.py.

Instructions:

  1. Download the scripts to the script-folder. Use: git clone https://github.com/albhasan/modis2scidb.git
  2. Use the install_pyhdf.sh script to install pyhdf on the SciDB coordinator instance. For example: sudo ./install_pyhdf.sh
  3. Create a destination array in SciDB. This is the dest-array:
    • For MOD09Q1: CREATE ARRAY MOD09Q1 <red:int16, nir:int16, quality:uint16> [col_id=48000:72000,1014,5,row_id=38400:62400,1014,5,time_id=0:9200,1,0];
    • For MOD13Q1: CREATE ARRAY MOD13Q1 <ndvi:int16, evi:int16, quality:uint16, red:int16, nir:int16, blue:int16, mir:int16, viewza:int16, sunza:int16, relaza:int16, cdoy:int16, reli:int16> [col_id=48000:72000,502,5,row_id=38400:62400,502,5,time_id=0:9200,1,0];
  4. Create a folder accessible by SciDB. This is the check-folder, from which data is loaded into SciDB.
  5. Run checkFolder.py pointing to the check-folder; the files found there will be uploaded to SciDB. For example: python checkFolder.py /home/scidb/toLoad/ /home/scidb/modis2scidb/ MOD09Q1 &
  6. Run addHdfs2bin.py to export MODIS HDFs to binary files. Once the export finishes, move the file to the check-folder. For example:
    • python addHdfs2bin.py /home/scidb/MODIS_ARC/MODIS/MOD09Q1.005/2000.02.18/MOD09Q1.A2000049.h10v08.005.2006268191328.hdf /home/scidb/MOD09Q1.A2000049.h10v08.005.2006268191328.sdbbin
    • mv /home/scidb/MOD09Q1.A2000049.h10v08.005.2006268191328.sdbbin /home/scidb/toLoad/MOD09Q1.A2000049.h10v08.005.2006268191328.sdbbin
  7. NOTE: Alternatively, you can use run.py to make calls to addHdfs2bin.py on many HDFs.
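
To illustrate step 6 applied to many files, the sketch below walks a folder of HDF files and prepares one addHdfs2bin.py call per file, using the two-argument form shown in the example above (input HDF, output .sdbbin). It is not the code in run.py: the function names and the staging folder are hypothetical. Each file is exported outside the check-folder and moved in only when complete, mirroring the mv in step 6, so that the monitoring script is less likely to pick up a partially written file.

```python
import os
import shutil
import subprocess


def plan_exports(hdf_root, staging_dir, check_dir, script="addHdfs2bin.py"):
    """Return (command, staged_path, final_path) for every .hdf file under hdf_root."""
    plans = []
    for dirpath, dirs, files in os.walk(hdf_root):
        dirs.sort()  # visit sub-folders in a stable order
        for name in sorted(files):
            if not name.endswith(".hdf"):
                continue
            out_name = name[:-len(".hdf")] + ".sdbbin"
            staged = os.path.join(staging_dir, out_name)
            final = os.path.join(check_dir, out_name)
            command = ["python", script, os.path.join(dirpath, name), staged]
            plans.append((command, staged, final))
    return plans


def run_exports(plans):
    """Export each file, then move the finished binary into the check-folder."""
    for command, staged, final in plans:
        subprocess.check_call(command)  # raises if the export fails
        shutil.move(staged, final)
```

For example, run_exports(plan_exports("/home/scidb/MODIS_ARC/MODIS/MOD09Q1.005", "/home/scidb", "/home/scidb/toLoad")) would export every HDF under that folder. Printing the result of plan_exports first is a cheap way to check the paths before running anything.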