disaster-image-processing
October 26, 2025
Interactive Jupyter Notebooks
Full Image Bounding Box Visualization
Chip Bounding Box Visualization
For more information on this project, please visit the project website.
This pipeline processes the image data, tiles the images, prepares the training, validation and test data, and trains the model in TensorFlow. There are separate processes for DigitalGlobe data and for NOAA data. More details on the data used for this project can be found here.
Process Flow
1. Download data
Scrape the image files from the source websites and save them in a folder. For DigitalGlobe, the image files must also be sorted into 3-band and 1-band folders.
The following instructions are for NOAA only:
After downloading the provided script, run sudo bash downloadTiffs.sh on Ubuntu to download the image files.
All the files combined take up around 60 GB, so make sure the target drive has enough free space; using a hard drive is recommended. If wget is not yet installed on your Ubuntu system, run sudo apt-get install wget before running the script.
You may need to delete carriage return characters if the files are not being downloaded properly (e.g. the files are downloaded instantly). To do this, run sed "s/$(printf '\r')\$//" downloadTiffs.sh > downloadTiffs2.sh && mv downloadTiffs2.sh downloadTiffs.sh on Ubuntu.
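If sed is not available, the same cleanup can be sketched in Python. This is only an illustration and not part of the repository; the function names strip_carriage_returns and fix_script are hypothetical.

```python
import re
from pathlib import Path


def strip_carriage_returns(data: bytes) -> bytes:
    """Remove a carriage return that sits at the end of any line."""
    # With re.MULTILINE, `$` matches just before each newline and at the
    # very end of the data, which mirrors the sed expression above.
    return re.sub(rb"\r$", b"", data, flags=re.MULTILINE)


def fix_script(path: str) -> None:
    """Rewrite the file at `path` in place with Unix line endings."""
    script = Path(path)
    script.write_bytes(strip_carriage_returns(script.read_bytes()))
```

Calling fix_script("downloadTiffs.sh") from the directory that holds the script should have the same effect as the sed command.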
Each file takes roughly 5-7 minutes to download; please be patient!
2. Compress images
Compress the downloaded image files. For DigitalGlobe, this reduces roughly 3 TB of imagery to about 60 GB.
Please download both files (compressTiffs.sh and compressTiffs.py) and ensure that they are in the same directory. You only need to run compressTiffs.sh, as it will call compressTiffs.py.
You will need Miniconda in your Ubuntu terminal in order to run Python files. We chose Miniconda over Anaconda because Miniconda ships with only a minimal set of packages, which saves disk space that would otherwise be taken up by packages we do not need. Please go here for instructions on how to install Miniconda on Ubuntu. We recommend downloading the latest version.
Once you have installed Miniconda, you must take the following steps to activate it in Ubuntu:
- Run sudo -s.
- Enter the password associated with your account in Ubuntu.
- First-time only: Run export PATH="root/miniconda3/bin:$PATH" in the terminal. This will ensure that Ubuntu points to your Miniconda directory.
You must then install the GDAL package (preferably in a virtual environment). To set up your virtual environment, run conda create -n [env_name] python=[version] (we recommend Python 3.9). You can then activate it at any time by running source activate [env_name] and deactivate it with source deactivate. On conda 4.4 and later, the equivalent commands are conda activate [env_name] and conda deactivate.
Install the GDAL package by running conda install -c conda-forge gdal while your virtual environment is active.
You may get a syntax error when you run compressTiffs.sh. To fix this, run vi compressTiffs.sh -> :set ff=unix -> :wq!
compressTiffs.sh will automatically go to the folder where the tar files are located, so ensure that the shell and Python files are located in the parent directory of the noaa_images folder, which should have been created when you ran downloadTiffs.sh.
3. Processing image files
Apply the appropriate utility scripts as needed, based on observations of the data.
4. Tile images
Clip the large tif images into smaller tiles (2048 x 2048), working from left to right and top to bottom, and produce a csv of the lat long ranges for each tif image.
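The traversal order and the csv output can be illustrated with a small sketch. It is not the repository's tiling script: it assumes a north-up image whose pixels map linearly onto known geographic bounds, it assumes that edge tiles are clipped to the image border, and it writes one csv row per tile. All function names and column names are hypothetical. In practice the image size and bounds would come from GDAL.

```python
import csv

TILE_SIZE = 2048  # tile edge length, as stated in the step above


def tile_windows(width, height, tile=TILE_SIZE):
    """Yield (col_off, row_off, w, h) windows, left to right then top to bottom."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            # Edge tiles are clipped to the image border (an assumption).
            yield (col_off, row_off,
                   min(tile, width - col_off), min(tile, height - row_off))


def tile_bounds(window, width, height, west, south, east, north):
    """Return (min_lon, min_lat, max_lon, max_lat) for one pixel window."""
    col_off, row_off, w, h = window
    lon_per_px = (east - west) / width
    lat_per_px = (north - south) / height
    min_lon = west + col_off * lon_per_px
    max_lon = west + (col_off + w) * lon_per_px
    max_lat = north - row_off * lat_per_px  # row 0 is the northern edge
    min_lat = north - (row_off + h) * lat_per_px
    return (min_lon, min_lat, max_lon, max_lat)


def write_tile_csv(width, height, bounds, out):
    """Write one csv row per tile: index, pixel window, lat long range."""
    writer = csv.writer(out)
    writer.writerow(["tile", "col_off", "row_off", "width", "height",
                     "min_lon", "min_lat", "max_lon", "max_lat"])
    for i, window in enumerate(tile_windows(width, height)):
        writer.writerow([i, *window,
                         *tile_bounds(window, width, height, *bounds)])
```

Here bounds is a (west, south, east, north) tuple in degrees and out is any open text file.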
5. Index tiles to geojson
Using the csv of lat long ranges per tif image and the geojson file of bounding-box lat longs (each with its attached tif id), produce a geojson of pixel ranges per bounding box, tagged with the small tif id.
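One way to picture the indexing step is as a rectangle-overlap test between each bounding box and each tile range. The sketch below is an assumption about how such a lookup could work, not the repository's code: it assigns a box to every tile it overlaps, and the function names and dictionary layout are hypothetical.

```python
def overlaps(box, tile_range):
    """True when two (min_lon, min_lat, max_lon, max_lat) rectangles intersect."""
    b_west, b_south, b_east, b_north = box
    t_west, t_south, t_east, t_north = tile_range
    return (b_west < t_east and b_east > t_west
            and b_south < t_north and b_north > t_south)


def index_boxes_to_tiles(boxes, tile_ranges):
    """Map each tile id to the ids of the bounding boxes that overlap it.

    boxes:       {box_id: (min_lon, min_lat, max_lon, max_lat)}
    tile_ranges: {tile_id: (min_lon, min_lat, max_lon, max_lat)}
    """
    index = {tile_id: [] for tile_id in tile_ranges}
    for box_id, box in boxes.items():
        for tile_id, tile_range in tile_ranges.items():
            if overlaps(box, tile_range):
                index[tile_id].append(box_id)
    return index
```

A box that straddles a tile border appears under both tiles here. Whether to keep, clip or drop such boxes is a design choice that this sketch leaves open.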
6. Convert lat long to pixel coordinates
The SSD model requires the bounding boxes in the training data to be given as pixel coordinates.
You can verify that both the geospatial and pixel coordinates result in the correct bounding boxes being plotted in BuildingMarker.ipynb. You can manually select a tile of your choice, or test a random tile from your tiles folder.
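The conversion itself can be sketched as a linear interpolation inside the tile's lat long range. This assumes a north-up tile with pixel (0, 0) at the top-left corner and a default tile size of 2048 x 2048; the function names are hypothetical and the repository's own conversion may differ.

```python
def lonlat_to_pixel(lon, lat, tile_range, tile_width, tile_height):
    """Convert a lon/lat point to (x, y) pixel coordinates inside one tile.

    tile_range is (min_lon, min_lat, max_lon, max_lat). Pixel (0, 0) is the
    top-left corner, so y grows as latitude decreases.
    """
    min_lon, min_lat, max_lon, max_lat = tile_range
    x = (lon - min_lon) / (max_lon - min_lon) * tile_width
    y = (max_lat - lat) / (max_lat - min_lat) * tile_height
    return x, y


def clamp(value, upper):
    """Round to the nearest pixel and keep the result within [0, upper]."""
    return max(0, min(upper, round(value)))


def box_to_pixels(box, tile_range, tile_width=2048, tile_height=2048):
    """Convert a geographic box to (xmin, ymin, xmax, ymax) in pixels."""
    west, south, east, north = box
    # The north-west corner becomes the top-left pixel corner.
    xmin, ymin = lonlat_to_pixel(west, north, tile_range, tile_width, tile_height)
    xmax, ymax = lonlat_to_pixel(east, south, tile_range, tile_width, tile_height)
    return (clamp(xmin, tile_width), clamp(ymin, tile_height),
            clamp(xmax, tile_width), clamp(ymax, tile_height))
```

Boxes that extend past the tile are clipped to the tile edge in this sketch.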
7. Chip tif files
Convert the tif files to chips, smaller images that can be used for the train/test sets.
8. Split training data
Split the images and geojson file into training, validation and test subsets (8:1:1).
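A reproducible 8:1:1 split can be sketched as follows. The function name, the fixed seed and the choice to give any remainder to the test subset are assumptions made for illustration.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items reproducibly and split them into train, val and test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split stable
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any remainder goes to the test subset
    return train, val, test
```

Splitting a list of image file names this way and then filtering the geojson features by the resulting names keeps the images and their labels in the same subset.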
9. Debug dataset
Use an IPython notebook to plot the bounding boxes over the images (tiff files) and check them for accuracy. Inspect the rendered boxes manually, record any bad labels, and remove those bounding boxes from the geojson file.
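Once the bad labels have been recorded, removing them from the geojson file could look like the sketch below. It assumes each feature carries an identifier in its properties; the property name "id" and the function names are hypothetical, so adjust them to match your geojson file.

```python
import json


def remove_bad_labels(feature_collection, bad_ids, id_key="id"):
    """Return a copy of a GeoJSON FeatureCollection without flagged features."""
    bad = set(bad_ids)
    kept = [feature for feature in feature_collection["features"]
            if (feature.get("properties") or {}).get(id_key) not in bad]
    return {**feature_collection, "features": kept}


def clean_geojson_file(in_path, out_path, bad_ids, id_key="id"):
    """Read a geojson file, drop the flagged features and write the result."""
    with open(in_path) as src:
        collection = json.load(src)
    with open(out_path, "w") as dst:
        json.dump(remove_bad_labels(collection, bad_ids, id_key), dst)
```

Writing to a new file rather than overwriting the original makes it easy to recover a label that was removed by mistake.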
10. Data augmentation
Shift, flip and rotate the images as a way to add more training data.
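Whenever an image is shifted, flipped or rotated, its bounding boxes have to be transformed in the same way. The sketch below shows the coordinate arithmetic for boxes given as (xmin, ymin, xmax, ymax) in pixels, with (0, 0) at the top-left corner. The function names are hypothetical, and the image transforms themselves are left to whichever image library you use.

```python
def flip_horizontal(box, width):
    """Mirror a box left to right inside an image `width` pixels wide."""
    xmin, ymin, xmax, ymax = box
    return (width - xmax, ymin, width - xmin, ymax)


def flip_vertical(box, height):
    """Mirror a box top to bottom inside an image `height` pixels high."""
    xmin, ymin, xmax, ymax = box
    return (xmin, height - ymax, xmax, height - ymin)


def rotate_90_clockwise(box, height):
    """Rotate a box together with its image by 90 degrees clockwise.

    A point (x, y) moves to (height - y, x), where `height` is the height
    of the image before rotation.
    """
    xmin, ymin, xmax, ymax = box
    return (height - ymax, xmin, height - ymin, xmax)


def shift(box, dx, dy, width, height):
    """Translate a box and clip it to the image; None if nothing remains."""
    xmin, ymin, xmax, ymax = box
    xmin, xmax = max(0, xmin + dx), min(width, xmax + dx)
    ymin, ymax = max(0, ymin + dy), min(height, ymax + dy)
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)
```

Note that after a 90 degree rotation the image width and height swap, so the next rotation must use the new height.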
11. Feed training data to algorithm
Prepare input for the network.