DEA Prototype Code
November 18, 2025
This repository provides developmental libraries and CLI tools for Open Datacube.
- AWS S3 tools
- CLIs for using ODC data from AWS S3 and SQS
- Utilities for data visualizations in notebooks
- Experiments on optimising Rasterio usage on AWS S3
Full list of libraries, and install instructions:
- odc.ui: tools for data visualization in notebook/lab
- odc.io: common IO utilities, used by apps mainly
- odc-cloud[ASYNC,AZURE,THREDDS]: cloud crawling support package
  - odc.aws: AWS/S3 utilities, used by apps mainly
  - odc.aio: faster concurrent fetching from S3 with async, used by apps (install with odc-cloud[ASYNC])
  - odc.{thredds,azure}: internal libs for cloud IO (install with odc-cloud[THREDDS,AZURE])
Promoted to their own repositories
- odc.stats: large scale processing framework (moved to odc-stats)
- odc.stac: STAC to ODC conversion tools (moved to odc-stac)
- odc.dscache: experimental key-value store where key=UUID, value=Dataset (moved to odc-dscache)
Installation
Libraries and applications in this repository are published to PyPI, and can be installed
with pip like so:
pip install \
odc-ui \
odc-io \
"odc-cloud[ASYNC]"
For Conda Users
Some odc-tools are available via conda from the conda-forge channel.
conda install -c conda-forge odc-apps-dc-tools odc-io odc-cloud
Cloud Tools
Installation
Cloud tools depend on the aiobotocore package, which depends on specific
versions of botocore. Another package we use, boto3, also depends on
specific versions of botocore. As a result, installing aiobotocore and
boto3 separately into one environment can lead to version conflicts.
To avoid this, install aiobotocore[awscli,boto3] before anything else, which
will install compatible versions of boto3 and awscli into the environment.
pip install -U "aiobotocore[awscli,boto3]==1.3.3"
# OR for conda setups
conda install "aiobotocore==1.3.3" boto3 awscli
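After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch that uses only the Python standard library; the helper name installed_versions is hypothetical and is not part of odc-tools.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("aiobotocore", "boto3", "botocore", "awscli")):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # distribution is absent from this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:12s} {ver or 'not installed'}")
```

Running pip check afterwards reports any dependency conflicts that remain in the environment.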
- For cloud (AWS only)
  pip install odc-apps-dc-tools
- For cloud (AZURE, GCP, THREDDS and AWS)
  pip install "odc-apps-dc-tools[AZURE,GCP,THREDDS]"
- For dc-index-from-tar (indexing to datacube from tar archive)
  pip install odc-apps-dc-tools
Apps
- s3-find: list S3 bucket with wildcard
- s3-to-tar: fetch documents from S3 and dump them to a tar archive
- gs-to-tar: search GS for documents and dump them to a tar archive
- dc-index-from-tar: read yaml documents from a tar archive and add them to datacube
Example:
#!/bin/bash
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'
s3-find "${s3_src}" | \
s3-to-tar | \
dc-index-from-tar --env s2 --ignore-lineage
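To make the data flow of the pipeline above more concrete, here is a minimal Python sketch of a tar archive used as a container for YAML documents. It is an illustration only, not the code used by s3-to-tar or dc-index-from-tar: the function names pack_documents and iter_documents, the member names, and the one-document-per-member layout are all assumptions.

```python
import io
import tarfile


def pack_documents(docs):
    """Write {member_name: text} into an in-memory gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in docs.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def iter_documents(archive_bytes):
    """Yield (member_name, text) for every regular file in the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical documents standing in for metadata fetched from S3.
    archive = pack_documents({"tile-a/ARD-METADATA.yaml": "id: example-a\n"})
    for name, text in iter_documents(archive):
        print(name, len(text))
```

Bundling many small documents into a single stream is what lets the three tools be chained with pipes.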
The fastest way to list regularly placed files is to use a fixed-depth listing:
#!/bin/bash
# only works when all metadata files are at the same depth and share a fixed file name
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml'
s3-find --skip-check "${s3_src}" | \
s3-to-tar | \
dc-index-from-tar --env s2 --ignore-lineage
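The difference between the two pattern styles used in the examples above can be sketched in Python. This is an illustration under stated assumptions, not the matching logic of s3-find: it assumes that * matches within a single path segment and that ** matches across any number of segments, and the helper names and object keys are made up.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex: '*' stays in one segment, '**' spans segments."""
    out = []
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            out.append(".*")      # any characters, including '/'
        elif part == "*":
            out.append("[^/]*")   # any characters except '/'
        else:
            out.append(re.escape(part))
    return re.compile("".join(out) + r"\Z")


def select(pattern, keys):
    """Return the keys that match the glob pattern, preserving order."""
    rx = glob_to_regex(pattern)
    return [k for k in keys if rx.match(k)]


if __name__ == "__main__":
    # Hypothetical object keys, for illustration only.
    keys = [
        "L2/sentinel-2-nrt/S2MSIARD/2020-01-01/tileA/ARD-METADATA.yaml",
        "L2/sentinel-2-nrt/S2MSIARD/2020-01-01/tileA/extra/ARD-METADATA.yaml",
        "L2/sentinel-2-nrt/S2MSIARD/2020-01-01/tileA/band.tif",
    ]
    print(select("L2/sentinel-2-nrt/**/*.yaml", keys))
    print(select("L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml", keys))
```

Under these assumptions the fixed-depth pattern only accepts keys exactly two levels below S2MSIARD, which is why it requires metadata files to sit at a uniform depth.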
When using Google Storage:
#!/bin/bash
# Google Storage support
gs-to-tar --bucket data.deadev.com --prefix mangrove_cover
dc-index-from-tar --protocol gs --env mangroves --ignore-lineage metadata.tar.gz
Local Development
The following steps are used in the GitHub Actions workflow main.yml
# install all packages in edit mode
./scripts/dev-install.sh --extra tests
# setup database for testing
./scripts/setup-test-db.sh
# run test
echo "Running Tests"
uv run pytest --cov=. \
--cov-report=html \
--cov-report=xml:coverage.xml \
--timeout=30 \
libs apps
Release Process
1. Manually edit the {lib,app}/{pkg}/odc/{pkg}/_version.py file to increase the version number
2. Merge changes to the develop branch via a Pull Request
3. Fast-forward the pypi/publish branch to match develop
4. Push to GitHub
Steps 3 and 4 can be done by an authorized user with the
./scripts/sync-publish-branch.sh script.
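The version edit in the first step is manual, but the change itself is small. As an illustration only, the sketch below increments the patch component of a version string. It assumes that _version.py contains a single assignment of the form __version__ = "X.Y.Z"; that layout and the helper name bump_patch are assumptions, not something this repository provides.

```python
import re

# Matches: __version__ = "X.Y.Z" (single or double quotes). Assumed file layout.
VERSION_RE = re.compile(r"""(__version__\s*=\s*["'])(\d+)\.(\d+)\.(\d+)(["'])""")


def bump_patch(source):
    """Return the source text with the patch component of __version__ incremented."""

    def _bump(match):
        prefix, major, minor, patch, quote = match.groups()
        return f"{prefix}{major}.{minor}.{int(patch) + 1}{quote}"

    new_source, count = VERSION_RE.subn(_bump, source, count=1)
    if count == 0:
        raise ValueError("no __version__ = 'X.Y.Z' assignment found")
    return new_source


if __name__ == "__main__":
    print(bump_patch('__version__ = "0.2.5"\n'), end="")
```

Whether a release should bump the patch, minor, or major component remains a decision for the maintainer.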
Publishing to PyPI happens automatically when changes are
pushed to the protected pypi/publish branch. Only members of the Open Datacube
Admins group have permission to push to this branch.