🌾 Sentinel-Yield

April 16, 2026

Unsupervised agricultural anomaly detection using satellite foundation model embeddings.

🚨 The Problem

Traditional methods for monitoring crop stress on farms fall short in three ways:

Optical satellites (like Sentinel-2) are blinded by monsoon cloud cover when monitoring is needed most.
Traditional vegetation indices (NDVI) detect damage only after it has happened.
Field-level variability is invisible at the coarse 30 m resolution of free optical satellites such as Landsat.

💡 The Solution

Instead of diagnosing "stress" via greenness, this project detects statistical deviation in latent space.

It uses Google’s AlphaEarth Satellite Embeddings (10 m resolution, multi-modal SAR/optical fusion) to identify pixels that behave differently from their historical norm. Because the embeddings fuse radar with optical data, the approach works through cloud cover and requires no ground-truth labels.

🖥️ Demo

Sentinel-Yield Demo

⚙️ Methodology

Multi-Year Baseline (2018–2023): Stack 6 years of annual satellite embeddings for the Ahmednagar district and learn the regional agricultural latent manifold via K-Means clustering.
Latent-Space Anomaly Detection: Compute the Euclidean distance from each 2024 pixel to its nearest historical cluster centroid. The score represents the degree of deviation from the norm.
Proxy Validation: Compare the unsupervised anomaly scores against CHIRPS precipitation data to check whether the latent deviations correlate with actual climatic variability.
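
The first two steps above can be sketched in plain NumPy. This is a minimal illustration, not the project's notebook code: the `fit_kmeans` and `anomaly_scores` helpers are hypothetical names, the K-Means loop is a bare Lloyd's iteration, the cluster count is arbitrary, and the arrays are synthetic stand-ins for 64-dimensional embeddings.

```python
import numpy as np


def fit_kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm. X is (n, d); returns (k, d) centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct random samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every sample to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids


def anomaly_scores(pixels, centroids):
    """Euclidean distance from each pixel to its nearest centroid."""
    diffs = pixels[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)


# Synthetic stand-in for the stacked baseline years (two "land-use modes").
rng = np.random.default_rng(42)
baseline = np.vstack([
    rng.normal(loc=-1.0, scale=0.05, size=(300, 64)),
    rng.normal(loc=+1.0, scale=0.05, size=(300, 64)),
])
centroids = fit_kmeans(baseline, k=2)

# Five pixels that resemble the baseline, five that sit between the modes.
typical = rng.normal(loc=1.0, scale=0.05, size=(5, 64))
unusual = rng.normal(loc=0.0, scale=0.05, size=(5, 64))
scores = anomaly_scores(np.vstack([typical, unusual]), centroids)
```

Pixels that fall far from every baseline centroid receive the largest scores, which is the "degree of deviation" the methodology describes.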

🏗️ Architecture

[AlphaEarth Embeddings (GEE)]
        ↓
[Offline Clustering (Jupyter)] → Extracts Baseline Manifold
        ↓
[baseline_centroids.json]
        ↓
[Streamlit App (Inference)] → Computes Distance on the Fly
        ↓
[Folium Map + Altair Validation Chart]
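
The hand-off between the offline and online stages can be sketched as a JSON round trip followed by a distance computation. The file name `baseline_centroids.json` comes from the diagram above; the JSON layout (a `centroids` key holding a list of lists) and the helper names are assumptions made for illustration, and the real file may be structured differently.

```python
import json
import os
import tempfile

import numpy as np


def save_centroids(centroids, path):
    """Offline step: write the (k, d) centroid array to JSON (assumed schema)."""
    payload = {
        "n_clusters": int(centroids.shape[0]),
        "n_dims": int(centroids.shape[1]),
        "centroids": centroids.tolist(),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_centroids(path):
    """Online step: read the centroids back as a NumPy array."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    return np.asarray(payload["centroids"], dtype=np.float64)


def score_tile(tile, centroids):
    """Map an (H, W, D) embedding tile to an (H, W) nearest-centroid distance map."""
    h, w, d = tile.shape
    flat = tile.reshape(-1, d)
    dists = np.linalg.norm(flat[:, None, :] - centroids[None, :, :], axis=2)
    return dists.min(axis=1).reshape(h, w)


# Round trip with placeholder centroids and a random tile.
rng = np.random.default_rng(0)
demo_centroids = rng.normal(size=(4, 64))
with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "baseline_centroids.json")
    save_centroids(demo_centroids, json_path)
    restored = load_centroids(json_path)

distance_map = score_tile(rng.normal(size=(8, 8, 64)), restored)
```

The resulting 2-D distance map is the kind of array that a map layer could then color by score.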

🛠️ Tech Stack

Compute: Google Earth Engine (GEE) Python API
Data: AlphaEarth Satellite Embeddings, Copernicus Global Land Cover, UCSB CHIRPS Daily
Frontend: Streamlit, Folium
Math/Data: NumPy, Pandas, Altair

🧠 Key Design Decisions

Why Unsupervised? Little labeled crop stress data exists at scale, so we rely on distribution shifts instead of supervised learning.
Why Embeddings instead of NDVI? Embeddings natively fuse SAR (Radar) and optical data. This allows us to detect structural changes in the crops even when heavy monsoon clouds block standard optical sensors.
Why Precomputed Clusters? Training a clustering algorithm over millions of 64-dimensional pixels inside a web app is too slow and hits memory limits. Moving training offline and serving the centroid JSON enables near-instant runtime inference.
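
A rough back-of-the-envelope calculation shows why serving centroids is so much lighter than training in the app. The 64 dimensions come from the text; the pixel count, the cluster count, and the float32 storage are hypothetical values chosen only to illustrate the scale difference.

```python
BYTES_PER_FLOAT32 = 4
EMBEDDING_DIMS = 64  # embedding width stated in the text


def embedding_bytes(n_vectors, dims=EMBEDDING_DIMS, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size in bytes of n_vectors dense embedding vectors."""
    return n_vectors * dims * bytes_per_value


n_pixels = 5_000_000  # hypothetical training-set size
n_clusters = 10       # hypothetical number of centroids

training_mb = embedding_bytes(n_pixels) / 1e6    # 1280.0 MB held in memory to train
centroid_kb = embedding_bytes(n_clusters) / 1e3  # 2.56 KB to serve at inference
```

Under these assumptions the training data is roughly half a million times larger than the centroid table, before counting any intermediate arrays the clustering step allocates.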

⚠️ Limitations & Future Work

Annual Constraint: The current embedding dataset is annual, limiting sub-seasonal phenology tracking.
Proxy Validation: Lacks direct farmer-reported ground truth; relies heavily on meteorological proxies (rainfall).
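
Because validation rests on a rainfall proxy, it helps to be explicit about what such a comparison measures. The sketch below computes a Pearson correlation between two paired series. How the project pairs anomaly scores with CHIRPS rainfall (per region, per year, or otherwise) is not stated here, so the pairing, the `pearson_r` helper, and the synthetic numbers are all assumptions.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length 1-D series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic example: anomaly scores that partly track a rainfall deficit.
rng = np.random.default_rng(7)
rainfall_deficit = rng.normal(size=200)                    # placeholder proxy values
anomaly = 0.8 * rainfall_deficit + rng.normal(scale=0.3, size=200)

r = pearson_r(rainfall_deficit, anomaly)
```

A high correlation against a proxy like this suggests the scores respond to climatic variability, but it does not replace farmer-reported ground truth.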

🚀 Quickstart

# Clone the repo
git clone https://github.com/sanatladkat/sentinel-yield.git
cd sentinel-yield

# Set up the environment
conda env create -f environment.yml
conda activate sentinel-yield

# Add your Google Cloud Project ID
cp .env.example .env
# Edit .env and add: EE_PROJECT_ID="your-project-id"

# Run the dashboard
streamlit run app.py
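
For reference, here is one way an app could pick up the `EE_PROJECT_ID` value from the steps above. It is a standard-library sketch with hypothetical helper names, not a copy of what `app.py` does; the repository may use a different loader.

```python
import os


def read_env_file(path):
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_project_id(env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    project_id = os.environ.get("EE_PROJECT_ID")
    if not project_id and os.path.exists(env_path):
        project_id = read_env_file(env_path).get("EE_PROJECT_ID")
    if not project_id:
        raise RuntimeError("EE_PROJECT_ID is not set; add it to .env")
    return project_id


# With the Earth Engine client installed, the value would be passed on like so:
#   import ee
#   ee.Initialize(project=resolve_project_id())
```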