Grok S3 / Cloud Storage Support
May 9, 2026
Grok supports reading JPEG 2000 files from AWS S3 and S3-compatible object storage
(MinIO, DigitalOcean Spaces, Backblaze B2, Cloudflare R2, etc.) via the S3Fetcher.
The AWS_* environment variables follow standard AWS SDK conventions.
Grok-specific settings use the GRK_ prefix.
URL Formats
| Format | Example |
|---|---|
| VSI path | /vsis3/bucket/path/to/file.jp2 |
| VSI streaming | /vsis3_streaming/bucket/path/to/file.jp2 |
| HTTPS URL | https://s3.us-east-1.amazonaws.com/bucket/file.jp2 |
| HTTP URL | http://localhost:9000/bucket/file.jp2 |
| Virtual-hosted URL | https://bucket.s3.us-east-1.amazonaws.com/file.jp2 |
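The URL forms in the table can be reduced to a bucket and an object key. The following is a minimal Python sketch of that split, not Grok's parser: the function name `split_bucket_key` and the rule used to detect virtual-hosted hosts are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Longest prefix first, so the streaming form is never shadowed.
VSI_PREFIXES = ("/vsis3_streaming/", "/vsis3/")


def split_bucket_key(url: str) -> tuple[str, str]:
    """Return (bucket, key) for the URL forms listed in the table."""
    for prefix in VSI_PREFIXES:
        if url.startswith(prefix):
            bucket, _, key = url[len(prefix):].partition("/")
            return bucket, key

    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL: {url}")

    host = parsed.hostname or ""
    path = parsed.path.lstrip("/")

    # Assumption: a host shaped like <bucket>.s3.<rest> is virtual-hosted style.
    if ".s3." in host and not host.startswith("s3."):
        return host.split(".s3.", 1)[0], path

    # Otherwise treat the first path segment as the bucket (path style).
    bucket, _, key = path.partition("/")
    return bucket, key
```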
Credential Chain
Credentials are resolved in the following order (first match wins). This matches GDAL's credential resolution order.
1. Anonymous Access
| Variable | Values | Description |
|---|---|---|
| AWS_NO_SIGN_REQUEST | YES / NO | Skip authentication entirely (public buckets) |
2. Environment Variables
| Variable | Description |
|---|---|
| AWS_ACCESS_KEY_ID | AWS access key |
| AWS_SECRET_ACCESS_KEY | AWS secret key |
| AWS_SESSION_TOKEN | Temporary session token (STS, SSO, etc.) |
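The "first match wins" rule for the first two steps can be illustrated with a small provider chain. This is a Python sketch under stated assumptions, not Grok's implementation: the `Credentials` type, the provider functions, and `resolve` are hypothetical names, and the environment is passed in as a plain mapping so the logic can be exercised without touching `os.environ`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass
class Credentials:
    access_key: str = ""
    secret_key: str = ""
    session_token: Optional[str] = None
    anonymous: bool = False


Env = Mapping[str, str]
Provider = Callable[[Env], Optional[Credentials]]


def anonymous_provider(env: Env) -> Optional[Credentials]:
    # Step 1: AWS_NO_SIGN_REQUEST=YES skips authentication entirely.
    if env.get("AWS_NO_SIGN_REQUEST", "NO").upper() == "YES":
        return Credentials(anonymous=True)
    return None


def env_provider(env: Env) -> Optional[Credentials]:
    # Step 2: static keys, plus an optional session token.
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return Credentials(key, secret, env.get("AWS_SESSION_TOKEN"))
    return None


def resolve(env: Env, providers: Sequence[Provider]) -> Optional[Credentials]:
    # Providers are tried in order; the first non-None result wins.
    for provider in providers:
        creds = provider(env)
        if creds is not None:
            return creds
    return None
```

Later steps (config files, web identity, ECS, EC2) would be further providers appended to the same list.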
3. Cached Temporary Credentials
Previously obtained temporary credentials (from STS, SSO, EC2, etc.) are cached in memory and reused until they expire, with a 60-second safety margin. The cache is thread-safe and shared across all S3Fetcher instances.
4. AWS Config Files
Reads ~/.aws/credentials and ~/.aws/config (or overridden paths).
| Variable | Default | Description |
|---|---|---|
| GRK_AWS_CREDENTIALS_FILE | ~/.aws/credentials | Override credentials file path |
| AWS_CONFIG_FILE | ~/.aws/config | Override config file path |
| AWS_PROFILE | default | AWS profile to use |
| AWS_DEFAULT_PROFILE | default | Deprecated alias for AWS_PROFILE |
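Profile lookup across the two files can be sketched as follows. This is an illustrative Python sketch rather than Grok's code. It relies on the standard AWS file layout, in which ~/.aws/config names sections `[profile X]` (with a bare `[default]`) while ~/.aws/credentials uses `[X]`. Two things are assumptions: AWS_PROFILE takes precedence over the deprecated AWS_DEFAULT_PROFILE, and the credentials file overrides the config file when both define a key. The helper names are hypothetical.

```python
import configparser
from typing import Dict, Mapping


def profile_name(env: Mapping[str, str]) -> str:
    # Assumption: AWS_PROFILE wins over the deprecated alias.
    return env.get("AWS_PROFILE") or env.get("AWS_DEFAULT_PROFILE") or "default"


def config_section(name: str) -> str:
    # ~/.aws/config prefixes named profiles with "profile "; [default] is bare.
    return "default" if name == "default" else f"profile {name}"


def _parse(text: str) -> configparser.ConfigParser:
    # Only "=" separates keys from values, and "%" is taken literally.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return parser


def load_profile(config_text: str, credentials_text: str,
                 env: Mapping[str, str]) -> Dict[str, str]:
    """Merge the selected profile's settings from both files."""
    name = profile_name(env)
    config = _parse(config_text)
    credentials = _parse(credentials_text)

    merged: Dict[str, str] = {}
    section = config_section(name)
    if config.has_section(section):
        merged.update(config[section])
    if credentials.has_section(name):
        merged.update(credentials[name])
    return merged
```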
The config file supports these advanced credential sources:
4a. Web Identity Token (role_arn + web_identity_token_file)
For EKS/Kubernetes service accounts (IRSA) configured in the AWS config file:
[profile my-profile]
role_arn = arn:aws:iam::123456789012:role/my-role
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
4b. STS Assume Role (role_arn + source_profile)
For cross-account access or role chaining:
[profile cross-account]
role_arn = arn:aws:iam::987654321098:role/target-role
source_profile = default
external_id = optional-external-id
role_session_name = optional-session-name
The source profile can itself use web identity credentials (role chaining).
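Role chaining means following `source_profile` links until a profile that supplies base credentials is reached, then assuming roles back up the chain. A minimal Python sketch of that traversal is shown below. The function name, the dictionary-of-profiles input, and the cycle check are assumptions for illustration; the source does not say how Grok handles cycles.

```python
from typing import Dict, List, Mapping


def assume_role_chain(profiles: Mapping[str, Mapping[str, str]],
                      start: str) -> List[str]:
    """Return profile names ordered from the base credential source to `start`."""
    chain: List[str] = []
    seen = set()
    name = start
    while True:
        if name in seen:
            raise ValueError(f"source_profile cycle at {name!r}")
        if name not in profiles:
            raise KeyError(f"unknown profile {name!r}")
        seen.add(name)
        chain.append(name)

        profile = profiles[name]
        source = profile.get("source_profile")
        # A profile without role_arn + source_profile ends the chain: it holds
        # static keys, a web identity token file, or another base source.
        if "role_arn" not in profile or source is None:
            break
        name = source
    return list(reversed(chain))
```

Each step after the first would be one STS AssumeRole call signed with the credentials obtained in the previous step.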
4c. SSO (sso_start_url + sso_account_id + sso_role_name)
For AWS Single Sign-On / IAM Identity Center:
[profile sso-profile]
sso_start_url = https://my-org.awsapps.com/start
sso_account_id = 123456789012
sso_role_name = MyRole
[profile sso-session-profile]
sso_session = my-session
sso_account_id = 123456789012
sso_role_name = MyRole
[sso-session my-session]
sso_start_url = https://my-org.awsapps.com/start
Grok reads cached SSO tokens from ~/.aws/sso/cache/. Run aws sso login to refresh an expired token.
| Variable | Default | Description |
|---|---|---|
| GRK_AWS_SSO_ENDPOINT | portal.sso.<region>.amazonaws.com | Override SSO endpoint |
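For orientation, the AWS CLI names each cached token file after the SHA-1 hex digest of the `sso_session` name or, for legacy profiles, of the `sso_start_url`. The Python sketch below computes that path and the default portal endpoint. Whether Grok locates the file in the same way is an assumption, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def sso_cache_path(profile: Mapping[str, str], aws_root: Path) -> Path:
    """Path of the cached token file, following the AWS CLI naming scheme."""
    # sso-session profiles are keyed by session name, legacy ones by start URL.
    key = profile.get("sso_session") or profile["sso_start_url"]
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return aws_root / "sso" / "cache" / f"{digest}.json"


def sso_endpoint(env: Mapping[str, str], region: str) -> str:
    # GRK_AWS_SSO_ENDPOINT replaces the default portal host when set.
    return env.get("GRK_AWS_SSO_ENDPOINT") or f"portal.sso.{region}.amazonaws.com"
```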
4d. Credential Process
For external credential providers:
[profile custom]
credential_process = /path/to/credential-provider --arg
The command must output JSON with Version, AccessKeyId, SecretAccessKey,
SessionToken, and optionally Expiration fields.
See: https://docs.aws.amazon.com/sdkref/latest/guide/feature-process-credentials.html
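A validator for the provider's output might look like the following Python sketch. It checks the fields listed above and the `Version` value of 1 defined by the AWS credential_process specification. The function name and the shape of the returned dictionary are assumptions for illustration.

```python
import json
from typing import Dict, Optional

# Fields listed above as mandatory; Expiration is optional.
REQUIRED_FIELDS = ("Version", "AccessKeyId", "SecretAccessKey", "SessionToken")


def parse_credential_process_output(text: str) -> Dict[str, Optional[str]]:
    """Validate the JSON printed by a credential_process command."""
    doc = json.loads(text)
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if doc["Version"] != 1:
        raise ValueError(f"unsupported Version: {doc['Version']!r}")
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "session_token": doc["SessionToken"],
        # None means the credentials carry no expiry time.
        "expiration": doc.get("Expiration"),
    }
```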
5. Web Identity Token (from environment)
For EKS pods and OIDC-federated workloads where credentials come from env vars:
| Variable | Description |
|---|---|
| AWS_ROLE_ARN | IAM role ARN to assume |
| AWS_WEB_IDENTITY_TOKEN_FILE | Path to OIDC token file |
| AWS_ROLE_SESSION_NAME | Session name (default: grok-session) |
| GRK_AWS_WEB_IDENTITY_ENABLE | YES (default) / NO; set to NO to disable this method |
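These variables map onto the parameters of the STS `AssumeRoleWithWebIdentity` action. The Python sketch below builds the query string such a request would carry. It illustrates the STS API, not Grok's request code: the function name is hypothetical, and the token contents are passed in rather than read from AWS_WEB_IDENTITY_TOKEN_FILE to keep the example self-contained.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def web_identity_query(env: Mapping[str, str], token: str) -> Optional[str]:
    """Return the STS query string, or None when this method does not apply."""
    if env.get("GRK_AWS_WEB_IDENTITY_ENABLE", "YES").upper() == "NO":
        return None
    role_arn = env.get("AWS_ROLE_ARN")
    if not role_arn or not env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        return None
    return urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",  # STS API version
        "RoleArn": role_arn,
        "RoleSessionName": env.get("AWS_ROLE_SESSION_NAME", "grok-session"),
        "WebIdentityToken": token,  # contents of the OIDC token file
    })
```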
6. ECS Container Credentials
For tasks running on Amazon ECS or AWS Fargate:
| Variable | Description |
|---|---|
| AWS_CONTAINER_CREDENTIALS_FULL_URI | Full URL to credential endpoint |
| AWS_CONTAINER_CREDENTIALS_RELATIVE_URI | Relative path (uses http://169.254.170.2) |
| AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE | Path to auth token file |
| AWS_CONTAINER_AUTHORIZATION_TOKEN | Auth token value |
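How these four variables combine into one request can be sketched in Python as below. The precedence shown (relative URI before full URI, token file before token value) mirrors common AWS SDK behavior and is an assumption here, as are the function name and the injected `read_file` callable.

```python
from typing import Callable, Dict, Mapping, Optional, Tuple

ECS_HOST = "http://169.254.170.2"


def ecs_credentials_request(
    env: Mapping[str, str],
    read_file: Callable[[str], str],
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (url, headers) for the container endpoint, or None if unset."""
    relative = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    full = env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    if relative:
        url = ECS_HOST + relative  # assumption: relative URI is checked first
    elif full:
        url = full
    else:
        return None

    token = env.get("AWS_CONTAINER_AUTHORIZATION_TOKEN")
    token_file = env.get("AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE")
    if token_file:
        # Assumption: the file wins over the inline value when both are set.
        token = read_file(token_file).strip()

    headers: Dict[str, str] = {}
    if token:
        headers["Authorization"] = token
    return url, headers
```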
7. EC2 Instance Metadata
Last resort: fetches temporary credentials from the EC2 instance metadata service. Uses IMDSv2 (PUT token request) with automatic fallback to IMDSv1.
| Variable | Default | Description |
|---|---|---|
| GRK_AWS_EC2_API_ROOT_URL | http://169.254.169.254 | Override metadata endpoint |
| GRK_AWS_AUTODETECT_EC2_DISABLE | NO | Set to YES to skip EC2 metadata |
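The IMDSv2 exchange and its IMDSv1 fallback can be described as two requests. The Python sketch below only builds request descriptions (method, URL, headers) and sends nothing. The paths and header names are those of the EC2 metadata service; the function names and the 21600-second token TTL are assumptions for illustration.

```python
from typing import Dict, Mapping, Optional, Tuple

Request = Tuple[str, str, Dict[str, str]]  # (method, url, headers)


def imds_root(env: Mapping[str, str]) -> Optional[str]:
    """Metadata root URL, or None when EC2 detection is disabled."""
    if env.get("GRK_AWS_AUTODETECT_EC2_DISABLE", "NO").upper() == "YES":
        return None
    return env.get("GRK_AWS_EC2_API_ROOT_URL", "http://169.254.169.254").rstrip("/")


def token_request(root: str, ttl_seconds: int = 21600) -> Request:
    # IMDSv2: a PUT request obtains a short-lived session token.
    headers = {"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)}
    return "PUT", f"{root}/latest/api/token", headers


def role_list_request(root: str, token: Optional[str]) -> Request:
    # With a token this is IMDSv2; without one it is the IMDSv1 fallback.
    headers = {"X-aws-ec2-metadata-token": token} if token else {}
    return "GET", f"{root}/latest/meta-data/iam/security-credentials/", headers
```

The response to the second request names the instance role; appending that name to the same URL returns the temporary credentials as JSON.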
Region Configuration
Resolved in this order:
| Variable | Description |
|---|---|
| AWS_REGION | GDAL-compatible region setting (highest precedence) |
| AWS_DEFAULT_REGION | Standard AWS SDK region variable |
| Config file region | From [profile X] section in ~/.aws/config |
| (fallback) | us-east-1 |
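The precedence in the table amounts to a chain of fallbacks, sketched here in Python. The function name and the `profile_region` parameter (the `region` value read from the selected profile, if any) are hypothetical.

```python
from typing import Mapping, Optional


def resolve_region(env: Mapping[str, str],
                   profile_region: Optional[str] = None) -> str:
    """Pick the region using the precedence listed in the table."""
    return (
        env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
        or profile_region
        or "us-east-1"
    )
```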
Endpoint Configuration
AWS S3
By default, requests go to s3.<region>.amazonaws.com.
S3-Compatible Storage (MinIO, R2, etc.)
| Variable | Example | Description |
|---|---|---|
| AWS_S3_ENDPOINT | http://localhost:9000 | Custom S3 endpoint |
| AWS_HTTPS | YES / NO | Use HTTPS (default: YES) |
| AWS_VIRTUAL_HOSTING | TRUE / FALSE | Virtual-hosted style URLs (default: FALSE) |
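The interaction of the three settings can be sketched in Python as follows. This is not Grok's URL builder. It assumes that a scheme embedded in AWS_S3_ENDPOINT takes precedence over AWS_HTTPS, and that YES, TRUE, ON, and 1 all count as true; the function names are hypothetical.

```python
from typing import Mapping


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    value = env.get(name)
    if value is None:
        return default
    return value.strip().upper() in ("YES", "TRUE", "ON", "1")


def object_url(env: Mapping[str, str], region: str, bucket: str, key: str) -> str:
    """Build the request URL for one object."""
    endpoint = env.get("AWS_S3_ENDPOINT") or f"s3.{region}.amazonaws.com"
    if "://" in endpoint:
        # Assumption: an explicit scheme in the endpoint overrides AWS_HTTPS.
        scheme, host = endpoint.split("://", 1)
    else:
        scheme = "https" if _flag(env, "AWS_HTTPS", True) else "http"
        host = endpoint
    host = host.rstrip("/")

    if _flag(env, "AWS_VIRTUAL_HOSTING", False):
        return f"{scheme}://{bucket}.{host}/{key}"  # virtual-hosted style
    return f"{scheme}://{host}/{bucket}/{key}"      # path style
```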
MinIO Example
export AWS_S3_ENDPOINT=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_HTTPS=NO
export AWS_VIRTUAL_HOSTING=FALSE
grk_decompress -i /vsis3/mybucket/image.jp2 -o output.tif
STS Endpoint
| Variable | Default | Description |
|---|---|---|
| AWS_STS_REGIONAL_ENDPOINTS | regional | regional → sts.<region>.amazonaws.com, other → sts.amazonaws.com |
| GRK_AWS_STS_ROOT_URL | (auto) | Override STS endpoint entirely |
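A Python sketch of the endpoint selection described in the table follows. The function name is hypothetical, and the `https://` scheme on the computed hosts is an assumption.

```python
from typing import Mapping


def sts_endpoint(env: Mapping[str, str], region: str) -> str:
    """Choose the STS endpoint from the two variables in the table."""
    override = env.get("GRK_AWS_STS_ROOT_URL")
    if override:
        return override  # replaces the computed endpoint entirely
    mode = env.get("AWS_STS_REGIONAL_ENDPOINTS", "regional").lower()
    host = f"sts.{region}.amazonaws.com" if mode == "regional" else "sts.amazonaws.com"
    return f"https://{host}"
```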
Requester Pays
| Variable | Values | Description |
|---|---|---|
| AWS_REQUEST_PAYER | requester | Adds x-amz-request-payer: requester header to all requests |
HTTP / Curl Configuration
These variables are handled in the auth() method and apply to all S3 requests.
SSL / TLS
| Variable | Default | Description |
|---|---|---|
| GRK_CURL_ALLOW_INSECURE | NO | Disable SSL certificate verification |
Timeouts
| Variable | Default | Description |
|---|---|---|
| GRK_CURL_TIMEOUT | (none) | Request timeout in seconds |
Caching
| Variable | Default | Description |
|---|---|---|
| GRK_CURL_CACHE_SIZE | (none) | Curl buffer size in bytes |
Connection Reuse
| Variable | Description |
|---|---|
| GRK_CURL_NON_CACHED | Colon-separated list of prefixes to disable connection reuse for (e.g. /vsis3/) |
Proxy
| Variable | Description |
|---|---|
| GRK_CURL_PROXY | Proxy URL |
| GRK_CURL_PROXYUSERPWD | Proxy credentials (user:password) |
| GRK_CURL_PROXYAUTH | Proxy auth type (any value enables CURLAUTH_ANY) |
Retry
The CurlFetcher base class provides retry logic with configurable limits. Default: 3 retries with 1-second delay between attempts.
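The default policy can be sketched in Python as a fixed-delay retry loop. This illustrates the pattern only and is not the CurlFetcher code. It assumes "3 retries" means one initial attempt plus up to three more, and that only `OSError`-style transient failures are retried; the function name and the injectable `sleep` parameter are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(fetch: Callable[[], T], retries: int = 3, delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fetch`, retrying up to `retries` times with a fixed delay."""
    attempt = 0
    while True:
        try:
            return fetch()
        except OSError:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(delay)
```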
File Paths
| Variable | Default | Description |
|---|---|---|
| GRK_AWS_ROOT_DIR | ~/.aws | Override AWS config root directory |
Request Signing
S3 requests are signed using AWS Signature Version 4 via libcurl's built-in
CURLOPT_AWS_SIGV4 support. The signing region is derived from the resolved
region configuration. The x-amz-date and x-amz-security-token headers
are added automatically.
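To show what libcurl does on Grok's behalf, the Python sketch below derives a Signature Version 4 signing key with the standard HMAC-SHA256 chain and builds the `provider1:provider2:region:service` string that CURLOPT_AWS_SIGV4 accepts. It is a reference illustration: Grok does not compute the key itself, and both function names are hypothetical.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date: str, region: str,
                      service: str = "s3") -> bytes:
    """Derive the SigV4 signing key; `date` is YYYYMMDD in UTC."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def curl_sigv4_param(region: str, service: str = "s3") -> str:
    # Value format accepted by libcurl's CURLOPT_AWS_SIGV4 option.
    return f"aws:amz:{region}:{service}"
```

Because the region is part of the key derivation, a wrong region setting produces a signature mismatch rather than a redirect.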
Architecture
Credential Caching
Temporary credentials (from STS AssumeRole, Web Identity, SSO, ECS, EC2) are
cached in a static, thread-safe CredentialCache shared across all S3Fetcher
instances. Credentials are reused until 60 seconds before expiration, then
automatically refreshed on the next request.
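The expiry rule can be sketched in Python as a lock-protected cache with a 60-second margin. The class below borrows the name CredentialCache for readability but is an illustrative sketch, not Grok's class: the `get(refresh)` interface and the injectable clock are assumptions.

```python
import threading
import time
from typing import Any, Callable, Tuple


class CredentialCache:
    """Reuse temporary credentials until 60 seconds before they expire."""

    SAFETY_MARGIN = 60.0  # seconds

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._lock = threading.Lock()
        self._clock = clock
        self._credentials: Any = None
        self._expires_at = 0.0

    def get(self, refresh: Callable[[], Tuple[Any, float]]) -> Any:
        """Return cached credentials, calling `refresh` when they are stale.

        `refresh` returns (credentials, expiry as a UNIX timestamp).
        """
        with self._lock:  # one refresh at a time across all threads
            stale = self._clock() >= self._expires_at - self.SAFETY_MARGIN
            if self._credentials is None or stale:
                self._credentials, self._expires_at = refresh()
            return self._credentials
```

A single module-level instance would model the sharing across fetcher instances described above.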
Connection Pooling
The CurlFetcher base class maintains a curl_multi handle with up to 100
concurrent connections. Tile and chunk fetch requests are batched and processed
by a background worker thread using a producer/consumer pattern.
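The producer/consumer arrangement can be modelled in Python with a queue and one worker thread. This sketch shows the pattern only. It processes each batch sequentially, whereas a curl_multi handle would run the transfers concurrently, and the names `worker`, `STOP`, and `MAX_BATCH` are hypothetical.

```python
import queue
import threading
from typing import Callable

MAX_BATCH = 100   # mirrors the connection limit mentioned above
STOP = object()   # sentinel the producer enqueues last


def worker(requests: "queue.Queue", results: "queue.Queue",
           fetch: Callable[[str], object]) -> None:
    """Consume queued requests in batches until STOP is seen."""
    while True:
        batch = [requests.get()]  # block until work arrives
        while len(batch) < MAX_BATCH:
            try:
                batch.append(requests.get_nowait())  # drain what is ready
            except queue.Empty:
                break
        for item in batch:
            if item is STOP:
                return
            results.put((item, fetch(item)))
```

A producer would start the worker with `threading.Thread(target=worker, args=(requests, results, fetch))`, enqueue tile or chunk requests as they arise, and enqueue `STOP` on shutdown.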
Supported Fetcher Types
In addition to S3, Grok supports these cloud storage fetchers (each with its own environment variable configuration):
| Prefix | Fetcher | Service |
|---|---|---|
| /vsis3/ | S3Fetcher | AWS S3, MinIO, R2, B2, etc. |
| /vsigs/ | GSFetcher | Google Cloud Storage |
| /vsiaz/ | AZFetcher | Azure Blob Storage |
| /vsiadls/ | ADLSFetcher | Azure Data Lake Gen2 |
| /vsicurl/ | HTTPFetcher | Generic HTTP/HTTPS |
| https:// | HTTPFetcher | Direct HTTPS URL |
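Selecting a fetcher from a path prefix can be sketched in Python as a table lookup. The sketch covers only the rows of the table above and returns fetcher names as strings. How Grok distinguishes S3 HTTPS URLs from generic ones is not described here, so that case is left out, and the function name is hypothetical.

```python
from typing import Optional

# (prefix, fetcher name) pairs taken from the table.
FETCHERS = (
    ("/vsis3/", "S3Fetcher"),
    ("/vsigs/", "GSFetcher"),
    ("/vsiaz/", "AZFetcher"),
    ("/vsiadls/", "ADLSFetcher"),
    ("/vsicurl/", "HTTPFetcher"),
    ("https://", "HTTPFetcher"),
)


def fetcher_for(path: str) -> Optional[str]:
    """Return the fetcher name for `path`, or None for a local file."""
    for prefix, name in FETCHERS:
        if path.startswith(prefix):
            return name
    return None
```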