Grok S3 / Cloud Storage Support

May 9, 2026

Grok supports reading JPEG 2000 files from AWS S3 and S3-compatible object storage (MinIO, DigitalOcean Spaces, Backblaze B2, Cloudflare R2, etc.) via the S3Fetcher.

All AWS environment variables are compatible with standard AWS SDK conventions. Grok-specific configuration uses the GRK_ prefix.

URL Formats

| Format | Example |
| --- | --- |
| VSI path | /vsis3/bucket/path/to/file.jp2 |
| VSI streaming | /vsis3_streaming/bucket/path/to/file.jp2 |
| HTTPS URL | https://s3.us-east-1.amazonaws.com/bucket/file.jp2 |
| HTTP URL | http://localhost:9000/bucket/file.jp2 |
| Virtual-hosted URL | https://bucket.s3.us-east-1.amazonaws.com/file.jp2 |
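
As a rough illustration of how the VSI-style paths above decompose, the sketch below splits a path into a bucket name and an object key. The helper `split_vsi_path` is hypothetical and written only for this page; it is not part of Grok's API, and Grok's own parsing may differ.

```python
from typing import Tuple

# Prefixes taken from the URL format table; everything else here is illustrative.
VSI_PREFIXES = ("/vsis3_streaming/", "/vsis3/")


def split_vsi_path(path: str) -> Tuple[str, str]:
    """Split '/vsis3/<bucket>/<key>' into (bucket, key)."""
    for prefix in VSI_PREFIXES:
        if path.startswith(prefix):
            rest = path[len(prefix):]
            # The first path component is the bucket, the remainder is the key.
            bucket, _, key = rest.partition("/")
            if not bucket or not key:
                raise ValueError(f"expected {prefix}<bucket>/<key>, got {path!r}")
            return bucket, key
    raise ValueError(f"not a /vsis3/ path: {path!r}")
```

For example, `split_vsi_path("/vsis3/bucket/path/to/file.jp2")` yields the bucket `bucket` and the key `path/to/file.jp2`.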

Credential Chain

Credentials are resolved in the following order (first match wins). This matches GDAL's credential resolution order.
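
The "first match wins" rule can be pictured as a loop over credential providers. The sketch below is a simplified model, not Grok's implementation: the `Credentials` dictionary, the provider signature, and the case-insensitive handling of `YES` are all assumptions, and only the environment-variable provider is shown.

```python
from typing import Callable, Mapping, Optional, Sequence

Credentials = dict  # hypothetical stand-in for a real credentials type
Provider = Callable[[Mapping[str, str]], Optional[Credentials]]


def from_environment(env: Mapping[str, str]) -> Optional[Credentials]:
    """Static keys from environment variables; None when not configured."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if not (key and secret):
        return None
    creds = {"access_key": key, "secret_key": secret}
    if env.get("AWS_SESSION_TOKEN"):
        creds["session_token"] = env["AWS_SESSION_TOKEN"]
    return creds


def resolve_credentials(
    env: Mapping[str, str], providers: Sequence[Provider]
) -> Optional[Credentials]:
    """Return None for anonymous access, else the first provider's result."""
    # Anonymous access short-circuits the chain: requests are not signed.
    if env.get("AWS_NO_SIGN_REQUEST", "NO").upper() == "YES":
        return None
    for provider in providers:
        creds = provider(env)
        if creds is not None:
            return creds  # first match wins
    raise LookupError("no credential source matched")
```

In a fuller model, the remaining sources described below (cache, config files, web identity, ECS, EC2) would be further entries in `providers`, in that order.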

1. Anonymous Access

| Variable | Values | Description |
| --- | --- | --- |
| AWS_NO_SIGN_REQUEST | YES / NO | Skip authentication entirely (public buckets) |

2. Environment Variables

| Variable | Description |
| --- | --- |
| AWS_ACCESS_KEY_ID | AWS access key |
| AWS_SECRET_ACCESS_KEY | AWS secret key |
| AWS_SESSION_TOKEN | Temporary session token (STS, SSO, etc.) |

3. Cached Temporary Credentials

Previously obtained temporary credentials (from STS, SSO, EC2, etc.) are cached in memory and reused until they expire, with a 60-second safety margin. The cache is thread-safe and shared across all S3Fetcher instances.

4. AWS Config Files

Reads ~/.aws/credentials and ~/.aws/config (or overridden paths).

| Variable | Default | Description |
| --- | --- | --- |
| GRK_AWS_CREDENTIALS_FILE | ~/.aws/credentials | Override credentials file path |
| AWS_CONFIG_FILE | ~/.aws/config | Override config file path |
| AWS_PROFILE | default | AWS profile to use |
| AWS_DEFAULT_PROFILE | default | Deprecated alias for AWS_PROFILE |
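
To make the profile lookup concrete, here is a minimal sketch of selecting a profile and reading its section from config-file text. It relies on the standard AWS convention that `~/.aws/config` names sections `[profile NAME]` (with a bare `[default]`). The helper names are hypothetical, and the assumption that `AWS_PROFILE` takes precedence over the deprecated `AWS_DEFAULT_PROFILE` is not confirmed by this page.

```python
import configparser
from typing import Dict, Mapping


def selected_profile(env: Mapping[str, str]) -> str:
    """Pick the profile name; AWS_PROFILE winning is an assumption."""
    return env.get("AWS_PROFILE") or env.get("AWS_DEFAULT_PROFILE") or "default"


def config_section(profile: str) -> str:
    """Section name used in ~/.aws/config for a given profile."""
    return "default" if profile == "default" else f"profile {profile}"


def read_profile(config_text: str, profile: str) -> Dict[str, str]:
    """Return the key/value pairs of one profile, or {} if it is absent."""
    # Only '=' is a delimiter so that ARNs containing ':' stay intact;
    # interpolation is off so '%' in values is taken literally.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(config_text)
    section = config_section(profile)
    if not parser.has_section(section):
        return {}
    return dict(parser.items(section))
```

Note that the credentials file uses plain `[NAME]` section headers rather than `[profile NAME]`, so a reader for that file would skip the `config_section` mapping.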

The config file supports these advanced credential sources:

4a. Web Identity Token (role_arn + web_identity_token_file)

For EKS/Kubernetes service accounts (IRSA) configured in the AWS config file:

[profile my-profile]
role_arn = arn:aws:iam::123456789012:role/my-role
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token

4b. STS Assume Role (role_arn + source_profile)

For cross-account access or role chaining:

[profile cross-account]
role_arn = arn:aws:iam::987654321098:role/target-role
source_profile = default
external_id = optional-external-id
role_session_name = optional-session-name

The source profile can itself use web identity credentials (role chaining).

4c. SSO (sso_start_url + sso_account_id + sso_role_name)

For AWS Single Sign-On / IAM Identity Center:

[profile sso-profile]
sso_start_url = https://my-org.awsapps.com/start
sso_account_id = 123456789012
sso_role_name = MyRole

[profile sso-session-profile]
sso_session = my-session
sso_account_id = 123456789012
sso_role_name = MyRole

[sso-session my-session]
sso_start_url = https://my-org.awsapps.com/start

Grok reads cached SSO tokens from ~/.aws/sso/cache/. If the cached token has expired, run aws sso login to refresh it.

| Variable | Default | Description |
| --- | --- | --- |
| GRK_AWS_SSO_ENDPOINT | portal.sso.<region>.amazonaws.com | Override SSO endpoint |

4d. Credential Process

For external credential providers:

[profile custom]
credential_process = /path/to/credential-provider --arg

The command must output JSON with Version, AccessKeyId, SecretAccessKey, SessionToken, and optionally Expiration fields. See: https://docs.aws.amazon.com/sdkref/latest/guide/feature-process-credentials.html
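
The JSON contract can be checked with a few lines of code. The sketch below follows the AWS documentation linked above, where `Version` is `1` and `SessionToken` and `Expiration` may be absent for long-term credentials; whether Grok is equally lenient is an assumption. Both function names and the returned dictionary layout are hypothetical.

```python
import json
import shlex
import subprocess
from typing import Dict, Optional


def parse_process_output(text: str) -> Dict[str, Optional[str]]:
    """Validate and normalise the JSON printed by a credential_process."""
    doc = json.loads(text)
    if doc.get("Version") != 1:
        raise ValueError("unsupported credential_process Version")
    for field in ("AccessKeyId", "SecretAccessKey"):
        if not doc.get(field):
            raise ValueError(f"credential_process output is missing {field}")
    return {
        "access_key": doc["AccessKeyId"],
        "secret_key": doc["SecretAccessKey"],
        "session_token": doc.get("SessionToken"),  # may be None
        "expiration": doc.get("Expiration"),       # ISO 8601 string or None
    }


def run_credential_process(command: str) -> Dict[str, Optional[str]]:
    """Run the configured command and parse what it prints on stdout."""
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True, check=True
    )
    return parse_process_output(result.stdout)
```

A provider that omits `Expiration` is treated here as returning non-expiring credentials; a provider that sets it would typically be re-run once the credentials are close to expiry.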

5. Web Identity Token (from environment)

For EKS pods and OIDC-federated workloads where credentials come from env vars:

| Variable | Description |
| --- | --- |
| AWS_ROLE_ARN | IAM role ARN to assume |
| AWS_WEB_IDENTITY_TOKEN_FILE | Path to OIDC token file |
| AWS_ROLE_SESSION_NAME | Session name (default: grok-session) |
| GRK_AWS_WEB_IDENTITY_ENABLE | YES (default) / NO; set to NO to disable this method |
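
For orientation, these variables map onto the parameters of the STS `AssumeRoleWithWebIdentity` action. The sketch below builds such a request URL from an environment mapping and an already-read token. It is an illustration only: the function name, the use of a GET-style URL, and the default STS root are assumptions, and Grok's actual request construction is not shown on this page.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def web_identity_request_url(
    env: Mapping[str, str],
    token: str,
    sts_root: str = "https://sts.amazonaws.com",  # assumed default for the sketch
) -> Optional[str]:
    """Build an AssumeRoleWithWebIdentity URL, or None if the method is off."""
    if env.get("GRK_AWS_WEB_IDENTITY_ENABLE", "YES").upper() == "NO":
        return None
    role_arn = env.get("AWS_ROLE_ARN")
    if not role_arn or not token:
        return None
    query = urlencode(
        {
            "Action": "AssumeRoleWithWebIdentity",
            "Version": "2011-06-15",  # STS API version
            "RoleArn": role_arn,
            "RoleSessionName": env.get("AWS_ROLE_SESSION_NAME", "grok-session"),
            "WebIdentityToken": token,
        }
    )
    return f"{sts_root}/?{query}"
```

The token itself would be the contents of the file named by `AWS_WEB_IDENTITY_TOKEN_FILE`; this call is unsigned, because the OIDC token is the proof of identity.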

6. ECS Container Credentials

For tasks running on Amazon ECS or AWS Fargate:

| Variable | Description |
| --- | --- |
| AWS_CONTAINER_CREDENTIALS_FULL_URI | Full URL to credential endpoint |
| AWS_CONTAINER_CREDENTIALS_RELATIVE_URI | Relative path (uses http://169.254.170.2) |
| AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE | Path to auth token file |
| AWS_CONTAINER_AUTHORIZATION_TOKEN | Auth token value |
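
One way to read the table is as a recipe for a single HTTP request. The sketch below derives the URL and headers from an environment mapping. The precedence shown (full URI before relative URI, token file before token value) simply mirrors the table order and is an assumption, as is the helper name.

```python
from typing import Dict, Mapping, Optional, Tuple

ECS_HOST = "http://169.254.170.2"  # fixed host used with the relative URI


def ecs_credentials_request(
    env: Mapping[str, str],
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (url, headers) for the container credential endpoint, or None."""
    full_uri = env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    relative_uri = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if full_uri:
        url = full_uri
    elif relative_uri:
        url = ECS_HOST + relative_uri
    else:
        return None  # not running in a container with credentials configured

    token = env.get("AWS_CONTAINER_AUTHORIZATION_TOKEN")
    token_file = env.get("AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE")
    if token_file:
        # Assumed precedence: a token file overrides an inline token value.
        with open(token_file, encoding="utf-8") as handle:
            token = handle.read().strip()

    headers = {"Authorization": token} if token else {}
    return url, headers
```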

7. EC2 Instance Metadata

Last resort: fetches temporary credentials from the EC2 instance metadata service. Uses IMDSv2 (PUT token request) with automatic fallback to IMDSv1.

| Variable | Default | Description |
| --- | --- | --- |
| GRK_AWS_EC2_API_ROOT_URL | http://169.254.169.254 | Override metadata endpoint |
| GRK_AWS_AUTODETECT_EC2_DISABLE | NO | Set to YES to skip EC2 metadata |
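
The IMDSv2 exchange consists of a PUT that obtains a session token, followed by GETs that present it; dropping the token header gives the IMDSv1 form used as a fallback. The sketch below only constructs the two requests with Python's standard library and does not send them. The paths and header names are those of the EC2 metadata service; the helper names and the TTL value are assumptions.

```python
import urllib.request
from typing import Mapping, Optional

DEFAULT_ROOT = "http://169.254.169.254"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"
ROLE_PATH = "/latest/meta-data/iam/security-credentials/"


def imds_root(env: Mapping[str, str]) -> Optional[str]:
    """Metadata root URL, or None when EC2 detection is disabled."""
    if env.get("GRK_AWS_AUTODETECT_EC2_DISABLE", "NO").upper() == "YES":
        return None
    return env.get("GRK_AWS_EC2_API_ROOT_URL", DEFAULT_ROOT).rstrip("/")


def token_request(root: str, ttl_seconds: int = 21600) -> urllib.request.Request:
    """IMDSv2 step 1: PUT request that asks for a session token."""
    return urllib.request.Request(
        root + "/latest/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: str(ttl_seconds)},
    )


def role_list_request(
    root: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Step 2: list the instance role; without a token this is IMDSv1."""
    headers = {TOKEN_HEADER: token} if token else {}
    return urllib.request.Request(root + ROLE_PATH, headers=headers)
```

A caller would send `token_request`, pass the response body as `token` to `role_list_request`, and then fetch the same path with the role name appended to obtain the temporary credentials.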

Region Configuration

Resolved in this order:

| Variable | Description |
| --- | --- |
| AWS_REGION | GDAL-compatible region setting (highest precedence) |
| AWS_DEFAULT_REGION | Standard AWS SDK region variable |
| Config file region | From [profile X] section in ~/.aws/config |
| (fallback) | us-east-1 |
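
The resolution order above amounts to a chain of fallbacks. A minimal sketch, with a hypothetical function name and the profile's region passed in by the caller:

```python
from typing import Mapping, Optional


def resolve_region(
    env: Mapping[str, str], profile_region: Optional[str] = None
) -> str:
    """Apply the documented precedence; empty values count as unset here."""
    return (
        env.get("AWS_REGION")
        or env.get("AWS_DEFAULT_REGION")
        or profile_region      # region key from the selected config profile
        or "us-east-1"         # documented fallback
    )
```

Treating an empty string as unset is a choice made for this sketch, not something the page specifies.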

Endpoint Configuration

AWS S3

By default, requests go to s3.<region>.amazonaws.com.

S3-Compatible Storage (MinIO, R2, etc.)

| Variable | Example | Description |
| --- | --- | --- |
| AWS_S3_ENDPOINT | http://localhost:9000 | Custom S3 endpoint |
| AWS_HTTPS | YES / NO | Use HTTPS (default: YES) |
| AWS_VIRTUAL_HOSTING | TRUE / FALSE | Virtual-hosted style URLs (default: FALSE) |
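
To show how these settings combine into a request URL, the sketch below builds path-style and virtual-hosted URLs. It is a model under stated assumptions rather than Grok's code: the function name is hypothetical, an endpoint that already carries a scheme is assumed to override `AWS_HTTPS`, and the key is not percent-encoded.

```python
from typing import Mapping


def object_url(bucket: str, key: str, region: str, env: Mapping[str, str]) -> str:
    """Build the URL of one object from the endpoint settings."""
    endpoint = env.get("AWS_S3_ENDPOINT") or f"s3.{region}.amazonaws.com"
    if "://" in endpoint:
        # Assumption: an explicit scheme in the endpoint wins over AWS_HTTPS.
        scheme, host = endpoint.split("://", 1)
    else:
        use_https = env.get("AWS_HTTPS", "YES").upper() != "NO"
        scheme, host = ("https" if use_https else "http"), endpoint
    host = host.rstrip("/")

    if env.get("AWS_VIRTUAL_HOSTING", "FALSE").upper() == "TRUE":
        # Virtual-hosted style: the bucket becomes part of the host name.
        return f"{scheme}://{bucket}.{host}/{key}"
    # Path style: the bucket is the first path component.
    return f"{scheme}://{host}/{bucket}/{key}"
```

With no variables set, `object_url("bucket", "file.jp2", "us-east-1", {})` gives the path-style HTTPS URL listed under URL Formats; setting `AWS_VIRTUAL_HOSTING=TRUE` gives the virtual-hosted one.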

MinIO Example

export AWS_S3_ENDPOINT=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_HTTPS=NO
export AWS_VIRTUAL_HOSTING=FALSE

grk_decompress -i /vsis3/mybucket/image.jp2 -o output.tif

STS Endpoint

| Variable | Default | Description |
| --- | --- | --- |
| AWS_STS_REGIONAL_ENDPOINTS | regional | regional: use sts.<region>.amazonaws.com; any other value: use sts.amazonaws.com |
| GRK_AWS_STS_ROOT_URL | (auto) | Override STS endpoint entirely |
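
The selection rule in the table can be sketched as follows. The function name is hypothetical, and the `https://` scheme on the automatically chosen endpoints is an assumption.

```python
from typing import Mapping


def sts_endpoint(env: Mapping[str, str], region: str) -> str:
    """Choose the STS endpoint from the two variables in the table."""
    override = env.get("GRK_AWS_STS_ROOT_URL")
    if override:
        return override  # an explicit override replaces the automatic choice
    if env.get("AWS_STS_REGIONAL_ENDPOINTS", "regional") == "regional":
        return f"https://sts.{region}.amazonaws.com"
    return "https://sts.amazonaws.com"  # global endpoint
```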

Requester Pays

| Variable | Values | Description |
| --- | --- | --- |
| AWS_REQUEST_PAYER | requester | Adds x-amz-request-payer: requester header to all requests |

HTTP / Curl Configuration

These variables are handled in the auth() method and apply to all S3 requests.

SSL / TLS

| Variable | Default | Description |
| --- | --- | --- |
| GRK_CURL_ALLOW_INSECURE | NO | Disable SSL certificate verification |

Timeouts

| Variable | Default | Description |
| --- | --- | --- |
| GRK_CURL_TIMEOUT | (none) | Request timeout in seconds |

Caching

| Variable | Default | Description |
| --- | --- | --- |
| GRK_CURL_CACHE_SIZE | (none) | Curl buffer size in bytes |

Connection Reuse

| Variable | Description |
| --- | --- |
| GRK_CURL_NON_CACHED | Colon-separated list of path prefixes for which connection reuse is disabled (e.g. /vsis3/) |

Proxy

| Variable | Description |
| --- | --- |
| GRK_CURL_PROXY | Proxy URL |
| GRK_CURL_PROXYUSERPWD | Proxy credentials (user:password) |
| GRK_CURL_PROXYAUTH | Proxy auth type (any value enables CURLAUTH_ANY) |

Retry

The CurlFetcher base class provides retry logic with configurable limits. Default: 3 retries with 1-second delay between attempts.
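
The retry behaviour can be modelled with a small loop. This sketch is not the CurlFetcher code: it assumes "3 retries" means one initial attempt plus up to three further attempts, treats any `OSError` as retryable, and uses a hypothetical function name. The `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_retries: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on OSError wait and try again, up to max_retries times."""
    retries_used = 0
    while True:
        try:
            return fetch()
        except OSError:
            if retries_used >= max_retries:
                raise  # retries exhausted: surface the last error
            retries_used += 1
            sleep(delay_seconds)  # fixed delay between attempts
```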

File Paths

| Variable | Default | Description |
| --- | --- | --- |
| GRK_AWS_ROOT_DIR | ~/.aws | Override AWS config root directory |

Request Signing

S3 requests are signed using AWS Signature Version 4 via libcurl's built-in CURLOPT_AWS_SIGV4 support. The signing region is derived from the resolved region configuration. The x-amz-date and x-amz-security-token headers are added automatically.

Architecture

Credential Caching

Temporary credentials (from STS AssumeRole, Web Identity, SSO, ECS, EC2) are cached in a static, thread-safe CredentialCache shared across all S3Fetcher instances. Credentials are reused until 60 seconds before expiration, then automatically refreshed on the next request.
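
A cache with this behaviour might look like the sketch below. It is a Python model of the idea, not the C++ `CredentialCache`: the class layout, the `refresh` callback, and the module-level `SHARED_CACHE` instance (standing in for the static cache) are all assumptions. Only the 60-second margin and the lock-protected sharing come from the description above.

```python
import threading
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

Entry = Tuple[dict, datetime]  # (credentials, expiration)


class CredentialCache:
    SAFETY_MARGIN = timedelta(seconds=60)

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entry: Optional[Entry] = None

    def get(
        self, refresh: Callable[[], Entry], now: Optional[datetime] = None
    ) -> dict:
        """Return cached credentials, refreshing them when close to expiry."""
        now = now or datetime.now(timezone.utc)
        with self._lock:  # one thread at a time inspects or refreshes the entry
            if self._entry is not None:
                creds, expiration = self._entry
                if now < expiration - self.SAFETY_MARGIN:
                    return creds
            # Missing or within 60 seconds of expiry: fetch a fresh set.
            self._entry = refresh()
            return self._entry[0]


# Hypothetical process-wide instance shared by all fetchers.
SHARED_CACHE = CredentialCache()
```

Holding the lock during `refresh` keeps the sketch simple and prevents concurrent refreshes, at the cost of blocking other threads while the refresh is in flight.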

Connection Pooling

The CurlFetcher base class maintains a curl_multi handle with up to 100 concurrent connections. Tile and chunk fetch requests are batched and processed by a background worker thread using a producer/consumer pattern.
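
The producer/consumer hand-off can be illustrated with a queue and one worker thread. This sketch shows only that pattern: it processes requests one at a time and does not model the curl_multi handle or its 100 concurrent connections. The class and method names are hypothetical.

```python
import queue
import threading
from typing import Any, Callable


class FetchWorker:
    """Background thread that drains a queue of fetch requests."""

    def __init__(self, fetch: Callable[[Any], Any]) -> None:
        self._fetch = fetch
        self._queue: "queue.Queue" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, request: Any, on_done: Callable[[Any, Any], None]) -> None:
        """Producer side: enqueue a request and return immediately."""
        self._queue.put((request, on_done))

    def close(self) -> None:
        """Ask the worker to stop after the queued requests and wait for it."""
        self._queue.put(None)  # sentinel marks the end of the stream
        self._thread.join()

    def _run(self) -> None:
        # Consumer side: block on the queue until the sentinel arrives.
        while True:
            item = self._queue.get()
            if item is None:
                break
            request, on_done = item
            on_done(request, self._fetch(request))
```

A caller would submit one request per tile or chunk and collect results in `on_done`, which runs on the worker thread.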

Supported Fetcher Types

In addition to S3, Grok supports these cloud storage fetchers (each with its own environment variable configuration):

| Prefix | Fetcher | Service |
| --- | --- | --- |
| /vsis3/ | S3Fetcher | AWS S3, MinIO, R2, B2, etc. |
| /vsigs/ | GSFetcher | Google Cloud Storage |
| /vsiaz/ | AZFetcher | Azure Blob Storage |
| /vsiadls/ | ADLSFetcher | Azure Data Lake Gen2 |
| /vsicurl/ | HTTPFetcher | Generic HTTP/HTTPS |
| https:// | HTTPFetcher | Direct HTTPS URL |