Functions

November 14, 2025

By default, functions are installed into the public schema. You can choose an alternate location by running CREATE EXTENSION pg_duckdb WITH SCHEMA your_schema_name.

Note: ALTER EXTENSION is not currently supported for moving the extension to a different schema.
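
For example, to install the extension into a dedicated schema (a sketch; the schema name ducks is just an illustration):

```sql
CREATE SCHEMA ducks;
CREATE EXTENSION pg_duckdb WITH SCHEMA ducks;

-- The extension's functions are then referenced with the schema qualifier:
SELECT * FROM ducks.read_parquet('file.parquet');
```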

Data Lake Functions

| Name | Description |
| --- | --- |
| read_parquet | Read a Parquet file |
| read_csv | Read a CSV file |
| read_json | Read a JSON file |
| iceberg_scan | Read an Iceberg dataset |
| iceberg_metadata | Read Iceberg metadata |
| iceberg_snapshots | Read Iceberg snapshot information |
| delta_scan | Read a Delta dataset |

JSON Functions

All of the DuckDB JSON functions and aggregates. Postgres JSON/JSONB functions are not supported.
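
For example, DuckDB's json_extract function can be applied to JSON values read from a file (a sketch; the file name, column name, and JSON path are illustrative):

```sql
SELECT json_extract(r['payload'], '$.user.id') AS user_id
FROM read_json('events.json') r;
```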

Union Type Functions

| Name | Description |
| --- | --- |
| union_extract | Extracts a value from a union type by tag name. |
| union_tag | Gets the tag name of the active member in a union type. |

MAP Functions

All of the DuckDB map functions.

| Name | Description |
| --- | --- |
| cardinality | Return the size of the map |
| element_at | Return the value for a given key as a list |
| map_concat | Merge multiple maps |
| map_contains | Check if a map contains a given key |
| map_contains_entry | Check if a map contains a given key-value pair |
| map_contains_value | Check if a map contains a given value |
| map_entries | Return a list of struct(k, v) for each key-value pair |
| map_extract | Extract a value from a map using a key |
| map_extract_value | Return the value for a given key or NULL |
| map_from_entries | Create a map from an array of struct(k, v) |
| map_keys | Get all keys from a map as a list |
| map_values | Get all values from a map as a list |

Aggregates

| Name | Description |
| --- | --- |
| approx_count_distinct | Approximates the count of distinct elements using HyperLogLog. |

Sampling Functions

| Name | Description |
| --- | --- |
| TABLESAMPLE | Samples a subset of rows from a table or query result. |

Time Functions

| Name | Description |
| --- | --- |
| time_bucket | Buckets timestamps into time intervals for time-series analysis. |
| strftime | Formats timestamps as strings using format codes. |
| strptime | Parses strings into timestamps using format codes. |
| epoch | Converts timestamps to Unix epoch seconds. |
| epoch_ms | Converts timestamps to Unix epoch milliseconds. |
| epoch_us | Converts timestamps to Unix epoch microseconds. |
| epoch_ns | Converts timestamps to Unix epoch nanoseconds. |
| make_timestamp | Creates a timestamp from microseconds since epoch. |
| make_timestamptz | Creates a timestamp with timezone from microseconds since epoch. |

DuckDB Administration Functions

| Name | Description |
| --- | --- |
| duckdb.install_extension | Installs a DuckDB extension. |
| duckdb.load_extension | Loads a DuckDB extension for the current session. |
| duckdb.autoload_extension | Configures whether an extension should be auto-loaded. |
| duckdb.query | Runs a SELECT query directly against DuckDB. |
| duckdb.raw_query | Runs any query directly against DuckDB (for debugging). |
| duckdb.recycle_ddb | Resets the DuckDB instance in the current connection (for debugging). |

Secrets Management Functions

| Name | Description |
| --- | --- |
| duckdb.create_simple_secret | Creates a simple secret for cloud storage access. |
| duckdb.create_azure_secret | Creates an Azure secret using a connection string. |

Motherduck Functions

| Name | Description |
| --- | --- |
| duckdb.enable_motherduck | Enables MotherDuck integration with a token. |
| duckdb.is_motherduck_enabled | Checks if MotherDuck integration is enabled. |
| duckdb.force_motherduck_sync | Forces a full resync of MotherDuck databases and schemas to Postgres (for debugging). |

Detailed Descriptions

read_parquet(path TEXT or TEXT[], ...) -> SETOF duckdb.row

Reads a Parquet file, either from a remote location (via httpfs) or a local file.

This returns DuckDB rows; you can expand them using * or select specific columns using the r['mycol'] syntax. If you select specific columns, you should give the function call a short alias, such as r. For example:

SELECT * FROM read_parquet('file.parquet');
SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_parquet('file.parquet');

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| path | text or text[] | The path, either to a remote httpfs file or a local file (if enabled), of the parquet file(s) to read. The path can be a glob or array of files to read. |
Optional Parameters

Optional parameters mirror DuckDB's read_parquet function. To specify optional parameters, use parameter := 'value'.
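
For example, DuckDB's hive_partitioning parameter can be enabled this way (a sketch; the S3 path is illustrative):

```sql
SELECT * FROM read_parquet('s3://my-bucket/data/*/*.parquet', hive_partitioning := true);
```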

read_csv(path TEXT or TEXT[], ...) -> SETOF duckdb.row

Reads a CSV file, either from a remote location (via httpfs) or a local file.

This returns DuckDB rows; you can expand them using * or select specific columns using the r['mycol'] syntax. If you select specific columns, you should give the function call a short alias, such as r. For example:

SELECT * FROM read_csv('file.csv');
SELECT r['id'], r['name'] FROM read_csv('file.csv') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_csv('file.csv');

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| path | text or text[] | The path, either to a remote httpfs file or a local file (if enabled), of the CSV file(s) to read. The path can be a glob or array of files to read. |
Optional Parameters

Optional parameters mirror DuckDB's read_csv function. To specify optional parameters, use parameter := 'value'.

Compatibility notes:

  • columns is not currently supported.
  • nullstr must be an array (TEXT[]).

read_json(path TEXT or TEXT[], ...) -> SETOF duckdb.row

Reads a JSON file, either from a remote location (via httpfs) or a local file.

This returns DuckDB rows; you can expand them using * or select specific columns using the r['mycol'] syntax. If you select specific columns, you should give the function call a short alias, such as r. For example:

SELECT * FROM read_json('file.json');
SELECT r['id'], r['name'] FROM read_json('file.json') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM read_json('file.json');

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| path | text or text[] | The path, either to a remote httpfs file or a local file (if enabled), of the JSON file(s) to read. The path can be a glob or array of files to read. |
Optional Parameters

Optional parameters mirror DuckDB's read_json function. To specify optional parameters, use parameter := 'value'.

Compatibility notes:

  • columns is not currently supported.

iceberg_scan(path TEXT, ...) -> SETOF duckdb.row

Reads an Iceberg table, either from a remote location (via httpfs) or a local directory.

To use iceberg_scan, you must enable the iceberg extension:

SELECT duckdb.install_extension('iceberg');

This returns DuckDB rows; you can expand them using * or select specific columns using the r['mycol'] syntax. If you select specific columns, you should give the function call a short alias, such as r. For example:

SELECT * FROM iceberg_scan('data/iceberg/table');
SELECT r['id'], r['name'] FROM iceberg_scan('data/iceberg/table') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM iceberg_scan('data/iceberg/table');

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| path | text | The path, either to a remote httpfs location or a local location (if enabled), of the Iceberg table to read. |
Optional Arguments

Optional parameters mirror DuckDB's iceberg_scan function based on the DuckDB source code. However, documentation on these parameters is limited. To specify optional parameters, use parameter := 'value'.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| allowed_moved_paths | boolean | false | Ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved. |
| mode | text | '' | |
| metadata_compression_codec | text | 'none' | |
| skip_schema_inference | boolean | false | |
| version | text | 'version-hint.text' | |
| version_name_format | text | 'v%s%s.metadata.json,%s%s.metadata.json' | |

iceberg_metadata(path TEXT, ...) -> SETOF iceberg_metadata_record

To use iceberg_metadata, you must enable the iceberg extension:

SELECT duckdb.install_extension('iceberg');

Returns metadata about an Iceberg table. Data is returned as a set of iceberg_metadata_record, which is defined as:

CREATE TYPE duckdb.iceberg_metadata_record AS (
  manifest_path TEXT,
  manifest_sequence_number NUMERIC,
  manifest_content TEXT,
  status TEXT,
  content TEXT,
  file_path TEXT
);

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| path | text | The path, either to a remote httpfs location or a local location (if enabled), of the Iceberg table to read. |
Optional Arguments

Optional parameters mirror DuckDB's iceberg_metadata function based on the DuckDB source code. However, documentation on these parameters is limited. To specify optional parameters, use parameter := 'value'.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| allowed_moved_paths | boolean | false | Ensures that some path resolution is performed, which allows scanning Iceberg tables that are moved. |
| metadata_compression_codec | text | 'none' | |
| skip_schema_inference | boolean | false | |
| version | text | 'version-hint.text' | |
| version_name_format | text | 'v%s%s.metadata.json,%s%s.metadata.json' | |

iceberg_snapshots(path TEXT, ...) -> SETOF iceberg_snapshot_record

Reads Iceberg snapshot information from an Iceberg table.

To use iceberg_snapshots, you must enable the iceberg extension:

SELECT duckdb.install_extension('iceberg');

This function returns snapshot metadata for an Iceberg table, which can be useful for time travel queries and understanding table history.

SELECT * FROM iceberg_snapshots('data/iceberg/table');

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| path | text | The path, either to a remote httpfs location or a local location (if enabled), of the Iceberg table to read. |
Optional Arguments

Optional parameters mirror DuckDB's iceberg_snapshots function. To specify optional parameters, use parameter := 'value'.

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| metadata_compression_codec | text | 'none' | |
| skip_schema_inference | boolean | false | |
| version | text | 'version-hint.text' | |
| version_name_format | text | 'v%s%s.metadata.json,%s%s.metadata.json' | |

delta_scan(path TEXT) -> SETOF duckdb.row

Reads a Delta dataset, either from a remote location (via httpfs) or a local one.

To use delta_scan, you must enable the delta extension:

SELECT duckdb.install_extension('delta');

This returns DuckDB rows; you can expand them using * or select specific columns using the r['mycol'] syntax. If you select specific columns, you should give the function call a short alias, such as r. For example:

SELECT * FROM delta_scan('/path/to/delta/dataset');
SELECT r['id'], r['name'] FROM delta_scan('/path/to/delta/dataset') r WHERE r['age'] > 21;
SELECT COUNT(*) FROM delta_scan('/path/to/delta/dataset');

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| path | text | The path, either to a remote httpfs location or a local location (if enabled), of the Delta dataset to read. |

duckdb.install_extension(extension_name TEXT, repository TEXT DEFAULT 'core') -> bool

Installs a DuckDB extension and configures it to be loaded automatically in every session that uses pg_duckdb.

SELECT duckdb.install_extension('iceberg');
SELECT duckdb.install_extension('avro', 'community');
Security

Since this function can be used to download and install any extension, by default it can only be executed by a superuser. To allow execution by another admin user, such as my_admin, you can grant that user the following permissions:

GRANT ALL ON FUNCTION duckdb.install_extension(TEXT, TEXT) TO my_admin;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| extension_name | text | The name of the extension to install |

duckdb.load_extension(extension_name TEXT) -> void

Loads a DuckDB extension for the current session only. Unlike install_extension, this doesn't configure the extension to be loaded automatically in future sessions.

SELECT duckdb.load_extension('iceberg');
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| extension_name | text | The name of the extension to load |

duckdb.autoload_extension(extension_name TEXT, autoload BOOLEAN) -> void

Configures whether an installed extension should be automatically loaded in new sessions.

-- Disable auto-loading for an extension
SELECT duckdb.autoload_extension('iceberg', false);

-- Enable auto-loading for an extension
SELECT duckdb.autoload_extension('iceberg', true);
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| extension_name | text | The name of the extension to configure |
| autoload | boolean | Whether the extension should be auto-loaded |

duckdb.query(query TEXT) -> SETOF duckdb.row

Executes the given SELECT query directly against DuckDB. This can be useful if DuckDB syntax makes the query easier to write, or if you want to use a function that pg_duckdb does not expose yet. If you use it because of a missing function in pg_duckdb, please also open an issue on the GitHub repository so that we can add support. For example, the query below puts FROM before SELECT and uses a list comprehension, neither of which is supported in Postgres.

SELECT * FROM duckdb.query('FROM range(10) as a(a) SELECT [a for i in generate_series(0, a)] as arr');

duckdb.raw_query(query TEXT) -> void

Runs an arbitrary query directly against DuckDB. Compared to duckdb.query, this function can execute any query, not just SELECT queries. The main downside is that it doesn't return its result as rows; instead it sends the query result to the logs. So the recommendation is to use duckdb.query when possible, but if you need to run e.g. some DDL you can use this function.
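
For example, to run a DDL statement against the underlying DuckDB instance (a sketch; the table name is illustrative):

```sql
SELECT duckdb.raw_query('CREATE TABLE my_duckdb_table (id INT, name TEXT)');
```

The statement's result is written to the Postgres logs rather than returned as a result set.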

duckdb.recycle_ddb() -> void

pg_duckdb keeps the DuckDB instance open between transactions. This is done to preserve session-level state, such as manually executed SET commands. If you want to clear this session-level state for some reason, you can close the currently open DuckDB instance using:

CALL duckdb.recycle_ddb();

duckdb.enable_motherduck(token TEXT, database_name TEXT) -> void

Enables MotherDuck integration with the provided authentication token.

-- Enable MotherDuck with default database
SELECT duckdb.enable_motherduck('your_token_here');

-- Enable MotherDuck with specific database
SELECT duckdb.enable_motherduck('your_token_here', 'my_database');
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| token | text | Your MotherDuck authentication token |
Optional Arguments
| Name | Type | Description |
| --- | --- | --- |
| database_name | text | Specific MotherDuck database to connect to |

duckdb.is_motherduck_enabled() -> boolean

Checks whether MotherDuck integration is currently enabled for this session.

SELECT duckdb.is_motherduck_enabled();

duckdb.create_simple_secret(type TEXT, key_id TEXT, secret TEXT, region TEXT, ...) -> void

Creates a simple secret for accessing cloud storage services like S3, GCS, or R2.

-- Create an S3 secret
SELECT duckdb.create_simple_secret(
    type := 'S3',
    key_id := 'your_access_key',
    secret := 'your_secret_key',
    region := 'us-east-1'
);

-- Create an S3 secret with session token
SELECT duckdb.create_simple_secret(
    type := 'S3',
    key_id := 'your_access_key',
    secret := 'your_secret_key',
    region := 'us-east-1',
    session_token := 'your_session_token'
);
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| type | text | The type of secret ('S3', 'GCS', 'R2', etc.) |
| key_id | text | The access key ID or equivalent |
| secret | text | The secret key or equivalent |
| region | text | The region for the service |
Optional Arguments
| Name | Type | Description |
| --- | --- | --- |
| session_token | text | Session token for temporary credentials |
| endpoint | text | Custom endpoint URL |
| url_style | text | URL style ('vhost' or 'path') |
| use_ssl | text | Whether to use SSL ('true' or 'false') |
| scope | text | Scope for the secret (default: '') |

duckdb.create_azure_secret(connection_string TEXT, scope TEXT DEFAULT '') -> TEXT

Creates an Azure secret using an Azure Blob Storage connection string.

-- Create an Azure secret
SELECT duckdb.create_azure_secret(
    'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net'
);

-- Create an Azure secret with specific scope
SELECT duckdb.create_azure_secret(
    'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net',
    'my_scope'
);
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| connection_string | text | The Azure Blob Storage connection string |
Optional Arguments
| Name | Type | Description |
| --- | --- | --- |
| scope | text | Scope for the secret (default: '') |

duckdb.force_motherduck_sync(drop_with_cascade BOOLEAN DEFAULT false)

Warning: There are known issues with this function. To re-trigger a sync, it is recommended to use the following command instead:

SELECT * FROM pg_terminate_backend((
  SELECT pid FROM pg_stat_activity WHERE backend_type = 'pg_duckdb sync worker'
));

pg_duckdb will normally automatically synchronize your MotherDuck tables with Postgres using a Postgres background worker. Sometimes this synchronization fails. This can happen for various reasons, but often this is due to permission issues or users having created dependencies on MotherDuck tables that need to be updated. In those cases this function can be helpful for a few reasons:

  1. To show the ERRORs that happen during syncing
  2. To re-trigger a sync after fixing the issue
  3. To drop the MotherDuck tables with CASCADE, which also drops all objects that depend on them

For the first two usages you can simply call this procedure as follows:

CALL duckdb.force_motherduck_sync();

But for the third usage you need to pass it the drop_with_cascade parameter:

CALL duckdb.force_motherduck_sync(drop_with_cascade := true);

NOTE: Dropping with cascade will drop all objects that depend on the MotherDuck tables. This includes all views, functions, and tables that depend on the MotherDuck tables. This can be a destructive operation, so use with caution.

time_bucket(bucket_width INTERVAL, timestamp_col TIMESTAMP, origin TIMESTAMP) -> TIMESTAMP

Buckets timestamps into time intervals for time-series analysis. This function is compatible with TimescaleDB's time_bucket function, allowing for easier migration and interoperability.

-- Group events by hour
SELECT time_bucket(INTERVAL '1 hour', created_at) as hour_bucket, COUNT(*)
FROM events
GROUP BY hour_bucket
ORDER BY hour_bucket;

-- Group by 15-minute intervals
SELECT time_bucket(INTERVAL '15 minutes', timestamp_col), AVG(value)
FROM sensor_data
WHERE timestamp_col >= '2024-01-01'
GROUP BY 1
ORDER BY 1;

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| bucket_width | interval | The interval size for bucketing (e.g., '1 hour', '15 minutes') |
| timestamp_col | timestamp | The timestamp column to bucket |
Optional Arguments
| Name | Type | Description |
| --- | --- | --- |
| origin | timestamp | The origin point for bucketing. Buckets are aligned to this timestamp. |

Note: The time_bucket function also supports timezone and time offset parameters for more advanced time bucketing scenarios.

strftime(timestamp_expr, format_string) -> TEXT

Formats timestamps as strings using standard format codes. This function provides flexible timestamp formatting for display and export purposes.

-- Format current timestamp
SELECT strftime(NOW(), '%Y-%m-%d %H:%M:%S') AS formatted_time;

-- Format timestamps in different formats
SELECT
    order_id,
    strftime(created_at, '%Y-%m-%d') AS order_date,
    strftime(created_at, '%H:%M') AS order_time,
    strftime(created_at, '%A, %B %d, %Y') AS readable_date
FROM orders;

-- Use for partitioning file exports
COPY (SELECT * FROM events WHERE event_date = '2024-01-01')
TO 's3://bucket/events/' || strftime('2024-01-01'::timestamp, '%Y/%m/%d') || '/events.parquet';

Common format codes:

  • %Y - 4-digit year (2024)
  • %m - Month as number (01-12)
  • %d - Day of month (01-31)
  • %H - Hour (00-23)
  • %M - Minute (00-59)
  • %S - Second (00-59)
  • %A - Full weekday name (Monday)
  • %B - Full month name (January)

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| timestamp_expr | timestamp | The timestamp value to format |
| format_string | text | The format string with format codes |

strptime(string_expr, format_string) -> TIMESTAMP

Parses strings into timestamps using format codes. This is the inverse of strftime and is useful for parsing timestamps from various string formats.

-- Parse date strings
SELECT strptime('2024-01-15 14:30:00', '%Y-%m-%d %H:%M:%S') AS parsed_timestamp;

-- Parse different formats
SELECT
    strptime('Jan 15, 2024', '%b %d, %Y') AS date1,
    strptime('15/01/2024', '%d/%m/%Y') AS date2,
    strptime('2024-01-15T14:30:00Z', '%Y-%m-%dT%H:%M:%SZ') AS iso_date;

-- Parse log timestamps
SELECT
    log_id,
    strptime(timestamp_string, '%Y-%m-%d %H:%M:%S') AS parsed_time,
    message
FROM raw_logs;

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| string_expr | text | The string to parse as a timestamp |
| format_string | text | The format string describing the input format |

epoch(timestamp_expr) -> BIGINT

Converts timestamps to Unix epoch seconds (seconds since 1970-01-01 00:00:00 UTC).

-- Get current epoch time
SELECT epoch(NOW()) AS current_epoch;

-- Convert timestamps for API usage
SELECT
    event_id,
    epoch(event_timestamp) AS epoch_seconds
FROM events;

-- Filter using epoch time
SELECT * FROM events
WHERE epoch(created_at) > 1640995200; -- After 2022-01-01
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| timestamp_expr | timestamp | The timestamp to convert to epoch seconds |

map_extract(map_col duckdb.map, key duckdb.unresolved_type) -> duckdb.unresolved_type

Extracts a value from a map using the specified key. If the key doesn't exist, returns an empty array.

-- Extract value from a map
SELECT map_extract(r['map_col'], 'a') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {1}

-- Extract non-existent key
SELECT map_extract(r['map_col'], 'c') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {}
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to extract from |
| key | duckdb.unresolved_type | The key to look up in the map |

map_keys(map_col duckdb.map) -> duckdb.unresolved_type

Returns all keys from a map as an array.

-- Get all keys from a map
SELECT map_keys(r['map_col']) as keys 
FROM duckdb.query($$ SELECT MAP(['a', 'b', 'c'], [1, 2, 3]) as map_col $$) r;
-- Returns: {a,b,c}

-- Empty map
SELECT map_keys(r['map_col']) as keys 
FROM duckdb.query($$ SELECT MAP([], []) as map_col $$) r;
-- Returns: {}
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to extract keys from |

map_values(map_col duckdb.map) -> duckdb.unresolved_type

Returns all values from a map as an array.

-- Get all values from a map
SELECT map_values(r['map_col']) as values 
FROM duckdb.query($$ SELECT MAP(['a', 'b', 'c'], [1, 2, 3]) as map_col $$) r;
-- Returns: {1,2,3}

-- Empty map
SELECT map_values(r['map_col']) as values 
FROM duckdb.query($$ SELECT MAP([], []) as map_col $$) r;
-- Returns: {}
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to extract values from |

cardinality(map_col duckdb.map) -> numeric

Returns the size of the map (number of key-value pairs).

-- Get the number of entries in a map
SELECT cardinality(r['map_col']) as size 
FROM duckdb.query($$ SELECT MAP(['a', 'b', 'c'], [1, 2, 3]) as map_col $$) r;
-- Returns: 3

-- Empty map
SELECT cardinality(r['map_col']) as size 
FROM duckdb.query($$ SELECT MAP([], []) as map_col $$) r;
-- Returns: 0
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to get the size of |

element_at(map_col duckdb.map, key duckdb.unresolved_type) -> duckdb.unresolved_type

Returns the value for a given key as an array.

-- Get value for a specific key
SELECT element_at(r['map_col'], 'a') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {1}

-- Non-existent key
SELECT element_at(r['map_col'], 'c') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {}
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to extract from |
| key | duckdb.unresolved_type | The key to look up in the map |

map_concat(map_col duckdb.map, map_col2 duckdb.map) -> duckdb.map

Merges multiple maps. On key collision, the value is taken from the last map.

-- Merge two maps
SELECT map_concat(r1['map1'], r2['map2']) as merged 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map1 $$) r1, 
     duckdb.query($$ SELECT MAP(['b', 'c'], [3, 4]) as map2 $$) r2;
-- Returns: {a=1, b=3, c=4}

-- Note: 'b' value from map2 (3) overwrites map1's value (2)
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The first map |
| map_col2 | duckdb.map | The second map to merge |

map_contains(map_col duckdb.map, key duckdb.unresolved_type) -> boolean

Checks if a map contains a given key.

-- Check if key exists
SELECT map_contains(r['map_col'], 'a') as has_key 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: t (true)

-- Check for non-existent key
SELECT map_contains(r['map_col'], 'c') as has_key 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: f (false)
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to check |
| key | duckdb.unresolved_type | The key to search for |

map_contains_entry(map_col duckdb.map, key duckdb.unresolved_type, value duckdb.unresolved_type) -> boolean

Checks if a map contains a given key-value pair.

-- Check if key-value pair exists
SELECT map_contains_entry(r['map_col'], 'a', 1) as has_entry 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: t (true)

-- Check with wrong value for existing key
SELECT map_contains_entry(r['map_col'], 'a', 2) as has_entry 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: f (false)
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to check |
| key | duckdb.unresolved_type | The key to search for |
| value | duckdb.unresolved_type | The value to match with the key |

map_contains_value(map_col duckdb.map, value duckdb.unresolved_type) -> boolean

Checks if a map contains a given value.

-- Check if value exists
SELECT map_contains_value(r['map_col'], 1) as has_value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: t (true)

-- Check for non-existent value
SELECT map_contains_value(r['map_col'], 3) as has_value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: f (false)
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to check |
| value | duckdb.unresolved_type | The value to search for |

map_entries(map_col duckdb.map) -> duckdb.struct[]

Returns an array of struct(key, value) for each key-value pair in the map.

-- Get all key-value pairs as structs
SELECT map_entries(r['map_col']) as entries 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: {"(a,1)","(b,2)"}

-- Access individual struct fields
SELECT unnest(map_entries(r['map_col'])) as entry 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to extract entries from |

map_extract_value(map_col duckdb.map, key duckdb.unresolved_type) -> duckdb.unresolved_type

Returns the value for a given key or NULL if the key is not contained in the map.

-- Extract single value (not as array)
SELECT map_extract_value(r['map_col'], 'a') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: 1

-- Non-existent key returns NULL
SELECT map_extract_value(r['map_col'], 'c') as value 
FROM duckdb.query($$ SELECT MAP(['a', 'b'], [1, 2]) as map_col $$) r;
-- Returns: NULL
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| map_col | duckdb.map | The map to extract from |
| key | duckdb.unresolved_type | The key to look up in the map |

map_from_entries(entries duckdb.struct[]) -> duckdb.map

Creates a map from an array of struct(k, v).

-- Create map from array of structs
SELECT map_from_entries(r['entries']) as new_map 
FROM duckdb.query($$ 
    SELECT [{'k': 'a', 'v': 1}, {'k': 'b', 'v': 2}] as entries 
$$) r;
-- Returns: {a=1, b=2}

-- This is the inverse operation of map_entries
SELECT map_from_entries(map_entries(r['map_col'])) as reconstructed 
FROM duckdb.query($$ SELECT MAP(['x', 'y'], [10, 20]) as map_col $$) r;
-- Returns: {x=10, y=20}
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| entries | duckdb.struct[] | Array of structs with 'k' (key) and 'v' (value) fields |

epoch_ms(timestamp_expr) -> BIGINT

Converts timestamps to Unix epoch milliseconds.

-- High-precision timestamp for JavaScript
SELECT epoch_ms(NOW()) AS timestamp_ms;

-- For time-series data
SELECT
    sensor_id,
    epoch_ms(reading_time) AS timestamp_ms,
    value
FROM sensor_readings;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| timestamp_expr | timestamp | The timestamp to convert to epoch milliseconds |

epoch_ms(milliseconds) -> TIMESTAMP

Converts Unix epoch milliseconds to a timestamp. This is the inverse of the above function.

-- Convert epoch milliseconds to timestamp
SELECT epoch_ms(1640995200000) AS timestamp_from_ms; -- 2022-01-01 00:00:00

-- Convert stored milliseconds back to timestamps
SELECT
    event_id,
    epoch_ms(timestamp_ms) AS event_time
FROM events;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| milliseconds | bigint | Milliseconds since Unix epoch |

epoch_us(timestamp_expr) -> BIGINT

Converts timestamps to Unix epoch microseconds.

-- Microsecond precision timestamps
SELECT epoch_us(NOW()) AS timestamp_us;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| timestamp_expr | timestamp | The timestamp to convert to epoch microseconds |

epoch_ns(timestamp_expr) -> BIGINT

Converts timestamps to Unix epoch nanoseconds.

-- Nanosecond precision timestamps
SELECT epoch_ns(NOW()) AS timestamp_ns;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| timestamp_expr | timestamp | The timestamp to convert to epoch nanoseconds |

make_timestamp(microseconds) -> TIMESTAMP

Creates a timestamp from microseconds since Unix epoch (1970-01-01 00:00:00 UTC).

-- Create timestamp from current epoch microseconds
SELECT make_timestamp(epoch_us(NOW())) AS reconstructed_timestamp;

-- Create specific timestamps
SELECT make_timestamp(1640995200000000) AS new_years_2022; -- 2022-01-01 00:00:00
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| microseconds | bigint | Microseconds since Unix epoch |

make_timestamptz(microseconds) -> TIMESTAMPTZ

Creates a timestamp with timezone from microseconds since Unix epoch.

-- Create timestamptz from current epoch microseconds
SELECT make_timestamptz(epoch_us(NOW())) AS reconstructed_timestamptz;

-- Create specific timestamptz
SELECT make_timestamptz(1640995200000000) AS new_years_2022_tz;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| microseconds | bigint | Microseconds since Unix epoch |

TABLESAMPLE (sampling_method(percentage | rows))

Samples a subset of rows from a table or query result. This is useful for analyzing large datasets by working with representative samples, improving query performance for exploratory data analysis.

-- Sample 10% of rows from a table
SELECT * FROM large_table TABLESAMPLE SYSTEM(10);

-- Sample approximately 1000 rows
SELECT * FROM events TABLESAMPLE SYSTEM(1000 ROWS);

-- Sample from data lake files
SELECT * FROM read_parquet('s3://datalake/**/*.parquet') TABLESAMPLE SYSTEM(5);

-- Use sampling for quick data profiling
SELECT
    region,
    COUNT(*) as sample_count,
    AVG(revenue) as avg_revenue
FROM sales_data TABLESAMPLE SYSTEM(2)
GROUP BY region;

-- Sample from joins for performance
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
JOIN orders o TABLESAMPLE SYSTEM(10) ON c.id = o.customer_id
GROUP BY c.name;

Sampling Methods:

  • SYSTEM: Random sampling at the storage level (faster, approximate percentage)
  • BERNOULLI: Row-by-row random sampling (slower, exact percentage)

-- System sampling (recommended for large tables)
SELECT * FROM huge_table TABLESAMPLE SYSTEM(1);

-- Bernoulli sampling (exact percentage)
SELECT * FROM medium_table TABLESAMPLE BERNOULLI(5);

Use Cases:

  • Data exploration: Quick analysis of large datasets
  • Performance testing: Test queries on sample data
  • Data profiling: Understand data distribution patterns
  • ETL development: Develop pipelines on sample data
  • Quality checks: Validate data quality on samples

Further information:

Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| sampling_method | keyword | Either SYSTEM or BERNOULLI |
| percentage | numeric | Percentage of rows to sample (0-100) |
Optional Arguments
| Name | Type | Description |
| --- | --- | --- |
| rows | integer | Approximate number of rows to sample (use with ROWS keyword) |

union_extract(union_col, tag) -> duckdb.unresolved_type

Extracts a value from a union type by specifying the tag name of the member you want to access.

-- Extract the string value if the union contains a string
SELECT union_extract(my_union_column, 'string') FROM my_table;

-- Extract integer value from union
SELECT union_extract(data_field, 'integer') AS extracted_int FROM mixed_data;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| union_col | duckdb.union or duckdb.unresolved_type | The union column to extract from |
| tag | text | The tag name of the union member to extract |

union_tag(union_col) -> duckdb.unresolved_type

Returns the tag name of the currently active member in a union type.

-- Get the active tag for each row
SELECT union_tag(my_union_column) AS active_type FROM my_table;

-- Filter rows based on union tag
SELECT * FROM my_table WHERE union_tag(data_field) = 'string';
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| union_col | duckdb.union or duckdb.unresolved_type | The union column to get the tag from |

approx_count_distinct(expression) -> BIGINT

Approximates the count of distinct elements using the HyperLogLog algorithm. This is much faster than COUNT(DISTINCT ...) for large datasets, with a small error rate.

-- Approximate distinct count of customer IDs
SELECT approx_count_distinct(customer_id) FROM orders;

-- Compare with exact count
SELECT
    approx_count_distinct(customer_id) AS approx_distinct,
    COUNT(DISTINCT customer_id) AS exact_distinct
FROM orders;
Required Arguments
| Name | Type | Description |
| --- | --- | --- |
| expression | any | The expression to count distinct values for |