dbt-profiler

February 26, 2026 · View on GitHub

CircleCI License dbt Hub dbt Fusion compatible

dbt-profiler

Add data profiling to your dbt project. Get null rates, distinct counts, min/max and more for any model or source.

Contents

Installation

dbt-profiler requires dbt >=1.10.0. Check dbt Hub for the latest installation instructions.

Supported adapters

AdapterSupported
AWS Athena
BigQuery
Databricks
PostgreSQL
Redshift
Snowflake
Oracle
SQL Server
Apache Spark
Presto

dbt-profiler may work with unsupported adapters but they haven't been tested. If you've used dbt-profiler with an unsupported adapter, feedback is very welcome: open an issue, a PR, or reach out in the #tools-dbt-profiler channel on dbt Slack.

Quick start

Profile a relation and print the result to stdout:

dbt run-operation print_profile --args '{"relation_name": "customers"}'

Use a relation profile as a dbt model:

-- models/customers_profile.sql
{{ dbt_profiler.get_profile(relation=ref("customers")) }}

Generate a schema.yml skeleton with profile data embedded in column meta properties:

dbt run-operation print_profile_schema --args '{"relation_name": "customers"}'

Profile measures

A calculated profile contains the following measures for each column:

MeasureDescriptionColumns
column_nameName of the columnall
data_typeData type of the columnall
not_null_proportionProportion of non-NULL values (e.g. 0.62 = 62% populated)all
distinct_proportionProportion of unique valuesall
distinct_countCount of unique valuesall
is_uniquetrue if all values are uniqueall
min *†Minimum valuenumeric, date, time
max *†Maximum valuenumeric, date, time
avg **†Average valuenumeric
median **†Median valuenumeric
std_dev_population **†Population standard deviationnumeric
std_dev_sample **†Sample standard deviationnumeric
profiled_atTimestamp when the profile was calculatedall

* numeric, date and time columns only ** numeric columns only † can be excluded using the exclude_measures argument

Macros

get_profile (source)

Returns a relation profile as a SQL query that can be used in a dbt model. Handy for previewing profiles in dbt Cloud.

Arguments

ArgumentRequiredDefaultDescription
relationyesRelation object
exclude_measuresno[]List of measures to exclude from the profile
include_columnsno[] (all)Columns to include. Cannot be used together with exclude_columns.
exclude_columnsno[]Columns to exclude. Cannot be used together with include_columns.
where_clausenoSQL WHERE clause to filter records before profiling
group_byno[]SQL GROUP BY columns to aggregate data before profiling

Usage

Use with ref():

{{ dbt_profiler.get_profile(relation=ref("customers"), where_clause="is_active = true") }}

Use with source():

{{ dbt_profiler.get_profile(relation=source("jaffle_shop","customers"), exclude_measures=["std_dev_population", "std_dev_sample"]) }}

To run only in execute mode:

-- depends_on: {{ ref("customers") }}
{% if execute %}
    {{ dbt_profiler.get_profile(relation=ref("customers")) }}
{% endif %}

get_profile_table (source)

Returns a relation profile as an agate.Table. Does not print anything to stdout; intended to be called from another macro or model, not as a standalone operation.

Arguments

ArgumentRequiredDefaultDescription
relationeither relation or relation_nameRelation object
relation_nameeither relation or relation_nameRelation name
schemanotarget schemaSchema where relation_name exists
databasenotarget databaseDatabase where relation_name exists
exclude_measuresno[]List of measures to exclude from the profile
include_columnsno[] (all)Columns to include. Cannot be used together with exclude_columns.
exclude_columnsno[]Columns to exclude. Cannot be used together with include_columns.
where_clausenoSQL WHERE clause to filter records before profiling

Usage

{% set table = dbt_profiler.get_profile_table(relation_name="customers") %}

Does not work in dbt Cloud. The profile doesn't display in the cloud console log because the underlying print_table() method is disabled.

Prints a relation profile as a Markdown table to stdout.

Arguments

ArgumentRequiredDefaultDescription
relationeither relation or relation_nameRelation object
relation_nameeither relation or relation_nameRelation name
schemanotarget schemaSchema where relation_name exists
databasenotarget databaseDatabase where relation_name exists
exclude_measuresno[]List of measures to exclude from the profile
include_columnsno[] (all)Columns to include. Cannot be used together with exclude_columns.
exclude_columnsno[]Columns to exclude. Cannot be used together with include_columns.
max_rowsnonone (not truncated)Maximum number of rows to display
max_columnsno7Maximum number of columns to display
max_column_widthno30Truncate all columns to at most this width
max_precisionnonone (not limited)Maximum precision for number types
where_clausenoSQL WHERE clause to filter records before profiling

Usage

dbt run-operation print_profile --args '{"relation_name": "customers"}'

For dbt Cloud, an alternative that logs to the console (without Markdown formatting):

{% set profile = dbt_profiler.get_profile(relation=ref("customers")) %}
{% for row in profile.rows %}
  {% do log(row.values(), info=True) %}
{% endfor %}

Example output

column_namedata_typenot_null_proportiondistinct_proportiondistinct_countis_uniqueminmaxavgstd_dev_populationstd_dev_sampleprofiled_at
customer_idint641.001.001001110050.5028.8729.012022-01-13 10:14:48+00
first_orderdate0.620.464602018-01-012018-04-072022-01-13 10:14:48+00
most_recent_orderdate0.620.525202018-01-092018-04-092022-01-13 10:14:48+00
number_of_ordersint640.620.0440151.600.770.782022-01-13 10:14:48+00
customer_lifetime_valuefloat640.620.3535019926.9718.6618.812022-01-13 10:14:48+00

Prints a schema.yml to stdout with all columns and their profile data embedded as meta properties.

Arguments

ArgumentRequiredDefaultDescription
relationeither relation or relation_nameRelation object
relation_nameeither relation or relation_nameRelation name
schemanotarget schemaSchema where relation_name exists
databasenotarget databaseDatabase where relation_name exists
exclude_measuresno[]List of measures to exclude from the profile
include_columnsno[] (all)Columns to include. Cannot be used together with exclude_columns.
exclude_columnsno[]Columns to exclude. Cannot be used together with include_columns.
model_descriptionno""Model description to include in the schema
column_descriptionno""Column description to include for each column
where_clausenoSQL WHERE clause to filter records before profiling

Usage

dbt run-operation print_profile_schema --args '{"relation_name": "customers"}'

Example output

version: 2
models:
- name: customers
  description: ''
  columns:
  - name: customer_id
    description: ''
    meta:
      data_type: int64
      row_count: 100.0
      not_null_proportion: 1.0
      distinct_proportion: 1.0
      distinct_count: 100.0
      is_unique: 1.0
      min: '1'
      max: '100'
      avg: 50.5
      std_dev_population: 28.86607004772212
      std_dev_sample: 29.01149197588202
      profiled_at: '2022-01-13 10:08:18.446822+00'
  - name: first_order
    description: ''
    meta:
      data_type: date
      row_count: 100.0
      not_null_proportion: 0.62
      distinct_proportion: 0.46
      distinct_count: 46.0
      is_unique: 0.0
      min: '2018-01-01'
      max: '2018-04-07'
      avg: null
      std_dev_population: null
      std_dev_sample: null
      profiled_at: '2022-01-13 10:08:18.446822+00'
  # ... remaining columns

This is what the profile looks like in the dbt docs UI:

dbt docs example


Does not work in dbt Cloud. The profile doesn't display in the cloud console log because the underlying print_table() method is disabled.

Prints a relation profile as a Markdown table wrapped in a Jinja docs block to stdout. Intended to be used as part of the dbt docs workflow.

Arguments

ArgumentRequiredDefaultDescription
relationeither relation or relation_nameRelation object
relation_nameeither relation or relation_nameRelation name
schemanotarget schemaSchema where relation_name exists
databasenotarget databaseDatabase where relation_name exists
exclude_measuresno[]List of measures to exclude from the profile
include_columnsno[] (all)Columns to include. Cannot be used together with exclude_columns.
exclude_columnsno[]Columns to exclude. Cannot be used together with include_columns.
docs_namenodbt_profiler__{{ relation_name }}Name of the generated docs block
max_rowsnonone (not truncated)Maximum number of rows to display
max_columnsno7Maximum number of columns to display
max_column_widthno30Truncate all columns to at most this width
max_precisionnonone (not limited)Maximum precision for number types
where_clausenoSQL WHERE clause to filter records before profiling

Usage

dbt run-operation print_profile_docs --args '{"relation_name": "customers"}'

Example output

{% docs dbt_profiler__customers %}
| column_name             | data_type | not_null_proportion | distinct_proportion | distinct_count | is_unique | min        | max        |
| ----------------------- | --------- | ------------------- | ------------------- | -------------- | --------- | ---------- | ---------- |
| customer_id             | int64     |                1.00 |                1.00 |            100 |         1 | 1          | 100        |
| first_order             | date      |                0.62 |                0.46 |             46 |         0 | 2018-01-01 | 2018-04-07 |
| most_recent_order       | date      |                0.62 |                0.52 |             52 |         0 | 2018-01-09 | 2018-04-09 |
| number_of_orders        | int64     |                0.62 |                0.04 |              4 |         0 | 1          | 5          |
| customer_lifetime_value | float64   |                0.62 |                0.35 |             35 |         0 | 1          | 99         |
{% enddocs %}

Using profiles in dbt docs

There are two ways to embed profiles in dbt docs: via meta properties (see print_profile_schema) or via doc blocks. The doc block approach is recommended because it keeps profile data out of schema.yml and lets profiles be updated independently.

Setup

1. Add a docs folder to dbt_project.yml:

model-paths: ["models", "docs"]

2. Run print_profile_docs and save the output to a file:

# docs/dbt_profiler/customers.md
dbt run-operation print_profile_docs --args '{"relation_name": "customers"}'

Note: store the output in a variable before redirecting to a file. Piping directly (e.g. dbt run-operation ... > customers.md) will empty the file before dbt compiles the project, causing an error if the doc block is already referenced. See the example update-relation-profile.sh script.

3. Reference the doc block in your model description:

version: 2

models:
  - name: customers
    description: |
      Represents a customer.

      `dbt-profiler` results:

      {{ doc("dbt_profiler__customers") }}
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique

Automating profile updates with CI

The doc block approach makes it straightforward to keep profiles up to date via a scheduled CI job:

  1. List the models to profile (e.g. dbt list --output name -m ${node_selection})
  2. For each model, run print_profile_docs and write the output to docs/dbt_profiler/${relation_name}.md
  3. Open a pull request for the updated profiles (e.g. using the create-pull-request GitHub Action)

Contributing

Contributions are welcome! Please read CONTRIBUTING.md before opening a pull request. For significant changes, open an issue first to discuss the approach.

You can also reach the maintainers in the #tools-dbt-profiler channel on dbt Slack.

License

Apache License 2.0