Snowflake Setup Guide

January 28, 2026 · View on GitHub

This guide covers how to deploy and use the Parallel Snowflake integration for data enrichment.

Architecture

Snowflake SQL Query


TABLE(parallel_enrich(...) OVER (PARTITION BY 1))


Python UDTF with end_partition() batching


Single Parallel Task Group API call (all rows batched)


(input VARIANT, enriched VARIANT) results

The integration uses a User Defined Table Function (UDTF) with end_partition() to batch all rows in a partition into a single API call. This is much faster than row-by-row processing.

Prerequisites

  1. Snowflake Account - Paid account required (trial accounts don't support External Access)
  2. ACCOUNTADMIN Role - Required for creating integrations (see Manual Deployment if you don't have this)
  3. MFA Setup - If your account requires MFA, you'll need an authenticator app configured
  4. Python 3.12+ with parallel-web-tools[snowflake] installed
  5. Parallel API Key from platform.parallel.ai

How It Works

The Snowflake integration uses a UDTF (User Defined Table Function) with batching:

parallel_enrich() UDTF

    ├── process(): Collects each row
    ├── end_partition(): Batches all rows together

    └── Calls: enrich_batch() with ALL rows at once

                    └── Single Task Group with N runs

Performance:

  • PARTITION BY 1 → all rows in one batch → 1 API call
  • Without partition → each row separate → N API calls (slow)

Finding Your Account Identifier

Your Snowflake account identifier is in your Snowsight URL:

https://app.snowflake.com/ORGNAME/ACCOUNTNAME/worksheets
                         ↑       ↑
                         └───┬───┘
                     Account: ORGNAME-ACCOUNTNAME

Examples:

  • URL: https://app.snowflake.com/myorg/myaccount/ → Account: myorg-myaccount
  • URL: https://app.snowflake.com/us-east-1/xy12345/ → Account: xy12345.us-east-1

You can also run this SQL in Snowsight:

SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME() AS account_identifier;

Installation

pip install parallel-web-tools[snowflake]

Or with uv:

uv add "parallel-web-tools[snowflake]"

Note: The standalone parallel-cli binary does not include deployment commands. You must install via pip/uv with the [snowflake] extra to use parallel-cli enrich deploy --system snowflake.

Quick Start - CLI Deployment

The easiest way to deploy is using the CLI:

Basic (Password Auth)

parallel-cli enrich deploy --system snowflake \
    --account ORGNAME-ACCOUNTNAME \
    --user your-username \
    --password "your-password" \
    --warehouse COMPUTE_WH

With MFA

If your account requires MFA, you need to:

  1. Set up an authenticator app (Google Authenticator, Duo, etc.) in Snowsight:

    • Go to your profile → Security → Multi-factor Authentication
    • Enroll your authenticator app
  2. Run with the passcode flag:

parallel-cli enrich deploy --system snowflake \
    --account ORGNAME-ACCOUNTNAME \
    --user your-username \
    --password "your-password" \
    --authenticator username_password_mfa \
    --passcode 123456 \
    --warehouse COMPUTE_WH

Replace 123456 with the current 6-digit code from your authenticator app. Run the command immediately after getting a fresh code (they expire quickly).

Environment Variables

To avoid passing sensitive values on the command line:

# Use single quotes if password contains special characters like !
export SNOWFLAKE_PASSWORD='your!password'
export PARALLEL_API_KEY='your-api-key'

parallel-cli enrich deploy --system snowflake \
    --account ORGNAME-ACCOUNTNAME \
    --user your-username \
    --password "$SNOWFLAKE_PASSWORD" \
    --authenticator username_password_mfa \
    --passcode 123456 \
    --warehouse COMPUTE_WH

CLI Options

OptionDefaultDescription
--accountrequiredSnowflake account identifier
--userrequiredSnowflake username
--password-Snowflake password
--warehouseCOMPUTE_WHWarehouse to use
--roleACCOUNTADMINRole for deployment
--authenticatorexternalbrowserAuth method
--passcode-MFA code from authenticator app
--api-keyenv varParallel API key

Quick Start - Python Deployment

from parallel_web_tools.integrations.snowflake import deploy_parallel_functions

deploy_parallel_functions(
    account="orgname-accountname",
    user="your-user",
    password="your-password",
    parallel_api_key="your-parallel-api-key",
    # For MFA:
    authenticator="username_password_mfa",
    passcode="123456",
)

This creates:

  • Database: PARALLEL_INTEGRATION
  • Schema: ENRICHMENT
  • Network rule for api.parallel.ai
  • Secret with your API key
  • External access integration
  • parallel_enrich() UDTF (batched table function)
  • Roles: PARALLEL_DEVELOPER and PARALLEL_USER

Manual SQL Deployment (For Admins)

If you don't have ACCOUNTADMIN access, ask your Snowflake admin to run the setup SQL.

Generate SQL for Admin

# Generate the SQL scripts
python -c "
from parallel_web_tools.integrations.snowflake import get_setup_sql, get_udf_sql

print('=== SETUP SQL (run first) ===')
print(get_setup_sql('YOUR_PARALLEL_API_KEY'))
print()
print('=== UDF SQL (run second) ===')
print(get_udf_sql())
"

Replace YOUR_PARALLEL_API_KEY with your actual API key, then send the output to your admin.

What Admin Needs to Run

  1. Run as ACCOUNTADMIN - Required for External Access Integration
  2. Execute setup SQL - Creates network rule, secret, integration
  3. Execute UDF SQL - Creates the parallel_enrich() function

After admin completes setup, users with PARALLEL_USER role can use the function.

SQL Usage

The function is a table function (UDTF) that requires PARTITION BY for batching.

Basic Enrichment

WITH companies AS (
    SELECT * FROM (VALUES
        ('Google', 'google.com'),
        ('Anthropic', 'anthropic.com'),
        ('Apple', 'apple.com')
    ) AS t(company_name, website)
)
SELECT
    e.input:company_name::STRING AS company_name,
    e.input:website::STRING AS website,
    e.enriched:ceo_name::STRING AS ceo_name,
    e.enriched:founding_year::STRING AS founding_year
FROM companies t,
     TABLE(PARALLEL_INTEGRATION.ENRICHMENT.parallel_enrich(
         TO_JSON(OBJECT_CONSTRUCT('company_name', t.company_name, 'website', t.website)),
         ARRAY_CONSTRUCT('CEO name', 'Founding year')
     ) OVER (PARTITION BY 1)) e;

Sample output:

company_namewebsiteceo_namefounding_year
Googlegoogle.comSundar Pichai1998
Anthropicanthropic.comDario Amodei2021
Appleapple.comTim Cook1976

Key points:

  • TO_JSON(OBJECT_CONSTRUCT(...)) creates the input
  • PARTITION BY 1 batches all rows into single API call
  • Returns input (original data) and enriched (results)

The raw enriched VARIANT column contains:

{
  "ceo_name": "Sundar Pichai",
  "founding_year": "1998",
  "basis": [{"field": "ceo_name", "citations": [...], "confidence": "high"}]
}

Custom Processor

SELECT
    e.input:company_name::STRING AS company_name,
    e.enriched:ceo_name::STRING AS ceo_name
FROM companies t,
     TABLE(PARALLEL_INTEGRATION.ENRICHMENT.parallel_enrich(
         TO_JSON(OBJECT_CONSTRUCT('company_name', t.company_name)),
         ARRAY_CONSTRUCT('CEO name'),
         'base-fast'  -- processor option
     ) OVER (PARTITION BY 1)) e;

Save Results to Table

CREATE TABLE enriched_companies AS
SELECT
    e.input:company_name::STRING AS company_name,
    e.input:website::STRING AS website,
    e.enriched:ceo_name::STRING AS ceo_name,
    e.enriched:founding_year::STRING AS founding_year
FROM companies t,
     TABLE(PARALLEL_INTEGRATION.ENRICHMENT.parallel_enrich(
         TO_JSON(OBJECT_CONSTRUCT('company_name', t.company_name, 'website', t.website)),
         ARRAY_CONSTRUCT('CEO name', 'Founding year')
     ) OVER (PARTITION BY 1)) e;

Troubleshooting

"External access is not supported for trial accounts"

Snowflake trial accounts cannot make external HTTP calls. You need to upgrade to a paid Snowflake account (Standard edition or above).

"Failed to connect... SAML Identity Provider"

SSO/SAML isn't configured for your account. Use password authentication instead:

parallel-cli enrich deploy --system snowflake \
    --account your-account \
    --user your-user \
    --password "your-password" \
    ...

"Multi-factor authentication is required"

Your account requires MFA. Set up an authenticator app in Snowsight, then use:

parallel-cli enrich deploy --system snowflake \
    --authenticator username_password_mfa \
    --passcode 123456 \
    ...

"Role 'ACCOUNTADMIN' is not granted to this user"

You don't have ACCOUNTADMIN. Try a different role you have access to:

parallel-cli enrich deploy --system snowflake \
    --role SYSADMIN \
    ...

Or check your roles:

SHOW GRANTS TO USER your_username;

"Insufficient privileges to operate on account"

ACCOUNTADMIN is required for creating External Access Integrations. Either:

  1. Get ACCOUNTADMIN access from your Snowflake admin
  2. Have an admin run the SQL manually (see Manual SQL Deployment)

"Integration does not exist or not authorized"

The External Access Integration wasn't created. This usually means:

  • You don't have ACCOUNTADMIN privileges
  • The setup SQL didn't complete successfully

Re-run with ACCOUNTADMIN role or have an admin run the setup.

"Package 'parallel-web-tools' not found" or PyPI errors

The UDF uses parallel-web-tools from PyPI. Ensure:

  1. The SNOWFLAKE.PYPI_REPOSITORY_USER role is granted (setup SQL does this automatically)
  2. Your Snowflake account has PyPI repository access enabled
-- Verify PyPI access
GRANT DATABASE ROLE SNOWFLAKE.PYPI_REPOSITORY_USER TO ROLE your_role;

Authentication Pop-ups

If you see repeated authentication prompts, install keyring support:

pip install "snowflake-connector-python[secure-local-storage]"

Timeout Errors

For very large batches, the 30-minute timeout may not be enough. Consider:

  • Using lite-fast processor for faster results
  • Using PARTITION BY to split into smaller batches
  • Processing fewer rows per partition

API Reference

deploy_parallel_functions()

def deploy_parallel_functions(
    account: str,
    user: str,
    password: str | None = None,
    warehouse: str = "COMPUTE_WH",
    database: str = "PARALLEL_INTEGRATION",
    schema: str = "ENRICHMENT",
    role: str = "ACCOUNTADMIN",
    parallel_api_key: str | None = None,
    authenticator: str | None = None,
    passcode: str | None = None,
    force: bool = False,
) -> None

Parameters:

ParameterTypeDefaultDescription
accountstrrequiredSnowflake account identifier
userstrrequiredSnowflake username
passwordstr | NoneNonePassword (or use authenticator)
warehousestr"COMPUTE_WH"Warehouse to use
databasestr"PARALLEL_INTEGRATION"Database to create
schemastr"ENRICHMENT"Schema to create
rolestr"ACCOUNTADMIN"Role for deployment
parallel_api_keystr | NoneNoneAPI key (uses env var if not provided)
authenticatorstr | NoneNoneAuth method (e.g., "username_password_mfa")
passcodestr | NoneNoneMFA code from authenticator app
forceboolFalseSkip confirmation for existing resources

cleanup_parallel_functions()

def cleanup_parallel_functions(
    account: str,
    user: str,
    password: str | None = None,
    warehouse: str = "COMPUTE_WH",
    role: str = "ACCOUNTADMIN",
    authenticator: str | None = None,
) -> None

Removes all Parallel integration objects from Snowflake.

SQL Function: parallel_enrich()

-- Version 1: Default processor (lite-fast)
TABLE(parallel_enrich(input_json VARCHAR, output_columns ARRAY))
    RETURNS TABLE (input VARIANT, enriched VARIANT)

-- Version 2: Custom processor
TABLE(parallel_enrich(input_json VARCHAR, output_columns ARRAY, processor VARCHAR))
    RETURNS TABLE (input VARIANT, enriched VARIANT)

Parameters:

ParameterTypeDescription
input_jsonVARCHARJSON string via TO_JSON(OBJECT_CONSTRUCT(...))
output_columnsARRAYArray of output column descriptions
processorVARCHARProcessor to use (optional, default: lite-fast)

Returns: Table with input (original data) and enriched (results) VARIANT columns

Usage: Must use with OVER (PARTITION BY ...) for batching

Processor Options

ProcessorSpeedCostBest For
lite, lite-fastFastest~$0.005/rowBasic metadata, high volume
base, base-fastFast~$0.01/rowStandard enrichments
core, core-fastMedium~$0.025/rowCross-referenced data
pro, pro-fastSlow~$0.10/rowDeep research

Column Name Mapping

Output columns are automatically converted to valid JSON property names:

DescriptionProperty Name
"CEO name"ceo_name
"Founding year (YYYY)"founding_year
"Annual revenue [USD]"annual_revenue
"2024 Revenue"col_2024_revenue

Security

How it Works

  1. Network Rule: Only allows egress to api.parallel.ai:443
  2. Secret: API key stored encrypted (not visible in SQL)
  3. External Access Integration: Combines rule and secret
  4. UDF: Uses integration to make secure API calls

Roles

Two roles are created:

  • PARALLEL_DEVELOPER: Can create and modify UDFs
  • PARALLEL_USER: Can execute UDFs only

Grant PARALLEL_USER to users who need to run enrichments:

GRANT ROLE PARALLEL_USER TO USER analyst_user;

Cost Considerations

Costs depend on:

  1. Number of rows: Each row = one enrichment run (but batched efficiently)
  2. Processor used: pro is 20x more expensive than lite
  3. Output columns: More columns may require more processing

Estimate costs:

  • lite-fast: ~$0.005/row
  • base-fast: ~$0.01/row
  • pro-fast: ~$0.10/row

Note: With PARTITION BY 1, all rows are batched into a single API call, reducing latency significantly.

Best Practices

1. Use Specific Descriptions

-- Good - specific
ARRAY_CONSTRUCT(
    'CEO name (current CEO or equivalent leader)',
    'Founding year (YYYY format)'
)

-- Less specific - may get inconsistent results
ARRAY_CONSTRUCT('CEO', 'Year')

2. Start with lite-fast

Use lite-fast for testing, then switch to base-fast or higher for production.

3. Use PARTITION BY for Batching

The PARTITION BY clause controls how rows are batched into API calls:

-- All rows in one batch (single API call)
TABLE(parallel_enrich(...) OVER (PARTITION BY 1))

-- One batch per region (one API call per region)
TABLE(parallel_enrich(...) OVER (PARTITION BY region))

-- One batch per date (process daily data separately)
TABLE(parallel_enrich(...) OVER (PARTITION BY DATE_TRUNC('day', created_at)))

When to use each approach:

PatternUse Case
PARTITION BY 1Small datasets (<1000 rows), fastest for few rows
PARTITION BY columnLarge datasets, natural groupings, incremental processing
PARTITION BY CEIL(ROW_NUMBER() OVER () / 100)Fixed batch sizes

Example: Partition by existing column

-- Process each region as a separate batch
SELECT
    e.input:company_name::STRING AS company_name,
    e.input:region::STRING AS region,
    e.enriched:ceo_name::STRING AS ceo_name
FROM companies t,
     TABLE(PARALLEL_INTEGRATION.ENRICHMENT.parallel_enrich(
         TO_JSON(OBJECT_CONSTRUCT('company_name', t.company_name, 'region', t.region)),
         ARRAY_CONSTRUCT('CEO name')
     ) OVER (PARTITION BY t.region)) e;

Example: Fixed batch sizes

-- Process in batches of 100 rows
WITH numbered AS (
    SELECT *, CEIL(ROW_NUMBER() OVER (ORDER BY company_name) / 100.0) AS batch_id
    FROM companies
)
SELECT
    e.input:company_name::STRING AS company_name,
    e.enriched:ceo_name::STRING AS ceo_name
FROM numbered t,
     TABLE(PARALLEL_INTEGRATION.ENRICHMENT.parallel_enrich(
         TO_JSON(OBJECT_CONSTRUCT('company_name', t.company_name)),
         ARRAY_CONSTRUCT('CEO name')
     ) OVER (PARTITION BY t.batch_id)) e;

Example: Incremental processing by date

-- Only process today's new records
SELECT
    e.input:company_name::STRING AS company_name,
    e.enriched:ceo_name::STRING AS ceo_name
FROM companies t,
     TABLE(PARALLEL_INTEGRATION.ENRICHMENT.parallel_enrich(
         TO_JSON(OBJECT_CONSTRUCT('company_name', t.company_name)),
         ARRAY_CONSTRUCT('CEO name')
     ) OVER (PARTITION BY DATE_TRUNC('day', t.created_at))) e
WHERE t.created_at >= CURRENT_DATE;

4. Cache Results

Store enriched results in a table to avoid re-processing:

CREATE TABLE enriched_cache AS
SELECT e.input, e.enriched
FROM companies t,
     TABLE(PARALLEL_INTEGRATION.ENRICHMENT.parallel_enrich(
         TO_JSON(OBJECT_CONSTRUCT('company_name', t.company_name)),
         ARRAY_CONSTRUCT('CEO name', 'Founding year')
     ) OVER (PARTITION BY 1)) e;

Cleanup

Using CLI

# Not yet implemented - use Python or SQL

Using Python

from parallel_web_tools.integrations.snowflake import cleanup_parallel_functions

cleanup_parallel_functions(
    account="your-account",
    user="your-user",
    password="your-password",
)

Using SQL

from parallel_web_tools.integrations.snowflake import get_cleanup_sql
print(get_cleanup_sql())

Then execute the SQL in Snowflake.

Next Steps