Data Formulator
May 28, 2026
Plugins are drop-in Python files that add new data-source connectors to Data Formulator without modifying its source code. If the built-in connectors don't cover your data source (an internal warehouse, a SaaS API, a niche database), write a small plugin and DF will pick it up on the next restart.
This folder contains example plugins. Treat them as templates: copy one, rename it, and adapt the body.
Quick start (3 steps)
- Find your plugin directory. It lives under your Data Formulator home dir:

      $DATA_FORMULATOR_HOME/plugins/

  If DATA_FORMULATOR_HOME is not set, DF defaults to ~/.data_formulator/, so the plugin dir is ~/.data_formulator/plugins/.

  Power users can point somewhere else with DF_PLUGIN_DIR (highest precedence), which is useful for sharing one plugin folder across multiple DF installs.
- Copy an example into it. For instance:

      mkdir -p "${DATA_FORMULATOR_HOME:-$HOME/.data_formulator}/plugins"
      cp examples/plugins/sqlite_data_loader.py \
        "${DATA_FORMULATOR_HOME:-$HOME/.data_formulator}/plugins/"

- Restart Data Formulator. The new connector appears in the UI automatically. No registry edits, no rebuilds.
To verify it loaded, check the startup log for a line like:
INFO ... Plugin loader 'sqlite' registered from sqlite_data_loader.py
INFO ... Plugin scan complete: 1 registered, 0 failed (dir=..., reason=WORKSPACE_BACKEND=local)
Plugin directory resolution order
| Precedence | Source | Default |
|---|---|---|
| 1 | DF_PLUGIN_DIR env var (explicit override) | — |
| 2 | $DATA_FORMULATOR_HOME/plugins | — |
| 3 | Fallback | ~/.data_formulator/plugins/ |
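The precedence in the table can be sketched as a small function. This is an illustration only, not DF's actual resolver: the function name `resolve_plugin_dir` is hypothetical, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_plugin_dir(env=None):
    """Return the plugin directory following the documented precedence.

    Hypothetical helper; DF's own resolver may differ in details.
    """
    env = os.environ if env is None else env

    # 1. Explicit override wins (assumption: empty string counts as unset).
    override = env.get("DF_PLUGIN_DIR")
    if override:
        return Path(override).expanduser()

    # 2. Otherwise use the plugins folder under the DF home dir.
    home = env.get("DATA_FORMULATOR_HOME")
    if home:
        return Path(home).expanduser() / "plugins"

    # 3. Fallback: ~/.data_formulator/plugins/
    return Path.home() / ".data_formulator" / "plugins"
```

Passing the environment as a plain dict makes the precedence easy to check without touching the real process environment.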
File-name contract
| Filename | Registry key |
|---|---|
| sqlite_data_loader.py | sqlite |
| acme_warehouse_data_loader.py | acme_warehouse |
| notion_data_loader.py | notion |
Rules:
- The filename must end in _data_loader.py.
- The prefix becomes the registry key; keep it lowercase, no spaces.
- If the key matches a built-in (e.g. mysql_data_loader.py), the plugin overrides the built-in. Useful for hot-patching.
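A minimal sketch of the filename-to-key mapping, assuming the key is simply the filename with the `_data_loader.py` suffix removed. The helper name `registry_key` is hypothetical, and the real scanner may apply extra checks.

```python
from pathlib import Path

SUFFIX = "_data_loader.py"


def registry_key(filename):
    """Return the registry key for a plugin filename, or None if it does not qualify."""
    name = Path(filename).name
    if not name.endswith(SUFFIX):
        return None
    key = name[: -len(SUFFIX)]
    # A bare "_data_loader.py" has no prefix, so it yields no key.
    return key or None
```

For example, `registry_key("acme_warehouse_data_loader.py")` gives `"acme_warehouse"`, while a helper file such as `utils.py` gives `None`.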
What goes inside the file
Each plugin defines exactly one class that subclasses ExternalDataLoader.
The minimum surface area:
from data_formulator.data_loader.external_data_loader import (
    ExternalDataLoader, MAX_IMPORT_ROWS,
)
import pyarrow as pa


class MyLoader(ExternalDataLoader):
    # Optional: human-friendly UI label. Without this, the registry key
    # is title-cased (``"my_warehouse"`` → ``"My Warehouse"``). Override
    # to fix awkward casing (``"SQLite"``, ``"BigQuery"``).
    DISPLAY_NAME = "My Warehouse"

    @staticmethod
    def list_params() -> list[dict]:
        """Declare connection-form fields. The UI auto-renders this."""
        return [
            {"name": "endpoint", "type": "string", "required": True,
             "tier": "connection", "description": "Server URL"},
            {"name": "token", "type": "string", "required": True,
             "tier": "auth", "sensitive": True, "description": "API token"},
        ]

    @staticmethod
    def auth_instructions() -> str:
        """Markdown help text shown next to the form."""
        return "Get your API token from https://example.com/settings/tokens"

    def __init__(self, params: dict):
        self.params = params
        # validate + open connection here

    def list_tables(self, table_filter: str | None = None) -> list[dict]:
        """Return catalog: [{name, metadata: {columns, row_count}}, ...]"""
        ...

    def fetch_data_as_arrow(self, source_table: str,
                            import_options: dict | None = None) -> pa.Table:
        """Read rows. Honour import_options['size'] up to MAX_IMPORT_ROWS."""
        ...
See sqlite_data_loader.py for a runnable implementation (~170 lines, stdlib only).
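To make the catalog shape from the `list_tables` docstring concrete, here is a standalone sketch against an in-memory SQLite database. It is not taken from sqlite_data_loader.py. The per-column dict layout (`name`, `type`) and the case-insensitive substring filter are assumptions; check the ExternalDataLoader ABC for the exact contract.

```python
import sqlite3


def list_tables(conn, table_filter=None):
    """Return [{"name": ..., "metadata": {"columns": [...], "row_count": n}}, ...]."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    catalog = []
    for name in names:
        # Assumed filter semantics: case-insensitive substring match.
        if table_filter and table_filter.lower() not in name.lower():
            continue
        # Quote the identifier by doubling embedded double quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [{"name": col[1], "type": col[2]}
                   for col in conn.execute(f"PRAGMA table_info({quoted})")]
        row_count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        catalog.append({"name": name,
                        "metadata": {"columns": columns, "row_count": row_count}})
    return catalog


# Small demo database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
catalog = list_tables(conn)
```

In a real plugin this logic would live in the `list_tables` method and use the connection opened in `__init__`.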
list_params() field reference
| Key | Meaning |
|---|---|
| name | Parameter key passed into __init__(params) |
| type | "string", "int", "bool", "password" |
| required | If True, DF rejects connections that omit it |
| default | Pre-filled value in the form |
| sensitive | If True, value is redacted from logs / stored metadata |
| tier | "auth", "connection", or "filter"; groups fields visually |
| description | Help text shown under the field |
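The sketch below shows one way the `required`, `default`, and `sensitive` keys could be interpreted. It illustrates the field semantics only and is not DF's validation code. The helpers `check_params` and `redact`, the `"***"` mask, and the `timeout` field are all hypothetical.

```python
def check_params(specs, params):
    """Apply defaults, then report required fields that are still missing."""
    merged, missing = {}, []
    for spec in specs:
        name = spec["name"]
        if name in params:
            merged[name] = params[name]
        elif "default" in spec:
            merged[name] = spec["default"]
        elif spec.get("required"):
            missing.append(name)
    return merged, missing


def redact(specs, params):
    """Mask values of fields marked sensitive, e.g. before logging them."""
    sensitive = {s["name"] for s in specs if s.get("sensitive")}
    return {k: ("***" if k in sensitive else v) for k, v in params.items()}


# Example specs in the list_params() format; "timeout" is made up for illustration.
SPECS = [
    {"name": "endpoint", "type": "string", "required": True, "tier": "connection"},
    {"name": "token", "type": "string", "required": True, "tier": "auth",
     "sensitive": True},
    {"name": "timeout", "type": "int", "default": 30, "tier": "connection"},
]

merged, missing = check_params(SPECS, {"endpoint": "https://example.com"})
safe = redact(SPECS, {"endpoint": "https://example.com", "token": "abc123"})
```

Here `missing` lists `token`, `merged` carries the default `timeout`, and `safe` has the token masked.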
Security gate — please read
Plugins execute arbitrary Python in the server process. To prevent accidental code execution in shared deployments, the plugin scanner is enabled only in single-user local mode:
- WORKSPACE_BACKEND unset or local → scanner runs.
- WORKSPACE_BACKEND is anything else → scanner is skipped silently.
- To opt in for a hosted deployment, set DF_ALLOW_PLUGINS=1 and make sure the plugin directory is writable only by trusted administrators.
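The gate can be read as the following decision function. This is a sketch of the documented behavior, not the scanner's source. It assumes an empty `WORKSPACE_BACKEND` counts as unset and that only the literal value `1` opts in. The name `plugin_scan_enabled` is hypothetical.

```python
import os


def plugin_scan_enabled(env=None):
    """Return True if the plugin scanner should run under the documented rules."""
    env = os.environ if env is None else env
    backend = env.get("WORKSPACE_BACKEND", "")
    # Single-user local mode: backend unset (or empty, by assumption) or "local".
    if backend in ("", "local"):
        return True
    # Any other backend needs the explicit opt-in.
    return env.get("DF_ALLOW_PLUGINS") == "1"
```

Because a skipped scan is silent, a check like this can help when a plugin seems to be ignored on a hosted deployment.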
Debugging a plugin that won't load
The scanner reports every failure two ways:
- In the log, with a full traceback:

      WARNING Failed to load plugin foo_data_loader.py: ... ModuleNotFoundError ...

- In DISABLED_LOADERS, which the frontend surfaces as a greyed-out connector with the reason. Common reasons:

| Message | Fix |
|---|---|
| missing dependency 'X' (pip install X) | uv pip install X in DF's venv |
| no ExternalDataLoader subclass found | Add a class Foo(ExternalDataLoader) to the file |
| could not create import spec | File isn't valid Python or has wrong extension |
| <ExceptionType>: <message> | Anything else raised during import; check the traceback |
The plugin scanner cleans sys.modules on failure, so simply restarting DF after editing the file picks up the fix; no manual cleanup is needed.
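For intuition about where these messages come from, here is a rough sketch of a file-based loader built on `importlib`. It reuses the message strings from the table but is not DF's scanner. The function `try_load`, the module-name prefix, and the exact mapping from exception to message are assumptions.

```python
import importlib.util
import sys
from pathlib import Path


def try_load(path, base_class):
    """Import one plugin file; return (loader_class, None) or (None, reason)."""
    path = Path(path)
    module_name = f"_plugin_sketch_{path.stem}"  # hypothetical naming scheme
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        return None, "could not create import spec"

    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except ModuleNotFoundError as exc:
        sys.modules.pop(module_name, None)  # leave no half-imported module behind
        return None, f"missing dependency '{exc.name}' (pip install {exc.name})"
    except Exception as exc:
        sys.modules.pop(module_name, None)
        return None, f"{type(exc).__name__}: {exc}"

    # Pick the first class in the module that subclasses the base class.
    for obj in vars(module).values():
        if isinstance(obj, type) and issubclass(obj, base_class) and obj is not base_class:
            return obj, None

    sys.modules.pop(module_name, None)
    return None, "no ExternalDataLoader subclass found"
```

In DF the `base_class` argument would be `ExternalDataLoader`. Any class works for experimenting with the failure paths.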
Adding third-party dependencies
If your plugin needs a package that isn't already in DF's environment:
# from the data-formulator repo root
uv pip install <package>
The plugin file is loaded into the same Python process as DF, so anything importable from DF's venv is importable from your plugin.
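Since plugins share DF's interpreter, a quick pre-flight check run inside DF's venv can tell you whether a dependency is visible before you restart. The helper below is a hypothetical convenience, not part of DF. It uses the standard `importlib.util.find_spec` and expects top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty result from `missing_modules(["pyarrow", "requests"])` means both packages are importable in the current environment. Any names returned still need `uv pip install`.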
For AI coding agents
When asked to add a new data source to a user's Data Formulator install, prefer writing a plugin file over modifying DF's source tree:
- Read sqlite_data_loader.py and the ExternalDataLoader ABC in py-src/data_formulator/data_loader/external_data_loader.py to understand the required methods.
- Create a new file <source>_data_loader.py in this folder (or directly in ~/.data_formulator/plugins/).
- Subclass ExternalDataLoader. Implement at minimum: list_params, auth_instructions, __init__, list_tables, fetch_data_as_arrow.
- Return data as a pyarrow.Table from fetch_data_as_arrow; do not convert to pandas in the hot path.
- Quote identifiers when building SQL (see _quote_ident in the SQLite example) to avoid injection vulnerabilities.
- Respect import_options['size'] and cap at MAX_IMPORT_ROWS.
- If the source has credentials, mark those params sensitive: True and tier: "auth" so DF redacts them from stored metadata.
- Do not modify the user's data; open read-only connections where the source supports it.
After writing the file, verify it loads with:
DF_PLUGIN_DIR=<dir> uv run python -c \
"from data_formulator import data_loader as dl; \
print(dl.PLUGIN_LOADERS, dl.DISABLED_LOADERS)"