airtable-plasmate

April 12, 2026

Airtable integration for Plasmate - fetch web content and store structured data in Airtable with 10-100x token compression.

Overview

This integration connects Plasmate's browser engine with Airtable's no-code database platform. Plasmate converts HTML to a Semantic Object Model (SOM), extracting structured data that maps cleanly to Airtable fields.

Use cases:

Build content databases from web pages
Create research collections with auto-extracted metadata
Sync product catalogs from URLs
Archive articles with full text extraction
Monitor competitors by storing page snapshots

Installation

pip install airtable-plasmate

Or from source:

git clone https://github.com/nickarora/plasmate
cd plasmate/integrations/airtable-plasmate
pip install -e .

Requirements:

Python 3.9+
Plasmate binary (for CLI mode) or Plasmate API access
Airtable account with API access

Quick Start

from airtable_plasmate import PlasmateAirtable

# Initialize client
client = PlasmateAirtable(
    api_key="pat...",      # Airtable personal access token
    base_id="app...",      # Your Airtable base ID
)

# Fetch a URL and store in Airtable
record = client.fetch_and_store(
    url="https://example.com/article",
    table_name="Articles",
)

print(f"Created record: {record['id']}")

Features

1. Simple Fetch and Store

# Uses default field mappings (Title, Description, Content, etc.)
record = client.fetch_and_store(
    url="https://news.ycombinator.com",
    table_name="Articles",
)

2. Custom Field Mappings

Map any SOM field to any Airtable field:

from airtable_plasmate import FieldMapping, AirtableFieldType

mappings = [
    FieldMapping(
        som_path="metadata.title",
        airtable_field="Article Title",
        field_type=AirtableFieldType.SINGLE_LINE_TEXT,
    ),
    FieldMapping(
        som_path="metadata.description",
        airtable_field="Summary",
        field_type=AirtableFieldType.LONG_TEXT,
    ),
    FieldMapping(
        som_path="metadata.image",
        airtable_field="Cover Image",
        field_type=AirtableFieldType.ATTACHMENT,
    ),
    FieldMapping(
        som_path="content.text",
        airtable_field="Full Text",
        field_type=AirtableFieldType.LONG_TEXT,
        transform=lambda x: x[:5000],  # Truncate to 5000 chars
    ),
]

record = client.fetch_and_store(
    url="https://example.com",
    table_name="Articles",
    fields_map=mappings,
)
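How the library resolves `som_path` internally is not shown in this README, but the idea can be sketched with plain dictionaries. The sketch below assumes the SOM is a nested dict whose keys match the paths listed in the Field Mapping Reference. `resolve_path` and `apply_mappings` are hypothetical helpers written for illustration, not part of `airtable_plasmate`.

```python
from typing import Any, Callable, Optional

def resolve_path(som: dict, path: str, default: Any = None) -> Any:
    """Walk a dotted path such as "metadata.title" through nested dicts."""
    current: Any = som
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

def apply_mappings(som: dict, mappings: list) -> dict:
    """Build an Airtable fields dict from (som_path, airtable_field, transform) tuples."""
    fields = {}
    for som_path, airtable_field, transform in mappings:
        value = resolve_path(som, som_path)
        if value is None:
            continue  # no value in the SOM: leave the Airtable field unset
        fields[airtable_field] = transform(value) if transform else value
    return fields

# Hypothetical SOM shape, assumed for this example only.
som = {"metadata": {"title": "Example Article"}, "content": {"text": "x" * 6000}}
mappings = [
    ("metadata.title", "Article Title", None),
    ("metadata.description", "Summary", None),  # missing in this SOM
    ("content.text", "Full Text", lambda text: text[:5000]),
]
fields = apply_mappings(som, mappings)
print(sorted(fields))
```

Skipping missing values before calling the transform keeps a function like `lambda x: x[:5000]` from failing on `None`. Whether the library does the same is an assumption, so a defensive transform may still be worthwhile.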

3. Batch Processing

Process multiple URLs efficiently:

urls = [
    "https://github.com",
    "https://stackoverflow.com",
    "https://reddit.com",
]

records = client.batch_fetch_and_store(
    urls=urls,
    table_name="Articles",
    on_error="skip",  # "skip", "raise", or "include"
)

print(f"Created {len(records)} records")
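The three `on_error` values are not described further in this README. The sketch below shows one plausible reading, where "skip" drops failed URLs, "raise" stops at the first failure, and "include" keeps an error entry in the results. These semantics are an assumption, and `process_all` and `fake_fetch` are hypothetical stand-ins rather than library code.

```python
def process_all(urls, fetch, on_error="skip"):
    """Apply fetch to each URL under an assumed error policy."""
    if on_error not in ("skip", "raise", "include"):
        raise ValueError(f"unknown on_error value: {on_error!r}")
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:
            if on_error == "raise":
                raise  # stop at the first failure
            if on_error == "include":
                # keep a placeholder so the caller can see what failed
                results.append({"url": url, "error": str(exc)})
            # "skip": drop the failed URL and continue
    return results

def fake_fetch(url):
    """Stand-in for a real fetch: fails for any URL containing "bad"."""
    if "bad" in url:
        raise RuntimeError("fetch failed")
    return {"url": url, "id": "rec_demo"}

urls = ["https://example.com/a", "https://example.com/bad"]
skipped = process_all(urls, fake_fetch, on_error="skip")
included = process_all(urls, fake_fetch, on_error="include")
print(len(skipped), len(included))
```

If "include" does behave this way, `len(records)` counts failures as well as successes, so a count of created records should filter out the error entries first.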

4. Sync Existing Records

Automatically fetch URLs from a field and populate other fields:

# Find records with URLs but without synced content
updated = client.sync_from_url_field(
    table_name="Articles",
    url_field="URL",
    status_field="Sync Status",
)
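The API reference lists a `filter_formula` parameter for `sync_from_url_field`. The default filter is not documented here, so the sketch below only shows how an Airtable formula selecting "has a URL but no sync status" could be built, assuming the parameter accepts a standard Airtable formula string. `field_ref` and `unsynced_formula` are hypothetical helpers.

```python
def field_ref(name: str) -> str:
    """Wrap a field name in the {braces} Airtable formulas use for field references."""
    if "{" in name or "}" in name:
        raise ValueError("braces in field names are not handled by this sketch")
    return "{" + name + "}"

def unsynced_formula(url_field: str = "URL", status_field: str = "Sync Status") -> str:
    """Formula matching records with a non-empty URL and an empty status."""
    return f'AND({field_ref(url_field)} != "", {field_ref(status_field)} = BLANK())'

formula = unsynced_formula()
print(formula)
# AND({URL} != "", {Sync Status} = BLANK())
```

A string like this could then be passed as `filter_formula=formula`, for example to re-sync only records whose status is empty.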

5. Custom Headers

Fetch authenticated or protected pages:

record = client.fetch_and_store(
    url="https://api.example.com/article",
    table_name="Articles",
    headers={
        "Authorization": "Bearer your-token",
        "Cookie": "session=abc123",
    },
)

Field Mapping Reference

Default SOM Paths

| SOM Path | Description | Example |
|---|---|---|
| `metadata.title` | Page title | "Example Article" |
| `metadata.description` | Meta description | "An example..." |
| `metadata.author` | Author name | "John Doe" |
| `metadata.published_date` | Publication date | "2024-01-15" |
| `metadata.image` | Featured image URL | "https://..." |
| `metadata.keywords` | Keywords array | ["tech", "ai"] |
| `content.text` | Main content text | "Full article..." |
| `links` | Array of links | [{href, text}] |
| `images` | Array of images | [{src, alt}] |

Airtable Field Types

| Type | Use For |
|---|---|
| `SINGLE_LINE_TEXT` | Titles, short text |
| `LONG_TEXT` | Descriptions, content |
| `RICH_TEXT` | Formatted content |
| `URL` | Links |
| `ATTACHMENT` | Images, files |
| `NUMBER` | Counts, metrics |
| `DATE` / `DATETIME` | Dates |
| `SINGLE_SELECT` | Categories |
| `MULTIPLE_SELECT` | Tags |
| `CHECKBOX` | Boolean flags |
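Different field types expect different value shapes when written through the Airtable REST API: attachments are a list of objects with a `url` key, multiple selects are a list of strings, and checkboxes are booleans. The sketch below illustrates that conversion with a local `FieldType` enum standing in for `AirtableFieldType`. The enum members, their values, and `to_cell_value` are assumptions for illustration, and the library's own conversion may differ.

```python
from enum import Enum
from typing import Any

class FieldType(Enum):
    """Local stand-in for AirtableFieldType (a subset, values assumed)."""
    SINGLE_LINE_TEXT = "singleLineText"
    URL = "url"
    ATTACHMENT = "multipleAttachments"
    NUMBER = "number"
    MULTIPLE_SELECT = "multipleSelects"
    CHECKBOX = "checkbox"

def to_cell_value(field_type: FieldType, value: Any) -> Any:
    """Convert a SOM value into the shape the Airtable API expects for a cell."""
    if field_type is FieldType.ATTACHMENT:
        urls = [value] if isinstance(value, str) else list(value)
        return [{"url": u} for u in urls]  # attachments are uploaded by URL
    if field_type is FieldType.MULTIPLE_SELECT:
        return [value] if isinstance(value, str) else [str(v) for v in value]
    if field_type is FieldType.CHECKBOX:
        return bool(value)
    if field_type is FieldType.NUMBER:
        return float(value)
    return str(value)  # text-like fields

cover = to_cell_value(FieldType.ATTACHMENT, "https://example.com/cover.png")
tags = to_cell_value(FieldType.MULTIPLE_SELECT, ["tech", "ai"])
print(cover, tags)
```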

Airtable Automations

Use Plasmate directly in Airtable Automations for no-code workflows.

Setup

1. In Airtable, go to Automations.
2. Create a new automation with a trigger:
   - "When record created" or "When record updated"
   - Configure it to watch your URL field
3. Add the action: Run a script.
4. Copy the script from scripts/automation.js.
5. Configure the CONFIG object at the top of the script.

Generate Custom Scripts

from airtable_plasmate import generate_automation_script, AutomationScript

config = AutomationScript(
    url_field="URL",
    output_fields={
        "metadata.title": "Title",
        "metadata.description": "Summary",
        "content.text": "Content",
    },
    plasmate_url="https://api.plasmate.app",
    status_field="Sync Status",
)

script = generate_automation_script(config)
print(script)  # Paste this into Airtable

Airtable Setup

Required Table Fields

Create a table with these fields for the default mappings:

| Field Name | Field Type | Notes |
|---|---|---|
| URL | URL | Source URL to fetch |
| Title | Single line text | Page title |
| Description | Long text | Meta description |
| Content | Long text | Main content |
| Author | Single line text | Author name |
| Published Date | Date | Publication date |
| Featured Image | Attachment | OG image |
| Sync Status | Single select | Synced/Error/Pending |

Getting Your API Key

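1. Go to the Airtable personal access tokens page (https://airtable.com/create/tokens).
2. Create a new personal access token.
3. Grant the data.records:read and data.records:write scopes.
4. Select the bases the token should be able to access.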

Getting Your Base ID

1. Open your Airtable base.
2. Click Help > API documentation.
3. Find your base ID (it starts with app).
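The base ID is also visible in the browser address bar when a base is open. The sketch below pulls it out of such a URL with a regular expression. The 17-character length (`app` plus 14 alphanumeric characters) is an assumption based on commonly observed IDs, `extract_base_id` is a hypothetical helper, and the URL shown is made up.

```python
import re
from typing import Optional

# "app" followed by 14 alphanumeric characters (assumed ID length).
BASE_ID_PATTERN = re.compile(r"\bapp[A-Za-z0-9]{14}\b")

def extract_base_id(url: str) -> Optional[str]:
    """Return the first base-ID-shaped token in the URL, or None."""
    match = BASE_ID_PATTERN.search(url)
    return match.group(0) if match else None

example_url = "https://airtable.com/appAbCdEfGhIjKlMn/tblXXXXXXXXXXXXXX/viwXXXXXXXXXXXXXX"
base_id = extract_base_id(example_url)
print(base_id)
# appAbCdEfGhIjKlMn
```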

Configuration

Environment Variables

export AIRTABLE_API_KEY="pat..."
export AIRTABLE_BASE_ID="app..."
export PLASMATE_BINARY_PATH="/usr/local/bin/plasmate"
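This README does not say whether the client reads these variables automatically. A minimal sketch, assuming they need to be passed in explicitly, is to load and validate them up front. `load_settings` is a hypothetical helper, not part of the package.

```python
import os
from typing import Mapping, Optional

def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the variables above and fail early if a required one is missing."""
    env = os.environ if env is None else env
    required = ("AIRTABLE_API_KEY", "AIRTABLE_BASE_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    if not env["AIRTABLE_BASE_ID"].startswith("app"):
        raise ValueError("AIRTABLE_BASE_ID should start with 'app'")
    return {
        "api_key": env["AIRTABLE_API_KEY"],
        "base_id": env["AIRTABLE_BASE_ID"],
        "binary_path": env.get("PLASMATE_BINARY_PATH"),  # optional
    }

# Demo with an explicit mapping instead of the real environment.
settings = load_settings({"AIRTABLE_API_KEY": "pat-demo", "AIRTABLE_BASE_ID": "app-demo"})
print(settings["base_id"])
```

The resulting values could then be passed along as `PlasmateAirtable(api_key=settings["api_key"], base_id=settings["base_id"])`, which keeps tokens out of source files.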

Plasmate Configuration

from airtable_plasmate import PlasmateAirtable, PlasmateConfig

config = PlasmateConfig(
    binary_path="/usr/local/bin/plasmate",
    timeout=60,
    extra_args=["--user-agent", "MyBot/1.0"],
)

client = PlasmateAirtable(
    api_key="pat...",
    base_id="app...",
    plasmate_config=config,
)

Examples

See the examples/ directory:

content_database.py - Build a content database from URLs
More examples coming soon

API Reference

PlasmateAirtable

Main client class.

PlasmateAirtable(
    api_key: str,           # Airtable personal access token
    base_id: str,           # Base ID (starts with 'app')
    plasmate_config: PlasmateConfig | None = None,
)

Methods:

fetch_som(url, headers) - Fetch URL and return SOM
fetch_and_store(url, table_name, fields_map, headers, transform) - Fetch and create record
batch_fetch_and_store(urls, table_name, fields_map, headers, on_error) - Batch process
sync_from_url_field(table_name, url_field, status_field, fields_map, filter_formula) - Sync existing records

FieldMapping

Configure individual field mappings.

FieldMapping(
    som_path: str,              # Path in SOM (e.g., "metadata.title")
    airtable_field: str,        # Airtable field name
    field_type: AirtableFieldType = AirtableFieldType.SINGLE_LINE_TEXT,
    transform: Callable | None = None,  # Transform function
    default: Any = None,        # Default value if not found
)

Contributing

Contributions welcome! Please read the contributing guidelines in the main Plasmate repository.

License

MIT License - see LICENSE file for details.