airtable-plasmate

April 12, 2026

Airtable integration for Plasmate - fetch web content and store structured data in Airtable with 10-100x token compression.

Overview

This integration connects Plasmate's browser engine with Airtable's no-code database platform. Plasmate converts HTML to a Semantic Object Model (SOM), extracting structured data that maps cleanly to Airtable fields.

Use cases:

  • Build content databases from web pages
  • Create research collections with auto-extracted metadata
  • Sync product catalogs from URLs
  • Archive articles with full text extraction
  • Monitor competitors by storing page snapshots

Installation

pip install airtable-plasmate

Or from source:

git clone https://github.com/nickarora/plasmate
cd plasmate/integrations/airtable-plasmate
pip install -e .

Requirements:

  • Python 3.9+
  • Plasmate binary (for CLI mode) or Plasmate API access
  • Airtable account with API access

Quick Start

from airtable_plasmate import PlasmateAirtable

# Initialize client
client = PlasmateAirtable(
    api_key="pat...",      # Airtable personal access token
    base_id="app...",      # Your Airtable base ID
)

# Fetch a URL and store in Airtable
record = client.fetch_and_store(
    url="https://example.com/article",
    table_name="Articles",
)

print(f"Created record: {record['id']}")

Features

1. Simple Fetch and Store

# Uses default field mappings (Title, Description, Content, etc.)
record = client.fetch_and_store(
    url="https://news.ycombinator.com",
    table_name="Articles",
)

2. Custom Field Mappings

Map any SOM field to any Airtable field:

from airtable_plasmate import FieldMapping, AirtableFieldType

mappings = [
    FieldMapping(
        som_path="metadata.title",
        airtable_field="Article Title",
        field_type=AirtableFieldType.SINGLE_LINE_TEXT,
    ),
    FieldMapping(
        som_path="metadata.description",
        airtable_field="Summary",
        field_type=AirtableFieldType.LONG_TEXT,
    ),
    FieldMapping(
        som_path="metadata.image",
        airtable_field="Cover Image",
        field_type=AirtableFieldType.ATTACHMENT,
    ),
    FieldMapping(
        som_path="content.text",
        airtable_field="Full Text",
        field_type=AirtableFieldType.LONG_TEXT,
        transform=lambda x: (x or "")[:5000],  # Truncate to 5000 chars; tolerate a missing value
    ),
]

record = client.fetch_and_store(
    url="https://example.com",
    table_name="Articles",
    fields_map=mappings,
)
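To make the mapping semantics concrete, here is a minimal sketch of how a dotted `som_path` could be resolved against a SOM-like dictionary and turned into an Airtable fields payload. It is not the library's implementation: `resolve_path`, `build_fields`, and the sample SOM shape are assumptions for illustration, and plain tuples stand in for `FieldMapping`.

```python
from typing import Any, Callable, Optional

# Hypothetical SOM-like dict; the real SOM structure may differ.
som = {
    "metadata": {"title": "Example Article", "description": None},
    "content": {"text": "x" * 6000},
}

def resolve_path(data: Any, path: str, default: Any = None) -> Any:
    """Walk a dotted path such as "metadata.title" through nested dicts."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current

# (som_path, airtable_field, transform, default): a stand-in for FieldMapping.
MappingRow = tuple[str, str, Optional[Callable[[Any], Any]], Any]

def build_fields(data: dict, mappings: list[MappingRow]) -> dict:
    """Build the {field name: value} dict for one Airtable record."""
    fields = {}
    for som_path, airtable_field, transform, default in mappings:
        value = resolve_path(data, som_path, default)
        if value is not None and transform is not None:
            value = transform(value)
        if value is not None:
            fields[airtable_field] = value  # fields without a value are omitted
    return fields

mappings = [
    ("metadata.title", "Article Title", None, None),
    ("metadata.description", "Summary", None, "n/a"),
    ("content.text", "Full Text", lambda x: x[:5000], None),
    ("metadata.author", "Author", None, None),  # absent in the sample SOM
]
fields = build_fields(som, mappings)
```

In this sketch a missing path falls back to the mapping's default, and a field with no value at all is left out of the payload rather than written as empty.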

3. Batch Processing

Process multiple URLs efficiently:

urls = [
    "https://github.com",
    "https://stackoverflow.com",
    "https://reddit.com",
]

records = client.batch_fetch_and_store(
    urls=urls,
    table_name="Articles",
    on_error="skip",  # "skip", "raise", or "include"
)

print(f"Created {len(records)} records")
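The `on_error` option is easiest to read as a loop policy. The sketch below is an assumption about the semantics, not the library's code: `process_all` and `fake_fetch` are hypothetical, and `"include"` is taken to mean that failed URLs appear in the result as error entries.

```python
from typing import Callable

def process_all(
    urls: list[str],
    fetch: Callable[[str], dict],
    on_error: str = "skip",
) -> list[dict]:
    """Apply fetch to each URL, handling failures according to on_error."""
    if on_error not in {"skip", "raise", "include"}:
        raise ValueError(f"unknown on_error value: {on_error!r}")
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:
            if on_error == "raise":
                raise  # abort the whole batch on the first failure
            if on_error == "include":
                results.append({"url": url, "error": str(exc)})
            # "skip": drop the failed URL and continue
    return results

def fake_fetch(url: str) -> dict:
    """Hypothetical fetcher that fails for URLs containing 'bad'."""
    if "bad" in url:
        raise RuntimeError("fetch failed")
    return {"id": "rec_" + url.rsplit("/", 1)[-1], "url": url}

urls = [
    "https://example.com/a",
    "https://example.com/bad",
    "https://example.com/b",
]
skipped = process_all(urls, fake_fetch, on_error="skip")
included = process_all(urls, fake_fetch, on_error="include")
```

With `"skip"` the result can be shorter than the input list, so `len(records)` counts successes only.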

4. Sync Existing Records

Automatically fetch URLs from a field and populate other fields:

# Find records with URLs but without synced content
updated = client.sync_from_url_field(
    table_name="Articles",
    url_field="URL",
    status_field="Sync Status",
)
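Selecting "records with a URL but no sync status" can be written as an Airtable formula and passed through the `filter_formula` parameter listed in the API reference. The helper below is a hypothetical sketch; whether the library builds this same formula by default is an assumption.

```python
def unsynced_formula(url_field: str = "URL", status_field: str = "Sync Status") -> str:
    """Return an Airtable formula matching rows with a URL and an empty status."""
    for name in (url_field, status_field):
        if "{" in name or "}" in name:
            # Braces delimit field names in formulas; reject them to keep the sketch simple.
            raise ValueError(f"unsupported field name: {name!r}")
    return f"AND({{{url_field}}} != '', {{{status_field}}} = BLANK())"

formula = unsynced_formula()
```

The resulting string could then be supplied as `filter_formula=formula` to narrow which records are synced.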

5. Custom Headers

Fetch authenticated or protected pages:

record = client.fetch_and_store(
    url="https://api.example.com/article",
    table_name="Articles",
    headers={
        "Authorization": "Bearer your-token",
        "Cookie": "session=abc123",
    },
)

Field Mapping Reference

Default SOM Paths

| SOM Path | Description | Example |
| --- | --- | --- |
| metadata.title | Page title | "Example Article" |
| metadata.description | Meta description | "An example..." |
| metadata.author | Author name | "John Doe" |
| metadata.published_date | Publication date | "2024-01-15" |
| metadata.image | Featured image URL | "https://..." |
| metadata.keywords | Keywords array | ["tech", "ai"] |
| content.text | Main content text | "Full article..." |
| links | Array of links | [{href, text}] |
| images | Array of images | [{src, alt}] |

Airtable Field Types

| Type | Use For |
| --- | --- |
| SINGLE_LINE_TEXT | Titles, short text |
| LONG_TEXT | Descriptions, content |
| RICH_TEXT | Formatted content |
| URL | Links |
| ATTACHMENT | Images, files |
| NUMBER | Counts, metrics |
| DATE / DATETIME | Dates |
| SINGLE_SELECT | Categories |
| MULTIPLE_SELECT | Tags |
| CHECKBOX | Boolean flags |
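Different field types expect differently shaped values when written through the Airtable REST API. For example, attachments are sent as a list of objects with a `url` key and multiple selects as a list of strings. The sketch below illustrates that idea; `FieldType` and `coerce` are hypothetical stand-ins, not the library's `AirtableFieldType` or its actual conversion logic.

```python
from datetime import date, datetime
from enum import Enum, auto
from typing import Any

class FieldType(Enum):  # hypothetical stand-in for AirtableFieldType
    SINGLE_LINE_TEXT = auto()
    ATTACHMENT = auto()
    NUMBER = auto()
    DATE = auto()
    MULTIPLE_SELECT = auto()
    CHECKBOX = auto()

def coerce(value: Any, field_type: FieldType) -> Any:
    """Shape a raw SOM value into what an Airtable field of this type accepts."""
    if field_type is FieldType.ATTACHMENT:
        urls = [value] if isinstance(value, str) else list(value)
        return [{"url": u} for u in urls]
    if field_type is FieldType.MULTIPLE_SELECT:
        return [value] if isinstance(value, str) else [str(v) for v in value]
    if field_type is FieldType.CHECKBOX:
        # Plain truthiness; strings such as "false" would need explicit parsing.
        return bool(value)
    if field_type is FieldType.NUMBER:
        return float(value)
    if field_type is FieldType.DATE:
        if isinstance(value, datetime):
            return value.date().isoformat()
        if isinstance(value, date):
            return value.isoformat()
        # Validates the string and normalizes it to YYYY-MM-DD.
        return date.fromisoformat(str(value)).isoformat()
    return str(value)

cover = coerce("https://example.com/cover.png", FieldType.ATTACHMENT)
tags = coerce(["tech", "ai"], FieldType.MULTIPLE_SELECT)
published = coerce("2024-01-15", FieldType.DATE)
```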

Airtable Automations

Use Plasmate directly in Airtable Automations for no-code workflows.

Setup

  1. In Airtable, go to Automations
  2. Create a new automation and choose a trigger:
    • "When record created" or "When record updated"
    • Configure it to watch your URL field
  3. Add action: Run a script
  4. Copy the script from scripts/automation.js
  5. Configure the CONFIG object at the top of the script

Generate Custom Scripts

from airtable_plasmate import generate_automation_script, AutomationScript

config = AutomationScript(
    url_field="URL",
    output_fields={
        "metadata.title": "Title",
        "metadata.description": "Summary",
        "content.text": "Content",
    },
    plasmate_url="https://api.plasmate.app",
    status_field="Sync Status",
)

script = generate_automation_script(config)
print(script)  # Paste this into Airtable

Airtable Setup

Required Table Fields

Create a table with these fields for the default mappings:

| Field Name | Field Type | Notes |
| --- | --- | --- |
| URL | URL | Source URL to fetch |
| Title | Single line text | Page title |
| Description | Long text | Meta description |
| Content | Long text | Main content |
| Author | Single line text | Author name |
| Published Date | Date | Publication date |
| Featured Image | Attachment | OG image |
| Sync Status | Single select | Synced/Error/Pending |

Getting Your API Key

  1. Go to Airtable API
  2. Create a new personal access token
  3. Grant data.records:read and data.records:write scopes
  4. Select the bases you want to access

Getting Your Base ID

  1. Open your Airtable base
  2. Click Help > API documentation
  3. Find your base ID (starts with app)

Configuration

Environment Variables

export AIRTABLE_API_KEY="pat..."
export AIRTABLE_BASE_ID="app..."
export PLASMATE_BINARY_PATH="/usr/local/bin/plasmate"
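The source does not say whether the client reads these variables on its own, so one cautious option is to load them explicitly and pass them to the constructor. `load_settings` below is a hypothetical helper, not part of the package. It takes an optional mapping so it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional

def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read Airtable and Plasmate settings from environment-style variables."""
    env = os.environ if env is None else env
    required = ("AIRTABLE_API_KEY", "AIRTABLE_BASE_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    if not env["AIRTABLE_BASE_ID"].startswith("app"):
        raise ValueError("AIRTABLE_BASE_ID should start with 'app'")
    return {
        "api_key": env["AIRTABLE_API_KEY"],
        "base_id": env["AIRTABLE_BASE_ID"],
        "binary_path": env.get("PLASMATE_BINARY_PATH"),  # optional; None if unset
    }

# Placeholder values for illustration only.
settings = load_settings(
    {"AIRTABLE_API_KEY": "pat_example", "AIRTABLE_BASE_ID": "app_example"}
)
```

The returned values can then be passed as `PlasmateAirtable(api_key=settings["api_key"], base_id=settings["base_id"])`.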

Plasmate Configuration

from airtable_plasmate import PlasmateAirtable, PlasmateConfig

config = PlasmateConfig(
    binary_path="/usr/local/bin/plasmate",
    timeout=60,
    extra_args=["--user-agent", "MyBot/1.0"],
)

client = PlasmateAirtable(
    api_key="pat...",
    base_id="app...",
    plasmate_config=config,
)

Examples

See the examples/ directory:

  • content_database.py - Build a content database from URLs
  • More examples coming soon

API Reference

PlasmateAirtable

Main client class.

PlasmateAirtable(
    api_key: str,           # Airtable personal access token
    base_id: str,           # Base ID (starts with 'app')
    plasmate_config: PlasmateConfig | None = None,
)

Methods:

  • fetch_som(url, headers) - Fetch URL and return SOM
  • fetch_and_store(url, table_name, fields_map, headers, transform) - Fetch and create record
  • batch_fetch_and_store(urls, table_name, fields_map, headers, on_error) - Batch process
  • sync_from_url_field(table_name, url_field, status_field, fields_map, filter_formula) - Sync existing records

FieldMapping

Configure individual field mappings.

FieldMapping(
    som_path: str,              # Path in SOM (e.g., "metadata.title")
    airtable_field: str,        # Airtable field name
    field_type: AirtableFieldType = SINGLE_LINE_TEXT,
    transform: Callable | None = None,  # Transform function
    default: Any = None,        # Default value if not found
)

Contributing

Contributions welcome! Please read the contributing guidelines in the main Plasmate repository.

License

MIT License - see LICENSE file for details.