airtable-plasmate
April 12, 2026
Airtable integration for Plasmate - fetch web content and store structured data in Airtable with 10-100x token compression.
Overview
This integration connects Plasmate's browser engine with Airtable's no-code database platform. Plasmate converts HTML to a Semantic Object Model (SOM), extracting structured data that maps cleanly to Airtable fields.
Use cases:
- Build content databases from web pages
- Create research collections with auto-extracted metadata
- Sync product catalogs from URLs
- Archive articles with full text extraction
- Monitor competitors by storing page snapshots
Installation
pip install airtable-plasmate
Or from source:
git clone https://github.com/nickarora/plasmate
cd plasmate/integrations/airtable-plasmate
pip install -e .
Requirements:
- Python 3.9+
- Plasmate binary (for CLI mode) or Plasmate API access
- Airtable account with API access
Quick Start
from airtable_plasmate import PlasmateAirtable

# Initialize client
client = PlasmateAirtable(
    api_key="pat...",  # Airtable personal access token
    base_id="app...",  # Your Airtable base ID
)

# Fetch a URL and store in Airtable
record = client.fetch_and_store(
    url="https://example.com/article",
    table_name="Articles",
)
print(f"Created record: {record['id']}")
Features
1. Simple Fetch and Store
# Uses default field mappings (Title, Description, Content, etc.)
record = client.fetch_and_store(
    url="https://news.ycombinator.com",
    table_name="Articles",
)
2. Custom Field Mappings
Map any SOM field to any Airtable field:
from airtable_plasmate import FieldMapping, AirtableFieldType

mappings = [
    FieldMapping(
        som_path="metadata.title",
        airtable_field="Article Title",
        field_type=AirtableFieldType.SINGLE_LINE_TEXT,
    ),
    FieldMapping(
        som_path="metadata.description",
        airtable_field="Summary",
        field_type=AirtableFieldType.LONG_TEXT,
    ),
    FieldMapping(
        som_path="metadata.image",
        airtable_field="Cover Image",
        field_type=AirtableFieldType.ATTACHMENT,
    ),
    FieldMapping(
        som_path="content.text",
        airtable_field="Full Text",
        field_type=AirtableFieldType.LONG_TEXT,
        transform=lambda x: x[:5000],  # Truncate to 5000 chars
    ),
]

record = client.fetch_and_store(
    url="https://example.com",
    table_name="Articles",
    fields_map=mappings,
)
3. Batch Processing
Process multiple URLs efficiently:
urls = [
    "https://github.com",
    "https://stackoverflow.com",
    "https://reddit.com",
]

records = client.batch_fetch_and_store(
    urls=urls,
    table_name="Articles",
    on_error="skip",  # "skip", "raise", or "include"
)
print(f"Created {len(records)} records")
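The three `on_error` modes can be pictured with a small standalone sketch. It does not call the library: `process_batch` and `fake_fetch` are hypothetical names, and the behaviour of each mode (in particular what `"include"` returns) is an assumption inferred from the option names, not something the package documentation confirms.

```python
from typing import Callable


def process_batch(urls: list[str], fetch: Callable[[str], dict], on_error: str = "skip") -> list[dict]:
    """Hypothetical stand-in for a batch loop with three error policies."""
    if on_error not in {"skip", "raise", "include"}:
        raise ValueError(f"unknown on_error mode: {on_error!r}")
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:
            if on_error == "raise":
                raise  # abort the whole batch on the first failure
            if on_error == "include":
                # assumed: keep a placeholder so the caller can see which URL failed
                results.append({"url": url, "error": str(exc)})
            # "skip": drop the failed URL and continue
    return results


def fake_fetch(url: str) -> dict:
    """Simulated fetch that fails for one URL."""
    if "bad" in url:
        raise RuntimeError("fetch failed")
    return {"url": url, "id": "rec_" + url.rsplit("/", 1)[-1]}


urls = ["https://example.com/a", "https://example.com/bad", "https://example.com/b"]
skipped = process_batch(urls, fake_fetch, on_error="skip")
included = process_batch(urls, fake_fetch, on_error="include")
print(len(skipped), len(included))  # 2 3
```

Under this reading, `"skip"` suits bulk imports where partial results are acceptable, while `"include"` keeps a one-to-one correspondence between input URLs and returned entries.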
4. Sync Existing Records
Automatically fetch URLs from a field and populate other fields:
# Find records with URLs but without synced content
updated = client.sync_from_url_field(
    table_name="Articles",
    url_field="URL",
    status_field="Sync Status",
)
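To narrow which records are synced, an Airtable formula string can select rows that have a URL but no status yet. The helper below is a hypothetical sketch (`unsynced_formula` is not part of the package); whether `sync_from_url_field` applies a similar filter by default is not stated here, and passing the result as `filter_formula` is an assumption based on the parameter name in the API reference.

```python
def unsynced_formula(url_field: str, status_field: str) -> str:
    """Build an Airtable formula matching rows that have a URL but an empty status."""
    # Airtable formulas reference fields as {Field Name}.
    return f"AND({{{url_field}}} != '', {{{status_field}}} = BLANK())"


formula = unsynced_formula("URL", "Sync Status")
print(formula)  # AND({URL} != '', {Sync Status} = BLANK())
```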
5. Custom Headers
Fetch authenticated or protected pages:
record = client.fetch_and_store(
    url="https://api.example.com/article",
    table_name="Articles",
    headers={
        "Authorization": "Bearer your-token",
        "Cookie": "session=abc123",
    },
)
Field Mapping Reference
Default SOM Paths
| SOM Path | Description | Example |
|---|---|---|
| metadata.title | Page title | "Example Article" |
| metadata.description | Meta description | "An example..." |
| metadata.author | Author name | "John Doe" |
| metadata.published_date | Publication date | "2024-01-15" |
| metadata.image | Featured image URL | "https://..." |
| metadata.keywords | Keywords array | ["tech", "ai"] |
| content.text | Main content text | "Full article..." |
| links | Array of links | [{href, text}] |
| images | Array of images | [{src, alt}] |
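A dotted SOM path can be read as a sequence of dictionary keys. The sketch below shows one way such a path might be resolved against a nested dict; `resolve_path` is a hypothetical helper, and the shape of `sample_som` is assumed from the table above rather than taken from Plasmate's actual output.

```python
from typing import Any


def resolve_path(som: dict, path: str, default: Any = None) -> Any:
    """Walk a dotted path such as "metadata.title" through nested dicts."""
    current: Any = som
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # path is missing at this level
        current = current[key]
    return current


# Sample SOM shaped after the table above (illustrative values only).
sample_som = {
    "metadata": {"title": "Example Article", "keywords": ["tech", "ai"]},
    "content": {"text": "Full article..."},
    "links": [{"href": "https://example.com", "text": "Home"}],
}

print(resolve_path(sample_som, "metadata.title"))  # Example Article
print(resolve_path(sample_som, "metadata.author", default="Unknown"))  # Unknown
```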
Airtable Field Types
| Type | Use For |
|---|---|
| SINGLE_LINE_TEXT | Titles, short text |
| LONG_TEXT | Descriptions, content |
| RICH_TEXT | Formatted content |
| URL | Links |
| ATTACHMENT | Images, files |
| NUMBER | Counts, metrics |
| DATE / DATETIME | Dates |
| SINGLE_SELECT | Categories |
| MULTIPLE_SELECT | Tags |
| CHECKBOX | Boolean flags |
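Each field type expects a particular value shape when written through Airtable's REST API: attachments are lists of objects with a `url` key, multiple selects are lists of strings, checkboxes are booleans, and dates are ISO 8601 strings. The sketch below illustrates that coercion. `to_airtable_value` is a hypothetical helper, and plain strings named after the enum members stand in for `AirtableFieldType`; how the package itself converts values is not documented here.

```python
from datetime import date
from typing import Any


def to_airtable_value(field_type: str, value: Any) -> Any:
    """Coerce a Python value into the shape the Airtable REST API expects."""
    if value is None:
        return None
    if field_type == "ATTACHMENT":
        urls = [value] if isinstance(value, str) else list(value)
        return [{"url": u} for u in urls]  # attachments are lists of {"url": ...}
    if field_type == "MULTIPLE_SELECT":
        return [value] if isinstance(value, str) else [str(v) for v in value]
    if field_type == "CHECKBOX":
        return bool(value)
    if field_type == "NUMBER":
        return float(value)
    if field_type in ("DATE", "DATETIME"):
        return value.isoformat() if isinstance(value, date) else str(value)
    return str(value)  # text-like types: SINGLE_LINE_TEXT, LONG_TEXT, URL, ...


print(to_airtable_value("ATTACHMENT", "https://example.com/cover.png"))
print(to_airtable_value("DATE", date(2024, 1, 15)))
```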
Airtable Automations
Use Plasmate directly in Airtable Automations for no-code workflows.
Setup
- In Airtable, go to Automations
- Create a new automation with a trigger:
  - "When record created" or "When record updated"
  - Configure it to watch your URL field
- Add action: Run a script
- Copy the script from scripts/automation.js
- Configure the CONFIG object at the top of the script
Generate Custom Scripts
from airtable_plasmate import generate_automation_script, AutomationScript

config = AutomationScript(
    url_field="URL",
    output_fields={
        "metadata.title": "Title",
        "metadata.description": "Summary",
        "content.text": "Content",
    },
    plasmate_url="https://api.plasmate.app",
    status_field="Sync Status",
)

script = generate_automation_script(config)
print(script)  # Paste this into Airtable
Airtable Setup
Required Table Fields
Create a table with these fields for the default mappings:
| Field Name | Field Type | Notes |
|---|---|---|
| URL | URL | Source URL to fetch |
| Title | Single line text | Page title |
| Description | Long text | Meta description |
| Content | Long text | Main content |
| Author | Single line text | Author name |
| Published Date | Date | Publication date |
| Featured Image | Attachment | OG image |
| Sync Status | Single select | Synced/Error/Pending |
Getting Your API Key
- Go to Airtable API
- Create a new personal access token
- Grant data.records:read and data.records:write scopes
- Select the bases you want to access
Getting Your Base ID
- Open your Airtable base
- Click Help > API documentation
- Find your base ID (starts with app)
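For context, the token and base ID are the two values Airtable's REST API needs to address a table: the token goes in a `Bearer` authorization header and the base ID in the URL path. The sketch below builds such a request with the standard library without sending it. `build_create_request` is a hypothetical helper and the credentials are placeholders; the package presumably issues equivalent requests internally, but that is an assumption.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_create_request(token: str, base_id: str, table_name: str, fields: dict) -> Request:
    """Build (but do not send) a create-record request for Airtable's REST API."""
    # Table names may contain spaces, so percent-encode them in the path.
    url = f"https://api.airtable.com/v0/{base_id}/{quote(table_name, safe='')}"
    body = json.dumps({"fields": fields}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder credentials, not real values.
req = build_create_request("patEXAMPLE", "appEXAMPLE", "My Articles", {"Title": "Hello"})
print(req.get_method(), req.full_url)
```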
Configuration
Environment Variables
export AIRTABLE_API_KEY="pat..."
export AIRTABLE_BASE_ID="app..."
export PLASMATE_BINARY_PATH="/usr/local/bin/plasmate"
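Whether the client reads these variables automatically is not stated here, so the following sketch loads and sanity-checks them explicitly before they are passed to a constructor. `load_settings` is a hypothetical helper; the prefix checks rely only on the `pat...` and `app...` conventions shown in this README, and treating the binary path as optional is an assumption.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three variables above and check the expected prefixes."""
    env = os.environ if env is None else env
    api_key = env.get("AIRTABLE_API_KEY", "")
    base_id = env.get("AIRTABLE_BASE_ID", "")
    if not api_key.startswith("pat"):
        raise ValueError("AIRTABLE_API_KEY should be a personal access token (pat...)")
    if not base_id.startswith("app"):
        raise ValueError("AIRTABLE_BASE_ID should start with 'app'")
    return {
        "api_key": api_key,
        "base_id": base_id,
        # Assumed optional: only relevant when running Plasmate in CLI mode.
        "binary_path": env.get("PLASMATE_BINARY_PATH"),
    }


settings = load_settings({"AIRTABLE_API_KEY": "patEXAMPLE", "AIRTABLE_BASE_ID": "appEXAMPLE"})
print(settings["base_id"])  # appEXAMPLE
```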
Plasmate Configuration
from airtable_plasmate import PlasmateAirtable, PlasmateConfig

config = PlasmateConfig(
    binary_path="/usr/local/bin/plasmate",
    timeout=60,
    extra_args=["--user-agent", "MyBot/1.0"],
)

client = PlasmateAirtable(
    api_key="pat...",
    base_id="app...",
    plasmate_config=config,
)
Examples
See the examples/ directory:
- content_database.py - Build a content database from URLs
- More examples coming soon
API Reference
PlasmateAirtable
Main client class.
PlasmateAirtable(
    api_key: str,  # Airtable personal access token
    base_id: str,  # Base ID (starts with 'app')
    plasmate_config: PlasmateConfig | None = None,
)
Methods:
- fetch_som(url, headers) - Fetch URL and return SOM
- fetch_and_store(url, table_name, fields_map, headers, transform) - Fetch and create record
- batch_fetch_and_store(urls, table_name, fields_map, headers, on_error) - Batch process
- sync_from_url_field(table_name, url_field, status_field, fields_map, filter_formula) - Sync existing records
FieldMapping
Configure individual field mappings.
FieldMapping(
    som_path: str,  # Path in SOM (e.g., "metadata.title")
    airtable_field: str,  # Airtable field name
    field_type: AirtableFieldType = SINGLE_LINE_TEXT,
    transform: Callable | None = None,  # Transform function
    default: Any = None,  # Default value if not found
)
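To make the interplay of `som_path`, `transform`, and `default` concrete, here is a minimal standalone sketch of how a list of mappings could be turned into an Airtable `fields` payload. `MappingSketch`, `lookup`, and `build_fields` are hypothetical stand-ins, not the package's classes, and the order of operations (default on a missing path, transform only on found values) is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class MappingSketch:
    """Hypothetical stand-in mirroring the FieldMapping parameters listed above."""
    som_path: str
    airtable_field: str
    transform: Optional[Callable[[Any], Any]] = None
    default: Any = None


def lookup(som: dict, path: str) -> Any:
    """Follow a dotted path through nested dicts; return None when missing."""
    current: Any = som
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_fields(som: dict, mappings: list[MappingSketch]) -> dict:
    """Turn a SOM dict into an Airtable `fields` payload."""
    fields = {}
    for m in mappings:
        value = lookup(som, m.som_path)
        if value is None:
            value = m.default  # fall back when the path is absent
        elif m.transform is not None:
            value = m.transform(value)  # assumed: transform applies to found values only
        if value is not None:
            fields[m.airtable_field] = value
    return fields


som = {"metadata": {"title": "Example Article"}, "content": {"text": "x" * 6000}}
fields = build_fields(som, [
    MappingSketch("metadata.title", "Article Title"),
    MappingSketch("content.text", "Full Text", transform=lambda x: x[:5000]),
    MappingSketch("metadata.author", "Author", default="Unknown"),
])
print(fields["Article Title"], len(fields["Full Text"]), fields["Author"])
# Example Article 5000 Unknown
```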
Contributing
Contributions welcome! Please read the contributing guidelines in the main Plasmate repository.
License
MIT License - see LICENSE file for details.