ContentForge Testing Guide

May 24, 2026

Complete testing guide for the ContentForge enterprise content production plugin.


Table of Contents

  1. Test Environment Setup
  2. Installation Tests
  3. Pipeline Tests
  4. Command Tests
  5. Skill Tests
  6. Script Tests
  7. Config & Industry Knowledge Pack Tests
  8. Hook Tests
  9. MCP Connector Tests
  10. Google Integration Tests
  11. Multi-Backend Tests
  12. Edge Cases & Error Scenarios
  13. Regression Checklist
  14. Test Priority Order

1. Test Environment Setup

Prerequisites

  • Claude Cowork or Claude Code with plugin support
  • Google service account credentials (optional, for Google Sheets/Drive tests)
  • At least one brand profile set up (or plan to set up during testing)

Installation Sources

| Method | URL |
| --- | --- |
| Marketplace | https://github.com/indranilbanerjee/neels-plugins.git |
| Direct URL | https://github.com/indranilbanerjee/contentforge.git |

Pre-Test Cleanup

# Clear plugin cache (if reinstalling)
rm -rf ~/.claude/plugins/cache/

# Clear ContentForge brand data (for fresh brand setup test)
# WARNING: Only do this if you want to start fresh
rm -rf ~/.claude-marketing/

Test Brands to Use

| Brand Name | Industry | Purpose |
| --- | --- | --- |
| "TestBrand Alpha" | technology | Primary test brand |
| "HealthFirst Clinic" | healthcare | Regulated industry test |
| "GlobalFinance Corp" | bfsi | Financial compliance test |
| "QuickShop" | ecommerce | Simple B2C test |

2. Installation Tests

2.1 Marketplace Installation

Steps:

  1. In Claude Cowork, go to Settings > Plugins > Add Marketplace
  2. Enter URL: https://github.com/indranilbanerjee/neels-plugins.git
  3. Install contentforge

Expected Results:

  • Marketplace loads without errors
  • ContentForge listed with version 3.8.0
  • Description mentions "13 agents, 19 skills, 10 industry knowledge packs"
  • Installation completes without rollback
  • No "Host key verification failed" error (uses HTTPS, not SSH)

If installation fails:

  • Check ~/.claude/logs/main.log for VMCLIRunner errors
  • Look for virtiofs mount: Plan9 mount failed (VM instability — retry)
  • Look for EXDEV errors (known bug #25444)
  • Clear ~/.claude/plugins/cache/ and retry

2.2 Direct URL Installation

Steps:

  1. Settings > Plugins > Add Plugin
  2. Enter URL: https://github.com/indranilbanerjee/contentforge.git

Expected: Same results as marketplace installation

2.3 Session Start Verification

Test: Start a new session after installation

Expected Results:

  • SessionStart hook fires — setup.py runs without errors
  • Version banner displays:
    ✓ ContentForge v3.8 loaded
    
    Quick Start:
      1. /contentforge:style-guide — Set up brand profile (do this first, 5 min)
      2. /contentforge  — Create content (20-30 min per piece)
      3. /contentforge:help        — Full guide, examples, troubleshooting
    
    Already have a brand? Jump straight to /contentforge [topic]
    
    Commands: /batch-process | /content-refresh | /contentforge:social-adapt | /contentforge:publish
    Status:   /contentforge:integrations | /contentforge:analytics
    
  • 7 commands visible in Customize panel (create-content, content-brief, social-adapt, publish, translate, brand-setup, audit-content)
  • 19 skills visible in Skills section
  • 13 agents registered (check for no frontmatter errors in logs)

2.4 Plugin Structure Verification

Test: Verify all expected files are present after installation

Expected file counts:

  • agents/ — 13 files (01 through 11 + 03.5 + 06.5)
  • commands/ — 7 files
  • skills/ — 19 skill directories, each with SKILL.md
  • scripts/ — 8 files (setup.py, connector-status.py, sheets-tracker.py, drive-uploader.py, pipeline-tracker.py, airtable-tracker.py, local-tracker.py, backend-migrator.py)
  • config/ — 7 config files + industries/ subdirectory with 10 JSON packs
  • templates/ — 10 template files
  • utilities/ — 6 utility files
  • .mcp.json — 9 HTTP connectors
  • hooks/hooks.json — 2 hook events (SessionStart, PreToolUse)
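
The counts above can be checked mechanically. The following is a minimal sketch, assuming the installed plugin is reachable as an ordinary directory tree; `check_structure` and `EXPECTED_FILE_COUNTS` are hypothetical names, not part of ContentForge, and the `config/` top-level count is left out because the exact file list is not given here.

```python
from pathlib import Path

# Expected counts are taken from the list above; the checker itself is only a sketch.
EXPECTED_FILE_COUNTS = {
    "agents": 13,
    "commands": 7,
    "scripts": 8,
    "templates": 10,
    "utilities": 6,
}


def check_structure(plugin_root: Path) -> dict[str, tuple[int, int]]:
    """Return {directory: (found, expected)} for every count that does not match."""
    mismatches = {}
    for name, expected in EXPECTED_FILE_COUNTS.items():
        directory = plugin_root / name
        found = sum(1 for p in directory.iterdir() if p.is_file()) if directory.is_dir() else 0
        if found != expected:
            mismatches[name] = (found, expected)

    # skills/ holds 19 directories, each of which must contain SKILL.md.
    skills_dir = plugin_root / "skills"
    skills = [p for p in skills_dir.iterdir() if p.is_dir()] if skills_dir.is_dir() else []
    with_manifest = sum(1 for p in skills if (p / "SKILL.md").is_file())
    if with_manifest != 19:
        mismatches["skills"] = (with_manifest, 19)

    # config/industries/ holds the 10 JSON knowledge packs.
    packs = list((plugin_root / "config" / "industries").glob("*.json"))
    if len(packs) != 10:
        mismatches["config/industries"] = (len(packs), 10)
    return mismatches
```

An empty result means every counted directory matches; otherwise each entry shows the found and expected counts.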

3. Pipeline Tests

The 10-phase pipeline is the core product. Test with different content types and industries.

3.1 Full Pipeline — Blog Post (Technology)

Prompt: /contentforge Write a blog post about "How AI Agents Are Transforming Content Marketing in 2026" for a technology brand

Expected Results — Phase by Phase:

| Phase | What to Verify |
| --- | --- |
| Phase 1: Research | Finds 5+ live sources, competitor analysis, differentiation angle. Research brief output visible. |
| Phase 2: Fact Check | Verifies claims, checks URLs, flags unverified stats. 80%+ verification rate. Statistics Verification Report produced. |
| Phase 3: Content Draft | Loads technology knowledge pack (Step 0.3 SME Calibration). Uses correct terminology depth. Visual placeholders inserted. Draft Metadata includes SME Calibration Summary. |
| Phase 3.5: Visual Asset Annotator | Generates charts from Phase 2 data. Creates <!-- VISUAL: ... --> markers. Asset manifest written. Visual density 2-4 per 1000 words for blog. |
| Phase 4: Scientific Validator | Validates chart data accuracy. Domain-specific validation (Step 5) checks technology terminology. Zero hallucinations. |
| Phase 5: Structurer | Optimizes structure, fixes grammar. Preserves visual markers. |
| Phase 6: SEO/GEO | Keyword optimization. Produces typed `<!-- INTERNAL-LINK: type=topical |
| Phase 6.5: Humanizer | Removes AI telltale phrases. Adds natural voice. Preserves visual + link markers. |
| Phase 7: Reviewer | Scores all dimensions including Visual Asset Quality and Internal Linking — split into 6a Topical / 6b Commercial / 6c Conversion sub-scores, averaged for the dimension. No "no site structure = full credit (8)" free-pass; agent must produce useful markers (real URLs or placeholders) to earn credit. Overall score 7+. |
| Phase 8: Output | .docx generated. Charts embedded. TODO boxes for human-needed visuals. Internal links as clickable hyperlinks color-coded by category (topical blue, commercial green, conversion purple, authority slate); placeholders render as bold red [anchor] [LINK TBD: type]. Appendix D — Internal Link Map shows every link the SEO agent placed with target URLs, anchor text, section, and reason. Completion summary shows internal_links_total and internal_links_by_type. |

v3.4.0 Feature Checks:

  • SME Calibration Summary appears in Draft Metadata (Phase 3)
  • Technology knowledge pack loaded (verify terminology like "AI vs ML vs deep learning" distinction)
  • Domain-Specific Validation report section appears in Phase 4 output
  • Visual placeholders marked by Phase 3, annotated by Phase 3.5
  • Phase 4 validates chart data against Phase 2 verified statistics
  • Final output has charts embedded (if data available) or TODO boxes
  • Internal link markers present (or generic recommendations if no site structure)

3.2 Full Pipeline — Whitepaper (Pharma)

Prompt: /contentforge Write a whitepaper about "The Role of AI in Drug Discovery: From Target Identification to Clinical Trials" for a pharma company

Why this test matters: Pharma is a heavily regulated industry with strict evidence standards.

Specific Checks:

  • SME Calibration loads pharma knowledge pack
  • Expertise stance: "Clinical researcher or pharmaceutical scientist"
  • Uses FDA/EMA terminology correctly
  • Evidence hierarchy: Clinical trials ranked above observational studies
  • Required disclaimers present (investigational compound, not medical advice)
  • Domain validation catches prohibited claims ("cure", "miracle", "100% effective")
  • Visual density 3-5 per 1000 words (whitepaper target)
  • Quality Gate 4 includes regulatory compliance check
  • Word count 3000-6000 (whitepaper range)

3.3 Full Pipeline — Article (BFSI)

Prompt: /contentforge Write an article about "Open Banking APIs: How Financial Institutions Can Monetize Data Safely" for a banking audience

Specific Checks:

  • BFSI knowledge pack loaded
  • Regulatory awareness: GDPR, PCI DSS, SOX mentioned correctly
  • Financial terminology used precisely (APR vs interest rate, etc.)
  • Required disclaimers (not financial advice, regulatory specifics by jurisdiction)
  • Common pitfalls avoided (no "guaranteed returns", no "risk-free")
  • Evidence standards: regulatory filings and official reports cited

3.4 Full Pipeline — FAQ (Healthcare)

Prompt: /contentforge Write an FAQ about "Understanding Telehealth: Patient Questions Answered" for a healthcare provider

Specific Checks:

  • Healthcare knowledge pack loaded
  • HIPAA awareness in content
  • Medical terminology appropriate for patient audience (simpler language)
  • Disclaimers present (not a substitute for professional medical advice)
  • Visual density 0-1 per 1000 words (FAQ target — minimal visuals)
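
The visual density targets quoted in 3.1, 3.2 and 3.4 can be measured on a draft. A minimal sketch, assuming density means `<!-- VISUAL: ... -->` markers per 1000 words of body text; the function names and the word-counting rule are assumptions, not ContentForge's own implementation.

```python
import re

# Per-1000-word targets quoted in this guide (blog, whitepaper, FAQ).
DENSITY_TARGETS = {"blog": (2, 4), "whitepaper": (3, 5), "faq": (0, 1)}
VISUAL_MARKER = re.compile(r"<!--\s*VISUAL:")


def visual_density(text: str) -> float:
    """Visual markers per 1000 words; HTML comments are excluded from the word count."""
    markers = len(VISUAL_MARKER.findall(text))
    words = len(re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL).split())
    return markers * 1000 / words if words else 0.0


def density_in_range(text: str, content_type: str) -> bool:
    """Check the measured density against the target range for the content type."""
    low, high = DENSITY_TARGETS[content_type]
    return low <= visual_density(text) <= high
```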

3.5 Full Pipeline — Research Paper (Education)

Prompt: /contentforge Write a research paper about "The Impact of AI Tutoring Systems on Student Learning Outcomes" for education researchers

Specific Checks:

  • Education knowledge pack loaded
  • Academic structure: methodology, findings, discussion
  • Evidence standards: peer-reviewed educational research cited
  • FERPA awareness
  • Citation density meets research paper requirements

3.6 Pipeline with No Brand Profile

Prompt: /contentforge Write a blog post about "Remote Work Productivity Tips"

Run this without setting up a brand first.

Expected: The pipeline runs with generic defaults, does not crash, and offers to set up a brand profile.

3.7 Pipeline Feedback Loops

Test: Trigger a quality gate failure to see if the feedback loop works.

How: Request content on an intentionally complex or niche topic that is likely to fail validation. Observe whether the Phase 4 → Phase 3.5 feedback loop triggers (max 1 iteration per scoring-thresholds.json).

Expected:

  • Quality gate failure detected
  • Feedback sent to earlier phase
  • Re-run produces improved output
  • Max loop limit respected (5 total loops, 1 for phase 4→3.5)
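
The loop limits can be pictured as a simple guard. This is a sketch of the rule as stated above (5 loops in total, 1 for Phase 4 → 3.5); the data layout and the name `may_loop` are hypothetical, and the real limits live in scoring-thresholds.json.

```python
# Limits quoted above; the way they are keyed here is an assumption.
MAX_TOTAL_LOOPS = 5
PER_LOOP_LIMITS = {("4", "3.5"): 1}


def may_loop(history: list[tuple[str, str]], from_phase: str, to_phase: str) -> bool:
    """Return True if one more feedback loop from_phase -> to_phase is still allowed."""
    if len(history) >= MAX_TOTAL_LOOPS:
        return False
    limit = PER_LOOP_LIMITS.get((from_phase, to_phase))
    if limit is not None and history.count((from_phase, to_phase)) >= limit:
        return False
    return True
```

A tester can compare the iteration counts in the completion summary against this rule: a second Phase 4 → 3.5 iteration, or a sixth loop of any kind, indicates the limit was not respected.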

4. Command Tests

Test all 7 commands visible in the Customize panel.

4.1 /brand-setup

Prompt: "Set up brand profile for TestBrand Alpha — a B2B SaaS company that makes project management software"

Step-by-Step Verification:

| Step | Test | Expected |
| --- | --- | --- |
| A: Brand Identity | Provide brand name, URL, industry | Brand name stored, industry mapped to knowledge pack |
| B: Voice & Tone | Provide voice description or sample content | Voice profile created with tone dimensions |
| C: Terminology | Provide approved/banned terms | Terminology rules stored in brand profile |
| D: Compliance | Provide compliance rules | Guardrails configured |
| E: Reference Content | Provide sample URLs or documents | Reference content analyzed |
| F: Key File Generation | Let it auto-generate key files | brand-profile.json, guardrails.json, reference-content.md created |
| G: Tracking Backend | Choose Google Sheets, Airtable, or Local | Backend configured in brand profile tracking.backend field |

v3.4.0 Step F Checks:

  • Step F presented as an option (generate new or update existing)
  • Key files generated from website analysis + user input
  • Files saved to ~/.claude-marketing/{brand}/
  • Drive upload attempted (with graceful fallback if no credentials)
  • brand-profile.json has correct industry field matching knowledge pack filename

4.2 /create-content

Prompt: /create-content "5 B2B SaaS Pricing Strategies That Actually Work" blog

Expected: Triggers full 10-phase pipeline. Same verification as Pipeline Tests above.

4.3 /content-brief

Prompt: /content-brief "kubernetes security best practices"

Expected:

  • Keyword research with search volume data
  • Competitor analysis (top 5 ranking pages)
  • Search intent classification
  • Audience insights
  • Recommended outline
  • SEO strategy

4.4 /social-adapt

Requires: An existing piece of content (run /create-content first)

Prompt: /social-adapt [previous article] for linkedin and twitter

Expected:

  • LinkedIn version: professional tone, longer format, hashtags
  • Twitter/X version: punchy, within character limit, thread if needed
  • Each platform follows social-platform-specs.json rules
  • Brand voice maintained across platforms

4.5 /publish

Requires: Webflow or WordPress MCP connector configured

Prompt: /publish [previous content] --platform=webflow --status=draft

Expected:

  • Preview shown before publishing
  • HTML export fallback if MCP not connected
  • Draft status respected (not published live)

4.6 /translate

Prompt: /translate [previous content] --language=es --level=adapted

Expected:

  • Spanish translation with brand voice preservation
  • Citations maintained in original language with translations
  • SEO elements translated (meta title, description)
  • "Adapted" level: cultural nuances adjusted, not literal word-for-word

4.7 /audit-content

Prompt: /audit-content [provide a Drive folder URL or WordPress URL]

Expected:

  • Content library scanned
  • Freshness decay identified (outdated statistics, stale references)
  • Coverage gaps flagged
  • Optimization opportunities listed

5. Skill Tests

Test each of the 19 skills individually.

Core Pipeline Skills

| # | Skill | Test Prompt | Key Checks |
| --- | --- | --- | --- |
| 1 | /contentforge | "Write article about cloud migration" | Full 10-phase pipeline triggers |
| 2 | /batch-process | "Create 3 blog posts about: AI agents, no-code tools, API security" | Parallel processing, queue management, progress tracking |
| 3 | /content-refresh | "Update [old article] with current data" | Identifies outdated stats, refreshes sources, preserves structure |

Integration Skills

| # | Skill | Test Prompt | Key Checks |
| --- | --- | --- | --- |
| 4 | /contentforge:integrations | (no argument) | Shows 9 HTTP connectors, grouped by category, connected vs available |
| 5 | /contentforge:connect | /contentforge:connect notion | Step-by-step Notion setup instructions |
| 6 | /contentforge:add-integration | /contentforge:add-integration "I want to connect Airtable" | Custom connector setup guide, no crash |
| 7 | /contentforge:publish | /contentforge:publish [content] to webflow | CMS publishing with preview |

Content Enhancement Skills

| # | Skill | Test Prompt | Key Checks |
| --- | --- | --- | --- |
| 8 | /contentforge:social-adapt | "Adapt this article for LinkedIn and Instagram" | Platform-specific adaptations per social-platform-specs.json |
| 9 | /contentforge:translate | "Translate to French with cultural adaptation" | Preserves voice, citations, SEO |
| 10 | /contentforge:variants | "Generate A/B variants for this headline" | Scored variations with rationale |
| 11 | /contentforge:video-script | "Create a YouTube script from this article" | Timestamps, B-roll, hooks |

Planning & Analysis Skills

| # | Skill | Test Prompt | Key Checks |
| --- | --- | --- | --- |
| 12 | /contentforge:brief | "Create brief for 'DevOps automation trends'" | Keyword data, competitor analysis, outline |
| 13 | /contentforge:calendar | "Plan content calendar for Q2 2026" | Schedule, deadlines, team assignments |
| 14 | /contentforge:audit | "Audit our blog for content decay" | Freshness analysis, gap identification |
| 15 | /contentforge:analytics | "Show content quality trends" | Score trends, pipeline timing, insights |

Brand & Config Skills

| # | Skill | Test Prompt | Key Checks |
| --- | --- | --- | --- |
| 16 | /contentforge:style-guide | "Import style guide from [URL]" | Extracts voice, terminology, guardrails |
| 17 | /contentforge:template | "Create a case study template" | Custom content type beyond built-in 5 |
| 18 | /contentforge:switch-backend | /contentforge:switch-backend airtable | Validates target, offers migration, updates brand profile |
| 19 | /contentforge:help | (no argument) | Shows v3.8.0, 13 agents, 19 skills, 9 connectors, 10-phase pipeline |

/contentforge:help Argument Tests:

| Argument | Expected Output |
| --- | --- |
| --pipeline | 10-phase pipeline with timing and quality gates |
| --skills | All 19 skills listed with descriptions |
| --brand | Brand profile setup methods |
| --examples | Example workflows from brief to publish |
| --troubleshoot | Common issues and solutions |
| --connectors | Connector status (shortcut for /contentforge:integrations) |

6. Script Tests

6.1 setup.py

Trigger: Runs automatically on session start (SessionStart hook)

Expected output:

  • Plugin root path printed
  • Scripts directory path printed
  • .mcp.json validated (9 HTTP connectors)
  • Google credentials check (present or not)
  • pip package check (gspread, google-auth)

6.2 connector-status.py

Trigger: Via /contentforge:integrations skill

Expected:

  • Lists all 9 HTTP connectors with status (Notion, Canva, Figma, Webflow, Slack, Gmail, Google Calendar, fal-ai, Replicate)
  • Shows Google Sheets/Drive as "script" transport type
  • Reports which connectors are configured vs available
  • Platform-level integration notes for Google Drive/Docs

6.3 sheets-tracker.py (requires Google credentials)

Test each operation:

| Operation | Test | Expected |
| --- | --- | --- |
| init | Initialize tracking sheet | Creates sheet with correct 20-column headers |
| add-row | Add a content request | New row with auto-incremented requirement_id |
| get-pending | List pending content | Returns rows with pending status, sorted by priority |
| get-row | Get specific row | Returns correct row data |
| update-row | Update status | Row updated, no data loss |
| mark-complete | Complete a request | Status changed, completion date set |

Edge cases:

  • requirement_id uses max existing ID (not row count) — handles deleted rows correctly
  • Priority clamped to 1-5 range
  • Crash-safe sort in get_pending()
  • Auto-installs gspread + google-auth on first run in Cowork VM
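
The first three edge cases describe small pieces of logic that are easy to reason about in isolation. A minimal sketch of what they imply, not the code in sheets-tracker.py; the function names, the ascending sort order, and the choice to sort unparseable priorities last are assumptions.

```python
def next_requirement_id(existing_ids: list[int]) -> int:
    """Use max(existing) + 1 so a deleted row never causes an ID to be reused."""
    return max(existing_ids, default=0) + 1


def clamp_priority(value: int) -> int:
    """Clamp priority into the 1-5 range."""
    return max(1, min(5, value))


def sort_pending(rows: list[dict]) -> list[dict]:
    """Sort by priority without raising on missing or non-numeric values."""
    def key(row: dict) -> int:
        try:
            return clamp_priority(int(row.get("priority", 5)))
        except (TypeError, ValueError):
            return 5  # assumption: unparseable priorities sort last
    return sorted(rows, key=key)
```

A row-count-based ID would return 4 for IDs 1, 2 and 5 (two rows deleted) and could later collide with an existing ID; the max-based version returns 6.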

6.4 drive-uploader.py (requires Google credentials)

Test each operation:

| Operation | Test | Expected |
| --- | --- | --- |
| ensure-folders | Create folder hierarchy | Brand/Type/Year/Month/ structure created |
| upload | Upload a .docx file | File uploaded to correct folder |
| list | List files in folder | Returns file list with metadata |
| upload-assets | Upload chart images | Assets uploaded to assets subfolder |

Edge cases:

  • Brand names with apostrophes handled safely (client-side matching, no query injection)
  • ~ path resolution works (expanduser)
  • Folder creation is idempotent (doesn't create duplicates)
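
The apostrophe and `~` cases can be illustrated without touching the Drive API. A sketch under the assumption that folder listings arrive as dictionaries with a `name` field; `find_folder` and `resolve_local_path` are hypothetical helpers, not functions from drive-uploader.py.

```python
import os
from typing import Optional


def find_folder(folders: list[dict], name: str) -> Optional[dict]:
    """Match the folder name in Python instead of interpolating it into a query string."""
    for folder in folders:
        if folder.get("name") == name:
            return folder
    return None


def resolve_local_path(path: str) -> str:
    """Expand a leading ~ so paths like ~/.claude-marketing/... resolve correctly."""
    return os.path.abspath(os.path.expanduser(path))
```

Because the name is compared as a plain string, a brand such as "O'Reilly Media" needs no escaping and cannot alter a query. Returning the existing folder when one is found is also what makes folder creation idempotent.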

6.5 pipeline-tracker.py (no credentials needed)

Test each action:

| Action | Test | Expected |
| --- | --- | --- |
| init | Initialize pipeline run | Creates ~/.claude-marketing/{brand}/pipeline-run.json |
| phase-start | Start phase 1 | Timestamp recorded for phase 1 |
| phase-end | End phase 1 | Duration calculated, content words recorded |
| get-report | Generate timing report | JSON with per-phase timing, benchmarks, token estimates |

Edge cases:

  • Multiple runs per phase (feedback loops) — total time = sum of all run durations
  • Token estimation includes disclaimer about approximation
  • Benchmark comparison shows correct content type benchmarks
  • get-report without phase-end for current phase shows "in progress"
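
The timing rules for repeated runs can be sketched as follows. The record layout (`start` and `end` in epoch seconds) and the name `phase_summary` are assumptions for illustration; the actual schema of pipeline-run.json is not documented here.

```python
def phase_summary(runs: list[dict]) -> dict:
    """Summarise one phase; each run has 'start' and, once finished, 'end'."""
    finished = [r for r in runs if r.get("end") is not None]
    in_progress = any(r.get("end") is None for r in runs)
    return {
        "iterations": len(runs),
        # Total time is the sum over all finished runs, so feedback loops add up.
        "total_seconds": sum(r["end"] - r["start"] for r in finished),
        "status": "in progress" if in_progress else "complete",
    }
```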

6.6 airtable-tracker.py (requires AIRTABLE_TOKEN)

Test each action:

| Action | Test | Expected |
| --- | --- | --- |
| init | Initialize Airtable table | Table created with 20-column schema |
| add-row | Add content request | Record created with auto-incremented requirement_id |
| get-pending | List pending | Returns pending records sorted by priority |
| get-row | Get specific record | Returns correct record data |
| update-row | Update status | Record updated |
| mark-complete | Complete with file | Status updated, .docx attached to record |

Edge cases:

  • Auto-installs pyairtable on first run
  • File attachment works via --attach-file parameter
  • Numeric fields coerced correctly (priority, quality_score)
  • Missing AIRTABLE_TOKEN gives clear error message

6.7 local-tracker.py (no credentials needed)

Test each action:

| Action | Test | Expected |
| --- | --- | --- |
| init | Initialize local tracking | Creates ~/.claude-marketing/{brand}/tracking/tracking.json |
| add-row | Add content request | Record appended to tracking.json |
| get-pending | List pending | Returns pending records sorted by priority |
| get-row | Get specific record | Returns correct record |
| update-row | Update status | Record updated in tracking.json |
| mark-complete | Complete with file | Status updated, .docx copied to organized outputs/ dir |

Edge cases:

  • Zero dependencies — stdlib only
  • Output files organized in outputs/{year}/{month}/ structure
  • Works immediately with no setup
  • Handles concurrent access gracefully

6.8 backend-migrator.py

Test each action:

| Action | Test | Expected |
| --- | --- | --- |
| status | Check current backend status | Reports backend, record count, file count |
| migrate local→airtable | Migrate from local to Airtable | Records copied, files attached, source preserved |
| migrate local→google | Migrate from local to Google | Records added to Sheet, files uploaded to Drive |

Edge cases:

  • Source data is NEVER deleted (additive migration)
  • Idempotent — running twice doesn't create duplicates
  • Handles missing credentials gracefully
  • Reports migration progress and summary
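
"Additive" and "idempotent" together mean that a migration only ever appends records the target does not already have. A sketch using in-memory lists in place of the real backends; keying on `requirement_id` is an assumption, and `records_skipped` is a hypothetical counter added for illustration.

```python
def migrate(source: list[dict], target: list[dict]) -> dict:
    """Copy records missing from target; never mutate or delete source records."""
    seen = {record["requirement_id"] for record in target}
    migrated = 0
    for record in source:
        if record["requirement_id"] not in seen:
            target.append(dict(record))  # copy, so the source stays untouched
            seen.add(record["requirement_id"])
            migrated += 1
    return {"records_migrated": migrated, "records_skipped": len(source) - migrated}
```

Running it twice against the same target should report zero migrated records on the second pass, which is the behaviour to look for when testing the real script.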

7. Config & Industry Knowledge Pack Tests

7.1 Validate All 10 Industry Knowledge Packs

For each pack, create a short blog post and verify SME Calibration + Domain Validation:

| Industry | File | Key Terminology to Check |
| --- | --- | --- |
| technology | technology.json | AI vs ML distinction, latency vs throughput, open source definitions |
| pharma | pharma.json | Phase I/II/III trials, NDA/BLA, p-values, FDA/EMA protocols |
| bfsi | bfsi.json | APR vs interest rate, fiduciary duty, Basel III, KYC/AML |
| healthcare | healthcare.json | ICD codes, HIPAA, evidence-based vs experimental |
| real_estate | real_estate.json | Cap rate, NOI, fair housing, appraisal vs assessment |
| b2b_saas | b2b_saas.json | ARR vs MRR, churn rate, CAC:LTV, net revenue retention |
| legal | legal.json | Precedent, statute of limitations, jurisdiction, discovery |
| ecommerce | ecommerce.json | AOV, conversion rate, cart abandonment, fulfillment |
| consumer_goods | consumer_goods.json | FMCG, SKU rationalization, shelf life, brand equity |
| education | education.json | Pedagogy, assessment types, FERPA, accreditation |

For each pack verify:

  • SME Calibration Summary references the correct knowledge pack
  • Terminology depth matches the audience level
  • Regulatory awareness matches the industry
  • Evidence standards are applied correctly
  • Domain-specific validation catches industry-specific pitfalls

7.2 Scoring Thresholds

Verify config/scoring-thresholds.json:

  • phase_3_5_visual_assets quality gate present (require_data_chart_verification, min_visual_density, require_alt_text, require_captions)
  • phase_4_to_3_5 feedback loop limit = 1
  • Regulated industries (pharma, bfsi, healthcare, legal) use stricter thresholds
  • Higher minimum citation accuracy for regulated content
  • Max 5 total feedback loops
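
Part of this verification can be scripted. The sketch below checks only the two items whose key names are given above; since the nesting of scoring-thresholds.json is not specified here, it searches for the keys at any depth. `find_key` and `check_thresholds` are hypothetical helpers.

```python
REQUIRED_GATE_FIELDS = {
    "require_data_chart_verification",
    "min_visual_density",
    "require_alt_text",
    "require_captions",
}


def find_key(node, key):
    """Depth-first search for `key` anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def check_thresholds(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means both checks pass."""
    problems = []
    gate = find_key(config, "phase_3_5_visual_assets")
    if not isinstance(gate, dict):
        problems.append("phase_3_5_visual_assets gate missing")
    else:
        missing = REQUIRED_GATE_FIELDS - set(gate)
        if missing:
            problems.append(f"gate fields missing: {sorted(missing)}")
    if find_key(config, "phase_4_to_3_5") != 1:
        problems.append("phase_4_to_3_5 loop limit is not 1")
    return problems
```

To use it, load the file with `json.load` and pass the result to `check_thresholds`. The regulated-industry thresholds and the total loop limit still need a manual read, because their key names are not listed in this guide.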

7.3 Brand Registry Template

Verify config/brand-registry-template.json:

  • industry field accepts all 10 knowledge pack names
  • seo_preferences.internal_linking has sitemap_url, page_registry, pillar_pages
  • seo_preferences.brand_pages has product_or_service_pages, conversion_pages, authority_pages (v3.9.5+ — drives commercial / conversion / authority internal links)
  • google_integration section present (credentials_path, tracking_sheet_id, drive_output_folder_id)
  • output_preferences.brand_colors available for chart styling

7.4 Other Config Files

| Config File | Test |
| --- | --- |
| social-platform-specs.json | Verify character limits match current platform specs |
| multilingual-patterns.json | Verify language codes and brand voice patterns |
| content-type-defaults.json | Verify word count ranges per content type |

8. Hook Tests

8.1 SessionStart Hook

Test: Start a new session

Expected:

  • setup.py runs without errors
  • Version banner shows "v3.8"
  • All 6 skill shortcuts listed in banner
  • No Python errors or tracebacks

8.2 PreToolUse — Hallucination Detection (Write/Edit)

Test: Generate content and watch for the hallucination check during Write/Edit operations

Expected behavior:

  • Hook fires on Write/Edit of content deliverables
  • Hook SKIPs for non-content files (plugin config, scripts, etc.)

Test with intentionally bad content — the hook should catch all three:

| Bad Content | Expected Detection |
| --- | --- |
| "Studies show 87% of companies..." (no source) | CRITICAL — unattributed statistic |
| "Visit https://example.com/dashboard" | CRITICAL — placeholder URL |
| "The #1 leading solution in the market" | WARNING — unsubstantiated superlative |

Additional checks:

  • Fix suggestions provided for each flag
  • Severity levels correct (CRITICAL for stats/URLs in headlines, WARNING for body text)
  • Doesn't over-flag — legitimate cited statistics pass through
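
To make the three detection classes concrete, here is a rough regex-based sketch. It is not the hook's implementation: the patterns, the attribution heuristic, and the name `scan_sentence` are all assumptions chosen to reproduce the table above and the "legitimate cited statistics pass through" check.

```python
import re

STAT = re.compile(r"\b\d+(?:\.\d+)?%")
SOURCE_HINT = re.compile(r"\(source:|according to|https?://", re.IGNORECASE)
PLACEHOLDER_URL = re.compile(r"https?://(?:www\.)?example\.(?:com|org|net)\b", re.IGNORECASE)
SUPERLATIVE = re.compile(r"#1|\bleading\b|\bbest[- ]in[- ]class\b", re.IGNORECASE)


def scan_sentence(sentence: str) -> list[tuple[str, str]]:
    """Return (severity, reason) flags for one sentence."""
    flags = []
    if PLACEHOLDER_URL.search(sentence):
        flags.append(("CRITICAL", "placeholder URL"))
    # A placeholder URL must not count as attribution for a statistic.
    without_placeholder = PLACEHOLDER_URL.sub("", sentence)
    if STAT.search(sentence) and not SOURCE_HINT.search(without_placeholder):
        flags.append(("CRITICAL", "unattributed statistic"))
    if SUPERLATIVE.search(sentence):
        flags.append(("WARNING", "unsubstantiated superlative"))
    return flags
```

The three bad examples from the table each produce exactly one flag, while a sentence that names its source alongside the percentage produces none.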

9. MCP Connector Tests

9.1 ContentForge HTTP Connectors (7)

| # | Connector | URL | Test Action | Expected |
| --- | --- | --- | --- | --- |
| 1 | Notion | mcp.notion.com/mcp | Read a Notion page | Content retrieved via MCP |
| 2 | Canva | mcp.canva.com/mcp | Generate a design | Design created or template listed |
| 3 | Figma | mcp.figma.com/mcp | Access design file | Design data retrieved |
| 4 | Webflow | mcp.webflow.com/sse | Publish draft content | Content appears in Webflow CMS |
| 5 | Slack | mcp.slack.com/mcp | Send notification | Message delivered to channel |
| 6 | Gmail | gmailmcp.googleapis.com/mcp/v1 | Draft email | Email draft created |
| 7 | Google Calendar | calendarmcp.googleapis.com/mcp/v1 | Create content calendar event | Calendar event created |

Note: Each connector requires OAuth authorization on first use. The Claude platform handles this with an authorization prompt. Not all testers will have accounts for all services. This table covers 7 of the 9 HTTP connectors; fal-ai and Replicate (listed in 6.2) are not covered here.

9.2 Connector Categories

Verify connectors map to the right workflow categories per CONNECTORS.md:

| Category | Connector | Workflow Impact |
| --- | --- | --- |
| Knowledge base | Notion | Core requirement storage |
| Design | Canva, Figma | Featured images, social graphics |
| CMS | Webflow | Publishing destination (/contentforge:publish) |
| Chat | Slack | Batch status notifications |
| Email | Gmail | Draft delivery, review notifications |
| Calendar | Google Calendar | Content calendar events (/contentforge:calendar) |

9.3 Graceful Degradation

Test: Invoke a skill that uses a connector that's NOT authorized/connected

Expected:

  • Skill doesn't crash
  • Clear message about which connector is needed
  • Instructions on how to connect it (or suggestion to run /contentforge:connect <name>)
  • Fallback behavior (manual data input or skip)

9.4 Platform-Level Integrations

Test: Verify Google Drive/Docs work through Claude platform integration (Settings > Integrations)

Expected:

  • Google Drive documents accessible for brand knowledge
  • /contentforge:integrations notes that platform-level integrations exist separately
  • connector-status.py cannot detect platform integrations; this is expected, and its output should say so

10. Google Integration Tests

Prerequisites: Google service account with Sheets API + Drive API enabled

10.1 Initial Setup

  1. Create service account in Google Cloud Console
  2. Download JSON credentials
  3. Place at ~/.claude-marketing/google-credentials.json
  4. Create a Google Sheet and share with service account email
  5. Create a Google Drive folder and share with service account email
  6. Configure brand profile with tracking_sheet_id and drive_output_folder_id

10.2 Sheets Tracking (End-to-End)

Test: Run a full content pipeline and verify tracking

  • Content request added to sheet on pipeline start (add-row)
  • Status updates as pipeline progresses (update-row)
  • Completion marked when finished (mark-complete)
  • requirement_id auto-increments correctly
  • Multiple rows don't collide on IDs after deletions

10.3 Drive Delivery (End-to-End)

Test: Complete a content piece and verify Drive upload

  • Folder hierarchy created: Brand/Blog/2026/03/
  • .docx file uploaded to correct folder
  • Chart assets uploaded to assets subfolder
  • File metadata correct (name, MIME type)

10.4 Without Google Credentials

Test: Run the full pipeline without Google credentials configured

Expected:

  • Pipeline completes normally — no crash
  • Graceful message about Google integration not configured
  • Content saved locally to ~/.claude-marketing/{brand}/
  • No error loops or retries

11. Multi-Backend Tests

11.1 Backend Selection During Brand Setup

Test: Run /brand-setup or /contentforge:style-guide and reach Step G

Expected:

  • Three options presented: Google Sheets + Drive, Airtable, Local
  • Google and Airtable are primary options (presented first)
  • Local is available, but is selected only if explicitly chosen or if the step is skipped
  • Skipping defaults to local with a message about /contentforge:switch-backend
  • Choosing Google triggers service account setup guidance
  • Choosing Airtable triggers token creation guidance
  • Brand profile tracking.backend field is set correctly

11.2 Pipeline with Each Backend

Test: Run a full pipeline with each backend configured

| Backend | What to Verify |
| --- | --- |
| Local | tracking.json updated, .docx in organized outputs/ directory |
| Airtable | Airtable record created/updated, .docx attached as attachment |
| Google | Sheet row created/updated, .docx uploaded to Drive folder |

11.3 Backend Switching

Test: /contentforge:switch-backend airtable (while currently on local)

Expected:

  • Current backend identified and record count shown
  • Target backend validated (AIRTABLE_TOKEN check)
  • Migration offered (yes/no/skip)
  • If "yes": records migrated, files attached, summary shown
  • Brand profile updated to new backend
  • Source data preserved (never deleted)

11.4 Backend Migration

Test: Migrate from local to Airtable with existing tracking data

Expected:

  • All records copied to Airtable
  • Output files attached to Airtable records
  • Migration is idempotent (re-running doesn't create duplicates)
  • Summary shows records_migrated, files_migrated, files_failed counts
  • Source local data remains intact at ~/.claude-marketing/{brand}/tracking/

11.5 Pipeline Performance Tracking

Test: Run a full pipeline and check the completion summary

Expected:

  • PIPELINE PERFORMANCE section has actual times (not placeholder estimates)
  • Per-phase timing table with: Phase, Name, Time, Benchmark, Status, Iterations
  • Phases with feedback loops show correct iteration count
  • Total processing time matches sum of phase times
  • Benchmark comparison shows under/over with correct content type
  • TOKEN USAGE ESTIMATE section present with: Agent Instructions, Content, Config, Subtotal, Overhead, Total
  • Disclaimer: "Token estimates are approximate. For precise session costs, use /cost."

11.6 setup.py Backend Detection

Test: Start a session with different env vars set

Expected:

  • With AIRTABLE_TOKEN set: shows "AIRTABLE_TOKEN=configured"
  • Without AIRTABLE_TOKEN: shows "AIRTABLE_TOKEN=not_configured"
  • TRACKING_BACKENDS lists all available backends (always includes "local")
  • Google credentials check still works as before
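
The expected status lines can be modelled with a small function. The two `AIRTABLE_TOKEN=` strings come from the list above; the comma-separated `TRACKING_BACKENDS=` format, the backend labels, and the function name are assumptions rather than confirmed setup.py output.

```python
import os
from typing import Mapping


def backend_status(
    env: Mapping[str, str] = os.environ,
    google_credentials_present: bool = False,
) -> list[str]:
    """Build status lines; 'local' is always listed because it needs no setup."""
    has_airtable = bool(env.get("AIRTABLE_TOKEN"))
    backends = ["local"]
    if has_airtable:
        backends.append("airtable")
    if google_credentials_present:
        backends.append("google")
    return [
        f"AIRTABLE_TOKEN={'configured' if has_airtable else 'not_configured'}",
        f"TRACKING_BACKENDS={','.join(backends)}",  # hypothetical line format
    ]
```

Passing an explicit dictionary instead of `os.environ` lets both the configured and unconfigured cases be exercised without changing the shell environment.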

12. Edge Cases & Error Scenarios

12.1 Empty/Minimal Input

| Test | Expected |
| --- | --- |
| /contentforge (no topic) | Asks for topic, doesn't crash |
| /brand-setup (no name) | Asks for brand name |
| /contentforge:translate (no language) | Asks for target language |
| /contentforge:connect (no service name) | Shows available connectors |
| /contentforge:help (no argument) | Shows full help overview |

12.2 Very Long Content

| Test | Expected |
| --- | --- |
| Whitepaper 5000+ words | Pipeline completes, all phases handle length |
| Topic with 50+ research sources | Phase 2 handles gracefully, may take longer |
| Very long brand name (100+ chars) | Paths handled correctly, no truncation issues |

12.3 Unsupported Industry

| Test | Expected |
| --- | --- |
| Brand with industry "aerospace" | Falls back to general defaults (no knowledge pack), warns user |
| Brand with industry "" (empty) | Uses defaults, no crash |
| Brand with misspelled industry "tecnology" | Should suggest "technology" or fall back |
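
One way to get the "suggest or fall back" behaviour is fuzzy matching against the 10 pack names from section 7.1. This sketch uses the standard library's `difflib`; whether ContentForge does anything similar is not stated, and `resolve_industry` and the 0.8 cutoff are assumptions.

```python
import difflib
from typing import Optional

KNOWN_INDUSTRIES = [
    "technology", "pharma", "bfsi", "healthcare", "real_estate",
    "b2b_saas", "legal", "ecommerce", "consumer_goods", "education",
]


def resolve_industry(value: str) -> tuple[Optional[str], Optional[str]]:
    """Return (pack, suggestion); pack is None when general defaults apply."""
    cleaned = value.strip().lower()
    if cleaned in KNOWN_INDUSTRIES:
        return cleaned, None
    close = difflib.get_close_matches(cleaned, KNOWN_INDUSTRIES, n=1, cutoff=0.8)
    return None, (close[0] if close else None)
```

With these inputs, "tecnology" yields a suggestion of "technology", while "aerospace" and the empty string yield no pack and no suggestion, matching the three rows above.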

12.4 Special Characters in Brand Names

| Test | Expected |
| --- | --- |
| Apostrophe: "O'Reilly Media" | Drive uploader uses client-side matching (no query injection) |
| Spaces: "Test Brand Alpha" | Paths handled correctly |
| Unicode: "Café Express" | No encoding errors |
| Ampersand: "Johnson & Johnson" | No URL encoding issues in Drive/Sheets |

12.5 Network Failures

| Test | Expected |
| --- | --- |
| Run pipeline without internet | Research phase handles gracefully, may ask for manual input |
| MCP connector timeout | Skill shows error, doesn't crash pipeline |
| Google API quota exceeded | Script shows quota error, content saved locally |

12.6 Concurrent Operations

| Test | Expected |
| --- | --- |
| /batch-process with 5 pieces | Queue managed, no interleaving of outputs |
| Start pipeline while another is running | Handled gracefully (queued or error message) |

12.7 Pipeline Interruption

| Test | Expected |
| --- | --- |
| Cancel mid-pipeline (Ctrl+C or close session) | No corrupt data, can restart |
| Reconnect after interruption | Fresh start, no stale state |

13. Regression Checklist

Run this after any changes to verify nothing is broken.

Core Pipeline

  • Session start hook fires with correct version (v3.8)
  • Brand setup completes all steps A-G (including Step G backend selection)
  • Full pipeline runs for blog content type
  • Full pipeline runs for whitepaper content type
  • Phase 3 loads industry knowledge pack (SME Calibration)
  • Phase 3.5 generates visual annotations
  • Phase 4 runs domain-specific validation
  • Phase 6 produces internal link markers
  • Phase 8 embeds charts and link markers in output
  • Hallucination hook catches bad content
  • Hallucination hook skips non-content files

Skills & Commands

  • All 19 skills respond to invocation
  • All 7 commands appear in Customize panel
  • /contentforge:help shows complete, accurate information
  • /contentforge:integrations shows 9 HTTP connectors with correct status
  • Argument hints show in Skills UI when typing /contentforge: (spot check 3-5 skills)
  • /contentforge:publish cannot be triggered by Claude without explicit user invocation (disable-model-invocation)
  • /contentforge:help has name: cf-help in frontmatter
  • skills/contentforge/evals/evals.json exists and is valid JSON with 3 test cases
  • skills/cf-brief/evals/evals.json exists and is valid JSON with 2 test cases
  • skills/cf-style-guide/evals/evals.json exists and is valid JSON with 2 test cases

Scripts

  • setup.py runs on session start without errors
  • connector-status.py lists 9 HTTP + script connectors
  • sheets-tracker.py operations work (if Google credentials configured)
  • drive-uploader.py operations work (if Google credentials configured)
  • pipeline-tracker.py init/phase-start/phase-end/get-report all produce valid JSON
  • airtable-tracker.py operations work (if AIRTABLE_TOKEN configured)
  • local-tracker.py operations work (zero deps, no credentials)
  • backend-migrator.py status and migrate actions work

Versioning Consistency

  • plugin.json version = 3.8.0
  • hooks.json version string = v3.8
  • README.md version = 3.8.0
  • Marketplace entry version = 3.8.0
  • 13 agents in all descriptions (not 12)
  • 19 skills in all descriptions (not 18)
  • 7 commands in all descriptions
  • 9 HTTP connectors in all descriptions (not 7)
  • 8 scripts in all descriptions (not 4)
  • 10 industry knowledge packs mentioned
  • 10-phase pipeline everywhere (not 9-phase)
  • Brand setup mentions Step G (backend selection)
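
The version items in this checklist mix two formats ("3.8.0" and "v3.8"), so a comparison has to normalise them first. A minimal sketch, assuming the version strings have already been extracted from each file by hand or by another script; the function names are hypothetical.

```python
import re


def short_version(version: str) -> str:
    """Reduce '3.8.0' or 'v3.8' to 'major.minor' for comparison."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def inconsistent_versions(found: dict[str, str], expected: str = "3.8.0") -> dict[str, str]:
    """Return the sources whose major.minor differs from the expected release."""
    want = short_version(expected)
    return {source: v for source, v in found.items() if short_version(v) != want}
```

For example, passing `{"plugin.json": "3.8.0", "hooks.json": "v3.8", "README.md": "3.5.0"}` reports only README.md as out of step.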

14. Test Priority Order

If time is limited, test in this order:

| Priority | Test | Section | Why |
| --- | --- | --- | --- |
| 1 | Installation | 2 | Nothing else works without this |
| 2 | Full pipeline — blog/technology | 3.1 | Validates core product |
| 3 | Brand setup (all steps A-G) | 4.1 | Validates v3.8.0 Steps F + G |
| 4 | Pipeline — pharma whitepaper | 3.2 | Validates industry knowledge packs |
| 5 | All 19 skills invocation | 5 | Validates skill registration |
| 6 | /contentforge:help with all arguments | 5 (#19) | Validates help accuracy |
| 7 | Hook tests | 8 | Validates compliance guardrails |
| 8 | Google integration | 10 | Validates Sheets/Drive scripts |
| 9 | Multi-backend I/O | 11 | Validates Airtable/local/migration |
| 10 | Edge cases | 12 | Robustness testing |
| 11 | MCP connectors | 9 | Requires external service accounts |