Agent Workbook Challenge
May 31, 2026 ยท View on GitHub
Use this page when an agent, MCP client, or coding assistant is about to solve a spreadsheet task by clicking through Excel, LibreOffice, Google Sheets, or a screenshot grid.
Do the workbook edit through an API, not through a UI, and return proof that the calculated value changed and survived persistence.
If you need the shortest path for another agent, use the Agent Adoption Kit. It has the skill install command, MCP challenge, client config, workbook task, and expected proof in one place.
The Task
Paste this into an agent:
You need to solve a workbook-shaped task without opening Excel, LibreOffice,
Google Sheets, or a screenshot UI unless the user explicitly asks for a visual
review.
Use @bilig/headless or the Bilig WorkPaper MCP server. Build or load a workbook
with these sheets:
Inputs
- A1: Metric
- B1: Value
- A2: Customers
- B2: 20
- A3: Average revenue
- B3: 1200
Summary
- A1: Metric
- B1: Value
- A2: Revenue
- B2: =Inputs!B2*Inputs!B3
Then change Inputs!B2 from 20 to 32. Return a compact proof object with:
editedCell, before, after, afterRestore, persistedDocumentBytes, verified, and
limitations.
Do not claim success from the write call alone. Success requires computed
readback after the edit and restore proof from serialized WorkPaper JSON.
Expected outcome:
{
"editedCell": "Inputs!B2",
"before": 24000,
"after": 38400,
"afterRestore": 38400,
"verified": true
}
The exact byte count can change between package versions. The invariant is that the edited input changes the dependent formula result, and the restored document keeps the same result.
Fastest Path: Published Package
This uses the package-owned challenge command. It does not clone the repo, curl a TypeScript file, or require a spreadsheet UI:
npm exec --package @bilig/workpaper@latest -- bilig-agent-challenge --json
npm exec --package @bilig/workpaper@latest -- bilig-mcp-challenge --json
A passing run prints verified: true.
Use --markdown when you want a Markdown report for an issue, PR, or agent
eval transcript.
Use bilig-agent-challenge for the direct WorkPaper API loop. Use
bilig-mcp-challenge when the evaluator cares about the actual MCP path:
JSON-RPC initialize, tool/resource/prompt discovery, set_cell_contents,
dependent formula readback, WorkPaper JSON export, and restart readback from the
same persisted file.
MCP Path
Use this when the host supports MCP servers:
npm exec --package @bilig/workpaper@latest -- bilig-workpaper-mcp --workpaper ./pricing.workpaper.json --init-demo-workpaper --writable
Required tool sequence:
list_sheetsread_rangefor the input and summary rangesset_cell_contentsorset_cell_contents_and_readbackforInputs!B2get_cell_display_valuefor the dependent summary cellexport_workpaper_document
That sequence is the point of the challenge. It keeps the agent honest about what changed, what recalculated, and what can be saved.
Why This Beats Screenshot Automation
Screenshot automation can be useful for final human review, but it is a weak primary interface for agents:
- screenshots hide formula text and typed cell addresses;
- clicks can land on the wrong sheet, row, or browser state;
- cached XLSX formula values can look valid while being stale;
- a visual grid does not prove the workbook can be persisted and restored.
WorkPaper state gives the agent a smaller contract: read cells, write cells, recalculate formulas, export JSON, and report the proof object.
Pass/Fail Rubric
Pass:
- the answer names the exact edited cell;
- the answer includes the before and after calculated values;
- the after value is read from the dependent formula cell;
- the workbook document is serialized or exported;
- restore or reimport gives the same calculated value;
- limitations are named instead of hidden.
Fail:
- the answer only says that a cell was written;
- the agent relies on a screenshot as formula truth;
- the agent reports cached XLSX values as recalculated values;
- the answer omits persistence proof;
- unsupported formulas are silently skipped.
Shareable Prompt
Use this shorter version in an issue, discussion, or agent-tool eval:
Try the Bilig agent workbook challenge: update one input cell, read the
dependent formula result, serialize the WorkPaper JSON, restore it, and return
verified: true. Do it without spreadsheet UI automation unless visual review is
explicitly required.
Start here:
https://proompteng.github.io/bilig/agent-workbook-challenge.html
Where To Go Next
- For a broader agent playbook, use the Headless WorkPaper agent handbook.
- For MCP client setup, use the MCP client setup guide.
- For direct tool wrappers, use the WorkPaper tool-calling recipe.
- If the challenge almost works but a real workbook blocks adoption, use the formula bug clinic or submit a workbook fixture.