LLM Benchmark Costco 🛒

March 11, 2026

A curated, searchable database of 378 LLM evaluation benchmarks across 10 capability dimensions — with inline PDF reading, Mermaid build flowcharts, bilingual UI, dark mode, neon glow effects, and automated CI/CD.

🌐 Live Demo · 📊 Browse Benchmarks · 🤝 Contribute


Why LLM Benchmark Costco?

Feature	Costco	PapersWithCode	HuggingFace Datasets	arXiv Search
Curated LLM benchmarks only	✅	❌ (all ML)	❌ (all datasets)	❌
Inline PDF reading	✅	❌	❌	❌
Build process flowcharts	✅	❌	❌	❌
Multi-dim filtering (year/difficulty/openness)	✅	Partial	Partial	❌
Bilingual (EN/ZH)	✅	❌	❌	❌
Related benchmarks & family lineage	✅	❌	❌	❌
Dark mode with neon glow effects	✅	❌	❌	❌
Automated CI/CD deployment	✅	❌	❌	❌

Features

378 Benchmarks across 10 capability dimensions — Agent Capability (71), General Language (39), Multimodal (72), Code (40), Science & Reasoning (18), Safety & Alignment (24), Medical & Health (58), and more.
Neon Glow & Shimmer Effects — Interactive neon glow effect on card hover and a subtle shimmer animation on the logo in dark mode.
Inline PDF Reading — Click any card to open the details drawer and read the full paper without leaving the page. Most entries embed the original arXiv PDF directly.
Build Process Flowcharts — Over 200 benchmarks include Mermaid-rendered diagrams explaining how each dataset was constructed. Complex flowcharts can also be opened in fullscreen mode.
Powerful Filtering — Filter by L1 capability category, year (including the latest 2025 and 2026 entries), difficulty level (Basic → Frontier), and data openness (Public / Partly / In-house).
Family & Lineage — Explore benchmark families (e.g., MMLU, GAIA, SWE-bench) and related benchmarks to understand the evaluation landscape.
Bilingual UI — Full English and Chinese interface with bilingual data fields.
Automated CI/CD — GitHub Actions automatically validate and deploy updates to GitHub Pages when benchmarks.json is changed.
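The filtering and family features above reduce to an AND over optional per-dimension filters plus a group-by on the family field. The following is a minimal TypeScript sketch under assumed field names (`category`, `year`, `difficulty`, `openness`, `family`); the project's actual types in client/src/types/benchmark.ts and its logic in useBenchmarks.ts may differ.

```typescript
// Sketch only: field names are assumptions, not the project's confirmed schema.
type Openness = "Public" | "Partly" | "In-house";

interface Benchmark {
  name: string;
  category: string;   // L1 capability category, e.g. "Code"
  year: number;
  difficulty: string; // ranges from "Basic" to "Frontier"
  openness: Openness;
  family?: string;    // e.g. "MMLU", "GAIA", "SWE-bench"
}

// Every filter is optional; an unset filter matches all entries on that dimension.
interface Filters {
  category?: string;
  year?: number;
  difficulty?: string;
  openness?: Openness;
}

// Keep only entries that satisfy every active filter (logical AND across dimensions).
function filterBenchmarks(items: Benchmark[], f: Filters): Benchmark[] {
  return items.filter(
    (b) =>
      (f.category === undefined || b.category === f.category) &&
      (f.year === undefined || b.year === f.year) &&
      (f.difficulty === undefined || b.difficulty === f.difficulty) &&
      (f.openness === undefined || b.openness === f.openness)
  );
}

// Group entries by family so related benchmarks can be listed together.
function groupByFamily(items: Benchmark[]): Map<string, Benchmark[]> {
  const groups = new Map<string, Benchmark[]>();
  for (const b of items) {
    if (!b.family) continue; // entries without a family are left ungrouped
    const list = groups.get(b.family) ?? [];
    list.push(b);
    groups.set(b.family, list);
  }
  return groups;
}
```

Because each unset filter matches everything, adding another filter can only narrow the result, which keeps the combined filter bar predictable.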

Quick Start

# Install dependencies
pnpm install

# Local development
pnpm dev

# Build for GitHub Pages
pnpm build:ghpages

This project uses GitHub Actions for automated deployment. Any push to the main branch that includes changes to client/public/benchmarks.json will trigger a new build and deployment to the gh-pages branch.

A daily cron job also runs to sync any external changes to benchmarks.json.
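The trigger rule described above can be summarized as a small predicate. This TypeScript model is for illustration only: the `TriggerEvent` shape and the `triggersWorkflow` function are hypothetical, and the real conditions are declared in .github/workflows/deploy.yml and sync-and-deploy.yml and evaluated by GitHub Actions, not by application code.

```typescript
// Hypothetical model of when a deployment workflow starts.
type TriggerEvent =
  | { kind: "push"; branch: string; changedFiles: string[] }
  | { kind: "schedule" }; // the daily cron job

const DATA_FILE = "client/public/benchmarks.json";

function triggersWorkflow(e: TriggerEvent): boolean {
  // Scheduled runs start regardless of which files changed.
  if (e.kind === "schedule") return true;
  // Pushes start a build only on main and only when the data file changed.
  return e.branch === "main" && e.changedFiles.includes(DATA_FILE);
}
```

A push to main that touches only README.md therefore does not redeploy; the daily job still runs.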

Manual Deployment

If you need to deploy manually:

Fork or clone this repository
Go to Settings → Pages
Set Source to Deploy from a branch → gh-pages
Run pnpm build:ghpages && npx gh-pages -d dist-ghpages to deploy

Access at: https://<username>.github.io/llm-benchmark-costco/

Sub-path configuration: If you deploy under a sub-path (as GitHub Pages project sites are, at /<repo-name>/), set base: '/your-repo-name/' in vite.ghpages.config.ts.
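For the default repository name, the relevant option might look like the sketch below. Only `base` is shown; the actual vite.ghpages.config.ts presumably also configures plugins and build output, which are not reproduced here.

```typescript
// vite.ghpages.config.ts (sketch): only the sub-path option is shown.
import { defineConfig } from "vite";

export default defineConfig({
  // Must match the repository name exactly, with leading and trailing slashes
  // and no surrounding whitespace, so asset URLs resolve under /llm-benchmark-costco/.
  base: "/llm-benchmark-costco/",
});
```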

Updating Benchmark Data

The data lives in client/public/benchmarks.json. Before updating it, read CONTRIBUTING.md, which covers the complete workflow: the data schema, validation, and the CI process.
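As a rough illustration of the kind of checks a data validator performs, here is a TypeScript sketch. The field names (`id`, `name`, `category`, `year`, `openness`) and the rules are assumptions; the authoritative schema is documented in CONTRIBUTING.md, and the checks CI actually runs are in scripts/validate_benchmarks.py.

```typescript
// Illustrative checks only; fields and rules are hypothetical.
type RawEntry = Record<string, unknown>;

const REQUIRED_FIELDS = ["id", "name", "category", "year"] as const;
const OPENNESS = new Set(["Public", "Partly", "In-house"]);

// Returns human-readable problems; an empty list means the data passed.
function validateEntries(entries: RawEntry[]): string[] {
  const errors: string[] = [];
  const seenIds = new Set<string>();
  entries.forEach((entry, i) => {
    // Required fields must be present and non-empty.
    for (const field of REQUIRED_FIELDS) {
      if (entry[field] === undefined || entry[field] === "") {
        errors.push(`entry ${i}: missing "${field}"`);
      }
    }
    // IDs must be unique across the whole file.
    const id = entry["id"];
    if (typeof id === "string") {
      if (seenIds.has(id)) errors.push(`entry ${i}: duplicate id "${id}"`);
      seenIds.add(id);
    }
    // Openness, when given, must be one of the three documented values.
    const openness = entry["openness"];
    if (openness !== undefined && !OPENNESS.has(String(openness))) {
      errors.push(`entry ${i}: unknown openness "${String(openness)}"`);
    }
  });
  return errors;
}
```

In a script, the file could be checked with `validateEntries(JSON.parse(text))`, failing the run when the returned list is non-empty.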

Tech Stack

Layer	Technology
Frontend	React 19 + TypeScript
Styling	Tailwind CSS 4
Build	Vite 7
Routing	Wouter
CI/CD	GitHub Actions
Icons	Lucide React
Diagrams	Mermaid
Deployment	GitHub Pages

Project Structure

llm-benchmark-costco/
├── .github/workflows/              # GitHub Actions CI/CD
│   ├── ci.yml                      # PR validation
│   ├── deploy.yml                  # Deploy on data change
│   └── sync-and-deploy.yml         # Daily sync
├── client/
│   ├── public/
│   │   └── benchmarks.json          # 378 benchmark entries
│   └── src/
│       ├── components/
│       │   ├── BenchmarkCard.tsx     # Card component with neon glow
│       │   ├── BenchmarkDrawer.tsx   # Detail drawer + PDF + flowchart
│       │   ├── FilterBar.tsx         # Filter controls
│       │   └── Navbar.tsx            # Top navigation with logo shimmer
│       ├── contexts/
│       │   └── LangContext.tsx       # i18n (EN/ZH)
│       ├── hooks/
│       │   └── useBenchmarks.ts      # Data loading & filtering
│       └── types/
│           └── benchmark.ts          # TypeScript types
├── scripts/
│   └── validate_benchmarks.py      # Data validation script
├── vite.ghpages.config.ts            # GitHub Pages build config
└── README.md
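In the tree above, contexts/LangContext.tsx handles the EN/ZH switch. One common way to serve bilingual data fields is to store a Chinese variant next to each English field and fall back to English when it is missing. The sketch below assumes a hypothetical `<field>_zh` naming convention and a hypothetical `localized` helper; it is not the project's confirmed schema.

```typescript
// Sketch of resolving a bilingual field for the active UI language.
type Lang = "en" | "zh";
type Entry = Record<string, string | undefined>;

function localized(entry: Entry, field: string, lang: Lang): string {
  // Prefer the Chinese variant in Chinese mode, if one exists and is non-empty.
  if (lang === "zh") {
    const zh = entry[`${field}_zh`];
    if (zh) return zh;
  }
  // Otherwise fall back to the English value, or an empty string if absent.
  return entry[field] ?? "";
}
```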

Contributing

We welcome contributions! The easiest way to contribute is to submit a new benchmark via GitHub Issues using the Submit New Benchmark template — no coding required.

For code contributions, please read CONTRIBUTING.md.


License

MIT