LLM Benchmark Costco
March 11, 2026
A curated, searchable database of 378 LLM evaluation benchmarks across 10 capability dimensions, with inline PDF reading, Mermaid build flowcharts, bilingual UI, dark mode, neon glow effects, and automated CI/CD.
Live Demo · Browse Benchmarks · Contribute

Why LLM Benchmark Costco?
| Feature | Costco | PapersWithCode | HuggingFace Datasets | arXiv Search |
|---|---|---|---|---|
| Curated LLM benchmarks only | โ | โ (all ML) | โ (all datasets) | โ |
| Inline PDF reading | โ | โ | โ | โ |
| Build process flowcharts | โ | โ | โ | โ |
| Multi-dim filtering (year/difficulty/openness) | โ | Partial | Partial | โ |
| Bilingual (EN/ZH) | โ | โ | โ | โ |
| Related benchmarks & family lineage | โ | โ | โ | โ |
| Dark mode with neon glow effects | โ | โ | โ | โ |
| Automated CI/CD deployment | โ | โ | โ | โ |
Features
- 378 Benchmarks across 10 capability dimensions: Agent Capability (71), General Language (39), Multimodal (72), Code (40), Science & Reasoning (18), Safety & Alignment (24), Medical & Health (58), and more.
- Neon Glow & Shimmer Effects: Interactive neon glow effect on card hover and a subtle shimmer animation on the logo in dark mode.
- Inline PDF Reading: Click any card to open the details drawer and read the full paper without leaving the page. Most entries embed the original arXiv PDF directly.
- Build Process Flowcharts: Over 200 benchmarks include Mermaid-rendered diagrams explaining exactly how the dataset was constructed. Complex flowcharts can be opened in fullscreen mode.
- Powerful Filtering: Filter by L1 capability category, year (including the latest 2025/2026 entries), difficulty level (Basic → Frontier), and data openness (Public / Partly / In-house).
- Family & Lineage: Explore benchmark families (e.g., MMLU, GAIA, SWE-bench) and related benchmarks to understand the evaluation landscape.
- Bilingual UI: Full English and Chinese interface with bilingual data fields.
- Automated CI/CD: GitHub Actions automatically validates and deploys updates to GitHub Pages when benchmarks.json is changed.
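The multi-dimensional filtering described above can be sketched as a plain predicate over the loaded entries. This is a minimal illustration, not the project's actual hook: the `Benchmark` and `Filters` shapes, their field names, and the sample rows are assumptions made for the example (the real types live in client/src/types/benchmark.ts).

```typescript
// Hypothetical entry shape; field names are assumptions, not the real schema.
type Openness = "Public" | "Partly" | "In-house";

interface Benchmark {
  name: string;
  category: string; // L1 capability category, e.g. "Code"
  year: number;
  difficulty: string; // e.g. "Basic" or "Frontier"
  openness: Openness;
}

// Every filter is optional; an unset filter matches everything.
interface Filters {
  category?: string;
  year?: number;
  difficulty?: string;
  openness?: Openness;
}

// Keep an entry only if it matches every filter that is set.
function filterBenchmarks(items: Benchmark[], f: Filters): Benchmark[] {
  return items.filter(
    (b) =>
      (f.category === undefined || b.category === f.category) &&
      (f.year === undefined || b.year === f.year) &&
      (f.difficulty === undefined || b.difficulty === f.difficulty) &&
      (f.openness === undefined || b.openness === f.openness),
  );
}

// Made-up sample rows for illustration only.
const sample: Benchmark[] = [
  { name: "example-a", category: "Code", year: 2025, difficulty: "Frontier", openness: "Public" },
  { name: "example-b", category: "Code", year: 2024, difficulty: "Basic", openness: "In-house" },
  { name: "example-c", category: "Multimodal", year: 2025, difficulty: "Basic", openness: "Public" },
];

const hits = filterBenchmarks(sample, { category: "Code", year: 2025 });
console.log(hits.map((b) => b.name)); // prints [ 'example-a' ]
```

Treating an unset filter as "match all" keeps the function usable for any combination of the four controls without special cases.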
Quick Start
# Install dependencies
pnpm install
# Local development
pnpm dev
# Build for GitHub Pages
pnpm build:ghpages
Deployment
This project uses GitHub Actions for automated deployment. Any push to the main branch that includes changes to client/public/benchmarks.json will trigger a new build and deployment to the gh-pages branch.
A daily cron job also runs to sync any external changes to benchmarks.json.
Manual Deployment
If you need to deploy manually:
- Fork or clone this repository
- Go to Settings → Pages
- Set Source to Deploy from a branch → gh-pages
- Run pnpm build:ghpages && npx gh-pages -d dist-ghpages to deploy
Access at: https://<username>.github.io/llm-benchmark-costco/
Sub-path configuration: If deploying under a sub-path, set base: '/your-repo-name/' in vite.ghpages.config.ts.
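A sub-path `base` value needs both a leading and a trailing slash, and a stray space or missing slash is an easy mistake to make. The helper below is a hypothetical sketch (`ghPagesBase` is not part of this repository) that normalises a repository name into that form.

```typescript
// Hypothetical helper: turn a repository name into a sub-path base value.
// Strips surrounding whitespace and slashes, then wraps the name in single slashes.
function ghPagesBase(repoName: string): string {
  const trimmed = repoName.trim().replace(/^\/+|\/+$/g, "");
  return trimmed === "" ? "/" : `/${trimmed}/`;
}

console.log(ghPagesBase("llm-benchmark-costco")); // prints /llm-benchmark-costco/
console.log(ghPagesBase("/your-repo-name/ ")); // prints /your-repo-name/

// In vite.ghpages.config.ts the result would be passed as Vite's `base` option,
// for example: defineConfig({ base: ghPagesBase("your-repo-name") })
```

An empty name falls back to "/", which corresponds to serving from the domain root rather than a sub-path.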
Updating Benchmark Data
The data lives in client/public/benchmarks.json. Before updating, read CONTRIBUTING.md for the complete workflow covering data schema, validation, and CI process.
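As a rough idea of what a pre-submission check might look like, the sketch below reports missing fields and duplicate names. It is an illustration only: the repository's real validator is scripts/validate_benchmarks.py, the authoritative schema is in CONTRIBUTING.md, and both the `REQUIRED_FIELDS` list and the assumption that the file holds a top-level array are hypothetical.

```typescript
// Hypothetical required fields; consult CONTRIBUTING.md for the real schema.
const REQUIRED_FIELDS = ["name", "category", "year"] as const;

// Returns a list of human-readable problems; an empty list means the data passed.
function validateBenchmarks(data: unknown): string[] {
  // Assumption: benchmarks.json is a top-level array of entry objects.
  if (!Array.isArray(data)) return ["top-level value must be an array"];
  const errors: string[] = [];
  const seen = new Set<string>();
  data.forEach((entry, i) => {
    if (typeof entry !== "object" || entry === null) {
      errors.push(`entry ${i}: not an object`);
      return;
    }
    const record = entry as Record<string, unknown>;
    // Flag required fields that are absent or empty.
    for (const field of REQUIRED_FIELDS) {
      if (record[field] === undefined || record[field] === "") {
        errors.push(`entry ${i}: missing "${field}"`);
      }
    }
    // Flag names that were already used by an earlier entry.
    const entryName = record["name"];
    if (typeof entryName === "string" && entryName !== "") {
      if (seen.has(entryName)) errors.push(`entry ${i}: duplicate name "${entryName}"`);
      seen.add(entryName);
    }
  });
  return errors;
}

// Made-up input: the second entry lacks "year" and reuses a name.
const errors = validateBenchmarks(
  JSON.parse(
    '[{"name":"example-a","category":"Code","year":2025},{"name":"example-a","category":"Code"}]',
  ),
);
console.log(errors);
```

Collecting all problems instead of stopping at the first one lets a contributor fix an entry in a single pass.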
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 19 + TypeScript |
| Styling | Tailwind CSS 4 |
| Build | Vite 7 |
| Routing | Wouter |
| CI/CD | GitHub Actions |
| Icons | Lucide React |
| Diagrams | Mermaid |
| Deployment | GitHub Pages |
Project Structure
llm-benchmark-costco/
├── .github/workflows/              # GitHub Actions CI/CD
│   ├── ci.yml                      # PR validation
│   ├── deploy.yml                  # Deploy on data change
│   └── sync-and-deploy.yml         # Daily sync
├── client/
│   ├── public/
│   │   └── benchmarks.json         # 378 benchmark entries
│   └── src/
│       ├── components/
│       │   ├── BenchmarkCard.tsx   # Card component with neon glow
│       │   ├── BenchmarkDrawer.tsx # Detail drawer + PDF + flowchart
│       │   ├── FilterBar.tsx       # Filter controls
│       │   └── Navbar.tsx          # Top navigation with logo shimmer
│       ├── contexts/
│       │   └── LangContext.tsx     # i18n (EN/ZH)
│       ├── hooks/
│       │   └── useBenchmarks.ts    # Data loading & filtering
│       └── types/
│           └── benchmark.ts        # TypeScript types
├── scripts/
│   └── validate_benchmarks.py      # Data validation script
├── vite.ghpages.config.ts          # GitHub Pages build config
└── README.md
Contributing
We welcome contributions! The easiest way to contribute is to submit a new benchmark via GitHub Issues using the Submit New Benchmark template; no coding is required.
For code contributions, please read CONTRIBUTING.md.
License
MIT