ΩmegaWiki

May 11, 2026

ΩmegaWiki Logo

Karpathy's LLM-Wiki Vision, Fully Realized

Your AI Research Platform — From Papers to Publications, Powered by Claude Code

From paper ingestion to publication — your research knowledge compounds, never decays.

License: MIT · Python 3.9+ · Skills · Claude Code · Bilingual

English | 中文


Team

ΩmegaWiki is built by DAIR Lab at Peking University — a fully agentic platform that automates the complete research pipeline, from knowledge ingestion to paper submission.

  • Weitong Qian (PKU, Undergraduate · 2023)
  • Beicheng Xu (PKU, Ph.D. · 2023)
  • Zhongao Xie (PKU, Undergraduate · 2025)
  • Bowen Fan (PKU, Undergraduate · 2024)
  • Guozheng Tang (PKU, Undergraduate · 2024)
  • Xinzhe Wu (PKU, Undergraduate · 2024)
  • Jiale Chen (PKU, Undergraduate · 2024)
  • Mingtian Yang (PKU, Undergraduate · 2024)

🆕 What's New

📰 2026-05-09 · Daily arXiv — fresh-paper recommendations, on demand or scheduled

Run /daily-arxiv for a one-off pass, or /daily-arxiv setup to schedule the same pipeline in GitHub Actions. The skill builds an evidence packet from arXiv + Semantic Scholar + DeepXiv, lets the LLM rank candidates against your wiki interests, and delivers a digest by e-mail. Passing --mode auto-ingest explicitly calls /ingest on high-confidence picks; inform mode only sends the digest.

🌐 2026-05-06 · Knowledge Graph Visualization — browser + Obsidian

Your research graph now has two ways to explore:

  • Web UI — run python3 tools/serve.py, open http://localhost:8765/#/graph. Click any node to highlight its neighborhood via BFS, filter by entity type or edge category, double-click to open the full page in the Reader.
  • Obsidian — run /visualize --obsidian to generate a color-coded graph config, or /visualize --canvas to produce a force-layout Canvas with labeled semantic edges.
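Under the hood, the neighborhood highlight reduces to a plain breadth-first search over the edge list. A minimal sketch, assuming (src, dst, type) tuples that mirror the rows of graph/edges.jsonl; the function name and tuple layout are ours for illustration, not the actual tools/serve.py API:

```python
from collections import deque

def neighborhood(edges, start, depth=2):
    """Nodes reachable from `start` within `depth` hops.

    Treats the edge list as undirected, since a highlight should show
    neighbors regardless of edge direction.
    """
    adj = {}
    for src, dst, _ in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)

    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:          # don't expand past the hop limit
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen

edges = [
    ("paper-a", "concept-x", "uses_concept"),
    ("paper-b", "concept-x", "introduces_concept"),
    ("paper-b", "paper-c", "builds_on"),
]
print(sorted(neighborhood(edges, "paper-a", depth=2)))
# → ['concept-x', 'paper-a', 'paper-b']
```

Filtering by entity type or edge category is then just a matter of dropping tuples from `edges` before the search.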

🔬 2026-05-06 · Methods — Reusable Techniques are Now First-Class

Architectures, training recipes, evaluation protocols, and other reusable techniques now live in wiki/methods/ as proper wiki entities — with their own pages, source-paper links, and parent/child method chains.


What is ΩmegaWiki?

Andrej Karpathy proposed LLM-Wiki: an LLM that builds and maintains a persistent, structured wiki from your sources — not a throwaway RAG answer, but compounding knowledge that grows smarter with every paper you feed it.

ΩmegaWiki takes that idea the full distance. It's not just a wiki builder — it's a complete research-lifecycle platform: paper ingestion → knowledge graph → gap detection → idea generation → experiment design → paper writing → peer-review response. All driven by 24 Claude Code skills, all centered on one wiki as the single source of truth.

Drop your .tex / .pdf files in a folder. Run one command. Get a fully cross-referenced knowledge base — and then use it to generate novel research ideas, design experiments, write papers, and respond to reviewers.

Why Wiki-Centric, Not RAG?

|  | RAG | ΩmegaWiki |
| --- | --- | --- |
| Knowledge persistence | Rediscovered on every query | Compiled once, maintained forever |
| Structure | Flat chunk store | 9 typed entities with relationships |
| Cross-references | None — chunks are isolated | Bidirectional wikilinks + typed graph |
| Knowledge gaps | Invisible | Explicitly tracked, drive research |
| Failed experiments | Lost | First-class anti-repetition memory |
| Output | Chat answers | Papers, surveys, experiment plans, rebuttals |
| Compounding | No — same cost every query | Yes — each paper enriches the whole graph |

Architecture

ΩmegaWiki Architecture

Every skill reads from and writes back to the wiki. Knowledge compounds — each new paper enriches the whole graph. Failed experiments aren't discarded; they become anti-repetition memory that prevents re-exploring dead ends.

Quick Start

Prerequisites: Python 3.9+, Node.js 18+

# 1. Clone
git clone https://github.com/skyllwt/OmegaWiki.git
cd OmegaWiki

# 2. Install Claude Code
npm install -g @anthropic-ai/claude-code
claude login

# 3. One-click setup
chmod +x setup.sh && ./setup.sh        # Linux / macOS
# Windows (PowerShell):
#   powershell -ExecutionPolicy Bypass -File .\setup.ps1
# setup creates .venv for OmegaWiki
# the script does not keep your shell activated, but /init will use .venv automatically

# 4. Put your own papers in raw/papers/ (.tex or .pdf)
#    Optional: add intent notes to raw/notes/ and saved pages to raw/web/
#    /init and direct local /ingest will manage generated inputs under raw/discovered/ and raw/tmp/

# 5. Build your wiki
claude
# Then type: /init [your-research-topic]
Manual setup (Linux / macOS)
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env                 # Edit to add API keys
cp config/settings.local.json.example .claude/settings.local.json
Manual setup (Windows / PowerShell)
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
Copy-Item .env.example .env          # Edit to add API keys
Copy-Item config\settings.local.json.example .claude\settings.local.json

Note: native Windows is supported for the local pipeline. Remote-GPU experiments via /exp-run --env remote rely on ssh/rsync/screen and are best run from WSL2 or Linux/macOS.

API Keys

| Key | Required? | How to get | What it enables |
| --- | --- | --- | --- |
| ANTHROPIC_API_KEY | Yes | claude login (automatic) | Powers all Claude Code skills |
| CLAUDE_CODE_OAUTH_TOKEN | Optional | claude setup-token | GitHub Actions Claude Code auth for Pro/Max users |
| SEMANTIC_SCHOLAR_API_KEY | Optional | semanticscholar.org/product/api (free) | Citation graph, paper search |
| DEEPXIV_TOKEN | Optional | setup.sh auto-registers | Semantic search, TLDR, trending |
| LLM_API_KEY + LLM_BASE_URL + LLM_MODEL | Optional | Any OpenAI-compatible API | Cross-model review; /daily-arxiv inform recommendations |

Cross-model review: ΩmegaWiki uses a second LLM as an independent reviewer for ideas, experiments, and paper drafts. Works with any OpenAI-compatible API — DeepSeek, OpenAI, Qwen, OpenRouter, SiliconFlow, etc. If not configured, skills still work in Claude-only mode.
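Any OpenAI-compatible provider works because the review traffic is ordinary Chat Completions requests against LLM_BASE_URL. A minimal sketch of such a call using the same LLM_* variables; the prompt wording and helper names are illustrative, not the skills' actual internals:

```python
import json
import os
import urllib.request

def build_review_request(artifact_text):
    """Assemble a Chat Completions payload for the configured review model."""
    return {
        "model": os.environ.get("LLM_MODEL", "<your-model>"),
        "messages": [
            {"role": "system",
             "content": "You are an adversarial reviewer for research artifacts."},
            {"role": "user", "content": artifact_text},
        ],
    }

def send_review(payload):
    """POST the payload to the provider's /chat/completions endpoint."""
    req = urllib.request.Request(
        os.environ["LLM_BASE_URL"].rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + os.environ["LLM_API_KEY"],
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Swapping providers means changing only the three environment variables, which is why DeepSeek, Qwen, OpenRouter, and the rest are interchangeable here.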

Daily arXiv Recommendations

/daily-arxiv runs a one-off fresh-paper recommendation pass even before automation is configured. To schedule the same pipeline in GitHub Actions, copy config/daily-arxiv.yml.example to config/daily-arxiv.yml, then run /daily-arxiv setup. The config stores non-secret preferences such as mode, categories, caps, and schedule; SMTP/API credentials stay in .env or GitHub Actions secrets. In CI inform mode, recommendations can use Claude Code auth (ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN) or the OpenAI-compatible LLM_* review model; auto-ingest still requires Claude Code.
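For orientation, a hypothetical sketch of what config/daily-arxiv.yml might hold; every key name below is illustrative, so copy the real ones from config/daily-arxiv.yml.example:

```yaml
# Illustrative only — see config/daily-arxiv.yml.example for the real keys.
mode: inform               # inform | auto-ingest
categories: [cs.CL, cs.LG]
max_recommendations: 10    # cap on digest size
schedule: "17 0 * * *"     # cron, UTC (08:17 Beijing)
```

Secrets (SMTP credentials, API keys) never belong in this file; they stay in .env locally or in GitHub Actions secrets.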

See docs/daily-arxiv-deployment.md for the GitHub Actions setup checklist and symptom-keyed troubleshooting.

Sample digest
Sample /daily-arxiv digest

A real /daily-arxiv run: ranked recommendations with scores, rationales, wiki connections, and an auto-ingest section.

Skills

24 slash commands spanning the full research lifecycle:

Phase 0: Setup

| Command | What it does |
| --- | --- |
| /setup | First-time configuration (API keys, language, dependencies) |
| /reset <scope> | Destructive cleanup: wiki \| raw \| log \| checkpoints \| all |

Phase 1: Knowledge Foundation

| Command | What it does |
| --- | --- |
| /prefill <domain> | Optionally seed foundations/ with background knowledge |
| /init [topic] | Bootstrap a full wiki from user raw sources plus optional discovery |
| /ingest <source> | Parse a paper → wiki pages + cross-references |
| /discover | Recommend ranked next-read papers from anchors, a topic, or the current wiki |
| /edit <request> | Add/remove sources or update wiki content |
| /ask <question> | Query the wiki, crystallize answers back |
| /check | Health scan: broken links, missing cross-refs, consistency |

Phase 2: Research Pipeline

| Command | What it does |
| --- | --- |
| /daily-arxiv | Run/manage a daily arXiv recommendation feed (+ optional GitHub Actions scheduler) |
| /ideate | Multi-phase idea generation from cross-topic connections |
| /novelty <idea> | Multi-source novelty verification (web + S2 + wiki + review LLM) |
| /review <artifact> | Cross-model adversarial review for any research artifact |
| /exp-design <idea> | Idea-driven experiment + ablation design |
| /exp-run <experiment> | Implement + deploy + monitor (local or remote GPU) |
| /exp-status | Dashboard for running experiments; auto-collect results |
| /exp-eval <experiment> | Verdict gate → auto-update the linked idea + graph |
| /refine <artifact> | Multi-round: produce → review → fix → re-review |

Phase 3: Writing & Submission

| Command | What it does |
| --- | --- |
| /survey | Generate Related Work from wiki knowledge |
| /paper-plan <ideas> | Outline from validated-idea graph + evidence matrix |
| /paper-draft <plan> | Draft LaTeX + figures, section by section |
| /paper-compile <dir> | Compile → PDF, auto-fix, verify page/anonymity |
| /research <direction> | End-to-end orchestrator with human gates |
| /rebuttal <reviews> | Parse reviewer comments → draft point-by-point responses |

Wiki Structure

9 Entity Types

| Type | Directory | Purpose |
| --- | --- | --- |
| Paper | papers/ | Structured summary: problem / key idea / method / experiments + results / limitations, plus tldr / contribution_type / datasets |
| Concept | concepts/ | Cross-paper technical concept with variants, comparisons, definition, linked ideas |
| Topic | topics/ | Research direction map with SOTA tracker, key benchmarks, and open problems (split into known + methodological gaps) |
| Person | people/ | Researcher profile with research areas, recent work, and a researcher/team/organization type |
| Idea | ideas/ | Research idea with lifecycle, novelty argument & score, target venue |
| Experiment | experiments/ | Full record: hypothesis → setup → results → updates to the linked idea |
| Method | methods/ | Reusable, citable technique entity (cross-paper); links to source papers and parent/child methods |
| Summary | Summary/ | Domain-wide survey across topics |
| Foundation | foundations/ | Background knowledge (terminal: receives inward links, writes none) |

Knowledge Graph

Semantic relationships are stored in graph/edges.jsonl; bibliographic paper citations are stored separately in graph/citations.jsonl.

Paper-paper semantic edges include same_problem_as, similar_method_to, complementary_to, builds_on, compares_against, improves_on, challenges, and surveys. Paper-concept edges use introduces_concept, uses_concept, extends_concept, and critiques_concept. Workflow edges (supports, contradicts, tested_by, invalidates, addresses_gap, inspired_by, derived_from) span experiments, ideas, methods, and concepts.
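Because edges.jsonl is one JSON object per line, filtering the graph by edge category is a few lines of streaming code. A minimal sketch; the field names (src, dst, type) are an assumption for illustration, so check graph/edges.jsonl for the real schema:

```python
import json
import os
import tempfile

def load_edges(path, types=None):
    """Stream a .jsonl edge file, optionally keeping only some edge types."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                edge = json.loads(line)
                if types is None or edge.get("type") in types:
                    yield edge

# Demo on a throwaway file shaped like graph/edges.jsonl
sample = [
    {"src": "paper-a", "dst": "paper-b", "type": "builds_on"},
    {"src": "paper-a", "dst": "concept-x", "type": "uses_concept"},
]
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "edges.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(json.dumps(e) + "\n" for e in sample)
    builds = list(load_edges(path, types={"builds_on"}))
print(builds)
# → [{'src': 'paper-a', 'dst': 'paper-b', 'type': 'builds_on'}]
```

The same pattern applies to graph/citations.jsonl, which keeps bibliographic citations separate from semantic edges.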

All pages use Obsidian [[wikilink]] format — open wiki/ in Obsidian for visual graph exploration.
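Extracting link targets from a page is a small regex exercise. A sketch that handles the common Obsidian variants (optional #heading anchors and |display aliases); this is our illustration, not the project's parser:

```python
import re

# [[target]], [[target#heading]], [[target|alias]] — capture only the target
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def wikilinks(text):
    """Return the link targets of all [[wikilinks]] in `text`."""
    return [m.group(1).strip() for m in WIKILINK.finditer(text)]

print(wikilinks("See [[concepts/kv-cache]] and [[papers/attention|Attention]]."))
# → ['concepts/kv-cache', 'papers/attention']
```

Running this over every page and emitting (page, target) pairs is enough to rebuild the bidirectional link graph that Obsidian visualizes.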

Automation

GitHub Actions runs the /daily-arxiv recommendation pipeline daily at 00:17 UTC (08:17 Beijing time):

  1. Add SMTP secrets to repo Settings → Secrets when e-mail delivery is enabled: SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_PASSWORD, SMTP_FROM, DAILY_ARXIV_EMAIL_TO
  2. Optional inform-mode LLM recommendation: add ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN for Claude Code, or LLM_API_KEY, LLM_BASE_URL, and LLM_MODEL for any OpenAI-compatible provider
  3. .github/workflows/daily-arxiv.yml fetches arXiv, deduplicates against the wiki, builds a recommendation context, uploads artifacts, and sends the digest by SMTP

Auto-ingest mode is explicit and requires Claude Code in CI, because plain API LLMs cannot invoke slash skills such as /ingest. Use manual dispatch with send_email=false for a dry run without SMTP secrets.

Project Structure

OmegaWiki/
├── CLAUDE.md                    # Runtime schema & rules
├── wiki/                        # Knowledge base (LLM-maintained)
│   ├── papers/                  #   Structured paper summaries
│   ├── concepts/                #   Cross-paper technical concepts
│   ├── topics/                  #   Research direction maps
│   ├── people/                  #   Researcher profiles
│   ├── ideas/                   #   Research ideas (with lifecycle)
│   ├── experiments/             #   Experiment records
│   ├── methods/                 #   Reusable cross-paper method entities
│   ├── Summary/                 #   Domain-wide surveys
│   ├── foundations/             #   Background knowledge (terminal pages)
│   ├── outputs/                 #   Generated artifacts
│   ├── graph/                   #   Auto-generated: edges, context, gaps
│   ├── index.md                 #   Content catalog
│   └── log.md                   #   Chronological log
├── raw/                         # Source materials
│   ├── papers/                  #   User-owned .tex / .pdf files
│   ├── discovered/              #   External papers fetched by /init and explicit /daily-arxiv auto-ingest
│   ├── tmp/                     #   Generated local sidecars for /init and direct local /ingest
│   ├── notes/                   #   User-owned .md notes
│   └── web/                     #   User-owned HTML / Markdown
├── tools/                       # Deterministic Python helpers
│   ├── research_wiki.py         #   Wiki engine (20 CLI commands)
│   ├── init_discovery.py        #   /init prepare + plan + fetch helper
│   ├── discover.py              #   /discover candidate gathering, dedup, ranking
│   ├── lint.py                  #   Structural validation (10 checks)
│   ├── reset_wiki.py            #   Scoped destructive cleanup helper
│   ├── fetch_arxiv.py           #   arXiv RSS fetcher
│   ├── fetch_s2.py              #   Semantic Scholar API
│   ├── fetch_deepxiv.py         #   DeepXiv semantic search
│   ├── fetch_wikipedia.py       #   Wikipedia fetcher (used by /prefill)
│   └── remote.py                #   SSH ops for remote experiments
├── .claude/skills/              # 24 Claude Code skill definitions
├── i18n/                        # Bilingual: en/ (canonical) + zh/
├── config/                      # Configuration templates
├── mcp-servers/                 # Cross-model review server
└── .github/workflows/           # Daily arXiv cron

Bilingual Support

ΩmegaWiki ships in English and Chinese:

./setup.sh --lang en   # English (default)
./setup.sh --lang zh   # 中文

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

LLM API Configuration / 大模型 API 配置

ΩmegaWiki runs on Claude Code, which speaks the Anthropic API protocol. You can use Claude directly, or route Claude Code to any third-party provider that exposes an Anthropic-compatible endpoint by overriding a few environment variables.

ΩmegaWiki 基于 Claude Code,Claude Code 使用 Anthropic API 协议通信。你既可以直接使用 Claude,也可以通过覆盖几个环境变量,把 Claude Code 指向任意支持 Anthropic 协议的第三方供应商。

Option A — Native Claude / 原生 Claude

claude login   # OAuth, no manual config / OAuth 登录,无需手动配置

Option B — Third-party Anthropic-compatible API / 第三方 Anthropic 兼容 API

Pick a provider below, paste the snippet into ~/.claude/settings.json (or the project's .claude/settings.json), and replace the <...> placeholder with your own API key. Model names and extra options are taken from each provider's official Claude Code docs — if anything stops working (e.g. a model is renamed), check the provider's website.

从下方任选一个供应商,把对应配置粘贴到 ~/.claude/settings.json(或项目的 .claude/settings.json),并把 <...> 占位符替换为你自己的 API key。模型名与额外选项均来自各供应商官方 Claude Code 文档;若出现问题(例如模型改名),请查询对应官网。

MiMo (小米)

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.xiaomimimo.com/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "<your-mimo-key>",
    "ANTHROPIC_MODEL": "mimo-v2.5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "mimo-v2.5",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "mimo-v2.5-pro",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "mimo-v2.5"
  }
}

DeepSeek

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.deepseek.com/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "<your-deepseek-key>",
    "ANTHROPIC_MODEL": "deepseek-v4-pro[1m]",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "deepseek-v4-pro[1m]",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "deepseek-v4-pro[1m]",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "deepseek-v4-flash",
    "CLAUDE_CODE_SUBAGENT_MODEL": "deepseek-v4-flash",
    "CLAUDE_CODE_EFFORT_LEVEL": "max"
  }
}

Kimi (Moonshot)

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.moonshot.ai/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "<your-moonshot-key>",
    "ANTHROPIC_MODEL": "kimi-k2.5",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "kimi-k2.5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "kimi-k2.5",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "kimi-k2.5",
    "CLAUDE_CODE_SUBAGENT_MODEL": "kimi-k2.5",
    "ENABLE_TOOL_SEARCH": "false"
  }
}

GLM (Z.AI)

{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "<your-zai-key>",
    "API_TIMEOUT_MS": "3000000"
  }
}

Z.AI applies a default server-side model mapping, so no explicit ANTHROPIC_MODEL is needed. / Z.AI 默认在服务端做模型映射,无需显式设置 ANTHROPIC_MODEL。

Skip the Claude Code onboarding / 跳过 Claude Code 初始引导

When using a third-party key (instead of claude login), Claude Code's first-run onboarding won't complete automatically. Create or edit .claude.json and mark it done:

使用第三方 key 时不会走 claude login,Claude Code 首次启动的引导不会自动完成。创建或编辑 .claude.json,手动标记引导已完成:

  • macOS / Linux: ~/.claude.json
  • Windows: <user-home>\.claude.json
{
  "hasCompletedOnboarding": true
}

Then run claude as usual. / 保存后正常运行 claude 即可。


Community / 交流群

WeChat Group QR Code

Scan to join the ΩmegaWiki WeChat group / 扫码加入微信交流群

Acknowledgments

  • Andrej Karpathy — for the LLM-Wiki concept that inspired this project
  • Claude Code — the AI agent runtime that powers ΩmegaWiki

Star History

Star History Chart

License

MIT — use it, fork it, build on it.


中文

ΩmegaWiki 是什么?

Andrej Karpathy 提出了 LLM-Wiki 概念:让 LLM 构建并维护一个持久的、结构化的 wiki,而不是一次性的 RAG 回答。知识持续积累,每一篇新论文都让整个知识图谱更强。

ΩmegaWiki 将这个理念完整实现。 它不仅是 wiki 构建器,更是完整的研究全流程平台:从论文摄入 → 知识图谱 → 缺口检测 → 想法生成 → 实验设计 → 论文写作 → 同行评审回复。24 个 Claude Code Skills 驱动,一个 wiki 作为唯一的知识中枢。

为什么选择 Wiki 而不是 RAG?

|  | RAG | ΩmegaWiki |
| --- | --- | --- |
| 知识持久性 | 每次查询都重新发现 | 编译一次,持续维护 |
| 结构 | 扁平的 chunk 存储 | 9 种实体类型 + 关系图 |
| 交叉引用 | 无 — chunk 彼此孤立 | 双向 wikilink + 类型化边 |
| 知识缺口 | 不可见 | 显式追踪,驱动研究方向 |
| 失败实验 | 丢失 | 一等公民,防止重复探索 |
| 输出 | 聊天回答 | 论文、综述、实验方案、审稿回复 |
| 复利效应 | 无 — 每次查询成本相同 | 有 — 每篇论文丰富整个图谱 |

快速开始

前置条件: Python 3.9+, Node.js 18+

git clone https://github.com/skyllwt/OmegaWiki.git && cd OmegaWiki

# 安装 Claude Code
npm install -g @anthropic-ai/claude-code
claude login

# 一键配置
chmod +x setup.sh && ./setup.sh --lang zh        # Linux / macOS
# Windows (PowerShell):
#   powershell -ExecutionPolicy Bypass -File .\setup.ps1 -Lang zh
# setup 会为 OmegaWiki 创建 .venv
# 脚本不会把你当前 shell 永久激活,但 /init 会自动使用 .venv

# 把你自己的论文放入 raw/papers/(.tex 或 .pdf)
# 可选:把意图笔记放入 raw/notes/,网页存档放入 raw/web/
# /init 与直接本地 /ingest 会自动管理 raw/discovered/ 与 raw/tmp/ 下的生成内容
# 启动 Claude Code
claude
# 输入:/init [你的研究方向]

Windows 用户:本地 pipeline 已原生支持。/exp-run --env remote 远程 GPU 实验依赖 ssh/rsync/screen,建议在 WSL2 或 Linux/macOS 下运行。

API Key 说明

| Key | 必须? | 获取方式 | 用途 |
| --- | --- | --- | --- |
| ANTHROPIC_API_KEY | 是 | claude login | 驱动所有 Skill |
| CLAUDE_CODE_OAUTH_TOKEN | 可选 | claude setup-token | Pro/Max 用户的 GitHub Actions Claude Code auth |
| SEMANTIC_SCHOLAR_API_KEY | 可选 | semanticscholar.org(免费) | 引用图谱、论文搜索 |
| DEEPXIV_TOKEN | 可选 | setup.sh 自动注册 | 语义搜索、热门趋势 |
| LLM_API_KEY + LLM_BASE_URL + LLM_MODEL | 可选 | 任意 OpenAI 兼容 API | 跨模型评审;/daily-arxiv inform 推荐 |

自动化

GitHub Actions 每天 UTC 00:17(北京时间 08:17)运行 /daily-arxiv 推荐 pipeline:拉取 arXiv、按 wiki 去重、构建 recommendation context、上传 artifacts,并可通过 SMTP 发送 digest 邮件。

启用邮件时,在 repo Settings → Secrets 添加:SMTP_HOST、SMTP_PORT、SMTP_USER、SMTP_PASSWORD、SMTP_FROM、DAILY_ARXIV_EMAIL_TO

CI inform mode 可使用 ANTHROPIC_API_KEY 或 CLAUDE_CODE_OAUTH_TOKEN 启动 Claude Code,也可使用 LLM_API_KEY、LLM_BASE_URL、LLM_MODEL 接入任意 OpenAI-compatible provider。auto-ingest 是显式模式,并且需要 Claude Code,因为普通 API LLM 不能调用 /ingest 这类 slash skill。手动触发时可设置 send_email=false,用于无 SMTP secrets 的 dry run。

Digest 示例 / Sample digest
/daily-arxiv digest 示例

一次真实的 /daily-arxiv 运行结果:带分数、理由、wiki 关联以及 auto-ingest 区块的推荐 digest。

24 个 Skill 命令

| 命令 | 功能 |
| --- | --- |
| /setup | 首次配置(API key、语言、依赖) |
| /reset | 按范围销毁性清理:wiki \| raw \| log \| checkpoints \| all |
| /prefill | 可选地预填 foundations/ 背景知识 |
| /init | 基于用户 raw 素材并按需做外部发现来搭建 wiki |
| /ingest | 消化论文,创建页面 + 交叉引用 |
| /discover | 从 anchor、topic 或当前 wiki 推荐排序后的下一批待读论文 |
| /edit | 增删 raw 或更新 wiki |
| /ask | 对 wiki 提问 |
| /check | wiki 健康检查 |
| /daily-arxiv | 运行/管理每日 arXiv 推荐 feed(可选 CI 定时) |
| /ideate | 跨方向构思研究 idea |
| /novelty | 多源新颖性验证 |
| /review | 跨模型评审 |
| /exp-design | idea 驱动的实验设计 |
| /exp-run | 部署 + 监控实验 |
| /exp-status | 实验状态看板 |
| /exp-eval | 裁决 → 自动更新关联 idea |
| /refine | 多轮迭代改进 |
| /survey | 生成 Related Work |
| /paper-plan | idea 图谱 + 实验证据 → 论文提纲 |
| /paper-draft | 提纲 + wiki → LaTeX 草稿 |
| /paper-compile | 编译 → PDF,自动修复 |
| /research | 端到端研究编排器 |
| /rebuttal | 解析评审意见 → 逐条回复 |

Built with Claude Code

If this project helps your research, give it a ⭐