README.md
May 16, 2026
Sponsors
Swiftproxy(https://www.swiftproxy.net/?ref=AnyCrawl): High-performance residential proxies built for scraping, automation, and large-scale data collection. Access 80M+ rotating residential IPs across 195+ countries with stable connections, high anonymity, and developer-friendly integration. Ideal for AI agents, crawlers, browser automation, and anti-bot bypass workflows. Free trial available. Use code PROXY90 for an exclusive 10% discount.
AtlasCloud(https://www.atlascloud.ai/?utm_source=github&utm_medium=sponsor&utm_campaign=AnyCrawl): Atlas Cloud gives developers one API for 300+ models covering video, image, and LLM workloads, including DeepSeek, GPT, Claude, Flux, Kling, and Seedance.
📖 Overview
AnyCrawl is a high‑performance crawling and scraping toolkit:
- SERP crawling: multiple search engines, batch‑friendly
- Web scraping: single‑page content extraction
- Site crawling: full‑site traversal and collection
- High performance: multi‑threading / multi‑process
- Batch tasks: reliable and efficient
- AI extraction: LLM‑powered structured data (JSON) extraction from pages
LLM‑friendly. Easy to integrate and use.
🚀 Quick Start
📖 See full docs: Docs
Generate an API Key (self-host)
If you enable authentication (ANYCRAWL_API_AUTH_ENABLED=true), generate an API key:
pnpm --filter api key:generate
# optionally name the key
pnpm --filter api key:generate -- default
The command prints uuid, key and credits. Use the printed key as a Bearer token.
Run Inside Docker
If running AnyCrawl via Docker:
- Docker Compose:
docker compose exec api pnpm --filter api key:generate
docker compose exec api pnpm --filter api key:generate -- default
- Single container (replace <container_name_or_id>):
docker exec -it <container_name_or_id> pnpm --filter api key:generate
docker exec -it <container_name_or_id> pnpm --filter api key:generate -- default
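However the key is generated, it is sent as a Bearer token on every request. The following is a minimal Python sketch using only the standard library. The helper name `build_request` is illustrative and not part of AnyCrawl; the base URL is the hosted endpoint used in this README's examples and should be swapped for your own server when self-hosting.

```python
import json
import urllib.request

# Hosted endpoint from this README; replace with your own server URL when self-hosting.
API_BASE = "https://api.anycrawl.dev"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request.

    `build_request` is a hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_request("/v1/scrape", {"url": "https://example.com"}, "YOUR_ANYCRAWL_API_KEY")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the headers before any network call is made.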
📚 Usage Examples
💡 Use the Playground to test APIs and generate code in your preferred language.
If self‑hosting, replace https://api.anycrawl.dev with your own server URL.
Web Scraping (Scrape)
Example
curl -X POST https://api.anycrawl.dev/v1/scrape \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
-d '{
"url": "https://example.com",
"engine": "cheerio"
}'
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| url | string (required) | The URL to be scraped. Must be a valid URL starting with http:// or https:// | - |
| engine | string | Scraping engine to use. Options: cheerio (static HTML parsing, fastest), playwright (JavaScript rendering with modern engine), puppeteer (JavaScript rendering with Chrome) | cheerio |
| proxy | string | Proxy URL for the request. Supports HTTP and SOCKS proxies. Format: http://[username]:[password]@proxy:port | (none) |
| max_age | number | Cache control (ms). 0 = force refresh (skip cache read); > 0 = accept cached content within this age; omit to use default. | (none) |
| store_in_cache | boolean | Cache control. Whether to store the result in cache. To bypass cache reads, use max_age=0. | true |
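The parameters in the table can be assembled into a request body as sketched below. This is an illustrative client-side helper, not AnyCrawl code: the function name `scrape_payload` and its validation rules are assumptions derived only from the table (URL scheme check, the three listed engines), and optional fields are omitted from the body when not set so the server applies its own defaults.

```python
import json
from typing import Optional

# Engine names as listed in the parameter table.
ENGINES = {"cheerio", "playwright", "puppeteer"}


def scrape_payload(
    url: str,
    engine: str = "cheerio",
    proxy: Optional[str] = None,
    max_age: Optional[int] = None,
    store_in_cache: Optional[bool] = None,
) -> dict:
    """Hypothetical helper: build a /v1/scrape request body from the table's parameters."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    if engine not in ENGINES:
        raise ValueError(f"engine must be one of {sorted(ENGINES)}")
    payload: dict = {"url": url, "engine": engine}
    # Only include optional fields that were explicitly set.
    if proxy is not None:
        payload["proxy"] = proxy
    if max_age is not None:
        payload["max_age"] = max_age  # milliseconds; 0 forces a refresh
    if store_in_cache is not None:
        payload["store_in_cache"] = store_in_cache
    return payload


# Force a fresh fetch of a JavaScript-rendered page.
body = scrape_payload("https://example.com", engine="playwright", max_age=0)
print(json.dumps(body, indent=2))
```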
More parameters: see Request Parameters.
Cache details (self-host / S3 / map index): see docs/cache.md.
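The `max_age` rule from the table can be read as a small decision function. The sketch below is one interpretation of that description, not the server's actual logic: `should_read_cache` and `default_max_age_ms` are hypothetical names, and the default value used in the demo is made up because this README does not state the server default.

```python
from typing import Optional


def should_read_cache(
    cached_age_ms: Optional[int],
    max_age: Optional[int],
    default_max_age_ms: int,
) -> bool:
    """Decide whether a cached copy may be served (illustrative reading of `max_age`).

    cached_age_ms: age of the cached entry in ms, or None if nothing is cached.
    max_age: the request's max_age in ms, or None if the parameter was omitted.
    default_max_age_ms: hypothetical server default used when max_age is omitted.
    """
    if cached_age_ms is None:
        return False  # nothing cached, so a fresh fetch is required
    if max_age is None:
        max_age = default_max_age_ms  # parameter omitted: fall back to the default
    if max_age == 0:
        return False  # 0 = force refresh, skip the cache read
    return cached_age_ms <= max_age  # accept content within the requested age


# Demo with a made-up default of 60 seconds.
print(should_read_cache(5_000, 0, 60_000))        # max_age=0 forces a refresh
print(should_read_cache(5_000, 10_000, 60_000))   # cached copy is young enough
print(should_read_cache(15_000, 10_000, 60_000))  # cached copy is too old
```

Note that `store_in_cache` is independent of this decision: it controls whether the fresh result is written back, not whether the cache is read.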
LLM Extraction
curl -X POST "https://api.anycrawl.dev/v1/scrape" \
-H "Authorization: Bearer YOUR_ANYCRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"json_options": {
"schema": {
"type": "object",
"properties": {
"company_mission": { "type": "string" },
"is_open_source": { "type": "boolean" },
"employee_count": { "type": "number" }
},
"required": ["company_mission"]
}
}
}'
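The `json_options.schema` object in the request above is a JSON Schema, so it can be generated from a compact field map rather than written by hand. The helper below is a hypothetical convenience (`extraction_payload` is not part of AnyCrawl) and handles only flat objects with primitive field types, which is all the example requires.

```python
import json
from typing import Dict, List


def extraction_payload(url: str, fields: Dict[str, str], required: List[str]) -> dict:
    """Hypothetical helper: build a /v1/scrape body with a flat JSON Schema.

    fields maps property names to JSON Schema primitive types
    ("string", "number", "boolean", ...).
    """
    unknown = [name for name in required if name not in fields]
    if unknown:
        raise ValueError(f"required fields not declared in schema: {unknown}")
    return {
        "url": url,
        "json_options": {
            "schema": {
                "type": "object",
                "properties": {name: {"type": kind} for name, kind in fields.items()},
                "required": list(required),
            }
        },
    }


# Reproduces the request body from the curl example.
body = extraction_payload(
    "https://example.com",
    fields={
        "company_mission": "string",
        "is_open_source": "boolean",
        "employee_count": "number",
    },
    required=["company_mission"],
)
print(json.dumps(body, indent=2))
```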
Atlas Cloud Provider
AnyCrawl supports Atlas Cloud as an OpenAI-compatible LLM provider for extraction and summarization workloads.
- Official site: Atlas Cloud
- LLM base URL: https://api.atlascloud.ai/v1
- Recommended env model format: atlascloud/deepseek-v3
ATLASCLOUD_BASE_URL=https://api.atlascloud.ai/v1
ATLASCLOUD_API_KEY=your-atlascloud-api-key
DEFAULT_LLM_MODEL=atlascloud/deepseek-v3
DEFAULT_EXTRACT_MODEL=atlascloud/deepseek-v3
If you prefer file-based AI config, add an atlascloud provider entry in ai.config.json and map it to any Atlas Cloud model exposed through the OpenAI-compatible chat API.
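Before starting the service it can help to check that the Atlas Cloud variables are present and that the model name follows the `provider/model` format shown above. The sketch below is a standalone pre-flight check, not AnyCrawl's own config loader: the function name `atlascloud_settings` is hypothetical, and how AnyCrawl itself parses these variables is not described in this README.

```python
from typing import Mapping

# Variable names taken from the env example above.
REQUIRED_VARS = ("ATLASCLOUD_BASE_URL", "ATLASCLOUD_API_KEY", "DEFAULT_LLM_MODEL")


def atlascloud_settings(env: Mapping[str, str]) -> dict:
    """Hypothetical pre-flight check for the Atlas Cloud environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    # Split "atlascloud/deepseek-v3" into provider and model name.
    provider, sep, model = env["DEFAULT_LLM_MODEL"].partition("/")
    if not sep or not provider or not model:
        raise RuntimeError("DEFAULT_LLM_MODEL should look like provider/model")
    return {
        "base_url": env["ATLASCLOUD_BASE_URL"],
        "provider": provider,
        "model": model,
    }


# Demo with a plain dict; pass os.environ to check the real environment.
sample_env = {
    "ATLASCLOUD_BASE_URL": "https://api.atlascloud.ai/v1",
    "ATLASCLOUD_API_KEY": "your-atlascloud-api-key",
    "DEFAULT_LLM_MODEL": "atlascloud/deepseek-v3",
}
print(atlascloud_settings(sample_env))
```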
Site Crawling (Crawl)
Example
curl -X POST https://api.anycrawl.dev/v1/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
-d '{
"url": "https://example.com",
"engine": "playwright",
"max_depth": 2,
"limit": 10,
"strategy": "same-domain"
}'
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| url | string (required) | Starting URL to crawl | - |
| engine | string | Crawling engine. Options: cheerio, playwright, puppeteer | cheerio |
| max_depth | number | Max depth from the start URL | 10 |
| limit | number | Max number of pages to crawl | 100 |
| strategy | enum | Scope: all, same-domain, same-hostname, same-origin | same-domain |
| include_paths | array | Only crawl paths matching these patterns | (none) |
| exclude_paths | array | Skip paths matching these patterns | (none) |
| scrape_options | object | Per-page scrape options (formats, timeout, json extraction, etc.), same as Scrape options | (none) |
More parameters and endpoints: see Request Parameters.
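The four `strategy` values differ in how much of the start URL a discovered link must share. The sketch below illustrates one plausible reading of those names; it is an assumption, not AnyCrawl's implementation. In particular, the `same-domain` check here naively compares the last two hostname labels, whereas a real crawler would consult the public suffix list (so `example.co.uk` is handled incorrectly by this sketch). The function name `in_scope` is hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _naive_domain(hostname: str) -> str:
    """Last two labels of the hostname (naive; ignores the public suffix list)."""
    return ".".join(hostname.split(".")[-2:])


def _origin(url: str) -> tuple:
    """(scheme, hostname, port) with the scheme's default port filled in."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def in_scope(start_url: str, candidate_url: str, strategy: str = "same-domain") -> bool:
    """Illustrative scope check for a link discovered during a crawl."""
    start_host = urlsplit(start_url).hostname or ""
    cand_host = urlsplit(candidate_url).hostname or ""
    if strategy == "all":
        return True  # follow every link
    if strategy == "same-domain":
        return _naive_domain(start_host) == _naive_domain(cand_host)
    if strategy == "same-hostname":
        return start_host == cand_host
    if strategy == "same-origin":
        return _origin(start_url) == _origin(candidate_url)
    raise ValueError(f"unknown strategy: {strategy}")


start = "https://example.com"
for strategy in ("all", "same-domain", "same-hostname", "same-origin"):
    print(strategy, in_scope(start, "https://blog.example.com/post", strategy))
```

Under this reading, a subdomain link is followed with `all` and `same-domain` but skipped with `same-hostname` and `same-origin`.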
Search Engine Results (SERP)
Example
curl -X POST https://api.anycrawl.dev/v1/search \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
-d '{
"query": "AnyCrawl",
"limit": 10,
"engine": "google",
"lang": "all"
}'
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| query | string (required) | Search query to be executed | - |
| engine | string | Search engine to use. Options: google | |
| pages | integer | Number of search result pages to retrieve | 1 |
| lang | string | Language code for search results (e.g., 'en', 'zh', 'all') | en-US |
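For batch use, one request body per query can be prepared up front from the parameters in the table. The sketch below is an illustrative client-side helper (`search_payloads` is a hypothetical name). It uses `google` as the engine because that is the only option the table lists, and it takes the `pages` and `lang` defaults from the table.

```python
import json
from typing import Iterable, List


def search_payloads(
    queries: Iterable[str],
    engine: str = "google",
    pages: int = 1,
    lang: str = "en-US",
) -> List[dict]:
    """Hypothetical helper: build one /v1/search request body per non-empty query."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    return [
        {"query": query.strip(), "engine": engine, "pages": pages, "lang": lang}
        for query in queries
        if query.strip()  # skip blank entries
    ]


batch = search_payloads(["AnyCrawl", "  ", "web scraping toolkit"], pages=2)
for body in batch:
    print(json.dumps(body))
```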
Supported search engines
❓ FAQ
- Can I use proxies? Yes. AnyCrawl ships with a high‑quality default proxy. You can also configure your own: set the proxy request parameter (per request) or ANYCRAWL_PROXY_URL (self‑hosting).
- How to handle JavaScript‑rendered pages? Use the Playwright or Puppeteer engines.
🤝 Contributing
We welcome contributions! See the Contributing Guide.
Backers
Support us with a monthly donation and help us continue our activities. [Become a backer]
📄 License
MIT License — see LICENSE.
🎯 Mission
We build simple, reliable, and scalable tools for the AI ecosystem.