README.md

May 16, 2026 · View on GitHub

AnyCrawl

Fast · Scalable · Web Crawling · Site Crawling · SERP · Multi Threading · Multi Process · Batch Tasks

License: MIT PRs Welcome LLM Ready Documentation

Node.js TypeScript Redis

Sponsors

SwiftProxy

Swiftproxy (https://www.swiftproxy.net/?ref=AnyCrawl): High-performance residential proxies built for scraping, automation, and large-scale data collection. Access 80M+ rotating residential IPs across 195+ countries with stable connections, high anonymity, and developer-friendly integration. Ideal for AI agents, crawlers, browser automation, and anti-bot bypass workflows. Free trial available. Use code PROXY90 for an exclusive 10% discount.

Atlas Cloud

Atlas Cloud (https://www.atlascloud.ai/?utm_source=github&utm_medium=sponsor&utm_campaign=AnyCrawl): One API for 300+ models covering video, image, and LLM workloads, including DeepSeek, GPT, Claude, Flux, Kling, and Seedance.

📖 Overview

AnyCrawl is a high‑performance crawling and scraping toolkit:

  • SERP crawling: multiple search engines, batch‑friendly
  • Web scraping: single‑page content extraction
  • Site crawling: full‑site traversal and collection
  • High performance: multi‑threading / multi‑process
  • Batch tasks: reliable and efficient
  • AI extraction: LLM‑powered structured data (JSON) extraction from pages

LLM‑friendly. Easy to integrate and use.

🚀 Quick Start

📖 See full docs: Docs

Generate an API Key (self-host)

If you enable authentication (ANYCRAWL_API_AUTH_ENABLED=true), generate an API key:

pnpm --filter api key:generate
# optionally name the key
pnpm --filter api key:generate -- default

The command prints uuid, key and credits. Use the printed key as a Bearer token.
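
As a minimal sketch of what "use the printed key as a Bearer token" means on the client side, the Python helper below builds the two headers used by the curl examples in this README. The function name `auth_headers` and the environment variable `ANYCRAWL_API_KEY` are hypothetical choices for this illustration, not names defined by AnyCrawl.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers that send the generated key as a Bearer token."""
    # ANYCRAWL_API_KEY is a hypothetical variable name used only in this sketch.
    key = api_key or os.environ.get("ANYCRAWL_API_KEY", "")
    if not key:
        raise ValueError("no API key provided")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```

Passing the key explicitly, as in `auth_headers("YOUR_ANYCRAWL_API_KEY")`, avoids relying on the environment at all.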

Run Inside Docker

If running AnyCrawl via Docker:

  • Docker Compose:

    docker compose exec api pnpm --filter api key:generate
    docker compose exec api pnpm --filter api key:generate -- default

  • Single container (replace <container_name_or_id>):

    docker exec -it <container_name_or_id> pnpm --filter api key:generate
    docker exec -it <container_name_or_id> pnpm --filter api key:generate -- default

📚 Usage Examples

💡 Use the Playground to test APIs and generate code in your preferred language.

If self‑hosting, replace https://api.anycrawl.dev with your own server URL.

Web Scraping (Scrape)

Example


curl -X POST https://api.anycrawl.dev/v1/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "url": "https://example.com",
  "engine": "cheerio"
}'
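
The same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields come from the curl example above; the helper names `build_scrape_request` and `send`, and the assumption that the response body is JSON, are illustrative and not part of AnyCrawl itself.

```python
import json
import urllib.request

# Replace with your own server URL when self-hosting.
API_BASE = "https://api.anycrawl.dev"


def build_scrape_request(api_key, url, engine="cheerio", base=API_BASE):
    """Build (but do not send) a POST /v1/scrape request."""
    payload = {"url": url, "engine": engine}
    return urllib.request.Request(
        f"{base}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and decode the body, assuming the API answers with JSON.

    urllib raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `send(build_scrape_request("YOUR_ANYCRAWL_API_KEY", "https://example.com"))`. Keeping construction separate from sending makes the request easy to inspect or test without network access.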

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| url | string (required) | The URL to be scraped. Must be a valid URL starting with http:// or https:// | - |
| engine | string | Scraping engine to use. Options: cheerio (static HTML parsing, fastest), playwright (JavaScript rendering with modern engine), puppeteer (JavaScript rendering with Chrome) | cheerio |
| proxy | string | Proxy URL for the request. Supports HTTP and SOCKS proxies. Format: http://[username]:[password]@proxy:port | (none) |
| max_age | number | Cache control (ms). 0 = force refresh (skip cache read); > 0 = accept cached content within this age; omit to use default. | (none) |
| store_in_cache | boolean | Cache control. Whether to store the result in cache. To bypass cache reads, use max_age=0. | true |

More parameters: see Request Parameters.

Cache details (self-host / S3 / map index): see docs/cache.md.
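
The two cache parameters interact in a way that is easy to get wrong, so here is a small Python sketch of the rules stated in the Parameters table: `max_age` of 0 forces a refresh, a positive value accepts cached content up to that age in milliseconds, and omitting it uses the server default. The helper name `cache_options` is hypothetical, and rejecting negative values is an assumption of this sketch rather than documented behavior.

```python
from typing import Any, Dict, Optional


def cache_options(max_age_ms: Optional[int] = None,
                  store_in_cache: bool = True) -> Dict[str, Any]:
    """Return only the cache fields that differ from the documented defaults."""
    options: Dict[str, Any] = {}
    if max_age_ms is not None:
        # Assumption of this sketch: a negative age is treated as a caller error.
        if max_age_ms < 0:
            raise ValueError("max_age must be 0 (force refresh) or a positive number of ms")
        options["max_age"] = max_age_ms
    if not store_in_cache:
        # store_in_cache defaults to true, so it is sent only when disabled.
        options["store_in_cache"] = False
    return options


# Force a fresh fetch but still store the result for later requests.
fresh = {"url": "https://example.com", **cache_options(max_age_ms=0)}

# Accept cached content up to one hour old and do not write to the cache.
relaxed = {"url": "https://example.com",
           **cache_options(max_age_ms=60 * 60 * 1000, store_in_cache=False)}
```

Note that `max_age=0` only skips the cache read; the result is still written to the cache unless `store_in_cache` is set to false.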

LLM Extraction

curl -X POST "https://api.anycrawl.dev/v1/scrape" \
  -H "Authorization: Bearer YOUR_ANYCRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "json_options": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": { "type": "string" },
          "is_open_source": { "type": "boolean" },
          "employee_count": { "type": "number" }
        },
        "required": ["company_mission"]
      }
    }
  }'
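
When the schema is assembled programmatically, a small helper can catch a common mistake before the request is sent: listing a field under `required` that is not declared in `properties`. The Python sketch below builds the same body as the curl example; `extraction_payload` is a hypothetical helper name, and the consistency check is a convenience of this sketch, not a rule stated by AnyCrawl.

```python
from typing import Any, Dict, Iterable, Mapping


def extraction_payload(url: str,
                       properties: Mapping[str, Dict[str, Any]],
                       required: Iterable[str] = ()) -> Dict[str, Any]:
    """Build a /v1/scrape body that requests structured JSON extraction."""
    required = list(required)
    missing = [name for name in required if name not in properties]
    if missing:
        raise ValueError(f"required fields not declared in properties: {missing}")
    return {
        "url": url,
        "json_options": {
            "schema": {
                "type": "object",
                "properties": dict(properties),
                "required": required,
            }
        },
    }


payload = extraction_payload(
    "https://example.com",
    {
        "company_mission": {"type": "string"},
        "is_open_source": {"type": "boolean"},
        "employee_count": {"type": "number"},
    },
    required=["company_mission"],
)
```

Serializing `payload` with `json.dumps` yields the request body shown in the curl example.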

Atlas Cloud Provider

AnyCrawl supports Atlas Cloud as an OpenAI-compatible LLM provider for extraction and summarization workloads.

  • Official site: Atlas Cloud
  • LLM base URL: https://api.atlascloud.ai/v1
  • Recommended env model format: atlascloud/deepseek-v3

Environment variables:

ATLASCLOUD_BASE_URL=https://api.atlascloud.ai/v1
ATLASCLOUD_API_KEY=your-atlascloud-api-key
DEFAULT_LLM_MODEL=atlascloud/deepseek-v3
DEFAULT_EXTRACT_MODEL=atlascloud/deepseek-v3

If you prefer file-based AI config, add an atlascloud provider entry in ai.config.json and map it to any Atlas Cloud model exposed through the OpenAI-compatible chat API.
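
The recommended model value reads as a provider prefix followed by a model name. How AnyCrawl resolves that string internally is not described here, so the Python sketch below is only an illustration of splitting such an identifier on its first slash; `ModelRef` and `parse_model_id` are hypothetical names.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str
    model: str


def parse_model_id(model_id: str) -> ModelRef:
    """Split 'provider/model' on the first slash (assumed format, see lead-in)."""
    provider, separator, model = model_id.partition("/")
    if not separator or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return ModelRef(provider, model)


ref = parse_model_id("atlascloud/deepseek-v3")
# ref.provider == "atlascloud", ref.model == "deepseek-v3"
```

Splitting on the first slash only keeps any further slashes inside the model part intact.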

Site Crawling (Crawl)

Example


curl -X POST https://api.anycrawl.dev/v1/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "url": "https://example.com",
  "engine": "playwright",
  "max_depth": 2,
  "limit": 10,
  "strategy": "same-domain"
}'

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| url | string (required) | Starting URL to crawl | - |
| engine | string | Crawling engine. Options: cheerio, playwright, puppeteer | cheerio |
| max_depth | number | Max depth from the start URL | 10 |
| limit | number | Max number of pages to crawl | 100 |
| strategy | enum | Scope: all, same-domain, same-hostname, same-origin | same-domain |
| include_paths | array | Only crawl paths matching these patterns | (none) |
| exclude_paths | array | Skip paths matching these patterns | (none) |
| scrape_options | object | Per-page scrape options (formats, timeout, json extraction, etc.), same as Scrape options | (none) |
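
A crawl body can be validated locally against the options and defaults listed in the Parameters table before it is sent. The Python sketch below does that; `crawl_payload` is a hypothetical helper, and the URL-scheme check and the numeric bounds are sanity checks assumed for this sketch rather than documented server rules.

```python
from typing import Any, Dict, List, Optional

ENGINES = {"cheerio", "playwright", "puppeteer"}
STRATEGIES = {"all", "same-domain", "same-hostname", "same-origin"}


def crawl_payload(url: str,
                  engine: str = "cheerio",
                  max_depth: int = 10,
                  limit: int = 100,
                  strategy: str = "same-domain",
                  include_paths: Optional[List[str]] = None,
                  exclude_paths: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a /v1/crawl body using the defaults from the Parameters table."""
    # Assumed checks: the table does not state these limits explicitly.
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    if max_depth < 0 or limit < 1:
        raise ValueError("max_depth must be >= 0 and limit must be >= 1")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    payload: Dict[str, Any] = {
        "url": url,
        "engine": engine,
        "max_depth": max_depth,
        "limit": limit,
        "strategy": strategy,
    }
    # Path filters are optional and omitted when not provided.
    if include_paths:
        payload["include_paths"] = list(include_paths)
    if exclude_paths:
        payload["exclude_paths"] = list(exclude_paths)
    return payload
```

For example, `crawl_payload("https://example.com", engine="playwright", max_depth=2, limit=10)` reproduces the body of the curl example in this section.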

More parameters and endpoints: see Request Parameters.
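
The four strategy values differ in how far a crawl may stray from the start URL. AnyCrawl's exact matching rules are not spelled out in this README, so the Python sketch below uses the common web definitions as an assumption: same-origin compares scheme, host, and port; same-hostname compares the host only; same-domain compares a naive "last two labels" domain, which is wrong for suffixes such as co.uk (a real implementation would consult the Public Suffix List). The function `in_scope` is hypothetical.

```python
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url):
    """Return (scheme, hostname, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port or _DEFAULT_PORTS.get(parts.scheme))


def _naive_domain(hostname):
    """Approximate the registrable domain as the last two labels (assumption)."""
    return ".".join((hostname or "").split(".")[-2:])


def in_scope(start_url, candidate_url, strategy="same-domain"):
    """Decide whether candidate_url falls inside the crawl scope of start_url."""
    if strategy == "all":
        return True
    if strategy == "same-origin":
        return _origin(start_url) == _origin(candidate_url)
    start_host = urlsplit(start_url).hostname
    candidate_host = urlsplit(candidate_url).hostname
    if strategy == "same-hostname":
        return start_host == candidate_host
    if strategy == "same-domain":
        return _naive_domain(start_host) == _naive_domain(candidate_host)
    raise ValueError(f"unknown strategy: {strategy}")
```

Under these assumptions, a crawl starting at https://example.com would follow https://blog.example.com links with same-domain but not with same-hostname or same-origin.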

Search Engine Results (SERP)

Example

curl -X POST https://api.anycrawl.dev/v1/search \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_ANYCRAWL_API_KEY' \
  -d '{
  "query": "AnyCrawl",
  "limit": 10,
  "engine": "google",
  "lang": "all"
}'

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| query | string (required) | Search query to be executed | - |
| engine | string | Search engine to use. Options: google | google |
| pages | integer | Number of search result pages to retrieve | 1 |
| lang | string | Language code for search results (e.g., 'en', 'zh', 'all') | en-US |

Supported search engines

  • Google
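
For batch use, one request body per query can be prepared up front. The Python sketch below builds such a list from the parameters documented in this section; `search_payloads` is a hypothetical helper, and skipping blank queries and rejecting `pages` below 1 are assumptions of this sketch.

```python
from typing import Any, Dict, Iterable, List, Optional

SUPPORTED_ENGINES = {"google"}


def search_payloads(queries: Iterable[str],
                    engine: str = "google",
                    pages: int = 1,
                    lang: Optional[str] = None) -> List[Dict[str, Any]]:
    """Build one /v1/search body per non-empty query."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported search engine: {engine}")
    if pages < 1:
        raise ValueError("pages must be at least 1")
    payloads: List[Dict[str, Any]] = []
    for query in queries:
        text = query.strip()
        if not text:
            continue  # skip blank lines in a batch file, for example
        body: Dict[str, Any] = {"query": text, "engine": engine, "pages": pages}
        if lang is not None:
            # Omitting lang leaves the server default in effect.
            body["lang"] = lang
        payloads.append(body)
    return payloads


batch = search_payloads(["AnyCrawl", "  ", "web crawling"], pages=2, lang="en")
```

Each element of `batch` can then be posted to /v1/search with the same headers as the curl example.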

❓ FAQ

  1. Can I use proxies? Yes. AnyCrawl ships with a high‑quality default proxy. You can also configure your own: set the proxy request parameter (per request) or ANYCRAWL_PROXY_URL (self‑hosting).
  2. How do I handle JavaScript‑rendered pages? Use the playwright or puppeteer engine. The cheerio engine parses static HTML only.
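
When supplying your own proxy through the per-request `proxy` parameter, credentials containing characters such as `@` or `:` need percent-encoding to fit the documented format http://[username]:[password]@proxy:port. The Python sketch below shows one way to build such a URL; `proxy_url` is a hypothetical helper, and the `socks5` scheme string is an assumption about how SOCKS proxies are written.

```python
from typing import Optional
from urllib.parse import quote


def proxy_url(host: str,
              port: int,
              username: Optional[str] = None,
              password: Optional[str] = None,
              scheme: str = "http") -> str:
    """Assemble scheme://[username]:[password]@host:port with encoded credentials."""
    auth = ""
    if username is not None:
        # safe="" forces reserved characters like '@', ':' and '/' to be encoded.
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


# Hypothetical proxy host and credentials, for illustration only.
body = {
    "url": "https://example.com",
    "engine": "cheerio",
    "proxy": proxy_url("proxy.example.net", 8080, "user", "p@ss:word"),
}
```

The resulting `body` can be sent to /v1/scrape in place of the body in the Scrape example.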

🤝 Contributing

We welcome contributions! See the Contributing Guide.

Backers

Support us with a monthly donation and help us continue our activities. [Become a backer]

📄 License

MIT License — see LICENSE.

🎯 Mission

We build simple, reliable, and scalable tools for the AI ecosystem.


Built with ❤️ by the Any4AI team