codefetch (CLI)

November 25, 2025

Command-line interface for Codefetch - convert any codebase or website into AI-friendly markdown documentation.

Installation

Global Installation

npm install -g codefetch
# or
yarn global add codefetch
# or
pnpm add -g codefetch

Local Installation

npm install --save-dev codefetch
# or
yarn add -D codefetch
# or
pnpm add -D codefetch

Direct Usage (No Installation)

npx codefetch

Quick Start

Basic Usage

# Generate markdown from current directory
codefetch

# Generate from specific directory
codefetch /path/to/project

# Fetch from a website
codefetch --url https://example.com

# Fetch from a Git repository
codefetch --url https://github.com/user/repo

Features

Local Codebase Processing

Convert local codebases into structured markdown:

# Include only specific file types
codefetch -e ts,tsx,js,jsx

# Set token limit
codefetch --max-tokens 100000

# Output to specific file
codefetch -o my-codebase.md

# Dry run (output to console)
codefetch --dry-run

# Output as JSON for structured data
codefetch --format json

# Output JSON to specific file
codefetch --format json -o codebase.json

Web Fetching

Fetch and convert Git repositories from GitHub or GitLab:

# Analyze a GitHub repository (uses API by default - faster!)
codefetch --url https://github.com/facebook/react --branch main

# Analyze a GitLab repository
codefetch --url https://gitlab.com/gitlab-org/gitlab-foss --branch master

# Fetch private GitHub repo with token
codefetch --url https://github.com/org/private-repo --github-token ghp_xxxxx
# Or set GITHUB_TOKEN environment variable
export GITHUB_TOKEN=ghp_xxxxx
codefetch --url https://github.com/org/private-repo

# Force git clone instead of API
codefetch --url https://github.com/user/repo --no-api

# Fetch repository without cache
codefetch --url https://github.com/user/repo --no-cache

# Set cache TTL for repository (hours)
codefetch --url https://github.com/user/repo --cache-ttl 24

Configuration

Create a .codefetchrc file for project-specific settings:

{
  "extensions": [".ts", ".tsx", ".js", ".jsx"],
  "excludeFiles": ["*.test.ts", "*.spec.js"],
  "excludeDirs": ["__tests__", "coverage"],
  "maxTokens": 100000,
  "outputFile": "codebase.md",
  "outputPath": "./docs",
  "tokenEncoder": "cl100k",
  "projectTree": 2,
  "projectTreeSkipIgnoreFiles": false
}
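
Most of these settings have counterparts among the command-line options listed below. As a rough sketch, assuming each key maps one-to-one to the flag with the matching name (the source does not state the mapping or which side takes precedence), the same run could be spelled out as flags. `print_equivalent_command` is a hypothetical helper that only prints the command; it does not run codefetch.

```sh
# Sketch: the .codefetchrc above expressed as flags (assumed one-to-one mapping).
# projectTreeSkipIgnoreFiles is false in the config, so no flag is passed for it.
print_equivalent_command() {
  set -- \
    -e ts,tsx,js,jsx \
    --exclude-files "*.test.ts,*.spec.js" \
    --exclude-dir __tests__,coverage \
    --max-tokens 100000 \
    -o codebase.md \
    --output-path ./docs \
    --token-encoder cl100k \
    -t 2
  # Print rather than execute, so the arguments can be reviewed first.
  echo codefetch "$@"
}

print_equivalent_command
```

Keeping shared defaults in the config file and passing only one-off overrides as flags avoids repeating long commands.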

Command Line Options

General Options

  • -h, --help - Show help information
  • -v, --verbose - Increase verbosity (use multiple times: -vvv)
  • -d, --dry-run - Output to console instead of file
  • --stdout - Print only the final output to stdout (equivalent to --dry-run --no-summary --verbose 0)
  • -o, --output - Output filename (default: codefetch-output-[timestamp].md)
  • --output-path - Output directory path
  • --max-tokens - Maximum token limit
  • --token-encoder - Token encoder model (cl100k, p50k, r50k, o200k)
  • -c, --token-count-only - Only output token count
  • --disable-line-numbers - Disable line numbers in code blocks
  • --no-summary - Disable the token/model summary box at the end

File Filtering

  • -e, --extension - File extensions to include (e.g., ts,js,py)
  • --include-files - Patterns for files to include
  • --exclude-files - Patterns for files to exclude
  • --include-dir - Directories to include
  • --exclude-dir - Directories to exclude

Web Fetching Options

  • --url - URL to fetch (website or git repository)
  • --max-pages - Maximum pages to crawl (default: 50)
  • --max-depth - Maximum crawl depth (default: 2)
  • --no-cache - Disable cache for this request
  • --cache-ttl - Cache time-to-live in hours (default: 1)
  • --branch - Git branch to fetch (for repositories)
  • --ignore-robots - Ignore robots.txt restrictions
  • --ignore-cors - Ignore CORS restrictions
  • --no-api - Disable GitHub API and use git clone instead
  • --github-token - GitHub API token for private repos

Display Options

  • -t, --project-tree - Show project tree (0=off, 1+=depth)
  • --project-tree-skip-ignore-files - Include files ignored by git/config in the project tree
  • --tracked-models - Label the token summary with specific models
  • --format - Output format (markdown, json) (default: markdown)
  • --enable-line-numbers - Enable line numbers in output (disabled by default to save tokens)

Advanced Options

  • -p, --prompt - Add a prompt: inline text, built-in (fix, improve, codegen, testgen), or file (.md/.txt)
  • --var - Set template variables (e.g., --var PROJECT_NAME="My App")
  • --token-limiter - Token limiting strategy (truncated, spread)

Examples

Analyze a TypeScript Project

codefetch -e ts,tsx --exclude-dir node_modules,dist \
  --max-tokens 50000 -o typescript-analysis.md

Fetch Documentation Repository

codefetch --url https://github.com/org/docs \
  --branch main \
  --max-pages 100 \
  --max-depth 5 \
  --output docs-analysis.md

Analyze a GitHub Repository

# Uses GitHub API by default (faster, no git required)
codefetch --url https://github.com/expressjs/express \
  --branch master \
  -e js --exclude-dir test,examples

# For private repositories
codefetch --url https://github.com/myorg/private-repo \
  --github-token ghp_your_token_here

Use with AI Prompts

# Code review
codefetch -p review --var PROJECT_NAME="MyApp" -o review.md

# Generate tests
codefetch -p testgen -e ts,tsx --include-dir src

# Improve code quality
codefetch -p improve --max-tokens 30000

Track Multiple Models

You can tag the summary with the models you care about:

codefetch --tracked-models gpt-4o,claude-3.5-sonnet
codefetch --tracked-models gpt-4o,claude-3.5-sonnet --no-summary  # hide summary box

JSON Output Format

# Generate JSON output for programmatic access
codefetch --format json -o codebase.json

# Use with jq to query specific files
codefetch --format json --stdout | jq '.root.children[] | select(.name == "src")'
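
To list every entry rather than select one, the tree can be walked recursively. The sketch below assumes only what the query above implies: a `root` object whose nodes carry a `name` and an optional `children` array. Any other fields in real output are unknown here, `list_names` is a hypothetical helper, and the sample JSON is hand-written rather than real codefetch output.

```sh
# list_names: print the name of every node in a tree read from stdin.
# Requires jq. The {root: {name, children: [...]}} shape is an assumption.
list_names() {
  # recurse() visits the node itself, then each child; "?" skips leaf nodes
  # that have no children array.
  jq -r '.root | recurse(.children[]?) | .name'
}

# Hand-written sample standing in for real output.
sample='{"root":{"name":"project","children":[{"name":"src","children":[{"name":"index.ts"}]},{"name":"package.json"}]}}'

if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$sample" | list_names
else
  echo "jq is not installed; skipping demo" >&2
fi
```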

Ignore Patterns

Create a .codefetchignore file to exclude files:

# Dependencies
node_modules/
vendor/
.pnpm-store/

# Build outputs
dist/
build/
out/
*.min.js
*.min.css

# Test files
*.test.ts
*.spec.js
__tests__/
coverage/

# Environment and logs
.env*
*.log
.DS_Store

# IDE
.vscode/
.idea/
*.swp
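
A starter file can be generated with a here-document. This is a minimal sketch using a subset of the patterns above; `write_codefetchignore` is a hypothetical helper, and it deliberately refuses to overwrite an existing file.

```sh
# write_codefetchignore DIR
# Create DIR/.codefetchignore with a few starter patterns, unless one exists.
write_codefetchignore() {
  if [ -e "$1/.codefetchignore" ]; then
    echo "kept existing $1/.codefetchignore"
    return 0
  fi
  cat > "$1/.codefetchignore" <<'EOF'
# Dependencies
node_modules/

# Build outputs
dist/
*.min.js

# Environment and logs
.env*
*.log
EOF
  echo "wrote $1/.codefetchignore"
}

# Demo in a throwaway directory so no real project is touched.
demo_dir=$(mktemp -d)
write_codefetchignore "$demo_dir"
write_codefetchignore "$demo_dir"   # second call leaves the file alone
```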

Output Format

The generated markdown includes:

  1. Project Structure - Tree view of the codebase (respects .gitignore, .codefetchignore, and config filters by default)
  2. File Contents - Each file with syntax highlighting
  3. Token Count - Total tokens for AI model context
  4. Metadata - Timestamps and configuration used

Example output structure:

Project Structure:
├── src/
│   ├── index.ts
│   ├── utils/
│   │   └── helpers.ts
│   └── components/
│       └── Button.tsx
└── package.json

src/index.ts:
```typescript
1 | import { helper } from './utils/helpers';
2 | 
3 | export function main() {
4 |   console.log('Hello, world!');
5 | }
```

[... more files ...]

The project tree automatically hides entries excluded via .gitignore, .codefetchignore, and your include/exclude settings so it lines up with the files that will be embedded into the markdown. Use --project-tree-skip-ignore-files if you temporarily need to inspect the entire directory structure, including ignored paths.

Caching

Web fetching results are cached to improve performance:

  • Default cache location: ~/.codefetch/cache/
  • Default TTL: 1 hour
  • Use --no-cache to bypass
  • Use --cache-ttl to set custom expiration
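
To make the TTL idea concrete, the sketch below checks whether a file was modified within a given number of hours. It is a generic illustration, not codefetch's implementation: the source does not describe the cache layout or expiry logic, and `cache_is_fresh` is a hypothetical helper. It relies on `find -mmin`, which GNU, BSD, and BusyBox `find` provide but POSIX does not require.

```sh
# cache_is_fresh FILE TTL_HOURS
# Succeeds when FILE exists and was modified less than TTL_HOURS hours ago.
cache_is_fresh() {
  [ -f "$1" ] || return 1
  # -mmin -N matches files modified less than N minutes ago.
  [ -n "$(find "$1" -mmin "-$(( $2 * 60 ))" 2>/dev/null)" ]
}

# Demo with a temporary file standing in for a cache entry.
entry=$(mktemp)
if cache_is_fresh "$entry" 1; then
  echo "fresh: reuse cached copy"
fi

# Backdate the file to simulate an expired entry.
touch -t 200001010000 "$entry"
if ! cache_is_fresh "$entry" 1; then
  echo "stale: fetch again"
fi
rm -f "$entry"
```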

Performance Tips

  1. Use specific extensions: -e ts,tsx is faster than processing all files
  2. Exclude test files: --exclude-files "*.test.ts,*.spec.js"
  3. Limit crawl depth: --max-depth 2 for faster website fetching
  4. Set reasonable page limits: --max-pages 50 to avoid excessive crawling
  5. Use cache: Repeated fetches of the same URL within the cache TTL are served from the cache instead of being downloaded again

Troubleshooting

Common Issues

"Token limit exceeded"

  • Use --max-tokens to set a higher limit
  • Use -e to include only specific file types
  • Use --exclude-dir to skip large directories
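
Before raising the limit, it can help to check how a selection compares with the budget. The sketch below compares a token count against a limit. `within_budget` is a hypothetical helper, and it assumes the count arrives as a bare integer, which the source does not confirm for `--token-count-only` output.

```sh
# within_budget COUNT LIMIT
# Exit 0 when COUNT <= LIMIT, 1 when over, 2 when COUNT is not a whole number.
within_budget() {
  case "$1" in
    ''|*[!0-9]*) return 2 ;;   # empty or non-numeric input
  esac
  [ "$1" -le "$2" ]
}

within_budget 42000 50000 && echo "fits"
within_budget 64000 50000 || echo "over budget"

# Possible use with codefetch (assumes the command prints only the number):
#   count=$(codefetch --token-count-only -e ts,tsx)
#   within_budget "$count" 50000 || echo "narrow the selection or raise --max-tokens"
```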

"Failed to crawl URL"

  • Check if the URL is accessible
  • Try with --ignore-robots if blocked by robots.txt
  • Use --no-cache if cached data is stale

"Command not found"

  • Ensure global installation: npm install -g codefetch
  • Or use npx: npx codefetch
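
Scripts that must work with or without a global install can pick the runner at run time. This is a minimal sketch; `codefetch_runner` is a hypothetical helper that only reports which command prefix would be used.

```sh
# codefetch_runner: print the command prefix to use.
# Prefers a codefetch binary on PATH and falls back to npx otherwise.
codefetch_runner() {
  if command -v codefetch >/dev/null 2>&1; then
    echo "codefetch"
  else
    echo "npx codefetch"
  fi
}

echo "Would run: $(codefetch_runner) --help"
```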

Debug Mode

Use maximum verbosity for debugging:

codefetch -vvv --url https://example.com

Integration

Package.json Scripts

{
  "scripts": {
    "docs:generate": "codefetch -e ts,tsx -o docs/codebase.md",
    "docs:analyze": "codefetch --token-count-only",
    "review": "codefetch -p review -o review.md"
  }
}

CI/CD Pipeline

# GitHub Actions example
- name: Generate Documentation
  run: |
    npx codefetch \
      -e ts,tsx,js,jsx \
      --exclude-dir node_modules,coverage \
      -o codebase-docs.md

Contributing

See the main repository for contribution guidelines.

License

MIT