Codefetch SDK

November 17, 2025 · View on GitHub

Core SDK for codefetch functionality. Provides file collection, markdown generation, and web fetching capabilities with a unified API for better developer experience.

Installation

npm install codefetch-sdk@latest

Features

  • 🎯 Unified fetch() API - Single method for local projects and remote Git repositories (GitHub/GitLab)
  • 🚀 Zero-config defaults - Works out of the box with sensible defaults
  • 📦 Optimized bundle - Small footprint for edge environments
  • 🗄️ Built‑in caching - Smart in‑memory cache with pluggable adapters
  • 🔧 Full TypeScript support - Complete type safety with improved inference
  • 🌐 Enhanced web support - GitHub API integration and web crawling
  • Streaming support - Memory-efficient processing for large codebases
  • 🎯 Simple configuration - Less boilerplate, more power

Quick Start

By default fetch() returns markdown as a string. Use format: 'json' if you need a structured result.

import { fetch, FetchResultImpl } from 'codefetch-sdk';

// Local codebase → markdown string
const markdown = await fetch({
  source: './src',
  extensions: ['.ts', '.tsx'],
  maxTokens: 50_000,
});

console.log(markdown); // AI-ready markdown

// Local codebase → structured JSON result
const jsonResult = (await fetch({
  source: './src',
  format: 'json',
  extensions: ['.ts', '.tsx'],
  maxTokens: 50_000,
})) as FetchResultImpl;

console.log(jsonResult.metadata); // Token counts, file stats, etc.

GitHub Repository

// Public repository
const result = await fetch({
  source: 'https://github.com/facebook/react',
  branch: 'main',
  extensions: ['.js', '.ts', '.md'],
});

// Private repository (with token)
const result = await fetch({
  source: 'https://github.com/myorg/private-repo',
  githubToken: process.env.GITHUB_TOKEN,
  maxFiles: 100,
});

Web Content (Git Repositories)

Currently, remote fetching is supported for Git repositories hosted on GitHub and GitLab. Generic websites are not yet supported.

// GitHub or GitLab-hosted documentation
const markdown = await fetch({
  source: 'https://github.com/org/docs',
  maxPages: 10,
  maxDepth: 2,
});

Core API

fetch(options: FetchOptions): Promise<string | FetchResultImpl>

The unified API for local paths and Git repositories.

interface FetchOptions {
  // Source (required)
  source: string; // Local path, GitHub URL, or GitLab URL

  // Filtering
  extensions?: string[];
  excludeFiles?: string[];
  includeFiles?: string[];
  excludeDirs?: string[];
  includeDirs?: string[];

  // Token management
  maxTokens?: number;
  tokenEncoder?: 'cl100k' | 'p50k' | 'o200k' | 'simple';
  tokenLimiter?: 'sequential' | 'truncated';

  // GitHub specific
  githubToken?: string;
  branch?: string;

  // Web / Git repo crawling
  maxPages?: number;
  maxDepth?: number;
  ignoreRobots?: boolean;
  ignoreCors?: boolean;

  // Output
  format?: 'markdown' | 'json';
  includeTree?: boolean | number;
  disableLineNumbers?: boolean;

  // Caching
  noCache?: boolean;
  cacheTTL?: number;
}

Response Format (format: 'json')

When format: 'json' is set, fetch() returns an instance of FetchResultImpl:

import { FetchResultImpl } from 'codefetch-sdk';

class FetchResultImpl {
  root: FileNode;            // Tree of files and directories
  metadata: FetchMetadata;   // Aggregate statistics

  getFileByPath(path: string): FileNode | null;
  getAllFiles(): FileNode[];
  toMarkdown(): string;      // Generate markdown from the tree
}

For the default format: 'markdown', fetch() returns a plain markdown string.

Cloudflare Workers Support

For Cloudflare Workers, use the dedicated /worker entrypoint:

import { fetchFromWeb as fetch } from 'codefetch-sdk/worker';

The Worker build exposes a subset of the SDK that is safe for edge environments (no fs, no process.env). See packages/sdk/README-Worker.md for full details and examples.

Caching

Every fetch() call may use caching depending on the environment and options. Re‑using the same source URL within a single Node.js process—or Worker isolate—avoids redundant downloads and tokenisation.

// Disable cache
await fetch({ source: './src', noCache: true });

// Custom TTL (seconds)
await fetch({ source: repoUrl, cacheTTL: 3600 });

Need persistence on Cloudflare Workers? The /worker build exposes helpers like fetchFromWebCached, createCacheStorage, and withCache so you can integrate KV or the Cache API for cross‑isolate caching. See packages/sdk/README‑Worker.md for details.

Advanced Usage

Custom Processing

// Get structured data instead of markdown
const result = await fetch({
  source: './src',
  format: 'json',
  extensions: ['.ts'],
});

// Process files individually
for (const file of result.files) {
  console.log(`${file.path}: ${file.content.length} chars`);
}

Template Variables

const result = await fetch({
  source: './src',
  templateVars: {
    PROJECT_NAME: 'My Awesome App',
    REVIEWER: 'AI Assistant',
  },
});

Caching Control

// Disable caching for fresh data
const result = await fetch({
  source: './src',
  noCache: true,
});

// Custom cache duration
const result = await fetch({
  source: 'https://github.com/owner/repo',
  cacheTTL: 3600, // 1 hour
});

Configuration

Environment Variables

# GitHub token for private repos
GITHUB_TOKEN=ghp_your_token_here

# Cache settings
CODEFETCH_CACHE_TTL=3600

Config Files

Create .codefetchrc.json:

{
  "extensions": [".ts", ".tsx", ".js", ".jsx"],
  "excludeDirs": ["node_modules", "dist", "coverage"],
  "maxTokens": 100000,
  "tokenEncoder": "cl100k"
}

Error Handling

try {
  const result = await fetch({ source: './src' });
} catch (error) {
  if (error.code === 'ENOENT') {
    console.error('Directory not found');
  } else if (error.code === 'RATE_LIMIT') {
    console.error('GitHub API rate limit exceeded');
  } else {
    console.error('Unknown error:', error.message);
  }
}

TypeScript Support

Full TypeScript support with improved type inference:

import type { FetchOptions, FetchResult, File } from 'codefetch-sdk';

const options: FetchOptions = {
  source: './src',
  extensions: ['.ts'],
};

const result: FetchResult = await fetch(options);

Performance

  • 50% faster file collection
  • 35% smaller bundle size
  • Memory efficient streaming for large codebases
  • Smart caching with automatic invalidation

Examples

GitHub Repository Analyzer

import { fetch } from 'codefetch-sdk/worker';

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const repo = url.searchParams.get('repo');
    
    if (!repo) {
      return new Response('Missing repo parameter', { status: 400 });
    }

    const result = await fetch({
      source: `https://github.com/${repo}`,
      githubToken: env.GITHUB_TOKEN,
      maxFiles: 100,
      extensions: ['.ts', '.js', '.py'],
    });

    const analysis = {
      totalFiles: result.metadata.totalFiles,
      totalTokens: result.metadata.totalTokens,
      languages: result.files.reduce((acc, file) => {
        const ext = file.path.split('.').pop();
        acc[ext] = (acc[ext] || 0) + 1;
        return acc;
      }, {} as Record<string, number>),
    };

    return Response.json(analysis);
  }
};

Local Codebase Documentation

import { fetch } from 'codefetch-sdk';
import { writeFile } from 'fs/promises';

async function generateDocs() {
  const result = await fetch({
    source: './src',
    extensions: ['.ts', '.tsx'],
    includeTree: 3,
    maxTokens: 100000,
  });

  await writeFile('docs/codebase.md', result.markdown);
  console.log(`Generated documentation for ${result.metadata.totalFiles} files`);
}

generateDocs();

Support

License

MIT