Codefetch SDK for Cloudflare Workers

November 17, 2025

The codefetch-sdk provides a specialized /worker export optimized for Cloudflare Workers environments, with zero file system dependencies and native Web Streams support.

Installation

npm install codefetch-sdk@latest
# or
pnpm add codefetch-sdk@latest
# or
yarn add codefetch-sdk@latest

Worker-Specific Import

// Use the /worker export for Cloudflare Workers.
// Note: this alias shadows the global fetch() inside the importing module.
import { fetchFromWeb as fetch } from 'codefetch-sdk/worker';

Features

🎯 Unified fetch() API - Single method for GitHub repos, web content, and more
🚀 Zero nodejs_compat required - Uses native Web APIs
📦 Optimized bundle - Only ≈ 24 KB gzipped for edge performance
🗄️ Built‑in caching - Transparent per‑request memoization with optional KV persistence
🌊 Native streaming - Memory‑efficient processing
🔒 Private repo support - GitHub token authentication
⚡ Fast GitHub fetching - Efficient repository processing
🎯 Simple configuration - Minimal boilerplate

Quick Start

Basic Worker

By default the Worker fetch() returns markdown as a string. Use format: 'json' if you need a structured result.

export default {
  async fetch(request) {
    const markdown = await fetch({
      source: 'https://github.com/facebook/react',
      extensions: ['.js', '.ts', '.md'],
      maxFiles: 50,
    });

    return new Response(markdown, {
      headers: { 'Content-Type': 'text/markdown' },
    });
  },
};

Private Repository

export default {
  async fetch(request, env) {
    const markdown = await fetch({
      source: 'https://github.com/myorg/private-repo',
      githubToken: env.GITHUB_TOKEN,
      extensions: ['.ts', '.tsx'],
      maxFiles: 100,
    });

    return new Response(markdown);
  },
};

Web Content Crawling (Git Repositories)

Currently, the Worker fetcher supports Git repositories hosted on GitHub and GitLab. Generic websites are not yet supported.

export default {
  async fetch(request) {
    const markdown = await fetch({
      source: 'https://github.com/org/docs',
      maxPages: 10,
      maxDepth: 2,
    });

    return new Response(markdown, {
      headers: { 'Content-Type': 'text/markdown' },
    });
  },
};

API Reference

`fetch(options: FetchOptions): Promise<string | FetchResultImpl>`

The unified API for all Worker-compatible content sources (GitHub/GitLab).

interface FetchOptions {
  // Source (required)
  source: string; // GitHub or GitLab URL

  // Filtering
  extensions?: string[];
  excludeFiles?: string[];
  includeFiles?: string[];
  excludeDirs?: string[];
  includeDirs?: string[];

  // Token management
  maxTokens?: number;
  maxFiles?: number;
  tokenEncoder?: 'cl100k' | 'p50k' | 'o200k' | 'simple';
  tokenLimiter?: 'sequential' | 'truncated';

  // GitHub specific
  githubToken?: string;
  branch?: string;

  // Git repo crawling
  maxPages?: number;
  maxDepth?: number;

  // Output
  format?: 'markdown' | 'json';
  includeTree?: boolean | number;
  disableLineNumbers?: boolean;

  // Caching
  noCache?: boolean;
  cacheTTL?: number;
}

When format: 'json' is set, the Worker fetch() returns an instance of FetchResultImpl with a file tree and metadata, identical to the Node build. For the default format: 'markdown', it returns a plain markdown string.
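
As a rough illustration of how the options above might be assembled from an incoming request, here is a minimal sketch. The helper name buildFetchOptions and the query parameter names (repo, ext, maxFiles, format) are assumptions made for this example; only the option fields themselves come from the FetchOptions interface.

```javascript
// Hypothetical helper: map query-string parameters onto FetchOptions fields.
// The parameter names (repo, ext, maxFiles, format) are illustrative choices.
function buildFetchOptions(searchParams, githubToken) {
  const repo = searchParams.get('repo');
  if (!repo || !/^[\w.-]+\/[\w.-]+$/.test(repo)) {
    throw new Error('repo must look like owner/name');
  }

  const options = { source: `https://github.com/${repo}` };

  // extensions: comma-separated list, normalised to a leading dot
  const ext = searchParams.get('ext');
  if (ext) {
    options.extensions = ext
      .split(',')
      .map((e) => e.trim())
      .filter(Boolean)
      .map((e) => (e.startsWith('.') ? e : `.${e}`));
  }

  // maxFiles: only accept positive integers
  const maxFiles = Number.parseInt(searchParams.get('maxFiles') ?? '', 10);
  if (Number.isInteger(maxFiles) && maxFiles > 0) options.maxFiles = maxFiles;

  // format: anything other than 'json' falls back to the default markdown
  if (searchParams.get('format') === 'json') options.format = 'json';

  if (githubToken) options.githubToken = githubToken;
  return options;
}
```

Inside a Worker this could be called as buildFetchOptions(new URL(request.url).searchParams, env.GITHUB_TOKEN) and the result passed straight to fetch().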

Caching & KV Persistence

codefetch-sdk/worker ships with in‑memory and pluggable caching. When you call fetch() with the same parameters inside the same isolate, it can return the cached result, which saves GitHub API quota and reduces latency.

// Disable caching for a single request
await fetch({ source: repoUrl, noCache: true });
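
To make the idea of per-isolate memoization concrete, the sketch below shows one way such a cache could work. It is an illustration only, not the SDK's internal implementation: createMemoizedFetch, fetchRepo (a stand-in for the SDK's fetch()), ttlMs, and the injectable now clock are all names assumed for this example.

```javascript
// Illustrative per-isolate memoization; NOT the SDK's internal implementation.
// `fetchRepo` stands in for the SDK's fetch(); `ttlMs` and `now` are assumed parameters.
function createMemoizedFetch(fetchRepo, ttlMs = 60_000, now = Date.now) {
  // Created once at module scope, this Map can outlive a single request
  // for as long as the isolate stays alive.
  const entries = new Map();

  return function memoizedFetch(options) {
    const { noCache, ...rest } = options;
    if (noCache) return fetchRepo(options);

    // Sort keys so { a, b } and { b, a } map to the same entry
    const key = JSON.stringify(
      Object.keys(rest).sort().map((name) => [name, rest[name]])
    );

    const hit = entries.get(key);
    if (hit && hit.expires > now()) return hit.promise;

    // Store the promise itself so concurrent callers share one upstream call
    const promise = Promise.resolve(fetchRepo(options));
    entries.set(key, { promise, expires: now() + ttlMs });

    // Do not keep failures around
    promise.catch(() => {
      if (entries.get(key)?.promise === promise) entries.delete(key);
    });
    return promise;
  };
}
```

Because isolates can be evicted at any time, a cache like this is best treated as an optimization rather than a guarantee.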

Real-World Examples

1. GitHub Repository Analyzer API (JSON result)

export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);
    const repo = url.searchParams.get('repo'); // format: owner/name
    
    if (!repo) {
      return new Response('Missing repo parameter', { status: 400 });
    }

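    const [owner, name] = repo.split('/');
    if (!owner || !name) {
      return new Response('repo must be in owner/name format', { status: 400 });
    }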
    
    try {
      // Cache the results
      const cacheKey = new Request(`https://cache.example.com/${owner}/${name}`);
      const cache = caches.default;
      
      let response = await cache.match(cacheKey);
      if (response) {
        return response;
      }

      // Fetch repository as JSON
      const result = await fetch({
        source: `https://github.com/${owner}/${name}`,
        githubToken: env.GITHUB_TOKEN,
        extensions: ['.ts', '.js', '.py', '.go'],
        excludeDirs: ['node_modules', 'vendor', '.git'],
        maxFiles: 100,
        format: 'json',
      });

      // Analyze code
      const analysis = {
        totalFiles: result.metadata.totalFiles,
        totalTokens: result.metadata.totalTokens,
        languages: {},
        totalSize: 0,
        largestFiles: []
      };

      for (const file of result.files) {
        const ext = file.path.split('.').pop();
        analysis.languages[ext] = (analysis.languages[ext] || 0) + 1;
        analysis.totalSize += file.content.length;
        
        if (file.content.length > 10000) {
          analysis.largestFiles.push({
            path: file.path,
            size: file.content.length
          });
        }
      }

      response = Response.json(analysis, {
        headers: {
          'Content-Type': 'application/json',
          'Cache-Control': 'public, max-age=3600'
        }
      });

      // Cache for 1 hour
      ctx.waitUntil(cache.put(cacheKey, response.clone()));
      
      return response;
    } catch (error) {
      return Response.json({ error: error.message }, { status: 500 });
    }
  }
};

2. Documentation Generator

export default {
  async fetch(request, env) {
    const { searchParams } = new URL(request.url);
    const repo = searchParams.get('repo');
    const maxTokens = parseInt(searchParams.get('maxTokens') || '50000', 10);

    const markdown = await fetch({
      source: `https://github.com/${repo}`,
      githubToken: env.GITHUB_TOKEN,
      extensions: ['.md', '.mdx', '.ts', '.js'],
      excludeDirs: ['node_modules', 'test'],
      maxTokens,
      includeTree: true,
    });

    return new Response(markdown, {
      headers: {
        'Content-Type': 'text/markdown',
      },
    });
  },
};

3. Code Search API

export default {
  async fetch(request) {
    const { repo, query } = await request.json();
    
    // format: 'json' is required here: the default markdown string has no .files
    const result = await fetch({
      source: `https://github.com/${repo}`,
      extensions: ['.js', '.ts', '.jsx', '.tsx'],
      maxFiles: 200,
      format: 'json',
    });

    const results = result.files
      .filter(file => file.content.includes(query))
      .map(file => {
        const lines = file.content.split('\n');
        const matches = [];
        
        lines.forEach((line, index) => {
          if (line.includes(query)) {
            matches.push({
              line: index + 1,
              content: line.trim(),
              context: lines.slice(
                Math.max(0, index - 2),
                Math.min(lines.length, index + 3)
              ).join('\n')
            });
          }
        });

        return { path: file.path, matches };
      })
      .filter(result => result.matches.length > 0)
      .slice(0, 10); // Limit results

    return Response.json({ 
      query,
      resultCount: results.length,
      results
    });
  }
};

Performance Tips

1. Use Cache API

const cache = caches.default;
const cacheKey = new Request(`https://cache.example.com/${repo}`);
const cached = await cache.match(cacheKey);
if (cached) return cached;
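
The lookup above is only half of the pattern; the result also has to be stored after a miss. A minimal read-through sketch follows. The helper name cachedResponse and its parameters are assumptions for this example: cache is any object with match() and put() (caches.default in a Worker), ctx supplies waitUntil(), and compute builds a fresh Response.

```javascript
// Hypothetical read-through helper around the Cache API.
async function cachedResponse(cache, ctx, cacheKey, compute) {
  const hit = await cache.match(cacheKey);
  if (hit) return hit;

  const response = await compute();

  // Store only successful responses, and store a clone so the original
  // body can still be streamed to the client.
  if (response.ok) {
    ctx.waitUntil(cache.put(cacheKey, response.clone()));
  }
  return response;
}
```

In a Worker, the Cache API uses the response's Cache-Control header to decide how long an entry stays fresh, so compute should set one (for example max-age=3600).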

2. Stream Large Responses

const { readable, writable } = new TransformStream();
const writer = writable.getWriter();

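ctx.waitUntil((async () => {
  try {
    // fetch() resolves to a markdown string by default; encode it to bytes
    const markdown = await fetch({ source: repoUrl });
    await writer.write(new TextEncoder().encode(markdown));
    await writer.close();
  } catch (error) {
    await writer.abort(error); // fail the stream instead of leaving it open
  }
})());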

return new Response(readable);

3. Use Durable Objects for State

const id = env.REPO_ANALYZER.idFromName(repoKey);
const analyzer = env.REPO_ANALYZER.get(id);
return analyzer.fetch(request);
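
The snippet above shows only the calling side. As a sketch of what the receiving class might look like, the example below keeps a simple per-repository request counter in Durable Object storage. The class name RepoAnalyzer and the 'requests' storage key are hypothetical; what state to keep is entirely up to the application.

```javascript
// Hypothetical Durable Object behind the REPO_ANALYZER binding.
// In a real Worker this class must be exported from the entry module.
class RepoAnalyzer {
  constructor(state, env) {
    this.state = state; // gives access to this object's durable storage
    this.env = env;
  }

  async fetch(request) {
    // One object instance exists per idFromName() key, so this counter
    // is scoped to a single repository.
    const previous = (await this.state.storage.get('requests')) ?? 0;
    const requests = previous + 1;
    await this.state.storage.put('requests', requests);

    return new Response(JSON.stringify({ requests }), {
      headers: { 'Content-Type': 'application/json' },
    });
  }
}
```

The binding also has to be declared in wrangler.toml (a durable_objects binding plus a migration entry) before env.REPO_ANALYZER becomes available.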

Environment Setup

wrangler.toml

name = "codefetch-worker"
main = "src/index.js"
compatibility_date = "2024-01-01"
# No nodejs_compat needed!

[vars]
MAX_FILE_SIZE = "1048576"  # 1MB

# Add GitHub token as secret
# wrangler secret put GITHUB_TOKEN

TypeScript Configuration

// worker-configuration.d.ts
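interface Env {
  GITHUB_TOKEN: string; // secret, set with: wrangler secret put GITHUB_TOKEN
  CACHE_DURATION: string; // must also be defined under [vars] in wrangler.toml
  MAX_FILE_SIZE: string; // declared as a string in [vars]; parse before use
}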

export default {
  async fetch(
    request: Request,
    env: Env,
    ctx: ExecutionContext
  ): Promise<Response> {
    // Your worker code
    return new Response('OK');
  }
};
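
Since the Env fields above are typed as strings, numeric settings need to be parsed before use. A small sketch of that step follows; readPositiveInt and readLimits are hypothetical helper names, and the fallback values are illustrative defaults rather than SDK defaults.

```javascript
// Hypothetical helper: values declared as strings under [vars] (for example
// MAX_FILE_SIZE = "1048576") reach the Worker as strings on `env`.
function readPositiveInt(value, fallback) {
  const parsed = Number.parseInt(value ?? '', 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// The fallback values here are illustrative, not SDK defaults.
function readLimits(env) {
  return {
    maxFileSize: readPositiveInt(env.MAX_FILE_SIZE, 1_048_576),
    cacheDuration: readPositiveInt(env.CACHE_DURATION, 3600),
  };
}
```

Calling readLimits(env) once at the top of the handler keeps the parsing in one place and guards against a missing or malformed variable.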

Error Handling

try {
  const result = await fetch({ source: repoUrl });
} catch (error) {
  if (error.message.includes('404')) {
    return new Response('Repository not found', { status: 404 });
  }
  if (error.message.includes('403')) {
    return new Response('Rate limit exceeded or auth required', { status: 403 });
  }
  if (error.message.includes('Invalid URL')) {
    return new Response('Invalid repository URL', { status: 400 });
  }
  // Log the error; console output appears in the Worker's logs (e.g. wrangler tail)
  console.error('Fetch error:', error);
  return new Response('Internal error', { status: 500 });
}
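
The same mapping can be factored into a reusable function so every handler responds consistently. This is a minimal sketch: errorToResponse and ERROR_RULES are names invented for this example, and matching on message substrings simply mirrors the snippet above; the exact wording of SDK error messages is an assumption.

```javascript
// Hypothetical lookup table: first matching rule wins.
const ERROR_RULES = [
  { needle: '404', status: 404, body: 'Repository not found' },
  { needle: '403', status: 403, body: 'Rate limit exceeded or auth required' },
  { needle: 'Invalid URL', status: 400, body: 'Invalid repository URL' },
];

function errorToResponse(error) {
  // Non-Error values (e.g. thrown strings) are converted to text first
  const message = error instanceof Error ? error.message : String(error);
  const rule = ERROR_RULES.find((r) => message.includes(r.needle));
  if (rule) return new Response(rule.body, { status: rule.status });

  // Anything unrecognised is logged and reported as a generic failure
  console.error('Fetch error:', error);
  return new Response('Internal error', { status: 500 });
}
```

A handler can then wrap its body in try/catch and return errorToResponse(error) from the catch branch.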

Limitations

No File System - All operations are in-memory
Memory Limits - Each Worker isolate has a 128 MB memory limit
CPU Time - 10 ms per request on the Free plan; 30 seconds by default on paid plans
Subrequest Limits - 50 subrequests per request on the Free plan (paid plans allow more)
Response Size - 10MB response limit

Testing Locally

# Install Wrangler
pnpm add -D wrangler

# Create worker file
echo 'import { fetchFromWeb as fetch } from "codefetch-sdk/worker";
export default { 
  fetch: () => new Response("OK") 
};' > worker.js

# Test locally
pnpm exec wrangler dev worker.js