PT - Audio Transcription Tool

January 7, 2026

An audio transcription tool built with Next.js and the OpenAI Whisper API. It transcribes audio files and generates AI summaries of the transcripts.

✨ Features

🎯 Supports both file upload and URL input
🎙️ Supports Xiaoyuzhou podcast transcription
📝 High-quality audio transcription using OpenAI Whisper API
📊 AI-powered content summarization
🎨 Modern UI design
💾 Download transcripts and summaries
🎵 Built-in audio player
🖥️ CLI tool support (pt command)
📋 SRT subtitle format output
🔄 Chunked processing for large audio files
⚡ Parallel transcription for better performance
📤 Multiple output formats (text, JSON, markdown, SRT)
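
The chunked and parallel processing listed above can be pictured roughly as follows. This is a minimal TypeScript sketch, not pt's actual implementation: `planChunks`, `mapWithConcurrency`, the chunk length, and the concurrency limit are hypothetical names and values chosen for illustration.

```typescript
// Hypothetical sketch: none of these names come from the pt codebase.

interface Chunk {
  index: number;
  startSec: number;
  endSec: number;
}

// Split a recording of `durationSec` seconds into consecutive chunks of at
// most `chunkSec` seconds each. The last chunk may be shorter.
function planChunks(durationSec: number, chunkSec: number): Chunk[] {
  if (chunkSec <= 0) throw new Error("chunkSec must be positive");
  const chunks: Chunk[] = [];
  for (let start = 0, index = 0; start < durationSec; start += chunkSec, index++) {
    chunks.push({ index, startSec: start, endSec: Math.min(start + chunkSec, durationSec) });
  }
  return chunks;
}

// Run `worker` over every item with at most `limit` calls in flight,
// returning the results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner repeatedly claims the next item until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let r = 0; r < runnerCount; r++) runners.push(runner());
  await Promise.all(runners);
  return results;
}
```

For example, `planChunks(25, 10)` yields three chunks covering 0 to 10 s, 10 to 20 s, and 20 to 25 s. A real worker would presumably cut each chunk out of the source audio and send it for transcription; because results come back in input order, the partial transcripts can be joined directly.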

📦 CLI Installation

Install via npm

npm install -g @winterfx/pt

Configure API Key

Choose one of the following methods:

Option 1: Environment Variable (Recommended)

# Add to ~/.zshrc or ~/.bashrc
export API_KEY="your-api-key"
export BASE_URL="https://api.openai.com/v1"  # optional

Option 2: Config File (~/.pt/.env)

mkdir -p ~/.pt
cat > ~/.pt/.env << 'EOF'
API_KEY=your-api-key
BASE_URL=https://api.openai.com/v1
EOF

Option 3: Current Directory (.env)

cat > .env << 'EOF'
API_KEY=your-api-key
BASE_URL=https://api.openai.com/v1
EOF
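
The three options above imply some lookup across several sources. The sketch below shows one way such a lookup could work in TypeScript. It is an illustration under stated assumptions, not pt's actual code: `parseEnvFile` and `resolveConfig` are hypothetical helpers, the precedence is whatever order the caller passes, and the fallback base URL is assumed from the example value shown above.

```typescript
// Hypothetical sketch: names, precedence, and the default URL are assumptions.

type Settings = Record<string, string | undefined>;

// Parse simple KEY=value lines, skipping blank lines and # comments.
function parseEnvFile(text: string): Settings {
  const out: Settings = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no '=' or empty key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.charAt(value.length - 1) === first) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}

// For each setting, return the first non-empty value found, checking the
// sources in the order given. Throws if no source provides API_KEY.
function resolveConfig(sources: Settings[]): { apiKey: string; baseUrl: string } {
  const pick = (name: string): string | undefined => {
    for (const source of sources) {
      const value = source[name];
      if (value !== undefined && value !== "") return value;
    }
    return undefined;
  };
  const apiKey = pick("API_KEY");
  if (apiKey === undefined) throw new Error("API_KEY is not set");
  // Assumed default; BASE_URL is documented as optional.
  return { apiKey, baseUrl: pick("BASE_URL") ?? "https://api.openai.com/v1" };
}
```

In a Node.js CLI, the caller might pass `process.env` first, followed by the parsed contents of `~/.pt/.env` and `./.env`. The order pt really uses is not documented here, so check the tool's behavior before relying on a particular precedence.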

🚀 Web App Development

Prerequisites

Node.js 18+
OpenAI API Key
FFmpeg (required for audio processing)

Installing FFmpeg

# macOS
brew install ffmpeg

# Linux (Ubuntu/Debian)
sudo apt-get install ffmpeg

# Windows
choco install ffmpeg

Installation

Clone the repository:

git clone https://github.com/yourusername/podcast-transcription.git
cd podcast-transcription

Install dependencies:

npm install
# or
yarn install
# or
pnpm install

Configure environment variables. Create a .env.local file in the project root and add:

API_KEY=your_openai_api_key
BASE_URL=your_endpoint

Start the development server:

npm run dev
# or
yarn dev
# or
pnpm dev

Visit http://localhost:3000 to view the app.

Docker Deployment

Build the Docker image:

docker build -t podcast-transcription .

Run the container:

docker run -p 3000:3000 podcast-transcription

🖥️ CLI Tool

The project includes a command-line tool, pt, for transcribing audio files directly from the terminal.

CLI Usage

pt <input> [options]

Arguments:

<input> - Local file path or audio URL

Options:

-s, --summary - Generate AI summary after transcription
-l, --language <lang> - Language code: auto, en, zh, etc. (default: auto)
-o, --output <file> - Output file path (default: stdout)
--output-format <format> - Output format: text, json, markdown, srt (default: text)
-q, --quiet - Suppress progress output

CLI Examples

# Transcribe a local audio file
pt /path/to/podcast.mp3

# Transcribe with AI summary
pt podcast.mp3 --summary

# Generate SRT subtitles
pt podcast.mp3 --output-format srt -o subtitles.srt

# JSON output with summary
pt podcast.mp3 --summary --output-format json -o result.json

# Transcribe from URL
pt https://example.com/audio.mp3 --summary
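
For readers unfamiliar with the SRT format produced by `--output-format srt`, the sketch below shows how timed transcript segments map onto SRT cues: a 1-based cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line between cues. It is a minimal TypeScript illustration; the `Segment` shape and both function names are hypothetical and not taken from pt's source.

```typescript
// Hypothetical sketch: the Segment shape and function names are assumptions.

interface Segment {
  startSec: number;
  endSec: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${zeroPad(h, 2)}:${zeroPad(m, 2)}:${zeroPad(s, 2)},${zeroPad(ms, 3)}`;
}

// Render segments as SRT cues, numbered from 1 and separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const timing = `${formatSrtTimestamp(seg.startSec)} --> ${formatSrtTimestamp(seg.endSec)}`;
      return `${i + 1}\n${timing}\n${seg.text.trim()}\n`;
    })
    .join("\n");
}
```

With this sketch, `formatSrtTimestamp(3661.5)` gives `01:01:01,500`. Note that SRT uses a comma, not a period, before the milliseconds.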

Running the CLI

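# Via npm script (the -- separator passes flags such as --summary through to pt instead of npm)
npm run pt -- <input> [options]

# Or link the local checkout globally, then call pt directly
npm link
pt <input> [options]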