ASR and STT AI Notebook - Prompts & Outputs

November 21, 2025

This repository contains prompts and AI-generated outputs exploring Automatic Speech Recognition (ASR) and Speech-to-Text (STT) topics, with a focus on fine-tuning Whisper variants and related models.

Repository Structure

This is primarily a collection of:

  • Research prompts sent to AI assistants
  • AI-generated responses and analyses
  • Notes and insights from hands-on experimentation
  • Technical documentation on ASR/STT model fine-tuning

Content Organization

  • prompts/to-run/: Queued prompts for future exploration
  • prompts/processed/: Completed prompt-output pairs
  • notes/: Consolidated insights and documentation
  • data-preparation/: Data preparation workflows and scripts
  • export/: Compiled PDF exports for offline reading

Exported Materials

Complete Guide

Split by Topic (4 Parts)

  1. Book 1: Core Concepts
  2. Book 2: Advanced Techniques
  3. Book 3: Implementation Details
  4. Book 4: Practical Applications

Podcast Format (NEW!)

Convert the entire notebook into an audio podcast for listening on the go. See PODCAST-WORKFLOW.md for details.

Quick Start:

cd scripts
export OPENROUTER_API_KEY='your-key-here'
./create-podcast.sh
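
The Quick Start assumes the API key is exported and the script is executable. A minimal pre-flight sketch is shown below. The function name `check_podcast_prereqs` is hypothetical and is not part of this repository, and `create-podcast.sh` may already perform its own validation.

```shell
# Hypothetical helper (not part of the repository): verifies the two
# prerequisites implied by the Quick Start before the script is run.
check_podcast_prereqs() {
  local script_path="${1:-./create-podcast.sh}"

  # The Quick Start exports OPENROUTER_API_KEY, so treat an empty value as an error.
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "OPENROUTER_API_KEY is not set" >&2
    return 1
  fi

  # The script is invoked directly (./create-podcast.sh), so it must be executable.
  if [ ! -x "$script_path" ]; then
    echo "$script_path is missing or not executable" >&2
    return 2
  fi

  return 0
}
```

From inside scripts/, this could be chained as `check_podcast_prereqs && ./create-podcast.sh`.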

Features:

  • Natural-sounding TTS with Microsoft Edge neural voices
  • SSML-enhanced speech with proper pacing and emphasis
  • Individual audio files by topic or single concatenated podcast
  • TTS is free via edge-tts; the AI conversion step costs roughly $1-5
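
As a rough illustration of the per-topic and concatenated outputs listed above, the sketch below synthesizes one text file with the edge-tts command-line tool and then joins several MP3 files. It is not the repository's `create-podcast.sh`. The function names, the default voice `en-US-AriaNeural`, and the use of ffmpeg's concat demuxer for joining are all assumptions made for illustration.

```shell
# Hypothetical sketch; the real workflow is described in PODCAST-WORKFLOW.md.

# Synthesize one topic's text file to an MP3 using the edge-tts CLI.
# The voice is an assumed default and can be overridden with VOICE.
synthesize_topic() {
  local text_file="$1" out_file="$2"
  edge-tts --file "$text_file" --voice "${VOICE:-en-US-AriaNeural}" --write-media "$out_file"
}

# Write an ffmpeg concat list: one "file '<path>'" line per MP3, in order.
# Paths containing single quotes are not escaped in this sketch.
write_concat_list() {
  local list_file="$1"
  shift
  : > "$list_file"
  local f
  for f in "$@"; do
    printf "file '%s'\n" "$f" >> "$list_file"
  done
}

# Join the listed MP3s into a single file without re-encoding (assumes ffmpeg is installed).
concat_podcast() {
  local list_file="$1" out_file="$2"
  ffmpeg -f concat -safe 0 -i "$list_file" -c copy "$out_file"
}
```

For example, after synthesizing `part1.mp3` and `part2.mp3`, calling `write_concat_list list.txt part1.mp3 part2.mp3` followed by `concat_podcast list.txt podcast.mp3` would produce the single-file variant.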

AI/Human Collaboration

Content in this repository is generated through:

  • Prompts authored by Daniel with AI-generated responses
  • Human-written notes refined by AI assistants
  • Collaborative exploration of ASR/STT concepts

Author

Daniel Rosehill (danielrosehill.com)

With assistance from Claude (Anthropic)