ThoughtPad

February 23, 2025 · View on GitHub

alt text

ThoughtPad is a note-taking utility created for the Linux desktop.

It's a speech-to-text GUI for taking your thoughts and getting them into semi-coherent text!

The app follows a two-stage process to create the cleaned up dictated text entries, both using OpenAI via the Whisper API (API ).

Firstly, speech to text (STT) using Whisper AI. Next, text formatting via GPT 3.5. The optimised transcript can then be quickly downloaded as a markdown file!

Use-Cases

I created this app to streamline a common workflow where I create dictated text that requires light cleaning.

By integrating speech-to-text and text formatting functionalities from the OpenAI API, I developed a single tool to simplify this process. The program also utilizes the LLM for intelligent title generation.

My primary use case for this app is to efficiently record contextual data for a vector database. The app outputs to Markdown, but it is versatile and can be applied to various tasks like diary entries, blog creation using speech-to-text, or any other creative use you can imagine.

Features

Voice recording with real-time waveform visualization
AI-powered transcription using OpenAI's models
Export capabilities to PDF and DocX formats
User-friendly GUI interface
Real-time audio visualization
Cross-platform compatibility

Additional Features

I may get around to adding the following:

User system prompt editing
System prompt library (choose between one of X saved system prompts to edit the text formatting style)

Installation

Clone the repository:

git clone https://github.com/danielrosehill/Thought-Pad.git
cd thoughtpad

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

Run the application:

python app/main.py

Or build an executable:

python build.py

The built executable will be available in the dist directory.

Screenshots & Usage Instructions

The main interface has an audio recording tab, an info tab, and a settings tab. Add your OpenAI API key in Settings.

alt text

Select your microphone source and hit record. Pause if required. Reset to cancel the recording (standalone cancel button to be added!)

When you hit stop, the audio will be sent for transcription to Whisper and it wil appear as "raw text".

alt text

Click "format text"!

This will send the raw text collected in stage 1 to GPT 3.5 with a system prompt instructing the LLM to apply some light touch edits to clean up the captured dictation just a little.

alt text

In a few seconds, you'll get back the formatted text (2) and an autogenerated title (1).

alt text

Click on download and you'll generate a markdown file with both.

You can change/set the default download path in the settings page.

alt text

Dependencies

PyQt6 - GUI framework
OpenAI - For audio transcription
SoundDevice - Audio recording
PyQtGraph - Waveform visualization
FPDF2 - PDF export
python-docx - DocX export

Building from Source

The project includes a build script (build.py) that uses PyInstaller to create standalone executables. Run the build script to generate platform-specific executables.

Screenshots

Screenshots of the application can be found in the screenshots directory.

Built by Daniel Rosehill using Sonnet 3.5