VocalFlow

September 6, 2025

Universal Voice Dictation System with Enhanced Command Processing

Cross-platform voice-to-text with adaptive learning and intelligent voice commands

Production Ready
Unified architecture with real-time voice dictation and intelligent command processing.
Linux is fully tested. Windows/macOS have basic support.

Overview

VocalFlow is a unified voice dictation system that combines real-time speech-to-text transcription with intelligent voice command processing. The system features adaptive learning, context awareness, and supports over 600 natural language command variations. It creates personalized user profiles that adapt to individual speech patterns and usage habits.

Key Features

Real-Time Voice Dictation	High-quality speech-to-text using Whisper models with adaptive voice learning
Intelligent Voice Commands	607+ natural language command variations with regional dialect support
Context Awareness	Automatically adapts to coding, writing, email, and documentation contexts
Auto-Completion System	Smart phrase completion and expansion with learning capabilities
Workflow Learning	Learns usage patterns and provides time-based workflow suggestions
Cross-Platform Support	Linux (fully tested), Windows/macOS (basic support)

Installation

Prerequisites

Python: 3.8+
Operating System: Linux (Ubuntu 18.04+), Windows 10+, macOS 10.14+
Memory: 4GB RAM minimum
Audio: Working microphone

Linux Installation (Recommended)

# Install system dependencies
sudo apt update
sudo apt install -y python3-pip python3-venv portaudio19-dev xdotool xclip gnome-screenshot

# Clone repository
git clone https://github.com/R3DK3LL/VocalFlow.git
cd VocalFlow

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt
pip install pyyaml

# Test installation
python3 -m vocalflow.main --command-status

Windows/macOS Installation

# Clone repository
git clone https://github.com/R3DK3LL/VocalFlow.git
cd VocalFlow

# Create and activate a virtual environment
# (on Windows, use "python" instead of "python3" if python3 is not on PATH)
python3 -m venv venv
# Windows: venv\Scripts\activate
# macOS:   source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
pip install pyyaml

# Test basic functionality
python3 -m vocalflow.main --list-devices

Quick Start

Basic Voice Dictation

# Start dictation with default settings
python3 -m vocalflow.main

# Create named profile
python3 -m vocalflow.main --profile work

# Use clipboard output
python3 -m vocalflow.main --output clipboard

Voice Commands Enabled

# Start with full command processing
python3 -m vocalflow.main --enable-commands

# Check command system status
python3 -m vocalflow.main --command-status

# Disable commands (dictation only)
python3 -m vocalflow.main --disable-commands

Available Voice Commands

Once VocalFlow is running, you can use natural voice commands such as:

Application Control:

"open browser" / "launch browser" / "fire up browser"
"open terminal" / "launch terminal"
"take screenshot" / "capture screen"

System Control:

"volume up" / "increase volume" / "louder"
"volume down" / "decrease volume" / "quieter"
"volume mute" / "mute"

Text Operations:

Speech that is not recognized as a command is typed as ordinary text
Recognized commands are executed instead of being transcribed

Architecture

Unified Processing Pipeline

Microphone → Audio Processing → Whisper Transcription → Enhanced Processing
                                                             ↓
                                               Is it a voice command?
                                                    ↙        ↘
                                          Execute Command    Format & Output Text
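The routing decision in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VocalFlow's code: `route_transcript` and the in-memory `COMMANDS` table are hypothetical stand-ins for the matcher in linguistic/ and the real application launchers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical command table: normalized phrase -> action. VocalFlow's real
# variations live in linguistic/command_variations.yaml.
executed: List[str] = []
COMMANDS: Dict[str, Callable[[], None]] = {
    "open browser": lambda: executed.append("browser"),
    "volume up": lambda: executed.append("volume+"),
}


def route_transcript(text: str, commands: Dict[str, Callable[[], None]]) -> Tuple[str, str]:
    """Hypothetical router: run a matching command, otherwise pass the text through."""
    key = text.strip().lower().rstrip(".!?")  # Whisper output often ends with punctuation
    action = commands.get(key)
    if action is not None:
        action()                              # "Execute Command" branch
        return ("command", key)
    return ("text", text.strip())             # "Format & Output Text" branch


print(route_transcript("Open browser.", COMMANDS))            # ('command', 'open browser')
print(route_transcript("Meeting notes for today", COMMANDS))  # ('text', 'Meeting notes for today')
```

Exact lookup keeps the sketch short; the real system uses fuzzy matching with confidence scores, as described under Command System.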

Core Components

vocalflow/main.py: Core dictation engine with integrated command processing
linguistic_bridge.py: Command recognition and processing logic
enhanced_agents.py: Intelligent agent system with context awareness
linguistic/: Command variations database and matching engine

Intelligent Features

Context Awareness: Automatically detects and adapts to:

Coding contexts (Python, JavaScript, etc.)
Email composition
Documentation writing
Creative writing
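One simple way to implement this kind of detection is to count cue words per context. The sketch below assumes a hypothetical `CONTEXT_CUES` keyword table and `detect_context` function; the actual detector in enhanced_agents.py may rely on different signals.

```python
import re
from typing import Dict, List

# Hypothetical cue words per context; VocalFlow's real detector may differ.
CONTEXT_CUES: Dict[str, List[str]] = {
    "coding": ["def", "function", "class", "import", "return", "variable"],
    "email": ["dear", "regards", "subject", "reply", "attached"],
    "documentation": ["install", "usage", "parameter", "example", "section"],
    "creative": ["once", "story", "character", "chapter", "poem"],
}


def detect_context(text: str, default: str = "general") -> str:
    """Return the context whose cue words occur most often, or `default` if none occur."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = {ctx: sum(words.count(cue) for cue in cues) for ctx, cues in CONTEXT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


print(detect_context("def main function that should return a value"))  # coding
print(detect_context("Dear team, the report is attached. Regards"))     # email
print(detect_context("hello there"))                                    # general
```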

Auto-Completion: Expands common phrases:

"good morning" → "Good morning, I hope you're doing well."
"thank you for" → "Thank you for your time and consideration."
"def main" → "def main():\n    pass"

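A minimal sketch of trigger-based expansion using the three examples above. `expand_phrase` and the `EXPANSIONS` table are hypothetical; the real system also learns new expansions over time, which this sketch does not attempt.

```python
from typing import Dict

# Hypothetical trigger table built from the README examples.
EXPANSIONS: Dict[str, str] = {
    "good morning": "Good morning, I hope you're doing well.",
    "thank you for": "Thank you for your time and consideration.",
    "def main": "def main():\n    pass",
}


def expand_phrase(text: str, table: Dict[str, str] = EXPANSIONS) -> str:
    """Return the expansion for a recognized trigger, else the text unchanged."""
    key = text.strip().lower().rstrip(".!?,")  # ignore case and trailing punctuation
    return table.get(key, text)


print(expand_phrase("Good morning."))  # Good morning, I hope you're doing well.
print(expand_phrase("see you later"))  # see you later
```
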
Workflow Learning: Tracks patterns and suggests improvements based on usage time and frequency.
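Time-based suggestions can be modeled as per-hour usage counts. The `WorkflowTracker` class below is a hypothetical sketch of that idea, not VocalFlow's implementation; the `min_count` cutoff is an illustrative assumption.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import DefaultDict, Optional


class WorkflowTracker:
    """Hypothetical tracker: counts which commands are used in each hour of the day."""

    def __init__(self) -> None:
        self._by_hour: DefaultDict[int, Counter] = defaultdict(Counter)

    def record(self, command: str, when: datetime) -> None:
        self._by_hour[when.hour][command] += 1

    def suggest(self, when: datetime, min_count: int = 3) -> Optional[str]:
        """Suggest this hour's most frequent command once it has been seen often enough."""
        counts = self._by_hour.get(when.hour)  # .get() does not create an empty entry
        if not counts:
            return None
        command, count = counts.most_common(1)[0]
        return command if count >= min_count else None


tracker = WorkflowTracker()
for day in (1, 2, 3):
    tracker.record("open terminal", datetime(2025, 9, day, 9, 5))
print(tracker.suggest(datetime(2025, 9, 4, 9, 30)))  # open terminal
print(tracker.suggest(datetime(2025, 9, 4, 14, 0)))  # None
```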

User Profiles

Profile Management

# List available profiles
python3 -m vocalflow.main --list-profiles

# Create new profile
python3 -m vocalflow.main --profile newuser

# Use existing profile
python3 -m vocalflow.main --profile work

Profile Features

Profiles automatically learn and adapt:

Voice characteristics and audio thresholds
Common vocabulary and corrections
Command usage patterns
Context preferences

Each profile's learning data is stored separately under ~/.vocalflow/profiles/.
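The on-disk profile format is not documented here. The sketch below assumes, purely for illustration, one JSON file per profile under ~/.vocalflow/profiles/; the file naming, the helper names, and the field names (`energy_threshold`, `corrections`, `command_counts`) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Hypothetical layout: one JSON file per profile. VocalFlow's real format may differ.
DEFAULT_ROOT = Path.home() / ".vocalflow" / "profiles"


def profile_path(name: str, root: Path = DEFAULT_ROOT) -> Path:
    return root / f"{name}.json"


def load_profile(name: str, root: Path = DEFAULT_ROOT) -> Dict[str, Any]:
    """Load a saved profile, or return an empty one for a new name."""
    path = profile_path(name, root)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    # Illustrative fields only: audio threshold, corrections, command usage
    return {"name": name, "energy_threshold": None, "corrections": {}, "command_counts": {}}


def save_profile(profile: Dict[str, Any], root: Path = DEFAULT_ROOT) -> Path:
    """Write a profile to disk, creating the profiles directory if needed."""
    path = profile_path(profile["name"], root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(profile, indent=2), encoding="utf-8")
    return path


# Demo in a temporary directory so the real ~/.vocalflow is untouched
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "profiles"
    profile = load_profile("work", root)
    profile["corrections"]["vocal flow"] = "VocalFlow"
    saved = save_profile(profile, root)
    reloaded = load_profile("work", root)

print(saved.name, reloaded["corrections"])  # work.json {'vocal flow': 'VocalFlow'}
```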

Command System

Natural Language Processing

The system recognizes commands using:

607 command variations across multiple categories
Regional dialects: UK, US, Australian/New Zealand
Natural speech patterns: filler words and polite prefixes are tolerated
Confidence scoring: matches below a confidence threshold are rejected
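The normalization and scoring steps can be approximated with the standard library. This sketch assumes hypothetical filler and polite-prefix lists and uses difflib.SequenceMatcher as a stand-in similarity score; enhanced_command_matcher.py may score matches differently.

```python
import difflib
import re
from typing import Iterable, Optional, Tuple

# Hypothetical lists; VocalFlow's real filler and prefix handling is not shown here.
FILLERS = {"um", "uh", "erm", "er", "ah"}
POLITE_PREFIXES = ("could you please", "can you please", "please", "could you", "can you")


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation and fillers, then strip one polite prefix."""
    text = re.sub(r"[^\w\s]", "", utterance.lower())
    text = " ".join(w for w in text.split() if w not in FILLERS)
    for prefix in POLITE_PREFIXES:  # longer prefixes are listed first
        if text.startswith(prefix + " "):
            text = text[len(prefix) + 1:]
            break
    return text


def match_command(utterance: str, variations: Iterable[str],
                  threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (best variation, confidence) if the confidence reaches the threshold."""
    text = normalize(utterance)
    best, best_score = None, 0.0
    for variation in variations:
        score = difflib.SequenceMatcher(None, text, variation).ratio()
        if score > best_score:
            best, best_score = variation, score
    return (best, best_score) if best is not None and best_score >= threshold else None


print(normalize("Um, could you please open the browser?"))                        # open the browser
print(match_command("Uh, please open browser", ["open browser", "open terminal"]))  # ('open browser', 1.0)
```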

Command Categories

Browser Operations: Open/close web browsers with intelligent application discovery
System Control: Volume, brightness, screenshot functionality
File Operations: Create, save, open files with context awareness
Navigation: Scroll, page navigation, window switching
Text Editing: Select, copy, paste, undo operations

Adaptive Learning

When commands aren't recognized, the system:

Analyzes failed attempts for learning opportunities
Suggests appropriate command categories
Provides confidence scores for suggestions
Learns from user corrections over time
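A minimal sketch of the suggestion and correction steps, assuming a hypothetical category map and difflib similarity as the confidence score. `suggest_categories` and `learn_correction` are illustrative names, not VocalFlow functions.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical category -> variations map, in the spirit of command_variations.yaml.
CATEGORIES: Dict[str, List[str]] = {
    "browser_operations": ["open browser", "launch browser", "close browser"],
    "system_control": ["volume up", "volume down", "take screenshot"],
    "text_editing": ["copy that", "paste", "undo"],
}


def suggest_categories(failed: str, categories: Dict[str, List[str]] = CATEGORIES,
                       top_n: int = 2) -> List[Tuple[str, float]]:
    """Rank categories by their best fuzzy similarity to an unrecognized utterance."""
    text = failed.lower().strip()
    scored = []
    for name, variations in categories.items():
        best = max(difflib.SequenceMatcher(None, text, v).ratio() for v in variations)
        scored.append((name, round(best, 2)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def learn_correction(categories: Dict[str, List[str]], category: str, utterance: str) -> None:
    """Record a user-confirmed phrasing so it matches directly next time."""
    variations = categories.setdefault(category, [])
    phrase = utterance.lower().strip()
    if phrase not in variations:
        variations.append(phrase)


print(suggest_categories("open the browsr"))  # browser_operations should rank first

my_categories = {name: list(v) for name, v in CATEGORIES.items()}  # copy; keep defaults intact
learn_correction(my_categories, "browser_operations", "Fire up the web browser")
print(my_categories["browser_operations"][-1])  # fire up the web browser
```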

Configuration Options

Command Line Options

# Profile management
--profile NAME          Use a specific voice profile (created if it does not exist)
--list-profiles         Show available profiles

# Command processing
--enable-commands       Enable voice commands (default)
--disable-commands      Dictation-only mode
--command-status        Show command system status

# Audio settings
--audio-device ID       Use a specific audio device
--list-devices          Show available audio devices

# Output options
--output METHOD         Output method: clipboard, xdotool, or wintype
--model SIZE            Whisper model size: tiny, base, small, medium, or large
--language CODE         Recognition language code (en, es, etc.)

# Debugging
--verbose               Enable detailed logging

Advanced Configuration

The system supports extensive customization through:

Custom command variations in YAML format
Configurable confidence thresholds
Context-specific vocabulary
Learning rate adjustments

Troubleshooting

Common Issues

Audio Device Problems:

# List available devices
python3 -m vocalflow.main --list-devices

# Test specific device
python3 -m vocalflow.main --audio-device 0

Command Recognition Issues:

# Check command system
python3 -m vocalflow.main --command-status

# Test with verbose logging
python3 -m vocalflow.main --verbose

Missing Dependencies:

# Linux: Install system packages
sudo apt install portaudio19-dev xdotool xclip gnome-screenshot

# Python: Install missing packages
pip install pyyaml numpy sounddevice faster-whisper pyperclip

Performance Optimization

Use smaller Whisper models for faster processing
Adjust audio device buffer sizes
Configure appropriate confidence thresholds
Use GPU acceleration when available
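Buffer size trades latency against per-callback overhead. Whisper models operate on 16 kHz audio, so one buffer of N frames covers N / 16000 seconds. The helpers below are a hypothetical sketch of that arithmetic; sounddevice exposes the buffer size through the `blocksize` argument of its streams, but how VocalFlow sets it is not shown here.

```python
WHISPER_SAMPLE_RATE = 16_000  # Whisper models operate on 16 kHz audio


def buffer_latency_ms(blocksize: int, samplerate: int = WHISPER_SAMPLE_RATE) -> float:
    """Duration of audio held in one buffer of `blocksize` frames, in milliseconds."""
    return 1000.0 * blocksize / samplerate


def blocksize_for_latency(target_ms: float, samplerate: int = WHISPER_SAMPLE_RATE) -> int:
    """Frames per buffer needed to hit a target latency (at least one frame)."""
    return max(1, round(samplerate * target_ms / 1000.0))


for frames in (512, 1024, 4096):
    print(frames, "frames ->", buffer_latency_ms(frames), "ms")
# 512 frames -> 32.0 ms
# 1024 frames -> 64.0 ms
# 4096 frames -> 256.0 ms
```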

Development

Project Structure

VocalFlow/
├── vocalflow/              # Core dictation system
│   ├── main.py            # Main application with integrated processing
│   └── ...
├── linguistic/            # Command recognition system
│   ├── command_variations.yaml    # 607+ command variations
│   └── enhanced_command_matcher.py
├── enhanced_agents.py      # Complete intelligent agent system
├── linguistic_bridge.py   # Integration bridge
└── requirements.txt       # Python dependencies

Adding Custom Commands

Extend the system by modifying linguistic/command_variations.yaml:

command_categories:
  custom_operations:
    my_command:
      variations:
        - "custom phrase"
        - "alternative phrase" 
      base_regex: "(custom|alternative).*phrase"
      confidence_threshold: 0.7
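To illustrate how the three fields of an entry could interact, here is a hypothetical matcher sketch. It works on the dictionary that yaml.safe_load would return for the snippet above (inlined so the example needs only the standard library), tries `base_regex` first, falls back to fuzzy matching against `variations`, and accepts a match only at or above `confidence_threshold`. Treating a regex hit as confidence 1.0 is an assumption, not documented behavior.

```python
import difflib
import re
from typing import Any, Dict, Optional, Tuple

# Mirrors what yaml.safe_load() would return for the YAML entry above.
CONFIG: Dict[str, Any] = {
    "command_categories": {
        "custom_operations": {
            "my_command": {
                "variations": ["custom phrase", "alternative phrase"],
                "base_regex": "(custom|alternative).*phrase",
                "confidence_threshold": 0.7,
            }
        }
    }
}


def match(text: str, config: Dict[str, Any] = CONFIG) -> Optional[Tuple[str, str, float]]:
    """Hypothetical matcher: return (category, command, confidence) for the best match."""
    spoken = text.lower().strip()
    best = None
    for category, commands in config["command_categories"].items():
        for name, spec in commands.items():
            if re.search(spec["base_regex"], spoken):
                score = 1.0  # assumption: a regex hit counts as full confidence
            else:
                score = max(difflib.SequenceMatcher(None, spoken, v).ratio()
                            for v in spec["variations"])
            if score >= spec.get("confidence_threshold", 0.8) and (best is None or score > best[2]):
                best = (category, name, score)
    return best


print(match("Custom phrase"))  # ('custom_operations', 'my_command', 1.0)
print(match("open browser"))   # None
```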

API Usage

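# enhanced_agents.py sits at the repository root, so run this from the VocalFlow/ directory
from vocalflow.main import VocalFlowSystem
from enhanced_agents import EnhancedVocalFlowAgent

# Create the full system with voice commands enabled and start it
system = VocalFlowSystem(enable_commands=True)
system.start()

# Or use the agent directly to process a single transcript
agent = EnhancedVocalFlowAgent()
result = agent.process_text_enhanced("open browser")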

Contributing

Contributions are welcome, particularly for:

Windows/macOS platform improvements
Additional command variations and languages
Performance optimizations
Documentation enhancements

Development Setup

git clone https://github.com/R3DK3LL/VocalFlow.git
cd VocalFlow
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install pyyaml

# Run basic checks
python3 -m vocalflow.main --command-status
python3 -m vocalflow.main --verbose --disable-commands

System Requirements

Minimum Requirements

Python 3.8+
4GB RAM
Audio input device
2GB disk space (including Whisper models)

Recommended Requirements

Python 3.10+
8GB RAM
Quality microphone
GPU for faster processing
Linux environment for full feature support

License

MIT License - see LICENSE file for details.

Acknowledgments

faster-whisper for efficient speech recognition
sounddevice for cross-platform audio
Computational linguistics research for natural language processing techniques

A production-ready voice dictation system with intelligent command processing

