VocalFlow
September 6, 2025
Universal Voice Dictation System with Enhanced Command Processing
Cross-platform voice-to-text with adaptive learning and intelligent voice commands
Production Ready
Unified architecture with real-time voice dictation and intelligent command processing.
Linux is fully tested. Windows/macOS have basic support.
Overview
VocalFlow is a unified voice dictation system that combines real-time speech-to-text transcription with intelligent voice command processing. The system features adaptive learning, context awareness, and supports over 600 natural language command variations. It creates personalized user profiles that adapt to individual speech patterns and usage habits.
Key Features
| Feature | Description |
| --- | --- |
| Real-Time Voice Dictation | High-quality speech-to-text using Whisper models with adaptive voice learning |
| Intelligent Voice Commands | 607+ natural language command variations with regional dialect support |
| Context Awareness | Automatically adapts to coding, writing, email, and documentation contexts |
| Auto-Completion System | Smart phrase completion and expansion with learning capabilities |
| Workflow Learning | Learns usage patterns and provides time-based workflow suggestions |
| Cross-Platform Support | Linux (fully tested), Windows/macOS (basic support) |
Installation
Prerequisites
- Python: 3.8+
- Operating System: Linux (Ubuntu 18.04+), Windows 10+, macOS 10.14+
- Memory: 4GB RAM minimum
- Audio: Working microphone
Linux Installation (Recommended)
# Install system dependencies
sudo apt update
sudo apt install -y python3-pip python3-venv portaudio19-dev xdotool xclip gnome-screenshot
# Clone repository
git clone https://github.com/R3DK3LL/VocalFlow.git
cd VocalFlow
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
pip install pyyaml
# Test installation
python3 -m vocalflow.main --command-status
Windows/macOS Installation
# Clone repository
git clone https://github.com/R3DK3LL/VocalFlow.git
cd VocalFlow
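# Create virtual environment (on Windows, use `python` or `py` if `python3` is not on PATH)
python3 -m venv venv
# Activate it, using the line for your platform:
# Windows: venv\Scripts\activate
# macOS: source venv/bin/activate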
# Install dependencies
pip install -r requirements.txt
pip install pyyaml
# Test basic functionality
python3 -m vocalflow.main --list-devices
Quick Start
Basic Voice Dictation
# Start dictation with default settings
python3 -m vocalflow.main
# Use a named profile (created on first use)
python3 -m vocalflow.main --profile work
# Use clipboard output
python3 -m vocalflow.main --output clipboard
Voice Commands Enabled
# Start with full command processing
python3 -m vocalflow.main --enable-commands
# Check command system status
python3 -m vocalflow.main --command-status
# Disable commands for dictation only
python3 -m vocalflow.main --disable-commands
Available Voice Commands
Once running, you can use natural voice commands:
Application Control:
- "open browser" / "launch browser" / "fire up browser"
- "open terminal" / "launch terminal"
- "take screenshot" / "capture screen"
System Control:
- "volume up" / "increase volume" / "louder"
- "volume down" / "decrease volume" / "quieter"
- "volume mute" / "mute"
Text Operations:
- Natural dictation automatically appears as typed text
- Commands are executed instead of being transcribed
Architecture
Unified Processing Pipeline
Microphone → Audio Processing → Whisper Transcription → Enhanced Processing
                                                                 ↓
                                                      Is it a voice command?
                                                          ↙             ↘
                                             Execute Command        Format & Output Text
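The branch at the end of the pipeline can be sketched in a few lines. This is a minimal illustration, not VocalFlow's actual code: `route_transcript`, `format_text`, `COMMANDS`, and the handlers are hypothetical names, and an exact-string lookup stands in for the real matching engine.

```python
from typing import Callable, Dict, List

# Records which hypothetical handlers ran, so the example has a visible effect.
executed: List[str] = []

# Hypothetical command table; the real variations live in linguistic/command_variations.yaml.
COMMANDS: Dict[str, Callable[[], None]] = {
    "open browser": lambda: executed.append("browser"),
    "take screenshot": lambda: executed.append("screenshot"),
}


def format_text(text: str) -> str:
    """Capitalize the first letter and make sure the text ends with punctuation."""
    text = text.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".!?" else text + "."


def route_transcript(text: str, enable_commands: bool = True) -> str:
    """Run the matching command, or return formatted text for output."""
    key = text.strip().lower()
    if enable_commands and key in COMMANDS:
        COMMANDS[key]()           # command branch: execute, emit no text
        return ""
    return format_text(text)      # dictation branch: format and output


print(repr(route_transcript("open browser")))
print(repr(route_transcript("hello world")))
print(repr(route_transcript("open browser", enable_commands=False)))
```

With commands disabled, the same utterance falls through to the dictation branch and is returned as text, which mirrors the `--disable-commands` mode.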
Core Components
- vocalflow/main.py: Core dictation engine with integrated command processing
- linguistic_bridge.py: Command recognition and processing logic
- enhanced_agents.py: Intelligent agent system with context awareness
- linguistic/: Command variations database and matching engine
Intelligent Features
Context Awareness: Automatically detects and adapts to:
- Coding contexts (Python, JavaScript, etc.)
- Email composition
- Documentation writing
- Creative writing
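How the detector decides between these contexts is not described here. One simple way to picture it is keyword counting, sketched below. The hint words, the `detect_context` function, and the fallback to creative writing are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical hint words per context; the real detector's signals are not documented here.
CONTEXT_HINTS = {
    "coding": {"def", "function", "import", "class", "return", "variable"},
    "email": {"dear", "regards", "sincerely", "subject", "attached"},
    "documentation": {"section", "parameter", "example", "usage", "installation"},
}


def detect_context(text: str, default: str = "creative") -> str:
    """Pick the context whose hint words appear most often; fall back to the default."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = Counter()
    for context, hints in CONTEXT_HINTS.items():
        scores[context] = sum(1 for word in words if word in hints)
    context, hits = scores.most_common(1)[0]
    return context if hits > 0 else default


print(detect_context("Dear team, the report is attached. Regards"))
print(detect_context("once upon a time"))
```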
Auto-Completion: Expands common phrases:
- "good morning" → "Good morning, I hope you're doing well."
- "thank you for" → "Thank you for your time and consideration."
- "def main" → "def main():\n pass"
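The expansions listed above behave like a lookup table from trigger phrase to full text. The sketch below assumes exact matching after normalizing case and whitespace; `EXPANSIONS` and `expand_phrase` are hypothetical names, and the learning side of the feature is not modeled.

```python
# Expansion table built from the three examples above; the lookup logic is an assumption.
EXPANSIONS = {
    "good morning": "Good morning, I hope you're doing well.",
    "thank you for": "Thank you for your time and consideration.",
    "def main": "def main():\n    pass",  # four-space indent assumed
}


def expand_phrase(text: str) -> str:
    """Return the stored expansion for a trigger phrase, or the text unchanged."""
    key = " ".join(text.lower().split())  # normalize case and whitespace
    return EXPANSIONS.get(key, text)


print(expand_phrase("Good  Morning"))
print(expand_phrase("see you later"))
```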
Workflow Learning: Tracks patterns and suggests improvements based on usage time and frequency.
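A time-based suggestion can be as simple as counting which command is used most at a given hour. The sketch below shows that idea only; `UsageTracker`, its methods, and the `min_count` cutoff are hypothetical and say nothing about how VocalFlow stores or weighs its usage data.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class UsageTracker:
    """Counts commands per hour of day and suggests the most frequent one."""

    def __init__(self) -> None:
        self._by_hour: Dict[int, Counter] = defaultdict(Counter)

    def record(self, command: str, hour: int) -> None:
        """Note that a command was used during the given hour (0-23)."""
        self._by_hour[hour][command] += 1

    def suggest(self, hour: int, min_count: int = 2) -> Optional[str]:
        """Return the top command for this hour once it was seen min_count times."""
        counts = self._by_hour.get(hour)
        if not counts:
            return None
        command, count = counts.most_common(1)[0]
        return command if count >= min_count else None


tracker = UsageTracker()
tracker.record("open terminal", hour=9)
tracker.record("open terminal", hour=9)
tracker.record("open browser", hour=9)
print(tracker.suggest(9))
print(tracker.suggest(14))
```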
User Profiles
Profile Management
# List available profiles
python3 -m vocalflow.main --list-profiles
# Create new profile
python3 -m vocalflow.main --profile newuser
# Use existing profile
python3 -m vocalflow.main --profile work
Profile Features
Profiles automatically learn and adapt:
- Voice characteristics and audio thresholds
- Common vocabulary and corrections
- Command usage patterns
- Context preferences
Profiles are stored in ~/.vocalflow/profiles/ with individual learning data.
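To inspect that directory from Python, a small helper like the one below is enough. It assumes each profile is a single entry directly under the profiles directory; the on-disk format is not specified here, so the `.json` files in the demo are placeholders and `list_profiles` is a hypothetical helper, not part of VocalFlow.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def list_profiles(root: Optional[Path] = None) -> List[str]:
    """Return sorted entry names under the profiles directory (empty if it is missing)."""
    root = root or Path.home() / ".vocalflow" / "profiles"
    if not root.is_dir():
        return []
    return sorted(entry.stem for entry in root.iterdir())


# Demo against a temporary directory so the example does not touch real profiles.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "work.json").touch()      # placeholder file name and extension
    (Path(tmp) / "newuser.json").touch()
    demo = list_profiles(Path(tmp))
print(demo)
```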
Command System
Natural Language Processing
The system recognizes commands using:
- 607+ command variations across multiple categories
- Regional dialects: UK, US, Australian/New Zealand
- Natural speech patterns: including fillers, polite prefixes
- Confidence scoring: ensures accurate command recognition
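The combination of filler stripping and confidence scoring can be sketched with the standard library. This is an illustration under stated assumptions: the `VARIATIONS` subset, the `NOISE` pattern, the 0.7 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all stand-ins, not the project's matching engine.

```python
import difflib
import re
from typing import Optional, Tuple

# Hypothetical subset of variations; the real list lives in linguistic/command_variations.yaml.
VARIATIONS = {
    "open browser": "open_browser",
    "launch browser": "open_browser",
    "fire up browser": "open_browser",
    "volume up": "volume_up",
    "louder": "volume_up",
}

# Fillers and polite prefixes to strip before matching (illustrative list).
NOISE = re.compile(r"\b(um+|uh+|please|could you|can you|would you)\b")


def normalize(utterance: str) -> str:
    """Lowercase, drop fillers and polite prefixes, strip punctuation and extra spaces."""
    cleaned = NOISE.sub(" ", utterance.lower())
    cleaned = re.sub(r"[^a-z0-9\s]", " ", cleaned)
    return " ".join(cleaned.split())


def match_command(utterance: str, threshold: float = 0.7) -> Tuple[Optional[str], float]:
    """Return (command_id, confidence); command_id is None below the threshold."""
    text = normalize(utterance)
    best_id, best_score = None, 0.0
    for phrase, command_id in VARIATIONS.items():
        score = difflib.SequenceMatcher(None, text, phrase).ratio()
        if score > best_score:
            best_id, best_score = command_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


print(match_command("Um, could you please open the browser?"))
print(match_command("what time is it"))
```

Returning the score even when nothing matches leaves room for the caller to log near misses, which is the kind of data the adaptive learning step would need.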
Command Categories
Browser Operations: Open/close web browsers with intelligent application discovery
System Control: Volume, brightness, screenshot functionality
File Operations: Create, save, open files with context awareness
Navigation: Scroll, page navigation, window switching
Text Editing: Select, copy, paste, undo operations
Adaptive Learning
When commands aren't recognized, the system:
- Analyzes failed attempts for learning opportunities
- Suggests appropriate command categories
- Provides confidence scores for suggestions
- Learns from user corrections over time
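The steps above can be pictured as ranking known phrases by similarity and remembering what the user confirms. The sketch below is hypothetical throughout: `KNOWN`, `suggest`, `learn_correction`, and `resolve` are illustrative names, and `difflib` similarity is an assumed stand-in for the system's confidence score.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical phrase-to-category table using categories named in this README.
KNOWN = {
    "open browser": "Browser Operations",
    "volume up": "System Control",
    "take screenshot": "System Control",
    "copy": "Text Editing",
}

learned: Dict[str, str] = {}  # corrections confirmed by the user


def suggest(failed: str, limit: int = 2) -> List[Tuple[str, str, float]]:
    """Return (phrase, category, confidence) for the closest known phrases."""
    scored = [
        (phrase, category, difflib.SequenceMatcher(None, failed.lower(), phrase).ratio())
        for phrase, category in KNOWN.items()
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:limit]


def learn_correction(failed: str, intended: str) -> None:
    """Remember that a failed utterance should map to an existing phrase."""
    if intended in KNOWN:
        learned[failed.lower()] = intended


def resolve(utterance: str) -> str:
    """Apply a learned correction if one exists, otherwise return the input."""
    return learned.get(utterance.lower(), utterance)


print(suggest("open browse")[0][:2])
learn_correction("open the net", "open browser")
print(resolve("Open the net"))
```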
Configuration Options
Command Line Options
# Profile management
--profile NAME        Use specific voice profile
--list-profiles       Show available profiles
# Command processing
--enable-commands     Enable voice commands (default)
--disable-commands    Dictation-only mode
--command-status      Show command system status
# Audio settings
--audio-device ID     Use specific audio device
--list-devices        Show available audio devices
# Output options
--output METHOD       Output method: clipboard, xdotool, wintype
--model SIZE          Whisper model size: tiny, base, small, medium, large
--language CODE       Language for recognition (en, es, etc.)
# Debugging
--verbose             Enable detailed logging
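For scripts that wrap VocalFlow, the flag set above maps naturally onto `argparse`. The parser below is a sketch that mirrors the documented options; it is not the parser in `vocalflow/main.py`. The integer type for `--audio-device`, the shared `commands` destination, and the absence of defaults for `--output`, `--model`, and `--language` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command line options."""
    parser = argparse.ArgumentParser(prog="vocalflow")

    # Profile management
    parser.add_argument("--profile", metavar="NAME", help="Use specific voice profile")
    parser.add_argument("--list-profiles", action="store_true", help="Show available profiles")

    # Command processing: the two switches share one destination, enabled by default
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--enable-commands", dest="commands", action="store_true",
                       default=True, help="Enable voice commands (default)")
    group.add_argument("--disable-commands", dest="commands", action="store_false",
                       default=True, help="Dictation-only mode")
    parser.add_argument("--command-status", action="store_true",
                        help="Show command system status")

    # Audio settings (integer device ID is an assumption)
    parser.add_argument("--audio-device", metavar="ID", type=int,
                        help="Use specific audio device")
    parser.add_argument("--list-devices", action="store_true",
                        help="Show available audio devices")

    # Output options
    parser.add_argument("--output", metavar="METHOD",
                        choices=["clipboard", "xdotool", "wintype"])
    parser.add_argument("--model", metavar="SIZE",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--language", metavar="CODE",
                        help="Language for recognition (en, es, etc.)")

    # Debugging
    parser.add_argument("--verbose", action="store_true", help="Enable detailed logging")
    return parser


args = build_parser().parse_args(["--profile", "work", "--disable-commands", "--model", "tiny"])
print(args.profile, args.commands, args.model)
```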
Advanced Configuration
The system supports extensive customization through:
- Custom command variations in YAML format
- Configurable confidence thresholds
- Context-specific vocabulary
- Learning rate adjustments
Troubleshooting
Common Issues
Audio Device Problems:
# List available devices
python3 -m vocalflow.main --list-devices
# Test specific device
python3 -m vocalflow.main --audio-device 0
Command Recognition Issues:
# Check command system
python3 -m vocalflow.main --command-status
# Test with verbose logging
python3 -m vocalflow.main --verbose
Missing Dependencies:
# Linux: Install system packages
sudo apt install portaudio19-dev xdotool xclip gnome-screenshot
# Python: Install missing packages
pip install pyyaml numpy sounddevice faster-whisper pyperclip
Performance Optimization
- Use smaller Whisper models for faster processing
- Adjust audio device buffer sizes
- Configure appropriate confidence thresholds
- Use GPU acceleration when available
Development
Project Structure
VocalFlow/
├── vocalflow/                    # Core dictation system
│   ├── main.py                   # Main application with integrated processing
│   └── ...
├── linguistic/                   # Command recognition system
│   ├── command_variations.yaml   # 607+ command variations
│   └── enhanced_command_matcher.py
├── enhanced_agents.py            # Complete intelligent agent system
├── linguistic_bridge.py          # Integration bridge
└── requirements.txt              # Python dependencies
Adding Custom Commands
Extend the system by modifying linguistic/command_variations.yaml:
command_categories:
  custom_operations:
    my_command:
      variations:
        - "custom phrase"
        - "alternative phrase"
      base_regex: "(custom|alternative).*phrase"
      confidence_threshold: 0.7
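To get a feel for how such an entry might be evaluated, the sketch below checks an utterance against the three fields. How VocalFlow actually combines `base_regex` and `confidence_threshold` is not documented here, so this is one plausible reading: a regex hit counts as a match, and otherwise the closest variation must reach the threshold. `ENTRY` mirrors the YAML as a plain dict (the shape `yaml.safe_load` would give for `my_command`), and `matches_entry` is a hypothetical helper.

```python
import difflib
import re

# Python dict mirroring the my_command entry from the YAML above.
ENTRY = {
    "variations": ["custom phrase", "alternative phrase"],
    "base_regex": "(custom|alternative).*phrase",
    "confidence_threshold": 0.7,
}


def matches_entry(utterance: str, entry: dict) -> bool:
    """True if the utterance hits the regex or is close enough to a listed variation."""
    text = utterance.lower().strip()
    if re.search(entry["base_regex"], text):
        return True
    best = max(
        difflib.SequenceMatcher(None, text, variation).ratio()
        for variation in entry["variations"]
    )
    return best >= entry["confidence_threshold"]


print(matches_entry("my custom little phrase", ENTRY))
print(matches_entry("custom phras", ENTRY))
print(matches_entry("open browser", ENTRY))
```

Trying a few utterances against a new entry this way, before adding it to the YAML file, can show whether the regex is too broad or the threshold too strict.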
API Usage
from vocalflow.main import VocalFlowSystem
from enhanced_agents import EnhancedVocalFlowAgent
# Create system with enhanced commands
system = VocalFlowSystem(enable_commands=True)
system.start()
# Or use agent directly
agent = EnhancedVocalFlowAgent()
result = agent.process_text_enhanced("open browser")
Contributing
Contributions are welcome, particularly for:
- Windows/macOS platform improvements
- Additional command variations and languages
- Performance optimizations
- Documentation enhancements
Development Setup
git clone https://github.com/R3DK3LL/VocalFlow.git
cd VocalFlow
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install pyyaml
# Smoke-test the setup
python3 -m vocalflow.main --command-status
python3 -m vocalflow.main --verbose --disable-commands
System Requirements
Minimum Requirements
- Python 3.8+
- 4GB RAM
- Audio input device
- 2GB disk space (including Whisper models)
Recommended Requirements
- Python 3.10+
- 8GB RAM
- Quality microphone
- GPU for faster processing
- Linux environment for full feature support
License
MIT License - see LICENSE file for details.
Acknowledgments
- faster-whisper for efficient speech recognition
- sounddevice for cross-platform audio
- Computational linguistics research for natural language processing techniques
A production-ready voice dictation system with intelligent command processing