handy
January 9, 2025 ยท View on GitHub
A powerful voice-controlled productivity tool that combines speech recognition with AI assistance and keyboard automation. This tool allows users to control their computer, generate code, and interact with AI using voice commands triggered by keyboard shortcuts.
Features
- ๐๏ธ Local transcription using MLX Whisper
- ๐ค AI-powered command execution and responses using OpenRouter (Claude)
- ๐ป Code generation through voice commands
- โจ๏ธ Keyboard shortcut automation
- ๐ Direct text input from voice
- ๐ Smart context awareness using clipboard
Keyboard Shortcuts
- CTRL + SHIFT (Left): Execute voice commands for keyboard shortcuts
- CTRL + CMD (Right): Transcribe voice to text
- SHIFT + ALT (Left): Get AI assistance with context awareness
- CTRL + ALT + CMD (Left): Generate code from voice input with context support
Requirements
openai
pynput
sounddevice
mlx_whisper
pydantic
pyperclip
numpy
python-dotenv
Environment Variables
The following environment variables need to be set:
OPENROUTER_API_KEY=your_openrouter_api_key
Installation
- Clone the repository:
git clone [repository-url]
- Install dependencies:
uv pip install -r requirements.txt
- Set up environment variables as described above.
Usage
- Run the main script:
uv run handy.py
- Use keyboard shortcuts to activate different modes:
- Hold the designated key combination
- Speak your command
- Release the keys to process the command
Examples
-
Code Generation:
- Hold
CTRL + ALT + CMD(Left) - Say "create a Python function to sort a list"
- Release keys to get the generated code
- Hold
-
AI Assistance:
- Hold
SHIFT + ALT(Left) - Select text for context (optional)
- Ask your question
- Release to get AI response
- Hold
-
Voice Transcription:
- Hold
CTRL + CMD(Right) - Speak your text
- Release to transcribe
- Hold
-
Keyboard Commands:
- Hold
CTRL + SHIFT(Left) - Say "copy" or "paste" or other keyboard shortcuts
- Release to execute the command
- Hold
Architecture
AudioRecorder: Handles real-time audio recording and processingKeyboardShortcut: Manages keyboard combinations and actionsContextManager: Handles clipboard-based context awareness- AI Integration: Uses OpenRouter with Claude for intelligent responses
- MLX Whisper: Provides fast and accurate speech-to-text conversion
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgments
- MLX Whisper for speech recognition
- OpenRouter and Claude for AI capabilities
- The open-source community for various dependencies