VoiceForge AI

Dummy model design of our idea

Inspiration

I was inspired by my own frustration with repetitive PC tasks—opening multiple applications, navigating settings, managing files, and even uninstalling unwanted software. I noticed that existing voice assistants could only perform basic commands and lacked deep system integration or Hindi support. I wanted to build something that offers true hands-free PC mastery and empowers users of all abilities.

What it does

VoiceForge AI listens to your voice in Hindi or English, converts your command into text, understands the intent and context, and then executes multi-step workflows on your Windows PC. For example, you can say:

“Minecraft game ko delete kar mere PC se” ("Delete the Minecraft game from my PC")
VoiceForge AI then navigates Settings → Apps → Minecraft → Uninstall, handles permission prompts, and confirms the task, all without you lifting a finger. Other capabilities include:

Launching or closing any application

Organizing files and folders

Sending emails with attachments

Running system checks and maintenance tasks

Controlling media playback

Custom multi-step macros you define
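To illustrate how a spoken command could turn into a multi-step workflow like the uninstall example above, here is a minimal sketch. The `Step` and `Workflow` classes and the `build_uninstall_workflow` helper are hypothetical names chosen for illustration, and the code only describes the steps instead of driving the real Windows UI.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One UI action in a workflow (hypothetical structure)."""
    action: str  # e.g. "open", "select", "click", "confirm"
    target: str  # app name or UI element label


@dataclass
class Workflow:
    """An ordered list of steps that together complete one command."""
    name: str
    steps: list = field(default_factory=list)

    def describe(self):
        # Return a human-readable dry run instead of touching the UI.
        return [f"{s.action} {s.target}" for s in self.steps]


def build_uninstall_workflow(app_name):
    # Mirrors the Settings -> Apps -> <app> -> Uninstall path from the example.
    return Workflow(
        name=f"uninstall:{app_name}",
        steps=[
            Step("open", "Settings"),
            Step("open", "Apps"),
            Step("select", app_name),
            Step("click", "Uninstall"),
            Step("confirm", "Uninstall"),
        ],
    )


if __name__ == "__main__":
    for line in build_uninstall_workflow("Minecraft").describe():
        print(line)
```

Keeping steps as plain data like this would let a custom macro be stored, reviewed, or replayed before anything is executed on the machine.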

How we built it

Speech Recognition: Python SpeechRecognition library with Google Speech API fallback + Whisper for offline mode
Natural Language Processing: OpenAI GPT-3.5 fine-tuned for intent recognition and context handling
System Control: Windows UI Automation via pywinauto, plus PyAutoGUI for mouse/keyboard simulation
Screen Recognition: OpenCV + Tesseract OCR to read UI elements and click buttons in any app
Local Storage & Caching: SQLite for command history, Redis for fast in-memory caching
UI: PyQt6 for a modern desktop interface with real-time waveform visualization
Security & Privacy: Option for full offline processing, encrypted local data storage
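As one concrete piece of the stack above, the SQLite command history could look roughly like the sketch below. The table name `command_history`, its columns, and the helper functions are assumptions made for illustration; the example uses only Python's built-in `sqlite3` module and an in-memory database.

```python
import sqlite3
import time


def open_history(path=":memory:"):
    """Open (or create) the command history store. Schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS command_history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "spoken_text TEXT NOT NULL, "
        "intent TEXT NOT NULL, "
        "created_at REAL NOT NULL)"
    )
    return conn


def record_command(conn, spoken_text, intent):
    """Store one recognized command together with its detected intent."""
    conn.execute(
        "INSERT INTO command_history (spoken_text, intent, created_at) "
        "VALUES (?, ?, ?)",
        (spoken_text, intent, time.time()),
    )
    conn.commit()


def recent_commands(conn, limit=5):
    """Return the most recent commands, newest first."""
    return conn.execute(
        "SELECT spoken_text, intent FROM command_history "
        "ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_history()
    record_command(conn, "open notepad", "launch_app")
    record_command(conn, "close notepad", "close_app")
    print(recent_commands(conn))
```

Parameterized queries are used so that arbitrary transcribed speech is never interpolated directly into SQL.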

Challenges we ran into

Accomplishments that we're proud of

Delivered seamless bilingual voice control with 95%+ command accuracy in both Hindi and English.
Built a fully offline mode that processes critical commands locally, preserving user data privacy.
Created a modular architecture that allows third-party app integrations without custom coding.

What we learned

The importance of noise-robust audio preprocessing when dealing with diverse accents and noisy backgrounds.
How to leverage GPT models for reliable intent extraction and context management in an end-to-end automation flow.
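To make the preprocessing point concrete, here is a minimal sketch of one simple technique, an energy-based noise gate that silences low-energy frames before recognition. This is an illustrative stand-in and not necessarily what VoiceForge AI uses; the function names, `frame_size`, and `threshold` values are assumptions.

```python
import math


def frame_rms(samples):
    """Root-mean-square energy of one frame of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gate(samples, frame_size=4, threshold=0.1):
    """Zero out frames whose RMS energy falls below the threshold.

    frame_size and threshold are illustrative values; real audio would
    use frames of a few hundred samples and a tuned threshold.
    """
    gated = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame_rms(frame) >= threshold:
            gated.extend(frame)                # keep speech-like frame
        else:
            gated.extend([0.0] * len(frame))   # mute background noise
    return gated


if __name__ == "__main__":
    quiet = [0.01, -0.02, 0.01, 0.0]
    loud = [0.5, -0.6, 0.4, -0.5]
    print(noise_gate(quiet + loud))
```

A fixed threshold is the simplest choice; an adaptive threshold estimated from the first moments of silence would likely cope better with varying rooms and microphones.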

What's next for VoiceForge AI

Expand cross-platform support to include macOS and Linux desktops.
Introduce customizable user macros via a visual workflow editor.
Integrate with popular enterprise tools (e.g., Microsoft Office, Slack) via official APIs.
Add voice-driven scripting capabilities for advanced power users.
Launch a developer SDK for community-contributed plugins and integrations.

Built With

openai
opencv
pyqt6
python
speechrecognition
whisper

Updates

yashraj sachin ghemud started this project — Oct 13, 2025 03:51 PM EDT
