Inspiration

I was inspired by my own frustration with repetitive PC tasks: opening multiple applications, navigating settings, managing files, and even uninstalling unwanted software. I noticed that existing voice assistants could handle only basic commands and lacked deep system integration and Hindi support. I wanted to build something that offers true hands-free PC mastery and empowers users of all abilities.

What it does

VoiceForge AI listens to your voice in Hindi or English, converts your command into text, understands the intent and context, and then executes multi-step workflows on your Windows PC. For example, you can say:

“Minecraft game ko delete kar mere PC se” (“Delete the Minecraft game from my PC”)
and it will automatically navigate Settings → Apps → Minecraft → Uninstall, handle permission prompts, and confirm the task, all without you lifting a finger. Other capabilities include:

  • Launching or closing any application
  • Organizing files and folders
  • Sending emails with attachments
  • Running system checks and maintenance tasks
  • Controlling media playback
  • Custom multi-step macros you define
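The flow described above (transcribe, extract the intent, expand it into UI steps) can be sketched in a few lines. This is a minimal illustration, not VoiceForge AI's implementation: the project uses a fine-tuned GPT-3.5 model for intent recognition, whereas this sketch swaps in simple keyword matching. The names `parse_intent`, `plan_workflow`, `Intent`, the keyword tables, and the step strings are all hypothetical, and the sketch assumes speech-to-text has already produced a transcript.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical bilingual keyword tables (English + romanized Hindi).
# A stand-in for the fine-tuned GPT-3.5 intent model, for illustration only.
UNINSTALL_WORDS = {"delete", "uninstall", "remove", "hata", "hatao"}
LAUNCH_WORDS = {"open", "launch", "start", "khol", "kholo", "chalao"}
# Filler words that never name the target application.
FILLER_WORDS = {"ko", "kar", "karo", "mere", "pc", "se", "game", "my", "from", "the", "please"}


@dataclass
class Intent:
    action: str  # "uninstall" or "launch"
    target: str  # application name left over after removing keywords


def parse_intent(transcript: str) -> Optional[Intent]:
    """Map a transcribed command to an Intent, or None if nothing matches."""
    words = transcript.lower().split()
    if any(w in UNINSTALL_WORDS for w in words):
        action = "uninstall"
    elif any(w in LAUNCH_WORDS for w in words):
        action = "launch"
    else:
        return None
    ignored = UNINSTALL_WORDS | LAUNCH_WORDS | FILLER_WORDS
    target = " ".join(w for w in words if w not in ignored)
    return Intent(action, target) if target else None


def plan_workflow(intent: Intent) -> List[str]:
    """Expand an intent into ordered UI steps (hypothetical step names)."""
    if intent.action == "uninstall":
        return [
            "open Settings",
            "go to Apps",
            f"select {intent.target}",
            "click Uninstall",
            "confirm the prompt",
        ]
    return [f"launch {intent.target}"]


intent = parse_intent("Minecraft game ko delete kar mere PC se")
print(intent)  # Intent(action='uninstall', target='minecraft')
print(plan_workflow(intent))
```

Keeping intent extraction separate from workflow planning means the step lists, which depend on the Windows UI, stay the same whichever language the user spoke.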

How we built it

  • Speech Recognition: the Python SpeechRecognition library with a Google Speech API fallback, plus Whisper for offline mode
  • Natural Language Processing: OpenAI GPT-3.5 fine-tuned for intent recognition and context handling
  • System Control: the Windows UI Automation API via pywinauto, plus PyAutoGUI for mouse/keyboard simulation
  • Screen Recognition: OpenCV + Tesseract OCR to read UI elements and click buttons in any app
  • Local Storage & Caching: SQLite for command history, Redis for fast in-memory caching
  • UI: PyQt6 for a modern desktop interface with real-time waveform visualization
  • Security & Privacy: Option for full offline processing, encrypted local data storage
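As one concrete piece of the stack above, here is a minimal sketch of a SQLite command-history store using Python's built-in `sqlite3` module. The table layout and the function names (`open_history`, `log_command`, `recent_commands`) are assumptions for illustration; the project's actual schema is not described.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the command-history database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS command_history (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            spoken_text TEXT    NOT NULL,
            language    TEXT    NOT NULL,  -- e.g. 'hi' or 'en'
            intent      TEXT,              -- NULL if no intent was recognized
            succeeded   INTEGER NOT NULL,  -- 0/1; SQLite has no boolean type
            created_at  TEXT    NOT NULL   -- ISO 8601 UTC timestamp
        )
        """
    )
    return conn


def log_command(conn: sqlite3.Connection, spoken_text: str, language: str,
                intent: Optional[str], succeeded: bool) -> None:
    """Record one executed command; `with conn` commits, or rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO command_history "
            "(spoken_text, language, intent, succeeded, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (spoken_text, language, intent, int(succeeded),
             datetime.now(timezone.utc).isoformat()),
        )


def recent_commands(conn: sqlite3.Connection,
                    limit: int = 5) -> List[Tuple[str, Optional[str], bool]]:
    """Return the newest commands first as (text, intent, succeeded) tuples."""
    rows = conn.execute(
        "SELECT spoken_text, intent, succeeded FROM command_history "
        "ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return [(text, intent, bool(ok)) for text, intent, ok in rows]


conn = open_history()
log_command(conn, "Minecraft game ko delete kar mere PC se", "hi", "uninstall_app", True)
log_command(conn, "open notepad", "en", "launch_app", True)
print(recent_commands(conn))
```

Using `:memory:` keeps the example self-contained; a real install would pass a file path. The `?` placeholders keep transcribed text, which may contain quotes, from breaking the SQL.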

Challenges we ran into

Accomplishments that we're proud of

  • Delivered seamless bilingual voice control with 95%+ command accuracy in both Hindi and English.
  • Built a fully offline mode that processes critical commands locally, preserving user data privacy.
  • Created a modular architecture that allows third-party app integrations without custom coding.
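One way to let third-party integrations plug in without writing code is a declarative manifest that maps intent names to UI steps. The sketch below assumes a hypothetical JSON manifest format and a hypothetical `IntegrationRegistry` class; it illustrates the idea and is not VoiceForge AI's actual plugin system.

```python
import json
from typing import Dict, List

# Hypothetical manifest an integration author might ship alongside an app.
EXAMPLE_MANIFEST = """
{
  "name": "media-keys",
  "intents": {
    "media.pause": ["press playpause"],
    "media.next": ["press nexttrack"]
  }
}
"""


class IntegrationRegistry:
    """Collects intent-to-steps mappings from JSON manifests (illustrative only)."""

    def __init__(self) -> None:
        self._steps: Dict[str, List[str]] = {}
        self._owner: Dict[str, str] = {}

    def load_manifest(self, manifest_json: str) -> None:
        manifest = json.loads(manifest_json)
        name = manifest["name"]
        staged: Dict[str, List[str]] = {}
        # Validate every entry first so a bad manifest registers nothing.
        for intent, steps in manifest["intents"].items():
            if intent in self._owner:
                raise ValueError(f"{intent!r} is already provided by {self._owner[intent]!r}")
            if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
                raise ValueError(f"steps for {intent!r} must be a list of strings")
            staged[intent] = steps
        # Commit only after the whole manifest passed validation.
        for intent, steps in staged.items():
            self._steps[intent] = steps
            self._owner[intent] = name

    def resolve(self, intent: str) -> List[str]:
        """Return a copy of the steps registered for an intent."""
        if intent not in self._steps:
            raise KeyError(f"no integration handles {intent!r}")
        return list(self._steps[intent])


registry = IntegrationRegistry()
registry.load_manifest(EXAMPLE_MANIFEST)
print(registry.resolve("media.pause"))  # ['press playpause']
```

Staging the entries before committing them means a manifest with one conflicting intent is rejected as a whole rather than half-registered.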

What we learned

  • The importance of noise-robust audio preprocessing when handling diverse accents and background noise.
  • How to leverage GPT models for reliable intent extraction and context management in an end-to-end automation flow.
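A common safeguard for reliable intent extraction is to never act on the model's reply blindly. A minimal sketch, assuming the GPT prompt asks for a JSON object with hypothetical `intent`, `target`, and `confidence` fields: the reply is parsed and checked against an allow-list before anything touches the system, and a rejected reply can trigger a request to repeat the command. The intent names and the 0.7 threshold are illustrative assumptions, and the model call itself is omitted.

```python
import json
from typing import Optional

# Hypothetical allow-list of intents the automation layer knows how to execute.
ALLOWED_INTENTS = {"launch_app", "close_app", "uninstall_app",
                   "organize_files", "send_email", "media_control"}
MIN_CONFIDENCE = 0.7  # illustrative threshold, not a tuned value


def validate_intent(model_reply: str) -> Optional[dict]:
    """Return a cleaned intent dict, or None if the reply should be rejected."""
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        return None  # the model answered in prose instead of JSON
    if not isinstance(data, dict):
        return None
    intent = data.get("intent")
    target = data.get("target")
    confidence = data.get("confidence")
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:
        return None
    if not isinstance(target, str) or not target.strip():
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    # Written as "not >=" so that a NaN confidence is also rejected.
    if not confidence >= MIN_CONFIDENCE:
        return None
    return {"intent": intent, "target": target.strip(), "confidence": float(confidence)}


print(validate_intent('{"intent": "uninstall_app", "target": " Minecraft ", "confidence": 0.93}'))
print(validate_intent("Sure, deleting Minecraft now!"))  # None
```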

What's next for VoiceForge AI

  • Expand cross-platform support to include macOS and Linux desktops.
  • Introduce customizable user macros via a visual workflow editor.
  • Integrate with popular enterprise tools (e.g., Microsoft Office, Slack) via official APIs.
  • Add voice-driven scripting capabilities for advanced power users.
  • Launch a developer SDK for community-contributed plugins and integrations.

Built With
