Inspiration

Ever since watching Iron Man, I have been fascinated by the idea of having my own Jarvis-like assistant. Being deeply interested in technology, I love building new tools and experimenting with things that are both cool and useful.

I had always wanted a virtual assistant, but unlike Siri or Alexa, I wanted one that is truly personal: an assistant that runs on my laptop, responds to my voice, and helps me directly with system controls and daily tasks. That dream led to the creation of Ultron.

What I Learned

Building Ultron taught me how to integrate many technologies into one modular system:

  • Speech-to-Text (STT) → Capturing my voice commands and transcribing them accurately.
  • Text-to-Speech (TTS) → Giving Ultron a natural voice response using PowerShell/.NET or pyttsx3.
  • Wake-word & Hotkeys → Setting up Porcupine/OpenWakeWord and global hotkeys for hands-free or keyboard-triggered activation.
  • Website & App Control → Opening apps like Chrome, Gmail, YouTube, or custom sites with voice commands.
  • Site Search → Performing searches directly on any website (e.g., “Search BestBuy for RTX 4070”).
  • Weather Forecasting → Using the Open-Meteo API for current conditions and daily forecasts.
  • Audio Controls → Adjusting volume, muting/unmuting, and switching between audio outputs like headphones and speakers.
  • Display & Brightness → Changing brightness, toggling Night Light, and switching display modes (extend, clone, projector, etc.).
  • Wi-Fi Management → Checking status, turning Wi-Fi on/off, and connecting to or disconnecting from networks.
  • Power & Battery → Putting the system to sleep, shutting down, restarting, locking, and checking battery percentage.
  • Window Controls → Minimizing, maximizing, and closing the current active window.
  • Utilities → Taking and saving screenshots.
  • Gemini Fallback → Answering general knowledge or open-ended queries using Gemini when no direct intent is matched.
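
The way these skills fit together can be illustrated with a small intent router: each command is checked against a list of patterns, and anything that matches nothing is handed to the fallback. This is only a minimal sketch, not Ultron's actual code. The names `INTENTS`, `route`, and `site_search_url` are hypothetical, and building the site search from a Google `site:` query is an assumed approach chosen because it works for arbitrary domains.

```python
import re
from urllib.parse import quote_plus


def site_search_url(site: str, query: str) -> str:
    """Build a search URL restricted to one site (assumed approach)."""
    domain = site.lower().replace(" ", "")
    if "." not in domain:
        domain += ".com"  # naive guess; a real skill would need a lookup table
    return "https://www.google.com/search?q=" + quote_plus(f"site:{domain} {query}")


# (pattern, handler) pairs, checked in order. All names are hypothetical.
INTENTS = [
    (re.compile(r"^search (?P<site>.+?) for (?P<query>.+)$", re.I),
     lambda m: ("site_search", site_search_url(m["site"], m["query"]))),
    (re.compile(r"\bbattery\b", re.I),
     lambda m: ("battery", None)),
    (re.compile(r"\b(mute|unmute)\b", re.I),
     lambda m: ("audio", m.group(1).lower())),
]


def route(command: str):
    """Return (intent_name, payload); unmatched commands go to the fallback."""
    text = command.strip()
    for pattern, handler in INTENTS:
        match = pattern.search(text)
        if match:
            return handler(match)
    # No direct intent matched: pass the raw text on to the LLM fallback.
    return ("gemini_fallback", text)
```

With this layout, `route("Search BestBuy for RTX 4070")` yields a `site_search` intent with a ready-to-open URL, while an open-ended question such as `route("who wrote Hamlet")` falls through to `gemini_fallback`. Adding a new skill means adding one more pattern and handler to the list.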

How I Built It

Ultron was built with:

  • Python as the core language.
  • pynput for hotkeys and keyboard event handling.
  • Porcupine / OpenWakeWord for wake-word detection.
  • Requests for HTTP API calls.
  • python-dotenv for managing environment variables.
  • Custom skills modules for weather, system control, app launching, site search, etc.
  • Gemini API fallback for general knowledge and Q&A.
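
As an illustration of how a weather skill might call Open-Meteo with Requests, here is a short sketch. The function names and the chosen weather variables are assumptions for the example, not Ultron's actual module. The coordinates in the usage note are an approximate location for Newark, NJ.

```python
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_params(latitude: float, longitude: float, days: int = 3) -> dict:
    """Assemble query parameters for current conditions plus a daily forecast."""
    return {
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m,wind_speed_10m",
        "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
        "forecast_days": days,
        "timezone": "auto",  # let the API pick the local timezone
    }


def fetch_forecast(latitude: float, longitude: float, days: int = 3) -> dict:
    """Call the API and return the decoded JSON (hypothetical helper)."""
    import requests  # third-party; imported here so the builder above works without it

    response = requests.get(
        OPEN_METEO_URL,
        params=build_forecast_params(latitude, longitude, days),
        timeout=10,
    )
    response.raise_for_status()  # surface HTTP errors instead of parsing bad data
    return response.json()
```

A call such as `fetch_forecast(40.7357, -74.1724, days=2)` would return a dictionary with `current` and `daily` sections. Keeping the parameter builder separate from the network call makes the skill easy to check without internet access.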

Challenges Faced

  • Handling natural speech variations, e.g., “weather now”, “today’s forecast”, or “how’s the weather in Newark tomorrow”.
  • Designing a site search that works on any website, not just pre-defined ones.
  • Managing system-level controls like brightness, volume, Wi-Fi, and apps across different environments.
  • Debugging geocoding errors, where words like “tomorrow” were misread as city names.
  • Keeping Ultron both lightweight and modular while adding more features.
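
One way to avoid the geocoding problem described above is to strip time words from the command before extracting the city, so that only the place name reaches the geocoder. The sketch below is an assumed approach, not the fix actually used in Ultron. `DAY_WORDS` and `parse_weather_query` are hypothetical names, and the word list is deliberately small.

```python
import re

# Time words mapped to a day offset from today (illustrative, not exhaustive).
DAY_WORDS = {"now": 0, "today": 0, "tonight": 0, "tomorrow": 1}


def parse_weather_query(command: str):
    """Return (city or None, day_offset) from a spoken weather request."""
    text = command.lower().replace("’", "'")  # normalize curly apostrophes

    # Work out which day is being asked about.
    day_offset = 0
    for word, offset in DAY_WORDS.items():
        if re.search(rf"\b{word}\b", text):
            day_offset = max(day_offset, offset)

    # Remove time words (and possessives such as "today's") before looking
    # for a location, so they are never mistaken for a city name.
    time_pattern = r"\b(?:" + "|".join(DAY_WORDS) + r")(?:'s)?\b"
    text = re.sub(time_pattern, " ", text)

    # Take whatever follows "in", "for", or "at" as the candidate city.
    match = re.search(r"\b(?:in|for|at)\s+([a-z][a-z .'-]*)", text)
    city = match.group(1).strip(" .?!'-") if match else ""
    return (city.title() if city else None, day_offset)
```

For example, `parse_weather_query("how's the weather in Newark tomorrow")` separates the city `"Newark"` from a day offset of 1, while "today's forecast" yields no city at all, which a caller could treat as "use the default location".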

Conclusion

Ultron represents my personal vision of a next-gen assistant:

  • Voice-first,
  • Modular and extendable,
  • Able to control my laptop, apps, and the web,
  • And always evolving with new features as I continue to learn.
