Inspiration

Have you thought about our generation's dependence on instant gratification? Everything is right there at our fingertips, and with just a touch, a prompt, or a voice command, things are done at our behest.

What if we change that? What if we take away the "instant" and the "gratification"?

What if we put BakuDeku on your computer? We wanted to create a disruptive desktop companion that breaks your daily norm. Instead of an AI that politely helps you, we built two agents that observe your every move and intervene with needless visual novel-style sass and audio.

What it does

BakuDeku bot is an Electron app that runs in the background of your device, tracking your actions to ensure you stay productive... on the agents' terms, of course.

  • Status Reports: Every 30 minutes, it runs a status check. If it sees you’ve been inactive for too long, it will scold you.
  • Input Monitoring: Using uiohook-napi, it tracks your keystroke counts and clicks. Too many clicks? I guess you’ll be hearing from Bakugou.
  • Screen Awareness: It takes screenshots every 10 minutes, analyzing them via the Google Cloud Vision API to see if you’re doing something... indecent or distracting.
  • App Tracking: Opening social media or Steam? With active-win tracking your windows, you’re getting an earful from Deku.
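
The checks above boil down to a small decision rule. Here is a minimal sketch of how it could look on the Python side. The thresholds, the app list, and the names ActivitySnapshot and pick_reaction are all hypothetical, and which agent handles inactivity is our guess for illustration, not something fixed by the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds and app list, chosen only for illustration.
MAX_IDLE_SECONDS = 15 * 60
MAX_CLICKS_PER_WINDOW = 500
DISTRACTING_APPS = {"steam", "instagram", "tiktok"}


@dataclass
class ActivitySnapshot:
    idle_seconds: int   # time since the last keystroke or click
    click_count: int    # clicks counted since the previous status check
    active_app: str     # application name of the focused window


def pick_reaction(snapshot: ActivitySnapshot) -> Optional[tuple[str, str]]:
    """Return (agent, reason) if an agent should speak up, else None."""
    if snapshot.active_app.lower() in DISTRACTING_APPS:
        return ("Deku", "distracting_app")
    if snapshot.click_count > MAX_CLICKS_PER_WINDOW:
        return ("Bakugou", "too_many_clicks")
    if snapshot.idle_seconds > MAX_IDLE_SECONDS:
        # Assumption: the agent that scolds inactivity is not specified.
        return ("Bakugou", "inactive")
    return None
```

For example, pick_reaction(ActivitySnapshot(idle_seconds=0, click_count=10, active_app="Steam")) picks Deku, while a quiet coding session returns None and nobody yells.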

The agents speak through ElevenLabs, using custom-tuned voices. The generated audio is sent as base64 data directly to a transparent, frameless overlay UI.
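
As a rough sketch of the base64 hand-off, the backend can wrap the raw audio bytes in a JSON message the overlay can read. The field names, the audio/mpeg type, and both function names are assumptions for illustration, not the actual message format.

```python
import base64
import json


def build_voice_line_payload(agent: str, text: str, audio_bytes: bytes) -> str:
    """Pack one voice line into a JSON string (hypothetical message shape)."""
    payload = {
        "agent": agent,
        "text": text,
        # base64 keeps the binary audio safe inside a JSON string.
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
        "mime_type": "audio/mpeg",  # assumption: MP3 output
    }
    return json.dumps(payload)


def decode_voice_line_payload(raw: str) -> bytes:
    """Recover the original audio bytes from the JSON message."""
    return base64.b64decode(json.loads(raw)["audio_base64"])
```

On the overlay side, one option is to turn the base64 string into a data: URL and hand it to an audio element, which avoids writing temporary files.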

How we built it

We built the desktop client with Electron, React, and TypeScript, using a frameless overlay window that toggles interactivity via setIgnoreMouseEvents. For the backend, we used FastAPI and Python to orchestrate the AI logic.

For the "brains," we integrated:

  • Gemini (google-genai): The orchestrator that decides how the agents react to your screen.
  • Google Cloud Vision: To give the agents "eyes" to see your desktop.
  • ElevenLabs: To give the agents the voice and personality of your favorite My Hero Academia characters (couldn't be me, my goat Hiromi Higuruma is unique 🐐🐐)
  • System Hooks: uiohook-napi to monitor and occasionally simulate user input.

Challenges we ran into

Integrating global input monitoring in Electron was hard. Getting uiohook-napi and active-win to behave across different environments required some deep dives into system-level permissions, which macOS and Linux were especially strict about (I use Arch btw).

We also had to figure out how to make the overlay UI completely transparent and "click-through" so it didn't actually prevent you from working unless the agents wanted to get in your way.

Accomplishments that we're proud of

We’re really proud of the seamless orchestration between the Python backend and the Electron frontend. Being able to capture a screen, analyze it with the Vision API, generate a snarky response with Gemini, and play back an ElevenLabs voice line in near real-time feels like magic (even if that magic is just Bakugou yelling at you).
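
To give a feel for that flow, here is a minimal asyncio sketch of the three backend stages chained together. The three stage functions are stand-ins that return canned data instead of calling Vision, Gemini, or ElevenLabs, and every name and the timeout value are hypothetical.

```python
import asyncio


async def analyze_screenshot(image_bytes: bytes) -> list[str]:
    # Stand-in for the screen analysis step; returns canned labels.
    await asyncio.sleep(0)
    return ["video game", "chat window"]


async def write_reaction(labels: list[str]) -> str:
    # Stand-in for the step that writes the agent's snarky line.
    await asyncio.sleep(0)
    return f"Is that {labels[0]}? Get back to work!"


async def synthesize_voice(text: str) -> bytes:
    # Stand-in for text-to-speech; returns fake "audio" bytes.
    await asyncio.sleep(0)
    return text.encode("utf-8")


async def react_to_screen(image_bytes: bytes, timeout_s: float = 10.0) -> dict:
    """Run the stages in order, giving up if the whole chain is too slow."""

    async def pipeline() -> dict:
        labels = await analyze_screenshot(image_bytes)
        line = await write_reaction(labels)
        audio = await synthesize_voice(line)
        return {"labels": labels, "line": line, "audio": audio}

    # Raises a timeout error if the chain exceeds timeout_s seconds.
    return await asyncio.wait_for(pipeline(), timeout=timeout_s)
```

Each stage depends on the previous one, so they run in sequence. The overall timeout is one way to drop a reaction that would arrive too late to be funny.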

What we learned

We learned a ton about asynchronous task execution in FastAPI and how to handle base64 audio streaming in a React frontend. We also got a crash course in Google Cloud’s Application Default Credentials (ADC) and how to manage multiple third-party APIs without the latency killing the "vibe" of the agent.

What's next for BakuDeku bot

We aim to have BakuDeku bot be a more permanent part of your life. For the future, we want to integrate the following:

  • Local Summarization: Implementing Gemma for local context processing.
  • 24/7 Voice Surveillance: Using ElevenLabs ConvAI for real-time vocal feedback.
  • More "Friends": Expanding the app to include more simultaneous agents from the UA roster.
  • Literally living in your walls: Integration with other IoT devices to make your MHA experience really permanent. Phone apps, Google Nest, etc. will all contribute to a beautiful friendship with your bae Bakugou.
