Inspiration I’ve always felt that modern AI assistants like Siri or Alexa feel "disembodied" and blind. They can't see what I'm working on, they forget our conversations, and they have no personality. I wanted to build something that felt like a true companion—someone who lives on my desktop, watches my back while I code, and actually understands the context of my screen. Mizune was born from the idea of merging high-fidelity 3D interaction with a powerful, autonomous multi-agent brain.
What it does Mizune is a 3D anime companion that acts as an autonomous OS controller and coding coach. She doesn't just chat; she takes action. Through a multi-agent system, she can:
See your screen: Detect bugs in real-time and offer suggestions. Control your PC: Launch apps, manage system settings, and automate repetitive tasks. Manage your workflow: She integrates directly with GitLab to track issues and manage merge requests autonomously. Emote and Interact: She features full lip-sync and dynamic expressions based on the conversation's emotional context. How we built it The project is built on a modular multi-agent architecture.
The Brain: Powered by Google Gemini 2.5 Flash for complex planning and reasoning. The Eyes: Uses Groq Vision and Gemini Vision to analyze screen captures every 30 seconds. The Hands: A custom ActionExecutorAgent that uses PyAutoGUI and keyboard hooks to physically interact with the OS. GitLab Integration: We integrated the GitLab MCP server protocol to allow Mizune to act as a junior developer, managing repo issues and code status. The Soul: A Three.js and Electron frontend that renders a VRM model with real-time frequency-based lip-sync. Challenges we ran into Building an "autonomous" agent is a safety nightmare. I had to implement a strict safety classification system to ensure Mizune doesn't accidentally delete files or perform dangerous terminal commands without confirmation. Another huge hurdle was the latency in voice interaction—I solved this by implementing a 3-layer token economy and TTS caching system to keep her responses snappy and "human-like."
Accomplishments that we're proud of I’m incredibly proud of the "Coding Coach" mode. Seeing Mizune actually look at a snippet of buggy code on my screen and tease me about a missing semicolon—before I even realized it was gone—felt like magic. It moved the project from being a "tool" to being a "partner."
What we learned I learned that for an AI to be truly useful on a desktop, vision is not optional—it’s the foundation. I also learned a lot about agentic workflows; specifically, how to break down a vague human request like "Help me with this project" into atomic, executable steps that an API can actually handle.
Log in or sign up for Devpost to join the conversation.