Inspiration
We were all interested in building PCs, especially our teammate Zayd, who previously founded a computer-building company. Through that experience, we saw firsthand how intimidating and error-prone the process can be, particularly for beginners. Even with YouTube tutorials, it’s easy to get stuck, make mistakes, or have to rewind and rewatch steps. We wanted to make building feel guided, intuitive, and accessible to anyone.
What it does
BuildBuddy uses your camera and an AI agent powered by Gemma to guide you through building a PC in real time. It understands what you’re doing, provides step-by-step instructions, answers questions, and helps prevent mistakes. Instead of following scattered tutorials, users get a personalized, interactive experience tailored to their exact build.
How we built it
We built BuildBuddy using a Python backend and an Electron-based desktop frontend. The system captures live camera input, processes frames, and sends them to a multimodal AI model (Gemma) for reasoning. We implemented real-time prompting, chat interaction, and voice input, allowing users to communicate naturally with the system while building.
The backend is a local Gemma model running on an ASUS GX-10. We implemented a flow similar to tools like Claude Code: the system tracks session context so responses stay fast while remaining useful to the user, and it renders a dynamic diagram overlay on the video feed.
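The session-context idea above can be sketched as a rolling chat history: keep the fixed system prompt plus only the most recent exchanges, so each model call stays small and fast. This is an illustrative, simplified sketch in plain Python; the class and method names are our own, not from Gemma or any specific library.

```python
from collections import deque


class SessionContext:
    """Rolling chat context: a fixed system prompt plus the most recent
    turns. Older turns are evicted automatically, keeping each model
    call small. (Illustrative sketch, not the exact implementation.)"""

    def __init__(self, system_prompt: str, max_turns: int = 8):
        self.system_prompt = system_prompt
        # deque with maxlen drops the oldest turn when full
        self.turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "content": text})

    def build_prompt(self) -> list[dict]:
        # Prepend the fixed system prompt to the recent history.
        return [{"role": "system", "content": self.system_prompt}, *self.turns]


# Example: with max_turns=2, the oldest turn falls out of the window.
ctx = SessionContext("You are a PC-building assistant.", max_turns=2)
ctx.add("user", "Which slot does the RAM go in?")
ctx.add("assistant", "Use slots A2 and B2 for dual channel.")
ctx.add("user", "What about the CPU cooler?")  # evicts the first turn
prompt = ctx.build_prompt()
```

Bounding the history this way trades long-range memory for latency, which fit our goal of keeping guidance responsive during a live build.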
Challenges we ran into
One of the biggest challenges was making the system feel truly real-time. Balancing performance with accurate AI responses required optimizing when and how often we processed frames. We also had to carefully design prompts so the AI would give precise, actionable guidance instead of generic responses. To solve this, we considered hand-tuning everything ourselves before deciding to let Claude take an iterative, research-like approach, trying many candidate solutions and quickly chipping away at the problem.
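The "when and how often we process frames" decision can be captured by a simple throttle: forward at most one camera frame to the model per interval, so AI calls don't pile up behind the capture loop. This is a simplified sketch of the idea, not our exact implementation; the class name and injectable clock are for illustration.

```python
import time


class FrameThrottler:
    """Decide which camera frames to send to the model: accept at most
    one frame per `interval` seconds. (Simplified sketch of the idea.)"""

    def __init__(self, interval: float = 2.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so the logic is testable
        self.last_processed = float("-inf")

    def should_process(self) -> bool:
        now = self.clock()
        if now - self.last_processed >= self.interval:
            self.last_processed = now
            return True  # send this frame to the model
        return False     # drop it; the capture loop keeps running


# With a fake clock the behavior is deterministic:
t = [0.0]
throttler = FrameThrottler(interval=2.0, clock=lambda: t[0])
decisions = []
for step in [0.0, 0.5, 1.0, 2.5, 3.0, 5.0]:
    t[0] = step
    decisions.append(throttler.should_process())
# decisions → [True, False, False, True, False, True]
```

Dropping frames rather than queueing them keeps the assistant's guidance anchored to what the camera sees now, not what it saw several seconds ago.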
Accomplishments that we're proud of
We built a fully interactive system that combines vision, voice, and AI reasoning into one seamless experience. Our app can understand what the user is doing through the camera, automatically generate guidance, and respond to questions in context. We also implemented a clean, modern interface with voice input and a locally-run, chat-based assistant that makes the experience feel natural, responsive, and intuitive.
What we learned
We learned how powerful multimodal AI can be when combined with real-time interaction, particularly for the fuzzy, vision-based reasoning our use case demands. We were surprised by how important prompt design turned out to be; it is just as important as the underlying models. We also gained experience building a full-stack system that integrates computer vision, AI reasoning, and frontend UX into one cohesive product.
What's next for BuildBuddy
Next, we want to add full voice interaction with real-time text-to-speech so the assistant can guide users hands-free. We also plan to improve visual guidance by highlighting components and showing exactly where parts should go. Long term, we want to expand beyond PC building and create a system that can guide users through any hands-on task.