Inspiration

Growing up, we loved playing with LEGO. There was something that just clicked with us about those visual instructions. There was no complex language, just clear steps that anyone could follow. As we grew older, we saw the same elegance in IKEA manuals, which made building their furniture much easier.

But when we tried working on something without a manual, that elegance was gone. We struggled to fix household items, endlessly scrolling through wordy guides or scrubbing through 20-minute YouTube videos just to find one specific instruction. It was inefficient and frustrating.

If only everything could be as simple as a LEGO set. 

That's how Buildo was born. We wanted to create a sort of manual generator, a one-stop platform that takes the chaotic reality of building and tinkering and transforms it into the clean, step-by-step visual simplicity we loved as kids.

What it does

Visual Diagnosis: Users snap a photo of their broken item or of the materials they have on hand. Buildo uses Gemini 3 Pro Preview to identify and analyze the objects in the photo.

Smart Toolbox Scanning: No need to type out a list of tools. Users can just upload an image of their open toolbox. The Gemini 3 Pro Preview model recognizes the available equipment and adjusts the generated instructions to match what the user actually owns.

Intelligent Part Substitution: In the Briefing Phase, users can tap any item that they do not have, and Buildo searches for and suggests safe household alternatives. For example, "Use a rubber band for temporary grip instead of a clamp".

Adaptive Questions: If the initial photo is ambiguous, Buildo asks clarifying questions to ensure the generated manual fits the user's exact goals.

Manual and Blueprint Generation: Using the image generation capabilities of Gemini 3 Pro Image Preview, Buildo first generates a Visual Style Manifest to keep the illustrations consistent. It then creates a step-by-step guide in which every illustration is rendered in a clean, isometric technical style, so the user can focus strictly on the assembly task.

Functional Audit: In the final phase, Buildo performs a Complexity Audit. For simple items, it scores the visual match out of 10. For complex mechanisms, it generates an interactive Functional Checklist that the user must physically confirm to achieve a 10/10 score.
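
The two audit paths can be sketched roughly as follows. This is a minimal illustration, not Buildo's actual scoring code: the function name, the parameters, and especially the proportional partial score for an incomplete checklist are assumptions. The description above only states that every item must be confirmed to reach 10/10.

```python
def audit_score(is_complex, visual_match=0, checklist=None):
    """Return an audit score out of 10 (hypothetical helper).

    Simple items: the visual match score, clamped to the range 0..10.
    Complex mechanisms: the share of confirmed checklist items scaled to 10,
    so 10/10 is only reachable when every item is confirmed.
    The proportional partial score is an assumption made for illustration.
    """
    if not is_complex:
        return max(0, min(10, visual_match))
    if not checklist:
        raise ValueError("complex items need a non-empty checklist")
    confirmed = sum(1 for done in checklist.values() if done)
    return 10 * confirmed // len(checklist)


# Example: one of three checks is still unconfirmed.
checks = {"hinge swings freely": True, "latch closes": False, "screws flush": True}
partial = audit_score(True, checklist=checks)
```

Integer division keeps the score a whole number and guarantees that a partially confirmed checklist can never round up to 10.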

How we built it

The Brain: We used the Gemini 3 Pro Preview model for all logic-heavy tasks such as diagnosis, tool matching, and safety auditing. We strictly enforced JSON Schemas so that Gemini returns structured arrays for parts lists, safety warnings, and step sequences, which let us render a rich UI rather than just text bubbles.
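
As a rough illustration of what schema enforcement buys on the client side, the sketch below defines a small schema and rejects any response that does not match it. The field names (`parts`, `safety_warnings`, `steps`, `title`, `instruction`) and both helper functions are assumptions made for this example, not Buildo's actual schema, and the validator covers only the small subset of JSON Schema used here.

```python
import json

# Hypothetical schema for one generated manual; field names are illustrative.
MANUAL_SCHEMA = {
    "type": "object",
    "properties": {
        "parts": {"type": "array", "items": {"type": "string"}},
        "safety_warnings": {"type": "array", "items": {"type": "string"}},
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "instruction": {"type": "string"},
                },
                "required": ["title", "instruction"],
            },
        },
    },
    "required": ["parts", "safety_warnings", "steps"],
}

_TYPES = {"object": dict, "array": list, "string": str}


def conforms(value, schema):
    """Check a value against the subset of JSON Schema used above."""
    if not isinstance(value, _TYPES[schema["type"]]):
        return False
    if schema["type"] == "object":
        if any(key not in value for key in schema.get("required", [])):
            return False
        return all(
            conforms(value[key], sub)
            for key, sub in schema.get("properties", {}).items()
            if key in value
        )
    if schema["type"] == "array":
        return all(conforms(item, schema["items"]) for item in value)
    return True


def parse_manual(raw_text):
    """Parse a model response and reject anything that is off-schema."""
    data = json.loads(raw_text)
    if not conforms(data, MANUAL_SCHEMA):
        raise ValueError("response does not match MANUAL_SCHEMA")
    return data
```

Because every response is guaranteed to have the same shape, the UI can map `parts`, `safety_warnings`, and `steps` directly onto components instead of parsing free text.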

The Images: We used Gemini 3 Pro Image Preview to generate all the images shown in the blueprint.

Google AI Studio: We used the Google AI Studio Code Assistant to write the bulk of the code, turning our ideas into a working app. Google AI Studio also provided an intuitive interface for testing the web app as we built it, which let us keep iterating on our initial ideas.

Challenges we ran into

Rate Limits: As we pushed the Gemini models to their limits with complex, multimodal chains of images and videos, we frequently hit 429 Resource Exhausted errors. We removed animations and video from our web app so that we would hit the rate limit less often and the user experience would stay smooth even when the API was busy.
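
Reducing request volume is the fix described above. A common complementary pattern for 429 errors is retrying with exponential backoff; the sketch below illustrates that general technique only and is not something Buildo is stated to implement. `RateLimitError` is a hypothetical stand-in for whatever exception the API client raises on a 429 response, and `call_with_backoff` is an illustrative name.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 Resource Exhausted response."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(), retrying on RateLimitError with exponential backoff.

    The wait doubles on each retry (base_delay, 2x, 4x, ...) plus random
    jitter so that many clients do not all retry at the same moment.
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.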

Inconsistent Images: A major technical hurdle was consistency in Gemini-generated content, especially images. The object drawn in step 1 sometimes looked completely different by step 5. We solved this with a Visual Style Manifest system: we extract the object's visual characteristics in the first phase and inject them into every subsequent image generation call to keep the images consistent.
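
The manifest idea can be sketched as a small, immutable record that is rendered into the same prompt prefix for every step. The class, its fields, and the prompt wording below are assumptions made for illustration; the actual manifest format is not described here, and the image generation call itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualStyleManifest:
    """Hypothetical manifest: visual traits extracted once, reused everywhere."""
    object_description: str
    colors: tuple
    rendering_style: str = "clean isometric technical illustration"

    def as_prompt_prefix(self):
        palette = ", ".join(self.colors)
        return (
            f"Subject: {self.object_description}. "
            f"Palette: {palette}. "
            f"Style: {self.rendering_style}."
        )


def build_image_prompts(manifest, step_instructions):
    """Prepend the same manifest text to the prompt for every step."""
    prefix = manifest.as_prompt_prefix()
    return [
        f"{prefix} Step {number}: {text}"
        for number, text in enumerate(step_instructions, start=1)
    ]


manifest = VisualStyleManifest(
    object_description="wooden chair with a cracked rear leg",
    colors=("oak brown", "steel grey"),
)
prompts = build_image_prompts(
    manifest, ["Flip the chair over.", "Apply wood glue to the crack."]
)
```

Freezing the dataclass means the description cannot drift between step 1 and step 5, which is the failure the manifest is meant to prevent.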

Accomplishments that we're proud of

Rapid Execution: We are incredibly happy to have built a fully functional, multimodal application in such a short period of time. Going from a rough idea to a polished web app was a massive challenge, but long hours and the ease of use of Gemini and Google AI Studio allowed us to move at lightning speed.

What we learned

Turning Ideas into Reality is Becoming Much Easier: We learned just how drastically the barrier to entry for building world-class software has dropped. Tools like Google AI Studio acted as a force multiplier, allowing us to turn a complex concept into a deployed application in a few days rather than months. It taught us that the limit now is our imagination, not our coding speed.

The Rapid Advancements in Multimodal AI: We were in awe of how far AI has come in a few years. We learned that modern Gemini models do not just generate text anymore. They have become reasoning engines that can see, understand context, and structure data. Realizing we could chain Vision, Text, and Image Generation into a single seamless workflow was a powerful lesson in the new capabilities of modern development.

What's next for Buildo

Video Generation: Using the Veo 3.0 models to generate short video clips for complex movements that are hard to show in static images.

Augmented Reality Overlay: Using the camera to highlight exactly where a screw should go in real time, rather than just showing a generated image.
