Homepage showing how users can upload furniture manuals and instantly convert them into interactive 3D guides.
Search dialog where users browse and select from a public library of converted IKEA assembly manuals.
Library view displaying a collection of manuals transformed into clean, technical 3D visualizations.
Step-by-step assembly interface with real-time extraction and visualization of parts and instructions for each manual step.

Inspiration

Everyone who has ever assembled flat-pack furniture knows the pain — confusing arrows, unclear diagrams, and the eternal question: which screw is this?
We wanted to turn that frustration into clarity. Inspired by IKEA manuals, we asked: What if AI could actually read an instruction manual and bring it to life in 3D?
That idea became the foundation for our project — transforming static manuals into interactive, visual, and conversational guides anyone can understand.

What It Does

Our system transforms furniture manuals (PDFs or images) into interactive 3D visualizations with voice-powered assistance.

Upload a manual and the system automatically:

Extracts parts, step numbers, and text using Gemini 2.5 Pro for structured understanding
Generates Scene JSON and step code via Gemini 2.5 Flash
Renders step-by-step animations with labeled components
Lets users talk to their manual through voice commands and spoken responses

The result: a blueprint-style 3D visualization showing each assembly step clearly, with part callouts and thumbnails for quick reference.

Some Perspectives From A First Time Hacker (Senan's Writings and Ramblings)

This was my first hackathon, and before we get to the technical part of it all, I’d love to share some thoughts.

I’ve never attended a hackathon before, hence the first-time hacker badge, which I wear with pride (metaphorically; no badges were given out, unfortunately). This isn’t to say I haven’t built software: over the past 4 years of my university career, I’ve built a great many software projects, both on my own and in class, some of which I even launched and got real users for. I’ve also had the privilege of 6 internships where I worked on building digital products at scale, with thousands to millions of DAU.

You may think this has prepared me for a hackathon — if you do, you’re wrong. A hackathon is a different beast entirely. You meet strangers over Discord (in my case at least), you spend an hour thinking of an idea, and then you get to building. No clear product requirements or sprint plans. No such thing as a QA team or in-depth code reviews — it’s a run and pray to whichever higher power you do or don’t believe in that it’ll work, most of the time.

Now, despite having spent the past 8 years of my life coding in some capacity, this happens to be the first time I’ve done so while I’m on 3 hours of sleep and an uncountable number of Red Bulls — I lost track after 6. It’s weird that in a weekend I’ve learned more about myself and building than at almost any other point in my professional and academic career thus far. Learning to delegate tasks with people I’ve known for an hour, trying to parse through pages of documentation, and stitching together what I can with the wonders of vibe coding due to some unbelievable time constraints — it’s certainly an experience, and one I’m glad to have had.

As I write this, it’s 3:21 a.m., and I’ve been awake for some 22 hours. And yet this small building in a big city is roaring alive with kids from all over the world, all working away toward the same goal. The energy of the people around me is something I don’t think I have the words to describe. It’s reignited a passion for building and developing that I almost forgot I had. It’s a surreal feeling and a reminder of just how great a weekend this has been: from meeting amazing people in industry and academia (my glorious king David Malan), to getting to see a campus that’s 230 years older than the country I’m from (Leafs fan for life, by the way), to scouring an entire building to find a desk comfy enough to sleep on, and, most importantly, learning more than I ever thought I could in the span of 36 hours. I’m lucky to be leaving this hackathon a better developer and a more ambitious person than when I came in.

Thanks, HackHarvard, and all the amazing folks who made this weekend possible :)

How We Built It

Frontend

Next.js + React + TypeScript for a clean, minimal interface centered on 3D visualization
Three.js renders procedural step animations from Gemini’s generated Scene JSON
Web Speech API handles real-time voice input and text-to-speech playback
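
As a rough illustration of the voice side, a recognized transcript could be mapped to step navigation with a small pure function. The command vocabulary and the names `VoiceCommand`, `parseVoiceCommand`, and `applyCommand` are assumptions made for this sketch, not the project's actual code. In the browser, the transcript would come from the Web Speech API's speech recognition results.

```typescript
// Hypothetical command set; the real project may recognize different phrases.
type VoiceCommand =
  | { kind: "next" }
  | { kind: "previous" }
  | { kind: "goto"; step: number }
  | { kind: "repeat" }
  | { kind: "unknown"; transcript: string };

// Map a speech transcript to a navigation command.
function parseVoiceCommand(transcript: string): VoiceCommand {
  const text = transcript.trim().toLowerCase();

  // An explicit step number wins: "go to step 4", "back to step 2".
  const goto = text.match(/\bstep\s+(\d+)\b/);
  if (goto) return { kind: "goto", step: Number(goto[1]) };

  if (/\b(next|continue|forward)\b/.test(text)) return { kind: "next" };
  if (/\b(previous|back)\b/.test(text)) return { kind: "previous" };
  if (/\b(repeat|again)\b/.test(text)) return { kind: "repeat" };
  return { kind: "unknown", transcript };
}

// Apply a command to the current 1-based step, clamped to the manual's range.
function applyCommand(current: number, total: number, cmd: VoiceCommand): number {
  const clamp = (n: number): number => Math.min(Math.max(n, 1), total);
  switch (cmd.kind) {
    case "next":
      return clamp(current + 1);
    case "previous":
      return clamp(current - 1);
    case "goto":
      return clamp(cmd.step);
    default:
      // "repeat" and "unknown" leave the current step unchanged.
      return current;
  }
}
```

Keeping the parser free of browser APIs means it can be unit-tested without a microphone, and anything that falls through to `unknown` could later be forwarded to a conversational model.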

Backend

Next.js API routes orchestrate uploads and extraction (see frontend/app/api/extract-steps/route.ts)
An Express + TypeScript service powers Gemini interactions
Google Generative AI SDK bridges backend calls to Gemini 2.5 Pro (for extraction) and Gemini 2.5 Flash (for code generation)

Pipeline

PDFs are converted to page images via PDF.js
Text and diagram data are extracted from each page
Gemini 2.5 Pro identifies tools, parts, and instructions
Gemini 2.5 Flash generates Scene JSON describing 3D structures and assembly sequences
The frontend renders dynamic Three.js visualizations for each step
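
The stages above can be sketched as a small orchestration function. This is a minimal outline under stated assumptions, not the project's implementation: every type and function name here (`ExtractedStep`, `SceneJson`, `PipelineStages`, `orderSteps`, `runPipeline`) is hypothetical, and each stage is injected as a parameter so the sketch runs without PDF.js or the Gemini SDK.

```typescript
// Assumed shape of one extracted assembly step.
interface ExtractedStep {
  stepNumber: number;
  parts: string[];
  tools: string[];
  instruction: string;
}

// Assumed shape of the generated scene description for one step.
interface SceneJson {
  step: number;
  objects: { id: string; position: [number, number, number] }[];
}

// Each stage is supplied by the caller (hypothetical signatures).
interface PipelineStages {
  renderPages: (pdf: Uint8Array) => Promise<string[]>; // PDF to page images
  extractSteps: (pageImage: string) => Promise<ExtractedStep[]>; // extraction model
  generateScene: (step: ExtractedStep) => Promise<SceneJson>; // scene generation model
}

// Merge per-page results into one list ordered by step number.
function orderSteps(perPage: ExtractedStep[][]): ExtractedStep[] {
  const all = ([] as ExtractedStep[]).concat(...perPage);
  return all.sort((a, b) => a.stepNumber - b.stepNumber);
}

async function runPipeline(pdf: Uint8Array, stages: PipelineStages): Promise<SceneJson[]> {
  const pages = await stages.renderPages(pdf);
  // Pages are treated as independent here, so extraction runs concurrently.
  const perPage = await Promise.all(pages.map((page) => stages.extractSteps(page)));
  const steps = orderSteps(perPage);
  // One scene per step; Promise.all preserves the step order.
  return Promise.all(steps.map((step) => stages.generateScene(step)));
}
```

Injecting the stages also makes it straightforward to swap a real model call for a canned response while debugging the renderer.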

Challenges We Faced

Parsing diverse and inconsistent manual layouts
Getting Gemini to produce consistent and valid Scene JSON
Designing minimal 3D representations that still convey assembly detail
Implementing a pseudo-sandboxed 3D runner that safely executes generated code inside the browser
Making voice interactions feel responsive despite latency from text processing
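
One way to approach the Scene JSON consistency problem is to treat model output as untrusted: strip any Markdown code fence, parse, and check the shape before anything reaches the renderer. The schema below (`SceneJson` with `step`, `objects`, `shape`, and `position`) is an assumed shape for illustration only, since the project's actual Scene JSON format is not shown in this write-up.

```typescript
// Assumed Scene JSON shape (hypothetical).
interface SceneObject {
  id: string;
  shape: "box" | "cylinder";
  position: [number, number, number];
}
interface SceneJson {
  step: number;
  objects: SceneObject[];
}

type ParseResult = { ok: true; scene: SceneJson } | { ok: false; error: string };

// True for an array of exactly three finite numbers.
function isVec3(v: unknown): v is [number, number, number] {
  return (
    Array.isArray(v) &&
    v.length === 3 &&
    v.every((n) => typeof n === "number" && Number.isFinite(n))
  );
}

function parseSceneJson(raw: string): ParseResult {
  // Model replies are sometimes wrapped in a Markdown code fence; remove it if present.
  const text = raw.trim().replace(/^`{3}(?:json)?\s*/i, "").replace(/\s*`{3}$/, "");

  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }

  const d = data as Record<string, unknown>;
  const step = d["step"];
  if (typeof step !== "number" || !Number.isInteger(step) || step < 1) {
    return { ok: false, error: "step must be a positive integer" };
  }
  const objects = d["objects"];
  if (!Array.isArray(objects)) {
    return { ok: false, error: "objects must be an array" };
  }
  for (let i = 0; i < objects.length; i++) {
    const o = objects[i] as Record<string, unknown> | null;
    if (typeof o !== "object" || o === null) {
      return { ok: false, error: "objects[" + i + "] must be an object" };
    }
    if (typeof o["id"] !== "string" || o["id"] === "") {
      return { ok: false, error: "objects[" + i + "].id must be a non-empty string" };
    }
    if (o["shape"] !== "box" && o["shape"] !== "cylinder") {
      return { ok: false, error: "objects[" + i + "].shape is not supported" };
    }
    if (!isVec3(o["position"])) {
      return { ok: false, error: "objects[" + i + "].position must be [x, y, z]" };
    }
  }
  return { ok: true, scene: data as SceneJson };
}
```

A failed result carries a specific error message, which could be fed back to the model in a retry prompt instead of silently rendering a broken scene.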

Accomplishments We’re Proud Of

Automatically structuring assembly data from unmodified PDF manuals
Rendering clean, animated 3D sequences that mirror instruction steps
Prototyping a voice interface that enables natural spoken queries
Delivering a cohesive full-stack system that runs end-to-end on real IKEA manuals

What We Learned

Gemini 2.5 Pro handles spatial reasoning surprisingly well for 2D-to-3D mapping
In-browser code execution needs strict guardrails even when sandboxed by convention
Minimal, technical-drawing-style visualization conveys assembly steps more clearly than photorealism
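
To make the guardrail point concrete, here is a minimal sketch of a "sandboxed by convention" runner: generated step code only receives an allow-listed API object, and a token check rejects obvious escape routes before execution. The `StepApi` interface, the banned list, and `runStepCode` are all hypothetical, and this is not a real security boundary. Stronger isolation would typically move execution into a sandboxed iframe or a Web Worker.

```typescript
// Hypothetical drawing API exposed to generated step code.
interface StepApi {
  addBox(id: string, size: [number, number, number]): void;
  moveTo(id: string, position: [number, number, number]): void;
}

type RunResult = { ok: true } | { ok: false; error: string };

// Identifiers the generated code must not mention. A token check like this is
// easy to bypass, so it is a guardrail rather than a security guarantee.
const BANNED: string[] = [
  "window", "document", "globalThis", "fetch", "XMLHttpRequest",
  "localStorage", "eval", "Function", "import", "constructor", "require", "process",
];

// Globals shadowed by undefined parameters as a second layer.
const SHADOWED: string[] = ["window", "document", "globalThis", "fetch"];

function runStepCode(code: string, api: StepApi): RunResult {
  for (const name of BANNED) {
    if (new RegExp("\\b" + name + "\\b").test(code)) {
      return { ok: false, error: "banned identifier: " + name };
    }
  }
  try {
    // The code sees `api` plus the shadowed names, which are all undefined.
    const fn = new Function("api", ...SHADOWED, '"use strict";\n' + code);
    // Freezing stops the generated code from replacing API methods.
    fn(Object.freeze(api));
    return { ok: true };
  } catch (e) {
    // Syntax errors and runtime errors both end up here.
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}
```

Returning a result object instead of throwing lets the UI fall back to a static view of the step when generated code is rejected or crashes.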

What’s Next

Wire the voice interface to live Gemini question-answering responses for real conversational help
Integrate ElevenLabs AI voices for natural, expressive speech
Add multilingual support and simplified steps for accessibility
Introduce AR and VR modes, letting users project assembly instructions directly onto their workspace or view full-scale models in immersive 3D
Export sequences to GIF or MP4 for retailer product pages
Expand beyond furniture to cover any type of technical manual