Inspiration

We often struggle to follow furniture assembly manuals because it is hard to visualize exactly how the parts fit together. We wanted to make assembly much less frustrating by turning dense instructions into something more intuitive to follow: clear, procedural explainer videos.

What it does

Assembli lets users upload a PDF of an instruction manual. It then generates step-by-step explainer videos with narration, on-screen text, 2D exploded-view animations, and a parts/tools tracker to help users assemble their furniture. The final output is an MP4 file.

How we built it

We built the UI and web app with Next.js. To understand the manual, we used Claude to read the PDF and produce structured scene JSON, which specifies the steps, warnings, tools, and other elements of the explainer video. We rendered the video with Remotion and generated the voice narration with ElevenLabs. The files produced along the way are stored in an AWS S3 bucket.
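To make "structured scene JSON" concrete, here is a minimal sketch of what such a structure could look like. Every type and field name below (SceneJson, Step, partIds, and so on) is an assumption for illustration, not Assembli's actual schema, and partsUsedThrough is a hypothetical helper showing how a parts tracker might read from it.

```typescript
// Hypothetical scene JSON shape; all names here are illustrative assumptions.
interface Part {
  id: string;
  name: string;
  quantity: number;
}

interface Step {
  index: number; // 1-based position in the assembly order
  title: string; // short on-screen text
  narration: string; // text handed to the voice narration service
  partIds: string[]; // parts touched in this step
  tools: string[];
  warnings: string[];
}

interface SceneJson {
  productName: string;
  parts: Part[];
  steps: Step[];
}

const exampleScene: SceneJson = {
  productName: "Example bookshelf",
  parts: [
    { id: "panel-a", name: "Side panel A", quantity: 1 },
    { id: "panel-b", name: "Side panel B", quantity: 1 },
    { id: "screw-m4", name: "M4 screw", quantity: 8 },
  ],
  steps: [
    {
      index: 1,
      title: "Fix the base to panel A",
      narration: "Attach the base to side panel A using four M4 screws.",
      partIds: ["panel-a", "screw-m4"],
      tools: ["screwdriver"],
      warnings: [],
    },
    {
      index: 2,
      title: "Add panel B",
      narration: "Attach side panel B using the remaining four screws.",
      partIds: ["panel-b", "screw-m4"],
      tools: ["screwdriver"],
      warnings: ["Do not overtighten the screws."],
    },
  ],
};

// Returns the sorted ids of every part used up to and including stepIndex,
// which is the kind of data a parts tracker overlay could display.
function partsUsedThrough(scene: SceneJson, stepIndex: number): string[] {
  const used: string[] = [];
  for (const step of scene.steps) {
    if (step.index > stepIndex) continue;
    for (const id of step.partIds) {
      if (used.indexOf(id) === -1) used.push(id);
    }
  }
  return used.sort();
}
```

Keeping parts in one list and referencing them by id from each step means the tracker and the animations can share a single source of truth.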

Challenges we ran into

One challenge was that the LLM did not always produce the scene JSON that Remotion expects. To solve this, we added Zod to validate the JSON we receive against an expected schema. Another challenge was that the LLM did not always read the instruction manual accurately. This is likely an inherent limitation of using an LLM for this task.
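The validation idea can be sketched as follows. To keep the example runnable without dependencies, this is a hand-written stand-in for a Zod schema rather than Zod itself; the result shape loosely mirrors the success/failure result that schema validators return. The Scene and Step fields and the parseScene function are hypothetical.

```typescript
// Dependency-free stand-in for schema validation; field names are assumptions.
interface Step {
  title: string;
  narration: string;
  tools: string[];
  warnings: string[];
}

interface Scene {
  steps: Step[];
}

type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; errors: string[] };

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parses the raw LLM response and checks it against the expected shape,
// collecting every problem instead of stopping at the first one.
function parseScene(raw: string): ParseResult<Scene> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (_err) {
    return { success: false, errors: ["response is not valid JSON"] };
  }

  const steps =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { steps?: unknown }).steps
      : undefined;
  if (!Array.isArray(steps) || steps.length === 0) {
    return { success: false, errors: ["steps must be a non-empty array"] };
  }

  const errors: string[] = [];
  steps.forEach((step: unknown, i: number) => {
    const s = (typeof step === "object" && step !== null ? step : {}) as Record<string, unknown>;
    const title = s["title"];
    const narration = s["narration"];
    if (typeof title !== "string" || title.length === 0) {
      errors.push("steps[" + i + "].title must be a non-empty string");
    }
    if (typeof narration !== "string") {
      errors.push("steps[" + i + "].narration must be a string");
    }
    if (!isStringArray(s["tools"])) {
      errors.push("steps[" + i + "].tools must be an array of strings");
    }
    if (!isStringArray(s["warnings"])) {
      errors.push("steps[" + i + "].warnings must be an array of strings");
    }
  });

  if (errors.length > 0) return { success: false, errors: errors };
  return { success: true, data: { steps: steps as Step[] } };
}
```

On failure, the collected error messages could be logged or sent back to the model in a retry prompt, so a malformed response never reaches the renderer.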

Accomplishments that we're proud of

We're proud that we got the whole core idea working as a real workflow in a single app, since we weren't sure at the start that it was possible. Our vision of an app that takes an instruction PDF and turns it into a simpler form was mostly accomplished, at least at a basic level.

What we learned

We learned that a big part of successfully using LLMs in a solution lies in building a clean, high-quality structure around them. The experience relies heavily on expected schemas, validation, confidence handling, and similar safeguards. We needed to ground the AI output in what is actually in the document.
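One way grounding and confidence handling could fit together is sketched below. It assumes, purely for illustration, that each extracted step carries a sourceQuote copied from the manual and a confidence score between 0 and 1; neither field nor the findUngroundedSteps helper is confirmed to exist in Assembli.

```typescript
// Hypothetical extracted step; sourceQuote and confidence are assumed fields.
interface ExtractedStep {
  title: string;
  sourceQuote: string; // text the model claims to have read in the manual
  confidence: number; // model-reported score in [0, 1]
}

// Lowercases and collapses whitespace so line breaks in the PDF text
// do not cause false mismatches.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Returns the steps that should be reviewed: those whose quote does not
// appear in the manual text, or whose confidence is below the threshold.
function findUngroundedSteps(
  manualText: string,
  steps: ExtractedStep[],
  minConfidence: number
): ExtractedStep[] {
  const haystack = normalize(manualText);
  return steps.filter((step) => {
    const quote = normalize(step.sourceQuote);
    const grounded = quote.length > 0 && haystack.indexOf(quote) !== -1;
    return !grounded || step.confidence < minConfidence;
  });
}
```

Flagged steps could be shown with a visible warning or sent back for re-extraction, rather than being rendered as if they were certain.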

What's next for Assembli

Right now our app displays mostly 2D content, with animations that are essentially 2.5D. We would like to take this further by rendering actual 3D scenes so the user can really visualize what happens during a specific step. The ideal next step after that is to add AR features, so the user can see what the assembled pieces should look like in the real world at a specific step.
