Inspiration
What if a real-world video could instantly become a playable, animated 3D scene?
We envisioned a future where AI doesn't just describe what's in a video: it reconstructs it, breathes motion into it, and lets you interact with it in a real-time 3D environment. That's the core vision behind Video ➡️ Interactive 3D by Thirteen Labs.
What it does
Our project automatically turns a video (e.g. from a Raspberry Pi live feed) into a fully interactive 3D experience, complete with geometrically accurate models, animation, and game-like interactivity in the browser.
You upload a video, and we give you a moving, explorable 3D scene.
Our AI-powered pipeline detects key objects, understands their motion, position, geometry, and texture, and rebuilds the scene in clean Three.js code, with animations driven by real-world trajectories.
How we built it
We built a full end-to-end multimodal pipeline:
1) Live Camera Feed:
- A Raspberry Pi running QNX streams live video to a web server via FFmpeg.
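As a rough sketch of this step, here is the kind of FFmpeg argument list we lean on to push the camera feed to the server. The device path, codec settings, and ingest URL are illustrative assumptions, not our exact production command:

```javascript
// Build an FFmpeg argument list that streams a Video4Linux2 camera
// as low-latency H.264 over MPEG-TS. All flags below are real FFmpeg
// options; the specific values are placeholders for illustration.
function buildStreamArgs(device, ingestUrl) {
  return [
    "-f", "v4l2",            // capture from a V4L2 camera device
    "-i", device,            // e.g. /dev/video0 on the Raspberry Pi
    "-c:v", "libx264",       // encode to H.264
    "-preset", "ultrafast",  // trade compression for low latency
    "-tune", "zerolatency",
    "-f", "mpegts",          // mux as MPEG-TS for HTTP streaming
    ingestUrl,
  ];
}

console.log(["ffmpeg", ...buildStreamArgs("/dev/video0", "http://localhost:8080/feed")].join(" "));
```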
2) Video Understanding (Twelve Labs): The video is analyzed to extract:
- Object identities and relationships
- Text descriptions (including movement and orientation)
- Keyframe image snapshots
3) 3D Code Generation (Gemini 2.5 Pro): These descriptions are sent to Gemini, which:
- Reconstructs geometry using primitives like BoxGeometry and CylinderGeometry
- Outputs clean Three.js code with animated motion tracks
- Adds physics-like animation via AnimationClip, QuaternionKeyframeTrack, and VectorKeyframeTrack
- Embeds animation metadata into model.userData for runtime playback
4) Rendering (Three.js): The code is wrapped and executed safely in-browser, allowing the user to interact with the generated model — rotate it, zoom in/out, and watch it animate.
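A minimal sketch of the wrapping idea, assuming a `buildScene` entry point in the generated code (the function names and the stand-in `THREE` object here are hypothetical, not our exact wrapper):

```javascript
// Compile a model-generated code string with the Function constructor so
// it can only reference names we explicitly hand it (here, a THREE-like
// object). This hides our local scope from the generated code.
function runGenerated(source, THREE) {
  const factory = new Function("THREE", `"use strict";\n${source}\nreturn buildScene(THREE);`);
  return factory(THREE);
}

// A trivial "generated" snippet that only uses what it is handed.
const generated = `function buildScene(THREE) { return new THREE.Group(); }`;
const fakeTHREE = { Group: class Group { constructor() { this.children = []; } } };
const scene = runGenerated(generated, fakeTHREE);
```

Note that the Function constructor alone does not block access to globals like `window`; for real isolation you would run the code in a sandboxed iframe or Worker.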
5) Frontend (Next.js): Our frontend lets users upload videos, browse past generations, and view the 3D models — all rendered client-side with high performance.
We built this using:
- QNX – real-time OS on the Raspberry Pi serving the live video feed
- Twelve Labs – to extract motion-aware object data from videos
- Gemini (Google) – to generate Three.js geometry and animation code
- Three.js – for real-time 3D rendering in the browser
- Next.js – to build a sleek frontend and handle API routes
- FFmpeg – for handling video encoding and streaming
Technical Highlights
- Modular Three.js code generation with ES6 exports
- Animation system using AnimationMixer and keyframe tracks for position + rotation
- Dynamic model construction using THREE.Group() for hierarchical scene graphs
- Full front-to-back API stack to manage, store, and replay 3D scenes
- Intelligent motion translation (e.g. “object rolled forward 3 meters” → animated 3D trajectory)
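The motion-translation highlight can be sketched as a small pure function. The regex, sampling scheme, and axis convention (forward = −Z, matching Three.js camera space) are illustrative assumptions about how a phrase becomes the flat `[x, y, z, ...]` samples a `THREE.VectorKeyframeTrack` consumes:

```javascript
// Turn a description like "rolled forward 3 meters" into keyframe data:
// an array of times and a flat array of (x, y, z) position samples.
function trajectoryFromDescription(text, duration = 2, steps = 5) {
  const m = /forward\s+([\d.]+)\s*m/.exec(text);
  if (!m) return null; // no recognizable motion in the description
  const dist = parseFloat(m[1]);
  const times = [], values = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    times.push(t * duration);
    values.push(0, 0, -dist * t); // x, y, z at this keyframe
  }
  // Feed into: new THREE.VectorKeyframeTrack(".position", times, values)
  return { times, values };
}
```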
Challenges we ran into
- QNX + Raspberry Pi + camera = dependency nightmares (shoutout to OpenCV incompatibility)
- Coordinating asynchronous pipelines between Twelve Labs, Gemini, and rendering
- Ensuring generated Three.js code was safe, modular, and animatable
- Handling large file sizes, polling logic, and timeout constraints during generation
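The polling-and-timeout pattern we leaned on while waiting for Twelve Labs and Gemini jobs looks roughly like this (the function and status names are illustrative, not either API's actual interface):

```javascript
// Poll a status-check callback until it reports "ready", it reports
// "failed", or an overall deadline passes.
async function pollUntilReady(checkStatus, { intervalMs = 1000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await checkStatus();
    if (status === "ready") return true;
    if (status === "failed") throw new Error("generation failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for generation");
}
```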
Accomplishments that we're proud of
- Created a video-to-3D pipeline that automatically animates motion, not just structure
- Successfully converted live video into an interactive, moving model in the browser
- Made our scene viewer modular, clean, and production-ready using Three.js best practices
- Built a novel demo that bridges AI, 3D graphics, and live video
What we learned
- How to orchestrate AI models across modalities (video → text → 3D code)
- Deep integration of animation logic in Three.js
- Handling real-world constraints like frame drops, timeouts, and messy data
- Designing scalable code wrappers to execute model-generated Three.js safely
What's next for Video ➡️ Interactive 3D by Thirteen Labs
- Improve realism with texture mapping, lighting, and physics
- Enable multi-object scenes and collision-based game logic
- Add custom user prompts (e.g., "make the car bounce")
- Support upload from mobile, not just Raspberry Pi
- Let users embed their 3D scenes into websites or portfolios
Built With
- QNX
- Raspberry Pi
- Twelve Labs
- Gemini
- FastAPI
- FFmpeg
- Next.js
- React
- TypeScript
- Python
- Three.js