Inspiration

What if a real-world video could instantly become a playable, animated 3D scene?

We envisioned a future where AI doesn’t just describe what’s in a video — it reconstructs it, breathes motion into it, and lets you interact with it in a real-time 3D environment. That’s the core vision behind Video ➡️ Interactive 3D by Thirteen Labs.

What it does

Our project automatically turns a video (e.g. from a Raspberry Pi live feed) into a fully interactive 3D experience — complete with geometrically accurate models, animation, and game-like interactivity in the browser.

You upload a video — we give you a moving, explorable 3D scene.

Our AI-powered pipeline detects key objects, understands their motion, position, geometry, and texture, and rebuilds the scene using clean Three.js code, with animations driven by real-world trajectories.

How we built it

We built a full end-to-end multimodal pipeline:

1) Live Camera Feed:

  • A Raspberry Pi running QNX streams live video to a web server via FFmpeg.
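The streaming step can be sketched as a single FFmpeg invocation. This is a hypothetical command, not our exact one: the capture input, device node, and server URL all depend on the Pi's camera stack and the receiving endpoint, and QNX may expose the camera differently than the Linux `v4l2` input shown here.

```shell
# Sketch: capture from a camera device and push a low-latency stream
# to a web server. Device path and URL are placeholders.
ffmpeg -f v4l2 -i /dev/video0 \
       -c:v libx264 -preset ultrafast -tune zerolatency \
       -f mpegts http://example.local:8080/stream
```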

2) Video Understanding (Twelve Labs): The video is analyzed to extract:

  • Object identities and relationships
  • Text descriptions (including movement and orientation)
  • Keyframe image snapshots
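These extracted fields are handed to the code-generation step as text. A minimal sketch of that hand-off, assembling the analysis into a single prompt (the field names `objects`, `motion`, and `keyframes` are illustrative, not the actual Twelve Labs response schema):

```javascript
// Illustrative shape of the analysis result -- not the real API schema.
const analysis = {
  objects: [{ id: "car-1", label: "toy car", relation: "on the table" }],
  motion: "the toy car rolled forward about 3 meters, then turned right",
  keyframes: ["frame_000.jpg", "frame_045.jpg"],
};

// Collapse the structured analysis into one text prompt for the
// code-generation model.
function buildPrompt(a) {
  const objects = a.objects
    .map((o) => `- ${o.label} (${o.id}), ${o.relation}`)
    .join("\n");
  return [
    "Reconstruct this scene as Three.js code using primitive geometries.",
    "Objects:",
    objects,
    `Motion: ${a.motion}`,
    `Keyframe snapshots available: ${a.keyframes.length}`,
  ].join("\n");
}

const prompt = buildPrompt(analysis);
```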

3) 3D Code Generation (Gemini 2.5 Pro): These descriptions are sent to Gemini, which:

  • Reconstructs geometry using primitives like BoxGeometry and CylinderGeometry
  • Outputs clean Three.js code with animated motion tracks
  • Adds physics-like animation via AnimationClip, QuaternionKeyframeTrack, and VectorKeyframeTrack
  • Embeds animation metadata into model.userData for runtime playback
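Under the hood, `VectorKeyframeTrack` and `QuaternionKeyframeTrack` each take a name, a flat array of timestamps, and a flat array of component values (x/y/z triples for position, x/y/z/w quadruples for quaternions). A sketch of building those arrays for a rolling object, kept free of the `three` package so the shape of the data is visible (the wheel radius and step count are assumptions):

```javascript
// Build the flat arrays that THREE.VectorKeyframeTrack and
// THREE.QuaternionKeyframeTrack expect: N timestamps, N*3 position
// components, N*4 quaternion components (x, y, z, w).
function rollForwardTrack(meters, seconds, steps) {
  const times = [];
  const positions = [];   // x, y, z triples
  const quaternions = []; // rotation about the x-axis as the object rolls
  const radius = 0.5;     // assumed wheel radius in meters
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    times.push(t * seconds);
    positions.push(0, radius, t * meters); // roll along +z
    const angle = (t * meters) / radius;   // rolled distance / radius
    // Quaternion for rotation about the x-axis: (sin(a/2), 0, 0, cos(a/2))
    quaternions.push(Math.sin(angle / 2), 0, 0, Math.cos(angle / 2));
  }
  return { times, positions, quaternions };
}

const track = rollForwardTrack(3, 2, 10); // "rolled forward 3 meters" over 2 s
```

In the real pipeline these arrays are passed straight into the track constructors and bundled into an `AnimationClip`, which is what gets stashed in `model.userData`.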

4) Rendering (Three.js): The code is wrapped and executed safely in-browser, allowing the user to interact with the generated model — rotate it, zoom in/out, and watch it animate.
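One way to wrap model-generated code, sketched below rather than our exact wrapper: compile the string with the `Function` constructor so it only sees the globals explicitly passed in, then call an agreed-upon entry point. The entry-point name `buildScene` and the injected `THREE` value here are assumptions for illustration.

```javascript
// Run generated code with only an explicit set of globals in scope.
// Strict mode plus shadowed names keeps it from accidentally touching
// page globals -- isolation by convention, not a hard security boundary.
function runGenerated(code, sandbox) {
  const names = Object.keys(sandbox);
  const values = names.map((n) => sandbox[n]);
  const factory = new Function(
    ...names,
    `"use strict";\n${code}\nreturn buildScene();`
  );
  return factory(...values);
}

// Usage with a stand-in THREE object:
const generated = `
  function buildScene() {
    return { type: THREE.kind, children: [] };
  }
`;
const scene = runGenerated(generated, { THREE: { kind: "Group" } });
```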

5) Frontend (Next.js): Our frontend lets users upload videos, browse past generations, and view the 3D models — all rendered client-side with high performance.

We built this using:

  • QNX – real-time OS driving the live video feed from the Raspberry Pi
  • Twelve Labs – to extract motion-aware object data from videos
  • Gemini (Google) – to generate Three.js geometry and animation code
  • Three.js – for real-time 3D rendering in the browser
  • Next.js – to build a sleek frontend and handle API routes
  • FFmpeg – for handling video encoding and streaming

Technical Highlights

  • Modular Three.js code generation with ES6 exports
  • Animation system using AnimationMixer and keyframe tracks for position + rotation
  • Dynamic model construction using THREE.Group() for hierarchical scene graphs
  • Full front-to-back API stack to manage, store, and replay 3D scenes
  • Intelligent motion translation (e.g. “object rolled forward 3 meters” → animated 3D trajectory)
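That last step starts with turning a free-text motion description into something numeric. A hypothetical parser in that spirit (the verb list, direction vocabulary, and axis convention are all illustrative, not our production grammar):

```javascript
// Map a motion phrase from the video-analysis step onto a direction
// vector and a distance the animation builder can consume.
const DIRECTIONS = {
  forward: [0, 0, 1],
  backward: [0, 0, -1],
  left: [-1, 0, 0],
  right: [1, 0, 0],
};

function parseMotion(text) {
  const m = text.match(
    /(rolled|slid|moved)\s+(forward|backward|left|right)\s+([\d.]+)\s*(m|meters?)/i
  );
  if (!m) return null; // no recognizable motion phrase
  return {
    verb: m[1].toLowerCase(),
    direction: DIRECTIONS[m[2].toLowerCase()],
    meters: parseFloat(m[3]),
  };
}
```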

Challenges we ran into

  • QNX + Raspberry Pi + camera = dependency nightmares (shoutout to OpenCV incompatibility)
  • Coordinating asynchronous pipelines between Twelve Labs, Gemini, and rendering
  • Ensuring generated Three.js code was safe, modular, and animatable
  • Handling large file sizes, polling logic, and timeout constraints during generation
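The polling problem in particular reduces to one reusable helper. A sketch of the pattern we leaned on, with illustrative names and defaults: the `check` callback resolves to `undefined` until the generation job has a result.

```javascript
// Generic poll-with-timeout for long-running generation jobs.
async function pollUntil(check, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== undefined) return result;
    if (Date.now() >= deadline) throw new Error("generation timed out");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```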

Accomplishments that we're proud of

  • Created a video-to-3D pipeline that automatically animates motion, not just structure
  • Successfully converted live video into an interactive, moving model in the browser
  • Made our scene viewer modular, clean, and production-ready using Three.js best practices
  • Built a novel demo that bridges AI, 3D graphics, and live video

What we learned

  • How to orchestrate AI models across modalities (video → text → 3D code)
  • Deep integration of animation logic in Three.js
  • Handling real-world constraints like frame drops, timeouts, and messy data
  • Designing scalable code wrappers to execute model-generated Three.js safely

What's next for Video ➡️ Interactive 3D by Thirteen Labs

  • Improve realism with texture mapping, lighting, and physics
  • Enable multi-object scenes and collision-based game logic
  • Add custom user prompts (e.g., "make the car bounce")
  • Support upload from mobile, not just Raspberry Pi
  • Let users embed their 3D scenes into websites or portfolios

Built With

  • QNX
  • Raspberry Pi
  • Twelve Labs
  • Gemini
  • Next.js
  • Three.js
  • FFmpeg
  • and more...
