Inspiration

What if a real-world video could instantly become a playable, animated 3D scene?

We envisioned a future where AI doesn’t just describe what’s in a video — it reconstructs it, breathes motion into it, and lets you interact with it in a real-time 3D environment. That’s the core vision behind Video ➡️ Interactive 3D by Thirteen Labs.

What it does

Our project automatically turns a video (e.g. from a Raspberry Pi live feed) into a fully interactive 3D experience — complete with geometrically accurate models, animation, and game-like interactivity in the browser.

You upload a video — we give you a moving, explorable 3D scene.

Our AI-powered pipeline detects key objects, understands their motion, position, geometry, and texture, and rebuilds the scene using clean Three.js code, with animations driven by real-world trajectories.

How we built it

We built a full end-to-end multimodal pipeline:

1) Live Camera Feed:

  • A Raspberry Pi running QNX streams live video to a web server via FFmpeg.
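The streaming step can be sketched as a single FFmpeg invocation. This is a hypothetical command, not our exact one: the capture input, device node, and server URL all depend on the Pi's camera stack and the receiving endpoint, and QNX may expose the camera differently than the Linux `v4l2` input shown here.

```shell
# Sketch: capture from a camera device and push a low-latency stream
# to a web server. Device path and URL are placeholders.
ffmpeg -f v4l2 -i /dev/video0 \
       -c:v libx264 -preset ultrafast -tune zerolatency \
       -f mpegts http://example.local:8080/stream
```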

2) Video Understanding (Twelve Labs): The video is analyzed to extract:

  • Object identities and relationships
  • Text descriptions (including movement and orientation)
  • Keyframe image snapshots
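These extracted fields are handed to the code-generation step as text. A minimal sketch of that hand-off, assembling the analysis into a single prompt (the field names `objects`, `motion`, and `keyframes` are illustrative, not the actual Twelve Labs response schema):

```javascript
// Illustrative shape of the analysis result -- not the real API schema.
const analysis = {
  objects: [{ id: "car-1", label: "toy car", relation: "on the table" }],
  motion: "the toy car rolled forward about 3 meters, then turned right",
  keyframes: ["frame_000.jpg", "frame_045.jpg"],
};

// Collapse the structured analysis into one text prompt for the
// code-generation model.
function buildPrompt(a) {
  const objects = a.objects
    .map((o) => `- ${o.label} (${o.id}), ${o.relation}`)
    .join("\n");
  return [
    "Reconstruct this scene as Three.js code using primitive geometries.",
    "Objects:",
    objects,
    `Motion: ${a.motion}`,
    `Keyframe snapshots available: ${a.keyframes.length}`,
  ].join("\n");
}

const prompt = buildPrompt(analysis);
```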

3) 3D Code Generation (Gemini 2.5 Pro): These descriptions are sent to Gemini, which:

  • Reconstructs geometry using primitives like BoxGeometry and CylinderGeometry
  • Outputs clean Three.js code with animated motion tracks
  • Adds physics-like animation via AnimationClip, QuaternionKeyframeTrack, and VectorKeyframeTrack
  • Embeds animation metadata into model.userData for runtime playback
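Under the hood, `VectorKeyframeTrack` and `QuaternionKeyframeTrack` each take a name, a flat array of timestamps, and a flat array of component values (x/y/z triples for position, x/y/z/w quadruples for quaternions). A sketch of building those arrays for a rolling object, kept free of the `three` package so the shape of the data is visible (the wheel radius and step count are assumptions):

```javascript
// Build the flat arrays that THREE.VectorKeyframeTrack and
// THREE.QuaternionKeyframeTrack expect: N timestamps, N*3 position
// components, N*4 quaternion components (x, y, z, w).
function rollForwardTrack(meters, seconds, steps) {
  const times = [];
  const positions = [];   // x, y, z triples
  const quaternions = []; // rotation about the x-axis as the object rolls
  const radius = 0.5;     // assumed wheel radius in meters
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    times.push(t * seconds);
    positions.push(0, radius, t * meters); // roll along +z
    const angle = (t * meters) / radius;   // rolled distance / radius
    // Quaternion for rotation about the x-axis: (sin(a/2), 0, 0, cos(a/2))
    quaternions.push(Math.sin(angle / 2), 0, 0, Math.cos(angle / 2));
  }
  return { times, positions, quaternions };
}

const track = rollForwardTrack(3, 2, 10); // "rolled forward 3 meters" over 2 s
```

In the real pipeline these arrays are passed straight into the track constructors and bundled into an `AnimationClip`, which is what gets stashed in `model.userData`.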

4) Rendering (Three.js): The code is wrapped and executed safely in-browser, allowing the user to interact with the generated model — rotate it, zoom in/out, and watch it animate.
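One way to wrap model-generated code, sketched below rather than our exact wrapper: compile the string with the `Function` constructor so it only sees the globals explicitly passed in, then call an agreed-upon entry point. The entry-point name `buildScene` and the injected `THREE` value here are assumptions for illustration.

```javascript
// Run generated code with only an explicit set of globals in scope.
// Strict mode plus shadowed names keeps it from accidentally touching
// page globals -- isolation by convention, not a hard security boundary.
function runGenerated(code, sandbox) {
  const names = Object.keys(sandbox);
  const values = names.map((n) => sandbox[n]);
  const factory = new Function(
    ...names,
    `"use strict";\n${code}\nreturn buildScene();`
  );
  return factory(...values);
}

// Usage with a stand-in THREE object:
const generated = `
  function buildScene() {
    return { type: THREE.kind, children: [] };
  }
`;
const scene = runGenerated(generated, { THREE: { kind: "Group" } });
```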

5) Frontend (Next.js): Our frontend lets users upload videos, browse past generations, and view the 3D models — all rendered client-side with high performance.

We built this using:

  • QNX – real-time OS driving the live video feed from the Raspberry Pi
  • Twelve Labs – to extract motion-aware object data from videos
  • Gemini (Google) – to generate Three.js geometry and animation code
  • Three.js – for real-time 3D rendering in the browser
  • Next.js – to build a sleek frontend and handle API routes
  • FFmpeg – for handling video encoding and streaming

Technical Highlights

  • Modular Three.js code generation with ES6 exports
  • Animation system using AnimationMixer and keyframe tracks for position + rotation
  • Dynamic model construction using THREE.Group() for hierarchical scene graphs
  • Full front-to-back API stack to manage, store, and replay 3D scenes
  • Intelligent motion translation (e.g. “object rolled forward 3 meters” → animated 3D trajectory)
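That last step starts with turning a free-text motion description into something numeric. A hypothetical parser in that spirit (the verb list, direction vocabulary, and axis convention are all illustrative, not our production grammar):

```javascript
// Map a motion phrase from the video-analysis step onto a direction
// vector and a distance the animation builder can consume.
const DIRECTIONS = {
  forward: [0, 0, 1],
  backward: [0, 0, -1],
  left: [-1, 0, 0],
  right: [1, 0, 0],
};

function parseMotion(text) {
  const m = text.match(
    /(rolled|slid|moved)\s+(forward|backward|left|right)\s+([\d.]+)\s*(m|meters?)/i
  );
  if (!m) return null; // no recognizable motion phrase
  return {
    verb: m[1].toLowerCase(),
    direction: DIRECTIONS[m[2].toLowerCase()],
    meters: parseFloat(m[3]),
  };
}
```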

Challenges we ran into

  • QNX + Raspberry Pi + camera = dependency nightmares (shoutout to OpenCV incompatibility)
  • Coordinating asynchronous pipelines between Twelve Labs, Gemini, and rendering
  • Ensuring generated Three.js code was safe, modular, and animatable
  • Handling large file sizes, polling logic, and timeout constraints during generation
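The polling problem in particular reduces to one reusable helper. A sketch of the pattern we leaned on, with illustrative names and defaults: the `check` callback resolves to `undefined` until the generation job has a result.

```javascript
// Generic poll-with-timeout for long-running generation jobs.
async function pollUntil(check, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result !== undefined) return result;
    if (Date.now() >= deadline) throw new Error("generation timed out");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```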

Accomplishments that we're proud of

  • Created a video-to-3D pipeline that automatically animates motion, not just structure
  • Successfully converted live video into an interactive, moving model in the browser
  • Made our scene viewer modular, clean, and production-ready using Three.js best practices
  • Built a novel demo that bridges AI, 3D graphics, and live video

What we learned

  • How to orchestrate AI models across modalities (video → text → 3D code)
  • Deep integration of animation logic in Three.js
  • Handling real-world constraints like frame drops, timeouts, and messy data
  • Designing scalable code wrappers to execute model-generated Three.js safely

What's next for Video ➡️ Interactive 3D by Thirteen Labs

  • Improve realism with texture mapping, lighting, and physics
  • Enable multi-object scenes and collision-based game logic
  • Add custom user prompts (e.g., "make the car bounce")
  • Support upload from mobile, not just Raspberry Pi
  • Let users embed their 3D scenes into websites or portfolios

Built With

  • QNX
  • Raspberry Pi
  • Twelve Labs
  • Gemini
  • Next.js
  • Three.js
  • FFmpeg
  • and more...
