Inspiration
Watching a grandmother struggle to understand her granddaughter's birthday video because it was in English, not Spanish, broke our hearts. That moment exposed a painful truth: millions of precious memories are trapped behind language barriers, and traditional video playback keeps us as passive observers rather than active participants in our own memories.
We asked ourselves: what if you could literally step into your videos and explore them like a 3D space? What if your voice could transcend language barriers while keeping its emotional soul intact?
Rewind was born to transform how we experience memories—making them explorable in 3D and accessible in any language, all while keeping you at the center through your own cloned voice.
What it does
Rewind transforms ordinary videos into extraordinary experiences through three innovations:
3D Spatial Exploration - Converts 2D videos into navigable 3D point clouds. Step inside your memories, explore frozen moments from any angle, and click on objects to hear what's happening. It's like being inside a photograph.
AI Scene Understanding - TwelveLabs detects objects, people, and actions automatically. Google Gemini generates natural descriptions of each scene. Search your memories like "show me all moments with Emma smiling."
VoiceBridge Technology - ElevenLabs clones your voice from just 30 seconds of audio. Translate scene descriptions into 29+ languages while preserving your unique vocal signature. Now your grandmother in Mexico can hear your voice narrating in perfect Spanish.
The result? Universal accessibility without losing authenticity. Your memories become explorable 3D spaces that anyone, anywhere, can understand—in your voice.
How we built it
Frontend - React 18 with Three.js for WebGL-powered 3D rendering. Tailwind CSS for our cosmic-themed glassmorphic UI. Canvas API for animated star fields and particle systems.
Backend - FastAPI handling async operations. FFmpeg extracting video frames. Firebase managing auth, storage, and database.
AI Pipeline - TwelveLabs for video analysis. Google Gemini for scene descriptions and translation. ElevenLabs for voice cloning and synthesis. MiDaS for depth estimation from 2D frames.
Key Flow:
- User uploads video → Firebase Storage
- FFmpeg extracts frames → MiDaS generates depth maps
- TwelveLabs analyzes scenes → Gemini creates descriptions
- User records 30-second voice sample → ElevenLabs clones voice
- User selects language → Gemini translates → ElevenLabs synthesizes in cloned voice
- Three.js renders 3D memory space with interactive narration
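The flow above can be sketched end-to-end. Every function here is a stub standing in for the real service call (FFmpeg, MiDaS, TwelveLabs, Gemini, ElevenLabs); the names, signatures, and return shapes are our own illustration, not those vendors' APIs.

```python
# Hypothetical sketch of the upload-to-narration pipeline. Each stage is a
# stub; in the real system these wrap FFmpeg, MiDaS, TwelveLabs, Gemini,
# and ElevenLabs respectively.

def extract_frames(video_path):
    # FFmpeg would decode real frames; we fake three frame handles.
    return [f"{video_path}#frame{i}" for i in range(3)]

def estimate_depth(frame):
    # MiDaS would return a per-pixel depth map for the frame.
    return {"frame": frame, "depth": "map"}

def describe_scene(frame):
    # TwelveLabs + Gemini would produce a natural-language description.
    return f"description of {frame}"

def clone_voice(voice_sample):
    # ElevenLabs voice cloning from a ~30-second sample.
    return "voice-id-123"

def synthesize(text, voice_id, language):
    # Gemini translation + ElevenLabs synthesis in the cloned voice.
    return f"[{language}:{voice_id}] {text}"

def build_memory(video_path, voice_sample, language):
    frames = extract_frames(video_path)
    depths = [estimate_depth(f) for f in frames]
    descriptions = [describe_scene(f) for f in frames]
    voice_id = clone_voice(voice_sample)
    narrations = [synthesize(d, voice_id, language) for d in descriptions]
    return {"depths": depths, "narrations": narrations}
```

The real pipeline runs these stages asynchronously behind FastAPI, but the data dependencies are exactly as shown: depth and descriptions fan out from frames, and narration needs both a description and a cloned voice.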
Team Division:
- Peace Enesi: 3D rendering and depth processing pipeline
- Ohinoyi Moiza: Frontend UI and VoiceBridge interface
- Joanna Chimalilo: Backend API and AI service integration
Challenges we ran into
Depth Estimation Inconsistencies - Monocular depth from 2D videos produced flickering artifacts with moving objects. We implemented temporal smoothing algorithms and hybrid depth calibration to stabilize the 3D reconstruction.
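The temporal smoothing can be illustrated as an exponential moving average over successive depth maps, which damps frame-to-frame flicker at the cost of some lag. This is a minimal sketch under that assumption; real MiDaS output is a 2D float array, flattened here to plain lists for brevity.

```python
# Exponential moving average over a sequence of depth maps.
# Lower alpha = heavier smoothing (more weight on the running average).

def smooth_depth_sequence(depth_maps, alpha=0.3):
    """depth_maps: list of equal-length lists of per-pixel depth values."""
    smoothed = [depth_maps[0][:]]  # first frame passes through unchanged
    for frame in depth_maps[1:]:
        prev = smoothed[-1]
        # Blend the new frame with the running estimate, pixel by pixel.
        smoothed.append([alpha * d + (1 - alpha) * p for d, p in zip(frame, prev)])
    return smoothed
```

A depth value that spikes from 1.0 to 9.0 and back is pulled toward the running estimate instead of jumping, which is what stabilizes the point cloud between frames.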
3D Performance on Lower-End Devices - Rendering thousands of point cloud vertices caused frame drops. We added Level-of-Detail systems, instanced rendering, frustum culling, and Web Workers for background processing to maintain 60fps.
Voice Quality Across Languages - ElevenLabs worked perfectly for English but lost emotional nuance in tonal languages like Mandarin. We fine-tuned parameters per language family and adjusted prosody preservation techniques.
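The per-language-family tuning amounts to a lookup from language code to synthesis parameters. The parameter names below mirror ElevenLabs-style voice settings (stability, similarity boost), but the family buckets and the specific values are our own heuristics, not documented API guidance.

```python
# Hypothetical per-language-family synthesis settings. Tonal languages get
# lower stability so pitch can move more freely; values are illustrative.

LANGUAGE_FAMILY = {
    "es": "romance", "fr": "romance", "pt": "romance",
    "zh": "tonal", "vi": "tonal", "th": "tonal",
    "en": "germanic", "de": "germanic",
}

FAMILY_SETTINGS = {
    "tonal":    {"stability": 0.35, "similarity_boost": 0.90},
    "romance":  {"stability": 0.55, "similarity_boost": 0.80},
    "germanic": {"stability": 0.60, "similarity_boost": 0.75},
}

def voice_settings_for(language_code):
    # Unknown codes fall back to the germanic defaults.
    family = LANGUAGE_FAMILY.get(language_code, "germanic")
    return FAMILY_SETTINGS[family]
```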
Narration Generation Latency - The full pipeline (translate → synthesize → upload) took 8-12 seconds. We implemented aggressive caching by (scene_id, language, voice_id), pre-generated demo narrations, and added animated loading states to improve perceived speed.
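The caching strategy can be sketched with a dictionary keyed on the composite tuple; the `synthesize` stub stands in for the full translate → synthesize → upload round trip.

```python
# Cache narrations by (scene_id, language, voice_id) so the 8-12 second
# pipeline only runs once per unique combination.

_narration_cache = {}
calls = {"synthesize": 0}  # instrumentation for the example

def synthesize(scene_id, language, voice_id):
    # Stand-in for translate -> synthesize -> upload (the slow path).
    calls["synthesize"] += 1
    return f"audio://{scene_id}/{language}/{voice_id}"

def get_narration(scene_id, language, voice_id):
    key = (scene_id, language, voice_id)
    if key not in _narration_cache:
        _narration_cache[key] = synthesize(scene_id, language, voice_id)
    return _narration_cache[key]
```

The same idea works with any shared store (e.g. Firebase) as long as the key includes all three dimensions: switching language or voice must produce a cache miss, while replaying a scene must not.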
Dependency Conflicts - React 19, Three.js 0.180, and @react-three/fiber had breaking peer dependency issues. We systematically downgraded to stable versions (React 18.3, Three.js 0.160, @react-three/fiber 8.15) and used npm's --legacy-peer-deps flag.
Disk Space Crisis - During development, we hit 100% disk usage, which blocked npm installs. We had to aggressively clear caches, delete old node_modules folders, and manage storage throughout the hackathon.
Accomplishments that we're proud of
It Actually Works - We built a functional demo that chains three complex AI services (TwelveLabs → Gemini → ElevenLabs) with real 3D rendering. You can upload a video, clone your voice, and hear yourself speaking French.
VoiceBridge Technology - The emotional impact of hearing your own voice speaking a language you don't know is magical. We created something that preserves human connection across language barriers.
Award-Winning Design - Our cosmic-themed landing page with glassmorphic UI, animated star fields, and smooth Three.js orb animations looks production-ready. The wormhole effect in the final CTA is mesmerizing.
Real-Time 3D Performance - Optimizing Three.js to render complex point clouds at 60fps on various devices taught us advanced graphics programming we'll use forever.
24-Hour Full-Stack Build - We went from concept to deployable demo in one hackathon, with working frontend, backend, AI pipeline, and infrastructure. The team collaboration was seamless despite working on different continents.
Solving Real Problems - This isn't just cool tech—it solves actual pain points. Families separated by language, educators reaching global students, content creators accessing international audiences. We built something meaningful.
What we learned
Technical Deep Dives
- Advanced Three.js optimization: instanced rendering, LOD systems, shader programming
- AI service orchestration with proper error handling and retry logic
- Web Audio API for real-time waveform visualization
- Monocular depth estimation and 3D reconstruction techniques
- Firebase architecture for real-time collaborative applications
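The error handling and retry logic mentioned above can be sketched as a generic wrapper around each AI service call; the attempt count and backoff schedule are illustrative, not what any vendor prescribes.

```python
# Retry a flaky callable with exponential backoff. Used around each
# external AI service call (TwelveLabs, Gemini, ElevenLabs) so a single
# transient failure doesn't break the whole pipeline.
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

In practice you would catch only the exception types the client library raises for transient failures (timeouts, rate limits) rather than bare `Exception`.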
Product Insights
- The most powerful technology preserves human emotion—voice cloning resonates because it keeps you in the narration
- Accessibility doesn't mean compromise—you can make content universal without losing authenticity
- Demo-driven development works—we built for the pitch first, ensuring every feature tells a story
Hackathon Strategy
- Clearly defined roles prevent overlap and enable parallel development
- Documenting decisions in README files saved hours of repeated explanations
- Distinguishing between "vision" and "24-hour demo" kept us focused
- Sometimes downgrading packages is smarter than fighting cutting-edge bugs
Team Collaboration
- Async communication across time zones requires crystal-clear documentation
- Trust your teammates' domain expertise—micromanaging kills velocity
- Celebrate small wins during the grind—they keep morale high at 3am
What's next for Rewind
Immediate Features
- Complete the MiDaS depth pipeline for any uploaded video (currently limited to demo scenes)
- Mobile AR app - explore memories in your living room with phone camera
- Social sharing with embedded narrations
- Batch processing for multiple videos at once
Advanced Capabilities
- VR integration for fully immersive memory exploration (Oculus, Vision Pro)
- Real-time collaboration—multiple users exploring the same memory space together
- AI-powered memory search: "Show me all moments with grandma smiling" or "Find the part where we opened gifts"
- Emotion detection to adjust narration tone based on facial expressions
- Voice aging - hear how your childhood voice would sound narrating recent memories
Enterprise Applications
- Sports analysis platforms for coaches reviewing plays in 3D
- Medical training for surgical procedure exploration from any angle
- Real estate virtual tours with multilingual narration
- Corporate training videos accessible in employees' native languages
- Documentary filmmaking with interactive 3D exploration
Community Impact
Our vision is simple: language should never be a barrier to sharing life's precious moments. Every grandmother should hear her grandchild's laughter in her native tongue. Every family separated by borders should feel connected through shared memories. Every memory deserves to be explored, not just watched.
Rewind is just the beginning of universal memory accessibility.



