Inspiration
The original spark came from Talking Tom Cat, a childhood game built on a simple but powerful idea: a character listens to you and repeats what you say. That playful interaction left a strong impression on us; it proved that voice alone can create emotional connection and engagement.
We wanted to revisit that concept with today's technology. With modern 3D graphics, real-time audio processing, and AI-driven systems, avatars are no longer just toys; they are becoming interfaces. While our current implementation focuses on echoing the user's voice with realistic animations, the same foundation can naturally evolve into avatars that understand, respond, and answer inquiries.
What it does
Talking Avatar Game is an interactive 3D web experience where users communicate with a virtual avatar using their voice. The system records speech, processes it, and makes the avatar repeat it back with synchronized lip movements, facial expressions, and realistic animations, creating a natural "echo" interaction loop.
Beyond entertainment, the platform can serve as:
- A virtual assistant or tutor for online training sessions
- A spoken feedback interface for learning environments
- A foundation for accessibility tools, such as speech visualization or sign-language avatars
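The echo interaction loop described above can be sketched as a small pipeline. This is a simplified illustration rather than our exact code: the `transcribe` and `speak` functions stand in for the browser's SpeechRecognition and SpeechSynthesis calls, and are injected so the flow can be exercised outside a browser.

```typescript
// One pass through the echo loop: take recorded audio, transcribe it,
// then speak it back while the avatar animates.
// Speech services are injected as plain async functions so the same
// pipeline runs against the Web Speech API in the browser or mocks in tests.
type SpeechServices = {
  transcribe: (audio: Blob | null) => Promise<string>;
  speak: (text: string) => Promise<void>;
};

type LoopState = "processing" | "speaking" | "idle";

async function echoOnce(
  audio: Blob | null,
  services: SpeechServices,
  onStateChange: (state: LoopState) => void
): Promise<string> {
  onStateChange("processing");
  const text = await services.transcribe(audio);

  onStateChange("speaking");
  await services.speak(text); // lip sync runs while this audio plays

  onStateChange("idle");
  return text;
}
```

The `onStateChange` callback is what drives the visual feedback for the recording, processing, and speaking states mentioned below.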
How we built it
We built the project as a real-time, browser-based 3D application with a strong focus on performance and interactivity.
Frontend & Framework
- Next.js with TypeScript for scalability and structure
3D & Animation
- Three.js with React Three Fiber for rendering
- @react-three/drei for camera, environment, and helpers
- GSAP for smooth animation transitions and camera motion
- Natural idle animations, blinking, and expression handling
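The blinking mentioned above is driven by a time-based weight that the render loop writes to the avatar's eyelid morph target each frame (e.g. inside a React Three Fiber `useFrame` callback). Here is a minimal sketch of such a curve; the interval and duration values are illustrative defaults, not our tuned production numbers.

```typescript
// Time-based blink weight: 0 = eyes open, 1 = fully closed.
// A blink fires every `interval` seconds and lasts `duration` seconds;
// the closure follows a sine ease so the eyelid moves smoothly
// shut and back open instead of snapping.
function blinkWeight(timeSec: number, interval = 4, duration = 0.2): number {
  const t = timeSec % interval;     // time since the last blink started
  if (t > duration) return 0;       // eyes open between blinks
  const phase = t / duration;       // 0..1 progress through the blink
  return Math.sin(phase * Math.PI); // ease closed, then open
}
```

Adding a little randomness to `interval` per blink makes the effect read as natural rather than metronomic.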
Audio & Speech
- Browser-based audio recording (Web Audio API)
- Speech-to-text transcription via the browser's SpeechRecognition API
- Text-to-Speech for audio playback
- Real-time lip-sync driven by phoneme analysis
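Phoneme-driven lip sync boils down to mapping each phoneme in the spoken text to a viseme (a mouth shape the avatar can blend to) with a timestamp. The sketch below uses a deliberately reduced table with hypothetical morph-target names; a real table covers the full phoneme set.

```typescript
// Reduced phoneme → viseme table (illustrative; names are hypothetical
// morph targets on the avatar, not an exact copy of our asset's rig).
const PHONEME_TO_VISEME: Record<string, string> = {
  AA: "viseme_aa", // open jaw, as in "father"
  IY: "viseme_ih", // spread lips, as in "see"
  UW: "viseme_ou", // rounded lips, as in "too"
  M: "viseme_pp",  // closed lips, as in "map"
  B: "viseme_pp",
  P: "viseme_pp",
  F: "viseme_ff",  // lip on teeth, as in "fan"
  V: "viseme_ff",
};

type TimedPhoneme = { phoneme: string; startSec: number };
type VisemeKey = { viseme: string; startSec: number };

// Convert a timed phoneme stream into viseme keyframes that the
// animation loop interpolates between while the TTS audio plays.
function toVisemeTrack(phonemes: TimedPhoneme[]): VisemeKey[] {
  return phonemes.map(({ phoneme, startSec }) => ({
    viseme: PHONEME_TO_VISEME[phoneme] ?? "viseme_sil", // neutral fallback
    startSec,
  }));
}
```

Keeping the mapping as data rather than code made the fine-tuning described in the challenges section a matter of editing a table instead of rewriting logic.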
UX
- Responsive UI
- Clear visual feedback for recording, processing, and speaking states
- Scene presets with dynamic lighting and camera transitions
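The visual feedback for recording, processing, and speaking is easiest to keep consistent when the UI is driven by a small finite state machine. A minimal sketch, with assumed event names (the real app also handles error and microphone-permission states):

```typescript
// Finite state machine for the interaction loop. Event and state
// names here are illustrative, not the exact identifiers in our code.
type AvatarState = "idle" | "recording" | "processing" | "speaking";
type AvatarEvent = "PRESS_RECORD" | "STOP_RECORD" | "TTS_READY" | "TTS_ENDED";

const TRANSITIONS: Record<AvatarState, Partial<Record<AvatarEvent, AvatarState>>> = {
  idle: { PRESS_RECORD: "recording" },
  recording: { STOP_RECORD: "processing" },
  processing: { TTS_READY: "speaking" },
  speaking: { TTS_ENDED: "idle" },
};

// Returns the next state; events that are invalid in the current state
// are ignored, so a stray async callback cannot corrupt the UI.
function transition(state: AvatarState, event: AvatarEvent): AvatarState {
  return TRANSITIONS[state][event] ?? state;
}
```

Because audio callbacks, animation timelines, and UI updates all fire asynchronously, funnelling every change through one `transition` function was how we kept the three layers in lockstep.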
Challenges we ran into
- Real-time lip synchronization: Mapping audio output accurately to mouth movements required careful timing and fine-tuning.
- Audio latency management: Ensuring smooth playback without noticeable delays between recording, processing, and speaking.
- 3D performance in the browser: Balancing visual quality with performance across devices.
- State coordination: Synchronizing audio states, avatar animations, and UI feedback in real time.
Accomplishments that we're proud of
- Achieving real-time voice interaction with synchronized 3D avatars entirely in the browser.
- Building a clean, extensible codebase suitable for future AI integrations.
- Designing a system that goes beyond a demo and can be reused for education, accessibility, and training.
What we learned
- Real-time interaction is as much about UX timing as it is about technology.
- Browser-based 3D and audio have matured enough for serious interactive applications.
- Small animation details (blinks, idle motion, camera easing) dramatically improve perceived realism.
What's next for Talking Avatar Game
- Online learning integration: Use avatars as virtual instructors for training sessions and coursework.
- Sign language avatars: Extend speech transcription into real-time sign-language animation for accessibility.
- Multilingual support with dynamic voice and lip-sync adaptation.
- Avatar personalization: Custom faces, expressions, and emotional responses.
- Analytics & feedback: Track engagement and learning effectiveness in educational contexts.
Built With
- 3d
- gsap
- lipsync
- nextjs
- react
- react-three-fiber
- realtime
- speech-recognition
- typescript