Inspiration
The original spark came from Talking Tom Cat, a childhood game built on a simple but powerful idea: a character listens to you and repeats what you say. That playful interaction left a strong impression on us; it proved that voice alone can create emotional connection and engagement.
We wanted to revisit that concept with today's technology. With modern 3D graphics, real-time audio processing, and AI-driven systems, avatars are no longer just toys; they are becoming interfaces. While our current implementation focuses on echoing the user's voice with realistic animations, the same foundation can naturally evolve into avatars that understand, respond, and answer inquiries.
What it does
Talking Avatar Game is an interactive 3D web experience where users communicate with a virtual avatar using their voice. The system records speech, processes it, and makes the avatar repeat it back with synchronized lip movements, facial expressions, and realistic animations, creating a natural "echo" interaction loop.
Beyond entertainment, the platform can serve as:
- A virtual assistant or tutor for online training sessions
- A spoken feedback interface for learning environments
- A foundation for accessibility tools, such as speech visualization or sign-language avatars
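The echo interaction loop described above can be sketched as a small pipeline. This is a simplified illustration rather than our exact code: the `transcribe` and `speak` functions stand in for the browser's SpeechRecognition and SpeechSynthesis calls, and are injected so the flow can be exercised outside a browser.

```typescript
// One pass through the echo loop: take recorded audio, transcribe it,
// then speak it back while the avatar animates.
// Speech services are injected as plain async functions so the same
// pipeline runs against the Web Speech API in the browser or mocks in tests.
type SpeechServices = {
  transcribe: (audio: Blob | null) => Promise<string>;
  speak: (text: string) => Promise<void>;
};

type LoopState = "processing" | "speaking" | "idle";

async function echoOnce(
  audio: Blob | null,
  services: SpeechServices,
  onStateChange: (state: LoopState) => void
): Promise<string> {
  onStateChange("processing");
  const text = await services.transcribe(audio);

  onStateChange("speaking");
  await services.speak(text); // lip sync runs while this audio plays

  onStateChange("idle");
  return text;
}
```

The `onStateChange` callback is what drives the visual feedback for the recording, processing, and speaking states mentioned below.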
How we built it
We built the project as a real-time, browser-based 3D application with a strong focus on performance and interactivity.
Frontend & Framework
- Next.js with TypeScript for scalability and structure
3D & Animation
- Three.js with React Three Fiber for rendering
- @react-three/drei for camera, environment, and helpers
- GSAP for smooth animation transitions and camera motion
- Natural idle animations, blinking, and expression handling
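The blinking mentioned above is driven by a time-based weight that the render loop writes to the avatar's eyelid morph target each frame (e.g. inside a React Three Fiber `useFrame` callback). Here is a minimal sketch of such a curve; the interval and duration values are illustrative defaults, not our tuned production numbers.

```typescript
// Time-based blink weight: 0 = eyes open, 1 = fully closed.
// A blink fires every `interval` seconds and lasts `duration` seconds;
// the closure follows a sine ease so the eyelid moves smoothly
// shut and back open instead of snapping.
function blinkWeight(timeSec: number, interval = 4, duration = 0.2): number {
  const t = timeSec % interval;     // time since the last blink started
  if (t > duration) return 0;       // eyes open between blinks
  const phase = t / duration;       // 0..1 progress through the blink
  return Math.sin(phase * Math.PI); // ease closed, then open
}
```

Adding a little randomness to `interval` per blink makes the effect read as natural rather than metronomic.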
Audio & Speech
- Browser-based audio recording (Web Audio API)
- Speech-to-text transcription via the browser's SpeechRecognition API
- Text-to-Speech for audio playback
- Real-time lip-sync driven by phoneme analysis
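Phoneme-driven lip sync boils down to mapping each phoneme in the spoken text to a viseme (a mouth shape the avatar can blend to) with a timestamp. The sketch below uses a deliberately reduced table with hypothetical morph-target names; a real table covers the full phoneme set.

```typescript
// Reduced phoneme → viseme table (illustrative; names are hypothetical
// morph targets on the avatar, not an exact copy of our asset's rig).
const PHONEME_TO_VISEME: Record<string, string> = {
  AA: "viseme_aa", // open jaw, as in "father"
  IY: "viseme_ih", // spread lips, as in "see"
  UW: "viseme_ou", // rounded lips, as in "too"
  M: "viseme_pp",  // closed lips, as in "map"
  B: "viseme_pp",
  P: "viseme_pp",
  F: "viseme_ff",  // lip on teeth, as in "fan"
  V: "viseme_ff",
};

type TimedPhoneme = { phoneme: string; startSec: number };
type VisemeKey = { viseme: string; startSec: number };

// Convert a timed phoneme stream into viseme keyframes that the
// animation loop interpolates between while the TTS audio plays.
function toVisemeTrack(phonemes: TimedPhoneme[]): VisemeKey[] {
  return phonemes.map(({ phoneme, startSec }) => ({
    viseme: PHONEME_TO_VISEME[phoneme] ?? "viseme_sil", // neutral fallback
    startSec,
  }));
}
```

Keeping the mapping as data rather than code made the fine-tuning described in the challenges section a matter of editing a table instead of rewriting logic.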
UX
- Responsive UI
- Clear visual feedback for recording, processing, and speaking states
- Scene presets with dynamic lighting and camera transitions
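The visual feedback for recording, processing, and speaking is easiest to keep consistent when the UI is driven by a small finite state machine. A minimal sketch, with assumed event names (the real app also handles error and microphone-permission states):

```typescript
// Finite state machine for the interaction loop. Event and state
// names here are illustrative, not the exact identifiers in our code.
type AvatarState = "idle" | "recording" | "processing" | "speaking";
type AvatarEvent = "PRESS_RECORD" | "STOP_RECORD" | "TTS_READY" | "TTS_ENDED";

const TRANSITIONS: Record<AvatarState, Partial<Record<AvatarEvent, AvatarState>>> = {
  idle: { PRESS_RECORD: "recording" },
  recording: { STOP_RECORD: "processing" },
  processing: { TTS_READY: "speaking" },
  speaking: { TTS_ENDED: "idle" },
};

// Returns the next state; events that are invalid in the current state
// are ignored, so a stray async callback cannot corrupt the UI.
function transition(state: AvatarState, event: AvatarEvent): AvatarState {
  return TRANSITIONS[state][event] ?? state;
}
```

Because audio callbacks, animation timelines, and UI updates all fire asynchronously, funnelling every change through one `transition` function was how we kept the three layers in lockstep.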
Challenges we ran into
- Real-time lip synchronization: Mapping audio output accurately to mouth movements required careful timing and fine-tuning.
- Audio latency management: Ensuring smooth playback without noticeable delays between recording, processing, and speaking.
- 3D performance in the browser: Balancing visual quality with performance across devices.
- State coordination: Synchronizing audio states, avatar animations, and UI feedback in real time.
Accomplishments that we're proud of
- Achieving real-time voice interaction with synchronized 3D avatars entirely in the browser.
- Building a clean, extensible codebase suitable for future AI integrations.
- Designing a system that goes beyond a demo and can be reused for education, accessibility, and training.
What we learned
- Real-time interaction is as much about UX timing as it is about technology.
- Browser-based 3D and audio have matured enough for serious interactive applications.
- Small animation details (blinks, idle motion, camera easing) dramatically improve perceived realism.
What's next for Talking Avatar Game
- Online learning integration: Use avatars as virtual instructors for training sessions and coursework.
- Sign language avatars: Extend speech transcription into real-time sign-language animation for accessibility.
- Multilingual support with dynamic voice and lip-sync adaptation.
- Avatar personalization: Custom faces, expressions, and emotional responses.
- Analytics & feedback: Track engagement and learning effectiveness in educational contexts.
Built With
- 3d
- gsap
- lipsync
- nextjs
- react
- react-three-fiber
- realtime
- speech-recognition
- typescript