Inspiration

Voice chats can feel empty. You talk, but the call itself never changes based on how people act or move. We wanted to see what would happen if body language had an effect on the call, even a simple one.

What it does

Babel Synth is a voice chat that reacts to how you move and express yourself. While you talk, the camera tracks basic facial expressions, hand movement, and posture. Those signals are processed live and used to affect sound in real time.
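As a rough illustration of what "signals affect sound" could look like, here is a minimal sketch that maps normalized tracking values to sound parameters. The signal names (`smile`, `handHeight`), the output fields (`cutoffHz`, `gain`), and the numeric ranges are all assumptions made up for this example, not the project's actual mapping.

```typescript
// Hypothetical normalized tracking signals, each expected in [0, 1].
interface VisionSignals {
  smile: number;
  handHeight: number;
}

// Hypothetical sound parameters driven by those signals.
interface SoundParams {
  cutoffHz: number;
  gain: number;
}

// Keep a value inside [0, 1] so tracking glitches cannot push
// parameters outside their intended range.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Linear interpolation between lo and hi for t in [0, 1].
function lerp(lo: number, hi: number, t: number): number {
  return lo + (hi - lo) * t;
}

// Assumed mapping: raising a hand opens the filter, smiling raises the gain.
function mapSignalsToSound(s: VisionSignals): SoundParams {
  return {
    cutoffHz: lerp(200, 8000, clamp01(s.handHeight)),
    gain: lerp(0.2, 1.0, clamp01(s.smile)),
  };
}
```

In a browser, values like these could then be applied to Web Audio nodes, for example through `AudioParam.setTargetAtTime` on a `BiquadFilterNode.frequency` or `GainNode.gain`, to avoid audible jumps.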

There are no buttons or manual controls. The system runs in the background and responds continuously as the conversation happens.

How we built it

Lyria - Used for sound generation and audio processing based on the incoming vision data.

MediaPipe - Used for computer vision because it runs directly in the browser and provides reliable face, hand, and pose tracking without sending video to a server.

Next.js - Used for the frontend because it makes building real-time UIs fast and manageable, especially with hooks and client-side rendering.

TypeScript - Helps keep the project stable by catching bugs early, which matters a lot when dealing with complex real-time data.

WebRTC - Used for low-latency audio streaming between users so the voice chat feels immediate and natural.

WebSockets - Used to stream vision data to the backend in real time and keep all clients in sync.

FastAPI - Used for the music generation backend because it’s quick to build with and handles async, real-time workloads well.

Web Audio API - Used on the client side to play and control audio with low latency.
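To make the WebSockets item a bit more concrete, here is a minimal sketch of how one frame of vision data could be serialized and validated before being sent to the backend. The `VisionFrame` shape and the `encodeFrame` / `decodeFrame` helpers are hypothetical; the actual message format used in the project is not described here.

```typescript
// Hypothetical wire format for one frame of vision data.
interface VisionFrame {
  userId: string;
  timestampMs: number;
  signals: Record<string, number>;
}

function encodeFrame(frame: VisionFrame): string {
  return JSON.stringify(frame);
}

// Returns null for malformed input instead of throwing, so a single
// bad message does not take down the receive loop.
function decodeFrame(raw: string): VisionFrame | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return null;
  }
  const d = data as Record<string, unknown>;
  if (typeof d.userId !== "string" || typeof d.timestampMs !== "number") {
    return null;
  }
  const rawSignals = d.signals;
  if (
    typeof rawSignals !== "object" ||
    rawSignals === null ||
    Array.isArray(rawSignals)
  ) {
    return null;
  }
  // Copy only finite numeric signals; reject the frame otherwise.
  const signals: Record<string, number> = {};
  for (const [key, value] of Object.entries(rawSignals)) {
    if (typeof value !== "number" || !Number.isFinite(value)) {
      return null;
    }
    signals[key] = value;
  }
  return { userId: d.userId, timestampMs: d.timestampMs, signals };
}
```

On the client, a frame encoded this way could be passed to the standard `WebSocket.send` method, and the receiving side could drop any message for which `decodeFrame` returns `null`.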

Challenges we ran into

Real-time systems break easily. At one point the tracked values got stuck and nothing updated, yet no errors showed up anywhere. Calibration was also annoying because everyone’s neutral position is different.
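Both problems can be sketched in a few lines. The example below is an illustration under assumptions, not the project's code: `NeutralBaseline` averages a few samples taken while the user holds a neutral pose and reports later values relative to that, and `StaleWatchdog` flags a stream whose value has not changed for too long. Both class names and the `maxStaleMs` parameter are hypothetical.

```typescript
// Hypothetical per-user calibration: average a short window of "neutral"
// samples, then report later samples as an offset from that baseline.
class NeutralBaseline {
  private sum = 0;
  private count = 0;

  addNeutralSample(sample: number): void {
    this.sum += sample;
    this.count += 1;
  }

  // Offset from the user's own neutral value; 0 until calibrated.
  offset(sample: number): number {
    if (this.count === 0) return 0;
    return sample - this.sum / this.count;
  }
}

// Hypothetical watchdog for silent freezes: reports true when the
// observed value has not changed for longer than maxStaleMs.
class StaleWatchdog {
  private lastValue: number | null = null;
  private lastChangeMs = 0;
  private readonly maxStaleMs: number;

  constructor(maxStaleMs: number) {
    this.maxStaleMs = maxStaleMs;
  }

  // Call once per frame with the current value and a timestamp.
  observe(value: number, nowMs: number): boolean {
    if (this.lastValue === null || value !== this.lastValue) {
      this.lastValue = value;
      this.lastChangeMs = nowMs;
      return false;
    }
    return nowMs - this.lastChangeMs > this.maxStaleMs;
  }
}
```

Live tracking output almost never repeats exactly from frame to frame, so a long run of identical values is a reasonable hint that something upstream has stopped updating.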

Accomplishments that we're proud of

We managed to get vision, audio, and networking working together in one project. The voice chat actually responds to movement and doesn’t feel completely fake.

What we learned

Computer vision data is messy and needs filtering. Also, low latency matters more than high accuracy for this kind of project.
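One simple way to see the filtering versus latency trade-off is an exponential moving average. This is a generic sketch, not necessarily the filter used in the project; the `EmaFilter` class and its `alpha` parameter are names chosen for this example.

```typescript
// Exponential moving average (EMA) for one noisy signal.
// alpha near 1 follows the input quickly (low latency, more jitter);
// alpha near 0 is smoother but lags behind the input.
class EmaFilter {
  private value: number | null = null;
  private readonly alpha: number;

  constructor(alpha: number) {
    if (!(alpha > 0 && alpha <= 1)) {
      throw new RangeError("alpha must be in (0, 1]");
    }
    this.alpha = alpha;
  }

  update(sample: number): number {
    // The first sample initializes the filter, so there is no
    // startup ramp from zero.
    this.value =
      this.value === null
        ? sample
        : this.alpha * sample + (1 - this.alpha) * this.value;
    return this.value;
  }
}

// Usage: smooth a jumpy signal one frame at a time.
const smoother = new EmaFilter(0.5);
const smoothed = [0, 1, 1].map((x) => smoother.update(x));
// smoothed is [0, 0.5, 0.75]
```

A higher `alpha` keeps the reaction feeling immediate at the cost of some jitter, which fits the lesson above that responsiveness matters more than precision here.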

What's next for Babel Synth

We want to clean up the sound reactions and maybe support more people in one room. We also want better calibration so it works without much explanation.
