Babel Synth

Inspiration

Voice chats feel kind of empty. You talk, but the call itself doesn’t change based on how people act or move. We wanted to see what would happen if body language had some effect, even in a simple way.

What it does

Babel Synth is a voice chat that reacts to how you move and express yourself. While you talk, the camera tracks basic facial expressions, hand movement, and posture. Those signals are processed live and used to affect sound in real time.

There are no buttons or manual controls. The system runs in the background and responds continuously as the conversation happens.

How we built it

Lyria - Used for sound generation and audio processing based on the incoming vision data.

MediaPipe - Used for computer vision because it runs directly in the browser and provides reliable face, hand, and pose tracking without sending video to a server.

Next.js - Used for the frontend because it makes building real-time UIs fast and manageable, especially with hooks and client-side rendering.

TypeScript - Helps keep the project stable by catching bugs early, which matters a lot when dealing with complex real-time data.

WebRTC - Used for low-latency audio streaming between users so the voice chat feels immediate and natural.

WebSockets - Used to stream vision data to the backend in real time and keep all clients in sync.

FastAPI - Used for the music generation backend because it’s fast to build with and works well for async, real-time workloads.

Web Audio API - Used on the client side to play and control audio with low latency.
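
As a rough illustration of the WebSockets step above, here is a minimal sketch of how vision data might be rate-limited before it is sent to the backend. Everything in it is an assumption made for illustration: the `VisionFrame` fields, the `createThrottledSender` helper, and the interval are hypothetical and not taken from the project's code.

```typescript
// Hypothetical shape of one frame of vision data; field names are illustrative only.
interface VisionFrame {
  timestampMs: number; // capture time in milliseconds
  smile: number;       // assumed range 0..1
  handSpeed: number;   // assumed range 0..1
  lean: number;        // assumed range -1..1
}

// Returns a function that forwards at most one frame per `minIntervalMs`.
// `send` is injected so the same logic works with a WebSocket or a test stub.
function createThrottledSender(
  send: (payload: string) => void,
  minIntervalMs: number,
): (frame: VisionFrame) => boolean {
  let lastSentMs = -Infinity;
  return (frame: VisionFrame): boolean => {
    // Drop frames that arrive too soon after the last one that was sent.
    if (frame.timestampMs - lastSentMs < minIntervalMs) return false;
    lastSentMs = frame.timestampMs;
    send(JSON.stringify(frame));
    return true;
  };
}
```

In a browser, `send` could be something like `(p) => socket.send(p)` for an open `WebSocket`. Dropping frames instead of queueing them keeps the data fresh, which fits the low-latency goal.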

Challenges we ran into

Real-time pipelines break easily. At one point the values got stuck and nothing updated, yet no errors showed up anywhere. Calibration was also annoying because everyone’s neutral position is different.
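
One common way to handle different neutral positions is to record a short baseline per user and subtract it from later readings. The sketch below shows that idea only; it is not the project's actual calibration logic, and the `Features` type and both function names are hypothetical.

```typescript
// Hypothetical feature map, e.g. { smile: 0.3, lean: -0.1 }.
type Features = Record<string, number>;

// Average a handful of samples captured while the user holds a neutral pose.
function computeBaseline(samples: Features[]): Features {
  const baseline: Features = {};
  const first = samples[0];
  if (first === undefined) return baseline; // no samples: empty baseline
  for (const key of Object.keys(first)) {
    let sum = 0;
    for (const sample of samples) sum += sample[key] ?? 0;
    baseline[key] = sum / samples.length;
  }
  return baseline;
}

// Express a raw reading relative to the user's own neutral pose.
function applyBaseline(raw: Features, baseline: Features): Features {
  const adjusted: Features = {};
  for (const key of Object.keys(raw)) {
    adjusted[key] = (raw[key] ?? 0) - (baseline[key] ?? 0);
  }
  return adjusted;
}
```

With this approach, a value near zero means "neutral for this person" rather than neutral in absolute terms.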

Accomplishments that we're proud of

We managed to get vision, audio, and networking working together in one project. The voice chat actually responds to movement and doesn’t feel completely fake.

What we learned

Computer vision data is messy and needs filtering. Also, low latency is more important than being super accurate for this kind of project.
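
To make the filtering point concrete, here is a minimal sketch of an exponential moving average, one simple low-latency way to smooth a noisy signal. It is an assumed example, not necessarily the filter used in the project; `createEmaFilter` and the `alpha` parameter are names chosen for illustration.

```typescript
// Exponential moving average: higher alpha reacts faster, lower alpha smooths more.
function createEmaFilter(alpha: number): (value: number) => number {
  if (!(alpha > 0 && alpha <= 1)) {
    throw new RangeError("alpha must be in (0, 1]");
  }
  let state: number | null = null;
  return (value: number): number => {
    // The first sample initializes the filter; later samples are blended in.
    state = state === null ? value : alpha * value + (1 - alpha) * state;
    return state;
  };
}

// Example: smooth a signal that jumps from 0 to 1.
const smooth = createEmaFilter(0.5);
const smoothed = [0, 1, 1].map(smooth); // [0, 0.5, 0.75]
```

The tradeoff mirrors the lesson above: a larger `alpha` keeps latency low at the cost of more jitter, while a smaller one is steadier but lags behind the movement.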

What's next for Babel Synth

We want to clean up the sound reactions and maybe support more people in one room. We also want better calibration, so it works without much explanation.

Built With

fastapi
lyria
mediapipe
nextjs
python
typescript
webrtc
websockets

Submitted to

United Hacks V6

Created by

I created the front end and the camera and voice call features. It was my first time working with teammates on a project like this, but it was fun and I learned a lot.

Noma4321 malka
I built the camera and vision system, including real-time feature extraction from facial expressions and body movement, smoothing and calibration logic, and the frontend hooks that turn webcam input into live data sent to the backend.

Philip Sagan

Updates

Philip Sagan started this project — Jan 18, 2026 11:53 AM EST
