Inspiration

Val worked as a concierge in a building with a Deaf resident and needed a way to communicate effectively. Writing notes felt slow and impersonal, and calling a video relay service was overkill for a quick exchange. He wanted something spontaneous.

What it does

Kine is a bi-directional sign language bridge. In Signing Mode, a Deaf user signs into their camera and is instantly voiced as spoken audio. In Listening Mode, a hearing user speaks and their words are translated into ASL gloss, rendered by a signing avatar. No menus, no typing, just point and communicate.

How we built it

We use Google MediaPipe to track hand and face landmarks in the browser, send the landmark data to Gemini 3.0 for interpretation, and synthesize the spoken output with ElevenLabs. The interface is built with React and Framer Motion.
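For a sense of the tracking step, here is a minimal sketch of in-browser hand landmark extraction with MediaPipe's Tasks Vision API. The frame handling is illustrative, not our exact production code:

```typescript
import { FilesetResolver, HandLandmarker } from "@mediapipe/tasks-vision";

// Load the WASM runtime and the stock hand landmark model
// (the CDN and model URLs below are the publicly documented ones).
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
const handLandmarker = await HandLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath:
      "https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task",
  },
  runningMode: "VIDEO",
  numHands: 2,
});

// Called once per video frame: returns 21 normalized 3D points per hand.
// This compact representation is what gets serialized and sent to the model.
function extractLandmarks(video: HTMLVideoElement, timestampMs: number) {
  const result = handLandmarker.detectForVideo(video, timestampMs);
  return result.landmarks.map((hand) =>
    hand.map((pt) => ({ x: pt.x, y: pt.y, z: pt.z }))
  );
}
```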

Challenges we ran into

Distinguishing active signing from idle hand movement required building a custom motion detector with silence thresholds (a simplified version is sketched below). Facial grammar (eyebrow raises, mouth shapes) carries grammatical meaning in ASL but is subtle to capture. Keeping latency low enough for conversation to feel natural rather than transactional was a constant battle.
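The core idea of the detector: treat frame-to-frame landmark displacement as motion energy, and end an utterance only after the energy stays below a threshold for a sustained window. The threshold and window values here are placeholders, not our tuned numbers:

```typescript
// Illustrative signing/idle detector; constants are placeholder values.
type Point = { x: number; y: number; z: number };

const MOTION_THRESHOLD = 0.004; // mean per-landmark displacement per frame
const SILENCE_FRAMES = 24;      // ~0.8s of stillness at 30fps ends an utterance

let prevFrame: Point[] | null = null;
let stillFrames = 0;
let signing = false;

function onFrame(landmarks: Point[], onUtteranceEnd: () => void): void {
  const prev = prevFrame;
  if (prev && prev.length === landmarks.length) {
    // Mean Euclidean displacement across all landmarks since the last frame.
    const energy =
      landmarks.reduce(
        (sum, pt, i) =>
          sum + Math.hypot(pt.x - prev[i].x, pt.y - prev[i].y, pt.z - prev[i].z),
        0
      ) / landmarks.length;

    if (energy > MOTION_THRESHOLD) {
      // Active signing: reset the silence counter.
      signing = true;
      stillFrames = 0;
    } else if (signing && ++stillFrames >= SILENCE_FRAMES) {
      // Sustained stillness: treat it as the end of the signed utterance.
      signing = false;
      stillFrames = 0;
      onUtteranceEnd();
    }
  }
  prevFrame = landmarks;
}
```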

Accomplishments that we're proud of

With a custom prompt built on Gemini 3.0's vision capabilities, we got the model to interpret raw landmark data with impressive accuracy.
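The shape of that call looked roughly like the sketch below; the model id, prompt wording, and SDK usage are a reconstruction of the approach, not our exact code:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Model id and prompt wording are illustrative, not our production values.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-3.0" });

// frames: the serialized landmark time series captured for one utterance.
async function landmarksToEnglish(frames: object[]): Promise<string> {
  const prompt = [
    "You are an ASL interpreter. Below is a time series of hand landmark",
    "coordinates (21 normalized 3D points per hand) captured while a person",
    "signed one utterance. Return only the English translation of the signing.",
    "",
    JSON.stringify(frames),
  ].join("\n");

  const result = await model.generateContent(prompt);
  return result.response.text();
}
```

The returned English text is then handed to ElevenLabs for speech synthesis.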

What we learned

Speed beats perfect accuracy for spontaneous communication. ASL is not English: facial expressions and spatial grammar carry meaning that word-for-word translation destroys. Running ML on the edge (MediaPipe landmark detection in-browser) is critical for real-time responsiveness. Accessibility isn't a feature; it's an architecture decision.

What's next for Kine

QA testing with Deaf users. Expanding sign vocabulary beyond the current recognition set. Improving the avatar output with generative signing. Reducing latency further. Beta launch.

Built With

  • elevenlabs
  • gemini3.0
  • mediapipe
  • react
  • react-19
  • tailwind-css-4
  • tensorflow
  • typescript
  • vercel
  • vitest