Ruva – AI-Powered Speech Tutor

Inspiration

Public speaking is universally terrifying. Whether it's the pressure of a group discussion, the sudden panic of going blank during a presentation, or the frustration of stuttering when the spotlight hits — speech anxiety holds brilliant people back.

We realized that existing solutions either just record your voice or offer generic advice. Even typical Discord-style practice communities lack structure and flexibility, especially for people with little to no exposure to public speaking.

So we asked:

What if practice felt like training with a coach who actually knows you?

Ruva was inspired by the idea of creating a safe, adaptive, intelligent sandbox where users could practice real-world speaking scenarios — from one-on-one debates to rapid-fire JAM sessions — with an AI that remembers their struggles and tracks their growth over time.


What It Does

Ruva is a modern AI-powered speech tutor designed to dismantle speech anxiety through personalized coaching powered by a native RAG architecture.

Instead of static feedback, Ruva:

  • Tracks historical strengths and weaknesses
  • Identifies filler word usage patterns
  • Detects drops in vocal intensity
  • Measures pacing and pauses
  • Monitors improvement across sessions
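As a flavor of the first two items, here is a minimal sketch of filler-word tracking across sessions. The filler inventory and function names are illustrative, not Ruva's actual implementation:

```python
import re
from collections import Counter

# Hypothetical filler inventory; the real list is likely larger.
FILLERS = {"um", "uh", "like", "you know", "basically"}

def filler_stats(transcript: str) -> Counter:
    """Count filler-word occurrences in one session transcript."""
    text = transcript.lower()
    counts = Counter()
    for filler in FILLERS:
        # Word boundaries so "like" doesn't match "unlike".
        counts[filler] = len(re.findall(rf"\b{re.escape(filler)}\b", text))
    return +counts  # drop zero-count entries

def trend(previous: Counter, current: Counter) -> dict:
    """Per-filler change between sessions (negative = improvement)."""
    keys = set(previous) | set(current)
    return {k: current[k] - previous[k] for k in keys}
```

Persisting the per-session `Counter` is what lets later sessions report improvement rather than an isolated score.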

Users can practice inside four distinct training rooms:

Debate Mode

Face off against an AI opponent or debate another human while an AI acts as the judge.

Group Discussion Mode

Join multiplayer rooms (2+ participants) guided by an AI facilitator that manages flow and engagement.

JAM (Just-A-Minute) Mode

A high-pressure single-player mode designed to improve spontaneous speaking ability.

Reading Mode

Practice pronunciation, pacing, and clarity in a structured solo environment.

Behind the scenes, Ruva performs real-time analysis of:

  • Speech transcription
  • Prosody (pitch, jitter, shimmer)
  • Pauses
  • Sentiment
  • Speaking confidence indicators

All to generate actionable, personalized coaching feedback.
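For instance, pacing and pause metrics can fall out of word-level timestamps like those Whisper can emit. This sketch assumes a simple `(word, start_sec, end_sec)` tuple format; the threshold is illustrative:

```python
def pacing_metrics(words, pause_threshold=0.5):
    """Compute words-per-minute and long-pause count from
    (word, start_sec, end_sec) timestamps, e.g. from Whisper."""
    if not words:
        return {"wpm": 0.0, "long_pauses": 0}
    duration = words[-1][2] - words[0][1]
    wpm = len(words) / duration * 60 if duration > 0 else 0.0
    long_pauses = sum(
        1
        for (_, _, prev_end), (_, nxt_start, _) in zip(words, words[1:])
        if nxt_start - prev_end >= pause_threshold
    )
    return {"wpm": round(wpm, 1), "long_pauses": long_pauses}
```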


How We Built It

We redesigned the system architecture from the ground up to support real-time, low-latency interactions.

Frontend

Built using:

  • React
  • TypeScript
  • Vite
  • Redux (state management)

The UI supports responsive multiplayer sessions and live feedback visualization.

Backend

Powered by:

  • Python
  • FastAPI
  • WebSockets

WebSockets enable real-time bidirectional communication required for:

  • multiplayer rooms
  • live transcription
  • AI facilitation
  • audio streaming pipelines
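The multiplayer-room fan-out behind those features can be sketched as a small room manager; real FastAPI WebSocket endpoints would call its methods. The class and field names are illustrative, not Ruva's actual code:

```python
import asyncio
from collections import defaultdict

class RoomManager:
    """Tracks live WebSocket connections per room and fans
    messages out to every participant."""

    def __init__(self):
        self.rooms = defaultdict(set)  # room_id -> set of websockets

    def join(self, room_id, ws):
        self.rooms[room_id].add(ws)

    def leave(self, room_id, ws):
        self.rooms[room_id].discard(ws)

    async def broadcast(self, room_id, message: dict):
        # Send concurrently so one slow client doesn't stall the room.
        await asyncio.gather(
            *(ws.send_json(message) for ws in self.rooms[room_id])
        )
```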

Data & Memory Layer

We implemented a hybrid storage architecture:

  • MongoDB → persistent storage for user progress (core to RAG memory)
  • Redis → high-speed session state caching during live rooms
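The Redis side of the split might look like the sketch below: session state is cached under a TTL key, while durable progress lives in MongoDB. The key scheme and TTL are assumptions; `client` is anything with the redis-py `setex`/`get` interface (a real `redis.Redis` in production):

```python
import json

SESSION_TTL = 3600  # seconds a live room's cached state survives (assumed)

def cache_room_state(client, room_id: str, state: dict) -> str:
    """Cache live-session state under a TTL key so it expires
    on its own after the room ends."""
    key = f"room:{room_id}:state"
    client.setex(key, SESSION_TTL, json.dumps(state))
    return key

def load_room_state(client, room_id: str):
    raw = client.get(f"room:{room_id}:state")
    return json.loads(raw) if raw else None
```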

AI & Audio Engine

Ruva’s intelligence stack includes:

  • Google Gemini API (core reasoning engine)
  • Whisper (speech-to-text transcription)
  • Silero VAD (voice activity detection)
  • Parselmouth (scientific prosody analysis)

Together, they enable real-time speech understanding and personalized coaching.


Challenges We Ran Into

Handling real-time audio streaming was one of the toughest challenges.

We had to:

  • synchronize frontend audio streams through WebSockets
  • segment speech efficiently using Silero VAD
  • pipeline audio into Whisper transcription
  • minimize latency without breaking conversation flow
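The VAD-to-Whisper handoff can be sketched as a gate that buffers speech frames and flushes a segment once enough silence follows it. Here `is_speech` stands in for Silero VAD and `transcribe` for Whisper; both are injected so the control flow, not the models, is the point:

```python
def segment_and_transcribe(frames, is_speech, transcribe, max_silence=3):
    """Gate streamed audio frames through a VAD and flush each
    speech segment to the transcriber after `max_silence`
    consecutive silent frames."""
    transcripts, buffer, silence = [], [], 0
    for frame in frames:
        if is_speech(frame):
            buffer.append(frame)
            silence = 0
        elif buffer:
            silence += 1
            if silence >= max_silence:      # end of utterance
                transcripts.append(transcribe(b"".join(buffer)))
                buffer, silence = [], 0
    if buffer:                              # flush trailing speech
        transcripts.append(transcribe(b"".join(buffer)))
    return transcripts
```

Tuning `max_silence` is exactly the latency trade-off described above: flush too eagerly and utterances get split; wait too long and the conversation stalls.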

Another major challenge was building multiplayer facilitation logic.

For Group Discussion and Debate Mode, Gemini needed to:

  • listen to multiple speakers
  • track conversation context
  • identify speaker turns
  • intervene naturally as a moderator or judge

All without disrupting human interaction dynamics.
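One simple signal a facilitator can act on is talk-time dominance: step in only when one speaker has held the floor too long. This heuristic and its thresholds are illustrative, not Ruva's actual tuning:

```python
def should_intervene(talk_time: dict, dominance=0.6, min_total=30.0):
    """Return a facilitator nudge if one speaker holds too large a
    share of total talk time (seconds), else None."""
    total = sum(talk_time.values())
    if total < min_total or len(talk_time) < 2:
        return None  # too early in the session, or not a group setting
    speaker, seconds = max(talk_time.items(), key=lambda kv: kv[1])
    if seconds / total >= dominance:
        return f"Invite quieter participants to respond to {speaker}."
    return None
```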


Accomplishments We're Proud Of

Our biggest achievement is the native RAG-based coaching memory system.

Instead of analyzing speech in isolation, Ruva remembers things like:

“You struggled with filler words last Tuesday — let's check improvement today.”

That transforms Ruva from a tool into a mentor-like experience.
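Concretely, that memory can be folded into the prompt sent to the reasoning model. This is a minimal sketch; the field names and wording are hypothetical, not Ruva's actual retrieval schema:

```python
def build_coaching_prompt(user_name: str, history: list, transcript: str) -> str:
    """Fold retrieved session history (the RAG memory) into the
    coaching prompt. `history` items are dicts like
    {"date": ..., "weaknesses": [...]} (assumed shape)."""
    memory_lines = [
        f"- {h['date']}: weak on {', '.join(h['weaknesses'])}"
        for h in history
    ] or ["- no prior sessions"]
    return (
        f"You are a speech coach for {user_name}.\n"
        "Known history:\n" + "\n".join(memory_lines) + "\n\n"
        "Today's transcript:\n" + transcript + "\n\n"
        "Give feedback that explicitly compares today against the history."
    )
```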

We're also proud of:

  • migrating to a scalable React + FastAPI + WebSocket architecture
  • enabling real-time multiplayer speaking environments
  • implementing experimental body-language tracking using periodic visual snapshots

What We Learned

This project became a masterclass in real-time system engineering.

We gained hands-on experience with:

  • WebSocket lifecycle management
  • distributed real-time state synchronization
  • audio streaming pipelines
  • low-latency speech processing architectures
  • advanced prompt engineering with Gemini

We also explored designing specialized AI personas that act as:

  • judges
  • facilitators
  • coaches

inside different speaking environments.


What's Next for Ruva

Our immediate roadmap includes:

  • integrating Gemini Live Multimodal APIs
  • reducing response latency
  • supporting interruption-aware conversation handling
  • introducing additional structured speaking rooms
  • implementing natural voice support via third-party providers for more personalization

Our long-term vision:

Launch Ruva as a full mobile application and make personalized speech coaching accessible anywhere.

Built With

  • React, TypeScript, Vite, Redux
  • Python, FastAPI, WebSockets
  • MongoDB, Redis
  • Google Gemini API, Whisper, Silero VAD, Parselmouth
