Inspiration

The hardest thing we faced on entering college was not something we had expected to be hard. It wasn’t the rigor of the courses or the challenge of newfound independence; it was interacting with other people. We suddenly found ourselves in unfamiliar terrain. Interactions were abundant but fleeting: there were people everywhere, yet no real sense of connection. Making friends, which once felt natural, became much harder. Our idea stems from this shared difficulty in communicating and socializing meaningfully with others. We found ourselves ranting about the same struggles week after week as we grappled with a lack of human connection in college. Eventually, we realized this wasn’t just our problem, so we decided to do something about it, for ourselves and for others like us. When Gemini came along, it presented the perfect opportunity to turn this shared struggle into a solution.

What it does

Our application helps users improve their communication skills through a range of focus areas, including general communication, interview preparation, public speaking, and professional growth. Each area is divided into specific, real-world scenarios. For instance, the general communication module includes situations such as networking events and first dates. Once users select the area they want to work on, they receive personalized guidance from an AI-powered bot. The system analyzes both their written responses and audio-visual inputs to provide feedback aimed at improving clarity, confidence, and overall communication effectiveness. The application also includes an open-ended AI roleplay mode for practicing any specific situation, along with games such as Synonym Sprint that build vocabulary and linguistic agility.

How we built it

Commic is built on a fast, modern tech stack designed for real-time interaction and a smooth user experience.

The frontend is a React 19 app with TypeScript and Vite, ensuring quick performance and efficient builds. Its unique “Neo-Pop” design system is created using Tailwind CSS, with bold colors, high-contrast borders, and strong shadows for a premium, standout look. Framer Motion adds life to the interface through smooth transitions, animated vocal visualizations, and floating AI coach notifications.

At the core of Commic is Gemini 1.5 Pro, integrated using the Vertex AI SDK for Firebase. This enables low-latency multimodal analysis of both text and images directly from the client. A custom multimodal loop captures camera frames every 8 seconds via a hidden HTML5 Canvas and sends them with conversation context to Gemini for real-time behavioral coaching. The Web Speech API is used for accurate live speech-to-text, supporting both spoken feedback and dictation.
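The sampling loop described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the project's actual code: the names `FrameSource`, `dataUrlToInlinePart`, `isCaptureDue`, and `buildCoachingParts` are hypothetical, the prompt wording is invented, and the `{ inlineData: { mimeType, data } }` shape is assumed to match the inline image part format that the Gemini SDKs accept. In the browser, `FrameSource.capture` would typically draw the `<video>` element onto the hidden canvas and return `canvas.toDataURL("image/jpeg")`.

```typescript
// Hypothetical sketch of the 8-second frame sampling step.
// A browser FrameSource would draw the <video> element onto a hidden <canvas>
// and return canvas.toDataURL("image/jpeg").
interface FrameSource {
  capture(): string; // a data URL such as "data:image/jpeg;base64,..."
}

interface InlineImagePart {
  inlineData: { mimeType: string; data: string };
}

interface TextPart {
  text: string;
}

type Part = InlineImagePart | TextPart;

const CAPTURE_INTERVAL_MS = 8000; // interval stated in the write-up

// Split a base64 data URL into its MIME type and raw base64 payload.
function dataUrlToInlinePart(dataUrl: string): InlineImagePart {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (match === null) {
    throw new Error("Expected a base64-encoded data URL");
  }
  const [, mimeType = "", data = ""] = match;
  return { inlineData: { mimeType, data } };
}

// True when no frame has been captured yet, or the interval has elapsed.
function isCaptureDue(
  lastCaptureMs: number | null,
  nowMs: number,
  intervalMs: number = CAPTURE_INTERVAL_MS,
): boolean {
  return lastCaptureMs === null || nowMs - lastCaptureMs >= intervalMs;
}

// Combine recent conversation context with the latest frame into one request.
function buildCoachingParts(source: FrameSource, recentTranscript: string): Part[] {
  return [
    { text: `Conversation so far:\n${recentTranscript}\nGive one short coaching tip.` },
    dataUrlToInlinePart(source.capture()),
  ];
}
```

In an app, a timer callback could check `isCaptureDue` with the current time and, when it returns true, pass the result of `buildCoachingParts` to the model's content-generation call. Keeping the data URL parsing and the timing check as pure functions makes them easy to test without a camera.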

Firebase handles the backend with Firestore and Authentication, securely managing user profiles, goals, and assessment history.

Challenges we ran into

Our biggest challenge was delivering real-time multimodal coaching without slowing down the interface. Early versions relied on Cloud Functions, but cold-start delays caused noticeable lag, so we switched to direct client-side calls using the Vertex AI SDK for Firebase. Synchronizing continuous speech recognition with React state updates was another hurdle, which we solved with ref-based control so that transcripts were neither lost nor jumped around on screen. Capturing visual context efficiently also took some work: we used an off-screen canvas that samples video frames without affecting the main camera feed. Finally, we refined how AI feedback appears on screen, designing Neo-Pop style floating tips that stay helpful without distracting users while they speak.
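One way to picture the ref-based transcript handling is the sketch below. It is a guess at the general pattern rather than the project's code: `TranscriptStore`, `createTranscriptStore`, `applySpeechEvent`, and `joinWords` are hypothetical names, and the event type is a simplified stand-in for the Web Speech API's result event, where the text actually sits at `results[i][0].transcript`. The idea is to commit finalized text exactly once and to rebuild the provisional text on every event, so the displayed transcript never duplicates or drops words.

```typescript
// Simplified structural view of a speech recognition result event, so the
// sketch does not depend on browser type definitions.
interface SpeechResultLike {
  isFinal: boolean;
  transcript: string;
}

interface SpeechEventLike {
  resultIndex: number;         // index of the first result that changed
  results: SpeechResultLike[]; // all results for the session so far
}

interface TranscriptStore {
  finalText: string;   // committed text, appended to but never rewritten
  interimText: string; // provisional text, replaced on every event
}

// In a React component this object would live in a ref (useRef), so the
// recognition callback reads the latest value instead of a stale closure.
function createTranscriptStore(): TranscriptStore {
  return { finalText: "", interimText: "" };
}

// Join two fragments with a single space, ignoring empty fragments.
function joinWords(a: string, b: string): string {
  return [a.trim(), b.trim()].filter((s) => s.length > 0).join(" ");
}

// Apply one event to the store and return the text to display.
function applySpeechEvent(store: TranscriptStore, event: SpeechEventLike): string {
  let interim = "";
  // Only results from resultIndex onward changed in this event.
  for (const result of event.results.slice(event.resultIndex)) {
    if (result.isFinal) {
      store.finalText = joinWords(store.finalText, result.transcript);
    } else {
      interim = joinWords(interim, result.transcript);
    }
  }
  store.interimText = interim;
  return joinWords(store.finalText, store.interimText);
}
```

With this split, React state only needs to receive the string returned by `applySpeechEvent`, while the mutable store stays outside the render cycle.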

Accomplishments that we're proud of

Seamless Multimodal Coaching: We successfully built a system that analyzes not just what you say, but how you say it (posture, eye contact, and confidence) using a single AI pipeline.

The "Live Coach" HUD: The floating tip system feels like a futuristic companion, providing instant value (e.g., "Great eye contact!" or "Try to slow down") that feels integrated into the practice session.

Written Dictation: Adding a microphone-to-writing feature transformed the assessment experience, making it accessible and efficient for users to articulate complex thoughts.

Visual Aesthetic: We moved away from the boring "SaaS Blue" aesthetic and created a bold, memorable design that makes practicing communication skills feel like a game rather than a chore.

What we learned

We learned that calling Gemini directly from the client, rather than routing requests through Cloud Functions, is key to making real-time multimodal AI feel responsive. It shortened feedback loops considerably and made live analysis practical for a web app. Through testing, we found that capturing a visual frame every 8 seconds strikes the right balance between cost, performance, and timely feedback. We also realized how important communication design is: users feel more at ease when they can see what the AI is doing. Simple activity cues like an oscilloscope or an “Analyzing…” indicator help set expectations and build trust during processing time.

What's next for communication skills helper

Granular Vocal Tones: We plan to go beyond transcripts by integrating emotional sentiment analysis of audio to provide feedback on sarcasm, enthusiasm, and vocal range.

Multi-user Practice: We plan to add real-time "Roleplay Rooms" where two users can practice together while a single AI moderator provides collaborative feedback.

Gamified Progression: We plan to add a dedicated "Skill Tree" that unlocks new scenario challenges (like TED Talks or Pitch Competitions) as the AI detects improvement in core metrics.
