Inspiration
Communication is more than just words: it carries tone, intent, and emotion. In fast-paced meetings and virtual conversations, people often miss critical meaning. This problem is even more significant for individuals with autism, ADHD, or language-processing challenges. We were inspired to build Clarity to bridge the gap between what is said and what is meant, helping people better understand conversations in real time.
What it does
Clarity is a real-time meeting assistant that listens to conversations and provides:
- Simplified summaries of what was just said
- Emotional context (e.g., urgency, frustration, calmness)
- Automatic extraction of action items and deadlines
- Suggested calendar updates based on tasks
How we built it
AI pipeline (index.js)
- The pipeline starts with an `inputProcessor` that prepares the incoming transcript and emotion data.
- From there, we run multiple tasks in parallel using `Promise.all()` for speed (see the sketch after this list):
  - `simplifier` for summary generation
  - `explainer` for vocabulary/context support
  - `emotionDetector` for classifying across 10 emotions
  - `replyGenerator` for generating 1–3 suggested replies
  - `taskExtractor` for identifying action items
- After that, a `responseGenerator` creates a supportive final response.
- The results are then stored using `store.save()` into a memoryStore / fileStore.
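A minimal sketch of that stage, assuming each analyzer is an async function. The one-line stubs are stand-ins so the snippet runs on its own; the real implementations live in index.js:

```js
// Stand-in stubs so this sketch runs stand-alone; the real analyzers live in index.js.
const inputProcessor  = async (raw) => ({ text: raw.transcript, emotion: raw.emotion });
const simplifier      = async (x) => `Summary of: ${x.text}`;
const explainer       = async (x) => [];          // vocabulary / context notes
const emotionDetector = async (x) => "calm";      // one of the 10 emotion classes
const replyGenerator  = async (x) => ["Got it!"]; // 1–3 suggested replies
const taskExtractor   = async (x) => [];          // action items + deadlines
const responseGenerator = async (r) => ({ ...r, tone: "supportive" });
const store = { save: async (record) => { /* memoryStore / fileStore */ } };

async function runPipeline(rawInput) {
  // Prepare the incoming transcript + emotion data.
  const input = await inputProcessor(rawInput);

  // The five analyzers are independent, so Promise.all runs them concurrently;
  // total latency is roughly the slowest task rather than the sum of all five.
  const [summary, explanations, emotions, replies, tasks] = await Promise.all([
    simplifier(input),
    explainer(input),
    emotionDetector(input),
    replyGenerator(input),
    taskExtractor(input),
  ]);

  // Combine everything into one supportive final response, then persist it.
  const response = await responseGenerator({ summary, explanations, emotions, replies, tasks });
  await store.save({ input, ...response });
  return response;
}
```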
LLM integration
The processing is powered by Gemini 2.5 Flash through the Generative Language API, which enables fast, real-time inference.
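For reference, a single call through the Generative Language API's REST endpoint looks roughly like this (Node 18+ with built-in fetch; error handling is simplified, and GEMINI_API_KEY is assumed to be set in the environment):

```js
// Minimal Generative Language API call to Gemini 2.5 Flash.
async function askGemini(prompt) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent";
  const res = await fetch(`${url}?key=${process.env.GEMINI_API_KEY}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [{ parts: [{ text: prompt }] }],
    }),
  });
  if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`);
  const data = await res.json();
  // Text of the first candidate; real code should guard against empty candidates.
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```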
Overall flow:
Camera + Microphone → Browser Detection/Transcription → Express Backend → Parallel AI Pipeline → Overlay Results + Memory Storage
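A sketch of the Express backend step in that flow, reusing runPipeline from the earlier snippet (the route name and payload shape here are illustrative):

```js
import express from "express";

const app = express();
app.use(express.json());

// The browser posts transcript chunks plus detected emotion cues here;
// the pipeline's output is returned for the overlay to render.
app.post("/analyze", async (req, res) => {
  try {
    const result = await runPipeline({
      transcript: req.body.transcript,
      emotion: req.body.emotion,
    });
    res.json(result); // summary, emotional context, replies, action items
  } catch (err) {
    res.status(500).json({ error: String(err) });
  }
});

app.listen(3000, () => console.log("Clarity backend listening on :3000"));
```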
Challenges we ran into
- Maintaining low latency for real-time feedback
- Handling noisy or overlapping speech input
- Accurately detecting emotion from limited context
- Extracting structured action items from unstructured conversations (one approach is sketched after this list)
- Designing a UI that provides value without overwhelming the user
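For the structured-extraction challenge, one approach is to have the model return JSON directly via the API's responseMimeType setting, so the output parses reliably instead of arriving as free text. A sketch, with an illustrative prompt and field names rather than our exact production values:

```js
// Sketch: request JSON output from Gemini so action items parse reliably.
async function extractTasks(transcript) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent";
  const res = await fetch(`${url}?key=${process.env.GEMINI_API_KEY}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [{
        parts: [{ text: `List the action items and deadlines in this transcript as a JSON array:\n${transcript}` }],
      }],
      // Constrains the model to emit valid JSON instead of free-form prose.
      generationConfig: { responseMimeType: "application/json" },
    }),
  });
  const data = await res.json();
  // e.g. [{ "task": "send the report", "deadline": "Friday" }]
  return JSON.parse(data.candidates[0].content.parts[0].text);
}
```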
Accomplishments that we're proud of
- Built a working real-time audio → insight pipeline
- Successfully simplified complex speech into concise summaries
- Implemented emotion detection in an accessible and intuitive way
- Automated action item tracking from live conversations
- Designed an accessibility-first solution that benefits a wide audience
What we learned
- Real-time systems require careful trade-offs between speed and accuracy
- Simplification is more challenging than basic summarization
- Emotion detection is nuanced and context-dependent
- Accessibility-focused design improves usability for everyone
- Clear UX is critical when presenting complex AI outputs
What's next for Clarity
- Improve emotion detection using multimodal inputs (voice + facial cues)
- Integrate with platforms like Zoom, Microsoft Teams, and Google Meet
- Add speaker identification and role-based task assignment
- Personalize summaries based on user preferences
- Optimize for large-scale deployment and enterprise use
