Inspiration

Communication is more than just words: it includes tone, intent, and emotion. In fast-paced meetings and virtual conversations, people often miss critical meaning. This problem is even more significant for individuals with autism, ADHD, or language-processing challenges. We were inspired to build Clarity to bridge the gap between words and meaning, helping people better understand conversations in real time.

What it does

Clarity is a real-time meeting assistant that listens to conversations and provides:

  • Simplified summaries of what was just said
  • Emotional context (e.g., urgency, frustration, calmness)
  • Automatic extraction of action items and deadlines
  • Suggested calendar updates based on tasks

How we built it

AI pipeline (index.js)

  • The pipeline starts with an inputProcessor that prepares the incoming transcript and emotion data.
  • From there, we run multiple tasks in parallel using Promise.all() for speed:
    • simplifier for summary generation
    • explainer for vocabulary/context support
    • emotionDetector for classifying across 10 emotions
    • replyGenerator for generating 1–3 suggested replies
    • taskExtractor for identifying action items
  • After that, a responseGenerator creates a supportive final response.
  • The results are then stored using store.save() into a memoryStore / fileStore.

LLM integration

  • The processing is powered by Gemini 2.5 Flash through the Generative Language API, which enables fast real-time inference.
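The steps above can be sketched as a small Node.js pipeline. The stage, store, and function names (inputProcessor, simplifier, store.save, etc.) come from index.js as described; their bodies here are illustrative stubs standing in for the real LLM-backed modules, not the actual implementation.

```javascript
// Stub stages standing in for the LLM-backed modules in index.js.
const simplifier = async ({ transcript }) => ({ summary: `Summary: ${transcript}` });
const explainer = async () => ({ terms: [] });
const emotionDetector = async () => ({ emotion: "calm" });
const replyGenerator = async () => ({ replies: ["Got it, thanks!"] });
const taskExtractor = async () => ({ tasks: [] });
const responseGenerator = (parts) => ({ ...parts, tone: "supportive" });

// In-memory stand-in for the memoryStore / fileStore behind store.save().
const store = { saved: [], save(result) { this.saved.push(result); } };

// inputProcessor prepares the incoming transcript and emotion data.
function inputProcessor(raw) {
  return { transcript: raw.transcript.trim(), emotionData: raw.emotionData ?? null };
}

// The independent stages run in parallel with Promise.all() for speed;
// responseGenerator then composes the supportive final response.
async function runPipeline(raw) {
  const input = inputProcessor(raw);
  const [summary, explanation, emotion, replies, tasks] = await Promise.all([
    simplifier(input),
    explainer(input),
    emotionDetector(input),
    replyGenerator(input),
    taskExtractor(input),
  ]);
  const response = responseGenerator({ summary, explanation, emotion, replies, tasks });
  store.save(response); // persist results for later recall
  return response;
}
```

Because the five stages have no dependencies on each other, the overall latency is roughly that of the slowest single stage rather than the sum of all five.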

Overall flow:
Camera + Microphone → Browser Detection/Transcription → Express Backend → Parallel AI Pipeline → Overlay Results + Memory Storage
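The Gemini call at the center of this flow can be sketched as a request builder. The endpoint and payload shape follow the public v1beta generateContent format of the Generative Language API; buildRequest, the prompt, and the API-key handling are illustrative assumptions, not Clarity's exact code.

```javascript
// Request builder for the Generative Language API (gemini-2.5-flash).
const MODEL = "gemini-2.5-flash";
const ENDPOINT = `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent`;

// Assembles the HTTP request without sending it, so it can be inspected/tested.
function buildRequest(prompt, apiKey) {
  return {
    url: `${ENDPOINT}?key=${apiKey}`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
    },
  };
}

// Usage (network call, not run here):
// const { url, options } = buildRequest("Simplify: ...", process.env.GEMINI_API_KEY);
// const data = await (await fetch(url, options)).json();
// const text = data.candidates[0].content.parts[0].text;
```

Each parallel stage in the pipeline issues its own request like this, which is what makes the Promise.all() fan-out effective.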

Challenges we ran into

  • Maintaining low latency for real-time feedback
  • Handling noisy or overlapping speech input
  • Accurately detecting emotion from limited context
  • Extracting structured action items from unstructured conversations
  • Designing a UI that provides value without overwhelming the user

Accomplishments that we're proud of

  • Built a working real-time audio → insight pipeline
  • Successfully simplified complex speech into concise summaries
  • Implemented emotion detection in an accessible and intuitive way
  • Automated action item tracking from live conversations
  • Designed an accessibility-first solution that benefits a wide audience

What we learned

  • Real-time systems require careful trade-offs between speed and accuracy
  • Simplification is more challenging than basic summarization
  • Emotion detection is nuanced and context-dependent
  • Accessibility-focused design improves usability for everyone
  • Clear UX is critical when presenting complex AI outputs

What's next for Clarity

  • Improve emotion detection using multimodal inputs (voice + facial cues)
  • Integrate with platforms like Zoom, Microsoft Teams, and Google Meet
  • Add speaker identification and role-based task assignment
  • Personalize summaries based on user preferences
  • Optimize for large-scale deployment and enterprise use

Built With

  • Node.js / Express backend
  • Gemini 2.5 Flash via the Generative Language API
  • Browser audio/video capture and transcription
