Inspiration

Conversations move fast, and a lot of communication happens through subtle reactions that are easy to miss: hesitation, confusion, discomfort, skepticism, or the moment someone wants to speak.

For many people, especially those who struggle to read facial cues in real time, that gap can make everyday conversations stressful and unpredictable. Missing those signals can lead to interrupting unintentionally, overexplaining, using the wrong tone, or not noticing when someone is uncomfortable.

We built Aura because we wanted to create something that actively supports conversation in the moment.

What it does

Aura is a real-time meeting coach that stays in the corner of your screen during a conversation.

It analyzes the other person’s visible reactions and gives live, practical coaching such as:

  • what to change in your delivery right now
  • what you should say next
  • when to simplify, pause, clarify, or lower pressure
  • a warning when a phrase or topic has previously caused visible discomfort (see the sketch after this list)
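
As a rough illustration of that output (the types and names here are our sketch, not Aura's actual code), a single coaching update could be modeled like this:

```swift
// Illustrative sketch only: one possible shape for a coaching update.
enum CoachingCue {
    case adjustDelivery(String)   // e.g. "slow down, simplify the point"
    case nextLine(String)         // an exact line the user could say next
    case pause
    case lowerPressure
    case phraseWarning(String)    // this phrasing caused discomfort before
}

struct CoachingUpdate {
    let cue: CoachingCue
    let confidence: Double        // 0...1, surfaced honestly in the UI
}
```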

Aura also includes an optional demo overlay that lets users see what the system is tracking on screen, including the selected face, landmarks, and live coaching output.

How we built it

We used Opennote to collaborate and brainstorm ideas.

We built Aura as a macOS floating overlay app using SwiftUI.
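
As a minimal sketch (an assumed structure, not our exact code), the floating overlay can be a non-activating NSPanel that stays above other windows and hosts the SwiftUI view:

```swift
import AppKit
import SwiftUI

// Sketch: a borderless, always-on-top panel hosting a SwiftUI root view.
final class OverlayPanel: NSPanel {
    init(rootView: some View) {
        super.init(
            contentRect: NSRect(x: 0, y: 0, width: 280, height: 160),
            styleMask: [.borderless, .nonactivatingPanel],
            backing: .buffered,
            defer: false
        )
        level = .floating   // stay above normal windows
        collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary]
        isOpaque = false
        backgroundColor = .clear
        contentView = NSHostingView(rootView: rootView)
    }
}
```

The non-activating style matters here: the panel stays visible without stealing keyboard focus from the meeting app.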

The system combines several layers (a few of them are sketched in code after the list):

  • live screen capture to detect faces during online meetings
  • Apple Vision landmarks to track eyes, brows, lips, gaze direction, head pose, and facial geometry
  • a local emotion classification model to add a fast affect prior on the selected face
  • temporal smoothing and cue fusion so the app does not overreact to single frames
  • live speech transcription so the advice is grounded in what the user is actually saying
  • DeepSeek through Featherless to generate exact next-line coaching in real time
  • a conversation memory layer that tracks phrasing which previously caused visible discomfort and warns before repeating it
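
For example, the landmark layer can be sketched with Apple's Vision framework; the input frame is assumed to come from the screen-capture layer:

```swift
import Vision

// Sketch of the landmark step: run face-landmark detection on one frame.
func detectFaces(in pixelBuffer: CVPixelBuffer) throws -> [VNFaceObservation] {
    let request = VNDetectFaceLandmarksRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([request])
    return request.results ?? []
}
// Each VNFaceObservation exposes normalized landmark regions (eyes, brows,
// lips) plus roll/yaw estimates that feed the head-pose and gaze cues.
```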
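The temporal-smoothing idea is simple: a per-cue exponential moving average is one way to keep a single frame from flipping the signal (the constant below is illustrative, not our tuned value):

```swift
// Sketch: per-cue exponential moving average so one noisy frame
// cannot flip the advice.
struct CueSmoother {
    private var state: [String: Double] = [:]
    let alpha = 0.2   // lower = smoother, slower to react

    mutating func update(_ cue: String, raw: Double) -> Double {
        let smoothed = alpha * raw + (1 - alpha) * (state[cue] ?? raw)
        state[cue] = smoothed
        return smoothed
    }
}
```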
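And the coaching call itself, assuming Featherless's OpenAI-compatible chat-completions endpoint (the model id and prompt below are illustrative, not our exact values):

```swift
import Foundation

// Sketch: ask DeepSeek (via Featherless) for the next line, grounded in the
// live transcript and the fused visual cues. The endpoint shape assumes
// Featherless's OpenAI-compatible API.
func requestCoaching(transcript: String, cues: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.featherless.ai/v1/chat/completions")!)
    request.httpMethod = "POST"
    let key = ProcessInfo.processInfo.environment["FEATHERLESS_API_KEY"] ?? ""
    request.setValue("Bearer \(key)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "model": "deepseek-ai/DeepSeek-V3",   // illustrative model id
        "messages": [
            ["role": "system",
             "content": "You are a live meeting coach. Reply with one short, concrete next line."],
            ["role": "user",
             "content": "Transcript so far: \(transcript)\nObserved cues: \(cues)"]
        ]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    return data   // JSON response; the first choice's message is the advice
}
```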

Challenges we ran into

The hardest problem was accuracy.

Facial reaction detection in real meeting footage is messy:

  • faces are small
  • lighting is inconsistent
  • gallery view makes tracking harder
  • subtle expressions are easy to overread
  • single-frame emotion predictions can be noisy and misleading

We also ran into product challenges:

  • keeping the overlay unobtrusive while still making it impressive enough for a demo
  • making the AI advice useful instead of generic
  • preventing the system from changing its recommendation too quickly (see the sketch below)
  • avoiding unsafe design choices like inferring sensitive identity traits from reactions

A lot of the work ended up being calibration, smoothing, fallback behavior, and deciding when not to be overly confident.
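
One stabilizer we can sketch is a simple hysteresis gate: hold the current recommendation for a minimum time, and only switch when a new cue clearly wins (the hold duration and threshold below are illustrative, not our tuned values):

```swift
import Foundation

// Sketch: hysteresis so the displayed advice does not flicker. A new cue
// replaces the current one only after a minimum hold time and with a
// sufficiently strong score.
struct RecommendationGate {
    private var current: (cue: String, since: Date)?
    let minHold: TimeInterval = 3.0
    let switchThreshold = 0.65

    mutating func propose(_ cue: String, score: Double, now: Date = Date()) -> String {
        if let held = current, cue != held.cue,
           now.timeIntervalSince(held.since) < minHold || score < switchThreshold {
            return held.cue   // too soon or too weak: keep the old advice
        }
        if current?.cue != cue { current = (cue, now) }
        return cue
    }
}
```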

Accomplishments that we're proud of

We are proud that Aura goes beyond a chatbot and works in a real-world setting.

Some things we are especially proud of:

  • building a live floating meeting assistant instead of a static demo
  • combining vision, speech, local inference, and LLM-generated coaching into one system
  • making the core UI practical: the main output is what to change
  • adding conversation-memory warnings so Aura can learn what phrasing creates friction
  • using Featherless in a meaningful way for real-time next-line generation

Most importantly, we turned a vague idea into something that actually helps during a live interaction.

What we learned

We learned that building socially aware AI is much harder than just attaching an LLM to a webcam feed.

A few major takeaways:

  • raw emotion classification is not enough; practical coaching matters more
  • timing and stability are as important as model quality
  • subtle UI decisions determine whether the product feels helpful or distracting
  • confidence and uncertainty need to be shown honestly
  • safety matters a lot when building systems that interpret human behavior

What's next for Aura

Our next steps are focused on making Aura more accurate, more personalized, and more useful in real conversations.

We want to:

  • improve reaction accuracy with better local affect models and stronger cue fusion
  • support longer-term per-person communication memory
  • better understand both sides of the conversation, not just the user’s speech
  • make the coaching more personalized to different meeting contexts like interviews, classes, and team meetings
  • expand accessibility settings for different sensory and communication preferences
  • continue refining the unobtrusive mode so it feels natural to use every day

Long term, we see Aura as a real-time communication support layer that helps users navigate conversations with more confidence and less guesswork.

Built With

  • coreml
  • featherless
  • huggingface
  • opennote
  • swift