Inspiration

Movies are getting worse due to the rise of short-form content such as Instagram Reels and TikTok, and the dopamine-frying habits they encourage. We are trying to understand what makes a movie impactful and what truly brings out deep emotions in viewers. Since feedback is often delayed, subjective, and limited to surveys or box-office performance, we hope to read the behind-the-screens connection between the audience and the film, and ultimately bring movies back to their prime.

What it does

AffectLens is a real-time audience emotion analytics platform that:

  1. Detects faces from live or recorded video
  2. Classifies viewer facial expressions using a Vision Transformer model
  3. Tracks emotional changes across time
  4. Generates an emotional timeline aligned to film scenes
  5. Aggregates group-level engagement metrics
  6. Classifies each face with the HuggingFace pipeline: pipeline("image-classification", model="trpakov/vit-face-expression")

The classifier covers seven emotions: Happy, Sad, Angry, Fear, Surprise, Disgust, and Neutral. Each frame produces a probability distribution over these emotions.
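The classification step above can be sketched with the HuggingFace pipeline directly. This is a minimal illustration, not our full system; the image path is a placeholder for a cropped face frame, and the model weights are fetched from the HuggingFace Hub on first use.

```python
# Sketch of classifying one cropped face frame with the ViT model.
from transformers import pipeline
from PIL import Image

classifier = pipeline(
    "image-classification", model="trpakov/vit-face-expression"
)

face = Image.open("face_crop.jpg")  # illustrative path to a cropped face
scores = classifier(face)  # list of {"label": ..., "score": ...}, best first
for entry in scores:
    print(f"{entry['label']}: {entry['score']:.3f}")
```

The pipeline returns the full label/score list, which is what lets us keep the per-frame probability distribution rather than just the top label.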

How we built it

We used the pretrained Vision Transformer model trpakov/vit-face-expression, accessed via the HuggingFace image-classification pipeline for clean integration and fast deployment. To ensure accurate emotion classification, we run face detection first: we detect faces using OpenCV Haar cascades, crop and isolate each face region, and run the emotion classifier only on the detected faces.

We also iterated over countless probability distributions and did a lot of tuning to get the emotions to display correctly on screen.

Challenges we ran into

MediaPipe version conflicts caused inference failures, so we switched to OpenCV Haar cascades for stability under demo conditions. Top-k confidence logic, secondary-prediction overrides, and temporal smoothing were also difficult to get right.
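One common form of temporal smoothing, shown here as an illustrative sketch rather than our exact implementation, is an exponential moving average over the per-frame probability distributions, which damps frame-to-frame flicker before a label is displayed.

```python
# Exponentially smooth per-frame emotion distributions so the
# displayed label does not flicker between adjacent frames.
EMOTIONS = ["happy", "sad", "angry", "fear", "surprise", "disgust", "neutral"]

def smooth(prev, current, alpha=0.3):
    """Blend the current frame's distribution into the running average."""
    if prev is None:
        return dict(current)
    return {e: alpha * current[e] + (1 - alpha) * prev[e] for e in EMOTIONS}

def top_label(dist):
    """Pick the emotion with the highest smoothed probability."""
    return max(dist, key=dist.get)
```

A lower alpha gives steadier output at the cost of slower reaction to genuine expression changes, which is exactly the trade-off that took tweaking.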

Accomplishments that we're proud of

Successfully integrated a Vision Transformer into a real-time system. Built a full-stack emotion analytics pipeline and designed an interpretable engagement metric. Created a visual emotional arc aligned to film scenes and achieved stable live inference during demo conditions.

What we learned

Vision Transformers are powerful but computationally expensive. Emotion detection is probabilistic, not binary. Human expressions are subtle and context-dependent. Real-time AI systems require robustness over perfection, and UX matters just as much as model accuracy.

We also gained experience in model deployment, API design with FastAPI, real-time video processing, and debugging ML systems under time pressure.

What's next for AffectLens

We hope to add multi-person audience aggregation, scene-to-scene comparative analytics, attention tracking (gaze + blink detection), and a director dashboard with emotional heatmaps.
