Inspiration

I’ve been obsessed with this one stat: U.S. manufacturers lose around $12B a year because knowledge disappears at shift change. The outgoing shift knows the weird machine quirks, the half-finished fixes, the safety stuff, and it all gets handed off through rushed conversations or sticky notes. Then the next shift walks in basically blind, repeats the same mistakes, and causes downtime that never needed to happen. When I saw the Tractian sponsor track at UGAHacks 11, it clicked. Tractian’s whole thing is eliminating manufacturing downtime, and shift handoffs feel like one of the biggest problems nobody has really solved. I wanted to build something that doesn’t just “log what happened,” but actually makes sure the incoming shift understands what they need to know.
 That’s where the attention idea came from. If someone zones out during a critical safety warning, the system should catch it and show it again. No guessing.

What it does

Vigil is a B2B SaaS platform for manufacturing shift handoffs with three main loops:

1) Capture: Outgoing workers speak or type their end-of-shift briefing naturally. Gemini processes the raw text and structures it into categorized, severity-ranked items, each with a clear action required for the incoming shift. Gemini also performs named entity recognition, extracting and tagging manufacturing-specific entities like machine IDs, part numbers, and failure modes into a structured taxonomy.

2) Review with attention tracking: When the incoming worker reviews the briefing, MediaPipe Face Mesh monitors engagement and focus in real time through the webcam. It tracks head pose, eye gaze direction, and blink rate to estimate attention levels. Briefing items are presented one at a time. If the worker’s attention drops below the threshold on a critical item, the system flags it. After the full review, missed items are resurfaced. Example: “Your attention was at 23% during the bearing temperature warning. Here it is again.”

3) Knowledge graph: Every briefing feeds a persistent knowledge base. The NER-tagged entities from Gemini link issues to specific machines and part numbers across shifts. When Machine 7’s bearing temperature is mentioned across three consecutive shifts, Vigil surfaces it as a recurring pattern. Over time, tribal knowledge stops living in people’s heads and becomes institutional memory.

How we built it

Backend: Python FastAPI + SQLite

AI structuring: Gemini 2.0 Flash with a prompt that extracts machine IDs, categories (safety/maintenance/quality/production), severity, and action items. If there’s no API key, it falls back to a rule-based keyword parser.
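
To give a feel for the no-API-key path, here is a minimal sketch of what a rule-based keyword fallback could look like. It is an illustration, not Vigil’s actual parser: the keyword tables, the severity labels ("critical", "high", "low"), the "M7"-style ID format, the default category, and the names `parse_briefing` and `BriefingItem` are all assumptions. Only the four categories come from the description above.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword tables; the real parser's vocabulary is not documented here.
CATEGORY_KEYWORDS = {
    "safety": ("guard", "leak", "spill", "lockout", "hazard"),
    "maintenance": ("bearing", "belt", "vibration", "lubricat", "temperature"),
    "quality": ("defect", "scrap", "tolerance", "reject"),
    "production": ("behind", "quota", "backlog", "changeover"),
}
# Assumed severity labels; anything unmatched falls through to "low".
SEVERITY_KEYWORDS = {
    "critical": ("critical", "danger", "immediately", "shut down"),
    "high": ("urgent", "overheating", "failing"),
}
# Matches phrases like "Machine 7" or "machine #12".
MACHINE_ID = re.compile(r"\bmachine\s*#?\s*(\d+)\b", re.IGNORECASE)


@dataclass
class BriefingItem:
    text: str
    category: str
    severity: str
    machine_ids: list


def first_match(lowered, table, default):
    """Return the first label whose keyword list hits the sentence."""
    for label, keywords in table.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return default


def parse_briefing(raw):
    """Split a raw briefing into sentences and tag each one by keyword."""
    items = []
    for sentence in re.split(r"(?<=[.!?])\s+", raw.strip()):
        if not sentence:
            continue
        lowered = sentence.lower()
        items.append(
            BriefingItem(
                text=sentence,
                # Defaulting to "production" is an assumption of this sketch.
                category=first_match(lowered, CATEGORY_KEYWORDS, "production"),
                severity=first_match(lowered, SEVERITY_KEYWORDS, "low"),
                machine_ids=[f"M{n}" for n in MACHINE_ID.findall(sentence)],
            )
        )
    return items
```

Calling `parse_briefing("Machine 7 bearing is overheating. Scrap rate was fine.")` would yield one maintenance item tagged with `M7` and one quality item with no machine attached. Plain substring matching is crude, but it keeps the fallback predictable when Gemini is unavailable.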

Frontend: React + Vite + Tailwind. The review flow is the main feature: a card UI that shows one item at a time, with a sidebar of live engagement gauges. Framer Motion handles the smooth transitions.

Attention tracking: MediaPipe Face Mesh runs client-side through the webcam, tracking 468 facial landmarks to compute head pose estimation, eye gaze direction, and blink rate. These signals feed into an engagement scoring algorithm that produces real-time focus and attention metrics. When a review is complete, the system cross-references attention scores against severity to determine which critical items need resurfacing.
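
The cross-referencing step can be sketched roughly as follows. This is a simplified illustration in Python (the real tracking runs client-side), and the per-severity thresholds, the `ReviewedItem` shape, and the function names are assumptions rather than Vigil’s actual values.

```python
from dataclasses import dataclass

# Assumed attention thresholds on a 0 to 1 scale; severities not listed here
# are never resurfaced in this sketch.
ATTENTION_THRESHOLDS = {"critical": 0.6, "high": 0.4}


@dataclass
class ReviewedItem:
    title: str
    severity: str
    attention: float  # mean engagement score while the item was on screen


def items_to_resurface(reviewed):
    """Return items whose attention fell below their severity's threshold,
    lowest attention first."""
    missed = [
        item
        for item in reviewed
        if item.attention < ATTENTION_THRESHOLDS.get(item.severity, 0.0)
    ]
    return sorted(missed, key=lambda item: item.attention)


def resurface_message(item):
    """Format the reminder shown when an item comes back."""
    return (
        f"Your attention was at {item.attention:.0%} during the "
        f"{item.title}. Here it is again."
    )
```

With a critical "bearing temperature warning" reviewed at 0.23 attention, `items_to_resurface` would return that item and `resurface_message` would produce the reminder text from the example earlier. Tying the threshold to severity means a low-severity note never interrupts the worker, no matter how distracted they were.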

Audio Briefings: ElevenLabs text-to-speech generates natural-sounding audio for each briefing item, so incoming workers who are hands-busy on the floor can listen to the handoff instead of reading it. The audio playback also integrates with the attention tracking, so the system knows if a worker tuned out during an audio briefing too.

Voice input: Web Speech API for hands-free speech-to-text.

Data pipeline: Every structured briefing also updates the knowledge graph, tracking machine-specific issues, counting repeats, and surfacing recurring patterns by frequency.
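
A minimal sketch of the recurring-pattern idea, assuming each structured briefing contributes `(shift_index, machine_id, issue)` tuples and that shifts are numbered consecutively. The tuple shape, the `recurring_patterns` name, and the default streak length of three are assumptions for illustration; the real pipeline persists this in SQLite.

```python
from collections import defaultdict


def recurring_patterns(mentions, min_streak=3):
    """Find (machine_id, issue) pairs mentioned in at least `min_streak`
    consecutive shifts.

    mentions: iterable of (shift_index, machine_id, issue) tuples.
    Returns a dict mapping (machine_id, issue) to its longest streak.
    """
    shifts_by_key = defaultdict(set)
    for shift_index, machine_id, issue in mentions:
        shifts_by_key[(machine_id, issue)].add(shift_index)

    patterns = {}
    for key, shifts in shifts_by_key.items():
        longest = current = 0
        previous = None
        for shift in sorted(shifts):
            # Extend the streak only if this shift directly follows the last one.
            if previous is not None and shift == previous + 1:
                current += 1
            else:
                current = 1
            longest = max(longest, current)
            previous = shift
        if longest >= min_streak:
            patterns[key] = longest
    return patterns
```

If "M7" and "bearing temperature" show up in shifts 1, 2, and 3, the pair is reported with a streak of 3, while an issue seen only in shifts 1 and 3 is not.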

Challenges we ran into

MediaPipe attention scoring: MediaPipe gives raw facial landmarks, not an attention score. I had to build the engagement algorithm myself by combining head pose angle (are they looking at the screen), eye aspect ratio (are their eyes open), and gaze direction into a single 0 to 1 metric. Getting the weights to feel accurate without being jittery took a lot of trial and error.
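
For a sense of what combining those signals looks like, here is a rough Python sketch. The eye aspect ratio formula is the standard six-point one; everything else (the weights, the 30 degree head-pose cutoff, the EAR open/closed bounds, the normalized `gaze_offset` input, and the exponential moving average used for smoothing) is an assumption picked for illustration, not the tuned values from the project.

```python
import math


def eye_aspect_ratio(eye):
    """Standard six-point eye aspect ratio.

    eye: six (x, y) points ordered p1..p6, with p1 and p4 at the eye corners.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    return vertical / (2.0 * math.dist(p1, p4))


def clamp01(value):
    return max(0.0, min(1.0, value))


def engagement_score(yaw_deg, pitch_deg, ear, gaze_offset,
                     weights=(0.4, 0.3, 0.3)):
    """Blend head pose, eye openness, and gaze into one 0 to 1 score.

    gaze_offset is assumed to be normalized: 0 at screen center, 1 at the edge.
    """
    # Assumed cutoff: 30 degrees or more off-axis counts as looking away.
    head = clamp01(1.0 - max(abs(yaw_deg), abs(pitch_deg)) / 30.0)
    # Assumed EAR bounds: 0.15 is treated as closed, 0.30 as fully open.
    eyes = clamp01((ear - 0.15) / (0.30 - 0.15))
    gaze = clamp01(1.0 - abs(gaze_offset))
    w_head, w_eyes, w_gaze = weights
    return w_head * head + w_eyes * eyes + w_gaze * gaze


class Smoother:
    """Exponential moving average to damp frame-to-frame jitter."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha
        self.value = None

    def update(self, score):
        if self.value is None:
            self.value = score
        else:
            self.value = self.alpha * score + (1.0 - self.alpha) * self.value
        return self.value
```

A lower `alpha` makes the gauge calmer but slower to react, which is the same accuracy versus jitter trade-off described above.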

Making the demo obvious: The pitch only works if the judge feels the “oh wow” moment. I designed the flow so it naturally catches someone zoning out and then resurfaces the missed item without me needing to over-explain it.

Messy briefings: Real shift notes are rambling and full of shorthand. Getting Gemini to reliably extract IDs and severity took a bunch of prompt iterations. The fallback parser covers weird outputs.

Building solo: Full stack app plus AI plus real-time webcam tracking plus a polished UI as one person means you have to cut hard. I dropped features like collaboration and audio playback to focus entirely on the attention-resurfacing loop, since that’s what wins demos.

Accomplishments that we're proud of

  • Built a complete attention-aware review system as a solo hacker in 48 hours, from backend API to real-time webcam tracking to polished UI.
  • The demo sells itself: judges put on the headset, zone out on an item, and the system catches them. No explanation needed.

What we learned

  • Writing the pitch before the code changed everything. The demo moment dictated the architecture.
  • Attention tracking turns information from “something you saw” into “something you actually understood.”
  • Manufacturing is insanely underserved by modern software. The people doing the real work usually have the worst tools.

What's next for Vigil

  • Mobile app so workers can record and review on the floor
  • Trend analysis across weeks, not just back-to-back shifts
  • Integrating with an existing work order management system so flagged items auto-create work orders
  • Team comprehension profiles so managers can see patterns and improve training
