Inspiration

Study tracking today requires constant manual logging—starting timers, selecting subjects, tagging sessions. Every interaction pulls students out of focus, and the data it produces is shallow. A raw time total, by itself, says very little about the learning journey.

We felt this pain as students ourselves. Studying often meant long, lonely hours with no feedback, no guidance, and no clear signal of progress beyond vague time totals. Even worse, traditional tools ignore context: what material was studied, how engaged we were, or where we struggled.

When we explored Meta Ray-Ban Smart Glasses, we saw a rare opportunity. Hands-free, first-person, and always present, they offered a way to make study tracking automatic and distraction-free. Instead of asking students to log learning, SmartSight observes learning as it happens—quietly and passively.


What it does

SmartSight transforms Meta Ray-Ban Smart Glasses into an AI-powered study coach. This iOS application leverages the glasses’ hands-free camera and microphone to provide seamless, distraction-free analysis, replacing tedious manual tracking with context-aware, first-person perspective (POV) data capture. The core purpose is to automatically log and analyze study sessions so the student can focus on learning.

The system’s contextual AI passively analyzes each study session to continuously understand:

  1. Learning Content — the subject, subtopic, and key visible text from study materials
  2. Learning State — whether the student is actively engaged, passively reviewing, or distracted

Students can also invoke a voice-based AI assistant for immediate, hands-free help, receiving guidance (not direct answers) without needing to touch their phone. Following each session, SmartSight presents a personalized learning dashboard that turns invisible effort into visible insight—showing metrics like time per topic, active vs. passive learning ratio, help count, and distraction count. This detailed analysis helps students truly understand and optimize their learning process.


How we built it

Photo analysis

During a study session, the glasses capture periodic first-person images. Each image is sent to GPT-4.1 Vision through the OpenAI Responses API, where we enforce strict structured output using a JSON Schema instead of free-form text. This gives us consistent, machine-readable “study telemetry” on every capture:

  1. topic — high-level subject label (preferring one of Biology, Chemistry, Math, or English)
    Primary routing signal for dashboards, analytics, and tutor framing.

  • subtopic — finer-grained classification (e.g., Algebra – Quadratic Equations)
    Enables more specific coaching prompts and meaningful session breakdowns.

  • extractedText — the most salient text visible in the student’s view
    Anchors the tutor to the exact worksheet, page, or problem being studied.

  • isStudying — whether the image actually shows study material
    Filters noise and prevents false analytics.

  5. isActive — whether the student is actively problem-solving or just reading
    Acts as a proxy for learning mode and effort intensity.

  • isDistracted — whether a phone is centrally visible in the view
    Provides a lightweight, countable distraction signal.

Because the output is schema-validated, we can reliably persist it, compute analytics from it, and safely feed it back into the tutor without brittle parsing.
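As a rough illustration of that telemetry contract, here is a minimal Python sketch of the schema and a local validation pass. The field names come from the list above; the schema layout and the `validate_telemetry` helper are hypothetical stand-ins for the strict JSON Schema enforced through the OpenAI Responses API.

```python
# Illustrative sketch of the study-telemetry schema and a local validation
# pass (field names from SmartSight; the helper itself is hypothetical).
import json

# JSON Schema shape enforced on every GPT-4.1 Vision capture.
TELEMETRY_SCHEMA = {
    "type": "object",
    "properties": {
        "topic": {"type": "string"},
        "subtopic": {"type": "string"},
        "extractedText": {"type": "string"},
        "isStudying": {"type": "boolean"},
        "isActive": {"type": "boolean"},
        "isDistracted": {"type": "boolean"},
    },
    "required": ["topic", "subtopic", "extractedText",
                 "isStudying", "isActive", "isDistracted"],
    "additionalProperties": False,
}

_TYPES = {"string": str, "boolean": bool}

def validate_telemetry(raw: str) -> dict:
    """Parse one model response and check it against the schema."""
    data = json.loads(raw)
    if set(data) != set(TELEMETRY_SCHEMA["required"]):
        raise ValueError(f"unexpected keys: {sorted(data)}")
    for key, spec in TELEMETRY_SCHEMA["properties"].items():
        if not isinstance(data[key], _TYPES[spec["type"]]):
            raise ValueError(f"{key} must be {spec['type']}")
    return data

frame = validate_telemetry(json.dumps({
    "topic": "Math",
    "subtopic": "Algebra - Quadratic Equations",
    "extractedText": "Solve x^2 - 5x + 6 = 0",
    "isStudying": True, "isActive": True, "isDistracted": False,
}))
print(frame["topic"])  # Math
```

Because every capture passes (or fails) this gate before persistence, downstream analytics never have to parse free-form model text.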

Realtime AI Tutor

Each study session continuously generates structured context from what the student is actually seeing. This context becomes part of the system prompt for the AI and is updated as the study material changes.

We built SmartSight around one core idea: the AI should act like a study coach, not an answer machine.

The Realtime AI Coach is intentionally constrained through prompting to:

  • Avoid giving direct answers
  • Ask guiding and reflective questions
  • Encourage recall and reasoning
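The constraints above can be sketched as a prompt-assembly step that folds the latest telemetry into the coach's system prompt. This is a hypothetical illustration; the function name and prompt wording are assumptions, not SmartSight's actual code.

```python
# Hypothetical sketch: rebuild the coach's system prompt from the latest
# structured telemetry whenever the visual context changes.
COACH_RULES = (
    "You are a study coach, not an answer machine.\n"
    "- Never give direct answers.\n"
    "- Ask guiding and reflective questions.\n"
    "- Encourage recall and reasoning."
)

def build_system_prompt(telemetry: dict) -> str:
    """Ground the coach in what the student is actually looking at."""
    context = (
        f"The student is studying {telemetry['topic']} "
        f"({telemetry['subtopic']}).\n"
        f"Visible material: {telemetry['extractedText']!r}"
    )
    return f"{COACH_RULES}\n\n{context}"

prompt = build_system_prompt({
    "topic": "Math",
    "subtopic": "Algebra - Quadratic Equations",
    "extractedText": "Solve x^2 - 5x + 6 = 0",
})
```

Keeping the behavioral rules and the visual context in one prompt means the coach's "guidance, not answers" posture survives every context refresh.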

Voice-first interaction & highlights

SmartSight is designed to be completely hands-free. We use on-device transcription to detect voice commands during study sessions. A simple phrase like “Highlight” instantly captures the current study context (image, topic, and subtopic) without interrupting focus.

These highlights create a searchable library of important moments, allowing students to revisit exactly what they found important and when.
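A minimal sketch of that trigger path, assuming a transcript callback and an in-memory library (class and field names here are illustrative, not the app's actual Swift types):

```python
# Illustrative sketch of the hands-free "Highlight" trigger: scan each
# on-device transcript for the keyword and snapshot the current context.
from dataclasses import dataclass, field

@dataclass
class Highlight:
    timestamp: float
    topic: str
    subtopic: str
    image_ref: str  # pointer to the most recent captured frame

@dataclass
class HighlightLibrary:
    items: list = field(default_factory=list)

    def on_transcript(self, text: str, now: float, context: dict):
        # Trigger on the spoken keyword, ignoring case.
        if "highlight" in text.lower():
            self.items.append(Highlight(
                timestamp=now,
                topic=context["topic"],
                subtopic=context["subtopic"],
                image_ref=context["image_ref"],
            ))

    def search(self, query: str):
        """Searchable library of moments: match topic or subtopic."""
        q = query.lower()
        return [h for h in self.items
                if q in h.topic.lower() or q in h.subtopic.lower()]

lib = HighlightLibrary()
lib.on_transcript("okay, highlight this", now=12.5,
                  context={"topic": "Biology", "subtopic": "Cell Division",
                           "image_ref": "frame_0042.jpg"})
```

Because each highlight stores the context snapshot rather than a raw recording, the library stays small and searchable by subject.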

Measuring learning friction, not just time

Beyond time tracking, SmartSight captures help count—the number of times the AI tutor is invoked during a session. This becomes a powerful signal:

  • Frequent help requests can indicate friction or difficulty
  • Low help usage during long sessions may indicate confidence or passive review
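The dashboard metrics described above reduce to a simple aggregation over the schema-validated telemetry frames. The field names come from the photo-analysis step; the aggregation function itself is an illustrative sketch.

```python
# Sketch of per-session analytics computed from telemetry frames
# (field names from the photo-analysis schema; aggregation is illustrative).
def session_metrics(frames: list, help_count: int) -> dict:
    studying = [f for f in frames if f["isStudying"]]
    active = sum(1 for f in studying if f["isActive"])
    return {
        "frames_studying": len(studying),
        # Active vs. passive learning ratio, over study frames only.
        "active_ratio": active / len(studying) if studying else 0.0,
        "distraction_count": sum(1 for f in frames if f["isDistracted"]),
        "help_count": help_count,
    }

metrics = session_metrics(
    [
        {"isStudying": True, "isActive": True, "isDistracted": False},
        {"isStudying": True, "isActive": False, "isDistracted": False},
        {"isStudying": False, "isActive": False, "isDistracted": True},
    ],
    help_count=2,
)
print(metrics["active_ratio"])  # 0.5
```

Filtering on `isStudying` first keeps non-study frames from diluting the active/passive ratio, while distractions are still counted across the whole session.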

Challenges we ran into

One of the biggest challenges was working with the Meta Wearables Device Access Toolkit so early in its release cycle. Because the SDK was still fresh, there were strict constraints around how much data we could access and how much control we had over core device behaviors.

We explored more “hacky” interaction patterns to work around these limitations, but ran into friction quickly. For example, audio cues triggered during session stream start and photo capture interfered heavily with real-time voice interaction. To make the experience usable, we had to carefully coordinate system states—pausing the 5-second photo capture loop whenever a real-time AI conversation started, and resuming it only after the interaction ended. As a result, visual context could not be updated during live voice interactions, and the AI had to rely on the most recently captured context from before the conversation began.
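The state coordination described above can be sketched as a small gate around the capture loop: captures fire on the 5-second interval only while no voice conversation is active. The class and method names are hypothetical, not the app's actual implementation.

```python
# Minimal sketch of capture/conversation coordination: the 5-second photo
# loop pauses during a realtime voice interaction and resumes afterward.
class SessionCoordinator:
    CAPTURE_INTERVAL = 5.0  # seconds between photo captures

    def __init__(self):
        self.in_conversation = False
        self.last_capture = 0.0

    def start_conversation(self):
        # Pause captures so shutter audio cues don't interfere with voice
        # I/O; the tutor keeps the last context captured before this point.
        self.in_conversation = True

    def end_conversation(self):
        self.in_conversation = False

    def should_capture(self, now: float) -> bool:
        if self.in_conversation:
            return False
        if now - self.last_capture >= self.CAPTURE_INTERVAL:
            self.last_capture = now
            return True
        return False

coord = SessionCoordinator()
assert coord.should_capture(now=10.0)      # first capture fires
coord.start_conversation()
assert not coord.should_capture(now=20.0)  # paused during voice chat
coord.end_conversation()
assert coord.should_capture(now=20.0)      # resumes after the interaction
```

The trade-off noted above falls directly out of this design: while `in_conversation` is set, no new frames arrive, so visual context is necessarily stale during live voice interactions.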

Realtime interaction itself introduced another layer of complexity. We ran into unexpected audio format mismatches across the mobile client, backend, and AI services that significantly slowed development. Resolving these issues required deep debugging and validating assumptions at every layer of the pipeline.

Latency in photo-based analysis was another major constraint. End-to-end image capture, upload, and AI inference took roughly 6–7 seconds, which made true real-time visual feedback impossible. We experimented with multiple models and iterated extensively on system prompts to reduce latency, but ultimately couldn’t push performance far enough for fully real-time interaction. This remains one of the most important challenges to solve in future iterations of the product.

Despite these obstacles, support from the onsite Meta team and insights gained from forking and adapting a reference Meta iOS repository helped us push far beyond our initial expectations. What started as a tightly constrained prototype evolved into a surprisingly capable, end-to-end system.


Accomplishments that we're proud of

  • A complete, fully hands-free study-tracking loop with zero manual logging: automatic topic detection and session analytics without user input
  • Context-aware AI guidance grounded in real study material and voice interaction
  • A seamless end-to-end system spanning wearables, mobile, backend, and AI

Most importantly, we built something that respects students’ attention and directly addresses their pain points.


What we learned

Product Scoping:
The hackathon reinforced the importance of disciplined scoping under tight timelines. We learned to prioritize perfecting a single end-to-end smart glasses study session before expanding into additional features.

Value Proposition Discovery:
We learned to focus on features made possible by the unique hands-free and first-person POV capabilities of the Meta Ray-Ban glasses—rather than replicating traditional phone-based study apps with a new interface.

We would like to thank Cassio Machado from the Meta team for guiding us through the product ideation process and helping us ground our development decisions in what truly matters for the user.


What's next for SmartSight

Next, we plan to:

  • Expand subject and subtopic coverage with finer granularity
  • Use captured data to deliver insights and guidance for future study planning
  • Continue refining the AI tutoring system prompts for better learning support
  • Improve engagement modeling over longer time horizons
  • Add personalized insights and study recommendations
  • Support multi-device and multi-session use
  • Harden system stability for production-scale deployment
  • Investigate on-device vision model processing to replace or supplement cloud AI, significantly reducing end-to-end feedback latency

Our long-term vision is simple: make the learning journey visible—without ever interrupting learning itself.
