Inspiration
The inspiration for Marvin began with a simple, shared frustration: the "morning scramble." For millions of adults living with ADHD, this daily routine isn't just an inconvenience; it's a significant source of anxiety and a major hurdle to starting the day. We set out to build a solution.
The core challenge is executive dysfunction. Up to 90% of adults with ADHD report persistent, significant issues with inattention and forgetfulness, and 40-60% struggle profoundly with the executive functions mornings demand: planning, sequencing, and working memory. This can manifest as frantically searching for misplaced keys, forgetting to take vital medication, or leaving a work laptop at home. It's often called the "ADHD tax": the real-world cost of late fees, lost opportunities, and the constant, exhausting mental load of trying to compensate.
We saw that existing tools fall short. A to-do list app on your phone is useless when your phone is the item you can't find. We realized the new generation of AR hardware, specifically the Snap Spectacles, presented a unique opportunity: we could build a tool that wasn't reactive (waiting for you to ask) but proactive, an "external executive function" that could see what you see and provide gentle, contextual guidance in your real-world environment. That idea became Marvin.
What it does
Marvin is an AR-powered morning assistant for Snap Spectacles that provides real-time guidance to individuals with ADHD:
- Put on Glasses: The user puts on their Snap Spectacles, and the Marvin Lens activates.
- Scan the Room: As the user looks around their apartment, the live video feed is streamed to the Gemini Live API.
- Real-time Recognition: Marvin identifies key objects. When the user's gaze falls on their laptop, Marvin's voice (powered by ElevenLabs) might say, "Great, you've got your laptop. Don't forget the charger!" An AR overlay gently highlights the laptop.
- Contextual Reminders: The user then looks at their keys. Marvin recognizes them and says, "There are your keys! Ready to go."
- Stateful Memory: The user glances at their medicine bottle. The system queries our Letta Cloud vector database and recognizes this is the first time the user has looked at the medicine today. Marvin says, "Remember to take your medication." If the user looks at the bottle again 10 minutes later, the stateful memory will know it's already been addressed and will remain silent, preventing annoying, repetitive reminders.
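To make this concrete, here is a minimal TypeScript sketch of that "don't repeat yourself" check. It assumes a generic reminder store instead of the actual Letta Cloud calls, and the `ReminderStore` interface and function names are hypothetical.

```typescript
// Minimal sketch of the stateful "stay silent if already addressed" check.
// ReminderStore is a hypothetical abstraction standing in for the Letta-backed
// state lookup; in Marvin this happens inside the letta-sync Edge Function.
interface ReminderStore {
  getLastReminder(userId: string, objectLabel: string): Promise<Date | null>;
  recordReminder(userId: string, objectLabel: string, at: Date): Promise<void>;
}

function isSameDay(a: Date, b: Date): boolean {
  return (
    a.getFullYear() === b.getFullYear() &&
    a.getMonth() === b.getMonth() &&
    a.getDate() === b.getDate()
  );
}

/** Decide whether Marvin should speak when the user's gaze lands on an object. */
async function shouldSpeak(
  store: ReminderStore,
  userId: string,
  objectLabel: string,
  now = new Date(),
): Promise<boolean> {
  const last = await store.getLastReminder(userId, objectLabel);
  if (last && isSameDay(last, now)) {
    return false; // already handled this morning, stay silent
  }
  await store.recordReminder(userId, objectLabel, now);
  return true; // first sighting today, trigger the voice prompt
}
```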
For our 36-hour hackathon demo, we successfully trained Marvin to recognize and provide contextual guidance for five essential items: a laptop, keys, medicine, a bowl (for breakfast), and a phone.
How we built it
Our system architecture is a pipeline:
AR Frontend (Dev 1): The core user experience was built in Lens Studio for the Snap Spectacles (2024). This developer was responsible for capturing the camera feed, sending it to the Gemini WebSocket, and rendering the final AR overlays (SpectaclesInteractionKit.lspkg) and audio.
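To give a flavor of the Lens-side loop, here is a generic TypeScript sketch of throttled frame streaming over a WebSocket. The capture callback, message shapes, and endpoint are illustrative assumptions; the real Lens goes through Lens Studio's InternetModule and camera APIs, whose exact signatures aren't reproduced here.

```typescript
// Generic sketch: capture a frame on a timer and push it over a WebSocket to
// the vision backend (Gemini Live, in Marvin's case). The capture and
// detection callbacks are hypothetical hooks supplied by the AR runtime.
const FRAME_INTERVAL_MS = 500; // throttle frames so the uplink stays cheap

function startStreaming(
  socketUrl: string,
  captureFrameAsBase64: () => string,          // supplied by the AR runtime
  onDetection: (d: { label: string }) => void, // e.g. place an AR highlight
): () => void {
  const socket = new WebSocket(socketUrl);
  let timer: ReturnType<typeof setInterval> | undefined;

  socket.onopen = () => {
    timer = setInterval(() => {
      socket.send(JSON.stringify({ type: "frame", data: captureFrameAsBase64() }));
    }, FRAME_INTERVAL_MS);
  };

  socket.onmessage = (event) => {
    // The backend replies with recognized objects; the Lens turns these into
    // overlays and hands the label to the audio pipeline.
    onDetection(JSON.parse(event.data as string));
  };

  // Teardown so the Lens can stop streaming when the session ends.
  return () => {
    if (timer !== undefined) clearInterval(timer);
    socket.close();
  };
}
```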
AI Orchestration (Dev 2): This developer lived in Supabase Edge Functions. They built three key serverless functions (a rough sketch of how they fit together follows this list):
- ai-coordination: The "brain" that receives a trigger from the Lens.
- letta-sync: A function to check the user's "state" against our Letta Cloud and Chroma vector database (e.g., "Has the user seen their keys yet today?").
- voice-synthesis: A function that takes a text prompt (e.g., "You found your keys!") and pipes it to the ElevenLabs API to generate natural-sounding audio.
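To show how these pieces fit together, here is a rough Deno-style sketch of the ai-coordination handler. The payload shapes, internal function URLs, and environment variable names are illustrative assumptions, not our exact contracts.

```typescript
// Sketch of the ai-coordination Edge Function (Supabase Edge runtime / Deno).
// It receives a detection trigger from the Lens, consults letta-sync, and only
// asks voice-synthesis for audio when the object hasn't been addressed today.
Deno.serve(async (req: Request): Promise<Response> => {
  const { userId, objectLabel } = await req.json(); // trigger sent by the Lens

  // 1. Has this object already been addressed today? (Letta-backed state.)
  const stateRes = await fetch(`${Deno.env.get("FUNCTIONS_URL")}/letta-sync`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ userId, objectLabel }),
  });
  const { alreadyAddressed } = await stateRes.json();
  if (alreadyAddressed) {
    return Response.json({ speak: false }); // stay silent, no repeat reminder
  }

  // 2. Build a short contextual prompt and hand it to voice-synthesis.
  const text = `Great, you've got your ${objectLabel}.`;
  const voiceRes = await fetch(`${Deno.env.get("FUNCTIONS_URL")}/voice-synthesis`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  const { audioUrl } = await voiceRes.json();

  // 3. Return everything the Lens needs to render the moment: text for the
  //    overlay, audio for playback.
  return Response.json({ speak: true, text, audioUrl });
});
```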
Backend & Database (Dev 3): This developer set up our entire Supabase backend, including the database schema (for user state, object logs), Realtime subscriptions, and RLS (Row Level Security) policies to ensure user data was secure.
TDD & DevOps (Dev 4): This developer was our guardian. They set up the entire Jest testing framework, wrote integration tests for all our Edge Functions (mocking API calls), and managed our CI/CD pipeline and Git merges, ensuring we didn't break the build in our race to the finish.
The data flow is: Spectacles -> Gemini Live -> Lens Studio -> Supabase Edge Function -> Letta Cloud (Memory) -> ElevenLabs (Voice) -> Spectacles (Audio/AR).
Challenges we ran into
- The 36-Hour Clock: Our biggest challenge was time. We had to scope down from 20 recognizable objects to just 5. We had to be ruthless with our tasklist.md, assigning hour-by-hour goals for each developer.
- Massive Integration: We were integrating five brand-new, cutting-edge services (Lens Studio for Spectacles 2024, Gemini Live, Supabase, Letta, and ElevenLabs). Getting them all to "talk" to each other was a massive challenge. We spent hours debugging asynchronous function calls and mismatched API contracts.
- Achieving Low Latency: For an AR assistant to be helpful, it must be real-time. A 2-second delay would make the product unusable. We had to optimize every step, from the Gemini WebSocket connection to the Edge Function cold starts, to get the round-trip latency under our 100ms goal.
- New Hardware: The Snap Spectacles (2024) are a new developer platform. We had to learn on the fly, experimenting with Lens Studio's InternetModule and Remote Service Gateway to get the external API calls working correctly.
Accomplishments that we're proud of
- Shipping a Fully Functional E2E Product: In 36 hours, we didn't just build a single feature; we built a complete, end-to-end product with an AR frontend, a serverless AI backend, stateful memory, and a voice interface.
- Sub-100ms Latency: Achieving this speed was a huge technical win. It proved that real-time, AI-powered AR assistance is not just a future dream but is possible today.
- The "Magic Moment": The first time we put on the glasses, looked at a set of keys, and heard Marvin's voice instantly say, "There are your keys!" was an incredible moment. It felt less like a hack and more like a real product.
- Disciplined Workflow: Following our TDD (Test-Driven Development) plan, even in a fast-paced hackathon, saved us. Our "Dev 4" (testing) built a solid foundation that allowed the other three developers to integrate their complex pieces without collapsing the whole structure.
What we learned
- Supabase is a Perfect AI "Glue" Layer: Supabase Edge Functions were the perfect tool for this. They are incredibly fast, scalable, and ideal for orchestrating multiple third-party AI APIs (vision, memory, voice) in a single, serverless workflow.
- Proactive vs. Reactive: We learned that the true power of AI assistance (especially for ADHD) is being proactive, not reactive. Marvin doesn't wait for you to ask, "Where are my keys?" It sees you looking for them and helps. This is a fundamental shift in the human-computer interaction paradigm.
- Stateful Memory is Non-Negotiable: The "aha" moment in our design was adding Letta Cloud. Without it, Marvin would just be an "object recognizer." With stateful memory, it becomes a true assistant that knows not to repeat itself and understands the context of your morning.
What's next for Marvin
Marvin is a powerful product, and we're excited about its future. Our next steps are:
- True Personalization: Allow users to train Marvin on their own unique items (a specific water bottle, a backpack, a notebook) and record their own voice prompts.
- Deeper Task Sequencing: Move beyond simple object recognition to understanding sequences. For example, "You've got your laptop, but you haven't put it in your bag yet," or "You picked up your medicine bottle but you haven't been to the kitchen to get water."
- Calendar Integration: Connect Marvin to the user's Google Calendar (e.g., "I see you have your gym bag, but you don't have a workout scheduled today. Did you mean to grab your work laptop?").
- Expand Beyond Mornings: Adapt the system to help with other executive-functioning tasks, like following a recipe, packing for a trip, or completing a multi-step work task.
Built With
- chroma
- elevenlabs
- gemini
- glsl
- javascript
- jest
- kotlin
- lens
- letta
- metal
- snap
- supabase
- swift
- ts-jest
- typescript
