FocusAR: Enhancing Attention with Augmented Reality

Category: Health | AR | AI

Team: FocusAR Team


Overview

FocusAR is an augmented reality (AR) application designed to help people who frequently zone out or who have attention deficit hyperactivity disorder (ADHD). It records ongoing conversations and displays key points from the current and previous discussions on the opposite side of the screen, helping users stay engaged and informed.


Features

Real-Time Conversation Summaries

FocusAR captures and processes conversations in real time, presenting concise summaries to help users maintain focus.

Historical Conversation Access

Users can access summaries of past conversations, aiding in recalling important details and reducing the need for repetition.

Augmented Reality Interface

The app overlays conversation summaries onto the user's view, allowing seamless integration into daily activities without disrupting the natural flow of interactions.

Context-Aware AI

FocusAR uses the Gemini API, chosen for its affordability and efficiency, to power its AI. Custom memory files help the AI track conversation context, ensuring accurate and consistent summaries as the conversation progresses.
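The format of the custom memory files is not described here. Below is a minimal sketch that assumes a memory file is simply a JSON list of key points saved to disk. The hypothetical `MemoryFile` class keeps the most recent points and renders them as context for the next summarization request.

```python
import json
from pathlib import Path


class MemoryFile:
    """Hypothetical JSON-backed memory of key points from earlier in a conversation."""

    def __init__(self, path, max_points=20):
        self.path = Path(path)
        self.max_points = max_points
        # Load previously saved key points if the file already exists.
        if self.path.exists():
            self.points = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.points = []

    def add(self, new_points):
        # Skip exact duplicates so repeated summaries do not bloat the context.
        for point in new_points:
            if point not in self.points:
                self.points.append(point)
        # Keep only the most recent points to bound the prompt size.
        self.points = self.points[-self.max_points:]
        # Persist after every update so context survives an app restart.
        self.path.write_text(json.dumps(self.points, indent=2), encoding="utf-8")

    def as_context(self):
        # Render the remembered points as a bulleted block for the model prompt.
        return "\n".join(f"- {p}" for p in self.points)
```

Capping the list with `max_points` is an assumed design choice. It keeps the prompt size bounded as a conversation grows.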


How It Works

  1. Recording Conversations: FocusAR uses speech recognition to capture ongoing conversations.
  2. Summarization with Gemini API: The app processes the transcribed text and generates concise summaries highlighting key points.
  3. Context Tracking: Custom memory files retain the context of the conversation so that each new summary stays accurate and consistent with earlier ones.
  4. AR Display: Summaries are displayed on the opposite side of the screen using augmented reality technology, allowing users to follow along without distraction.
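The four steps can be sketched as a single loop. This is a hypothetical outline, not FocusAR's actual code. It assumes speech recognition (step 1) has already produced text chunks. The `summarize` parameter stands in for the Gemini call; a trivial first-sentence stub is used so the example runs offline.

```python
from collections import deque
from typing import Callable, Iterable, List


def run_pipeline(
    transcript_chunks: Iterable[str],
    summarize: Callable[[str, List[str]], List[str]],
    panel_size: int = 5,
) -> List[List[str]]:
    """Hypothetical loop: transcribed text in, key points for the AR side panel out."""
    history: List[str] = []            # every key point so far (context tracking)
    panel = deque(maxlen=panel_size)   # what the AR overlay currently shows
    frames = []
    for chunk in transcript_chunks:
        # Step 2: summarize the new text, passing a copy of earlier key points as context.
        new_points = summarize(chunk, list(history))
        # Step 3: remember the new points for later summaries.
        history.extend(new_points)
        # Step 4: push them to the display; the oldest points scroll off.
        panel.extend(new_points)
        frames.append(list(panel))
    return frames


def first_sentence(text: str, _context: List[str]) -> List[str]:
    # Offline stand-in for the Gemini summarizer: keep the first sentence only.
    return [text.split(".")[0].strip()]


if __name__ == "__main__":
    chunks = ["Meet at 3. Bring slides.", "Budget is 5k. Ask Sam."]
    for frame in run_pipeline(chunks, first_sentence, panel_size=2):
        print(frame)
```

A bounded `deque` models an overlay that shows only the latest few points. The full `history` is still what the summarizer receives as context.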

Tech Stack

  • Gemini API – The core AI engine for real-time conversation summarization and contextual understanding.
  • React + Vite – The frontend framework that delivers a fast and responsive AR interface for displaying conversation summaries.
  • PyDub & PyAnnote – PyDub processes audio data efficiently, while PyAnnote enables speaker diarization, helping to distinguish between different speakers.
  • FastAPI & Uvicorn – Used to build and deploy backend services with high performance and scalability.
  • Firebase – A real-time cloud database that stores conversation data and ensures smooth synchronization across devices.
  • face-api.js – A JavaScript library for facial detection and recognition, enhancing user interaction through facial analysis.
  • React Speech Recognition – Enables voice input and processing, allowing seamless hands-free interaction with the application.
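Distinguishing speakers means lining up two timelines: who spoke when, and what was said when. FocusAR's exact approach is not described, so the sketch below is an assumption. pyannote.audio's diarization pipeline reports speaker turns with start times, end times, and labels such as `SPEAKER_00`. Here those turns are plain tuples, and timestamped transcript segments are assumed to be available. Each segment is assigned to the speaker whose turn overlaps it the most.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, speaker_label), e.g. turns from a diarization pipeline
Turn = Tuple[float, float, str]
# (start_seconds, end_seconds, text), e.g. timestamped transcript segments (assumed)
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the time interval shared by [a_start, a_end] and [b_start, b_end]."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Attach to each transcript segment the speaker whose turn overlaps it most."""
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Segments that no turn overlaps keep the "unknown" label.
        labeled.append((best_speaker, text))
    return labeled
```

Choosing the largest overlap, rather than the turn that contains a segment's start time, tolerates small timing disagreements between the recognizer and the diarizer.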

Development Process

FocusAR was developed during the Unction 2024 Hackathon, during which our team had only 36 hours to bring the idea to life. We focused on collaboration and rapid iteration, and each team member contributed expertise in AI, AR development, and design, allowing us to tackle different aspects of the project simultaneously.

Steps We Took:

  1. Initial Concept & Design: We brainstormed the app's core features and defined the key use cases, focusing on helping individuals with ADHD stay engaged in conversations.
  2. Technology Exploration: Initially, we aimed to run LLaMA on a local host for privacy reasons, but we pivoted to OpenAI's ChatGPT due to performance trade-offs. However, after assessing the cost, we ultimately switched to Gemini for a more cost-effective solution.
  3. Database Selection: We experimented with AWS for the first time but realized there was still much to learn to use it effectively, so we opted for Firebase due to its ease of integration and real-time capabilities.
  4. Rapid Prototyping: We used PyDub & PyAnnote to process the recorded audio and distinguish between speakers.
  5. AI Integration: We integrated the Gemini API for conversation summarization and created custom memory files to ensure accurate and consistent summaries.
  6. User Interface: We designed the AR interface to be intuitive and unobtrusive, allowing users to follow along with conversation summaries without distraction.
  7. Testing & Refinement: With limited time, we tested and refined the app to ensure it met the needs of users with attention-related challenges.
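For step 5, each request sent to Gemini must combine the memory file with the newest transcript. The team's actual prompt is not published. The hypothetical `build_prompt` and `parse_key_points` helpers below sketch one way to do it. They leave out the API call itself and assume the model is asked to answer with `- ` bullet points.

```python
from typing import List

# Hypothetical instruction text; the team's actual prompt is not published.
INSTRUCTIONS = (
    "You are helping someone who may lose focus during conversations. "
    "Summarize the NEW TRANSCRIPT as at most {max_points} short bullet points, "
    "each starting with '- '. Use EARLIER KEY POINTS only as context; "
    "do not repeat them."
)


def build_prompt(new_transcript: str, earlier_points: List[str], max_points: int = 3) -> str:
    """Assemble one summarization request from memory plus the latest transcript."""
    earlier = "\n".join(f"- {p}" for p in earlier_points) or "(none yet)"
    return (
        INSTRUCTIONS.format(max_points=max_points)
        + "\n\nEARLIER KEY POINTS:\n" + earlier
        + "\n\nNEW TRANSCRIPT:\n" + new_transcript.strip()
    )


def parse_key_points(model_reply: str, max_points: int = 3) -> List[str]:
    """Keep only bullet lines from the model's reply, capped at max_points."""
    points = []
    for line in model_reply.splitlines():
        line = line.strip()
        if line.startswith(("- ", "* ")):
            points.append(line[2:].strip())
    return points[:max_points]
```

Parsing only bullet lines discards any greeting or commentary the model adds around the summary. It also enforces the point limit even if the model ignores it.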

The fast-paced nature of the hackathon pushed us to work efficiently, and our team's strong collaboration helped us achieve our goal within the tight deadline. We were thrilled to win Best Health Hack for our innovative approach to leveraging AR and AI in inclusion tech.


Future Plans

  • Customizable Summary Formats: Allow users to adjust the presentation of conversation summaries.
  • Multilingual Support: Extend the app to support additional languages.
  • AR Device Compatibility: Enhance compatibility with a broader range of AR devices, including smart glasses.
  • Face Recognition & Contact Management: Add the ability to search and save faces in the database, creating new contacts when someone is referred to by name in conversation.
  • Gesture Controls: Integrate gesture controls using reliable libraries to make interacting with the UI more intuitive and give users finer control without feeling like a gimmick.

Impact

FocusAR helps users maintain focus and engage in conversations by providing an intuitive, nonintrusive solution to common attention-related challenges. By combining AR with AI-generated summaries, it empowers users to retain more information and get more out of everyday interactions.


Acknowledgments

Special thanks to the McHacks 2025 Hackathon organizers and participants for their invaluable support throughout the competition. Their encouragement allowed us to bring FocusAR to life.
