Inspiration

Technology has always been about making life easier, but for many — especially older adults, people with dementia, or those with disabilities — it often remains inaccessible, standing as a barrier rather than a bridge. Remembering past conversations, recalling important details, or even keeping up with social interactions can be overwhelming. What if, instead of just providing information, technology could help you connect with others on a deeper level, recall cherished memories, and navigate conversations effortlessly?

By combining breakthrough AR/VR technology with cutting-edge AI agent networks, we envisioned a system that doesn’t just respond to commands but actively helps users engage with the world around them, without requiring them to remember every detail themselves.

What it does

That’s why we built JARVIS—an accessible, intelligent assistant designed to enhance social connection and memory recall. Whether you're having a conversation, trying to remember a name, or looking for common interests to spark a discussion, JARVIS is always there. It scours the internet to find mutual interests, recalls past conversations, and provides helpful context, ensuring that every interaction feels effortless and deeply personal.

For individuals with dementia, JARVIS serves as a memory aid, helping them remember people, past experiences, and meaningful connections. For those with physical disabilities, it provides a seamless, hands-free way to interact with the world. By bridging the gap between human memory and technological capability, JARVIS doesn’t just make life easier—it makes life more fulfilling.

How we built it

At its core, JARVIS gives the Vision Pro the ability to truly understand its environment by combining speech recognition, facial recognition, and memory recall into a specialized agentic AI reasoning system. Whenever you engage in a conversation, JARVIS leverages the Vision Pro’s eye-tracking system to identify the person you’re speaking with. If you've met them before, JARVIS instantly retrieves a summary of your past interactions. If not, it automatically scrapes the internet in real time to gather relevant information — helping you bring up key details effortlessly.
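The recognition step boils down to comparing a freshly computed face embedding against the embeddings of everyone JARVIS has met. This is a minimal sketch of that lookup, assuming embeddings have already been extracted (the real system uses dlib's face-recognition pipeline); the names, vectors, and threshold here are illustrative stand-ins:

```python
import math

# Hypothetical store mapping a known person to a previously computed
# face embedding (real embeddings from dlib are 128-dimensional;
# 4-dimensional vectors keep this sketch short).
KNOWN_FACES = {
    "Alice": [0.9, 0.1, 0.3, 0.2],
    "Bob":   [0.1, 0.8, 0.5, 0.4],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def identify(embedding, threshold=0.9):
    """Return the best-matching known person, or None if nobody clears the threshold."""
    best_name, best_score = None, threshold
    for name, known in KNOWN_FACES.items():
        score = cosine_similarity(embedding, known)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# An embedding close to Alice's stored one is recognized...
print(identify([0.88, 0.12, 0.31, 0.19]))  # Alice
# ...while an unfamiliar face falls through to the web-scraping path.
print(identify([0.0, 0.0, 1.0, 0.0]))      # None
```

When `identify` returns `None`, that is the branch where JARVIS goes out to the web for context instead of pulling from memory.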

JARVIS remembers everyone you meet, storing conversation transcripts and contextual knowledge it gathers from the web. Using Retrieval-Augmented Generation (RAG), it pulls in accurate, relevant context from this vast memory to enhance interactions. Beyond conversation, JARVIS responds to custom gestures to activate its agentic system, which intelligently determines the best response by assigning tasks to specialized agents, including:

  • A manager that evaluates each request and directs it to the right agent
  • An administrative agent with access to APIs for tasks like automatically sending text messages
  • A technical expert trained on a corpus of computer science papers
  • A healthcare expert with a database of medical knowledge
  • A memory retrieval agent that references your past conversations
  • A standard LLM for general reasoning
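The manager-and-specialists layout above can be sketched as a dispatcher. The real manager presumably classifies requests with an LLM; here a keyword lookup stands in so the routing logic itself is easy to follow, and all agent names and keywords are hypothetical:

```python
# Specialist agents from the list above, reduced to simple handlers.
AGENTS = {
    "admin":      lambda req: f"[admin] sending message: {req}",
    "technical":  lambda req: f"[technical] consulting CS corpus for: {req}",
    "healthcare": lambda req: f"[healthcare] querying medical DB for: {req}",
    "memory":     lambda req: f"[memory] searching past conversations for: {req}",
    "general":    lambda req: f"[general] reasoning about: {req}",
}

# Stand-in for the manager's LLM-based classification.
KEYWORDS = {
    "admin":      ("text", "message", "schedule"),
    "technical":  ("algorithm", "code", "paper"),
    "healthcare": ("medication", "symptom", "doctor"),
    "memory":     ("remember", "last time", "who is"),
}

def manager(request: str) -> str:
    """Route a request to the first specialist whose keywords match,
    falling back to the general-purpose LLM agent."""
    lowered = request.lower()
    for agent, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return AGENTS[agent](request)
    return AGENTS["general"](request)

print(manager("Can you text Sam that I'm running late?"))
print(manager("What did we talk about last time?"))
```

The fallback branch mirrors the last bullet: anything no specialist claims goes to the standard LLM for general reasoning.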

All of this is seamlessly integrated into a custom augmented reality HUD, naturally overlaying information onto the user’s surroundings. We combined these elements into a low-latency, always-available assistant that understands both spoken and unspoken intent—bridging the gap between human intelligence and technological capability.
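The RAG step described above, where relevant past-conversation context is pulled into the prompt, can be sketched as follows. The real pipeline embeds text with all-MiniLM and searches a vector store; plain word-overlap scoring stands in here, and the transcripts and query are invented for illustration:

```python
# Hypothetical stored conversation snippets.
TRANSCRIPTS = [
    "Met Alice at the robotics meetup; she is building a drone swarm.",
    "Bob mentioned his daughter just started medical school.",
    "Talked with Alice about hiking the coastal trail next month.",
]

def score(query: str, doc: str) -> int:
    # Stand-in for embedding similarity: count shared words.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2):
    """Return the k transcripts most relevant to the query."""
    return sorted(TRANSCRIPTS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the user's question with retrieved memory before it reaches the LLM."""
    context = "\n".join(retrieve(query))
    return f"Context from past conversations:\n{context}\n\nQuestion: {query}"

print(build_prompt("What does Alice like to do?"))
```

Swapping the overlap score for real embedding similarity (and the list for a vector database) gives the production shape of the same retrieve-then-augment loop.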

Challenges we ran into + Accomplishments that we're proud of

Bringing JARVIS to life came with its share of challenges, from technical hurdles to ensuring the experience felt natural and effortless.

  • Processing Speed vs. AI Depth: AI models can be incredibly powerful, but they often come at the cost of speed. To ensure JARVIS responds instantly while maintaining deep reasoning, we optimized our pipeline by strategically balancing different models. We used ChatGPT’s API when agents required external AI tools, Groq for rapid inference, and a locally run lightweight embedding model to keep our vector database highly efficient with minimal latency.
  • Seamless Multimodal Input: Merging voice, vision, and gesture recognition into a unified system required careful coordination. We fine-tuned how JARVIS prioritizes different inputs, ensuring that it responds dynamically and appropriately based on real-time context.
  • AR Interaction Design: One of our biggest challenges was avoiding information overload. Too much on-screen data could make the experience feel distracting rather than helpful. We iterated on different ways to surface relevant information naturally, ensuring that JARVIS enhances rather than disrupts the user’s field of view.
  • Real-Time Environmental Awareness: Because JARVIS operates without explicit prompts, it needs to be constantly aware of its surroundings while respecting privacy and context. Striking this balance was difficult. In the end, we leaned toward showcasing the full capabilities of the technology, but looking ahead, refining which features to include—and when to activate them—will be a key consideration.
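The speed-versus-depth strategy from the first bullet can be sketched as a tiered dispatcher: short, tool-free turns go to a fast inference endpoint, agentic requests go to a heavier one, and the local embedding model is cached so repeated text never pays the cost twice. The backend names and the tool-keyword heuristic below are illustrative stand-ins, not the actual routing logic:

```python
from functools import lru_cache

# Hypothetical backend identifiers for the two tiers.
FAST_BACKEND, DEEP_BACKEND = "groq/llama", "openai/gpt-4"
# Stand-in heuristic for "this request needs external tools".
TOOL_HINTS = ("send", "search", "schedule")

def pick_backend(request: str) -> str:
    """Route tool-using requests to the deep model, everything else to the fast one."""
    needs_tools = any(hint in request.lower() for hint in TOOL_HINTS)
    return DEEP_BACKEND if needs_tools else FAST_BACKEND

@lru_cache(maxsize=4096)
def embed(text: str) -> tuple:
    # Stand-in for the local lightweight embedding model; lru_cache
    # means repeated phrases skip re-embedding entirely.
    return tuple(ord(c) % 7 for c in text[:8])

print(pick_backend("What's the capital of France?"))  # fast path
print(pick_backend("Send Sam a text about dinner"))   # deep, tool-using path
```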

Despite these challenges, we built a system that goes beyond simple commands—JARVIS intuitively bridges the gap between human intent and machine capability, redefining how we interact with technology.

What we learned

What's next for J.A.R.V.I.S. — Augment Human Connection with Infinite Memory

Built With

  • agentic-ai: langchain
  • all-minilm-l6-v2
  • avfoundation
  • cncontact
  • facial-recognition: dlib, face-recognition
  • frontend: swiftui, swift, swiftgestures, realitykit
  • groq
  • llama
  • openai
  • python
  • sql-database: intersystems-iris
  • transcription: whisper (multilingual)
  • web-scraping: selenium, beautiful-soup