Inspiration

As a lifelong fan of Iron Man, I've always been fascinated by JARVIS—the epitome of a sci-fi AI assistant that not only understands but also anticipates needs in real-time. My goal with GARVIS was to turn that cinematic vision into reality. The advent of the Gemini API has finally made this possible, overcoming previous limitations like the rate limits of GPT-4 Vision and the accuracy issues of other image models. GARVIS isn't just a step towards the future—it's a leap towards making advanced mixed reality assistants accessible to everyone.

What it does

GARVIS transforms the concept of virtual assistants with its groundbreaking integration of augmented reality (AR) and artificial intelligence (AI). This platform goes beyond simple voice commands and text responses; it interacts with its environment visually and audibly. Whether you need navigation aids, instructional overlays, or complex computational assistance, GARVIS perceives, analyzes, and augments your reality to provide unmatched interactive experiences.

How we built it

Building GARVIS required several advanced technologies and platforms working in concert:

  • Unity Engine: Served as the backbone for creating immersive AR experiences.
  • C# Scripts: Powered the core logic and interaction mechanisms.
  • Meta Mixed Reality SDK & Meta Voice SDK: Enabled robust mixed reality and voice processing capabilities.
  • Gemini Vision API & Gemini Pro API: These APIs were crucial for real-time image analysis and conversational intelligence, pushing the boundaries of what's possible in AI.
  • Python, Flask, and ngrok: These tools were used to establish a reliable server environment for handling API requests.
  • Meta Quest 3: Provided the hardware foundation for deploying this sophisticated AR application.

Challenges we ran into

Adapting to Meta's privacy constraints, particularly the prohibition on capturing passthrough camera images directly, posed a significant challenge. We devised a workaround: casting the headset's view to a PC via Oculus casting and capturing frames there, which let us process visual information without breaching user privacy.
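The PC-side half of that workaround could look roughly like the sketch below: grab the casting window as an image, wrap it in a JSON payload, and POST it to the local server. The screen region, server URL, and field names are placeholder assumptions, and the actual window capture (shown via the third-party `mss` package) is kept separate from the stdlib-only payload logic.

```python
# Hypothetical sketch of the PC-side capture loop for the Oculus-casting
# workaround. The /analyze URL and JSON shape are illustrative assumptions.
import base64
import json
import urllib.request


def frame_to_payload(image_bytes, prompt):
    """Wrap one captured frame into the JSON body the relay server expects."""
    return json.dumps({
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": prompt,
    }).encode("utf-8")


def send_frame(image_bytes, prompt, url="http://localhost:5000/analyze"):
    """POST a frame to the (assumed) Flask relay and return the reply text."""
    req = urllib.request.Request(
        url,
        data=frame_to_payload(image_bytes, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["reply"]


def capture_cast_window(region):
    """Grab the on-screen casting window as PNG bytes.

    Requires the third-party `mss` package; `region` is a dict like
    {"left": 0, "top": 0, "width": 1280, "height": 720}.
    """
    import mss
    import mss.tools

    with mss.mss() as sct:
        shot = sct.grab(region)
        return mss.tools.to_png(shot.rgb, shot.size)
```

Because the frame is captured from the PC's screen rather than from the headset's passthrough feed, no restricted camera API is touched.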

Accomplishments that we're proud of

GARVIS is a pioneer: it's the first application of its kind to integrate the Gemini Vision API into a mixed reality environment.

  • Innovative Integration: Seamlessly blending AI with AR to provide a holistic and interactive user experience.
  • Visual Intelligence: Utilizing cutting-edge image analysis to understand and interact with the user's environment.

What we learned

This project deepened our understanding of the complexities involved in merging AI with AR/VR technologies. We gained valuable insights into user privacy considerations and explored innovative solutions to integrate real-time AI processing within these constraints.

What's next for GARVIS

  • Enhancing Privacy Features: We plan to continue advancing our data privacy protocols to ensure user trust and safety.
  • Expanding AI Capabilities: Future updates will include even more sophisticated AI features to enhance the utility and responsiveness of GARVIS.
  • Broadening Market Reach: We aim to make GARVIS accessible in diverse sectors such as education, healthcare, and enterprise, transforming how professionals interact with technology in their fields.

Forget about paying $700 plus $24/month for a Humane Pin. With GARVIS, you gain access to an all-encompassing AI assistant that doesn't just tell you but shows you—bringing the power of AR visualizations and interactive assistance into your hands at a fraction of the cost.
