Project Inspiration & Vision

Inspired by Turing City, a world where AGI is commonplace, we explored a human-centered idea: a tool that helps people remember where they left objects of interest and provides agentic guidance back to them with voice and spatial cues.

Given our strong interest in augmented reality (AR) and computer graphics, we focused on creating a digital twin environment that allows users to visualize the locations of previously handled objects. This environment is powered by novel Gaussian splatting technology, enabling an intuitive and immersive spatial memory experience. As we refined the idea, we uncovered several compelling real-world use cases.

Use Cases

  • Everyday users can benefit by remembering the locations of important personal items

  • Senior users can use the app to strengthen memory and improve cognitive ability

  • Users in low-light or hazardous environments can rely on spatial audio and AR guidance

    • Examples include cave divers, mine workers, and first responders

Team Structure & Collaboration

Our team consisted of four members:

  • Two primarily focused on backend

  • Two primarily focused on frontend

Despite these roles, we frequently crossed into each other’s domains—through discussion, collaboration, and the power of vibe coding—to ensure cohesion across the project.

Development Phases

The project was divided into three main phases:

Phase 1: Ideation

  • Created multiple flowcharts for planning and tech-stack clarity

  • Chose sponsor tracks and APIs to guide development

  • Selected:

    • LiveKit for mobile data collection, streaming, and aggregation
    • Overshoot for video-to-text and object recognition
    • PostgreSQL for database
    • Google Gemini for future text analysis, information summary, and voice prompts

Phase 2: Core Development

  • Built a web app that displays Gaussian splat results in a 3D rendering and supports advanced object detection and tracking via Overshoot.

  • Developed a mobile app with AR capabilities that tracks users' objects of interest and guides them back to the last point where each object was visible. Implemented a feature that detects objects leaving a user's hand via camera input.

  • Implemented a custom UI/UX flow and from-scratch animations to improve the user experience of both the web app and the mobile app.
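The "lead users back" step above boils down to turning the user's current camera pose and an object's last-seen position into a distance and a turn direction for voice and spatial cues. A minimal sketch of that math, assuming a y-up camera frame (ARKit-style: +x right, -z forward); the function name and conventions are illustrative, not the app's actual code:

```python
import math

def guidance_cue(camera_pos, camera_forward, object_pos):
    """Return (distance_m, signed_angle_deg) from the current camera pose
    to the last point where the object was visible.

    Assumes y-up coordinates with -z as the camera's forward direction
    (ARKit convention). A positive angle means "turn right".
    """
    # Vector from camera to target, projected onto the horizontal plane.
    dx = object_pos[0] - camera_pos[0]
    dz = object_pos[2] - camera_pos[2]
    distance = math.hypot(dx, dz)

    # Camera forward, also projected onto the horizontal plane.
    fx, fz = camera_forward[0], camera_forward[2]

    # Signed angle between forward and target direction via atan2(cross, dot).
    angle = math.degrees(math.atan2(fx * dz - fz * dx, fx * dx + fz * dz))
    return distance, angle
```

A voice prompt can then be phrased from the pair, e.g. "turn right 45 degrees, about 1.4 meters ahead".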

Phase 3: Final Polish

  • Integrated all individual components and applied a stylish final design.

  • Built a full system of servers, endpoints, and a centralized database to collect and process the information needed to localize users and track objects.

  • Refined the mobile web app experience.

Challenges & Solutions

One of our biggest challenges was navigating the documentation for sponsor APIs. Speaking directly with sponsors at their tables proved invaluable for understanding best practices and implementation details. At times, directly adding functionality to sponsor software proved important to progress and provided a strong learning experience.

Another challenge was managing the workload of integrating many complex features. Thanks to strong planning and prior experience in our respective areas, this process went smoothly for the most part. Effective splitting of work, while balancing rest and collaboration during the process, was expectedly challenging, but ultimately highly rewarding.

Reflections & Takeaways

Through this project, we gained a deeper understanding of the current capabilities of AI and the exciting developments just beyond the horizon. Along the way, we also bonded during late-night, snowy walks around campus—often searching for a classroom that wasn’t already taken over by hackers. The technical difficulties of the project, alongside the many internal and external challenges we faced during development, allowed us to truly appreciate the value of "learning by failing" and ultimately learn a lot of new paradigms.

We are confident that this application provides genuine value to its target audience. Overall, the project was a race to the finish line without a second to spare, and we wouldn’t have had it any other way.

Updates


Had a great time building Omni and diving into the LiveKit and Overshoot SDKs at NexHacks.

LiveKit forms the backbone of our unified passive-perception and 3D digital-twin reconstruction system: we continuously stream poses and images from iOS ARKit to the GPU server in real time via the LiveKit byte stream, load them into Depth-Anything3, and send the images to Overshoot for real-time perception of the user's interactions with the world.
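Because a byte stream delivers raw bytes, the sender and receiver must agree on a framing for each ARKit frame. A minimal sketch of one such framing (a fixed-size header with a timestamp, a row-major 4x4 pose, and the JPEG length) is below; this is an illustrative serialization scheme, not necessarily the one we shipped:

```python
import struct

# Header: float64 timestamp, 16 float32s (row-major 4x4 camera pose),
# uint32 JPEG byte length. Little-endian and fixed-size, so the receiver
# can parse it without a separate schema.
HEADER = struct.Struct("<d16fI")

def pack_frame(timestamp, pose_4x4, jpeg_bytes):
    """Serialize one ARKit frame (pose + image) for a byte stream."""
    flat = [v for row in pose_4x4 for v in row]
    return HEADER.pack(timestamp, *flat, len(jpeg_bytes)) + jpeg_bytes

def unpack_frame(payload):
    """Inverse of pack_frame; returns (timestamp, pose_4x4, jpeg_bytes)."""
    fields = HEADER.unpack_from(payload, 0)
    timestamp, flat, n = fields[0], fields[1:17], fields[17]
    pose = [list(flat[i * 4:(i + 1) * 4]) for i in range(4)]
    jpeg = payload[HEADER.size:HEADER.size + n]
    return timestamp, pose, jpeg
```

On the server, each decoded payload can be fanned out to both the reconstruction pipeline and the perception service.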

Take a look at our iOS integration of the LiveKit SDK on the main branch, and the Node.js SDK integration on the overshoot branch. You can find our usage of Depth-Anything3 for point cloud creation in dap3_web on the main branch; image files received through LiveKit can be fed directly into da3_streaming and our dap3.py scripts.
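The point-cloud step is standard pinhole back-projection: a predicted metric depth map plus camera intrinsics gives a 3D point per pixel in the camera frame. A minimal sketch (the function name and intrinsics parameters are illustrative, not from our scripts):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a metric depth map of shape (H, W) into an (H*W, 3)
    point cloud in the camera frame, using pinhole intrinsics
    (focal lengths fx, fy and principal point cx, cy in pixels)."""
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

Chaining the per-frame ARKit pose onto these camera-frame points places every frame's cloud in a shared world frame for the digital twin.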

We loved how simple LiveKit makes it to connect devices that are not on the same network, and how easy Overshoot makes it to do VLM inference with huge models like Qwen3VL 30B with sub-second inference times, and no infrastructure provisioning. We are excited about the possibilities for vision on wearable devices.
