Lucid Prism: An MR LLM-Based Multi-Modal Agent System
Inspiration
The idea for Lucid Prism emerged from everyday scenarios where using a smartphone was inconvenient or impractical—like cooking with messy hands or multitasking. I wanted to create a system that seamlessly blends human expression with machine understanding, leveraging the immersive capabilities of MR (Mixed Reality) and the power of LLMs (Large Language Models). The goal was to design a virtual assistant that feels intuitive, responsive, and truly helpful in mixed environments, pushing the boundaries of what multi-modal interaction can achieve.
What it does
Lucid Prism is a multi-modal assistant system that integrates MR, LLMs, and cloud-based APIs to provide three distinct functionalities:
- Conversational Assistant: Enables seamless voice-based interactions, enhanced by real-time camera views and memory storage for context continuity.
- Spatial Computing Assistant: Allows users to interact with virtual environments by describing objects and generating textures using voice commands.
- Object Transformation Assistant: Quickly transforms real-world objects into virtual assets through image capture, background removal, and 3D reconstruction.
How we built it
Conversational Assistant:
- Leveraged Meta’s Voice SDK for speech-to-text functionality.
- Developed a custom camera-access hack to bypass Meta’s restrictions on headset camera frames, enabling real-time visual context to be sent to the Claude API.
- Integrated a cloud-based memory storage system that maintains conversational continuity by saving user inputs, API outputs, and logs (see the sketch after this list).
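To make the pieces concrete, here is a minimal Python sketch of a single conversational turn: it attaches a captured frame to the transcribed speech, calls the Claude API, and persists both sides of the exchange. The file-based `memory.json` store, the helper names, and the model string are illustrative stand-ins for our cloud setup, not the exact production code.

```python
import base64
import json
import pathlib

import anthropic  # pip install anthropic

# Stand-in for our cloud memory store; a real deployment would use a
# hosted database rather than a local JSON file.
MEMORY_PATH = pathlib.Path("memory.json")

def load_memory() -> list:
    return json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else []

def save_memory(messages: list) -> None:
    MEMORY_PATH.write_text(json.dumps(messages))

def ask_with_camera_frame(transcript: str, frame_jpeg: bytes) -> str:
    """Send one conversational turn: camera frame + transcribed speech."""
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    history = load_memory()
    user_turn = {
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/jpeg",
                        "data": base64.b64encode(frame_jpeg).decode()}},
            {"type": "text", "text": transcript},
        ],
    }
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model choice
        max_tokens=512,
        messages=history + [user_turn],
    )
    answer = response.content[0].text
    # Persist both sides of the turn so the next call keeps its context.
    save_memory(history + [user_turn, {"role": "assistant", "content": answer}])
    return answer
```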
Spatial Computing Assistant:
- Utilized Meta’s Depth API to extract spatial metadata from Unity scenes.
- Converted scene prefabs into JSON files for LLM processing (a sketch of the exported format follows this list).
- Integrated the Stable Diffusion API to generate and apply textures dynamically to Unity GameObjects (see the second sketch below).
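To give a sense of what the LLM receives, here is an illustrative example of the kind of JSON we export per scene. The field names are hypothetical; the real exporter runs inside Unity against the Depth API metadata.

```python
import json

# Hypothetical shape of the per-object metadata pulled from the Unity
# scene; field names are illustrative, not the exact exporter output.
scene = {
    "scene": "Kitchen_Demo",
    "objects": [
        {"name": "Table_01", "prefab": "WoodenTable",
         "position": [0.4, 0.0, 1.2], "rotation": [0, 90, 0],
         "size": [1.5, 0.8, 0.9]},
        {"name": "Lamp_02", "prefab": "DeskLamp",
         "position": [0.9, 0.8, 1.1], "rotation": [0, 0, 0],
         "size": [0.2, 0.4, 0.2]},
    ],
}

# Compact separators keep the prompt small when the JSON is handed to
# the LLM alongside the user's voice command.
prompt_context = json.dumps(scene, separators=(",", ":"))
```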
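The texture step can be sketched as a single HTTP call. This uses Stability’s hosted REST endpoint as one concrete way to reach Stable Diffusion; the endpoint, form fields, and helper name follow Stability’s public docs at the time of writing and may not match our exact integration.

```python
import os

import requests

STABILITY_URL = "https://api.stability.ai/v2beta/stable-image/generate/core"

def generate_texture(prompt: str, out_path: str = "texture.png") -> str:
    """Request a texture image for the described surface and save it."""
    resp = requests.post(
        STABILITY_URL,
        headers={
            "authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
            "accept": "image/*",  # ask for raw image bytes back
        },
        files={"none": ""},  # the endpoint expects multipart form data
        data={"prompt": prompt, "output_format": "png"},
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
    return out_path

# e.g. the voice command "make the table look like weathered oak" becomes:
# generate_texture("seamless weathered oak wood texture, tileable")
```

The saved PNG is then loaded as a `Texture2D` and assigned to the target GameObject’s material inside Unity.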
Object Transformation Assistant:
- Created a cloud pipeline for background removal in captured images (sketched after this list).
- Used Meshy’s API for fast 3D reconstruction of the cleaned images into virtual objects (see the second sketch below).
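A minimal version of the background-removal stage can be written with the open-source `rembg` library, which is one way to implement this step; our cloud pipeline may differ in detail.

```python
from rembg import remove  # pip install rembg

def strip_background(capture_bytes: bytes) -> bytes:
    """Cut the object out of a captured photo so the 3D reconstruction
    step sees only the object, not the room behind it."""
    return remove(capture_bytes)  # returns PNG bytes with an alpha channel

# Usage: cutout = strip_background(open("capture.jpg", "rb").read())
```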
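Submitting the cleaned image to Meshy is then a single authenticated request. The endpoint path and payload below follow Meshy’s published image-to-3D API and should be treated as assumptions to check against the current docs.

```python
import base64
import os

import requests

def request_3d_reconstruction(cutout_png: bytes) -> str:
    """Submit the cleaned image and return a task id to poll for the mesh."""
    data_uri = "data:image/png;base64," + base64.b64encode(cutout_png).decode()
    resp = requests.post(
        "https://api.meshy.ai/openapi/v1/image-to-3d",  # per Meshy's docs
        headers={"Authorization": f"Bearer {os.environ['MESHY_API_KEY']}"},
        json={"image_url": data_uri},  # data URIs accepted in place of URLs
    )
    resp.raise_for_status()
    return resp.json()["result"]  # poll this task id until the mesh is ready
```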
Challenges we ran into
- Camera Access Limitations: Developing a workaround for Meta’s camera restrictions required creative problem-solving and custom hacks to ensure smooth integration.
- Contextual Memory: LLM APIs are stateless between calls and provide no built-in memory, which necessitated designing a robust cloud-based memory storage system.
- Latency: Real-time interactions with multiple APIs introduced latency issues, requiring optimizations in how we transmit data and sequence calls (see the sketch after this list).
- Spatial Understanding: Ensuring accurate metadata extraction and meaningful LLM responses based on scene prefabs was a significant technical challenge.
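One concrete optimization of the kind mentioned above is to overlap independent cloud calls rather than awaiting them back to back. A minimal sketch, reusing the hypothetical helpers from the earlier sections:

```python
from concurrent.futures import ThreadPoolExecutor

# strip_background and ask_with_camera_frame are the illustrative helpers
# sketched in the sections above, not the exact production functions.
def process_capture(frame_jpeg: bytes, transcript: str):
    with ThreadPoolExecutor() as pool:
        # The two round trips are independent, so their latencies overlap
        # in time instead of adding up sequentially.
        cutout = pool.submit(strip_background, frame_jpeg)
        reply = pool.submit(ask_with_camera_frame, transcript, frame_jpeg)
        return cutout.result(), reply.result()
```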
Accomplishments that we're proud of
- Successfully implemented a camera access hack that enhances interaction between the assistant and the user’s environment.
- Designed a memory system that simulates human-like conversational continuity for the assistant.
- Enabled real-time texture generation and application in Unity using voice commands, bridging LLM capabilities with spatial computing.
- Built a robust object transformation pipeline that converts real-world objects into virtual assets efficiently.
What we learned
- The importance of multi-modal interaction in creating intuitive user experiences.
- How to integrate MR, LLMs, and APIs to create a cohesive and functional system.
- Strategies for optimizing real-time interactions with cloud-based services.
- The critical role of context and memory in making virtual assistants feel more human and responsive.
What's next for Lucid Prism
- Enhanced Multi-Modal Interactions: Expanding the assistant’s capabilities to include gesture-based commands and deeper emotional understanding.
- Improved Real-Time Performance: Optimizing latency and streamlining API integration for faster responses.
- User-Centric Design: Conducting user testing to refine the assistant’s functionality and usability further.
- Public Release: Packaging the system as a developer toolkit for broader adoption in the MR and LLM communities.
- Integration with Emerging Technologies: Exploring how the system can leverage advancements in BCI (Brain-Computer Interfaces) and haptics to create even more immersive experiences.

