CHITTI: AI-Powered Mixed-Reality Assistant
Inspiration
People with special needs often struggle to get real-time, context-aware information about their surroundings. In multicultural environments, language barriers create further communication challenges. Existing digital assistants offer little intuitive, adaptive support for real-world tasks. These challenges inspired CHITTI, an AI-powered mixed-reality assistant that improves accessibility, interaction, and guidance.
What We Learned
Building CHITTI taught us valuable lessons in:
- Vision Intelligence – Implementing real-time object recognition to provide relevant contextual responses.
- Speech and Language Processing – Developing seamless real-time transcription and translation for multilingual users.
- User Experience in XR – Crafting an intuitive and immersive interface in Meta Quest 3 for accessibility.
- System Integration – Combining AI models with Unity, Azure Speech-to-Text, and OpenXR to create a seamless experience.
How We Built It
CHITTI was built by combining the following tools and AI services:
Tech Stack:
- Unity 2022.3 LTS for building the XR experience.
- Meta Quest 3 as the hardware platform.
- Microsoft Azure Speech-to-Text SDK for real-time speech recognition.
- TextMeshPro for rendering subtitles in XR.
- C# for application development.
- Blender for 3D modeling.
- Gemini AI for intelligence and contextual understanding.
- OpenXR for XR interaction and integration.
Process:
- Users interact with CHITTI through speech.
- Audio is processed via Azure Speech-to-Text for real-time transcription.
- Gemini AI performs contextual understanding and response generation.
- Information is displayed in XR, providing real-time object recognition, translation, or step-by-step guidance.
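The process above can be sketched as one speech turn passing through three stages. This is a minimal Python sketch, not CHITTI's actual Unity/C# code: the stage functions `transcribe`, `respond`, and `display` are hypothetical placeholders standing in for Azure Speech-to-Text, Gemini AI, and the XR subtitle panel, and the per-stage timing is an assumed addition that shows where latency accumulates in a turn.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StageTiming:
    name: str
    seconds: float


@dataclass
class PipelineResult:
    transcript: str
    response: str
    timings: List[StageTiming] = field(default_factory=list)


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # hypothetical stand-in for Azure Speech-to-Text
    respond: Callable[[str], str],       # hypothetical stand-in for Gemini AI
    display: Callable[[str], None],      # hypothetical stand-in for the XR panel
) -> PipelineResult:
    """Run one speech turn: audio -> transcript -> response -> display."""
    timings: List[StageTiming] = []

    # Stage 1: speech to text.
    start = time.perf_counter()
    transcript = transcribe(audio)
    timings.append(StageTiming("speech_to_text", time.perf_counter() - start))

    # Stage 2: contextual understanding and response generation.
    start = time.perf_counter()
    response = respond(transcript)
    timings.append(StageTiming("contextual_model", time.perf_counter() - start))

    # Stage 3: show the response to the user.
    start = time.perf_counter()
    display(response)
    timings.append(StageTiming("display", time.perf_counter() - start))

    return PipelineResult(transcript, response, timings)


if __name__ == "__main__":
    shown: List[str] = []
    result = run_pipeline(
        b"\x00\x01",  # placeholder audio bytes
        transcribe=lambda audio: "what is this object",
        respond=lambda text: f"Answer to: {text}",
        display=shown.append,
    )
    print(result.response)
    for t in result.timings:
        print(f"{t.name}: {t.seconds * 1000:.3f} ms")
```

Passing the stages in as functions keeps the orchestration independent of any particular speech or language service, so each stub can be swapped for a real client without changing the turn logic.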
Challenges We Faced
Building CHITTI came with several challenges:
- Real-Time Processing – Ensuring fast and accurate transcription, translation, and response generation.
- Vision Intelligence – Developing an efficient object recognition system for diverse real-world scenarios.
- XR Integration – Optimizing UI/UX for immersive and intuitive interaction within Meta Quest 3.
- Latency Issues – Balancing response speed with AI model accuracy to provide a seamless user experience.
- Multilingual Support – Handling diverse accents, dialects, and languages for effective communication.
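Real-time subtitles are one place where the speed/accuracy trade-off shows up directly. Streaming recognizers such as the Azure Speech SDK's continuous recognition report interim hypotheses (the `Recognizing` event) before a final result (the `Recognized` event). The Python sketch below shows one way a subtitle display could handle that: interim text is shown immediately and then replaced once the final transcript arrives. It is an assumption for illustration, not CHITTI's implementation; `RecognitionEvent`, `SubtitleBuffer`, and the `"interim"`/`"final"` labels are hypothetical names, and no real SDK is called.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecognitionEvent:
    kind: str  # hypothetical labels: "interim" or "final"
    text: str


class SubtitleBuffer:
    """Holds the last `max_lines` final lines plus the current interim line."""

    def __init__(self, max_lines: int = 2) -> None:
        self.max_lines = max_lines
        self.committed: List[str] = []
        self.interim: Optional[str] = None

    def handle(self, event: RecognitionEvent) -> None:
        if event.kind == "interim":
            # A newer hypothesis for the same utterance replaces the older one.
            self.interim = event.text
        elif event.kind == "final":
            # The final result replaces the interim line and is committed.
            self.interim = None
            if event.text:
                self.committed.append(event.text)
                # Keep only the most recent lines so the panel stays readable.
                self.committed = self.committed[-self.max_lines:]
        else:
            raise ValueError(f"unknown event kind: {event.kind}")

    def render(self) -> str:
        lines = list(self.committed)
        if self.interim:
            lines.append(self.interim)
        return "\n".join(lines)


buf = SubtitleBuffer(max_lines=2)
for ev in [
    RecognitionEvent("interim", "where"),
    RecognitionEvent("interim", "where is the"),
    RecognitionEvent("final", "Where is the exit?"),
    RecognitionEvent("interim", "thank"),
]:
    buf.handle(ev)
print(buf.render())  # prints "Where is the exit?" then "thank" on the next line
```

Showing interim text keeps perceived latency low, while committing only final results keeps the subtitle history accurate.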
Conclusion
CHITTI is a step toward making digital assistants more interactive, intelligent, and accessible. By combining AI with XR, we built a system that interprets users' surroundings, provides real-time information, and supports everyday interactions. Building it pushed us to solve hard problems in real-time speech processing, vision, and XR integration.