[Gallery: Setup; Translation from Kannada to English (example of any-to-any translation); Live detection of environment and surroundings (Vision Intelligence); View of a Virtual Theatre in Meta Quest]

CHITTI: AI-Powered Mixed-Reality Assistant

Inspiration

People with special needs often struggle to access real-time, context-aware information about their surroundings. In multicultural environments, language barriers add further communication challenges. Existing digital assistants offer little intuitive or adaptive support for real-world tasks. These challenges inspired CHITTI, an AI-powered mixed-reality assistant designed to improve accessibility, interaction, and guidance.

What We Learned

Building CHITTI taught us valuable lessons in:

- Vision Intelligence – Implementing real-time object recognition to provide relevant contextual responses.
- Speech and Language Processing – Building real-time transcription and translation for multilingual users.
- User Experience in XR – Designing an intuitive, immersive, and accessible interface on Meta Quest 3.
- System Integration – Combining AI models with Unity, Azure Speech-to-Text, and OpenXR into a single, seamless experience.

How We Built It

CHITTI combines several XR tools and AI services:

Tech Stack:
- Unity 2022.3 LTS for building the XR experience.
- Meta Quest 3 as the hardware platform.
- Microsoft Azure Speech-to-Text SDK for real-time speech recognition.
- TextMeshPro for rendering subtitles in XR.
- C# for application development.
- Blender for 3D modeling.
- Gemini AI for contextual understanding and response generation.
- OpenXR for XR interaction and integration.

Process:
1. Users interact with CHITTI through speech.
2. Audio is processed via Azure Speech-to-Text for real-time transcription.
3. Gemini AI performs contextual understanding and response generation.
4. The result is displayed in XR as real-time object recognition, a translation, or step-by-step guidance.
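The four steps above can be sketched as a single request/response turn. This is a minimal Python sketch under stated assumptions, not CHITTI's implementation (the tech stack lists C# in Unity). `run_turn`, `Turn`, and the three callables are hypothetical names. The Azure Speech-to-Text, Gemini, and XR display calls are replaced by plain functions passed in as parameters, so the control flow runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    """One user interaction: what was heard and what is shown back (hypothetical type)."""
    transcript: str
    response: str


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    display: Callable[[str], None],
) -> Turn:
    # Hypothetical orchestration; step 1 (speech capture) is assumed to have produced `audio`.
    # Step 2: speech-to-text (Azure Speech-to-Text in CHITTI; a stub here).
    transcript = transcribe(audio)
    # Skip the model call when nothing intelligible was recognized.
    if not transcript.strip():
        return Turn(transcript="", response="")
    # Step 3: contextual understanding and response generation (Gemini in CHITTI; a stub here).
    response = respond(transcript)
    # Step 4: show the result in the headset (a TextMeshPro panel in CHITTI; a stub here).
    display(response)
    return Turn(transcript=transcript, response=response)


if __name__ == "__main__":
    shown: list[str] = []
    turn = run_turn(
        b"\x00\x01",
        transcribe=lambda audio: "what is in front of me",
        respond=lambda text: f"You asked: {text}",
        display=shown.append,
    )
    print(turn)
    print(shown)
```

Because each stage is passed in as a function, this flow can be tested without a headset or network access. In a Unity build, the same shape could plausibly map onto C# methods or events.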

Challenges We Faced

Building CHITTI came with several challenges:

- Real-Time Processing – Ensuring fast and accurate transcription, translation, and response generation.
- Vision Intelligence – Developing an efficient object recognition system for diverse real-world scenarios.
- XR Integration – Optimizing UI/UX for immersive and intuitive interaction within Meta Quest 3.
- Latency Issues – Balancing response speed with AI model accuracy to provide a seamless user experience.
- Multilingual Support – Handling diverse accents, dialects, and languages for effective communication.
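One common way to reduce the perceived latency of live subtitles is to show interim recognition hypotheses immediately and replace them when the final result arrives. The Azure Speech SDK raises separate events for these (`Recognizing` for interim, `Recognized` for final). The sketch below assumes subtitles are fed that way. `CaptionBuffer` is a hypothetical helper, not part of CHITTI, TextMeshPro, or the Azure SDK. It also keeps only the newest lines so the caption panel stays small in the user's view.

```python
from collections import deque


class CaptionBuffer:
    """Hypothetical rolling subtitle buffer: committed final lines plus one interim line."""

    def __init__(self, max_lines: int = 2) -> None:
        # Only the newest finalized lines stay visible on the panel.
        self.final_lines: deque[str] = deque(maxlen=max_lines)
        self.interim = ""

    def on_interim(self, text: str) -> None:
        # Interim hypotheses overwrite each other; they are never committed.
        self.interim = text

    def on_final(self, text: str) -> None:
        # A final result replaces the pending interim text; blank results are dropped.
        if text.strip():
            self.final_lines.append(text.strip())
        self.interim = ""

    def render(self) -> str:
        # Text that would be assigned to the subtitle component on each update.
        lines = list(self.final_lines)
        if self.interim:
            lines.append(self.interim)
        return "\n".join(lines)


if __name__ == "__main__":
    buf = CaptionBuffer(max_lines=2)
    buf.on_interim("where is")
    print(buf.render())
    buf.on_final("Where is the exit?")
    buf.on_interim("turn")
    print(buf.render())
```

Showing interim text trades some accuracy, because early hypotheses can change, for responsiveness. This is the same speed-versus-accuracy balance described under Latency Issues.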

Conclusion

CHITTI is a step toward digital assistants that are more interactive, intelligent, and accessible. By combining AI and XR, we built a system that understands users’ surroundings, provides real-time insights, and supports everyday interactions. Building it was an exercise in problem-solving and in applying technology toward meaningful impact.