CHITTI: AI-Powered Mixed-Reality Assistant

Inspiration

People with special needs often struggle to access real-time, context-aware information about their surroundings. Language barriers in multicultural environments add further communication challenges, and existing digital assistants lack intuitive, adaptive support for real-world tasks. Inspired by these challenges, we envisioned CHITTI, an AI-powered mixed-reality assistant that enhances accessibility, interaction, and guidance.

What We Learned

Building CHITTI taught us valuable lessons in:

  • Vision Intelligence – Implementing real-time object recognition to provide relevant contextual responses.
  • Speech and Language Processing – Developing seamless real-time transcription and translation for multilingual users.
  • User Experience in XR – Crafting an intuitive and immersive interface on Meta Quest 3 for accessibility.
  • System Integration – Combining AI models with Unity, Azure Speech-to-Text, and OpenXR to create a seamless experience.

How We Built It

CHITTI combines a Unity XR client on Meta Quest 3 with Azure speech recognition and Gemini:

  • Tech Stack:

    • Unity 2022.3 LTS for building the XR experience.
    • Meta Quest 3 as the hardware platform.
    • Microsoft Azure Speech-to-Text SDK for real-time speech recognition.
    • TextMeshPro for rendering subtitles in XR.
    • C# for application development.
    • Blender for 3D modeling.
    • Gemini AI for contextual understanding and response generation.
    • OpenXR for XR interaction and integration.
  • Process:

    1. Users interact with CHITTI through speech.
    2. Audio is processed via Azure Speech-to-Text for real-time transcription.
    3. Gemini AI performs contextual understanding and response generation.
    4. The response is displayed in XR as recognized-object information, translated text, or step-by-step guidance.
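
The four steps above can be sketched as a small C# pipeline. This is a minimal illustration, not CHITTI's actual code: the interfaces ITranscriber, IResponder, and ISubtitleDisplay and the class AssistantPipeline are hypothetical names, and the stub classes stand in for Azure Speech-to-Text, Gemini, and a TextMeshPro subtitle panel so the sketch runs as a plain console program without any SDK.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stage interfaces; none of these names come from the CHITTI codebase.
public interface ITranscriber { Task<string> TranscribeAsync(byte[] audio); }
public interface IResponder { Task<string> RespondAsync(string transcript); }
public interface ISubtitleDisplay { void Show(string text); }

// Chains the stages: audio -> transcript -> generated reply -> display.
public sealed class AssistantPipeline
{
    private readonly ITranscriber _transcriber;
    private readonly IResponder _responder;
    private readonly ISubtitleDisplay _display;

    public AssistantPipeline(ITranscriber transcriber, IResponder responder, ISubtitleDisplay display)
    {
        _transcriber = transcriber;
        _responder = responder;
        _display = display;
    }

    public async Task<string> HandleUtteranceAsync(byte[] audio)
    {
        string transcript = await _transcriber.TranscribeAsync(audio);

        // Skip response generation when nothing was recognized.
        if (string.IsNullOrWhiteSpace(transcript)) return string.Empty;

        string reply = await _responder.RespondAsync(transcript);
        _display.Show(reply);
        return reply;
    }
}

// Stub standing in for a speech-to-text service (assumption: returns fixed text).
public sealed class StubTranscriber : ITranscriber
{
    public Task<string> TranscribeAsync(byte[] audio) =>
        Task.FromResult(audio.Length == 0 ? string.Empty : "where is the exit");
}

// Stub standing in for the language model (assumption: simply echoes the question).
public sealed class EchoResponder : IResponder
{
    public Task<string> RespondAsync(string transcript) =>
        Task.FromResult("You asked: " + transcript);
}

// Stub standing in for an in-headset subtitle panel; writes to the console instead.
public sealed class ConsoleDisplay : ISubtitleDisplay
{
    public void Show(string text) => Console.WriteLine(text);
}

public static class Program
{
    public static async Task Main()
    {
        var pipeline = new AssistantPipeline(
            new StubTranscriber(), new EchoResponder(), new ConsoleDisplay());

        // prints "You asked: where is the exit"
        await pipeline.HandleUtteranceAsync(new byte[] { 1, 2, 3 });
    }
}
```

Keeping each stage behind an interface means a real speech or language service could be swapped in without touching the pipeline, and each stage can be timed separately when investigating latency.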

Challenges We Faced

Building CHITTI came with several challenges:

  • Real-Time Processing – Ensuring fast and accurate transcription, translation, and response generation.
  • Vision Intelligence – Developing an efficient object recognition system for diverse real-world scenarios.
  • XR Integration – Optimizing UI/UX for immersive and intuitive interaction on Meta Quest 3.
  • Latency Issues – Balancing response speed with AI model accuracy to provide a seamless user experience.
  • Multilingual Support – Handling diverse accents, dialects, and languages for effective communication.

Conclusion

CHITTI is a step toward making digital assistants more interactive, intelligent, and accessible. By leveraging AI and XR, we created a system that understands users’ surroundings, provides real-time insights, and enhances everyday interactions. This project has been a journey of innovation, problem-solving, and pushing the boundaries of technology to create meaningful impact.
