Inspiration
The need for real-world practice in oral exams, interviews, and public speaking often leaves many people feeling unprepared and anxious. We wanted to create a safe, immersive space for students and professionals to build confidence and perfect their communication skills. The combination of VR and conversational AI allows us to create dynamic, real-time interactions that mirror real-world scenarios, offering practical preparation without the pressure.
What it does
Our VR Simulation for Oral Exams and Interviews is an innovative tool designed to help users prepare for high-pressure speaking situations such as job interviews and academic oral exams. By offering an immersive, interactive environment, the simulation provides a realistic platform where users can practice, refine, and improve their verbal communication and performance skills in a variety of professional and academic contexts.
How we built it
We combined the power of conversational AI with virtual reality to create a highly interactive environment. The project was developed using Unity XR for building the immersive VR experience, and we integrated ChatGPT to power the real-time conversational AI interactions. For speech recognition (STT) and text-to-speech (TTS), we utilized Google's TTS & STT services, ensuring accurate and natural-sounding responses. We also incorporated META XR AUDIO for spatial audio, enhancing the realism of the simulations. For realistic lip-syncing, we used the Salsa SDK, and *Ready Player Me * provided customizable avatars, allowing users to personalize their virtual presence.
Challenges we ran into
One of the biggest challenges was integrating seamless conversation flows in the VR environment, especially dealing with response times from ChatGPT. Initially, there were delays in generating realistic answers and maintaining the fluidity of conversation, which required extensive prompt engineering to get the right responses. It took time to find the most suitable GPT model for real-time interaction. We also faced challenges with finding and integrating the fastest TTS (Text-to-Speech) and STT (Speech-to-Text) systems to ensure smooth, natural communication.
Accomplishments that we're proud of
We're proud of building a platform that can genuinely help people improve their oral communication skills. After overcoming response time challenges, we significantly reduced the delay, resulting in fast, realistic conversational exchanges. The successful integration of high-speed TTS, STT, and conversational AI in an immersive VR environment was a huge milestone, making the user’s experience feel more natural and responsive.
What we learned
We learned a great deal about prompt engineering and the importance of selecting the right GPT models to match our use case. Optimizing for both realism and speed was crucial for delivering a seamless user experience. Additionally, we deepened our understanding of the challenges in developing for VR, including optimizing speech recognition and creating engaging, interactive environments.
What's next for S29 - O4A VRita
Next, we plan to expand the range of scenarios offered. We're also aiming to support multiple languages to make the platform more accessible globally. In addition, we want to enhance the feedback system to provide more detailed, actionable reports for users after each session. We envision adding AI-driven adaptive learning that evolves with the user’s progress, offering even more personalized guidance. We plan to add different interview/exam environments and multiple examiner scenarios.
Log in or sign up for Devpost to join the conversation.