Inspiration
From studying PHY 211, PHY 212, and EMECH 212 at Penn State, we noticed that many students struggle with the course material because they cannot visualize the words, equations, and diagrams on paper. To help ourselves and our fellow students engage with and better understand this challenging material, we wanted to build a program that provides a comprehensive, real-life visualization of what the dry words and equations actually mean.
What it does
Using Gemini's API, Houdini, and the Meta Quest 3, the program takes live audio input from the user and, in real time, generates, modifies, and interacts with objects in a virtual playground.
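At its core, the pipeline turns a spoken request into a structured object description that the playground can render. Here is a minimal sketch of that step, not our actual code: the prompt wording, the `ObjectSpec` fields, and the assumption that Gemini replies with plain JSON are all simplifications for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical object spec consumed by the VR playground; the real
# fields depend on how the Houdini scene is set up.
@dataclass
class ObjectSpec:
    shape: str
    position: tuple
    scale: float

def build_prompt(transcript: str) -> str:
    """Wrap the user's spoken request in instructions asking the
    model to answer with machine-readable JSON only."""
    return (
        "You control objects in a physics playground. "
        "Respond ONLY with JSON like "
        '{"shape": "sphere", "position": [0, 1, 0], "scale": 1.0}.\n'
        f"User request: {transcript}"
    )

def parse_response(raw: str) -> ObjectSpec:
    """Turn the model's JSON reply into an ObjectSpec."""
    data = json.loads(raw)
    return ObjectSpec(
        shape=data["shape"],
        position=tuple(data["position"]),
        scale=float(data["scale"]),
    )

# In the real pipeline, `raw` would come back from the Gemini API
# given build_prompt(transcript); here we use a canned reply.
raw = '{"shape": "cube", "position": [0, 0, 2], "scale": 0.5}'
spec = parse_response(raw)
print(spec.shape, spec.position, spec.scale)  # cube (0, 0, 2) 0.5
```

Keeping the model's output constrained to JSON is what makes the "generate, modify, interact" loop scriptable on the headset side.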
How we built it
We first obtained API keys from Gemini and tested how to optimize the output with different prompts. At the same time, two of us worked on setting up the headset and the virtual playground. Finally, we wired the pieces together so that the product flows smoothly on demand.
Challenges we ran into
From the basic setup of Whisper (we forgot to install ffmpeg, so transcribing a 5-second clip took 8 minutes) to the design of our data structures (a JSON file refused to save for reasons unknown, costing us 2 hours), we overcame everything we ran into, and here we are. Our favourite bug: when we asked for an axis value, the model literally returned the string "negative 5" instead of -5. A lovely result, though; we didn't expect to get this far in 24 hours.
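For the curious, the "negative 5" bug can be guarded against with a forgiving number parser. This is an illustrative sketch, not the project's code; the function name and the list of spelled-out signs are our own for this example.

```python
import re

def parse_model_number(text: str) -> float:
    """Parse a numeric value from an LLM reply, tolerating spelled-out
    signs like 'negative 5' or 'minus 3.2' alongside plain '-5'."""
    cleaned = text.strip().lower()
    # Replace a leading spelled-out sign word with a '-' symbol.
    cleaned = re.sub(r"^(negative|minus)\s+", "-", cleaned)
    match = re.search(r"-?\d+(\.\d+)?", cleaned)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())

print(parse_model_number("negative 5"))  # -5.0
print(parse_model_number("-3.25"))       # -3.25
```

Defensive parsing like this is cheap insurance whenever free-form model text feeds a numeric pipeline.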
Accomplishments that we're proud of
In general, we are proud of the fact that we went from knowing absolutely nothing about VR to finishing this project. We are proud of our determination and our ability to tackle various issues on the journey.
What we learned
In general, we learned a lot about how LLMs and VR environments work in practice. More importantly, we learned to debug and problem-solve as young engineers. Behind the scenes, we learned about the tools and modules programmers use to improve performance, such as CUDA and ffmpeg. On the backend, we learned about LLM pricing and usage, as well as the importance of prompt engineering. On the frontend, we learned how to synchronize better with the backend and manage data for clearer presentation.
What's next for xyloweft
After this Sunday, we will continue working on this project with the goal of ultimately launching it as an app. Going forward, we will seek opportunities to collaborate with research professors and industry professionals to enhance the functionality and overall performance of the program.