Inspiration
Humans are prolific learners; we have evolved over hundreds of thousands of years to pick up new skills and understand novel topics. In current mainstream education, 'learning' is taken to mean reading from a textbook for an hour, then doing exercises that feel pointless. Repetition, repetition, repetition. As kids, we are far too curious to be limited by a linear textbook that likely doesn't align with what our brains are telling us to learn about next. The result? Unmotivated children who are unable to enjoy learning in an intuitive manner.
What it does
Learn Anything gives children access to an expert tutor that can answer their questions and hold natural conversations with them. The tutor also uses an immersive soundscape and 3D objects, which it can manipulate, to explain anything from photosynthesis to the life of a shark.
The tutor can shrink, grow or rotate any 3D object in the scene, or move it closer to or further from other objects, and it can create new scenes on the fly.
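As an illustration of the manipulations described above, here is a minimal Python sketch of a scene object that can be resized, rotated and moved relative to another object. It is not our Unity code: the `SceneObject` class, its fields and its method names are all hypothetical and chosen only to make the idea concrete.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical stand-in for a 3D object the tutor can manipulate."""
    name: str
    position: tuple = (0.0, 0.0, 0.0)
    yaw_degrees: float = 0.0
    scale: float = 1.0

    def resize(self, factor: float) -> None:
        # factor < 1 shrinks the object, factor > 1 grows it.
        if factor <= 0:
            raise ValueError("factor must be positive")
        self.scale *= factor

    def rotate(self, degrees: float) -> None:
        # Keep the angle in the range [0, 360).
        self.yaw_degrees = (self.yaw_degrees + degrees) % 360.0

    def move_towards(self, other: "SceneObject", distance: float) -> None:
        # Positive distance moves closer to `other`, negative moves away.
        delta = [o - s for s, o in zip(self.position, other.position)]
        length = math.sqrt(sum(d * d for d in delta))
        if length == 0:
            return  # No direction to move along.
        step = min(distance, length)  # Never overshoot the target.
        self.position = tuple(
            s + d / length * step for s, d in zip(self.position, delta)
        )
```

For example, a shark at the origin that calls `move_towards(fish, 4.0)` on a fish 10 units away along the x axis ends up at x = 4.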
How we built it
We used Unity with the Meta XR SDK. Interactions and voice are handled through this SDK. Our real-time procedural scenes are all created with generative AI. We generate audio, textures, 3D models and dialogue to create an immersive experience.
To achieve this, we use more than five different generative AI models and the following API providers:
- OpenAI
- StabilityAI
- fal
- ElevenLabs
OpenAI's GPT-4o acts as the primary agent and tutor that orchestrates the whole experience. It has control over the scene and decides when to call the other models to generate the audio, textures and 3D models that an explanation needs.
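One common way to let a language model orchestrate other services is to have it emit a structured action that the app routes to the right handler. The sketch below shows that pattern in Python under stated assumptions: the JSON shape (`tool` and `args`), the `dispatch` function and the handler names are all hypothetical, and the handlers are stubs rather than real calls to any provider.

```python
import json
from typing import Callable, Dict


def dispatch(raw_action: str, handlers: Dict[str, Callable[..., str]]) -> str:
    """Route one model-emitted action to a handler (hypothetical format)."""
    action = json.loads(raw_action)
    name = action.get("tool")
    handler = handlers.get(name)
    if handler is None:
        # Report unknown tools instead of crashing mid-conversation.
        return f"unknown tool: {name}"
    return handler(**action.get("args", {}))


# Stub handlers; a real app would call the relevant generation service here.
HANDLERS: Dict[str, Callable[..., str]] = {
    "generate_model": lambda prompt: f"model:{prompt}",
    "play_sound": lambda prompt: f"sound:{prompt}",
    "speak": lambda text: f"speech:{text}",
}
```

With these stubs, `dispatch('{"tool": "generate_model", "args": {"prompt": "shark"}}', HANDLERS)` returns `"model:shark"`.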
Challenges we ran into
Our two biggest challenges were making the interactions feel natural and keeping latency low enough for users to stay immersed in the experience.
When using a large number of APIs, it is very easy to introduce delays, especially if you don't have a smart way of ordering the calls.
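To show why call ordering matters, here is a small Python sketch that starts three independent generation calls at the same time instead of one after another, so the total wait is roughly the slowest call rather than the sum of all three. This is a generic illustration, not a description of our pipeline: `generate_asset`, `build_scene` and the 0.1 second delays are invented, and `asyncio.sleep` stands in for real network requests.

```python
import asyncio
import time


async def generate_asset(kind: str, topic: str, delay_s: float) -> str:
    # Hypothetical stand-in for a network call to a generation API.
    await asyncio.sleep(delay_s)
    return f"{kind} for {topic}"


async def build_scene(topic: str):
    start = time.perf_counter()
    # The three calls do not depend on each other, so run them concurrently.
    results = await asyncio.gather(
        generate_asset("audio", topic, 0.1),
        generate_asset("texture", topic, 0.1),
        generate_asset("model", topic, 0.1),
    )
    elapsed = time.perf_counter() - start
    return list(results), elapsed
```

Running `asyncio.run(build_scene("shark"))` takes about 0.1 seconds here, whereas awaiting the same three calls in sequence would take about 0.3 seconds. Calls that depend on an earlier result still have to wait for it, so the gain comes from identifying which calls are truly independent.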
Natural interactions were particularly tricky because people are very used to how conversations with other humans go, so when they interact with an 'expert tutor' they expect a very similar experience. We had to make sure latency was low and that the user received ample feedback to guide them through the experience.
Accomplishments that we're proud of
To make the app feel natural to use, we made voice interaction the primary way of using it. The user is able to ask questions or tell the tutor what they want to learn about, and ask follow-up questions midway through explanations. No need to wait until the end of a one-hour lecture to get clarifications.
Optimising and ordering our use of generative AI so that it creates novel experiences reliably was also a big challenge, and one we tackled successfully: the time to first interaction after an audio input is now under 2 seconds, down from over 15 seconds when we started.
What we learned
The Meta Unity SDK speeds up prototyping and building proof-of-concept MR apps. It makes it very easy to get a boilerplate project that you can build on top of. We also found that graphical user interfaces were quite hard to make immersive in MR, which led us to opt for a voice interaction approach, supported by the Meta Voice SDK.
The Dictation plugin was very useful in letting us get real-time transcription of what the user was saying. The only limitation was that the quality of the transcription decreased in noisy environments; otherwise it worked very well for our purposes.
Another big lesson we took away was prioritisation. We had to scrap a lot of features and functionality that would not have been feasible to implement in the time available. The process of doing this was very helpful in itself, forcing us to clearly define the scope of our product.
What's next for Learn Anything
Learning reports for parents to update them on how their children are progressing, and how they can assist them further.
Custom, long-term learning paths that get the parents more involved with their children's education. During a session, the tutor is able to keep track of what has been talked about and intelligently steer the conversations. In the future, this functionality could be extended to work across different learning sessions.
More intelligent manipulation of objects by the tutor, allowing it to explain more complex topics by abstracting and using analogies, assisted by a visual representation of how objects should interact with each other.