Inspiration

We're both African immigrants. Our parents came to this country and had to figure everything out: new jobs, new culture, new everything. They didn't always have time to sit down and walk us through homework or explain hard concepts. We didn't have tutors. We had to figure it out ourselves.

That stuck with us. When we started building at Hack for Humanity, we kept coming back to the same question: what if every kid, regardless of where they're from or what school they go to, had access to a teacher who was always there and could actually show them things visually?

That's Luminary.


What it does

Luminary puts a teacher in your space. You name a topic and a teacher appears, speaks to you, and walks you through it with live animations on a floating board. It runs in the browser and on Apple Vision Pro through WebSpatial, so the classroom breaks out of the screen and into the room around you.


How we built it

We split frontend and backend and built feature by feature. React, Vite and the WebSpatial SDK on the front. Flask on an AMD MI300X GPU on the back. ElevenLabs for voice. Gemini for lessons. Manim for the animations. We took a 3D scan of one of us, rigged it in Mixamo, converted it to USDZ and dropped it into the spatial scene as the actual teacher.


Challenges

The Three.js frameworks we tried were outdated, and when the LLM-suggested fixes kept looping we had to step in and fix things ourselves. Getting the visionOS simulator running at all took way longer than it should have. We tried a flipbook method to animate 3D objects inside WebSpatial, then tried resizing, preloading and repositioning. Nothing worked, and we had to find a different path. Syncing the Manim animations with the ElevenLabs voice so they land at the right moment was genuinely painful. Building the solar system with a proper 3D orbital algorithm was hard. So was storing recordings with state. And all of it was happening at the same time.
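For anyone curious what a "proper 3D orbital algorithm" can look like, here is a minimal sketch, not our actual code. It assumes a simple Keplerian model: solve Kepler's equation for the planet's position along its ellipse, then tilt the orbital plane by an inclination angle. The function name `orbit_position` and its parameters are made up for illustration.

```python
import math


def orbit_position(semi_major_axis, eccentricity, inclination_deg, period, t):
    """Return (x, y, z) of a body on a Keplerian orbit at time t.

    Hypothetical helper: the sun sits at the origin, and the orbit is
    tilted about the x-axis by inclination_deg.
    """
    # Mean anomaly grows linearly with time and wraps every period.
    mean_anomaly = 2.0 * math.pi * (t % period) / period

    # Solve Kepler's equation M = E - e*sin(E) for E with Newton's method.
    ecc_anomaly = mean_anomaly
    for _ in range(20):
        residual = ecc_anomaly - eccentricity * math.sin(ecc_anomaly) - mean_anomaly
        slope = 1.0 - eccentricity * math.cos(ecc_anomaly)
        ecc_anomaly -= residual / slope

    # Position inside the orbital plane, with the focus at the origin.
    x_plane = semi_major_axis * (math.cos(ecc_anomaly) - eccentricity)
    y_plane = (
        semi_major_axis
        * math.sqrt(1.0 - eccentricity ** 2)
        * math.sin(ecc_anomaly)
    )

    # Tilt the plane about the x-axis to get a 3D position.
    inc = math.radians(inclination_deg)
    return (x_plane, y_plane * math.cos(inc), y_plane * math.sin(inc))
```

Calling this once per frame for each planet, with its own axis, eccentricity, tilt and period, gives positions you can feed straight into whatever renders the scene.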


Accomplishments we're proud of

A 3D version of one of us is literally standing in the app as the teacher. Manim runs perfectly in sync with the ElevenLabs voice agent. We integrated this many frameworks in 24 hours and it didn't crash. The spatial experience works on visionOS.
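One way to think about the voice sync, as a rough sketch rather than our exact pipeline: measure how long each narrated segment's audio lasts, then give the matching animation the same duration (in Manim, that would be the `run_time` you pass to `self.play`). The `Cue` class and `build_cues` function below are hypothetical names, and the segment durations are assumed to come from the generated audio.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    label: str       # which animation to play
    start: float     # seconds from the start of the lesson audio
    run_time: float  # how long the animation should last, in seconds


def build_cues(segments, gap=0.0):
    """Turn (label, narration_seconds) pairs into a timeline of cues.

    Each animation starts when its narration starts and lasts exactly as
    long as the narration; gap is an optional pause between segments.
    """
    cues = []
    clock = 0.0
    for label, narration_seconds in segments:
        cues.append(Cue(label, clock, narration_seconds))
        clock += narration_seconds + gap
    return cues
```

For example, `build_cues([("intro", 2.5), ("orbit", 4.0)], gap=0.5)` schedules the second animation to start at 3.0 seconds and run for 4.0 seconds.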


What we learned

You learn a framework fastest by watching it fail. We went deep enough on WebSpatial to actually understand how scenes, volumes and objects interact. We learned that when an LLM hits an infinite loop you have to be the one to break it.


What's next

NVIDIA Audio2Face for realistic facial animation synced to the voice. Broader subject coverage. Free access for underfunded schools. YC.


