NewSKOOL
"Learning in a New Dimension" — Team CheatCode
Inspiration
Education is still flat in a world that isn't.
Every student learns differently, yet classrooms deliver the same content, the same way, to everyone. We kept coming back to one statistic that stopped us in our tracks:
Students retain $65\%$ more information through immersive and visual learning compared to traditional methods — and XR technologies drive up to a $400\%$ increase in engagement.
We were inspired by a simple question:
What if your AI tutor could see what you see?
Not just answer your question — but understand your entire context. The textbook in front of you. The diagram on the board. The sketch you just drew with your own hand. We wanted to build something that makes every student feel like they have a personal genius sitting next to them, not a chatbot on a screen.
The moment we put on a Meta Quest and looked at the world through passthrough, we knew — this is where AI education belongs.
What it does
**NewSKOOL** is a mixed reality educational assistant running natively on Meta Quest. Powered by **Amazon Nova**, it has four core features:
1. 🧠 Hazel AI — Context-Aware Vision Tutor
Our flagship feature. The student presses a button, asks any question out loud, and Amazon Nova 2 Lite simultaneously analyses:
- The student's live voice command
- A real-time passthrough camera frame of their physical environment
Nova sees what the student sees and responds with a spoken, context-aware answer — aware of the textbook, diagram, or object directly in front of them. Hazel responds natively in $50+$ languages.
The intelligence behind Hazel can be expressed simply:
$$\text{Response} = f(\text{Voice Command}, \text{Visual Context})$$
Where $f$ is Amazon Nova's multimodal reasoning engine — understanding both inputs together, not separately.
2. 🎬 AI Video Generator
Students say "show me a video on Newton's second law" and an educational video spawns as a large mixed reality screen floating in their AR/MR space. This bridges the gap between static slides and expensive video content, at a fraction of the cost of alternatives.
Newton's Second Law, visualised: $$\vec{F} = m\vec{a}$$
3. 🧊 2D Sketch to 3D Conversion
Students draw a 2D sketch — Meshy AI converts it into a fully rendered 3D model that appears floating in their AR/MR environment in real time. Abstract concepts become tangible objects you can walk around.
4. 🧑‍🏫 3D Avatar Personal Tutor
A personalised AI tutor with a face and voice, fully aware of the student's surroundings via the passthrough API — making AI feel human, present, and immersive.
How we built it
Our stack was chosen for one reason: lowest possible latency on real XR hardware.
| Layer | Technology |
|---|---|
| Headset | Meta Quest (Passthrough Camera API) |
| Core AI Brain | Amazon Nova 2 Lite (Vision + Language) |
| Voice Input | Whisper STT |
| Voice Output | Neural TTS → AudioPlayer |
| 3D Generation | Meshy AI API (Image to 3D) |
| Engine | Unity + Meta XR SDK |
| Language | C# with coroutine-based async pipelines |
The Nova Vision Pipeline
Every request follows this flow:
```
Microphone → STT → Transcription
        ↓
Passthrough Camera → Base64 JPEG (512×512)
        ↓
Amazon Nova 2 Lite API
   ┌─────────────────────┐
   │ "role": "user"      │
   │ content: [          │
   │   { text: query },  │
   │   { image: frame }  │
   │ ]                   │
   └─────────────────────┘
        ↓
Spoken AR Response (TTS)
```
The Nova API payload combines voice and vision in a single multimodal request:
```json
{
  "model": "nova-2-lite-v1",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "<transcribed voice query>" },
      { "type": "image_url", "image_url": {
          "url": "data:image/jpeg;base64,<encoded frame>"
      }}
    ]
  }],
  "max_tokens": 1024,
  "temperature": 0.7
}
```
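Assembling that payload programmatically looks like this. A minimal Python sketch of the same request (the app itself builds it in C# with UnityWebRequest; the function name and defaults here are ours):

```python
import base64
import json

def build_nova_payload(query: str, jpeg_bytes: bytes,
                       model: str = "nova-2-lite-v1",
                       max_tokens: int = 1024) -> str:
    """Combine the voice transcript and the passthrough camera frame
    (base64-encoded JPEG) into one multimodal chat request."""
    frame_b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": query},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/jpeg;base64,{frame_b64}"
                }},
            ],
        }],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    # A real JSON serialiser handles all escaping for free, which is
    # exactly the class of bug our hand-built C# payload ran into.
    return json.dumps(payload)
```

Letting the serialiser own the escaping, rather than concatenating strings, is the main design point.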
The Meshy 3D Pipeline
```
Passthrough Camera captures student's sketch
        ↓
Meshy Image-to-3D API
        ↓
Poll for generation status
        ↓
Download GLB / OBJ model file
        ↓
Instantiate 3D model in AR scene
at correct world-space position
```
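The polling step is the delicate part on a headset. A Python sketch of the non-blocking poll loop with timeout (the real implementation is a Unity coroutine in C#; `check_status` stands in for the HTTP status request, and Meshy's actual status strings may differ):

```python
import time

def poll_until_done(check_status, interval=2.0, timeout=60.0,
                    now=time.monotonic, sleep=time.sleep):
    """Poll an async job (e.g. a Meshy Image-to-3D task) until it
    finishes, fails, or the timeout expires.

    `check_status` returns one of "PENDING", "IN_PROGRESS",
    "SUCCEEDED", "FAILED".
    """
    deadline = now() + timeout
    while now() < deadline:
        status = check_status()
        if status == "SUCCEEDED":
            return True
        if status == "FAILED":
            raise RuntimeError("3D generation failed")
        sleep(interval)   # in Unity: yield return new WaitForSeconds(...)
    raise TimeoutError("3D generation timed out")
```

Injecting `now` and `sleep` keeps the loop testable and mirrors how a coroutine yields instead of blocking the main thread.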
Challenges we ran into
🔴 JSON Serialisation Silent Failures
Our early Nova requests would randomly return 400 Bad Request with no obvious cause. After debugging, we discovered that Whisper transcriptions containing characters like `\n`, `\t`, or `\"` were being injected raw into our hand-built JSON payload, silently corrupting it.

We replaced the naive `.Replace("\"", "\\\"")` with a full RFC 8259-compliant serialiser that handles all control characters: any character below U+0020 is emitted as a `\uXXXX` escape sequence.
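That escaping rule can be sketched in a few lines. A Python equivalent of the serialiser's string-escaping step (the project's version is C#; the helper name is ours):

```python
def escape_json_string(s: str) -> str:
    """Escape a string for embedding in JSON per RFC 8259: quote,
    backslash, the short escapes, and every control char below U+0020."""
    specials = {'"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f',
                '\n': '\\n', '\r': '\\r', '\t': '\\t'}
    out = []
    for ch in s:
        if ch in specials:
            out.append(specials[ch])
        elif ord(ch) < 0x20:
            out.append(f"\\u{ord(ch):04x}")  # e.g. \u0001
        else:
            out.append(ch)
    return "".join(out)
```

The key difference from the naive version is the final `elif`: control characters that have no short escape still get a `\uXXXX` form instead of corrupting the payload.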
🔴 Passthrough Camera Frame Timing
Capturing a stable frame from the Quest passthrough pipeline at the exact right moment required careful coroutine synchronisation. We had to:
- Wait for the passthrough stream to be confirmed playing
- Yield `WaitForEndOfFrame()` to ensure GPU readback was complete
- Handle resolution-zero edge cases with graceful fallback textures
🔴 Nova `max_tokens` Cutoff
Early responses were being truncated mid-sentence because the default of 300 tokens is far too low for educational explanations. We raised this to 1024, giving responses room to breathe:
$$300 \text{ tokens} \approx 225 \text{ words} \quad \xrightarrow{\text{fix}} \quad 1024 \text{ tokens} \approx 768 \text{ words}$$
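The word estimates above use the common rule of thumb of roughly 0.75 English words per token, which is easy to sanity-check:

```python
WORDS_PER_TOKEN = 0.75  # rough heuristic for English text

def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

# 300 tokens  -> ~225 words (too short for an explanation)
# 1024 tokens -> ~768 words (room to breathe)
```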
🔴 TTS + Video Synchronisation
When a student triggered video playback, the video would start while Hazel was still speaking — creating an audio clash. We built a configurable `videoDelayAfterTts` float so the video always waits for speech to complete before appearing.
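The sequencing logic is simple once stated. An asyncio sketch of the same ordering (the real version is a Unity coroutine in C# with a serialized `videoDelayAfterTts` field; the function name here is ours):

```python
import asyncio

async def play_video_after_tts(tts_task, start_video,
                               video_delay_after_tts=0.5):
    """Hold video playback until speech has finished, plus a
    configurable pause, so Hazel is never talked over.

    `tts_task` is an awaitable that completes when speech ends;
    `start_video` is a callback that spawns the MR video screen.
    """
    await tts_task
    await asyncio.sleep(video_delay_after_tts)
    start_video()
```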
🔴 Meshy Async Polling on Quest
Meshy's Image-to-3D API is asynchronous — you submit a job and poll for completion. Implementing a non-blocking polling coroutine that didn't stall Unity's main thread while waiting for 3D generation (which can take 15–60 seconds) required a dedicated coroutine architecture with timeout handling.
🟡 3D Model World Placement
Spawning Meshy-generated models at the correct position relative to the user's head pose required converting from camera-local space to Unity world space:
$$P_{\text{world}} = T_{\text{camera}} \cdot P_{\text{local}}$$
Getting this transform right so the 3D model appears naturally in front of the student — not behind them or at the origin — took significant iteration.
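This is the same homogeneous-coordinate math Unity performs in `Transform.TransformPoint`; a numpy sketch of the transform itself (function name ours):

```python
import numpy as np

def camera_local_to_world(T_camera: np.ndarray,
                          p_local: np.ndarray) -> np.ndarray:
    """Map a camera-local point into world space with a 4x4
    homogeneous transform: P_world = T_camera @ [p_local, 1]."""
    p_h = np.append(p_local, 1.0)   # promote to homogeneous coords
    return (T_camera @ p_h)[:3]

# e.g. head pose 1.6 m above the origin, model 1 m in front:
# T = identity rotation + translation (0, 1.6, 0)
# camera_local_to_world(T, [0, 0, 1]) -> [0, 1.6, 1]
```

Forgetting the translation column (or applying it to a direction rather than a point) is exactly what puts models at the origin or behind the user.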
Accomplishments that we're proud of
1. Full end-to-end voice → vision → spoken response pipeline working on real Quest hardware using Amazon Nova — not a simulation, not a screen recording, a real device
2. Nova sees the student's actual world through the passthrough camera — not a description of it, the live frame itself
3. Sketch-to-3D in AR is something very few teams have demonstrated running live on a headset — watching a drawing become a 3D object you can walk around is genuinely magical
4. Four AI-powered features unified under one coherent educational vision and one codebase
5. Sub-2-second Nova response latency on-device for most queries, making the interaction feel genuinely conversational
6. 50+ language support out of the box via Nova — making NewSKOOL accessible to students globally, not just English speakers
What we learned
Amazon Nova is Remarkably Scene-Aware
Nova's multimodal understanding of real-world scenes exceeded our expectations. Even at $512 \times 512$ JPEG resolution — necessary to keep payload sizes small for on-device performance — Nova correctly identified handwritten text, printed diagrams, 3D objects, and spatial relationships between items in the scene.
XR Latency is Physical, Not Just Technical
On a screen, a 2-second API response feels acceptable. In a headset, it breaks immersion completely. Every millisecond we optimised mattered — reducing image size, compressing at 85% JPEG quality, and raising `max_tokens` only to what was needed were all meaningful on-device wins.
Error Handling Is the Product
The gap between a demo that works once and a product that works reliably is entirely made of error handling. Null camera buffers, API rate limits, malformed JSON, empty transcriptions — every one of these would silently break the experience without robust guards. We learned to treat every failure mode as a first-class feature.
A Clear Use Case Makes Everything Easier
Choosing education as our focus made every technical decision clearer. It told us exactly what to optimise (latency, accuracy, language coverage), what to cut (unnecessary UI), and how to frame every feature for maximum impact.
Multimodal AI Changes What Questions Students Can Ask
Traditional AI tutors answer questions about text. Hazel answers questions about reality. The difference in what students can ask — "what is this chemical formula on my worksheet?", "explain what I'm looking at in this diagram" — is qualitatively different and far more powerful.
What's next for NewSKOOL
🎯 Near Term
- **Amazon Nova 2 Sonic integration** for fully real-time speech-to-speech conversation with Hazel — eliminating the STT → Nova → TTS chain entirely and achieving true conversational latency
- **Nova Web Grounding** (`nova_grounding` system tool) so Hazel can search the web in real time — answers will always reflect current knowledge, not just training data
- **Curriculum Upload** — students upload their syllabus PDF and Hazel answers only within their specific course context, preventing off-topic responses
🚀 Medium Term
- **Multi-subject video library** — expanding beyond Newton's Second Law to AI-curated videos across:
  - Physics: $E = mc^2$, $\nabla \cdot \vec{E} = \frac{\rho}{\varepsilon_0}$
  - Chemistry: molecular bonding visualised in 3D
  - Mathematics: $\int_a^b f(x)\,dx$ rendered as AR graphs
  - Biology: cell structures as interactive 3D models
- **Multiplayer Study Rooms** — multiple Quest users sharing the same AR space with a shared Hazel tutor, enabling collaborative AI-assisted learning
🌍 Long Term
- **School partnerships** — deploying NewSKOOL in underfunded schools where a $500$ headset replaces thousands of dollars of lab equipment via AR simulation
- **Student progress tracking** — Nova analyses what concepts a student struggles with over time and adapts Hazel's teaching style accordingly
$$\text{The future of education} = \text{Every student} \times \text{Personal AI tutor} \times \text{Their real world}$$
Let's revolutionise how the world learns. 🌍
Built With
- ai
- amazon-nova-2-lite
- android
- c#
- json
- meshy-api
- meta-quest
- mixed-reality
- ovrsimplejson
- unity
- unitywebrequest
- xr