NewSKOOL

"Learning in a New Dimension" — Team CheatCode


Inspiration

Education is still flat in a world that isn't.

Every student learns differently, yet classrooms deliver the same content, the same way, to everyone. We kept coming back to one statistic that stopped us in our tracks:

Students retain $65\%$ more information through immersive and visual learning compared to traditional methods — and XR technologies drive up to a $400\%$ increase in engagement.

We were inspired by a simple question:

What if your AI tutor could see what you see?

Not just answer your question — but understand your entire context. The textbook in front of you. The diagram on the board. The sketch you just drew with your own hand. We wanted to build something that makes every student feel like they have a personal genius sitting next to them, not a chatbot on a screen.

The moment we put on a Meta Quest and looked at the world through passthrough, we knew — this is where AI education belongs.


What it does

**NewSKOOL** is a mixed reality educational assistant running natively on Meta Quest. Powered by **Amazon Nova**, it has four core features:

1. 🧠 Hazel AI — Context-Aware Vision Tutor

Our flagship feature. The student presses a button, asks any question out loud, and Amazon Nova 2 Lite simultaneously analyses:

  • The student's live voice command
  • A real-time passthrough camera frame of their physical environment

Nova sees what the student sees and responds with a spoken, context-aware answer — aware of the textbook, diagram, or object directly in front of them. Hazel responds natively in $50+$ languages.

The intelligence behind Hazel can be expressed simply:

$$\text{Response} = f(\text{Voice Command}, \text{Visual Context})$$

Where $f$ is Amazon Nova's multimodal reasoning engine — understanding both inputs together, not separately.

2. 🎬 AI Video Generator

Students say "show me a video on Newton's second law" and an educational video spawns as a large mixed reality screen floating in their AR/MR space. This bridges the gap between static slides and expensive video content, at a fraction of the cost of alternatives.

Newton's Second Law, visualised: $$\vec{F} = m\vec{a}$$

3. 🧊 2D Sketch to 3D Conversion

Students draw a 2D sketch — Meshy AI converts it into a fully rendered 3D model that appears floating in their AR/MR environment in real time. Abstract concepts become tangible objects you can walk around.

4. 🧑‍🏫 3D Avatar Personal Tutor

A personalised AI tutor with a face and voice, fully aware of the student's surroundings via the passthrough API — making AI feel human, present, and immersive.


How we built it

Our stack was chosen for one reason: lowest possible latency on real XR hardware.

Layer           Technology
Headset         Meta Quest (Passthrough Camera API)
Core AI Brain   Amazon Nova 2 Lite (Vision + Language)
Voice Input     Whisper STT
Voice Output    Neural TTS → AudioPlayer
3D Generation   Meshy AI API (Image-to-3D)
Engine          Unity + Meta XR SDK
Language        C# with coroutine-based async pipelines

The Nova Vision Pipeline

Every request follows this flow:

Microphone → STT → Transcription
                          ↓
Passthrough Camera → Base64 JPEG (512×512)
                          ↓
         Amazon Nova 2 Lite API
         ┌─────────────────────┐
         │  "role": "user"     │
         │  content: [         │
         │    { text: query }, │
         │    { image: frame } │
         │  ]                  │
         └─────────────────────┘
                          ↓
              Spoken AR Response (TTS)

The Nova API payload combines voice and vision in a single multimodal request:

{
  "model": "nova-2-lite-v1",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "" },
      { "type": "image_url", "image_url": {
          "url": "data:image/jpeg;base64,"
      }}
    ]
  }],
  "max_tokens": 1024,
  "temperature": 0.7
}
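Sending that payload from Unity is a plain UnityWebRequest POST against Nova's OpenAI-compatible interface. A minimal sketch (the endpoint URL is a placeholder and the callback wiring is simplified):

using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

public class NovaClient : MonoBehaviour
{
    [SerializeField] private string endpoint = "https://<nova-endpoint>/v1/chat/completions"; // placeholder
    [SerializeField] private string apiKey;

    // Coroutine so the request never blocks Unity's main thread.
    public IEnumerator Ask(string payloadJson, System.Action<string> onResponse)
    {
        using (UnityWebRequest req = new UnityWebRequest(endpoint, "POST"))
        {
            req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(payloadJson));
            req.downloadHandler = new DownloadHandlerBuffer();
            req.SetRequestHeader("Content-Type", "application/json");
            req.SetRequestHeader("Authorization", "Bearer " + apiKey);

            yield return req.SendWebRequest();

            if (req.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError("Nova request failed: " + req.error);
                yield break;
            }

            // Hand the JSON reply to the TTS stage.
            onResponse(req.downloadHandler.text);
        }
    }
}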

The Meshy 3D Pipeline

Passthrough Camera captures student's sketch
             ↓
     Meshy Image-to-3D API
             ↓
    Poll for generation status
             ↓
  Download GLB / OBJ model file
             ↓
  Instantiate 3D model in AR scene
  at correct world-space position

Challenges we ran into

🔴 JSON Serialisation Silent Failures

Our early Nova requests would randomly return 400 Bad Request with no obvious cause. After debugging, we discovered that Whisper transcriptions containing characters like \n, \t, or \" were being injected raw into our hand-built JSON payload, silently corrupting it.

We replaced the naive .Replace("\"", "\\\"") with a full RFC 8259-compliant serialiser handling all control characters:

$$\text{For any control character } c < \text{U+0020}: \quad c \;\mapsto\; \mathtt{\backslash u}\mathrm{XXXX} \quad \text{(four hex digits)}$$
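A condensed sketch of such an escaper (not our full serialiser, but the core loop looks like this):

using System.Text;

public static class JsonEscape
{
    // Escape a string for safe embedding in a JSON payload (RFC 8259 string rules).
    public static string Escape(string s)
    {
        var sb = new StringBuilder(s.Length + 16);
        foreach (char c in s)
        {
            switch (c)
            {
                case '"':  sb.Append("\\\""); break;
                case '\\': sb.Append("\\\\"); break;
                case '\n': sb.Append("\\n");  break;
                case '\r': sb.Append("\\r");  break;
                case '\t': sb.Append("\\t");  break;
                case '\b': sb.Append("\\b");  break;
                case '\f': sb.Append("\\f");  break;
                default:
                    if (c < 0x20)
                        sb.AppendFormat("\\u{0:x4}", (int)c); // any remaining control character
                    else
                        sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }
}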

🔴 Passthrough Camera Frame Timing

Capturing a stable frame from the Quest passthrough pipeline at the exact right moment required careful coroutine synchronisation. We had to:

  • Wait for the passthrough stream to be confirmed playing
  • Yield WaitForEndOfFrame() to ensure GPU readback was complete
  • Handle resolution-zero edge cases with graceful fallback textures
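A simplified version of the capture coroutine, assuming the passthrough feed is exposed as a WebCamTexture (as in Meta's Passthrough Camera API samples); names are illustrative:

using System.Collections;
using UnityEngine;

public class FrameCapture : MonoBehaviour
{
    public WebCamTexture passthroughFeed;

    public IEnumerator CaptureFrame(System.Action<Texture2D> onFrame)
    {
        // 1. Wait until the stream is confirmed playing and has delivered a frame.
        while (!passthroughFeed.isPlaying || !passthroughFeed.didUpdateThisFrame)
            yield return null;

        // 2. Yield to end of frame so the GPU readback for this frame is complete.
        yield return new WaitForEndOfFrame();

        // 3. Resolution-zero edge case: fall back to a placeholder texture.
        if (passthroughFeed.width <= 16 || passthroughFeed.height <= 16)
        {
            onFrame(Texture2D.blackTexture);
            yield break;
        }

        // Copy the frame into a Texture2D for JPEG encoding downstream.
        Texture2D frame = new Texture2D(passthroughFeed.width, passthroughFeed.height, TextureFormat.RGB24, false);
        frame.SetPixels(passthroughFeed.GetPixels());
        frame.Apply();
        onFrame(frame);
    }
}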

🔴 Nova max_tokens Cutoff

Early responses were being truncated mid-sentence because the default of 300 tokens is far too low for educational explanations. We raised this to 1024, giving responses room to breathe:

$$300 \text{ tokens} \approx 225 \text{ words} \quad \xrightarrow{\text{fix}} \quad 1024 \text{ tokens} \approx 768 \text{ words}$$

🔴 TTS + Video Synchronisation

When a student triggered video playback, the video would start while Hazel was still speaking — creating an audio clash. We built a configurable videoDelayAfterTts float so the video always waits for speech to complete before appearing.
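The fix itself is only a few lines of coroutine code; roughly (hazelVoice and videoScreen are illustrative names):

using System.Collections;
using UnityEngine;
using UnityEngine.Video;

public class VideoAfterSpeech : MonoBehaviour
{
    public AudioSource hazelVoice;          // plays Hazel's TTS audio
    public VideoPlayer videoScreen;         // the floating MR video screen
    public float videoDelayAfterTts = 0.5f; // configurable pause after speech ends

    public IEnumerator PlayWhenHazelFinishes()
    {
        // Wait for the spoken response to finish, then add the configured buffer.
        while (hazelVoice.isPlaying)
            yield return null;
        yield return new WaitForSeconds(videoDelayAfterTts);

        videoScreen.Play();
    }
}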

🔴 Meshy Async Polling on Quest

Meshy's Image-to-3D API is asynchronous — you submit a job and poll for completion. Implementing a non-blocking polling coroutine that didn't stall Unity's main thread while waiting for 3D generation (which can take 15–60 seconds) required a dedicated coroutine architecture with timeout handling.
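The shape of that polling coroutine, stripped down (the status URL, header, and "SUCCEEDED" status string follow Meshy's task API but are paraphrased here):

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class MeshyPoller : MonoBehaviour
{
    public float pollInterval = 2f;   // seconds between status checks
    public float timeout = 90f;       // give up after this long

    public IEnumerator PollForModel(string taskStatusUrl, string apiKey, System.Action<string> onTaskJson)
    {
        float elapsed = 0f;
        while (elapsed < timeout)
        {
            using (UnityWebRequest req = UnityWebRequest.Get(taskStatusUrl))
            {
                req.SetRequestHeader("Authorization", "Bearer " + apiKey);
                yield return req.SendWebRequest();

                if (req.result == UnityWebRequest.Result.Success &&
                    req.downloadHandler.text.Contains("\"SUCCEEDED\""))
                {
                    // Caller parses the GLB/OBJ download URL out of the task JSON.
                    onTaskJson(req.downloadHandler.text);
                    yield break;
                }
            }

            // Yielding here is what keeps Unity's main thread unblocked.
            yield return new WaitForSeconds(pollInterval);
            elapsed += pollInterval;
        }
        Debug.LogWarning("Meshy generation timed out after " + timeout + " seconds");
    }
}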

🟡 3D Model World Placement

Spawning Meshy-generated models at the correct position relative to the user's head pose required converting from camera-local space to Unity world space:

$$P_{\text{world}} = T_{\text{camera}} \cdot P_{\text{local}}$$

Getting this transform right so the 3D model appears naturally in front of the student — not behind them or at the origin — took significant iteration.
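In Unity terms the transform above is a single TransformPoint call on the head camera. A sketch of the placement step (the offset values and names are illustrative):

using UnityEngine;

public static class ModelPlacement
{
    public static void PlaceInFrontOfUser(GameObject model, Camera headCamera)
    {
        // Where we want the model relative to the user's head:
        // one metre forward, slightly below eye level.
        Vector3 localOffset = new Vector3(0f, -0.2f, 1.0f);

        // P_world = T_camera * P_local
        model.transform.position = headCamera.transform.TransformPoint(localOffset);

        // Turn the model to face the user so it reads naturally in the scene.
        model.transform.rotation = Quaternion.LookRotation(
            model.transform.position - headCamera.transform.position, Vector3.up);
    }
}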


Accomplishments that we're proud of

  1. Full end-to-end voice → vision → spoken response pipeline working on real Quest hardware using Amazon Nova — not a simulation, not a screen recording, a real device
  2. Nova sees the student's actual world through the passthrough camera — not a description of it, the live frame itself
  3. Sketch-to-3D in AR is something very few teams have ever demonstrated running live on a headset — watching a drawing become a 3D object you can walk around is genuinely magical
  4. Four AI-powered features unified under one coherent educational vision and one codebase
  5. Sub-2-second Nova response latency on-device for most queries, making the interaction feel genuinely conversational
  6. 50+ language support out of the box via Nova — making NewSKOOL accessible to students globally, not just English speakers


What we learned

Amazon Nova is Remarkably Scene-Aware

Nova's multimodal understanding of real-world scenes exceeded our expectations. Even at $512 \times 512$ JPEG resolution — necessary to keep payload sizes small for on-device performance — Nova correctly identified handwritten text, printed diagrams, 3D objects, and spatial relationships between items in the scene.

XR Latency is Physical, Not Just Technical

On a screen, a 2-second API response feels acceptable. In a headset, it breaks immersion completely. Every millisecond we optimised mattered — reducing image size, compressing at 85% JPEG quality, and raising max_tokens only to what was needed were all meaningful on-device wins.

Error Handling Is the Product

The gap between a demo that works once and a product that works reliably is entirely made of error handling. Null camera buffers, API rate limits, malformed JSON, empty transcriptions — every one of these would silently break the experience without robust guards. We learned to treat every failure mode as a first-class feature.

A Clear Use Case Makes Everything Easier

Choosing education as our focus made every technical decision clearer. It told us exactly what to optimise (latency, accuracy, language coverage), what to cut (unnecessary UI), and how to frame every feature for maximum impact.

Multimodal AI Changes What Questions Students Can Ask

Traditional AI tutors answer questions about text. Hazel answers questions about reality. The difference in what students can ask — "what is this chemical formula on my worksheet?", "explain what I'm looking at in this diagram" — is qualitatively different and far more powerful.


What's next for NewSKOOL

🎯 Near Term

  • Amazon Nova 2 Sonic integration for fully real-time speech-to-speech conversation with Hazel — eliminating the STT → Nova → TTS chain entirely and achieving true conversational latency

  • Nova Web Grounding (nova_grounding system tool) so Hazel can search the web in real time — answers will always reflect current knowledge, not just training data

  • Curriculum Upload — students upload their syllabus PDF and Hazel answers only within their specific course context, preventing off-topic responses

🚀 Medium Term

  • Multi-subject video library — expanding beyond Newton's Second Law to AI-curated videos across:

    • Physics: $E = mc^2$, $\nabla \cdot \vec{E} = \frac{\rho}{\varepsilon_0}$
    • Chemistry: molecular bonding visualised in 3D
    • Mathematics: $\int_a^b f(x)\,dx$ rendered as AR graphs
    • Biology: cell structures as interactive 3D models
  • Multiplayer Study Rooms — multiple Quest users sharing the same AR space with a shared Hazel tutor, enabling collaborative AI-assisted learning

🌍 Long Term

  • School partnerships — deploying NewSKOOL in underfunded schools where a $500 headset replaces thousands of dollars of lab equipment via AR simulation

  • Student progress tracking — Nova analyses what concepts a student struggles with over time and adapts Hazel's teaching style accordingly

$$\text{The future of education} = \text{Every student} \times \text{Personal AI tutor} \times \text{Their real world}$$

Let's revolutionise how the world learns. 🌍

Built With

  • ai
  • amazon-nova-2-lite
  • amazon-nova-2-lite-50+-language-support
  • amazon-nova-2-lite-ar-environment-understanding
  • amazon-nova-2-lite-base64-image-input
  • amazon-nova-2-lite-context-aware-ai
  • amazon-nova-2-lite-educational-response-generation
  • amazon-nova-2-lite-image-understanding
  • amazon-nova-2-lite-json-chat-completions-api
  • amazon-nova-2-lite-low-latency-inference
  • amazon-nova-2-lite-multi-turn-instruction-following
  • amazon-nova-2-lite-multilingual-response
  • amazon-nova-2-lite-passthrough-frame-interpretation
  • amazon-nova-2-lite-real-time-scene-analysis
  • amazon-nova-2-lite-voice-+-vision-fusion
  • amazon-nova-api-openai-compatible-interface
  • amazon-nova-multimodal-vision
  • android
  • c#
  • json
  • meshy-api
  • meta-quest
  • mixed-reality
  • ovrsimplejson
  • unity
  • unitywebrequest
  • xr