Liminal WebXR OS - KeanUHackThis 2026 Submission
Between dreams and reality, there is a threshold.
This document answers every prompt on the Devpost-style submission form for Liminal XR OS, a browser-native WebXR spatial-computing platform built during KeanUHackThis 2026.
Project Story
About the project
Inspiration
We were tired of WebXR being trapped behind $500+ headsets and proprietary stores. Spatial computing should not require spatial hardware. We wanted to prove that a phone, a laptop, or any device with a webcam can already deliver immersive, hand-tracked, AI-augmented experiences - today, with no install, no native app, and no special headset.
Liminal WebXR OS is our answer: an entire 3D operating-system-style environment that lives in the browser, where you literally pinch the air to navigate the solar system, talk out loud to a Gemma-powered assistant, and learn ASL or solve a Rubik's cube with your bare hands.
What it does
Liminal WebXR OS is a single-page WebXR app that boots into a full 3D space and lets you:
- Explore an interactive Solar System with NASA-derived 8K textures, accurate orbits, atmospheres, rings, and a black hole, all powered by Three.js / React Three Fiber.
- Control everything with your hands via MediaPipe - pinch, point, and dwell-to-click on holographic UI elements; no mouse required.
- Talk to "Liminal," a Gemma 4-powered AI assistant that knows what you're looking at and streams responses through ElevenLabs for natural, low-latency voice replies. There's a text mode, a push-to-talk Voice Mode, and a fully streaming Voice Live mode.
- Visualize ideas as topologies of thought in our Mind Graph - a force-directed knowledge graph you rotate with a wrist twist and reshape with a two-handed pinch.
- Play hand-tracked mini-games - Fruit Ninja, Ball Catching, Paper Plane, Kite Flying, Rocket Launch.
- Learn through interactive modules - Sign Language Teacher (real-time ASL letter detection) and Rubik's Cube (a fully-simulated 3×3 cube you orbit with pinch+drag and twist via dwell-clicked face buttons).
- Sign in with Auth0 so the experience is gated and personalized.
How we built it
| Layer | Tech |
|---|---|
| Rendering | Three.js, @react-three/fiber, @react-three/drei, @react-three/postprocessing, @react-three/xr |
| Hand tracking | @mediapipe/tasks-vision (HandLandmarker, GPU delegate, 2-hand mode, runs entirely client-side) |
| Generative AI | Google Gemma 4 (gemma-4-31b-it) via the @google/genai SDK, with streaming responses |
| Voice | ElevenLabs eleven_turbo_v2_5 text-to-speech with sentence-boundary streaming for ~600 ms first-audio latency |
| Auth | Auth0 (@auth0/auth0-react) |
| State | Zustand for global state, React 18 for UI |
| Styling | Tailwind CSS, Framer Motion for transitions |
| Build / Deploy | Vite 5, TypeScript, Vercel |
| Knowledge Graph | d3-force-3d, custom topology layouts (centralized / decentralized / distributed) |
The hand tracker is the spine of the system: a single global useHandTracker hook reads from MediaPipe each frame, computes pinch / two-hand-pinch / dwell, hit-tests against any DOM element marked data-interactive-id, and dispatches "click" events when the user dwells for ~600 ms. Every UI surface - the planet dock, the AI button, game controls, our Rubik's-cube face buttons - opts in by adding that one attribute, which makes accessibility and consistency basically free.
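The per-frame logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual useHandTracker hook: the names isPinching, hitTest, DwellTracker, and PINCH_THRESHOLD are hypothetical, the pinch threshold is an assumed value in normalized landmark units, and rectangles are passed in as plain data (in the browser they would presumably come from getBoundingClientRect() on elements carrying data-interactive-id). Only the ~600 ms dwell time comes from the write-up.

```typescript
// Hypothetical sketch of pinch detection, hit-testing, and dwell-to-click.
interface Point { x: number; y: number }
interface Rect { id: string; left: number; top: number; right: number; bottom: number }

const DWELL_MS = 600;          // from the write-up: dwell for ~600 ms
const PINCH_THRESHOLD = 0.05;  // assumption: distance in normalized landmark units

// A pinch is treated as thumb tip and index tip being closer than the threshold.
function isPinching(thumbTip: Point, indexTip: Point, threshold = PINCH_THRESHOLD): boolean {
  return Math.hypot(thumbTip.x - indexTip.x, thumbTip.y - indexTip.y) < threshold;
}

// Returns the id of the first rectangle containing the cursor, or null.
function hitTest(cursor: Point, rects: Rect[]): string | null {
  for (const r of rects) {
    if (cursor.x >= r.left && cursor.x <= r.right && cursor.y >= r.top && cursor.y <= r.bottom) {
      return r.id;
    }
  }
  return null;
}

class DwellTracker {
  private targetId: string | null = null;
  private startedAt = 0;
  private fired = false;

  // Call once per frame. Returns the id to "click" on the frame the dwell
  // completes, and null on every other frame.
  update(hoveredId: string | null, nowMs: number): string | null {
    if (hoveredId !== this.targetId) {
      // Cursor moved to a different element (or off all elements): restart.
      this.targetId = hoveredId;
      this.startedAt = nowMs;
      this.fired = false;
      return null;
    }
    if (hoveredId !== null && !this.fired && nowMs - this.startedAt >= DWELL_MS) {
      this.fired = true; // fire once per continuous hover
      return hoveredId;
    }
    return null;
  }

  // Value in [0, 1] that could drive a visible progress arc.
  progress(nowMs: number): number {
    if (this.targetId === null) return 0;
    if (this.fired) return 1;
    return Math.min(1, (nowMs - this.startedAt) / DWELL_MS);
  }
}
```

In a browser, the id returned by update would then be used to look up the matching element and dispatch a click on it. Firing only once per continuous hover keeps a resting hand from triggering repeated clicks.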
The AI assistant injects live spatial context into every prompt ("the user is currently looking at Saturn, scale 1.4x, cross-section disabled…"), so Gemma's answers feel grounded in the scene instead of being a generic chatbot.
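One way the scene-context injection could look is sketched below. The SceneState shape, its field names, and the describeScene / buildSystemPrompt helpers are assumptions made for illustration; the write-up only states that the focused body, scale, and cross-section state are included in the prompt. The call to the model itself is left out on purpose.

```typescript
// Hypothetical scene snapshot; the real store likely carries more fields.
interface SceneState {
  focusedBody: string;   // e.g. "Saturn"
  scale: number;         // e.g. 1.4
  crossSection: boolean; // whether the cross-section view is enabled
}

// Turns the current scene state into one plain-English sentence.
function describeScene(s: SceneState): string {
  const cross = s.crossSection ? "enabled" : "disabled";
  return `The user is currently looking at ${s.focusedBody}, scale ${s.scale.toFixed(1)}x, cross-section ${cross}.`;
}

// Appends the live scene description to a fixed base prompt.
function buildSystemPrompt(base: string, s: SceneState): string {
  return `${base}\n\n[Scene context]\n${describeScene(s)}`;
}
```

Rebuilding this string on every request, rather than once at startup, is what keeps the answers tied to whatever the user is looking at in that moment.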
Challenges we ran into
- Gesture disambiguation. Pinch+drag wanted to mean three different things (orbit a planet, orbit a graph, orbit our Rubik's cube). We solved it with a "hovered element" gate: gestures only commit when the finger is not over a clickable surface, and per-feature subscribers in the global store take priority.
- Streaming TTS without choppiness. We had to chunk Gemma's stream at sentence boundaries, kick off ElevenLabs as soon as the first sentence finishes, and queue subsequent chunks so audio plays continuously while text is still generating.
- Solvable, animated 3D Rubik's cube. Implementing layer rotation with quaternion baking, integer-grid snapping, and identity-orientation solved-detection - entirely from scratch - was a satisfying weekend rabbit hole.
- MediaPipe + WebGL contention. Both want the GPU. We instantiate one shared HandLandmarker for the whole app and let games re-use it via the existing <video> element instead of spinning up a second tracker.
- Performance. 27 cubies + a particle starfield + post-processing + a graph + live MediaPipe inference is a lot. We lazy-load every heavy module, cache hit-test rectangles for 200 ms instead of recomputing each frame, and clamp post-processing intensity on lower-DPI devices.
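The sentence-boundary chunking mentioned in the streaming TTS challenge can be illustrated with a small buffer class. This is a sketch under assumptions: SentenceChunker is a hypothetical name, and the boundary rule (terminal punctuation followed by whitespace) is a simple heuristic that will also split after abbreviations such as "e.g.", so a real pipeline may need something more careful.

```typescript
// Hypothetical helper: accumulates streamed text deltas and emits complete sentences.
class SentenceChunker {
  private buffer = "";

  // Feed one streamed text delta; returns any sentences completed by it.
  push(delta: string): string[] {
    this.buffer += delta;
    const out: string[] = [];
    // Terminal punctuation, optional closing quote/bracket, then whitespace.
    const boundary = /[.!?]+["')\]]*\s+/g;
    let last = 0;
    let m: RegExpExecArray | null;
    while ((m = boundary.exec(this.buffer)) !== null) {
      const end = m.index + m[0].length;
      const sentence = this.buffer.slice(last, end).trim();
      if (sentence) out.push(sentence);
      last = end;
    }
    // Keep the unfinished tail for the next delta.
    this.buffer = this.buffer.slice(last);
    return out;
  }

  // Call when the stream ends to emit whatever is left over.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}
```

Each emitted sentence would be sent to the TTS service as soon as it appears, with the resulting audio clips played back in order from a queue, so the first sentence can start speaking while later text is still being generated.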
What we learned
- WebXR is closer than people think. With MediaPipe and React Three Fiber, you can ship a credible spatial OS in a weekend.
- Gemma is fantastic for grounded, low-latency voice agents. The 31B-IT model felt fast, on-task, and easy to constrain with a system prompt that included real-time scene state.
- Dwell-to-click is the great unlock for hand-tracked UIs. Pinch alone is fatiguing and ambiguous; a 600 ms hover-to-confirm with a visible progress arc is intuitive on first try.
- One attribute (data-interactive-id) > a thousand props. Letting any element opt into hand-tracking by adding one DOM attribute kept the codebase clean and made adding new modules (like the Rubik's Cube) trivial.
Built With
- d3-force-3d
- elevenlabs
- framer-motion
- mongodb
- snowflake
- tailwindcss
- typescript
- vercel
- vite-plugin-glsl
- vitechlab
- web-audio-api
- webgl
- webrtc-/-getusermedia