Inspiration
Spellcraft MX was inspired by two very different worlds colliding. While working on medical XR simulations, I experienced how pen-shaped input devices like the Logitech MX Ink enable far more natural pointing, drawing, and spatial interaction than traditional VR controllers. At the same time, I've long been fascinated by fantasy stories where deliberate hand movements and a wand could produce powerful, intentional effects.
That contrast sparked a question: what if spatial input in XR could feel expressive and meaningful, rather than button-driven and abstract? Spellcraft MX explores this by reimagining MX Ink as a wand, combining precise spatial tracking, hand-drawn gesture recognition, and voice commands to create interactions that feel closer to how people intuitively imagine casting a spell, across both AR and VR.
What it does
Spellcraft MX is a cross-reality spell-casting game built around the Logitech MX Ink as a wand. Players have three ways to cast: draw a gesture in the air, speak a spell name as a voice command, or tap a button to launch a fire bolt. The three modalities complement one another and can be used interchangeably.
Eight gesture shapes map to eight spells: ignite, freeze, levitate, push, pull, stun, reveal, and unlock, each with distinct behaviour on targets. Ignite wraps an object in fire. Freeze swaps its materials to an ice shader. Levitate smoothly lifts it into a bobbing float. Push and pull physically displace objects, shoving them away or drawing them back. Stun and unlock trigger object-specific reactions like animations and knockback. A fire bolt can be shot at any time as a quick attack, with smart targeting that tracks to a locked enemy or travels freely if nothing is in range.
A cone-based targeting system continuously scans the environment, scoring candidates by angle and distance, and applies a Fresnel highlight to the best target so the player always knows what they're aiming at before committing to a cast.
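For illustration, here is a minimal Python sketch of that kind of cone scoring; the cone angle, range, and weights below are assumptions for the example, not the game's tuned values.

```python
import math

def pick_target(wand_pos, wand_forward, candidates,
                max_angle_deg=25.0, max_dist=10.0,
                angle_weight=0.7, dist_weight=0.3):
    """Return the best-scoring object inside the targeting cone, or None.

    candidates: list of (obj, position) pairs; positions are (x, y, z).
    All limits and weights here are illustrative, not the game's values.
    """
    best_obj, best_score = None, 0.0
    for obj, pos in candidates:
        to_target = tuple(p - w for p, w in zip(pos, wand_pos))
        dist = math.sqrt(sum(c * c for c in to_target))
        if dist == 0.0 or dist > max_dist:
            continue
        cos_angle = sum((c / dist) * f for c, f in zip(to_target, wand_forward))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle > max_angle_deg:
            continue  # outside the cone entirely
        # Closer to the cone axis and closer to the wand both score higher.
        score = (angle_weight * (1.0 - angle / max_angle_deg)
                 + dist_weight * (1.0 - dist / max_dist))
        if score > best_score:
            best_obj, best_score = obj, score
    return best_obj  # this object receives the Fresnel highlight
```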
Spider enemies patrol waypoints across walls, floors, and ceilings, and each spell affects them differently: a fire bolt kills and respawns them, freeze stops them mid-stride, stun sends them into a dizzy spin, and levitate lifts them helplessly off the surface.
The same interaction model works across both AR and VR. In AR mode, the real environment becomes the play space, with depth-based occlusion integrating the wand with the player's real hand. A 3D-printed wand cover for the MX Ink further strengthens the physical illusion. Switching between modes is a single toggle in the settings panel.
How we built it
Spellcraft MX is built in Unity using OpenXR on Meta Quest 3. The Logitech MX Ink is the primary input device, with a Quest Touch controller as a fallback.
Gesture recognition uses a DTW + k-NN classifier (k=5) trained on templates recorded directly in the headset and stored as JSON. At runtime, stroke points are sampled from the wand tip, projected onto a camera-aligned plane to remove depth ambiguity, resampled to 64 points, and matched against the template library. Directional overrides handle cases like clockwise vs anticlockwise circles and vertical swipe direction. A confidence margin threshold prevents ambiguous gestures from firing.
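As a rough illustration of the matching stage, here is a self-contained Python sketch of resampling, DTW, and the k-NN vote with a margin gate; the margin value and data shapes are assumptions, and the real pipeline additionally applies the projection and directional overrides described above.

```python
import math
from collections import Counter

def resample(points, n=64):
    """Resample a 2D stroke to n points evenly spaced along its arc length."""
    dists = [math.dist(points[i - 1], points[i]) for i in range(1, len(points))]
    step = sum(dists) / (n - 1)
    if step == 0:
        return [points[0]] * n
    out, acc, (px, py), i = [points[0]], 0.0, points[0], 1
    while len(out) < n and i < len(points):
        seg = math.dist((px, py), points[i])
        if seg > 0 and acc + seg >= step:
            t = (step - acc) / seg
            px, py = px + t * (points[i][0] - px), py + t * (points[i][1] - py)
            out.append((px, py))  # stay on this segment for the next sample
            acc = 0.0
        else:
            acc += seg
            (px, py), i = points[i], i + 1
    while len(out) < n:
        out.append(points[-1])  # pad if rounding left us short
    return out

def dtw(a, b):
    """Plain O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]

def classify(stroke, templates, k=5, margin=0.2):
    """k-NN vote over DTW distances with a confidence-margin gate.

    templates: list of (label, points) pairs, e.g. loaded from JSON.
    The margin value is an illustrative assumption.
    """
    stroke = resample(stroke)
    scored = sorted((dtw(stroke, resample(pts)), label) for label, pts in templates)
    winner, _ = Counter(label for _, label in scored[:k]).most_common(1)[0]
    best = min(d for d, l in scored if l == winner)
    rival = min((d for d, l in scored if l != winner), default=None)
    if rival is not None and (rival - best) / rival < margin:
        return None  # too close to another class: refuse rather than misfire
    return winner
```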
Voice input is powered by OpenAI Whisper via an assistant API, recording from the device microphone while the trigger is held. The assistant is instructed to return only exact spell labels, and the result passes through a vocabulary gate and length filter before dispatching. A friendly name map lets players say "ignite" or "freeze" naturally rather than technical gesture labels.
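A minimal sketch of what such a gate might look like, in Python for clarity; the label strings, word limit, and minimum clip duration are illustrative assumptions.

```python
# Illustrative spell vocabulary gate; names and thresholds are assumptions,
# not the project's actual values.
FRIENDLY_NAMES = {
    "ignite": "spell_ignite", "freeze": "spell_freeze",
    "levitate": "spell_levitate", "push": "spell_push",
    "pull": "spell_pull", "stun": "spell_stun",
    "reveal": "spell_reveal", "unlock": "spell_unlock",
}
MIN_CLIP_SECONDS = 0.4  # assumed guard against hallucinations on silence

def gate_transcript(transcript, clip_seconds):
    """Map a raw transcript to a spell label, or None if it fails the gate."""
    if clip_seconds < MIN_CLIP_SECONDS:
        return None  # too short: likely silence or ambient noise
    words = [w.strip(".,!?") for w in transcript.lower().split()]
    if len(words) > 3:
        return None  # length filter: spell commands are short
    for word in words:
        if word in FRIENDLY_NAMES:
            return FRIENDLY_NAMES[word]  # vocabulary gate passed
    return None  # no known spell word: discard the result
```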
The spell system is built around a static event bus that fully decouples input from effect. Each interactable object handles its own spell response and resets cleanly on a shared session timer, allowing spells to stack logically without state conflicts.
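In Python terms (the project itself is Unity), a static spell bus of this shape might look like the following; the SpellEvent fields and label strings are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SpellEvent:
    spell: str                 # e.g. "spell_freeze"
    target: Optional[object]   # the highlighted interactable, or None

class SpellBus:
    """Static event bus: input systems publish, interactables subscribe."""
    _handlers: dict[str, list[Callable[[SpellEvent], None]]] = {}

    @classmethod
    def subscribe(cls, spell: str, handler: Callable[[SpellEvent], None]) -> None:
        cls._handlers.setdefault(spell, []).append(handler)

    @classmethod
    def publish(cls, event: SpellEvent) -> None:
        for handler in cls._handlers.get(event.spell, []):
            handler(event)

# Gesture, voice, and button input all funnel into the same call:
#   SpellBus.publish(SpellEvent("spell_freeze", target=current_target))
# Each interactable registers its own response and its own reset logic,
# so no input system knows anything about spell effects.
```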
AR particle effects use a custom URP additive shader with a split blend equation that preserves the passthrough compositor's alpha channel, preventing the black halo artefact that standard additive shaders produce in passthrough mode.
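The effect of the split is easiest to see in the blend arithmetic itself. This Python sketch compares a conventional additive blend with a split one, assuming the common `Blend SrcAlpha One` additive factors; the project's exact shader settings may differ.

```python
# Demonstrates why a split blend equation preserves the compositor's alpha.
# Channel values are floats in [0, 1]; the factors below are assumed to
# match a typical URP additive setup, not the project's exact shader.

def standard_additive(src, dst):
    """Blend SrcAlpha One on all of RGBA: destination alpha gets polluted."""
    r, g, b, a = (s * src[3] + d for s, d in zip(src, dst))
    return (r, g, b, a)

def split_additive(src, dst):
    """RGB: Blend SrcAlpha One.  Alpha: Blend Zero One (keep destination)."""
    r, g, b = (s * src[3] + d for s, d in zip(src[:3], dst[:3]))
    return (r, g, b, dst[3])  # passthrough alpha survives intact

particle = (1.0, 0.5, 0.1, 0.8)   # a fire particle fragment
framebuf = (0.0, 0.0, 0.0, 0.0)   # passthrough: alpha 0 = show real world

print(standard_additive(particle, framebuf))  # alpha becomes 0.64 -> occludes
print(split_additive(particle, framebuf))     # alpha stays 0.0 -> passthrough
```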
Challenges we ran into
Designing gestures that feel natural yet are recognised reliably was the central challenge. Early versions of the DTW classifier confused clockwise and anticlockwise circles, or misread spirals as squiggles. Tuning the normalisation pipeline, adding a camera-plane projection step with a forward-vector fallback for straight strokes, and dialling in the score and margin thresholds brought recognition to a level players can trust.
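A sketch of how such a projection step with a straight-stroke fallback could work, using numpy; the principal-component test and its threshold are assumptions about the fallback logic.

```python
import numpy as np

def project_stroke(points_3d, cam_right, cam_up, cam_forward,
                   straight_threshold=0.95):
    """Flatten 3D wand-tip samples to 2D for the gesture classifier.

    points_3d: (N, 3) array; cam_* are unit camera basis vectors.
    The fallback rule and threshold are illustrative assumptions.
    """
    pts = np.asarray(points_3d, dtype=float)
    pts = pts - pts.mean(axis=0)  # centre the stroke before projecting

    # Dominant stroke direction = first principal component.
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    main_dir = vt[0]

    if abs(main_dir @ cam_forward) > straight_threshold:
        # Near-straight stroke along the view axis: projecting onto the
        # right/up plane would collapse it, so use forward/up instead.
        axes = (cam_forward, cam_up)
    else:
        axes = (cam_right, cam_up)
    return np.stack([pts @ axes[0], pts @ axes[1]], axis=1)  # (N, 2)
```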
Integrating voice input introduced its own complexity. Whisper occasionally hallucinates short common words when given silence or ambient noise. Filtering by minimum clip duration and requiring spell vocabulary presence in the result removed most false positives before they reached the game logic.
Keeping the interaction model consistent across both AR and VR (two input devices, two rendering contexts, and three input modalities) required careful architectural decisions to prevent mode-specific code from proliferating through the codebase.
AR transparency was unexpectedly subtle. Standard additive particle shaders write source alpha into the framebuffer, which the passthrough compositor interprets as occlusion, producing black boxes around fire and explosion VFX. Splitting the blend equation to leave the destination alpha untouched resolved this entirely.
Enemy AI on curved surfaces (walls, ceilings) required per-frame surface raycasting to keep spiders correctly oriented and moving regardless of which face they were on.
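A minimal numpy sketch of that per-frame alignment, with `raycast` standing in for the engine's physics query; the probe offset, hover distance, and degenerate-case handling are illustrative.

```python
import numpy as np

def align_to_surface(position, up, forward, raycast, hover=0.02):
    """Re-anchor a crawling enemy to whatever surface is beneath it.

    raycast(origin, direction) -> (hit_point, hit_normal) or None is a
    stand-in for a physics raycast; names and offsets are illustrative.
    Returns (new_position, new_up, new_forward).
    """
    hit = raycast(position + up * 0.1, -up)  # probe "down" in local space
    if hit is None:
        return position, up, forward  # nothing below (e.g. levitated)
    hit_point, normal = hit
    new_up = normal / np.linalg.norm(normal)
    # Keep the walk direction tangent to the new surface.
    tangent = forward - new_up * (forward @ new_up)
    norm = np.linalg.norm(tangent)
    if norm < 1e-6:  # forward was parallel to the normal: pick any tangent
        helper = (np.array([1.0, 0.0, 0.0]) if abs(new_up[0]) < 0.9
                  else np.array([0.0, 0.0, 1.0]))
        tangent = np.cross(new_up, helper)
        norm = np.linalg.norm(tangent)
    return hit_point + new_up * hover, new_up, tangent / norm
```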
Accomplishments that we're proud of
A unified multimodal input system (gesture, voice, and button) running identically across both AR and VR
A robust DTW + k-NN gesture classifier with confidence gating, trained directly in the headset
Eight distinct spells with unique visual effects, physics behaviour, and clean reset logic
Spider enemies that respond meaningfully and differently to every spell type
A dual-mode fire bolt system with smart target tracking and untargeted free-flight fallback
Correct additive transparency in AR passthrough via a shader blend equation fix
A 3D printable wand cover for MX Ink that makes the physical–digital connection tangible
A complete in-game UI suite: start screen, settings panel, gesture reference card, and contextual hints, all designed to work in both AR and VR space
Reaching the semi-finals of the DevStudio Challenge 2026 by Logitech
What we learned
This project reinforced that interaction design should follow physical affordances rather than retrofit controller paradigms into new contexts. The MX Ink's pen form factor affords precision spatial drawing in a way a thumbstick never could, but only if the recognition system is reliable enough that players trust it. Perceived reliability matters as much as measured accuracy.
Combining gesture, voice, and button input meaningfully is harder than stacking three systems. Each modality has different latency, failure modes, and cognitive load. Making them feel complementary rather than redundant required deliberate UX decisions about when each one is the natural choice for a given action.
We also found that small infrastructure details (input gating before the game starts, a shared spell reset timer, material state restoration after effects) have an outsized impact on how polished and trustworthy the whole system feels in practice.
What's next for Spellcraft MX
The core interaction system is stable, and the spell mechanics are fully functional. The next focus is on the world and content.
On the interaction side, we want to expand the gesture vocabulary, explore compound spells triggered by sequential gestures, and refine voice confidence thresholds with real-world usage data from a broader range of players and environments.
On the content side, we're designing a more complete environment with indoor and outdoor spaces, interactive environmental puzzles, and a larger variety of enemies, each with meaningful spell weaknesses to better showcase the depth of the interaction system.
Multiplayer is a strong long-term goal. Cooperative and competitive spell casting introduces entirely new design questions around how gesture and voice input behave under social pressure and latency.
Ultimately, Spellcraft MX is a proof of concept for a broader idea: that expressive, multimodal spatial input grounded in a physical device with real affordances can produce interactions in XR that feel genuinely intentional rather than mechanical.
