Lexigo (Let's go!) is a small voice-controlled, tile-based game that turns spoken commands into in-game movement and language practice. Speak to move your avatar, practice words across languages, and watch A* pathfinding carry out your commands.

Inspiration

We wanted to make vocabulary practice feel playful and immediate. Instead of flashcards or quizzes, Lexigo lets you speak to a tiny game world in the language you're learning and see your words and commands move the character. It started as an experiment to combine real-time browser audio capture with a retro tile game so learning vocabulary becomes more active and fun.

What it does

This project is a tiny 2D world: a 10×10 tile-based game built with Excalibur.js where a little avatar can move around trees, avoid obstacles, and navigate the map. Instead of clicking or typing, you control the character with your voice. The browser captures your microphone audio, encodes it as a WAV stream, sends it to ElevenLabs’ speech-to-text API, and the transcription is parsed into simple movement commands like “go to the middle” or “walk right 3 steps.” The game then uses an A* pathfinding system to compute a safe, obstacle-aware route through the grid, so speech turns directly into traversal, and it even gives voice feedback if you speak in a language other than the one you chose.
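
The transcript-to-command step can be sketched roughly like this (a minimal illustration; all function and type names here are hypothetical, not from our actual codebase):

```typescript
// Parsed result of a spoken command (hypothetical shape).
type Command =
  | { kind: "step"; dx: number; dy: number; count: number }
  | { kind: "goto"; target: "middle" };

// Assuming screen coordinates: y grows downward, so "up" is -1 on y.
const DIRS: Record<string, [number, number]> = {
  left: [-1, 0], right: [1, 0], up: [0, -1], down: [0, 1],
};

function parseCommand(transcript: string): Command | null {
  const text = transcript.toLowerCase().trim();
  if (/go to the middle/.test(text)) return { kind: "goto", target: "middle" };
  // Matches phrasings like "walk right 3 steps" or just "go left".
  const m = text.match(/(?:walk|go|move)\s+(left|right|up|down)(?:\s+(\d+))?/);
  if (m) {
    const [dx, dy] = DIRS[m[1]];
    return { kind: "step", dx, dy, count: m[2] ? parseInt(m[2], 10) : 1 };
  }
  return null; // nothing recognizable — let the game ask the player to retry
}
```

In the real app this step sits behind Gemini, which normalizes looser phrasings before they reach the game loop.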

How we built it

Frameworks & tooling

  • React + TypeScript
  • Vite for fast dev builds
  • Excalibur.js for the game engine and tilemap rendering
  • ElevenLabs for speech-to-text and also text-to-speech
  • Gemini API to interpret the transcribed speech and convert it into an action

Challenges we ran into

We originally wanted to build unlimited, procedurally generated terrain, but that idea blew up pretty fast in the time we had, so we settled on a tight fixed-size world and focused on polish. We also tried doing automatic language classification for voice input, but the results were inconsistent and noisy, so we eventually locked the app to a fixed language to keep things stable. On the technical side, browser audio turned out to be way stricter than expected: STT services only accept perfectly structured WAV files, which meant we had to convert raw Float32 buffers into clean 16-bit PCM and generate valid headers or the transcription would silently fail. Another struggle was making coordinate commands intuitive. Some people say “five three,” some say “row five column three,” and others say “top right-ish,” so we had to build a flexible interpretation layer that didn’t accidentally invert axes or flip the grid.
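
The Float32-to-WAV step described above looks roughly like this (a sketch with our own helper names, assuming mono 16-bit PCM; the byte offsets follow the standard 44-byte RIFF/WAVE header):

```typescript
// Clamp and scale Web Audio float samples ([-1, 1]) to signed 16-bit PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Wrap the PCM data in a minimal 44-byte WAV header (mono, little-endian).
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const pcm = floatTo16BitPCM(samples);
  const buffer = new ArrayBuffer(44 + pcm.length * 2);
  const view = new DataView(buffer);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  view.setUint32(4, 36 + pcm.length * 2, true); // RIFF chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt subchunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, "data");
  view.setUint32(40, pcm.length * 2, true); // data subchunk size
  for (let i = 0; i < pcm.length; i++) view.setInt16(44 + i * 2, pcm[i], true);
  return buffer;
}
```

Getting every header field right matters: an off-by-one in the chunk sizes is exactly the kind of thing that made transcription fail silently for us.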

Accomplishments that we're proud of

We’re really proud that we managed to learn the ElevenLabs API from scratch and hook it up cleanly with the Gemini API. Getting real speech to control a game felt like magic and it was really fun that it worked with different languages. We also pulled off our first real project with Excalibur.js, from tilemaps to collisions to a working A* system, which was a huge win given how little experience we started with.

What we learned

We got a much deeper understanding of how voice commands flow through a full stack. Handling audio was entirely new to us: capturing it in the browser, cleaning it up, sending it to ElevenLabs, and interpreting the transcript in a way that feels natural.

Working with Excalibur.js taught us how 2D world generation, tilemaps, and pathfinding actually fit together, and we learned how to manage all of that inside a React app without everything tearing itself apart. And maybe the biggest lesson: sometimes ambitious features like unlimited procedural terrain or automatic language detection sound cool, but scoping down is what lets the core experience shine.
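
The grid pathfinding piece can be sketched as a tiny standalone A* (a toy version for illustration — the real one is wired into our Excalibur scene; the sorted-array "priority queue" is fine for a 10×10 grid):

```typescript
type Point = { x: number; y: number };

// A* over a grid where 0 = walkable and 1 = obstacle (e.g. a tree).
// Uses 4-directional movement and a Manhattan-distance heuristic.
function astar(grid: number[][], start: Point, goal: Point): Point[] | null {
  const h = (p: Point) => Math.abs(p.x - goal.x) + Math.abs(p.y - goal.y);
  const key = (p: Point) => `${p.x},${p.y}`;
  const open: { p: Point; g: number; f: number }[] = [{ p: start, g: 0, f: h(start) }];
  const cameFrom = new Map<string, Point>();
  const gScore = new Map<string, number>([[key(start), 0]]);
  while (open.length > 0) {
    open.sort((a, b) => a.f - b.f); // cheapest estimated total cost first
    const { p, g } = open.shift()!;
    if (p.x === goal.x && p.y === goal.y) {
      // Walk cameFrom links backward to rebuild the route.
      const path = [p];
      let cur = p;
      while (cameFrom.has(key(cur))) {
        cur = cameFrom.get(key(cur))!;
        path.unshift(cur);
      }
      return path;
    }
    for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
      const n = { x: p.x + dx, y: p.y + dy };
      if (n.y < 0 || n.y >= grid.length || n.x < 0 || n.x >= grid[0].length) continue;
      if (grid[n.y][n.x] === 1) continue; // blocked tile
      const tentative = g + 1;
      if (tentative < (gScore.get(key(n)) ?? Infinity)) {
        gScore.set(key(n), tentative);
        cameFrom.set(key(n), p);
        open.push({ p: n, g: tentative, f: tentative + h(n) });
      }
    }
  }
  return null; // no route exists
}
```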

What's next for Lexigo

We want to improve the speech side of the project: showing confidence scores, adding timestamps, and giving short voice confirmations. We also want to support more languages, maybe even detect the language automatically, and let users choose faster or more accurate models. We’d also like to add basic tests and a small CI setup to catch bugs early. For gameplay and learning, we want simple practice features like saying a target word to collect an item, scoring and saved progress, and collectibles placed around the map that players can pick up with their voice. And someday, we’d love to bring back procedural generation so the world can grow beyond its small grid.
