## Inspiration
We wanted something that felt less like grinding flashcards and more like actually being somewhere: ordering food, wandering a campus, talking out loud instead of tapping multiple choice. Games already teach through immersion; we figured language learning could borrow the same trick.
## What it does
LangLand is a browser game where you pick a language, explore a small 3D world, and speak to NPCs. There's a campus map with spots you can jump into: a restaurant-style scenario (Brass Tap) where you order and chat with a waiter, and a relaxed outdoor spot (Johnston Green) for casual conversation. While you talk, the app shows live "next word" suggestions, routes your speech through a chatbot that stays in character, and can read replies aloud. A simple UI ties it together: hold-to-talk, chat bubbles for you and the NPC, and optional English subtitles.
## How we built it
The frontend is React + Vite, with Three.js loading GLB scenes and orbit controls for the camera. The browser handles speech recognition; we send bursts to a small FastAPI backend that calls Gemini for next-word predictions and NPC replies. ElevenLabs handles voice output when it's available, with the browser's speech synthesis as a fallback. Content and flows are driven mostly by screen logic and scenario strings rather than a giant CMS, which is hackathon-friendly and easy to tweak.
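The backend's job is mostly plumbing: take a burst of transcript text plus the active scenario, wrap it in a prompt that keeps the NPC in character, and hand back structured JSON the frontend can render. A minimal sketch of that shaping step, with field names (`system`, `reply`, `next_words`) that are illustrative assumptions rather than our exact contract:

```python
import json

def build_npc_request(scenario: str, transcript: str, history: list[str]) -> dict:
    """Shape one speech burst into a request body for the LLM call.

    The field names here are illustrative; the real endpoint may differ.
    """
    system = (
        f"You are an NPC in the '{scenario}' scene. "
        "Stay in character, answer in the learner's target language, "
        "and keep replies to one or two short sentences."
    )
    return {"system": system, "messages": history + [transcript]}

def parse_npc_response(raw: str) -> dict:
    """Parse the model's JSON reply, falling back to plain text on bad JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {"reply": raw, "next_words": []}
    return {"reply": data.get("reply", ""), "next_words": data.get("next_words", [])}
```

A FastAPI route would just wrap these two functions around the actual Gemini call, so the browser only ever sees one stable response shape.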
## Challenges we ran into
Getting speech, API calls, and 3D to feel stable in one tab took iteration: mic permissions, burst timing, and not flooding the model on every partial transcript. On the backend, Gemini quotas bit us more than once (a wrong model name or a free-tier limit both look like "everything returns ..."). Lighting GLB scenes so they weren't pitch black or blown out took tuning. And Windows dev quirks (multiple uvicorn processes on the same port, .env only loading at process start) reminded us to restart the API whenever keys or env vars change.
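The "don't flood the model on every partial transcript" fix boiled down to a small gate: only forward a burst if the text actually changed and enough time has passed since the last send. A sketch of that idea, with an illustrative threshold rather than our tuned value:

```python
import time

class BurstGate:
    """Decide whether a partial transcript should be sent to the backend.

    Forwards a burst only when the text changed and a minimum interval
    has elapsed since the last send; the interval here is illustrative.
    """
    def __init__(self, min_interval: float = 0.8, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable for testing
        self.last_sent = ""
        self.last_time = float("-inf")

    def should_send(self, partial: str) -> bool:
        now = self.clock()
        if partial == self.last_sent:
            return False            # nothing new to predict on
        if now - self.last_time < self.min_interval:
            return False            # too soon; wait for the next tick
        self.last_sent = partial
        self.last_time = now
        return True
```

The same logic lives equally well in the browser before the fetch, or in the API as a cheap guard; either way it keeps quota burn proportional to finished phrases, not keystrokes of recognition.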
## Accomplishments that we're proud of
We're proud of the end-to-end loop: speak → predictions update → NPC answers in the target language → hear it back, all inside an actual 3D space. The two vibes (a structured restaurant vs. a chill campus hangout) feel distinct without rebuilding the whole app. And we're happy we pushed many languages through one pipeline instead of hard-coding a single locale.
## What we learned
Environment and process matter as much as features: one stale server can make every AI call look broken. JSON contracts between frontend and backend save pain when you iterate fast. And tone mapping + lights are half the "graphics" when you're not an AAA studio; a little renderer tuning goes a long way on imported models.
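The "JSON contracts" lesson in practice: agree on one response shape early and validate it at the boundary, so a backend change fails loudly instead of rendering as an empty chat bubble. A stdlib-only sketch, with field names that are assumptions rather than our actual schema:

```python
from dataclasses import dataclass, field
import json

@dataclass
class NpcTurn:
    """One NPC reply as the frontend expects it; fields are illustrative."""
    reply: str
    next_words: list[str] = field(default_factory=list)
    subtitle_en: str = ""

    @classmethod
    def from_json(cls, raw: str) -> "NpcTurn":
        data = json.loads(raw)
        # Fail loudly on a missing required field instead of silently
        # showing an empty chat bubble.
        if "reply" not in data:
            raise ValueError("backend response missing 'reply'")
        return cls(
            reply=data["reply"],
            next_words=list(data.get("next_words", [])),
            subtitle_en=data.get("subtitle_en", ""),
        )
```

Mirroring the same shape as a TypeScript interface on the frontend means a contract break surfaces as one clear error in one place, not as mystery blanks in the UI.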
## What's next for LangLand
More locations and scenarios, tighter pronunciation feedback, and maybe a lightweight progress layer so players can see streaks or vocab growth. We'd love smoother in-scene character animation, richer NPC memory across sessions, and optional multiplayer or async "leave a message" practice. Longer term: ship it as a proper PWA and tighten accessibility (keyboard paths, clearer error states when the API is down).