Unknown World

Inspiration

Most AI apps consist of a single prompt and a chat UI. We took a different direction: instead of having the AI merely generate text, we built a system in which it acts as a Game Master, tracking world state, enforcing rules, running an economy, and composing clickable scenes. The roguelike genre's "a different world every time" property, combined with Gemini 3's structured outputs, image generation, and vision analysis, made it feasible to build a web-based narrative game with infinite replayability.

What it does

Unknown World is a roguelike narrative web game that uses Gemini 3 as its Game Master engine.

Agent-driven Game Master: Each turn, Gemini 3 Pro returns narrative, UI choices, state changes, and costs in a single JSON Schema-enforced structured output. Outputs are dual-validated by Pydantic (server) and Zod (client), with an automatic repair loop (up to 2 retries) on failure.
Agentic Vision: Scene images are re-analyzed by Gemini 3 Flash with Code Execution to detect objects as bounding boxes in normalized 0–1000 coordinates. Detected objects become clickable hotspots on the Scene Canvas, so interaction targets are grounded in vision evidence rather than text hallucination.
Multimodal Scene Generation: Gemini 3 Pro Image generates scene artwork matching the narrative. Text and state panels are delivered first; images load asynchronously (lazy loading) to reduce perceived latency.
Scanner (Photo → Item): Users upload real-world photos, and Gemini 3 Flash's vision analysis converts them into captions, detected objects, and in-game item candidates.
NDJSON Turn Streaming: After generating the full TurnOutput, the server streams pipeline stages (Parse→Validate→Plan→Resolve→Render→Verify→Commit) and validation badges (Schema/Economy/Safety/Consistency OK) as NDJSON events. Narrative text is chunked for a typewriter effect.
Interactive Game UI: Action Deck (action cards with cost/risk), Inventory (dnd-kit drag & drop to use items on scene objects), Scene Canvas (clickable hotspots), and Economy HUD (Signal/Shard balance, estimated costs, transaction ledger) are always visible in a fixed layout.
Economy System: Actions are paid for in two currencies, Signal and Shard. The estimated cost is shown before each action, alternatives are suggested when the balance is insufficient, and every transaction is recorded in a ledger.
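
The NDJSON turn stream described above can be sketched as a plain generator. This is a minimal illustration, not the project's code: the stage names come from this write-up, while the event fields (`type`, `name`, `ok`, `text`, `data`) and the helper names `turn_events` and `chunk_text` are hypothetical.

```python
import json
from typing import Iterator

# Stage names follow the write-up; event field names are hypothetical.
STAGES = ["Parse", "Validate", "Plan", "Resolve", "Render", "Verify", "Commit"]


def chunk_text(text: str, size: int = 12) -> list[str]:
    """Split narrative text into fixed-size pieces for a typewriter effect."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def turn_events(turn_output: dict, badges: dict[str, bool]) -> Iterator[str]:
    """Yield one JSON object per line (NDJSON) for an already-generated turn."""
    for stage in STAGES:
        yield json.dumps({"type": "stage", "name": stage}) + "\n"
    for badge, ok in badges.items():
        yield json.dumps({"type": "badge", "name": badge, "ok": ok}) + "\n"
    for piece in chunk_text(turn_output["narrative"]):
        yield json.dumps({"type": "narrative_delta", "text": piece}) + "\n"
    # The complete TurnOutput is sent last so the client can validate it as a whole.
    yield json.dumps({"type": "final", "data": turn_output}) + "\n"
```

In a FastAPI endpoint, a generator like this could be wrapped in `StreamingResponse(..., media_type="application/x-ndjson")`, and the client would parse each line as it arrives from the Fetch response body. Because every event fits on one line, the client only ever buffers a single partial line.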

How we built it

Frontend: React 19 + Vite 7 + TypeScript. CRT retro theme via CSS variables in a fixed game layout. Zustand for WorldState/Inventory/Economy state management, Zod for server response validation, dnd-kit for inventory drag & drop, i18next for Korean/English switching.

Backend: FastAPI (Python 3.14) async orchestrator. NDJSON-based HTTP streaming (Fetch + POST) delivers turn results as step-by-step events. Pydantic models enforce the TurnOutput schema. On validation failure, a repair loop automatically retries up to 2 times; if validation still fails, a safe fallback (a text-only TurnOutput) is returned.
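
A minimal sketch of the validate, repair, and fallback flow, assuming the model is wrapped in a plain `call_model(prompt) -> str` function. To keep it dependency-free, a hand-written `validate_turn` stands in for the Pydantic TurnOutput model, and the required fields (`narrative`, `choices`, `state_changes`, `cost`) are a hypothetical subset of the real schema.

```python
import json
from typing import Callable

# Hypothetical subset of the TurnOutput schema (the project uses Pydantic instead).
REQUIRED_FIELDS = {"narrative": str, "choices": list, "state_changes": dict, "cost": int}


def validate_turn(raw: str) -> dict:
    """Parse and check a TurnOutput; raise ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("TurnOutput must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    return data


def safe_fallback(message: str) -> dict:
    """Text-only TurnOutput: no choices, no state changes, no cost."""
    return {"narrative": message, "choices": [], "state_changes": {}, "cost": 0}


def generate_turn(call_model: Callable[[str], str], prompt: str, max_repairs: int = 2) -> dict:
    raw = call_model(prompt)
    for attempt in range(max_repairs + 1):
        try:
            return validate_turn(raw)
        except ValueError as err:
            if attempt == max_repairs:
                break
            # Feed the validation error back so the model can repair its own output.
            raw = call_model(f"{prompt}\n\nYour previous output was invalid ({err}). Return corrected JSON only.")
    # Hypothetical fallback text.
    return safe_fallback("The signal flickers. Try another action.")
```

With `max_repairs=2`, the model is called at most three times (one initial attempt plus two repairs) before the text-only fallback is returned. Putting the concrete validation error into the repair prompt is one common way to make a retry more targeted than simply regenerating.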

Gemini 3 Integration — Four Pillars:

Text (Game Master): gemini-3-pro-preview (primary) / gemini-3-flash-preview (fallback). response_mime_type: application/json + response_schema for JSON Schema enforcement.
Image Generation: gemini-3-pro-image-preview (primary) / gemini-2.5-flash-image (low-latency). Reference image support for visual continuity.
Agentic Vision: gemini-3-flash-preview + Code Execution. Re-analyzes generated scene images to detect object bounding boxes → converts to hotspots. Filtered to 1–3 per scene with priority ranking and overlap removal.
Scanner (Image Understanding): gemini-3-flash-preview. Analyzes uploaded photos into captions, detected objects, and item candidates.
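
The vision-to-hotspot step can be sketched as greedy, priority-ordered selection with overlap removal. This assumes boxes arrive as `[y_min, x_min, y_max, x_max]` in 0–1000 coordinates (the convention Gemini documents for object-detection output) and that each detection carries a priority score; the `Detection` and `Hotspot` types, the IoU threshold of 0.5, and the scoring itself are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    box_2d: tuple[int, int, int, int]  # assumed (y_min, x_min, y_max, x_max), 0-1000
    priority: float  # hypothetical ranking score; higher is more important


@dataclass(frozen=True)
class Hotspot:
    label: str
    x: float  # left edge, as a fraction of image width
    y: float  # top edge, as a fraction of image height
    w: float
    h: float


def to_hotspot(d: Detection) -> Hotspot:
    y0, x0, y1, x1 = (v / 1000 for v in d.box_2d)
    return Hotspot(d.label, x0, y0, x1 - x0, y1 - y0)


def iou(a: Hotspot, b: Hotspot) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def select_hotspots(
    detections: list[Detection], max_count: int = 3, iou_threshold: float = 0.5
) -> list[Hotspot]:
    """Keep the highest-priority boxes, dropping any that overlap a kept box too much."""
    chosen: list[Hotspot] = []
    for det in sorted(detections, key=lambda d: d.priority, reverse=True):
        spot = to_hotspot(det)
        if spot.w <= 0 or spot.h <= 0:
            continue  # skip degenerate boxes
        if all(iou(spot, kept) < iou_threshold for kept in chosen):
            chosen.append(spot)
        if len(chosen) == max_count:
            break
    return chosen
```

Storing hotspots as fractions of the image size lets the Scene Canvas position them with percentage-based CSS (for example, `left` set to `x * 100` percent), so they stay aligned when the image is scaled.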

Fallback Strategy: On API errors (429, 5xx), the server automatically switches from Pro to Flash and retries with exponential backoff (2s→4s→8s). If the GenAI client fails to initialize, it falls back to mock mode.
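
One possible reading of the Pro→Flash fallback with 2s→4s→8s backoff, sketched with an injectable `sleep` so it can be tested without waiting. `ApiError` is a hypothetical stand-in for whatever error type the GenAI SDK raises; only 429 and 5xx statuses are retried, and the switch to Flash happens after the first retryable failure.

```python
import time
from typing import Callable

PRIMARY = "gemini-3-pro-preview"
FALLBACK = "gemini-3-flash-preview"


class ApiError(Exception):
    """Hypothetical stand-in for the SDK's HTTP error type."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    return status == 429 or 500 <= status < 600


def call_with_fallback(
    call: Callable[[str], str],
    delays: tuple[float, ...] = (2.0, 4.0, 8.0),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    model = PRIMARY
    # One initial attempt plus one retry per delay; None marks the last attempt.
    for delay in (*delays, None):
        try:
            return call(model)
        except ApiError as err:
            if not is_retryable(err.status) or delay is None:
                raise
            model = FALLBACK  # Pro -> Flash after the first retryable failure
            sleep(delay)
    raise RuntimeError("unreachable")
```

In this sketch, retries after the switch stay on Flash, and non-retryable errors (such as a 400) are raised immediately instead of being retried.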

Challenges we ran into

LLM Output Instability: Responses occasionally break the schema even with JSON Schema enforcement. We addressed this with Pydantic+Zod dual validation, a repair loop (up to 2 retries), and a safe text-only fallback — three layers of defense.
Image Generation Latency: Scene image generation takes 10–20 seconds. We deliver text and state panels first, then load images asynchronously with CRT-themed loading animations to reduce perceived wait time.
Being Mistaken for a Chat App: AI apps are easily perceived as chat wrappers. We removed chat bubbles entirely and kept Action Deck, Inventory, Scene Canvas, and Agent Console permanently visible in a fixed game layout.
Economy Balance: We iteratively tuned Signal earn/spend ratios so that players do not run out of resources within a 10-minute demo loop.
Rate Limiting: We implemented automatic Pro→Flash fallback model switching and a countdown-based retry UI for Gemini API 429 errors.
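
The "text first, image later" approach to image latency can be sketched with `asyncio`: the slow image job starts immediately as a background task, the narrative is emitted right away, and the image event follows once the task finishes. This is an illustrative sketch rather than the project's pipeline; `generate_image` is a hypothetical coroutine returning an image URL, and the event names are invented.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable


async def turn_stream(
    narrative: str, generate_image: Callable[[], Awaitable[str]]
) -> AsyncIterator[str]:
    # Start the slow image job first so it runs while the text is being sent.
    image_task = asyncio.create_task(generate_image())
    yield json.dumps({"type": "narrative", "text": narrative}) + "\n"
    # The client can show its loading animation until the image event arrives.
    yield json.dumps({"type": "image_pending"}) + "\n"
    try:
        url = await image_task
        yield json.dumps({"type": "image", "url": url}) + "\n"
    except Exception:
        # A failed image should not break the turn; the text is already delivered.
        yield json.dumps({"type": "image_failed"}) + "\n"
```

The design choice here is that image failure degrades the turn to text-only instead of failing it, which mirrors the safe-fallback idea used for TurnOutput validation.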

Accomplishments that we're proud of

A Stateful Game System: WorldState, Economy, Inventory, and the repair loop form a system in which state accumulates across turns and is validated each turn, rather than an app that ends after a single prompt.
Agentic Vision Pipeline: The AI re-analyzes its own generated images to create clickable hotspots. Interactions are grounded in vision evidence rather than text hallucination.
Agent Action Visibility: The Agent Console displays seven pipeline stages and four validation badges in real time, showing system behavior without exposing internal reasoning or prompts.
Scanner: Upload a real-world photo, and vision analysis converts it into in-game items — a multimodal interaction loop.
138 Work Units Completed: From planning through implementation, all 138 planned work units were completed.
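
A sketch of how state can accumulate while staying valid: each turn's changes are checked against the current `WorldState` and applied to a new immutable copy, so a rejected change leaves the previous state untouched. The field names and the `add_items`/`remove_items`/`set_flags` change format are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorldState:
    turn: int = 0
    inventory: tuple[str, ...] = ()
    flags: frozenset[str] = frozenset()


def commit(state: WorldState, changes: dict) -> WorldState:
    """Validate one turn's (hypothetical) change set and return the next state."""
    inventory = list(state.inventory)
    for item in changes.get("remove_items", []):
        if item not in inventory:
            # Reject the whole turn; the caller still holds the old, valid state.
            raise ValueError(f"cannot remove {item!r}: not in inventory")
        inventory.remove(item)
    inventory.extend(changes.get("add_items", []))
    return replace(
        state,
        turn=state.turn + 1,
        inventory=tuple(inventory),
        flags=state.flags | frozenset(changes.get("set_flags", [])),
    )
```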

What we learned

Structured Outputs Are the Key: Enforcing JSON Schema turns AI output into parseable game data. This is the fundamental difference between a chat wrapper and a game system.
Validation and Repair Must Be Designed from Day One: LLM output is inherently unstable. Without automatic repair and safe fallbacks built into the initial design, the game breaks in production.
Latency Can Be Absorbed by UX: Image generation latency is hard to reduce technically, but step-by-step streaming and loading animations significantly reduce perceived wait time.
AI Costs Can Become Game Mechanics: Converting API costs into in-game currency (Signal/Shard) instead of hiding them turns cost management into part of the gameplay.
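
A minimal sketch of costs as game mechanics: an `Economy` that checks affordability before an action, records every change in a ledger, and suggests cheaper alternatives when the balance falls short. The `Cost` and `Economy` types and the sorting rule for alternatives are assumptions for illustration; only the Signal/Shard currencies come from this write-up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cost:
    signal: int = 0
    shard: int = 0


@dataclass
class Economy:
    signal: int
    shard: int
    # Each entry: (reason, signal delta, shard delta).
    ledger: list[tuple[str, int, int]] = field(default_factory=list)

    def can_afford(self, cost: Cost) -> bool:
        return self.signal >= cost.signal and self.shard >= cost.shard

    def spend(self, action: str, cost: Cost) -> None:
        if not self.can_afford(cost):
            raise ValueError(f"insufficient balance for {action!r}")
        self.signal -= cost.signal
        self.shard -= cost.shard
        self.ledger.append((action, -cost.signal, -cost.shard))

    def earn(self, reason: str, signal: int = 0, shard: int = 0) -> None:
        self.signal += signal
        self.shard += shard
        self.ledger.append((reason, signal, shard))


def suggest_alternatives(econ: Economy, actions: dict[str, Cost]) -> list[str]:
    """Return the affordable actions, cheapest (by Signal, then Shard) first."""
    ranked = sorted(actions.items(), key=lambda item: (item[1].signal, item[1].shard))
    return [name for name, cost in ranked if econ.can_afford(cost)]
```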

What's next for Unknown World

Save/Load Redesign: Server-side storage for persistent progress
Expanded Agentic Vision: Applying vision grounding beyond hotspots to Action Deck card generation
Multiplayer Co-op: Multiple players exploring the same world together
BGM/SFX Generation: AI-powered background music and sound effects
Mobile Optimization: Touch interactions and responsive layouts

Built With

dnd-kit
docker
fastapi
gemini-3
gemini-agentic-vision
google-genai-sdk
i18next
pydantic
python
react
typescript
vite
zod
zustand

Updates

yachaboom JIN started this project — Feb 09, 2026 07:58 PM EST
