Venture

The first AI-powered 2D visual RPG adventure game with a consistent and structured game world.

Inspiration

We were inspired by the timeless appeal of Dungeons & Dragons and the growing power of AI in storytelling. While many AI-driven games exist, none truly capture the immersive essence of adventure with visual components that keep players engaged. We wanted to bridge the gap between imagination and visual storytelling by creating a new frontier where AI narrates your journey while stunning visuals bring your actions to life.

What it does

Venture is the world’s first visually immersive text-based RPG, where every choice you make dynamically shapes the story and your destiny. It does this through a blend of:

  • AI-driven narration: Brief, compelling storytelling in just a few sentences.
  • Visual storytelling: Generated character and environment visuals for every action.
  • Interactive animations: Bringing your decisions to life with real-time animations.
  • Speech-to-text: Speak your decisions instead of typing.
  • Text-to-speech: Let the game narrate the unfolding story to you.

How we built it

Venture was brought to life through a blend of cutting-edge technologies:

  • OpenAI GPT for intelligent, context-aware story generation.
  • DALL·E for dynamic visual asset generation.
  • rembg for background removal, so that generated visuals integrate seamlessly into the game.
  • React for building a fast and interactive front-end experience, ensuring smooth transitions and real-time animations.
  • FastAPI as the backbone of our game engine to handle interactions between AI-driven narratives, game logic, and visuals.
  • Framer Motion to power fluid, real-time animations that reflect player actions on a tactical grid.

Challenges we ran into

Creating Venture wasn’t without its hurdles. Here's what we faced:

  1. Maintaining Consistent State Across AI Interactions:

    • Problem: As the AI-driven story unfolded, keeping the game state consistent was difficult. Early on, the AI would sometimes "forget" important details, such as where characters were positioned or which decisions had already been made.
    • Solution: We built a custom game state management system that logs every critical decision and uses this to guide the AI’s responses. This ensures continuity in both narrative and gameplay.
  2. Concurrent Image Generation:

    • Problem: Generating multiple character and background images asynchronously through the Hugging Face and OpenAI APIs caused bottlenecks: visual assets took too long to load, which slowed gameplay.
    • Solution: We integrated Python’s asyncio to handle image requests concurrently, allowing us to queue multiple generation requests while ensuring that other parts of the game, like animations and state updates, continue without delays.
  3. Speech-to-text and Text-to-speech Processing:

    • Problem: The latency in converting player voice commands to text using speech-to-text services sometimes introduced delays, leading to awkward pauses in the game.
    • Solution: We optimized this by buffering input commands and pre-processing voice commands locally before sending them to the server. For text-to-speech, we used a streamlined model that can quickly convert responses with minimal lag, making interactions more fluid.
  4. Accurate Visual Representation:

    • Problem: Sometimes the images generated by AI were not exactly aligned with the player’s actions or descriptions. For instance, a character might not look as intended, or a background might be out of context.
    • Solution: We fine-tuned the prompts sent to the AI and added fallback mechanisms (e.g., stock images) to ensure that visuals are always relevant to the described action. We also introduced rembg to remove unwanted backgrounds, ensuring the characters fit seamlessly into the game's environment.
  5. Synchronizing AI Decisions with Visuals and Animations:

    • Problem: Timing the AI's narrative with the visual and animation elements was tricky. Initially, animations lagged behind the story, breaking immersion.
    • Solution: We built a system that triggers animations as soon as the AI completes its decision-making process, so that visuals and story are presented together in real time without disrupting the flow.
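
The concurrent image generation described in challenge 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `generate_image` is a hypothetical stand-in that sleeps instead of calling a real image API, and the concurrency limit of 4 is an arbitrary example value.

```python
import asyncio
import time


async def generate_image(prompt: str) -> str:
    """Hypothetical stand-in for a real image API call (assumption)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"image-for:{prompt}"


async def generate_all(prompts: list[str], max_concurrent: int = 4) -> list[str]:
    """Run all image requests concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(prompt: str) -> str:
        async with semaphore:
            return await generate_image(prompt)

    # gather returns results in the same order as the input prompts
    return await asyncio.gather(*(limited(p) for p in prompts))


def run_demo() -> tuple[list[str], float]:
    """Generate three placeholder images and report the elapsed wall time."""
    prompts = ["knight", "goblin", "forest background"]
    start = time.perf_counter()
    images = asyncio.run(generate_all(prompts))
    return images, time.perf_counter() - start
```

Because the three simulated requests overlap, the total wait is close to one request's latency rather than the sum of all three. The semaphore is one possible way to avoid flooding an API with too many simultaneous requests.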

Accomplishments that we're proud of

  • We successfully created the first-ever AI-driven RPG with real-time visual feedback based on player decisions.
  • Built an architecture capable of handling complex concurrent requests and visual generation, all while maintaining a fluid gameplay experience.
  • Introduced speech-to-text and text-to-speech for a hands-free, immersive experience, making the game accessible to a broader audience.
  • Achieved immersion through brevity by narrating compelling story elements in just two sentences per action, without losing depth or engagement.
  • We created an experience that merges text, visuals, and animations at a level of integration that had never been achieved in a text-based RPG.

What we learned

  • One of the biggest lessons we learned was that AI understands better when you describe what you want, not what you don’t want. Instead of saying "Don’t add a tree," we found it more effective to specify, "Show a barren landscape without vegetation." This made our image generation much more accurate and aligned with our vision.
  • In an AI-driven narrative, especially in a long-running RPG, managing state became essential. We learned that AI models can easily lose track of past events without a well-designed state management system. Implementing a structure where every event, decision, and character interaction is logged helped maintain narrative continuity.
  • While generating visuals, we discovered that AI models often performed better when prompts were explicit about the scene and composition. Instead of saying "a forest with no animals," it was better to say "a dense forest with towering trees and sunlight breaking through, focused on the landscape." This subtle shift in prompt crafting led to more consistent and visually accurate results.
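
The state-logging lesson above can be illustrated with a small sketch. The class, field names, and summary format here are assumptions made for illustration, not the project's actual schema: the idea is only that positions and events are recorded, then rendered into a short text block that is included with each request to the language model.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical event log; names and structure are illustrative assumptions."""
    positions: dict[str, tuple[int, int]] = field(default_factory=dict)
    events: list[str] = field(default_factory=list)

    def move(self, name: str, x: int, y: int) -> None:
        """Update a character's grid position and log the move."""
        self.positions[name] = (x, y)
        self.events.append(f"{name} moved to ({x}, {y})")

    def record(self, event: str) -> None:
        """Log any other decision or interaction."""
        self.events.append(event)

    def to_prompt_context(self, last_n: int = 5) -> str:
        """Summarize current positions and the most recent events as text."""
        lines = ["Current positions:"]
        for name, (x, y) in sorted(self.positions.items()):
            lines.append(f"- {name} at ({x}, {y})")
        lines.append("Recent events:")
        lines.extend(f"- {e}" for e in self.events[-last_n:])
        return "\n".join(lines)


state = GameState()
state.move("Hero", 2, 3)
state.record("Hero picked up the rusty key")
context = state.to_prompt_context()
```

Prepending `context` to each story prompt gives the model the facts it would otherwise have to remember, and limiting the summary to the last few events keeps the prompt short.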

What's next for Venture

  • Multiplayer functionality: Enabling collaborative storytelling where players can team up to explore dungeons, fight enemies, and craft their epic tales together.
  • Map-based location system: More robust game state management, with locations that contain entities and more structured logic for traveling between locations.
  • Low-res image generation: Train our own low-res image generation model for much speedier Venture responses.
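
As a rough idea of what the map-based location system could look like, here is one possible shape. Everything in it is hypothetical (the `Location` and `WorldMap` names, the two-way exits, the travel rule), since this feature is planned rather than built.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """A place on the map (hypothetical structure)."""
    name: str
    entities: set[str] = field(default_factory=set)
    exits: set[str] = field(default_factory=set)  # names of adjacent locations


class WorldMap:
    """Holds locations and enforces simple travel rules (illustrative only)."""

    def __init__(self) -> None:
        self.locations: dict[str, Location] = {}

    def add_location(self, name: str) -> None:
        self.locations[name] = Location(name)

    def connect(self, a: str, b: str) -> None:
        """Create a two-way path between two existing locations."""
        self.locations[a].exits.add(b)
        self.locations[b].exits.add(a)

    def place(self, entity: str, location: str) -> None:
        self.locations[location].entities.add(entity)

    def travel(self, entity: str, src: str, dst: str) -> bool:
        """Move an entity only if it is at src and dst is directly reachable."""
        here = self.locations[src]
        if entity not in here.entities or dst not in here.exits:
            return False
        here.entities.remove(entity)
        self.locations[dst].entities.add(entity)
        return True


world = WorldMap()
for place_name in ("village", "forest", "cave"):
    world.add_location(place_name)
world.connect("village", "forest")
world.connect("forest", "cave")
world.place("Hero", "village")

skipped_forest = world.travel("Hero", "village", "cave")   # no direct path
entered_forest = world.travel("Hero", "village", "forest")  # adjacent
```

With rules like these, the game engine rather than the language model decides whether a move is legal, which keeps the narrative from teleporting characters between unconnected places.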
