Inspiration
Our inspiration for this project came from a persistent challenge in online games: filtering player-generated content effectively. Many games struggle to strike the right balance, letting genuinely inappropriate messages through while flagging harmless communication. For example, when I played Adopt Me, any long message I tried to send would be flagged, which kept me from having full conversations with other players.
In addition, we saw the potential of interactive conversations with in-game NPCs that have their own personas. Imagine a quest giver who responds dynamically to player questions, offering nuanced guidance beyond pre-scripted dialogue, or an engaging tutorial NPC that can answer specific questions about game mechanics. The possibilities for enriching gameplay through intelligent, conversational agents seemed immense.
Our goal was to empower game creators with tools for dynamic NPC dialogue and interactive tutorials, powered by AI with built-in safety checks. We built a simpler, standalone version of this idea: a project that demonstrates how Google's Gemini API can power interactive conversations with in-game objects that have their own personas.
What it does
This project demonstrates a basic framework for creating AI-powered Non-Player Characters (NPCs) within a game environment using Pygame and Google's Gemini API. It allows players to:
- Interact with different NPCs: Each NPC (represented by an image) has a distinct "personality" derived from its object type (e.g., a chair, a bookshelf, a TV).
- Engage in dynamic conversations: Player input is sent to the Gemini language model, which generates context-aware responses from the perspective of the interacting NPC.
- Chat safely: Before a response is generated, the player's input is checked by Gemini for potentially inappropriate content against 14 safety criteria.
- Experience a typing effect: NPC responses are displayed character by character, creating a more natural and engaging interaction.
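The persona-driven conversation described above could be assembled along these lines. This is a minimal sketch, not the project's actual prompt: the `build_npc_prompt` and `ask_npc` helpers, the prompt wording, and the `(speaker, text)` history format are hypothetical. Only the `google.generativeai` package and the `gemini-1.5-flash` model name come from this write-up.

```python
from typing import List, Tuple

# Hypothetical history format: a list of (speaker, text) pairs, oldest first.
History = List[Tuple[str, str]]


def build_npc_prompt(object_type: str, history: History, player_message: str) -> str:
    """Assemble a persona prompt for an NPC derived from its object type.

    The wording is illustrative only, not the project's real prompt.
    """
    lines = [
        f"You are a {object_type} in a video game, speaking to the player.",
        f"Stay in character as the {object_type} and keep replies short.",
        # Asking for dialogue only discourages the model from prefixing its own name.
        "Reply with your dialogue only; do not prefix it with your name.",
        "",
        "Conversation so far:",
    ]
    # Replay the conversation so the model can respond in context.
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"Player: {player_message}")
    return "\n".join(lines)


def ask_npc(object_type: str, history: History, player_message: str, api_key: str) -> str:
    """Send the persona prompt to Gemini and return the NPC's reply text."""
    # Imported lazily so the prompt builder works without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")
    prompt = build_npc_prompt(object_type, history, player_message)
    response = model.generate_content(prompt)
    return response.text.strip()
```

For example, `build_npc_prompt("bookshelf", [("Player", "Hi"), ("Bookshelf", "Hello.")], "What books do you have?")` yields a prompt that frames the model as a bookshelf and replays the conversation so far. Keeping the prompt builder separate from the API call makes the persona wording easy to test and adjust without network access.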
The setup is designed to be adaptable for game creators who want to implement fully interactive NPCs for various purposes, such as quest givers with dynamic dialogue, interactive tutorials capable of answering player questions, or engaging chat companions.
How we built it
This project was built using the following core components:
- Google Gemini API: This served as the natural language processing engine. We utilized the google.generativeai library in Python to send prompts to the gemini-1.5-flash model and receive text-based responses for the NPCs. Prompt engineering was used to guide the model's responses based on the NPC's identity and the conversation history. It is also used for safety evaluation of user inputs.
- Pygame: This Python game development library provided the framework for creating the visual interface. We used it to:
  - Set up the game window and display.
  - Load and render images for the player and NPCs.
  - Handle keyboard input for player movement and chat interaction.
  - Render text for the chat history and input field using Pygame fonts and text wrapping.
  - Implement the character-by-character typing animation using pygame.time.Clock() and string slicing.
  - Manage the chat interface state, including entering and exiting chat mode and handling scrolling (with ongoing refinement).
- Python: Python served as the primary programming language, tying together the Gemini API and Pygame functionalities. Data structures like dictionaries were used to manage NPC properties (image, position, conversation context).
- MongoDB: MongoDB stores the text history of the conversations between the player and the NPCs, giving these text-based interactions persistent storage so the dialogue can be used later.
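Persisting each chat turn could look roughly like this. It is a minimal sketch assuming pymongo and a hypothetical one-document-per-turn schema (`player_id`, `npc`, `speaker`, `text`, `timestamp`); the project's actual schema is not described in this write-up, and the helper names are illustrative.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


def make_turn_document(player_id: str, npc_name: str, speaker: str, text: str) -> Dict[str, Any]:
    """Build one chat-history record (hypothetical schema)."""
    return {
        "player_id": player_id,
        "npc": npc_name,
        "speaker": speaker,
        "text": text,
        "timestamp": datetime.now(timezone.utc),
    }


def save_turn(collection, player_id: str, npc_name: str, speaker: str, text: str) -> None:
    """Insert one turn; `collection` is expected to behave like a pymongo Collection."""
    collection.insert_one(make_turn_document(player_id, npc_name, speaker, text))


def load_history(collection, player_id: str, npc_name: str) -> List[Tuple[str, str]]:
    """Return (speaker, text) pairs for one player/NPC pair, oldest first."""
    cursor = collection.find({"player_id": player_id, "npc": npc_name}).sort("timestamp", 1)
    return [(doc["speaker"], doc["text"]) for doc in cursor]
```

With a real database, `collection` could be obtained via `MongoClient("mongodb://localhost:27017")["npc_game"]["chat_history"]`, where the database and collection names are placeholders. Returning history as `(speaker, text)` pairs lets it be fed straight back into a prompt when a conversation resumes.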
The development process involved setting up each component, implementing the game loop, building the chat interface and its features (input, display, scrolling), and integrating the Gemini API for both dynamic responses and the input safety check.
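The input safety check could be structured as a gate that runs before any NPC reply is generated. This sketch assumes the model is asked for a one-word SAFE/UNSAFE verdict; the four categories in `EXAMPLE_CRITERIA` are placeholders, since the project's 14 criteria are not listed here, and `build_safety_prompt` and `is_message_safe` are hypothetical names.

```python
from typing import Callable, Sequence

# Placeholder categories: the project's actual 14 criteria are not listed in this write-up.
EXAMPLE_CRITERIA = (
    "harassment or bullying",
    "hate speech",
    "sexual content",
    "requests for personal information",
)


def build_safety_prompt(message: str, criteria: Sequence[str]) -> str:
    """Ask the model to judge one player message against numbered criteria."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(criteria, start=1))
    return (
        "You are a content moderator for a game chat.\n"
        "Check the player message against these criteria:\n"
        f"{numbered}\n\n"
        "Answer with exactly one word: SAFE or UNSAFE.\n\n"
        f"Player message: {message}"
    )


def is_message_safe(
    message: str,
    generate: Callable[[str], str],
    criteria: Sequence[str] = EXAMPLE_CRITERIA,
) -> bool:
    """Return True only if the model's verdict is SAFE.

    `generate` is any function that sends a prompt to the model and returns its
    text, e.g. lambda p: model.generate_content(p).text
    Any answer other than a clear SAFE is treated as unsafe (fail closed).
    """
    verdict = generate(build_safety_prompt(message, criteria)).strip().rstrip(".").upper()
    return verdict == "SAFE"
```

Passing the model call in as `generate` keeps the gate testable without network access. Failing closed trades occasional false positives for fewer missed messages; a design that prioritizes avoiding over-flagging might instead retry or ask the model to explain its verdict.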
Challenges we ran into
We encountered several challenges during the development of this project:
- Consistent and Contextual AI Responses: Ensuring the Gemini model consistently provided relevant, in-character responses that flowed naturally within the conversation required careful prompt engineering and management of the conversation context.
- Balancing Performance and Responsiveness: API calls to the Gemini model introduce a slight delay. The typing effect helps mask it, but keeping the game responsive while waiting for AI responses still required attention.
- Implementing Chat Scrolling: Achieving smooth and reliable scrolling of the chat window, especially while the AI's text was being generated, proved to be a significant hurdle. The issues involved accurately calculating the visible lines, managing the scroll_offset, and making it work seamlessly with both automatic scrolling during typing and manual scrolling via the arrow keys.
- Preventing Redundant NPC Naming: Initially, the NPC's name would sometimes appear multiple times in its responses. We addressed this by refining the prompts to instruct the model to focus solely on the dialogue content.
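One way to organize the scrolling logic is to keep the wrapped lines and the `scroll_offset` separate from rendering. In this hypothetical sketch, `scroll_offset` is the index of the first visible line, and automatic scrolling only follows new text when the view is already at the bottom, so manual arrow-key scrolling is not overridden mid-response. The `wrap_text` helper and `ChatScroller` class are illustrative, not the project's code.

```python
from typing import Callable, List


def wrap_text(text: str, max_width: int, measure: Callable[[str], int] = len) -> List[str]:
    """Greedy word wrap; words wider than max_width are kept whole on their own line."""
    lines: List[str] = []
    for paragraph in text.split("\n"):
        current = ""
        for word in paragraph.split(" "):
            candidate = word if not current else f"{current} {word}"
            if current and measure(candidate) > max_width:
                lines.append(current)
                current = word
            else:
                current = candidate
        lines.append(current)
    return lines


class ChatScroller:
    """Tracks which wrapped chat lines are visible."""

    def __init__(self, visible_lines: int):
        self.visible_lines = visible_lines
        self.total_lines = 0
        self.scroll_offset = 0  # index of the first visible line

    def max_offset(self) -> int:
        return max(0, self.total_lines - self.visible_lines)

    def set_total_lines(self, total: int) -> None:
        # Follow new text only if the view was already pinned to the bottom.
        at_bottom = self.scroll_offset >= self.max_offset()
        self.total_lines = total
        if at_bottom:
            self.scroll_offset = self.max_offset()
        else:
            self.scroll_offset = min(self.scroll_offset, self.max_offset())

    def scroll(self, delta: int) -> None:
        # e.g. delta = -1 for the up arrow, +1 for the down arrow; clamped to valid range.
        self.scroll_offset = max(0, min(self.scroll_offset + delta, self.max_offset()))

    def visible(self, lines: List[str]) -> List[str]:
        return lines[self.scroll_offset:self.scroll_offset + self.visible_lines]
```

With Pygame, `measure` could be `lambda s: font.size(s)[0]`, since `Font.size` returns the rendered width and height of a string. Calling `set_total_lines` every time the typing effect reveals more text keeps the view in sync during generation.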
Accomplishments that we're proud of
Despite the challenges, we achieved several key milestones:
- Successful Integration of Gemini API: We successfully integrated a powerful LLM into a Pygame environment to enable dynamic NPC conversations.
- Functional Interactive Chat System: We created a basic yet functional chat interface that allows players to communicate with AI-powered NPCs.
- Safety Mechanism: We implemented a safety check that uses Gemini to screen user input against strict, detailed safety guidelines.
- Adaptable Framework: The project provides a foundational structure that game creators can build upon to create more complex and engaging AI-driven NPC interactions.
- Implementation of a Typing Effect: The character-by-character typing animation enhances the user experience and provides a visual cue for the AI's response generation.
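The typing effect can be reduced to revealing a growing slice of the reply based on elapsed time. A minimal sketch, assuming a fixed reveal rate; `TypewriterText` and the default of 40 characters per second are hypothetical choices.

```python
class TypewriterText:
    """Reveals a message a few characters at a time using string slicing."""

    def __init__(self, full_text: str, chars_per_second: float = 40.0):
        self.full_text = full_text
        self.chars_per_second = chars_per_second  # hypothetical default rate
        self._elapsed_ms = 0.0

    def update(self, dt_ms: float) -> None:
        # Accumulate elapsed milliseconds since the message started typing.
        self._elapsed_ms += dt_ms

    @property
    def visible_text(self) -> str:
        # Number of characters to show so far; slicing past the end is safe.
        count = int(self._elapsed_ms * self.chars_per_second / 1000)
        return self.full_text[:count]

    @property
    def done(self) -> bool:
        return len(self.visible_text) >= len(self.full_text)
```

In a Pygame loop, the elapsed time could come from `dt = clock.tick(60)` on a `pygame.time.Clock`, followed by `typer.update(dt)` and drawing `font.render(typer.visible_text, True, color)`. Tying the reveal to elapsed milliseconds rather than frames keeps the typing speed consistent if the frame rate drops.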
What we learned
Through this project, we gained valuable insights into:
- Practical Application of LLMs in Games: We learned firsthand the potential of integrating large language models to create more dynamic and interactive game experiences.
- Game UI/UX Considerations for AI Interactions: We gained experience in designing a basic chat interface and considering how to present AI-generated content in an engaging way.
- The Iterative Nature of Game Development: The challenges with scrolling and prompt engineering highlighted the importance of iterative development, testing, and refining features.
- The Critical Role of Safety in User-Generated Content: The need for robust safety measures when using LLMs for user-facing applications became very apparent.
- The Interplay of Different Technologies: We learned how to effectively combine the capabilities of a game development library (Pygame) with a cloud-based AI service (Gemini API).
What's next
Future development for this project and the broader concept of AI-powered NPCs with built-in safety could include:
- More Nuanced NPC Personalities: Developing more detailed and consistent NPC personalities through improved prompt engineering, potentially incorporating memory of past interactions and external knowledge.
- Advanced Dialogue Management: Implementing branching dialogue trees that are dynamically influenced by AI responses, creating richer and more complex conversations.
- Integration with Game Mechanics: Connecting NPC dialogue and actions to core game mechanics, such as quest progression, item interactions, and world events.
- Voice Integration: Exploring the use of Text-to-Speech APIs to give AI NPCs audible voices, further enhancing immersion.
- Facial Animation and Expressiveness: If integrated into a more advanced game engine, exploring ways to link AI dialogue to NPC facial animations and body language.
- Creator Tools and Customization: Developing user-friendly tools that allow game creators to easily define NPC personalities, behaviors, and safety parameters.
- Performance Optimization: Investigating ways to optimize the interaction with the LLM to reduce latency and improve the real-time feel of conversations.
- Exploring Different LLMs: Experimenting with other large language models to compare their performance, safety features, and suitability for different types of NPCs.