GemComic

Inspiration

In a world dominated by passive consumption (scrolling feeds), active creativity is declining. This "Creativity Crisis" often stems from the intimidation of the blank page. We asked: "What if AI wasn't just a tool that does the work for you, but a partner that challenges you to be creative?"

That question led to GemComic. We wanted to bridge the gap between analog creativity (pen and paper) and digital storytelling, turning the solitary act of drawing into a collaborative game with Gemini 3.

What it does

GemComic is an interactive, turn-based comic creator. It creates an endless storytelling loop between the human user and the AI.

Draw: The user sketches a simple panel (a doodle, a stick figure, anything!) and uploads it.
Analyze & Narrate: Using Gemini 3’s Multimodal Vision, the app analyzes the visual elements of the sketch. It doesn't just describe them; it weaves them into a coherent narrative.
Challenge: This is the twist. Gemini acts as the "Director": based on the plot so far, it challenges the user to draw a specific next scene (e.g., "Great! Now draw that robot meeting a dragon!").
Loop: The user draws the next scene, and the cycle continues, building a unique comic strip dynamically.
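
The four steps above can be pictured as a small state update that runs once per turn. This is a minimal sketch, not GemComic's actual code: the `Panel`, `DirectorReply`, and `ComicState` types and the `applyTurn` function are hypothetical names, and the Gemini call that would produce the reply is left out entirely.

```typescript
// Hypothetical data model for the draw -> narrate -> challenge loop.
interface Panel {
  imageDataUrl: string; // the uploaded sketch
  narration: string;    // story segment written for this panel
}

// Assumed shape of what the "Director" returns for one upload.
interface DirectorReply {
  narration: string; // story text for the sketch just uploaded
  challenge: string; // what the user is asked to draw next
}

interface ComicState {
  panels: Panel[];
  currentChallenge: string | null; // null until the first sketch is narrated
}

const initialState: ComicState = { panels: [], currentChallenge: null };

// One turn of the loop: record the new panel and store the next challenge.
// Returns a new state object and leaves the previous one untouched.
function applyTurn(
  state: ComicState,
  imageDataUrl: string,
  reply: DirectorReply
): ComicState {
  return {
    panels: [...state.panels, { imageDataUrl, narration: reply.narration }],
    currentChallenge: reply.challenge,
  };
}
```

Keeping the update pure (no mutation) would fit a React state hook or reducer, since each turn simply produces the next `ComicState` to render.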

How we built it

We used Google AI Studio for rapid prototyping and prompt engineering to define the "Co-Creator" persona. The core intelligence is powered by the Gemini 3 Pro Preview model.

Multimodal Vision: We leveraged Gemini 3's advanced vision capabilities to interpret abstract handwritten sketches. The model is surprisingly good at understanding context, intent, and emotion even from rough drawings.
Contextual Reasoning: To keep the story coherent across multiple panels, we feed the "Previous Story Context" back into the model with every new generation. This lets Gemini reason over the earlier panels and maintain plot consistency and character memory.
System Instructions: We crafted specific system instructions to force the model to output two distinct parts: a narrative story segment and a creative drawing challenge.
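
The context feedback and two-part output described above could look roughly like the sketch below. Everything in it is an assumption for illustration: the `STORY:` and `CHALLENGE:` labels, the wording of `SYSTEM_INSTRUCTION`, and the helper names `buildTurnPrompt` and `parseDirectorReply` are hypothetical and are not taken from the project's real prompts. The model call itself is omitted; only the text handling around it is shown.

```typescript
// Hypothetical system instruction asking for a two-part, labeled reply.
const SYSTEM_INSTRUCTION = [
  "You are the co-creator and director of a comic.",
  "Reply in exactly two labeled parts:",
  "STORY: <narrative for the uploaded sketch>",
  "CHALLENGE: <the next scene the user should draw>",
].join("\n");

// Fold the story so far into the text sent alongside each new sketch.
function buildTurnPrompt(previousNarrations: string[]): string {
  if (previousNarrations.length === 0) {
    return "This is the first panel. Start a new story from the attached sketch.";
  }
  const context = previousNarrations
    .map((text, i) => `Panel ${i + 1}: ${text}`)
    .join("\n");
  return `Previous Story Context:\n${context}\n\nContinue the story using the attached sketch.`;
}

interface ParsedReply {
  narration: string;
  challenge: string;
}

// Split the model's raw text into its two parts.
// Returns null when either label or either part is missing.
function parseDirectorReply(raw: string): ParsedReply | null {
  const match = raw.match(/STORY:\s*([\s\S]*?)\s*CHALLENGE:\s*([\s\S]*)/);
  if (!match) return null;
  const narration = match[1].trim();
  const challenge = match[2].trim();
  if (!narration || !challenge) return null;
  return { narration, challenge };
}
```

Returning `null` on a malformed reply gives the UI a clear point to retry or show an error, rather than rendering a panel with an empty story or challenge.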

Challenges we ran into

Context Retention: Initially, the AI would "forget" the plot after the second image, treating every upload as a new story. We solved this by passing the chat history back into the prompt context on every request.
Balancing "Director" vs. "Narrator": Getting the prompt right was tricky. We needed Gemini to be descriptive enough to tell a story, but also prescriptive enough to guide the user on what to draw next without being too bossy.
Solo Development: As a solo participant in my first hackathon, I faced a steep but rewarding learning curve in integrating the API and managing the project flow.

Accomplishments that we're proud of

The "Magic" Loop: Successfully creating a working loop where the AI responds to a physical piece of paper. Seeing Gemini recognize a simple stick figure and turn it into an epic story moment felt magical.
Using Gemini 3: Successfully integrating the latest Gemini 3 Pro Preview model to drive the logic.
Creative Spark: Building an app that encourages users to pick up a pencil and draw again.

What we learned

Multimodal Capabilities: We learned that Gemini 3 sees more than just objects; it understands narrative potential in drawings.
The Power of Scaffolding: We validated the idea that AI can act as "scaffolding" for human creativity—helping users create things they couldn't create alone.

What's next for GemComic

Audio Narration: Integrating Text-to-Speech so the comic can be "read aloud" for accessibility.
PDF Export: Allowing users to download their finished collaborative comic as a digital book.
Genre Selection: Adding a feature where users can choose the "vibe" of the story (e.g., Horror, Sci-Fi, Fairy Tale) before starting the game.

Built With

computer-vision
css3
generative-ai
google-ai-studio
google-gemini-api
html5
react
typescript

Submitted to

Gemini 3 Hackathon

Created by

I worked as a solo developer for GemComic, handling the entire process from concept to code.

My primary focus was bridging the gap between the React/TypeScript frontend and the Gemini 3 API. I spent significant time in Google AI Studio refining the "Director Persona" prompts to ensure the AI could accurately analyze sketches and generate creative challenges.

This was my first time building a fully multimodal web application, and overcoming the challenge of maintaining narrative context across a visual story loop was a huge learning milestone for me.

Muhammad Rizky Fadillah

Updates

Muhammad Rizky Fadillah started this project — Feb 09, 2026 12:42 PM EST
