Turn any text into an illustrated, narrated story (chapters + images + audio) in minutes. Powered end-to-end by Gemini 3.

Who is this for?

Educators turning lessons into multimedia material
Content creators converting articles into engaging formats
Companies transforming documentation into training
Media teams producing narrated visual stories at scale

The magic moment

Paste text and get chapters with titles, one image per chapter, and ready-to-play narration — returned as structured markup your app can render immediately.

Why it wins

Fast: 5-step automated pipeline from text to story
Visual: Gemini-generated images per segment
Auditory: Full narration via Gemini TTS
API-first & modular: Built as an integration platform — embed story generation into existing products and workflows.

The Great Stories is an API-first multi-agent service that turns plain text into rich, segmented content: each logical segment gets its own image and spoken narration. It's built for the Gemini 3 API and uses it across the whole pipeline.

Try it now, it's live!

The Great Stories is a working platform you can use today — the links below are real generated outputs:

Why Gemini 3?

Gemini 3 is especially strong at generating structured outputs (schemas, tables, explanations) and working with multimodal inputs. That combination makes reliable segmentation and asset generation possible, including for stories based on pictures or infographics. Without it, this project would not have been possible.

Text understanding and structuring

We use Gemini to segment long text into a requested number of parts (e.g. 3 or 5), with no gaps or overlaps and a short title per segment. A single structured Gemini call returns JSON with character ranges and titles. Content type (educational, financial, fictional) is passed in so segmentation and tone stay appropriate. Segments are cached at the boundary level for faster reprocessing of similar text.
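The "no gaps or overlaps" guarantee is easy to check on the client side. The sketch below is only an illustration: the JSON shape (a top-level `segments` key with `title`, `start`, `end` fields) and the half-open character ranges are assumptions, not the service's documented response format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    title: str
    start: int  # inclusive character offset (assumed convention)
    end: int    # exclusive character offset (assumed convention)


def parse_segments(raw_json: str, text: str, expected_count: int) -> list[Segment]:
    """Parse a segmentation response and verify that it tiles `text` exactly."""
    items = json.loads(raw_json)["segments"]  # hypothetical top-level key
    segments = [Segment(i["title"], i["start"], i["end"]) for i in items]

    if len(segments) != expected_count:
        raise ValueError(f"expected {expected_count} segments, got {len(segments)}")

    cursor = 0  # position where the next segment must begin
    for seg in segments:
        if seg.start > cursor:
            raise ValueError(f"gap before {seg.title!r}: {cursor}..{seg.start}")
        if seg.start < cursor:
            raise ValueError(f"overlap at {seg.title!r}: starts at {seg.start}")
        if seg.end <= seg.start:
            raise ValueError(f"empty segment {seg.title!r}")
        cursor = seg.end
    if cursor != len(text):
        raise ValueError(f"segments end at {cursor}, text has {len(text)} characters")
    return segments


# Example with a hand-written response standing in for the model output.
text = "Once upon a time. The end."
raw = json.dumps({"segments": [
    {"title": "Opening", "start": 0, "end": 17},
    {"title": "Closing", "start": 17, "end": len(text)},
]})
segments = parse_segments(raw, text, expected_count=2)
parts = [text[s.start:s.end] for s in segments]
```

Joining `parts` reproduces the original text, which is the property the segmentation step promises.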

Narration and image prompts

Per segment, Gemini generates a narration script (conversational or podcast-style, with disclaimers for financial content) and a detailed image prompt. Style is tuned by content type: educational = clear and diagram-like, financial = restrained, fictional = more creative and cinematic.
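One way to picture the per-content-type tuning is a small lookup table that feeds style hints into the prompt. This is a hypothetical sketch: the names `ContentType`, `IMAGE_STYLE`, `build_image_prompt`, and `finalize_narration`, as well as the disclaimer wording, are invented for illustration and do not come from the project's code.

```python
from enum import Enum


class ContentType(Enum):
    EDUCATIONAL = "educational"
    FINANCIAL = "financial"
    FICTIONAL = "fictional"


# Style hints paraphrased from the description above; exact wording is illustrative.
IMAGE_STYLE = {
    ContentType.EDUCATIONAL: "clear, diagram-like illustration",
    ContentType.FINANCIAL: "restrained, neutral illustration",
    ContentType.FICTIONAL: "creative, cinematic illustration",
}

FINANCIAL_DISCLAIMER = "This is not financial advice."  # placeholder wording


def build_image_prompt(content_type: ContentType, scene: str) -> str:
    """Prefix a scene description with the style hint for its content type."""
    return f"{IMAGE_STYLE[content_type]}: {scene}"


def finalize_narration(content_type: ContentType, script: str) -> str:
    """Append a disclaimer to financial narration; leave other types unchanged."""
    if content_type is ContentType.FINANCIAL:
        return f"{script.rstrip()} {FINANCIAL_DISCLAIMER}"
    return script
```

Keeping the mapping in one place means a new content type only needs a new table entry rather than changes scattered across agents.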

Multimodal generation

We use Gemini 3 for image generation (gemini-3-pro-image-preview), producing one image per segment from those prompts.

Audio is produced with Gemini's native text-to-speech from the narration scripts. Assets are stored in S3; the API returns markup that embeds segment boundaries and asset references so clients can render text, images, and audio together.

Architecture

TL;DR: Built like a production service — async pipeline for heavy jobs + synchronous agents for real-time integrations, with assets stored in S3.

The system has two main processing flows:

Asynchronous pipeline (Kafka + workers): For full job processing with webhooks, polling, and API-key quotas. The worker processes jobs, segments text, generates images and audio, and stores assets in S3.
Synchronous agents (gRPC + MCP): A separate agents service exposes segmentation, audio, and image generation over gRPC and MCP (Model Context Protocol) with API key auth. External systems can call these directly:
- gRPC provides all agents (segmentation, audio narration + TTS, image prompt + generation) with protobuf contracts.
- MCP (JSON-RPC 2.0 over HTTP) exposes segment_text, generate_image_prompt, and generate_image as tools with schema discovery.
- Large assets (audio, images) are automatically uploaded to S3 with user-scoped paths (agents/<user_uid>/audio/...) and returned as URLs to avoid message size limits.
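For the MCP side, a call to one of the tools listed above is a JSON-RPC 2.0 request using the standard MCP methods `tools/list` (schema discovery) and `tools/call`. The sketch below only builds the request bodies; the argument names (`text`, `segments_count`) are assumptions, and the endpoint URL and API key header are left out because they are not given here.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC requests need a unique id


def mcp_list_tools() -> str:
    """Build a JSON-RPC 2.0 request that asks the server for its tool schemas."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/list",
    })


def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request that invokes one MCP tool by name."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical arguments; check the schema returned by tools/list for the real ones.
body = mcp_tool_call("segment_text", {"text": "Some long text", "segments_count": 3})
```

The resulting string would be sent as the body of an HTTP POST to the agents service, together with the API key.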

The API includes a WebSocket endpoint (/agents/ws) for the frontend so long-running agent calls don't time out.

Gemini 3 is central: from understanding and segmenting text to generating images and driving the full "text → illustrated, narrated story" pipeline, whether via the async worker or direct synchronous agent calls.

Updates

Vasil Kulakov posted an update — Feb 09, 2026 09:03 AM EST

A new presentation video is live!

Vasil Kulakov posted an update — Feb 08, 2026 02:10 PM EST

The Gemini TTS API is unavailable right before the deadline! /sounds of panic/ Adding a clear problem indication and praying for a fast TTS recovery!

Vasil Kulakov posted an update — Feb 07, 2026 04:14 PM EST

Aaaaand we've just added a fact-checking agent. It adds a short comment to each text segment if there is something readers should know.

Phew, what a rush.

We will spend the last day before the deadline improving UX and security.

Vasil Kulakov posted an update — Feb 07, 2026 12:33 PM EST

Today we published gRPC and MCP endpoints for all the agents we've created for this project. You can try them from the web interface or call them directly on a public endpoint.

Vasil Kulakov posted an update — Feb 05, 2026 04:03 PM EST

Great news from Great Stories! We now support multimodal input! Thanks to Gemini 3's true multimodality, you can add PDF files or images and get a full story, with a great voice-over and illustrations, from them and the text you provide!

Vasil Kulakov started this project — Feb 04, 2026 11:42 AM EST

Leave feedback in the comments!
