Turn any text into an illustrated, narrated story (chapters + images + audio) in minutes. Powered end-to-end by Gemini 3.
Who is this for?
- Educators turning lessons into multimedia material
- Content creators converting articles into engaging formats
- Companies transforming documentation into training
- Media teams producing narrated visual stories at scale
The magic moment
Paste text and get chapters with titles, one image per chapter, and ready-to-play narration — returned as structured markup your app can render immediately.
Why it wins
- Fast: 5-step automated pipeline from text to story
- Visual: Gemini-generated images per segment
- Auditory: Full narration via Gemini TTS
- API-first & modular: Built as an integration platform — embed story generation into existing products and workflows.
The Great Stories is an API-first multi-agent service that turns plain text into rich, segmented content: each logical segment gets its own image and spoken narration. It's built for the Gemini 3 API and uses it across the whole pipeline.
Try it now: it's live!
The Great Stories is a working platform you can use today — the links below are real generated outputs:
- A podcast about our own architecture!
- An article about homeopathy with fact-check notes
- A podcast-style narrated fictional story
Why Gemini 3?
Gemini 3 is especially strong at generating structured outputs (schemas, tables, explanations) and at working with multimodal inputs. That combination makes reliable segmentation and asset generation possible, including stories based on pictures or infographics. Without it, this project could not have been built.
Text understanding and structuring
We use Gemini to segment long text into a requested number of parts (e.g. 3 or 5), with no gaps or overlaps and a short title per segment. A single structured Gemini call returns JSON with character ranges and titles. Content type (educational, financial, fictional) is passed in so segmentation and tone stay appropriate. Segments are cached at the boundary level for faster reprocessing of similar text.
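Because the model returns character ranges, it is worth checking the "no gaps or overlaps" property before generating any assets. The sketch below assumes a hypothetical response shape of {"segments": [{"start", "end", "title"}]} with end-exclusive offsets; the real schema used by The Great Stories is not shown in this write-up.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset
    title: str


def parse_segments(raw_json: str, text: str, expected: int) -> list[Segment]:
    """Parse a (hypothetical) structured segmentation response and verify
    that the ranges tile the whole text with no gaps or overlaps."""
    data = json.loads(raw_json)
    segments = sorted(
        (Segment(int(s["start"]), int(s["end"]), str(s["title"])) for s in data["segments"]),
        key=lambda s: s.start,
    )
    if len(segments) != expected:
        raise ValueError(f"expected {expected} segments, got {len(segments)}")
    cursor = 0
    for seg in segments:
        # Each segment must begin exactly where the previous one ended.
        if seg.start != cursor:
            raise ValueError(f"gap or overlap at offset {cursor}")
        if seg.end <= seg.start:
            raise ValueError(f"empty segment {seg.title!r}")
        cursor = seg.end
    if cursor != len(text):
        raise ValueError("segments do not reach the end of the text")
    return segments
```

If validation fails, a service like this could retry the Gemini call rather than render a story with missing or duplicated text.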
Narration and image prompts
Per segment, Gemini generates a narration script (conversational or podcast-style, with disclaimers for financial content) and a detailed image prompt. Style is tuned by content type: educational = clear and diagram-like, financial = restrained, fictional = more creative and cinematic.
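One way to express this per-content-type tuning is a small style table that feeds the instructions sent to Gemini. This is a minimal sketch: the style phrases, the disclaimer wording, and the helper names are illustrative assumptions, not the project's actual prompts.

```python
from enum import Enum


class ContentType(str, Enum):
    EDUCATIONAL = "educational"
    FINANCIAL = "financial"
    FICTIONAL = "fictional"


# Hypothetical visual styles mirroring the tuning described above.
IMAGE_STYLE = {
    ContentType.EDUCATIONAL: "clear, diagram-like illustration with labeled parts",
    ContentType.FINANCIAL: "restrained, neutral business illustration",
    ContentType.FICTIONAL: "creative, cinematic scene with dramatic lighting",
}

# Hypothetical disclaimer text; only financial narration gets one.
FINANCIAL_DISCLAIMER = "This narration is for information only and is not financial advice."


def build_narration_instruction(segment_text: str, content_type: ContentType, podcast: bool) -> str:
    """Assemble the instruction asking Gemini for a narration script."""
    tone = "podcast-style, as a host talking to listeners" if podcast else "conversational"
    parts = [f"Write a {tone} narration script for the following passage.", segment_text]
    if content_type is ContentType.FINANCIAL:
        parts.append(f"End the script with this disclaimer: {FINANCIAL_DISCLAIMER}")
    return "\n\n".join(parts)


def build_image_instruction(segment_title: str, segment_text: str, content_type: ContentType) -> str:
    """Assemble the instruction asking Gemini for a detailed image prompt."""
    return (
        f"Write a detailed image-generation prompt for a single picture titled "
        f"'{segment_title}'. Visual style: {IMAGE_STYLE[content_type]}.\n\n{segment_text}"
    )
```

Keeping the styles in one table means adding a new content type is a data change rather than a prompt rewrite.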
Multimodal generation
We use Gemini 3 for image generation (gemini-3-pro-image-preview), producing one image per segment from those prompts.
Audio is produced with Gemini's native text-to-speech from the narration scripts. Assets are stored in S3; the API returns markup that embeds segment boundaries and asset references so clients can render text, images, and audio together.
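The exact markup format is not documented here, so the following is only a sketch of the idea: each segment is wrapped together with its title, text, and S3 asset URLs so a client can render all three in one pass. The element names and the RenderedSegment type are hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class RenderedSegment:
    title: str
    text: str
    image_url: str  # e.g. a URL to the image stored in S3
    audio_url: str  # e.g. a URL to the narration stored in S3


def to_markup(segments: list[RenderedSegment]) -> str:
    """Wrap each segment with its boundaries and asset references.
    Element names are illustrative, not the service's real format."""
    blocks = []
    for i, seg in enumerate(segments):
        blocks.append(
            f'<segment index="{i}">'
            f"<title>{escape(seg.title)}</title>"
            f'<image src="{escape(seg.image_url, quote=True)}"/>'
            f'<audio src="{escape(seg.audio_url, quote=True)}"/>'
            f"<text>{escape(seg.text)}</text>"
            "</segment>"
        )
    return "<story>" + "".join(blocks) + "</story>"
```

Escaping titles and text matters because the source text is user-supplied and may contain characters such as & or <.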
Architecture
TL;DR: Built like a production service — async pipeline for heavy jobs + synchronous agents for real-time integrations, with assets stored in S3.
The system has two main processing flows:
Asynchronous pipeline (Kafka + workers): For full job processing with webhooks, polling, and API-key quotas. The worker processes jobs, segments text, generates images and audio, and stores assets in S3.
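To illustrate the job lifecycle (quota check on submit, queued job, worker processing, status polling), here is an in-memory sketch. It substitutes Python's queue.Queue for Kafka and records placeholder step names instead of calling Gemini or S3; the class, method, and step names are assumptions for illustration only, and webhook delivery is only noted in a comment.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    api_key: str
    text: str
    status: str = "queued"
    steps: list[str] = field(default_factory=list)


class JobService:
    """In-memory stand-in for a Kafka-backed flow: submit() enforces a
    per-API-key quota and enqueues; run_worker_once() plays the worker."""

    def __init__(self, quota_per_key: int):
        self.quota_per_key = quota_per_key
        self.used: dict[str, int] = {}
        self.jobs: dict[str, Job] = {}
        self.topic: "queue.Queue[str]" = queue.Queue()  # stands in for a Kafka topic

    def submit(self, job_id: str, api_key: str, text: str) -> Job:
        if self.used.get(api_key, 0) >= self.quota_per_key:
            raise PermissionError("quota exceeded")
        self.used[api_key] = self.used.get(api_key, 0) + 1
        job = Job(job_id, api_key, text)
        self.jobs[job_id] = job
        self.topic.put(job_id)
        return job

    def poll(self, job_id: str) -> str:
        """What a client polling endpoint would return."""
        return self.jobs[job_id].status

    def run_worker_once(self) -> None:
        """Process one queued job; raises queue.Empty if nothing is queued."""
        job = self.jobs[self.topic.get_nowait()]
        job.status = "processing"
        for step in ("segment", "generate_images", "generate_audio", "upload_s3"):
            job.steps.append(step)  # placeholder: a real step would call Gemini or S3
        job.status = "done"  # a real worker would also fire the job's webhook here
```

Decoupling submission from processing is what lets heavy jobs run without holding an HTTP request open; clients either poll or wait for a webhook.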
Synchronous agents (gRPC + MCP): A separate agents service exposes segmentation, audio, and image generation over gRPC and MCP (Model Context Protocol) with API key auth. External systems can call these directly:
- gRPC provides all agents (segmentation, audio narration + TTS, image prompt + generation) with protobuf contracts.
- MCP (JSON-RPC 2.0 over HTTP) exposes segment_text, generate_image_prompt, and generate_image as tools with schema discovery.
- Large assets (audio, images) are automatically uploaded to S3 with user-scoped paths (agents/<user_uid>/audio/...) and returned as URLs to avoid message size limits.
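For an external system calling the MCP interface, the request bodies follow JSON-RPC 2.0 and the MCP tools/list and tools/call methods. The sketch below only builds the JSON payloads; the argument names for segment_text are hypothetical (the real ones come from the inputSchema returned by tools/list), and the HTTP transport, the MCP initialize handshake, and the API-key header are not shown because their details are not given here.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request body."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# Schema discovery: ask the server which tools exist and what inputs they take.
list_tools = jsonrpc_request("tools/list")

# Tool invocation; "text" and "num_segments" are hypothetical argument names.
call_segment = jsonrpc_request(
    "tools/call",
    {"name": "segment_text", "arguments": {"text": "Once upon a time...", "num_segments": 3}},
)
```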

The API includes a WebSocket endpoint (/agents/ws) for the frontend so that long-running agent calls don't time out.
Gemini 3 is central: from understanding and segmenting text to generating images and driving the full "text → illustrated, narrated story" pipeline, whether via the async worker or direct synchronous agent calls.
