
🎨 StoryFrame AI - Project Story

💡 The Spark of Inspiration

The idea for StoryFrame AI was born during a quiet evening when I was reading a literary novel. As I immersed myself in the narrative, I found myself constantly pausing to visualize the scenes in my mind. "What if I could actually see these moments?" I wondered. The descriptions were vivid, but I craved something more—a visual representation that could bring the story to life.

I thought: "How can I make this visualization happen for anyone, anywhere, while reading anything online?"

For months, this idea sat dormant. I lacked the technical knowledge and tools to make it a reality. But everything changed when Google announced Gemini Nano and Gemini 2.5 Flash Image. The combination of on-device AI for privacy-focused text analysis and cost-effective cloud image generation was the breakthrough I needed.

I realized: This is possible now. And not just for novels—for articles, PDFs, educational content, and any text a user encounters online. That's when StoryFrame AI was born.

🛠️ How I Built It

Tech Stack & Architecture

StoryFrame AI is a Chrome Extension built with a hybrid AI architecture that leverages both client-side and cloud-based models:

Frontend

HTML5, CSS3, JavaScript (ES6+) - Pure vanilla JS, no frameworks
Chrome Extension APIs - Manifest V3, Side Panel API, Context Menus API, Storage API
Modern UI/UX - Gradient-based design with smooth animations and progressive loading

AI & APIs

Gemini Nano (Prompt API) - On-device text analysis and scene breakdown
Summarizer API - On-device text summarization for long content (70+ words)
Gemini 2.5 Flash - Cloud-based prompt generation fallback
Gemini 2.5 Flash Image - Primary cloud image generation
OpenRouter - Alternative API gateway for broader model access

Key Features

Multi-tab support with per-tab state management
Complete history system with local storage
Smart text processing with automatic summarization
Fallback mechanisms for reliability
Context menu integration that works even on PDFs

Architecture Decisions

The hybrid approach was crucial:

User selects text
    ↓
[On-Device] Gemini Nano analyzes narrative structure (privacy-first)
    ↓
[On-Device] Summarizer API condenses long text (if needed)
    ↓
[Cloud] Gemini 2.5 Flash Image generates comic-style visualization
    ↓
User views multi-panel story in side panel

Why this architecture?

Privacy: Sensitive text analysis happens locally
Quality: Cloud models produce high-quality images
Cost: Efficient resource usage with on-device processing
Reliability: Multiple fallback paths ensure it always works

🎓 What I Learned

1. Chrome Built-in AI is a Game-Changer

Before this project, I had no idea Gemini Nano existed. I had been using Ollama and other local models for sensitive file processing, running heavy LLMs on my machine. Discovering that Chrome now has built-in AI capabilities was mind-blowing:

Text analysis runs inside the browser, with no separate local model server to install (Chrome downloads and manages the model itself)
No GPU required on user's machine
Privacy-preserving by design
Seamless integration with web content

This opened my eyes to the future of client-side AI.

2. Chrome Extension Development is Complex (But Rewarding)

This was my first-ever Chrome extension, and the learning curve was steep:

Challenge #1: Service Workers vs. Background Scripts

Manifest V3 requires service workers (no persistent background pages)
Had to completely rethink state management
Learned about importScripts() vs. ES6 modules
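A Manifest V3 service worker can be stopped whenever it is idle, so values kept only in global variables may be gone by the next event. Below is a minimal sketch of one common remedy. It assumes chrome.storage.session as the backing store, because the story does not say which store StoryFrame AI uses. The names loadTabState, saveTabState, and the tabState: key prefix are hypothetical:

```javascript
// Minimal sketch: keep service-worker state in chrome.storage.session so it
// survives the worker being terminated and restarted.
// Assumption: loadTabState, saveTabState, and the 'tabState:' prefix are hypothetical names.
const STATE_PREFIX = 'tabState:';

// Read one tab's state, falling back to a fresh default object.
async function loadTabState(tabId) {
  const key = STATE_PREFIX + tabId;
  const stored = await chrome.storage.session.get(key);
  return stored[key] ?? { status: 'idle', prompt: null, imageUrl: null };
}

// Merge a partial update into a tab's state and write it back.
async function saveTabState(tabId, patch) {
  const key = STATE_PREFIX + tabId;
  const next = { ...(await loadTabState(tabId)), ...patch };
  await chrome.storage.session.set({ [key]: next });
  return next;
}
```

chrome.storage.session holds data in memory for the current browser session only. It is cleared when the extension is reloaded or the browser restarts, which suits short-lived generation state.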

Challenge #2: PDF Interaction

Chrome loads PDFs in a highly secure sandboxed environment
Content scripts cannot inject into PDF viewers
Discovered this limitation after hours of debugging
Solution: A context menu item triggers the extension, and chrome.storage.local passes the selected text on to the side panel
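Here is a minimal sketch of this workaround in the background service worker. It assumes the contextMenus, sidePanel, and storage permissions are declared in the manifest. The menu id, the menu title, and the pendingSelection:<tabId> storage key are hypothetical:

```javascript
// background.js: minimal sketch of the PDF-friendly trigger path.
// Assumption: MENU_ID and the 'pendingSelection:<tabId>' key are hypothetical names.
const MENU_ID = 'storyframe-visualize';

// Chrome fills info.selectionText for selections in ordinary pages and in its
// built-in PDF viewer, even though content scripts cannot run in the viewer.
function handleMenuClick(info, tab) {
  if (info.menuItemId !== MENU_ID || !info.selectionText || !tab) return;

  // sidePanel.open() must be called in response to a user action, so it is
  // invoked before anything else is awaited.
  chrome.sidePanel.open({ tabId: tab.id }).catch(console.error);

  // Pass the selection to the side panel through storage, keyed by tab.
  chrome.storage.local
    .set({
      [`pendingSelection:${tab.id}`]: {
        text: info.selectionText,
        url: tab.url,
        createdAt: Date.now(),
      },
    })
    .catch(console.error);
}

// Register listeners only when running inside the extension.
if (typeof chrome !== 'undefined' && chrome.contextMenus) {
  chrome.runtime.onInstalled.addListener(() => {
    chrome.contextMenus.create({
      id: MENU_ID,
      title: 'Visualize with StoryFrame AI',
      contexts: ['selection'],
    });
  });
  chrome.contextMenus.onClicked.addListener(handleMenuClick);
}
```

The side panel can then read the entry with chrome.storage.local.get, or react to new selections through chrome.storage.onChanged.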

Challenge #3: Side Panel API

Relatively new API with limited documentation
Had to experiment extensively to get it working across different scenarios
Learned about window/tab-specific panel opening restrictions
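The sketch below shows two Side Panel API calls that bear on this. setPanelBehavior lets the toolbar icon open the panel. setOptions with a tabId enables the panel for one tab only, and tabs without their own options use the global settings. The sidepanel.html path and the helper names are assumptions:

```javascript
// Minimal sketch: tab-specific side panel setup.
// Assumption: 'sidepanel.html' and both helper names are hypothetical.
const PANEL_PATH = 'sidepanel.html';

// Let a click on the toolbar icon open the side panel.
function enableActionClickOpening() {
  return chrome.sidePanel.setPanelBehavior({ openPanelOnActionClick: true });
}

// Enable the panel for a single tab; tabs without their own options
// keep using the global panel settings.
function enablePanelForTab(tabId) {
  return chrome.sidePanel.setOptions({ tabId, path: PANEL_PATH, enabled: true });
}

if (typeof chrome !== 'undefined' && chrome.sidePanel) {
  enableActionClickOpening().catch(console.error);
}
```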

Challenge #4: Cross-Origin Restrictions

Can't directly manipulate certain web contexts
Had to design message-passing architecture between content scripts, background service worker, and side panel
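Here is a minimal sketch of such a message router in the service worker. The message types (SELECTION_CAPTURED, GET_STATUS) and the in-memory status map are hypothetical. Content scripts arrive with sender.tab set. The side panel is not a tab, so it is assumed to send its tab id inside the message:

```javascript
// Minimal sketch of a service-worker message router.
// Assumption: message types and the statusByTab map are hypothetical.
const statusByTab = new Map();

function handleMessage(message, sender, sendResponse) {
  // Content scripts arrive with sender.tab; the side panel passes tabId itself.
  const tabId = sender.tab ? sender.tab.id : message.tabId;

  switch (message.type) {
    case 'SELECTION_CAPTURED':
      statusByTab.set(tabId, { status: 'received', text: message.text });
      sendResponse({ ok: true });
      return false; // response already sent synchronously
    case 'GET_STATUS':
      sendResponse(statusByTab.get(tabId) ?? { status: 'idle' });
      return false;
    default:
      sendResponse({ ok: false, error: `Unknown message type: ${message.type}` });
      return false;
  }
}

if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(handleMessage);
}
```

If a handler needs to await something before answering, the listener should return true. Chrome then keeps the channel open for the later sendResponse call.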

3. AI Model Limitations & Workarounds

Gemini Nano Context Window Issue:

The biggest technical challenge was Gemini Nano's limited context window and processing speed:

Maximum ~3,000 characters (after summarization)
Takes 60-180 seconds for complex prompts
Performance varies significantly based on device hardware
Even on my high-end CPU, it struggled with longer texts

My Solution: Multi-Layered Fallback System

// Intelligent fallback hierarchy
1. Try Gemini Nano (80-second timeout)
   ↓ [if timeout/error]
2. Use Summarizer API to condense text
   ↓ [if still too long]
3. Fall back to Gemini 2.5 Flash (cloud)
   ↓ [if Google AI fails]
4. Use OpenRouter as final fallback

This ensures reliability without sacrificing user experience.
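The control flow of such a chain can be sketched as follows. The provider objects are hypothetical stand-ins for the Gemini Nano, Gemini 2.5 Flash, and OpenRouter calls. The summarization step is left out, and the timeouts are placeholders:

```javascript
// Minimal sketch of an ordered fallback chain with per-step timeouts.
// Assumption: providers are hypothetical { name, run, timeoutMs } objects.

// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try each provider in order; return the first success, or throw with all errors.
async function runWithFallbacks(providers, input) {
  const errors = [];
  for (const { name, run, timeoutMs } of providers) {
    try {
      const output = await withTimeout(run(input), timeoutMs, name);
      return { provider: name, output };
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed:\n${errors.join('\n')}`);
}
```

Promise.race only stops waiting; it does not cancel the slow call. For Gemini Nano, passing an AbortSignal to the Prompt API is what actually stops the work.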

4. The Genre-Agnostic Challenge

Initially, my prompts forced a "funny, comedic" style on ALL content. This was terrible for serious literature, dramatic scenes, or horror stories.

Learning: AI prompt engineering requires context-aware instructions. I refactored the system prompts to:

First analyze the text's genre and tone
Adapt visual style, colors, and expressions accordingly
Match the original narrative's emotional intent
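As an illustration only, an analyze-then-adapt instruction could be assembled like this. The wording, the JSON reply shape, and the name buildScenePrompt are assumptions, not the prompts StoryFrame AI ships:

```javascript
// Hypothetical prompt builder: classify genre and tone first, then describe
// panels in a matching visual style. All wording is illustrative.
function buildScenePrompt(storyText, panelCount = 4) {
  return [
    'You are a storyboard artist.',
    'Step 1: Identify the genre and emotional tone of the passage (for example',
    'comedy, drama, horror, romance). Do not assume it is humorous.',
    'Step 2: Choose a visual style, color palette, and facial expressions that',
    'match that tone.',
    `Step 3: Break the passage into ${panelCount} sequential scenes.`,
    'Reply as JSON: {"genre": string, "tone": string, "style": string,',
    '"panels": [{"scene": string, "characters": string, "mood": string}]}',
    '',
    'Passage:',
    storyText.trim(),
  ].join('\n');
}
```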

🚧 Challenges I Faced

Challenge #1: Image Generation Cost 💸

The biggest barrier was cost. While Gemini Nano is free (on-device), Gemini 2.5 Flash Image is a paid model: it only works with billing enabled on your Google Cloud project.

Current State:

Google AI Studio requires billing for image generation
No free tier as generous as the one for text models
This limits accessibility for users without API credits

My Workaround:

Added OpenRouter support as an alternative
Used my existing OpenRouter credits during development
Created dual-API architecture so users can choose
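As a sketch, the user's choice could feed the fallback chain through an ordered list built from whichever API keys are configured, with the preferred provider first. The settings shape (preferredProvider, geminiApiKey, openRouterApiKey) is hypothetical:

```javascript
// Hypothetical settings shape:
// { preferredProvider: 'gemini' | 'openrouter', geminiApiKey?: string, openRouterApiKey?: string }
function orderImageProviders(settings) {
  // Only providers with a configured key are usable.
  const available = [];
  if (settings.geminiApiKey) available.push('gemini');
  if (settings.openRouterApiKey) available.push('openrouter');

  // Put the user's preferred provider first; keep the rest as fallbacks.
  const preferred = settings.preferredProvider;
  return available.sort((a, b) => (b === preferred) - (a === preferred));
}
```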

Future Hope: I believe (and hope) that as Gemini models mature, Google will make image generation more accessible, potentially offering a free tier similar to text models. Once that happens, StoryFrame AI will become truly accessible to everyone.

Challenge #2: Gemini Nano Performance ⏱️

Even with high-end hardware, Gemini Nano is:

Slow: 60-180 seconds for prompt generation
Context-limited: Can't handle long texts directly
Unpredictable: Timeout issues on complex prompts

My Solution:

Implemented Summarizer API to pre-process long texts
Increased timeout to 180 seconds (3 minutes)
Optimized session parameters (temperature: 0.7, topK: 30)
Added comprehensive cloud fallback to Gemini 2.5 Flash
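Here is a minimal sketch of a Gemini Nano call through the Prompt API with these session settings and a three-minute limit. LanguageModel.availability, LanguageModel.create, session.prompt, and session.destroy belong to Chrome's Prompt API. The languageModel parameter exists only so the function can be stubbed, and the system prompt is left to the caller:

```javascript
// Minimal sketch of a Prompt API call with temperature 0.7, topK 30, and a
// 180-second limit. Assumption: promptNano is a hypothetical helper name.
async function promptNano(text, systemPrompt, languageModel = globalThis.LanguageModel) {
  if (!languageModel || (await languageModel.availability()) === 'unavailable') {
    throw new Error('Gemini Nano is not available on this device');
  }
  // One signal aborts both session creation and prompting after 3 minutes.
  const signal = AbortSignal.timeout(180_000);
  const session = await languageModel.create({
    temperature: 0.7,
    topK: 30,
    initialPrompts: [{ role: 'system', content: systemPrompt }],
    signal,
  });
  try {
    return await session.prompt(text, { signal });
  } finally {
    session.destroy(); // release on-device resources
  }
}
```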

Trade-off: While I wanted to keep everything on-device for privacy, reliability had to take priority. The hybrid approach ensures users always get results, even if it means using cloud APIs.

Challenge #3: Multi-Tab State Management

Initially, the extension used global state variables, causing chaos when users opened multiple tabs:

State would overwrite between tabs
Prompts from different stories would mix
Results would appear in the wrong tab

Solution: Implemented per-tab state management using a Map<tabId, state> structure, ensuring complete independence between tabs.
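A minimal sketch of that structure follows. The field names in each state object are hypothetical:

```javascript
// Minimal sketch of per-tab state: one entry per tab id, created on demand
// and dropped when the tab closes. Assumption: state fields are hypothetical.
const tabStates = new Map();

function getTabState(tabId) {
  if (!tabStates.has(tabId)) {
    tabStates.set(tabId, { status: 'idle', prompts: [], images: [] });
  }
  return tabStates.get(tabId);
}

function resetTabState(tabId) {
  tabStates.delete(tabId);
}

// Forget a tab's state when it is closed so the map does not grow forever.
if (typeof chrome !== 'undefined' && chrome.tabs) {
  chrome.tabs.onRemoved.addListener((tabId) => resetTabState(tabId));
}
```

A Manifest V3 service worker can be terminated when idle, so an in-memory Map like this loses its contents on restart. State that must survive would also need to be written to chrome.storage.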

🔬 Technical Deep Dive

Why Summarizer API?

For texts over 70 words, I use the Summarizer API to:

Preserve key narrative elements (characters, emotions, actions)
Reduce token count for faster Nano processing
Maintain story coherence for visualization

const summarizer = await self.Summarizer.create({
  type: 'key-points',
  format: 'plain-text',
  length: 'medium',
  sharedContext: 'Preserve complete narrative story...'
});
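A fuller sketch of the same step adds the 70-word threshold, an availability check, and cleanup. The sharedContext text here is a placeholder, not the project's actual string. The summarizerApi parameter exists only so the function can be stubbed, and prepareForNano is a hypothetical name:

```javascript
// Minimal sketch: summarize only selections longer than 70 words with
// Chrome's built-in Summarizer API. Assumption: helper names are hypothetical.
const SUMMARY_WORD_THRESHOLD = 70;

function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

async function prepareForNano(text, summarizerApi = globalThis.Summarizer) {
  if (countWords(text) <= SUMMARY_WORD_THRESHOLD) return text;
  if (!summarizerApi || (await summarizerApi.availability()) === 'unavailable') {
    return text; // hand the full text to the fallback chain unchanged
  }
  const summarizer = await summarizerApi.create({
    type: 'key-points',
    format: 'plain-text',
    length: 'medium',
    sharedContext: 'Keep characters, emotions, and actions in story order.', // placeholder
  });
  try {
    return await summarizer.summarize(text);
  } finally {
    summarizer.destroy(); // release on-device resources
  }
}
```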

Why Multi-Panel Comics?

Instead of generating separate images, I create one image with multiple panels because:

Maintains story continuity visually
Reduces API calls and cost
Creates a cohesive narrative flow
Mimics actual comic book reading experience
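The hypothetical helper below shows how per-scene descriptions could be folded into one prompt for a single multi-panel image. The grid rule (at most two columns) and the prompt wording are assumptions:

```javascript
// Hypothetical helper: combine scene descriptions into one prompt that asks
// for a single multi-panel comic page instead of one image per scene.
function buildComicPagePrompt(panels, style) {
  if (panels.length === 0) throw new Error('At least one panel is required');
  const columns = Math.min(panels.length, 2);
  const rows = Math.ceil(panels.length / columns);
  const noun = panels.length === 1 ? 'panel' : 'panels';
  const lines = panels.map(
    (p, i) => `Panel ${i + 1}: ${p.scene} Characters: ${p.characters}. Mood: ${p.mood}.`,
  );
  return [
    `A single comic page with ${panels.length} ${noun} in a ${rows}x${columns} grid,`,
    `read left to right, top to bottom. Art style: ${style}.`,
    "Keep each character's appearance consistent across panels.",
    ...lines,
  ].join('\n');
}
```

One image request per story, rather than one per panel, is where the reduction in API calls comes from.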

🎯 What Makes StoryFrame AI Special

Privacy-First: Text analysis happens on your device
Context-Aware: Adapts to any genre (comedy, drama, horror, romance)
Reliable: Multiple fallback mechanisms ensure it always works
Accessible: Works on PDFs, articles, novels, any text online
Complete History: Track and download all your visualizations
Multi-Tab Support: Use independently across browser tabs

🚀 Future Vision

Once Gemini 2.5 Flash Image becomes more affordable or free, StoryFrame AI will:

Remove cost barriers for users
Enable unlimited creative visualization
Become a standard tool for readers, educators, and creators
Potentially support real-time visualization as you scroll

🙏 Acknowledgments

This project wouldn't exist without:

Google's Chrome Built-in AI Challenge for inspiring innovation
Gemini Nano for bringing AI to the browser
The Chrome team for powerful extension APIs
OpenRouter for alternative API access during development

Built with ❤️ for the Google Chrome AI Challenge 2025

"Making stories visible, one panel at a time."

Built With

chrome
css3
gemini2.5flash
gemini2.5flashimage
html5
javascript
manifest
openrouter
promptapi
storageapi
summarizerapi

Updates

Abhishek Kumar Yadav started this project — Oct 29, 2025 09:45 AM EDT
