🎨 StoryFrame AI - Project Story
💡 The Spark of Inspiration
The idea for StoryFrame AI was born during a quiet evening when I was reading a literary novel. As I immersed myself in the narrative, I found myself constantly pausing to visualize the scenes in my mind. "What if I could actually see these moments?" I wondered. The descriptions were vivid, but I craved something more: a visual representation that could bring the story to life.
I thought: "How can I make this visualization happen for anyone, anywhere, while reading anything online?"
For months, this idea sat dormant. I lacked the technical knowledge and tools to make it a reality. But everything changed when Google announced Gemini Nano and Gemini 2.5 Flash Image. The combination of on-device AI for privacy-focused text analysis and cost-effective cloud image generation was the breakthrough I needed.
I realized: This is possible now. And not just for novels, but for articles, PDFs, educational content, and any text a user encounters online. That's when StoryFrame AI was born.
🛠️ How I Built It
Tech Stack & Architecture
StoryFrame AI is a Chrome Extension built with a hybrid AI architecture that leverages both client-side and cloud-based models:
Frontend
- HTML5, CSS3, JavaScript (ES6+) - Pure vanilla JS, no frameworks
- Chrome Extension APIs - Manifest V3, Side Panel API, Context Menus API, Storage API
- Modern UI/UX - Gradient-based design with smooth animations and progressive loading
AI & APIs
- Gemini Nano (Prompt API) - On-device text analysis and scene breakdown
- Summarizer API - On-device text summarization for long content (70+ words)
- Gemini 2.5 Flash - Cloud-based prompt generation fallback
- Gemini 2.5 Flash Image - Primary cloud image generation
- OpenRouter - Alternative API gateway for broader model access
Key Features
- Multi-tab support with per-tab state management
- Complete history system with local storage
- Smart text processing with automatic summarization
- Fallback mechanisms for reliability
- Context menu integration that works even on PDFs
Architecture Decisions
The hybrid approach was crucial:
User selects text
↓
[On-Device] Gemini Nano analyzes narrative structure (privacy-first)
↓
[On-Device] Summarizer API condenses long text (if needed)
↓
[Cloud] Gemini 2.5 Flash Image generates comic-style visualization
↓
User views multi-panel story in side panel
Why this architecture?
- Privacy: Sensitive text analysis happens locally
- Quality: Cloud models produce high-quality images
- Cost: Efficient resource usage with on-device processing
- Reliability: Multiple fallback paths keep it working when one model fails or times out
What I Learned
1. Chrome Built-in AI is a Game-Changer
Before this project, I had no idea Gemini Nano existed. I had been using Ollama and other local models for sensitive file processing, running heavy LLMs on my machine. Discovering that Chrome now has built-in AI capabilities was mind-blowing:
- Text analysis runs locally, with no separate model runtime to install (Chrome manages the model download)
- No GPU required on user's machine
- Privacy-preserving by design
- Seamless integration with web content
This opened my eyes to the future of client-side AI.
2. Chrome Extension Development is Complex (But Rewarding)
This was my first-ever Chrome extension, and the learning curve was steep:
Challenge #1: Service Workers vs. Background Scripts
- Manifest V3 requires service workers (no persistent background pages)
- Had to completely rethink state management
- Learned about importScripts() vs. ES6 modules
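Because a Manifest V3 service worker can be shut down while idle, plain global variables are not a safe home for state. Here is a minimal sketch of one way to handle that, assuming state is written to a promise-based storage area such as chrome.storage.session. The helper names and the "storyframe:" key prefix are hypothetical and not taken from the extension's source.

```javascript
// Hypothetical helpers: keep state in a storage area instead of globals.
// `area` is anything with promise-based get/set, e.g. chrome.storage.session.
const STATE_PREFIX = "storyframe:"; // assumed key prefix

async function saveState(area, name, value) {
  await area.set({ [STATE_PREFIX + name]: value });
}

async function loadState(area, name, fallback) {
  const key = STATE_PREFIX + name;
  const stored = await area.get(key);
  return key in stored ? stored[key] : fallback;
}

// In-memory stand-in with the same shape, useful for trying this outside Chrome.
function createMemoryArea() {
  const data = {};
  return {
    async get(key) {
      return key in data ? { [key]: data[key] } : {};
    },
    async set(items) {
      Object.assign(data, items);
    },
  };
}
```

Inside the service worker, the same helpers would be called with chrome.storage.session (or chrome.storage.local) as the `area` argument.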
Challenge #2: PDF Interaction
- Chrome loads PDFs in a highly secure sandboxed environment
- Content scripts cannot inject into PDF viewers
- Discovered this limitation after hours of debugging
- Solution: Created a workaround using chrome.storage.local to pass data and context menus for triggering
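The workaround can be sketched roughly as follows: a context menu entry captures the selected text and writes it to chrome.storage.local, where the side panel can pick it up. This is an illustration under assumptions, not the extension's actual code. MENU_ID, PENDING_KEY, handleMenuClick, and the menu title are made-up names.

```javascript
// Hypothetical names: MENU_ID, PENDING_KEY, handleMenuClick.
const MENU_ID = "storyframe-visualize";
const PENDING_KEY = "pendingSelection";

// Stores the selected text so another extension page can read it later.
// `info.selectionText` is supplied by the context menu click event.
async function handleMenuClick(info, tab, storageArea) {
  if (info.menuItemId !== MENU_ID || !info.selectionText) return false;
  await storageArea.set({
    [PENDING_KEY]: {
      text: info.selectionText,
      tabId: tab ? tab.id : null,
      createdAt: Date.now(),
    },
  });
  return true;
}

// Wiring; only runs inside the extension's service worker.
if (typeof chrome !== "undefined" && chrome.contextMenus) {
  chrome.runtime.onInstalled.addListener(() => {
    chrome.contextMenus.create({
      id: MENU_ID,
      title: "Visualize with StoryFrame AI",
      contexts: ["selection"],
    });
  });
  chrome.contextMenus.onClicked.addListener((info, tab) => {
    handleMenuClick(info, tab, chrome.storage.local);
  });
}
```

The point of this design is that no content script is needed: the browser itself hands over the selection, which is why it still works where script injection is blocked.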
Challenge #3: Side Panel API
- Relatively new API with limited documentation
- Had to experiment extensively to get it working across different scenarios
- Learned about window/tab-specific panel opening restrictions
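As an illustration of the tab/window restriction, chrome.sidePanel.open() needs a tabId or a windowId and has to be called in direct response to a user action. The sketch below shows one way to choose those options. panelOpenOptions is a hypothetical helper, and using the toolbar button as the trigger is an assumption made for the example.

```javascript
// Hypothetical helper: choose the options passed to chrome.sidePanel.open().
// The API needs a tabId or a windowId; prefer the tab when it has a valid id.
function panelOpenOptions(tab) {
  if (tab && Number.isInteger(tab.id) && tab.id >= 0) {
    return { tabId: tab.id };
  }
  if (tab && Number.isInteger(tab.windowId)) {
    return { windowId: tab.windowId };
  }
  return null;
}

// Wiring; only runs inside the extension. open() is called straight from the
// click handler, with nothing awaited first, so the user gesture is still valid.
if (typeof chrome !== "undefined" && chrome.sidePanel && chrome.action) {
  chrome.action.onClicked.addListener((tab) => {
    const options = panelOpenOptions(tab);
    if (options) chrome.sidePanel.open(options).catch(console.error);
  });
}
```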
Challenge #4: Cross-Origin Restrictions
- Can't directly manipulate certain web contexts
- Had to design message-passing architecture between content scripts, background service worker, and side panel
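A message-passing layer like this can be sketched as a small router that maps a message type to an async handler. This is a minimal illustration, assuming messages shaped like { type, payload }. createMessageRouter and the "storyframe/ping" message type are hypothetical.

```javascript
// Hypothetical router: maps message.type to an (optionally async) handler.
function createMessageRouter(handlers) {
  return function onMessage(message, sender, sendResponse) {
    const known = message && Object.hasOwn(handlers, String(message.type));
    if (!known) return false; // not ours; let other listeners respond
    Promise.resolve()
      .then(() => handlers[message.type](message.payload, sender))
      .then((result) => sendResponse({ ok: true, result }))
      .catch((error) =>
        sendResponse({ ok: false, error: String((error && error.message) || error) })
      );
    return true; // keeps the channel open for the asynchronous response
  };
}

// Wiring; only runs inside the extension.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(
    createMessageRouter({
      // Example handler with a made-up message type.
      "storyframe/ping": async () => "pong",
    })
  );
}
```

Returning true from the listener matters: without it, the channel closes before an asynchronous handler can reply.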
3. AI Model Limitations & Workarounds
Gemini Nano Context Window Issue:
The biggest technical challenge was Gemini Nano's limited context window and processing speed:
- Maximum ~3,000 characters (after summarization)
- Takes 60-180 seconds for complex prompts
- Performance varies significantly based on device hardware
- Even on my high-end CPU, it struggled with longer texts
My Solution: Multi-Layered Fallback System
// Intelligent fallback hierarchy
1. Try Gemini Nano (80-second timeout)
   ↓ [if timeout/error]
2. Use Summarizer API to condense text
   ↓ [if still too long]
3. Fall back to Gemini 2.5 Flash (cloud)
   ↓ [if Google AI fails]
4. Use OpenRouter as final fallback
This ensures reliability without sacrificing user experience.
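The hierarchy above is essentially an ordered list of providers, each with its own timeout. Here is a minimal, generic sketch of that pattern. The provider objects ({ name, run, timeoutMs }) and the function names are assumptions, and how the summarization step plugs in (for example, as a provider that condenses the text and retries Nano) is left open.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Tries each provider in order and returns the first successful result.
// Each provider is { name, run(text), timeoutMs }; this shape is hypothetical.
async function generateWithFallback(text, providers) {
  const errors = [];
  for (const provider of providers) {
    try {
      const result = await withTimeout(
        Promise.resolve().then(() => provider.run(text)),
        provider.timeoutMs,
        provider.name
      );
      return { provider: provider.name, result };
    } catch (error) {
      errors.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}
```

Note that a timeout here only stops waiting; it does not cancel the work already started by a slow provider.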
4. The Genre-Agnostic Challenge
Initially, my prompts forced a "funny, comedic" style on ALL content. This was terrible for serious literature, dramatic scenes, or horror stories.
Learning: AI prompt engineering requires context-aware instructions. I refactored the system prompts to:
- First analyze the text's genre and tone
- Adapt visual style, colors, and expressions accordingly
- Match the original narrative's emotional intent
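To make the idea concrete, here is a small sketch of a prompt builder that takes the detected genre and tone as input instead of hard-coding a comedic style. The style table, the shape of the `analysis` object ({ genre, tone, scenes }), and the wording of the prompt are all hypothetical; they are not the extension's actual system prompts.

```javascript
// Hypothetical style table keyed by detected tone.
const STYLE_BY_TONE = {
  comedic: "bright colors, exaggerated expressions",
  dramatic: "muted palette, strong contrast, restrained expressions",
  horror: "dark palette, heavy shadows, tense expressions",
};
const DEFAULT_STYLE = "neutral palette, expressions that match the text";

// Builds one image prompt from a prior text analysis.
function buildImagePrompt(analysis, panelCount) {
  const style = Object.hasOwn(STYLE_BY_TONE, analysis.tone)
    ? STYLE_BY_TONE[analysis.tone]
    : DEFAULT_STYLE;
  return [
    `Create one comic-style image with ${panelCount} panels.`,
    `Genre: ${analysis.genre}. Tone: ${analysis.tone}.`,
    `Visual style: ${style}.`,
    "Match the emotional intent of the original text; add humor only if the text is humorous.",
    `Scenes: ${analysis.scenes.join(" | ")}`,
  ].join("\n");
}
```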
🚧 Challenges I Faced
Challenge #1: Image Generation Cost 💸
The biggest barrier was cost. While Gemini Nano is free (on-device), Gemini 2.5 Flash Image can only be used with billing enabled on your Google Cloud project.
Current State:
- Google AI Studio requires billing for image generation
- No generous free tier like the one for text models
- This limits accessibility for users without API credits
My Workaround:
- Added OpenRouter support as an alternative
- Used my existing OpenRouter credits during development
- Created dual-API architecture so users can choose
Future Hope: I believe (and hope) that as Gemini models mature, Google will make image generation more accessible, potentially offering a free tier similar to text models. Once that happens, StoryFrame AI will become truly accessible to everyone.
Challenge #2: Gemini Nano Performance ⏱️
Even with high-end hardware, Gemini Nano is:
- Slow: 60-180 seconds for prompt generation
- Context-limited: Can't handle long texts directly
- Unpredictable: Timeout issues on complex prompts
My Solution:
- Implemented Summarizer API to pre-process long texts
- Increased timeout to 180 seconds (3 minutes)
- Optimized session parameters (temperature: 0.7, topK: 30)
- Added comprehensive cloud fallback to Gemini 2.5 Flash
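For reference, here is a minimal sketch of creating a Prompt API session with the parameters listed above, returning null when the on-device model cannot be used so the caller can switch to the cloud path. createNanoSession is a hypothetical helper. It assumes the Prompt API is exposed as the LanguageModel global with availability() and create(), which may differ between Chrome versions.

```javascript
// Session parameters quoted in the list above.
const NANO_OPTIONS = { temperature: 0.7, topK: 30 };

// `api` is the Prompt API entry point (assumed: the LanguageModel global).
// Returns null when the on-device model is unavailable, so the caller can
// fall back to a cloud model instead.
async function createNanoSession(api) {
  if (!api) return null;
  const status = await api.availability();
  if (status === "unavailable") return null;
  return api.create({ ...NANO_OPTIONS });
}

// Inside the extension (not run here):
//   const session = await createNanoSession(globalThis.LanguageModel);
//   const reply = session ? await session.prompt("Describe the scene...") : null;
```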
Trade-off: While I wanted to keep everything on-device for privacy, reliability had to take priority. The hybrid approach ensures users always get results, even if it means using cloud APIs.
Challenge #3: Multi-Tab State Management
Initially, the extension used global state variables, causing chaos when users opened multiple tabs:
- State would overwrite between tabs
- Prompts from different stories would mix
- Results would appear in the wrong tab
Solution: Implemented per-tab state management using a Map<tabId, state> structure, ensuring complete independence between tabs.
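A minimal sketch of the per-tab structure, assuming a state object with a few illustrative fields. The field names and helper functions are hypothetical; only the Map keyed by tab ID comes from the description above.

```javascript
// One state object per tab, keyed by tab ID.
const tabStates = new Map();

// Illustrative default fields; the real extension may track different ones.
function createEmptyState() {
  return { selectedText: "", prompt: "", imageUrl: null, status: "idle" };
}

function getTabState(tabId) {
  if (!tabStates.has(tabId)) tabStates.set(tabId, createEmptyState());
  return tabStates.get(tabId);
}

// Merges `patch` into the tab's state without touching other tabs.
function updateTabState(tabId, patch) {
  const next = { ...getTabState(tabId), ...patch };
  tabStates.set(tabId, next);
  return next;
}

function clearTabState(tabId) {
  return tabStates.delete(tabId);
}

// Wiring; only runs inside the extension. Drops state when a tab closes.
if (typeof chrome !== "undefined" && chrome.tabs) {
  chrome.tabs.onRemoved.addListener((tabId) => clearTabState(tabId));
}
```

Removing the entry when a tab closes keeps the Map from growing for the lifetime of the extension process.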
🔬 Technical Deep Dive
Why Summarizer API?
For texts over 70 words, I use the Summarizer API to:
- Preserve key narrative elements (characters, emotions, actions)
- Reduce token count for faster Nano processing
- Maintain story coherence for visualization
    const summarizer = await self.Summarizer.create({
      type: 'key-points',
      format: 'plain-text',
      length: 'medium',
      sharedContext: 'Preserve complete narrative story...'
    });
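The 70-word rule can be sketched as a small gate in front of the summarizer. This is an illustration under assumptions: condenseIfLong and countWords are hypothetical helpers, "over 70 words" is read as strictly more than 70, and the summarizer is passed in as a factory so the same code can be tried outside Chrome.

```javascript
const WORD_THRESHOLD = 70; // threshold mentioned in the text above

function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// `createSummarizer` is an injected factory, for example
// () => Summarizer.create({ type: "key-points", format: "plain-text", length: "medium" }).
async function condenseIfLong(text, createSummarizer) {
  if (countWords(text) <= WORD_THRESHOLD) return text; // short text passes through
  const summarizer = await createSummarizer();
  try {
    return await summarizer.summarize(text);
  } finally {
    // Release the summarizer if the implementation offers destroy().
    if (typeof summarizer.destroy === "function") summarizer.destroy();
  }
}
```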
Why Multi-Panel Comics?
Instead of generating separate images, I create one image with multiple panels because:
- Maintains story continuity visually
- Reduces API calls and cost
- Creates a cohesive narrative flow
- Mimics actual comic book reading experience
🎯 What Makes StoryFrame AI Special
- Privacy-First: Text analysis happens on your device
- Context-Aware: Adapts to any genre (comedy, drama, horror, romance)
- Reliable: Multiple fallback mechanisms keep it working when one model fails or times out
- Accessible: Works on PDFs, articles, novels, any text online
- Complete History: Track and download all your visualizations
- Multi-Tab Support: Use independently across browser tabs
Future Vision
Once Gemini 2.5 Flash Image becomes more affordable/free, StoryFrame AI will:
- Remove cost barriers for users
- Enable unlimited creative visualization
- Become a standard tool for readers, educators, and creators
- Potentially support real-time visualization as you scroll
Acknowledgments
This project wouldn't exist without:
- Google's Chrome Built-in AI Challenge for inspiring innovation
- Gemini Nano for bringing AI to the browser
- The Chrome team for powerful extension APIs
- OpenRouter for alternative API access during development
Built with ❤️ for the Google Chrome AI Challenge 2025
"Making stories visible, one panel at a time."
Built With
- chrome
- css3
- gemini2.5flash
- gemini2.5flashimage
- html5
- javascript
- manifest
- openrouter
- promptapi
- storageapi
- summarizerapi