Inspiration
The spark for AI Muse Creator came from a late-night brainstorming session where I sketched ideas verbally but struggled to visualize them without clunky tools. As a developer fascinated by AI's creative potential, I wondered: What if I could speak a concept—a mystical forest or cyberpunk skyline—and have AI refine it into text, craft a vivid prompt, generate art, and even edit it on the fly, all within Chrome? Inspired by the Google Chrome Built-in AI Challenge 2025's call to reimagine the web with on-device Gemini Nano, I aimed to make multimodal creation accessible and private, eliminating typing barriers for artists, writers, and non-native English speakers. This project embodies the challenge's ethos: leveraging Chrome's AI APIs for seamless, offline innovation.
What it does
AI Muse Creator is a Chrome extension that transforms voice inputs into AI-generated art through an intuitive popup interface. Speak your idea (e.g., "a serene mountain lake at dawn"), and it auto-transcribes the speech, detects language and mood, refines the transcript into engaging text, crafts an advanced image prompt, and generates a high-quality image using Stability AI. Users can voice-edit the image (e.g., "make it brighter") via Canvas manipulation, download the result, or share it via a QR code/URL stored in MongoDB. Built for on-device privacy, it runs the core text logic with Gemini Nano, so mood analysis and prompt generation need no cloud call; only image generation and sharing go through the backend.
How we built it
I built AI Muse Creator as a Manifest V3 Chrome extension with a Node.js backend, iterating over two weeks to integrate Chrome's AI APIs seamlessly.
Frontend (Extension): Started with manifest.json for V3 structure, permissions (storage, activeTab), and strict CSP. The popup (popup.html + popup.js) uses Web Speech API for voice recognition and TTS. Transcript auto-fills a textbox, then chains Chrome APIs: Translator for lang detection/translation, Prompt API (Gemini Nano) for mood analysis and advanced prompt generation (structured SDXL templates with quality boosters like "8k UHD, cinematic lighting"), and Writer API for text refinement.
Backend Proxy: Express.js server (server.js) handles cloud calls: Stability AI for image gen (with negative prompts for reliability) and MongoDB for share persistence. Axios fetches APIs, dotenv loads secrets from .env. Routes like /generate-image and /analyze-mood (rule-based fallback) ensure offline resilience.
Integration: Extension fetches backend via BACKEND_URL. Voice edits use Canvas API for pixel adjustments. QR sharing with qrcode.js. Deployed backend to Vercel for live demos.
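The structured SDXL templates mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function name buildImagePrompt, its field names, and the negative-prompt terms are assumptions for illustration, while the quality boosters are the ones quoted in this write-up.

```javascript
// Hypothetical helper: names and negative terms are illustrative assumptions.
const QUALITY_BOOSTERS = ["masterpiece", "sharp focus", "8k UHD", "cinematic lighting"];
const NEGATIVE_TERMS = ["blurry", "low quality", "distorted", "watermark"];

function buildImagePrompt({ subject, setting = "", lighting = "", mood = "" }) {
  if (!subject || !subject.trim()) {
    throw new Error("subject is required");
  }
  // Keep only the parts that were provided, in a fixed order:
  // subject, setting, lighting, mood, then the quality boosters.
  const parts = [subject, setting, lighting, mood ? `${mood} mood` : ""]
    .map((part) => part.trim())
    .filter(Boolean);
  return {
    prompt: [...parts, ...QUALITY_BOOSTERS].join(", "),
    negativePrompt: NEGATIVE_TERMS.join(", "),
  };
}

const examplePrompt = buildImagePrompt({
  subject: "a serene mountain lake at dawn",
  lighting: "soft golden light",
  mood: "calm",
});
console.log(examplePrompt.prompt);
console.log(examplePrompt.negativePrompt);
```

Keeping the template in one pure function makes it easy to tweak boosters and negative terms without touching the speech or network code.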
Tools: VS Code for editing, GitHub for repo, Vercel for hosting. Total ~500 LOC, focused on on-device efficiency.
Challenges we ran into
Several hurdles tested my resolve:
- API Compatibility: Gemini Nano required enabling experimental flags (chrome://flags/#prompt-api-for-gemini-nano), and initial ONNX lang detection in backend failed (model 404 errors); switched to Chrome's Translator API for reliability.
- Async Flows in Extensions: Await in event handlers caused "SyntaxError" loops; resolved with .then chaining and ES modules in backend.
- Prompt & Image Quality: Basic transcripts produced blurry outputs; iterated SDXL templates (adding "masterpiece, sharp focus") and negative prompts, boosting consistency to 90%.
- Secrets & GitHub Protection: Hardcoded API keys triggered push blocks; learned to use .env + BFG Repo-Cleaner to scrub history.
These pushed me to prioritize fallbacks and security, strengthening the MVP.
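For the async issue, here is a small sketch of the two usual ways to sequence promise-returning steps in a handler. The stage functions are hypothetical stand-ins for the real translation, refinement, and image calls. Note that await is only a SyntaxError inside a function that is not declared async, so marking the handler async is an alternative to .then chaining.

```javascript
// Hypothetical stage functions standing in for the real API calls.
// Each one returns a Promise, as the Chrome AI and fetch calls would.
const demoStages = {
  translate: (text) => Promise.resolve(text.trim()),
  refine: (text) => Promise.resolve(`${text}, refined`),
  generate: (prompt) => Promise.resolve({ prompt, imageUrl: "about:blank" }),
};

// Option 1: plain .then chaining, valid in any function.
function runPipelineWithThen(transcript, stages = demoStages) {
  return stages.translate(transcript)
    .then((english) => stages.refine(english))
    .then((refined) => stages.generate(refined));
}

// Option 2: the same flow with await. The function must be declared async;
// using await in a non-async function is what raises the SyntaxError.
async function runPipelineWithAwait(transcript, stages = demoStages) {
  const english = await stages.translate(transcript);
  const refined = await stages.refine(english);
  return stages.generate(refined);
}

// In a popup this could be wired to a click handler (element and error
// handler are hypothetical):
// button.addEventListener("click", () => runPipelineWithThen(text).catch(showError));
runPipelineWithThen("  a cyberpunk skyline ")
  .then((result) => console.log(result.prompt))
  .catch((err) => console.error("pipeline failed:", err));
```

Both versions return a Promise, so the caller can attach one .catch at the end to surface failures from any stage.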
Accomplishments that we're proud of
- On-Device Multimodal Pipeline: Integrated 3+ Chrome AI APIs (Prompt, Translator, Writer) so the text stages of the voice-to-art flow run on-device; Gemini Nano handles 80% of logic, cutting latency to <2s.
- Voice-Driven Edits: Canvas-based voice edits (e.g., "brighter" → pixel math) add interactivity, making it feel like a "living canvas."
- Global Accessibility: Supports 100+ languages with auto-translation, so non-English users can take part (e.g., Spanish voice → English prompt gen → native TTS).
- Reliable Sharing: QR/URL with Mongo persistence—tested end-to-end, including Vercel deployment for live demos.
- Challenge Alignment: Built during the contest, showcasing hybrid AI (on-device + cloud proxy) for privacy-focused creativity.
Proudest: From sketch to submission in weeks—it's functional, fun, and forward-thinking.
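The "brighter" → pixel math idea can be sketched as below. The function names, the keyword matching, and the step size of 40 are assumptions for illustration. The pixel loop works on the RGBA Uint8ClampedArray that a canvas ImageData object exposes, which clamps values to 0..255 on assignment.

```javascript
// Map a spoken edit to a brightness delta. Keywords and step size are
// illustrative assumptions.
function brightnessDeltaFor(command, step = 40) {
  const text = command.toLowerCase();
  if (text.includes("brighter")) return step;
  if (text.includes("darker")) return -step;
  return 0;
}

// Add delta to the R, G and B channels of RGBA pixel data; alpha is untouched.
// A Uint8ClampedArray clamps each result to the 0..255 range.
function adjustBrightness(pixels, delta) {
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] += delta;
    pixels[i + 1] += delta;
    pixels[i + 2] += delta;
  }
  return pixels;
}

// In the browser the data would come from a canvas, for example:
//   const ctx = canvas.getContext("2d");
//   const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
//   adjustBrightness(imageData.data, brightnessDeltaFor("make it brighter"));
//   ctx.putImageData(imageData, 0, 0);
const demoPixel = new Uint8ClampedArray([250, 100, 0, 255]);
adjustBrightness(demoPixel, brightnessDeltaFor("make it brighter"));
console.log(Array.from(demoPixel)); // [255, 140, 40, 255]
```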
What we learned
This project was a masterclass in Chrome's AI ecosystem:
- API Chaining: Gemini Nano excels for lightweight tasks (mood/prompts) but needs structured outputs (JSON) for parsing—fallbacks like rule-based mood ensure robustness.
- Extension Constraints: V3's CSP and async limits demand creative workarounds (.then for handlers, hashes for libs).
- Prompt Engineering: SDXL thrives on detailed, structured prompts (subject/setting/lighting/quality); simple templates fail, while boosters like "8k UHD" transform results.
- Security Best Practices: GitHub's Push Protection taught me to never hardcode keys; .env + history scrubbing (BFG) is essential.
- User-Centric Design: Voice UX reveals nuances (permission prompts, accent detection); iterating with tests improved inclusivity.
Overall, I learned Chrome APIs enable "web-native" AI, but hybrid (on-device + cloud) balances speed and power.
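The point about structured JSON output plus a rule-based fallback could be handled along these lines. This is a sketch under assumptions: the {"mood": "..."} reply shape, the function names, and the keyword table are all hypothetical, not the project's actual rules.

```javascript
// Keyword table for the rule-based fallback. Moods and keywords are
// illustrative assumptions.
const MOOD_KEYWORDS = {
  calm: ["serene", "peaceful", "quiet", "dawn"],
  dark: ["cyberpunk", "storm", "shadow", "night"],
  mystical: ["mystical", "magic", "enchanted"],
};

function ruleBasedMood(text) {
  const lower = text.toLowerCase();
  for (const [mood, keywords] of Object.entries(MOOD_KEYWORDS)) {
    if (keywords.some((word) => lower.includes(word))) return mood;
  }
  return "neutral";
}

// Try to read {"mood": "..."} from the model reply; fall back to the rules
// when the reply is not valid JSON or has no usable mood field.
function parseMood(modelReply, originalText) {
  try {
    // Replies may wrap the JSON in extra prose, so take the first {...} span.
    const match = modelReply.match(/\{[\s\S]*\}/);
    const parsed = JSON.parse(match ? match[0] : modelReply);
    if (typeof parsed.mood === "string" && parsed.mood.trim()) {
      return { mood: parsed.mood.trim().toLowerCase(), source: "model" };
    }
  } catch (err) {
    // Parse errors are expected here; the fallback below handles them.
  }
  return { mood: ruleBasedMood(originalText), source: "rules" };
}

console.log(parseMood('Sure! {"mood": "Calm"}', "a lake"));
console.log(parseMood("I think it feels calm", "a mystical forest"));
```

Returning the source alongside the mood makes it easy to log how often the fallback is used.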
What's next for AI Muse Creator: Voice-Powered AI Art with Gemini Nano
Next, evolve to full storytelling: Voice-to-video (integrate RunwayML for 5-10s clips from scene breakdowns via Prompt API) and collaborative edits (WebSockets for shared canvases). Add AR previews (WebXR for 3D image overlays) and more languages with fine-tuned Nano models. Open-source expansions: Community plugins for custom prompts or voice styles. Ultimately, publish to Chrome Web Store for global reach—turning everyday browsers into creative studios. Fork on GitHub and contribute!
Built With
- chrome
- express.js
- github
- javascript
- manifest
- mongodb
- node.js
- openai
- qrcode.js
- stability