Inspiration
We love the power of modern AI, but we're frustrated by the "one-size-fits-all" nature of cloud models. Every new chat starts from zero, forcing us to re-explain our custom instructions, roles, and context. This "generic AI fatigue" is inefficient, and the privacy risk of sending sensitive data (personal thoughts, proprietary code, a practice speech) to a server is a non-starter for many.
The Chrome Built-in AI Challenge presented a perfect opportunity. We saw a future where AI isn't just a service you access, but a personal tool you own. Swa-AI (from the Sanskrit "स्व" meaning "one's own self") was born from this vision: to create a truly private, fast, and highly specialized AI hub that adapts to you, not the other way around.
What it does
Swa-AI is a 100% private, on-device AI platform that runs entirely in the browser using the Chrome LanguageModel API. It lets you create, save, and reuse a library of persistent, specialist AI Personas.
Key Features:
- 100% Private: All AI processing happens on-device, and all data (personas, chats) is stored in your browser's localStorage. Nothing ever touches a server.
- Persistent Persona Hub: Create new personas from scratch (e.g., a "JavaScript Code Explainer" or an "Image Analyst") or use our pre-built specialists.
- Speech Coach (Multimodal): A pre-built persona that uses live audio and video snapshots to provide detailed, private feedback on your presentation skills, analyzing your tone, pace, and visual cues.
- Prompt Writer: A meta-persona that acts as an expert prompt engineer, helping you craft perfect prompts for any task.
- In-Chat Multimodal: Chat with your personas using text, or upload images and audio files for analysis.
- Full Chat History: Every conversation is saved and grouped by persona.
- Rewrite Functionality: Instantly rewrite any AI response with new instructions (e.g., "make it shorter," "sound more formal") without losing your chat context.
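The persistence model behind these features can be sketched in a few lines. The storage key and persona shape below are assumptions for illustration, not the app's actual schema; the helpers take the storage backend as a parameter so the same logic works against `window.localStorage` in the browser or a stub elsewhere:

```javascript
// Minimal sketch of localStorage-backed persistence (key name and persona
// shape are assumptions, not the app's real schema).
const PERSONAS_KEY = "swa-ai/personas";

// Serialize the persona list into the given storage backend.
function savePersonas(storage, personas) {
  storage.setItem(PERSONAS_KEY, JSON.stringify(personas));
}

// Read the persona list back, falling back to an empty list on first run.
function loadPersonas(storage) {
  const raw = storage.getItem(PERSONAS_KEY);
  return raw ? JSON.parse(raw) : [];
}
```

In the browser, you would simply call `savePersonas(window.localStorage, personas)`; keeping the backend injectable makes the round-trip easy to verify.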
How I built it
I built Swa-AI as a modern React web app, prioritizing a fast, polished user experience.
- Core: React (with Vite) and TypeScript for a robust, type-safe foundation.
- On-Device AI: The core of the app is the Chrome `LanguageModel` API (Gemini Nano). We use:
  - `LanguageModel.create()` with `initialPrompts` to load each persona's unique system prompt and chat history, giving them persistent memory and character.
  - `session.promptStreaming()` to deliver fast, real-time chat responses.
  - Multimodal input (`expectedInputs`) to handle audio and image data for the Speech Coach and in-chat uploads.
- UI/UX: TailwindCSS + shadcn/ui for a responsive, professional-looking interface. sonner is used for non-intrusive toast notifications.
- Media Capture: react-media-recorder to handle camera/microphone access, with a custom canvas solution for capturing video snapshots.
- State & Storage: All personas and conversations are stored as JSON in the browser's localStorage, managed by custom React hooks (`usePersonas`, `useLanguageModel`).
- Validation: zod for validating new persona creation.
- Deployment: Hosted on Vercel with an Origin Trial Token to enable multimodal features for judges.
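The persona-loading flow can be sketched roughly as follows. `buildInitialPrompts` and `chatWithPersona` are hypothetical helpers (not the app's actual code), and the `LanguageModel.create()` / `promptStreaming()` calls only run in a Chrome build with the Prompt API enabled:

```javascript
// Hypothetical helper: map a stored persona plus its saved chat history onto
// the initialPrompts array that LanguageModel.create() accepts.
function buildInitialPrompts(persona, history) {
  return [
    { role: "system", content: persona.systemPrompt },
    ...history.map((m) => ({ role: m.role, content: m.content })),
  ];
}

// Browser-only sketch: create a session that "remembers" the persona,
// then stream a reply token-by-token.
async function chatWithPersona(persona, history, userText) {
  const session = await LanguageModel.create({
    initialPrompts: buildInitialPrompts(persona, history),
  });
  let reply = "";
  for await (const chunk of session.promptStreaming(userText)) {
    reply += chunk; // render incrementally in the UI as chunks arrive
  }
  return reply;
}
```

Because `initialPrompts` replays the system prompt and prior turns at session creation, each persona picks up exactly where its last conversation left off.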
Challenges I ran into
The Multimodal Hurdle: The multimodal features (audio/image input) are experimental. Our primary development machine (with 8GB RAM) didn't meet the 16GB RAM requirement for CPU fallback, resulting in a `NotAllowedError`. We had to switch to a capable system to test and validate our multimodal code (Speech Coach, image uploads), confirming our implementation was correct but blocked by hardware.

LanguageModel API Nuances: The API has strict rules. We hit a `NotAllowedError` when trying to create a session after a model download finished, because the `useEffect` trigger didn't count as a "user gesture." We had to refactor our logic to chain session creation directly to the download button's click event.
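A minimal sketch of the gesture-preserving fix, with the API object injected as a parameter so the shape is clear outside the browser (the handler and callback names are made up):

```javascript
// Sketch of the fix: everything runs inside the click handler's call stack,
// so the browser still treats the call as part of a user gesture. A separate
// useEffect-driven create() after the download finished would lack that
// gesture context and throw NotAllowedError.
async function handleDownloadClick(languageModelApi, onProgress) {
  // create() both triggers the model download (when needed) and returns a
  // session, so one call chained to the click covers both steps.
  return languageModelApi.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => onProgress(e.loaded));
    },
  });
}
```

In the app this would be wired as the download button's `onClick`, passing the global `LanguageModel` as `languageModelApi`.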
Live Video Flickering: `react-media-recorder` provided new `previewStream` references on each render, causing our `useEffect` to constantly stop and restart the video tracks, leading to a black, flickering preview. We solved this with a `useRef` (`hasSetStreamRef`) to ensure the stream was only assigned to the video element once.
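The one-assignment guard can be shown outside React; this is a plain-JavaScript analogue of the `hasSetStreamRef` fix rather than the app's actual hook:

```javascript
// Plain-JS analogue of the useRef fix: the closure variable persists across
// calls (like ref.current persists across renders), so the video element's
// srcObject is assigned exactly once even when fresh stream references
// keep arriving on every render.
function makeStreamAttacher(videoEl) {
  let hasSetStream = false; // stands in for hasSetStreamRef.current
  return function attach(previewStream) {
    if (hasSetStream || !previewStream) return false;
    videoEl.srcObject = previewStream;
    hasSetStream = true; // later calls with new stream refs are ignored
    return true;
  };
}
```

The key design point is that the guard lives outside the render cycle, so re-renders can no longer tear down and reattach the live tracks.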
Accomplishments that I am proud of
A Complete, Polished Platform: This isn't just a single-feature demo; it's a fully functional application with persistent state, routing, and multiple advanced features.
Solving the `NotAllowedError`: Debugging and solving the user gesture and multimodal capability errors felt like a huge win.
Successful Multimodal Analysis: Seeing the Speech Coach actually analyze live audio and video snapshots on a capable system and provide cohesive feedback was the "wow" moment of the project.
Advanced Prompt Engineering: The "Prompt Writer" persona, which guides users to write better prompts, works incredibly well. The "StoryWeaver" and "Speech Coach" prompts also produce high-quality, structured output.
What we learned
On-device AI is viable today. The LanguageModel API is fast, surprisingly powerful (especially text generation), and opens a new world of private-first applications.
Read the Whole Doc: The user gesture requirement for downloads/session creation is a critical detail.
Hardware Matters: When working with experimental, high-performance APIs, the documented system requirements (like 16GB RAM) are not suggestions—they are hard rules.
Refactoring is Key: My first attempt at managing the previewStream was buggy. Debugging, logging, and refactoring to a simpler useRef-based solution was essential.
What's next for Swa-AI
Swa-AI is a platform I truly believe in, and I plan to keep working on it.
Local Import/Export: Implement a 100% local "Export Persona" (to JSON file) and "Import Persona" feature, allowing users to share their custom AIs without ever touching a server.
Speech Coach Uploads: Finish the client-side video processing (audio extraction, snapshotting) for the "Upload Video" feature.
Deeper History: Use the `session.append()` method to feed longer chat histories to the model for even better long-term context.
Richer Rewrites: Explore the Rewriter API for tone/length adjustments, in addition to our current Prompt API-based rewrite.
Built With
- promptapi
- react
- tailwind
- typescript