Inspiration
We love the power of modern AI, but we're frustrated by the "one-size-fits-all" nature of cloud models. Every new chat starts from zero, forcing us to re-explain our custom instructions, roles, and context. This "generic AI fatigue" is inefficient, and the privacy risk of sending sensitive data (personal thoughts, proprietary code, a practice speech) to a server is a non-starter for many.
The Chrome Built-in AI Challenge presented a perfect opportunity. We saw a future where AI isn't just a service you access, but a personal tool you own. Swa-AI (from the Sanskrit "स्व" meaning "one's own self") was born from this vision: to create a truly private, fast, and highly specialized AI hub that adapts to you, not the other way around.
What it does
Swa-AI is a 100% private, on-device AI platform that runs entirely in the browser using the Chrome LanguageModel API. It lets you create, save, and reuse a library of persistent, specialist AI Personas.
Key Features:
- 100% Private: All AI processing happens on-device, and all data (personas, chats) is stored in your browser's localStorage. Nothing ever touches a server.
- Persistent Persona Hub: Create new personas from scratch (e.g., a "JavaScript Code Explainer" or an "Image Analyst") or use our pre-built specialists.
- Speech Coach (Multimodal): A pre-built persona that uses live audio and video snapshots to provide detailed, private feedback on your presentation skills, analyzing your tone, pace, and visual cues.
- Prompt Writer: A meta-persona that acts as an expert prompt engineer, helping you craft perfect prompts for any task.
- In-Chat Multimodal: Chat with your personas using text, or upload images and audio files for analysis.
- Full Chat History: Every conversation is saved and grouped by persona.
- Rewrite Functionality: Instantly rewrite any AI response with new instructions (e.g., "make it shorter," "sound more formal") without losing your chat context.
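The persistence model behind these features can be sketched in a few lines. The storage key and persona shape below are assumptions for illustration, not the app's actual schema; the helpers take the storage backend as a parameter so the same logic works against `window.localStorage` in the browser or a stub elsewhere:

```javascript
// Minimal sketch of localStorage-backed persistence (key name and persona
// shape are assumptions, not the app's real schema).
const PERSONAS_KEY = "swa-ai/personas";

// Serialize the persona list into the given storage backend.
function savePersonas(storage, personas) {
  storage.setItem(PERSONAS_KEY, JSON.stringify(personas));
}

// Read the persona list back, falling back to an empty list on first run.
function loadPersonas(storage) {
  const raw = storage.getItem(PERSONAS_KEY);
  return raw ? JSON.parse(raw) : [];
}
```

In the browser, you would simply call `savePersonas(window.localStorage, personas)`; keeping the backend injectable makes the round-trip easy to verify.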
How I built it
I built Swa-AI as a modern React web app, prioritizing a fast, polished user experience.
- Core: React (with Vite) and TypeScript for a robust, type-safe foundation.
- On-Device AI: The core of the app is the Chrome `LanguageModel` API (Gemini Nano). We use:
  - `LanguageModel.create()` with `initialPrompts` to load each persona's unique system prompt and chat history, giving them persistent memory and character.
  - `session.promptStreaming()` to deliver fast, real-time chat responses.
  - Multimodal input (`expectedInputs`) to handle audio and image data for the Speech Coach and in-chat uploads.
- UI/UX: TailwindCSS + shadcn/ui for a responsive, professional-looking interface. sonner is used for non-intrusive toast notifications.
- Media Capture: react-media-recorder to handle camera/microphone access, with a custom canvas solution for capturing video snapshots.
- State & Storage: All personas and conversations are stored as JSON in the browser's localStorage, managed by custom React hooks (`usePersonas`, `useLanguageModel`).
- Validation: zod for validating new persona creation.
- Deployment: Hosted on Vercel with an Origin Trial Token to enable multimodal features for judges.
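The persona-loading flow can be sketched roughly as follows. `buildInitialPrompts` and `chatWithPersona` are hypothetical helpers (not the app's actual code), and the `LanguageModel.create()` / `promptStreaming()` calls only run in a Chrome build with the Prompt API enabled:

```javascript
// Hypothetical helper: map a stored persona plus its saved chat history onto
// the initialPrompts array that LanguageModel.create() accepts.
function buildInitialPrompts(persona, history) {
  return [
    { role: "system", content: persona.systemPrompt },
    ...history.map((m) => ({ role: m.role, content: m.content })),
  ];
}

// Browser-only sketch: create a session that "remembers" the persona,
// then stream a reply token-by-token.
async function chatWithPersona(persona, history, userText) {
  const session = await LanguageModel.create({
    initialPrompts: buildInitialPrompts(persona, history),
  });
  let reply = "";
  for await (const chunk of session.promptStreaming(userText)) {
    reply += chunk; // render incrementally in the UI as chunks arrive
  }
  return reply;
}
```

Because `initialPrompts` replays the system prompt and prior turns at session creation, each persona picks up exactly where its last conversation left off.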
Challenges I ran into
The Multimodal Hurdle: The multimodal features (audio/image input) are experimental. Our primary development machine (with 8GB RAM) didn't meet the 16GB RAM requirement for CPU fallback, resulting in a `NotAllowedError`. We had to switch to a capable system to test and validate our multimodal code (Speech Coach, image uploads), confirming our implementation was correct but blocked by hardware.

LanguageModel API Nuances: The API has strict rules. We hit a `NotAllowedError` when trying to create a session after a model download finished, because the `useEffect` trigger didn't count as a "user gesture." We had to refactor our logic to chain session creation directly to the download button's click event.
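A minimal sketch of the gesture-preserving fix, with the API object injected as a parameter so the shape is clear outside the browser (the handler and callback names are made up):

```javascript
// Sketch of the fix: everything runs inside the click handler's call stack,
// so the browser still treats the call as part of a user gesture. A separate
// useEffect-driven create() after the download finished would lack that
// gesture context and throw NotAllowedError.
async function handleDownloadClick(languageModelApi, onProgress) {
  // create() both triggers the model download (when needed) and returns a
  // session, so one call chained to the click covers both steps.
  return languageModelApi.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => onProgress(e.loaded));
    },
  });
}
```

In the app this would be wired as the download button's `onClick`, passing the global `LanguageModel` as `languageModelApi`.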
Live Video Flickering: `react-media-recorder` provided new `previewStream` references on each render, causing our `useEffect` to constantly stop and restart the video tracks, leading to a black, flickering preview. We solved this with a `useRef` (`hasSetStreamRef`) to ensure the stream was only assigned to the video element once.
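The one-assignment guard can be shown outside React; this is a plain-JavaScript analogue of the `hasSetStreamRef` fix rather than the app's actual hook:

```javascript
// Plain-JS analogue of the useRef fix: the closure variable persists across
// calls (like ref.current persists across renders), so the video element's
// srcObject is assigned exactly once even when fresh stream references
// keep arriving on every render.
function makeStreamAttacher(videoEl) {
  let hasSetStream = false; // stands in for hasSetStreamRef.current
  return function attach(previewStream) {
    if (hasSetStream || !previewStream) return false;
    videoEl.srcObject = previewStream;
    hasSetStream = true; // later calls with new stream refs are ignored
    return true;
  };
}
```

The key design point is that the guard lives outside the render cycle, so re-renders can no longer tear down and reattach the live tracks.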
Accomplishments that I am proud of
A Complete, Polished Platform: This isn't just a single-feature demo; it's a fully functional application with persistent state, routing, and multiple advanced features.
Solving the `NotAllowedError`: Debugging and solving the user gesture and multimodal capability errors felt like a huge win.
Successful Multimodal Analysis: Seeing the Speech Coach actually analyze live audio and video snapshots on a capable system and provide cohesive feedback was the "wow" moment of the project.
Advanced Prompt Engineering: The "Prompt Writer" persona, which guides users to write better prompts, works incredibly well. The "StoryWeaver" and "Speech Coach" prompts also produce high-quality, structured output.
What we learned
On-device AI is viable today. The LanguageModel API is fast, surprisingly powerful (especially text generation), and opens a new world of private-first applications.
Read the Whole Doc: The user gesture requirement for downloads/session creation is a critical detail.
Hardware Matters: When working with experimental, high-performance APIs, the documented system requirements (like 16GB RAM) are not suggestions—they are hard rules.
Refactoring is Key: My first attempt at managing the previewStream was buggy. Debugging, logging, and refactoring to a simpler useRef-based solution was essential.
What's next for Swa-AI
Swa-AI is a platform I truly believe in, and I plan to keep working on it.
Local Import/Export: Implement a 100% local "Export Persona" (to JSON file) and "Import Persona" feature, allowing users to share their custom AIs without ever touching a server.
Speech Coach Uploads: Finish the client-side video processing (audio extraction, snapshotting) for the "Upload Video" feature.
Deeper History: Use the `session.append()` method to feed longer chat histories to the model for even better long-term context.
Richer Rewrites: Explore the Rewriter API for tone/length adjustments, in addition to our current Prompt API-based rewrite.
Built With
- promptapi
- react
- tailwind
- typescript