x | Devpost

Built With

chrome-built-in-ai

Updates

Private user posted an update — Nov 11, 2025 01:06 AM EST

Hello, I wanted to share a recent update with several major improvements. I didn’t include these changes in the main submission or the original GitHub repo because I wanted to keep the competition version unchanged.

Since this is a solution I plan to keep improving for my own use, I wanted to mention the latest enhancements. They are significant, and I have released them as Version 2 in a separate repository.

1. Two-Stage AI Form Analysis (Key Innovation)

One of the major technical innovations is a two-stage AI analysis system that improves privacy and reduces costs by keeping more of the processing on-device.

The Problem with Single-Stage Analysis: In a traditional single-stage approach, one AI call would need:

Entire HTML page (often 3000+ tokens)
System prompt (~500 tokens)
Custom instructions (~300 tokens)
Personal knowledge base with user data (~2000-3000 tokens)
Total: >6000 tokens → Forces processing to cloud (exceeds Gemini Nano limit) → User data sent to cloud → Privacy concerns + API costs
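The token arithmetic above can be sketched as a small routing check. This is a minimal illustration, not the extension's actual code: the names `PromptParts`, `estimateTokens`, and `chooseBackend` are hypothetical, the 4-characters-per-token estimate is a rough rule of thumb, and the 6000-token threshold is simply the figure quoted in this update.

```typescript
// Hypothetical sketch: all names and the 4-chars-per-token heuristic are assumptions.
const ON_DEVICE_TOKEN_LIMIT = 6000; // threshold quoted in this update

interface PromptParts {
  html: string;
  systemPrompt: string;
  customInstructions: string;
  knowledgeBase: string;
}

type Backend = "on-device" | "cloud";

// Rough estimate: about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sum the estimated size of every part that goes into a single AI call.
function totalTokens(parts: PromptParts): number {
  return (
    estimateTokens(parts.html) +
    estimateTokens(parts.systemPrompt) +
    estimateTokens(parts.customInstructions) +
    estimateTokens(parts.knowledgeBase)
  );
}

// Requests within the limit can stay on-device; larger ones must go to the cloud.
function chooseBackend(
  tokenCount: number,
  limit: number = ON_DEVICE_TOKEN_LIMIT
): Backend {
  return tokenCount <= limit ? "on-device" : "cloud";
}
```

With the figures above, a single-stage call of roughly 3000 + 500 + 300 + 2500 = 6300 tokens exceeds the threshold and is routed to the cloud, while a call of about 2500 tokens would stay on-device.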

Our Two-Stage Solution:

Two-Stage Architecture:

Stage 1: HTML → Gemini 2.5 Flash → Form Structure (exact labels, types, IDs)
Stage 2: Field Labels + Knowledge Base → Gemini 2.5 Flash → Exact Values

Why This is Better for Privacy & Cost:

Stage 1: Form Structure Detection (~3500 tokens)

Input: HTML page only (no user data)
Output: Extracted field labels, types, IDs
Privacy: No sensitive user data is involved, so cloud processing is acceptable; the request may go to the cloud, but it contains no private information

Stage 2: Value Extraction (~2500 tokens)

Input: Only extracted field labels (not entire HTML) + knowledge base
Key benefit: Removed massive HTML, greatly reduced token count
Result: Much higher probability of staying under 6000 tokens
Privacy: More likely to process on-device with Gemini Nano (user data stays local)
Cost: Avoids cloud API calls when possible

Benefits:

✅ Privacy: Stage 2 (with user data) more likely to stay on-device
✅ Cost savings: Fewer cloud API calls = lower costs
✅ Speed: On-device processing is faster (<200ms vs 2s)
✅ Exact field matching: Stage 2 returns {"Email Address": "value"} with exact field labels
✅ Better context: Stage 1 provides field metadata to Stage 2
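The two stages could be wired together roughly as follows. This is a sketch under stated assumptions, not the extension's real implementation: `ModelCall`, `FieldSpec`, `analyzeForm`, and the prompt wording are all hypothetical, and the model is passed in as a plain function so that no specific Gemini API is implied. The point it illustrates is that the Stage 2 prompt is built from field metadata and the knowledge base only, never from the page HTML.

```typescript
// Sketch only: every name and prompt string here is a hypothetical stand-in.
type ModelCall = (prompt: string) => Promise<string>;

interface FieldSpec {
  label: string; // exact label as shown on the page, e.g. "Email Address"
  type: string;  // e.g. "email" or "text"
  id: string;    // element id used to fill the field later
}

// Stage 1 prompt: page HTML only, no user data.
function buildStage1Prompt(html: string): string {
  return `Extract every form field as a JSON array of {label, type, id}.\n\n${html}`;
}

// Stage 2 prompt: extracted field metadata plus the knowledge base, no HTML.
function buildStage2Prompt(fields: FieldSpec[], knowledgeBase: string): string {
  const fieldList = fields.map((f) => `${f.label} (${f.type})`).join("\n");
  return (
    "Return a JSON object mapping each field label to its value.\n\n" +
    `Fields:\n${fieldList}\n\nKnowledge base:\n${knowledgeBase}`
  );
}

// Keep only values whose key exactly matches a label found in Stage 1,
// and index them by element id so they can be written into the page.
function matchValuesToFields(fields: FieldSpec[], raw: string): Record<string, string> {
  const parsed = JSON.parse(raw) as Record<string, unknown>;
  const valuesById: Record<string, string> = {};
  for (const field of fields) {
    const value = parsed[field.label];
    if (typeof value === "string") {
      valuesById[field.id] = value;
    }
  }
  return valuesById;
}

// Run both stages; each stage may use a different model (cloud or on-device).
async function analyzeForm(
  html: string,
  knowledgeBase: string,
  stage1Model: ModelCall,
  stage2Model: ModelCall
): Promise<Record<string, string>> {
  const fields = JSON.parse(await stage1Model(buildStage1Prompt(html))) as FieldSpec[];
  const raw = await stage2Model(buildStage2Prompt(fields, knowledgeBase));
  return matchValuesToFields(fields, raw);
}
```

Passing the two models as separate parameters keeps the routing decision outside the pipeline, so Stage 2 can be pointed at an on-device model whenever its prompt is small enough.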

2. User Storage Choice (Cloud vs Local)

Another key innovation is giving users full control over where their embeddings are stored.

Why This Matters:

✅ User choice: Privacy-conscious users can keep data local (default mode)
✅ Scalability: Power users can upload hundreds of documents to the cloud
✅ Transparent: UI shows which mode is active and storage usage
✅ Multi-device: Cloud mode enables syncing across devices
✅ GDPR-friendly: Users control their data location
✅ Graceful fallback: If cloud storage fails, the system falls back to local storage
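The storage choice and its fallback could look something like the sketch below. The names `EmbeddingStore`, `MemoryStore`, and `saveWithFallback` are hypothetical, and the in-memory class is only a stand-in so the example runs anywhere; a real local store might wrap IndexedDB or `chrome.storage.local`, and a real cloud store would call a remote service.

```typescript
// Sketch only: these types are assumptions, not the extension's storage layer.
type EmbeddingStorageMode = "local" | "cloud";

interface EmbeddingStore {
  readonly mode: EmbeddingStorageMode;
  save(docId: string, embedding: number[]): Promise<void>;
}

// In-memory stand-in that can be told to fail, to simulate a cloud outage.
class MemoryStore implements EmbeddingStore {
  readonly mode: EmbeddingStorageMode;
  readonly saved: Record<string, number[]> = {};
  private readonly failing: boolean;

  constructor(mode: EmbeddingStorageMode, failing: boolean = false) {
    this.mode = mode;
    this.failing = failing;
  }

  async save(docId: string, embedding: number[]): Promise<void> {
    if (this.failing) {
      throw new Error(`${this.mode} storage unavailable`);
    }
    this.saved[docId] = embedding;
  }
}

// Try the user's preferred store first. If it fails and is not already the
// local store, fall back to local storage. The returned mode says where the
// data actually ended up, so the UI can show it.
async function saveWithFallback(
  preferred: EmbeddingStore,
  local: EmbeddingStore,
  docId: string,
  embedding: number[]
): Promise<EmbeddingStorageMode> {
  try {
    await preferred.save(docId, embedding);
    return preferred.mode;
  } catch (err) {
    if (preferred === local) {
      throw err; // nothing left to fall back to
    }
    await local.save(docId, embedding);
    return local.mode;
  }
}
```

Returning the mode that was actually used, rather than the one requested, is one way to keep the "transparent" promise above: the interface never reports cloud storage when the data silently landed locally.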

Storage Comparison:

Feature	Local Storage	Cloud Storage
Capacity	10MB (~20 docs)	Unlimited (petabytes)
Privacy	100% local	Data in Google Cloud
Speed	<10ms	~500ms
Multi-device	❌ No	✅ Yes
Cost	Free	~$0.02/GB/month

3. Voice Assistant with Gemini Live API and Web Speech API Fallback

The extension provides two voice input options:

Primary: Gemini Live API 2.5 (preferred when available)

Native audio processing with function calling
AI autonomously fills fields by calling JavaScript functions
Real-time bidirectional streaming

Fallback: Web Speech API (when Gemini Live is unavailable)

Browser's built-in speech recognition
Converts speech to text for basic voice input
Works without internet in some browsers
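To illustrate the function-calling idea, here is a minimal sketch of how a model-issued call might be applied to a form. The `ToolCall` shape, the `fill_field` tool name, and the `FormState` type are hypothetical and do not reflect the Gemini Live API's actual message format; the sketch only shows the dispatch and validation step on the extension side, plus the primary/fallback choice.

```typescript
// Sketch only: message shape and tool name are assumptions for illustration.
type VoiceBackend = "gemini-live" | "web-speech";

// Prefer Gemini Live when it is available, otherwise use Web Speech.
function chooseVoiceBackend(geminiLiveAvailable: boolean): VoiceBackend {
  return geminiLiveAvailable ? "gemini-live" : "web-speech";
}

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Field label -> current value.
type FormState = Record<string, string>;

interface ToolResult {
  ok: boolean;
  message: string;
}

// Apply one model-issued call to the form and report the outcome, which
// could then be sent back to the model as the function result.
function handleToolCall(call: ToolCall, form: FormState): ToolResult {
  if (call.name !== "fill_field") {
    return { ok: false, message: `Unknown tool: ${call.name}` };
  }
  const label = call.args["label"];
  const value = call.args["value"];
  if (typeof label !== "string" || typeof value !== "string") {
    return { ok: false, message: "fill_field needs string 'label' and 'value'" };
  }
  // Only fill fields that actually exist on the form.
  if (!Object.prototype.hasOwnProperty.call(form, label)) {
    return { ok: false, message: `No field labelled "${label}"` };
  }
  form[label] = value;
  return { ok: true, message: `Filled "${label}"` };
}
```

Validating the tool name, argument types, and field label before writing anything means a misheard or malformed call leaves the form untouched and produces an error message the model can react to.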


Private user started this project — Nov 01, 2025 02:19 AM EDT
