x | Devpost

Private user posted an update — Nov 11, 2025 01:06 AM EST

Hello, I wanted to share a recent update with several major improvements. I didn’t include these changes in the main submission or the original GitHub repo because I wanted to keep the competition version unchanged.

Since this is a solution I plan to keep improving for my own use, I wanted to mention the latest enhancements. They are significant, and I have released them as a Version 2 in a separate repository.

1. Two-Stage AI Form Analysis (Key Innovation)

One of the major technical innovations is our two-stage AI analysis system, which improves privacy and reduces costs by keeping more of the processing on-device.

The Problem with Single-Stage Analysis: In a traditional single-stage approach, one AI call would need:

Entire HTML page (often 3000+ tokens)
System prompt (~500 tokens)
Custom instructions (~300 tokens)
Personal knowledge base with user data (~2000-3000 tokens)

Total: >6000 tokens
→ Forces processing to cloud (exceeds Gemini Nano limit)
→ User data sent to cloud
→ Privacy concerns + API costs
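To make the arithmetic concrete, here is a minimal TypeScript sketch of the budget check. The names (`PromptPart`, `fitsOnDevice`) are made up for illustration, the 6000-token threshold is simply the figure used in this post, and the knowledge base is taken at the midpoint of its quoted range.

```typescript
// One component of the single-stage prompt, with a rough token estimate.
interface PromptPart {
  name: string;
  tokens: number;
}

// Assumption: the post treats ~6000 tokens as the on-device (Gemini Nano) budget.
const ON_DEVICE_TOKEN_LIMIT = 6000;

// Sum the token estimates of all parts.
function totalTokens(parts: PromptPart[]): number {
  return parts.reduce((sum, part) => sum + part.tokens, 0);
}

// True when the whole prompt fits within the on-device budget.
function fitsOnDevice(
  parts: PromptPart[],
  limit: number = ON_DEVICE_TOKEN_LIMIT
): boolean {
  return totalTokens(parts) <= limit;
}

// Estimates taken from the breakdown above.
const singleStage: PromptPart[] = [
  { name: "html", tokens: 3000 },
  { name: "systemPrompt", tokens: 500 },
  { name: "customInstructions", tokens: 300 },
  { name: "knowledgeBase", tokens: 2500 }, // midpoint of ~2000-3000
];

console.log(totalTokens(singleStage), fitsOnDevice(singleStage));
```

With these estimates the single call adds up to 6300 tokens, so it does not fit the on-device budget.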

Our Two-Stage Solution:

Two-Stage Architecture:

Stage 1: HTML → Gemini 2.5 Flash → Form Structure (exact labels, types, IDs)
Stage 2: Field Labels + Knowledge Base → Gemini 2.5 Flash → Exact Values

Why This is Better for Privacy & Cost:

Stage 1: Form Structure Detection (~3500 tokens)

Input: HTML page only (no user data)
Output: Extracted field labels, types, IDs
Privacy: No sensitive user data is involved, so cloud processing is acceptable

Stage 2: Value Extraction (~2500 tokens)

Input: Only extracted field labels (not entire HTML) + knowledge base
Key benefit: Removed massive HTML, greatly reduced token count
Result: Much higher probability of staying under 6000 tokens
Privacy: More likely to process on-device with Gemini Nano (user data stays local)
Cost: Avoids cloud API calls when possible

Benefits:

✅ Privacy: Stage 2 (with user data) more likely to stay on-device
✅ Cost savings: Fewer cloud API calls = lower costs
✅ Speed: On-device processing is faster (<200ms vs 2s)
✅ Exact field matching: Stage 2 returns {"Email Address": "value"} with exact field labels
✅ Better context: Stage 1 provides field metadata to Stage 2
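The two stages could be wired together roughly as follows. This is a sketch under assumptions, not the code from the repository: `ModelCall` stands in for whatever function actually sends a prompt to the cloud or on-device model, the prompts are placeholders, and the 4-characters-per-token estimate is a crude heuristic.

```typescript
// Field metadata produced by Stage 1.
interface FormField {
  label: string;
  type: string;
  id: string;
}

// Hypothetical model caller: takes a prompt, resolves to the model's text reply.
type ModelCall = (prompt: string) => Promise<string>;

// Crude heuristic (assumption): roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Stage 1 prompt: page HTML only, no user data.
function buildStage1Prompt(html: string): string {
  return `List every form field as a JSON array of {label, type, id}.\n\nHTML:\n${html}`;
}

// Stage 2 prompt: extracted labels plus the knowledge base, no HTML.
function buildStage2Prompt(fields: FormField[], knowledgeBase: string): string {
  const labels = fields.map((f) => `- ${f.label} (${f.type})`).join("\n");
  return (
    "Return a JSON object mapping each exact field label to its value.\n\n" +
    `Fields:\n${labels}\n\nKnowledge base:\n${knowledgeBase}`
  );
}

// Route by estimated size; 6000 is the threshold quoted in this post.
function chooseRoute(prompt: string, limit: number = 6000): "on-device" | "cloud" {
  return estimateTokens(prompt) <= limit ? "on-device" : "cloud";
}

async function analyzeForm(
  html: string,
  knowledgeBase: string,
  cloud: ModelCall,
  onDevice: ModelCall
): Promise<Record<string, string>> {
  // Stage 1: structure detection. The prompt carries no private data.
  // JSON.parse throws if the model reply is not valid JSON.
  const fields = JSON.parse(await cloud(buildStage1Prompt(html))) as FormField[];

  // Stage 2: value extraction. Prefer the on-device model when the prompt fits.
  const prompt = buildStage2Prompt(fields, knowledgeBase);
  const model = chooseRoute(prompt) === "on-device" ? onDevice : cloud;
  return JSON.parse(await model(prompt)) as Record<string, string>;
}
```

Because the Stage 2 prompt is built only from labels and the knowledge base, its size no longer depends on how large the page HTML is.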

2. User Storage Choice (Cloud vs Local)

Another key innovation is giving users full control over where their embeddings are stored.

Why This Matters:

✅ User choice: Privacy-conscious users can keep data local (default mode)
✅ Scalability: Power users can upload hundreds of documents to the cloud
✅ Transparent: UI shows which mode is active and storage usage
✅ Multi-device: Cloud mode enables syncing across devices
✅ GDPR-friendly: Users control their data location
✅ Graceful fallback: If cloud storage fails, the system falls back to local storage

Storage Comparison:

Feature	Local Storage	Cloud Storage
Capacity	10 MB (~20 docs)	Unlimited (petabytes)
Privacy	100% local	Data in Google Cloud
Speed	<10 ms	~500 ms
Multi-device	❌ No	✅ Yes
Cost	Free	~$0.02/GB/month
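The storage choice and the fallback behaviour can be sketched like this. `EmbeddingStore`, `InMemoryStore` and `saveEmbedding` are hypothetical names for illustration; the real extension would put browser storage and a cloud client behind the same kind of interface.

```typescript
type StorageMode = "local" | "cloud";

// Minimal store interface (assumption): save one embedding per document id.
interface EmbeddingStore {
  save(docId: string, embedding: number[]): Promise<void>;
}

// In-memory stand-in used here in place of real local or cloud storage.
class InMemoryStore implements EmbeddingStore {
  readonly items = new Map<string, number[]>();

  async save(docId: string, embedding: number[]): Promise<void> {
    this.items.set(docId, embedding);
  }
}

// Saves to the user's chosen store and reports where the data ended up.
// If the cloud write fails, the embedding is written locally instead.
async function saveEmbedding(
  mode: StorageMode,
  docId: string,
  embedding: number[],
  local: EmbeddingStore,
  cloud: EmbeddingStore
): Promise<StorageMode> {
  if (mode === "cloud") {
    try {
      await cloud.save(docId, embedding);
      return "cloud";
    } catch {
      // Cloud unavailable: fall through to local storage.
    }
  }
  await local.save(docId, embedding);
  return "local";
}
```

Returning the mode that was actually used lets the UI show where the data ended up, which matters when a cloud save silently falls back to local.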

3. Voice Assistant with Gemini Live API and Web Speech API Fallback

The extension provides two voice input options:

Primary: Gemini Live API 2.5 (preferred when available)

Native audio processing with function calling
AI autonomously fills fields by calling JavaScript functions
Real-time bidirectional streaming
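On the extension side, function calling comes down to mapping a model-issued call onto a local function. Here is a minimal sketch: the `ToolCall` shape and the `fill_field` name are assumptions made for illustration, not the actual Gemini Live message format, and `setField` stands in for the code that writes into the page.

```typescript
// Shape of a function call as this sketch assumes it arrives (hypothetical).
interface ToolCall {
  name: string;
  args: { label?: string; value?: string };
}

// Writes a value into the field with the given label; returns false if no such field.
type FieldSetter = (label: string, value: string) => boolean;

interface ToolResult {
  ok: boolean;
  message: string;
}

// Dispatches one model-issued call and builds the result to send back to the model.
function handleToolCall(call: ToolCall, setField: FieldSetter): ToolResult {
  if (call.name !== "fill_field") {
    return { ok: false, message: `Unknown function: ${call.name}` };
  }
  const { label, value } = call.args;
  if (label === undefined || value === undefined) {
    return { ok: false, message: "fill_field needs both label and value" };
  }
  const ok = setField(label, value);
  return {
    ok,
    message: ok ? `Filled "${label}"` : `No field labelled "${label}"`,
  };
}
```

Sending the result back gives the model a chance to recover, for example by asking the user to repeat a field name it could not match.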

Fallback: Web Speech API (when Gemini Live is unavailable)

Browser's built-in speech recognition
Converts speech to text for basic voice input
Works without internet in some browsers
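Choosing between the two voice options could look like the sketch below. `liveApiAvailable` is a hypothetical flag that the extension would set after checking for an API key and connectivity. The only browser detail relied on is that the Web Speech API constructor is exposed as `SpeechRecognition` or, in Chromium-based browsers, as the prefixed `webkitSpeechRecognition`.

```typescript
type VoiceBackend = "gemini-live" | "web-speech" | "none";

// The bits of the environment this decision depends on, passed in explicitly
// so the function can run outside a browser.
interface VoiceEnvironment {
  liveApiAvailable: boolean; // hypothetical: result of the extension's own check
  SpeechRecognition?: unknown; // standard Web Speech API constructor, if present
  webkitSpeechRecognition?: unknown; // prefixed constructor, if present
}

// Prefer Gemini Live; otherwise use browser speech recognition if it exists.
function pickVoiceBackend(env: VoiceEnvironment): VoiceBackend {
  if (env.liveApiAvailable) {
    return "gemini-live";
  }
  const hasWebSpeech =
    env.SpeechRecognition !== undefined ||
    env.webkitSpeechRecognition !== undefined;
  return hasWebSpeech ? "web-speech" : "none";
}
```

In a real page the two constructor properties would be read from `window`. The explicit `"none"` case lets the UI hide the microphone button instead of failing at click time.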
