Hello, I wanted to share a recent update with several major improvements. I didn’t include these changes in the main submission or the original GitHub repo because I wanted to keep the competition version unchanged.
Since this is a solution I plan to keep improving for my own use, I wanted to mention the latest enhancements. They are significant, and I have released them as Version 2 in a separate repository.
1. Two-Stage AI Form Analysis (Key Innovation)
One of the major technical innovations is our two-stage AI analysis system that dramatically improves privacy and reduces costs by keeping more processing on-device:
The Problem with Single-Stage Analysis: In a traditional single-stage approach, one AI call would need:
- Entire HTML page (often 3000+ tokens)
- System prompt (~500 tokens)
- Custom instructions (~300 tokens)
- Personal knowledge base with user data (~2000-3000 tokens)

Total: >6000 tokens
→ Forces processing to cloud (exceeds Gemini Nano limit)
→ User data sent to cloud
→ Privacy concerns + API costs
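To make the arithmetic concrete, here is a minimal sketch that adds up the estimates listed above and checks them against an on-device budget. The names `PromptPart`, `totalTokens`, `fitsOnDevice`, and `ON_DEVICE_TOKEN_LIMIT` are hypothetical and are not taken from the extension's code; the 6000-token limit is simply the figure quoted in this post, and the real limit depends on the model and browser version.

```typescript
// Token estimates copied from the single-stage breakdown above.
// The knowledge base uses the upper end of the quoted 2000-3000 range.
interface PromptPart {
  name: string;
  tokens: number;
}

// Assumed on-device budget, based on the ~6000-token figure in this post.
const ON_DEVICE_TOKEN_LIMIT = 6000;

// Sum the token estimates of all prompt parts.
function totalTokens(parts: PromptPart[]): number {
  return parts.reduce((sum, part) => sum + part.tokens, 0);
}

// True when the whole prompt stays within the on-device budget.
function fitsOnDevice(
  parts: PromptPart[],
  limit: number = ON_DEVICE_TOKEN_LIMIT,
): boolean {
  return totalTokens(parts) <= limit;
}

const singleStageParts: PromptPart[] = [
  { name: "html", tokens: 3000 },
  { name: "systemPrompt", tokens: 500 },
  { name: "customInstructions", tokens: 300 },
  { name: "knowledgeBase", tokens: 3000 },
];
```

With these upper-end estimates, `totalTokens(singleStageParts)` is 6800, so `fitsOnDevice(singleStageParts)` is `false` and the request would have to go to the cloud together with the user data.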
Our Two-Stage Solution:
Two-Stage Architecture:
Stage 1: HTML → Gemini 2.5 Flash → Form Structure (exact labels, types, IDs)
Stage 2: Field Labels + Knowledge Base → Gemini 2.5 Flash → Exact Values

Why This is Better for Privacy & Cost:
Stage 1: Form Structure Detection (~3500 tokens)
- Input: HTML page only (no user data)
- Output: Extracted field labels, types, IDs
- Privacy: No sensitive user data involved, so cloud processing is acceptable (may go to cloud, but contains no private information)

Stage 2: Value Extraction (~2500 tokens)
- Input: Only extracted field labels (not entire HTML) + knowledge base
- Key benefit: Removed massive HTML, greatly reduced token count
- Result: Much higher probability of staying under 6000 tokens
- Privacy: More likely to process on-device with Gemini Nano (user data stays local)
- Cost: Avoids cloud API calls when possible

Benefits:
- ✅ Privacy: Stage 2 (with user data) more likely to stay on-device
- ✅ Cost savings: Fewer cloud API calls = lower costs
- ✅ Speed: On-device processing is faster (<200ms vs 2s)
- ✅ Exact field matching: Stage 2 returns {"Email Address": "value"} with exact field labels
- ✅ Better context: Stage 1 provides field metadata to Stage 2
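The two stages described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the extension's actual code: `FormField`, `estimateTokens`, `buildStage1Prompt`, `buildStage2Prompt`, and `pickBackend` are hypothetical names, the prompt wording is invented, and the 4-characters-per-token estimate is only a crude heuristic. The actual model calls are left out; the sketch only shows how the prompts are split and how a token budget could decide between on-device and cloud processing.

```typescript
// Hypothetical field shape produced by Stage 1.
interface FormField {
  label: string;
  type: string;
  id: string;
}

type Backend = "on-device" | "cloud";

// Assumed budget, taken from the ~6000-token figure quoted in this post.
const STAGE_TOKEN_LIMIT = 6000;

// Crude heuristic: roughly 4 characters per token. Real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Stage 1 prompt: page HTML only, no user data.
function buildStage1Prompt(html: string): string {
  return `Extract every form field as JSON with label, type and id.\n\n${html}`;
}

// Stage 2 prompt: extracted field labels plus the knowledge base; the HTML is omitted.
function buildStage2Prompt(fields: FormField[], knowledgeBase: string): string {
  const labels = fields.map((f) => `- ${f.label} (${f.type})`).join("\n");
  return (
    "Return a JSON object mapping each exact field label to its value.\n\n" +
    `Fields:\n${labels}\n\nKnowledge base:\n${knowledgeBase}`
  );
}

// Route a prompt on-device when it fits the budget, otherwise to the cloud.
function pickBackend(prompt: string, limit: number = STAGE_TOKEN_LIMIT): Backend {
  return estimateTokens(prompt) <= limit ? "on-device" : "cloud";
}
```

Because `buildStage2Prompt` never sees the page HTML, the prompt that carries user data is much smaller, which is what makes `pickBackend` more likely to return `"on-device"` for Stage 2.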
2. User Storage Choice (Cloud vs Local)
Another key innovation is giving users full control over where their embeddings are stored.
Why This Matters:
- ✅ User choice: Privacy-conscious users can keep data local (default mode)
- ✅ Scalability: Power users can upload 100s of documents to cloud
- ✅ Transparent: UI shows which mode is active and storage usage
- ✅ Multi-device: Cloud mode enables syncing across devices
- ✅ GDPR-friendly: Users control their data location
- ✅ Graceful fallback: If cloud storage fails, system falls back to local storage
Storage Comparison:
| Feature | Local Storage | Cloud Storage |
|---|---|---|
| Capacity | 10MB (~20 docs) | Unlimited (petabytes) |
| Privacy | 100% local | Data in Google Cloud |
| Speed | <10ms | ~500ms |
| Multi-device | ❌ No | ✅ Yes |
| Cost | Free | ~$0.02/GB/month |
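The graceful fallback mentioned above could look something like the sketch below. It is a simplified illustration, not the extension's implementation: `EmbeddingStore`, `MemoryStore`, and `saveWithFallback` are hypothetical names, and the stores are in-memory stand-ins. Real local and cloud storage APIs are asynchronous; a synchronous interface is used here only to keep the example short.

```typescript
// Hypothetical storage interface; save() throws when the backend is unavailable.
interface EmbeddingStore {
  readonly name: string;
  save(docId: string, embedding: number[]): void;
}

// In-memory stand-in for either backend. `failing` simulates an outage.
class MemoryStore implements EmbeddingStore {
  readonly name: string;
  readonly items: Record<string, number[]> = {};
  private readonly failing: boolean;

  constructor(name: string, failing: boolean = false) {
    this.name = name;
    this.failing = failing;
  }

  save(docId: string, embedding: number[]): void {
    if (this.failing) {
      throw new Error(`${this.name} storage unavailable`);
    }
    this.items[docId] = embedding;
  }
}

// Try the user's preferred store first; if it throws, fall back to local.
// Returns the name of the store that accepted the write.
function saveWithFallback(
  preferred: EmbeddingStore,
  local: EmbeddingStore,
  docId: string,
  embedding: number[],
): string {
  try {
    preferred.save(docId, embedding);
    return preferred.name;
  } catch {
    local.save(docId, embedding);
    return local.name;
  }
}
```

Returning the name of the store that was actually used lets the UI show which mode is active after a fallback, in line with the transparency point above.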
3. Voice Assistant with Gemini Live API and Web Speech API Fallback
The extension provides two voice input options:
Primary: Gemini Live API 2.5 (preferred when available)
- Native audio processing with function calling
- AI autonomously fills fields by calling JavaScript functions
- Real-time bidirectional streaming
Fallback: Web Speech API (when Gemini Live is unavailable)
- Browser's built-in speech recognition
- Converts speech to text for basic voice input
- Works without internet in some browsers
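The primary/fallback choice above can be expressed as a small selection function. This is a sketch under assumptions: `VoiceBackend`, `VoiceCapabilities`, `hasWebSpeech`, and `chooseVoiceBackend` are hypothetical names, and how the extension actually decides that Gemini Live is available is not described in this post. The Web Speech check relies on browsers exposing `SpeechRecognition` or the prefixed `webkitSpeechRecognition` on the window object.

```typescript
type VoiceBackend = "gemini-live" | "web-speech" | "none";

interface VoiceCapabilities {
  // Assumption: set by the caller, e.g. after a successful connection attempt.
  geminiLiveAvailable: boolean;
  webSpeechAvailable: boolean;
}

// Feature-detect the Web Speech API on a window-like object.
// In a browser this would be called with `window`.
function hasWebSpeech(win: Record<string, unknown>): boolean {
  return "SpeechRecognition" in win || "webkitSpeechRecognition" in win;
}

// Prefer Gemini Live, fall back to Web Speech, otherwise report no voice input.
function chooseVoiceBackend(caps: VoiceCapabilities): VoiceBackend {
  if (caps.geminiLiveAvailable) {
    return "gemini-live";
  }
  if (caps.webSpeechAvailable) {
    return "web-speech";
  }
  return "none";
}
```

Passing the window-like object in as a parameter keeps the detection testable outside a browser.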