Inspiration
We were inspired by a simple irony: we live in the age of 'Big Data,' yet our most valuable personal data, our conversations, is passive. It sits in a 'chat grave' while we make decisions in a browser. We wanted to turn that 'Passive Gold' into 'Active Intelligence.' PS: I forgot to cancel my Netflix subscription, and I miss quizzes almost every time.
What it does
We built a multi-layered event-driven architecture that bridges real-time WhatsApp messaging with live browser context to deliver proactive, context-aware reminders. Argus is a proactive intelligence layer that sits between your private communications and your active web browsing. It eliminates the "Context Gap."
- Passive Data Ingestion
  - The Input: Argus uses the Evolution API to securely listen to your WhatsApp messages.
  - The Storage: Instead of letting messages disappear in a scroll, it stores them in a database (SQLite with FTS5 for Argus's own events and messages; PostgreSQL holds Evolution API's raw message history), turning "chat noise" into a structured personal knowledge base.
- Intent-Based Sensing (The Event-Trigger)
  - The Sensor: As you browse the web (Amazon, Nykaa, college portals, etc.), the Argus Chrome Extension monitors your current URL and page content.
  - The Trigger: It doesn't run 24/7; it triggers only when your browsing suggests an action that might have a corresponding "memory" in your chats.
- Proactive Recall (Powered by Gemini 3)
  - The Brain: Argus sends your current browsing context to Gemini 3.
  - The Match: Gemini queries your past WhatsApp history to find relevant links, dates, or recommendations.
  - The Result: If it finds a match, it doesn't wait for you to search; it injects a smart notification directly into your current webpage.
How we built it
- Data Ingestion & Pipeline (The Ears)
  - Evolution API Integration:
    - We use Evolution API (a Baileys-based WhatsApp Web bridge) to receive real-time message webhooks.
    - Webhook endpoint: /api/webhook/whatsapp receives messages.upsert events.
    - Each message includes: sender ID (remoteJid), message content, timestamp, and push name.
    - Messages are validated, deduplicated by message ID, and queued for processing.
- Intent Sensing (The Eyes)
  - Chrome Extension (Manifest V3):
    - The background service worker maintains a persistent WebSocket connection to the server.
    - A URL listener monitors navigation events via chrome.tabs.onUpdated.
    - Context-aware triggers: only fires on high-intent domains (shopping, travel, insurance, subscriptions).
    - Calls the /api/context-check endpoint with the current URL and page title.
  - Context Detection:
    - Extracts keywords from the URL path, query params, and page title.
    - FTS5 search returns the top 10 candidate events matching those keywords.
    - Each candidate is sent to Gemini for semantic validation (~800ms total).
    - Only events with confidence >0.6 trigger popup overlays.
  - DOM Form Watcher (Insurance Scenario):
    - MutationObserver detects dynamically added form inputs.
    - A regex parser extracts car make/model/year from input field values.
    - Calls /api/form-check to cross-reference against WhatsApp chat history.
    - Triggers a "form mismatch" popup if a discrepancy is detected.
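The regex step of the form watcher can be sketched as follows. This is a minimal illustration in Python (the extension itself presumably runs JavaScript), and `KNOWN_MAKES`, `parse_vehicle`, and `find_mismatches` are hypothetical names chosen for the example, not Argus's actual code.

```python
import re
from typing import List, Optional

# Hypothetical, deliberately short list of makes; a real list would be much longer.
KNOWN_MAKES = ["Honda", "Hyundai", "Maruti Suzuki", "Tata", "Toyota"]

_MAKE_RE = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in KNOWN_MAKES) + r")\b", re.IGNORECASE
)
_YEAR_RE = re.compile(r"\b(19[89]\d|20[0-4]\d)\b")  # model years 1980-2049


def parse_vehicle(text: str) -> Optional[dict]:
    """Extract make, model, and year from a free-form field value."""
    make_match = _MAKE_RE.search(text)
    if make_match is None:
        return None
    make = next(m for m in KNOWN_MAKES if m.lower() == make_match.group(1).lower())
    year_match = _YEAR_RE.search(text)
    # Treat whatever follows the make (minus the year and punctuation) as the model.
    rest = text[make_match.end():]
    if year_match:
        rest = rest.replace(year_match.group(1), " ")
    model = " ".join(re.findall(r"[A-Za-z0-9]+", rest)) or None
    return {"make": make, "model": model, "year": int(year_match.group(1)) if year_match else None}


def find_mismatches(form: dict, chat: dict) -> List[str]:
    """Return the fields where both sides have a value and the values differ."""
    mismatched = []
    for field in ("make", "model", "year"):
        a, b = form.get(field), chat.get(field)
        if a is not None and b is not None and str(a).lower() != str(b).lower():
            mismatched.append(field)
    return mismatched
```

A "form mismatch" popup would then correspond to a non-empty result from `find_mismatches`, for example when the form says 2018 but the chat history says 2019.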
- The Intelligence Layer (The Brain)
  - Google Gemini 3 Flash Preview Integration:
    - Single-call architecture: one Gemini API call per message handles both classification and extraction.
    - The prompt includes the last 5 messages from the same chat for conversation continuity.
    - Output: event type, title, description, time (resolved from relative dates), location, keywords, confidence score.
  - Event Extraction:
    - Handles Hinglish (Hindi + English mix), typos, and informal chat language.
    - Resolves relative dates: "kal" (tomorrow), "Thursday", "next week" → absolute Unix timestamps.
    - Spam filtering: price mentions, forwarded messages, brand accounts → low confidence, skipped.
    - Context URL inference: "cashews in Goa" → context_url: "goa", "cancel netflix" → context_url: "netflix".
  - Action Detection:
    - Gemini analyzes whether a new message references existing events.
    - Actions: cancel, complete, ignore, snooze, modify.
    - For "modify" actions: generates a confirmation popup before updating the event.
    - Cross-event relationship detection: identifies time conflicts (±60 min overlap).
  - QuickSave Context Compression (CEP v9.1):
    - All Gemini prompts use the S2A filter + dense format compression.
    - Event ranking by signal: time proximity, status, recency → top 60 events sent.
    - Dense format: #ID|TYPE|STATUS|"Title"|time|loc|sender|keywords (~40-55% fewer tokens).
    - L2 edge detection: cross-references cancellations, time conflicts, and topic overlaps.
    - Chat memory: older conversation turns are compressed into key facts; the most recent 6 turns stay raw.
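The ranking and dense-format steps might look roughly like the sketch below. The field order follows the dense format quoted above; the `Event` dataclass, the `signal_score` weights, and the function names are assumptions made for illustration, since the writeup does not give the actual scoring formula.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    id: int
    type: str
    status: str
    title: str
    time: Optional[int]  # Unix seconds; None if the event has no time
    location: str = ""
    sender: str = ""
    keywords: List[str] = field(default_factory=list)
    created_at: int = 0  # Unix seconds


def _clean(value: str) -> str:
    # Keep the pipe free for use as the field separator.
    return value.replace("|", "/")


def to_dense_line(e: Event) -> str:
    """Render one event as #ID|TYPE|STATUS|"Title"|time|loc|sender|keywords."""
    title = _clean(e.title).replace('"', "'")
    return "|".join([
        f"#{e.id}", e.type, e.status, f'"{title}"',
        "" if e.time is None else str(e.time),
        _clean(e.location), _clean(e.sender),
        ",".join(_clean(k) for k in e.keywords),
    ])


def signal_score(e: Event, now: int) -> float:
    """Hypothetical weighting of time proximity, status, and recency."""
    score = 0.0
    if e.time is not None:
        hours_away = abs(e.time - now) / 3600
        score += 1.0 / (1.0 + hours_away)          # time proximity
    if e.status in ("discovered", "scheduled"):
        score += 0.5                                # still-actionable status
    age_days = max(0, now - e.created_at) / 86400
    score += 0.25 / (1.0 + age_days)                # recency
    return score


def compress_events(events: List[Event], now: int, limit: int = 60) -> str:
    """Keep the top `limit` events by signal and emit one dense line each."""
    ranked = sorted(events, key=lambda e: signal_score(e, now), reverse=True)
    return "\n".join(to_dense_line(e) for e in ranked[:limit])
```

Dropping JSON keys, braces, and quotes around every field is where most of the token savings would come from in a format like this.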
- Zero-Click Injection (The Voice)
  - Real-time Notification Delivery:
    - A WebSocket connection (/ws) maintains a persistent channel from server to extension.
    - The server pushes event notifications with a popup blueprint generated by Gemini.
    - The blueprint includes: type, title, message, action buttons, styling hints.
  - Content Script Injection:
    - The extension injects a content script (content.js) into the active tab when an event match is found.
    - Uses Shadow DOM to encapsulate popup styles and prevent interference with host page CSS.
    - Popup types: event_discovery, event_reminder, context_reminder, conflict_warning, form_mismatch, snooze_reminder, update_confirm, insight_card.
  - DOM Manipulation:
    - Shadow DOM root attached to document.body.
    - CSS isolation prevents style leakage in/out.
    - Popup positioned with position: fixed for viewport-relative placement.
    - Action buttons call the extension API → backend API (e.g., /api/events/:id/snooze).
- Time-Based Scheduling
  - Scheduler Service:
    - A cron job runs every minute and checks for events with upcoming time triggers.
    - Reminder intervals: 24 hours before, 1 hour before, 15 minutes before.
    - Snooze logic: re-triggers after a user-specified delay (5 min / 30 min / 1 hour).
    - Status updates: event status changes to "reminded" after notification delivery.
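A per-minute scheduler tick could decide which reminder to fire along these lines. The three offsets come from the list above; the function names and the rule of firing only the most imminent overdue reminder (so a late-discovered event does not trigger three popups at once) are assumptions for this sketch.

```python
from typing import Optional, Set

# Reminder offsets in seconds: 24 hours, 1 hour, and 15 minutes before the event.
REMINDER_OFFSETS = (24 * 3600, 3600, 15 * 60)


def next_due_reminder(event_time: int, now: int, sent: Set[int]) -> Optional[int]:
    """Return the offset of the reminder to fire on this tick, or None."""
    if now >= event_time:
        return None  # the event has already started
    due = [o for o in REMINDER_OFFSETS if now >= event_time - o and o not in sent]
    # If several reminders are overdue, fire only the most imminent one.
    return min(due) if due else None


def mark_sent(sent: Set[int], fired_offset: int) -> Set[int]:
    """Record the fired reminder and any earlier ones it supersedes."""
    return sent | {o for o in REMINDER_OFFSETS if o >= fired_offset}


def snooze_until(now: int, delay_minutes: int) -> int:
    """Compute when a snoozed reminder should re-trigger."""
    if delay_minutes not in (5, 30, 60):
        raise ValueError("snooze delay must be 5, 30, or 60 minutes")
    return now + delay_minutes * 60
```

Calling `mark_sent` after each delivery keeps the check idempotent, so the same reminder is not pushed again on the next tick.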
- Storage Architecture
  - SQLite with FTS5 (not PostgreSQL):
    - Primary storage for events, messages, and contacts.
    - PostgreSQL is only used by Evolution API itself (we read from it directly for raw message history).
    - SQLite FTS5 provides full-text search with <10ms query times on 50K+ messages.
    - Schema: events table with status lifecycle (discovered → scheduled → reminded → completed).
    - Metadata preservation: sender, timestamp, conversation context (last 5 messages), keywords.
Challenges we ran into
- WhatsApp Message Deduplication Hell
  - The Problem:
    - Evolution API sends duplicate webhook events for the same message.
    - Simple message ID deduplication wasn't enough.
    - Short titles like "Meeting" would match longer titles like "Meeting with Nityam at 5pm".
    - Events were being falsely marked as duplicates.
  - The Solution:
    - Implemented substring matching prevention.
    - Added temporal window checks (don't dedup if >1 hour apart).
    - Combined message ID + content hash + timestamp validation.
    - Result: false positive rate dropped from ~30% to <2%.
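The combined check can be sketched like this. It is a minimal illustration under assumptions: the `Incoming` record, the whitespace/case normalization, and the use of SHA-256 are choices made for the example, not details confirmed by the writeup.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable

DEDUP_WINDOW_SECONDS = 3600  # don't dedup items more than 1 hour apart


@dataclass(frozen=True)
class Incoming:
    message_id: str
    text: str
    timestamp: int  # Unix seconds


def content_hash(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_duplicate(new: Incoming, seen: Iterable[Incoming]) -> bool:
    new_hash = content_hash(new.text)
    for old in seen:
        # 1. The same webhook delivered twice.
        if old.message_id == new.message_id:
            return True
        # 2. Same content (whole-string equality, never substring) within the window.
        same_content = content_hash(old.text) == new_hash
        close_in_time = abs(new.timestamp - old.timestamp) <= DEDUP_WINDOW_SECONDS
        if same_content and close_in_time:
            return True
    return False
```

Because the comparison is on a hash of the whole normalized string, "Meeting" can no longer match "Meeting with Nityam at 5pm".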
- Gemini "None" Action Swallowing Messages
  - The Problem:
    - Gemini would return isAction: true, action: "none" for normal messages.
    - These messages were treated as "already handled" and skipped.
    - New events weren't being extracted because action detection ran first.
    - We were losing 20-30% of valid events.
  - The Solution:
    - Restructured the pipeline: Classification → Action Detection → Event Extraction.
    - Only skip extraction if the action is NOT "none".
    - Added explicit "none" handling in the action detector.
    - Result: event capture rate went from 70% to 95%+.
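The restructured control flow can be illustrated with the sketch below. The three stage functions are passed in as plain callables that stand in for the Gemini-backed steps; their names and return shapes (apart from the `isAction` and `action` fields quoted above) are assumptions.

```python
from typing import Callable, Optional


def process_message(
    text: str,
    classify: Callable[[str], bool],
    detect_action: Callable[[str], dict],
    extract_event: Callable[[str], Optional[dict]],
) -> dict:
    """Classification -> Action Detection -> Event Extraction."""
    if not classify(text):
        return {"kind": "skipped"}

    action = detect_action(text)
    # Only a real action short-circuits extraction. An "isAction: true" result
    # with action "none" is treated as "no action" and falls through.
    if action.get("isAction") and action.get("action", "none") != "none":
        return {"kind": "action", "action": action["action"]}

    event = extract_event(text)
    if event is None:
        return {"kind": "skipped"}
    return {"kind": "event", "event": event}
```

The key line is the explicit comparison against "none": before the fix, any truthy `isAction` would have ended processing there.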
- Evolution API Auto-Setup Authentication Chaos
  - The Problem:
    - Evolution API has two different response formats for instance listing.
    - Old format: { data: [...instances] }
    - New format: direct array [...instances]
    - Our auto-setup was failing silently with 403 errors.
    - Users had to manually create instances via the web UI.
  - The Solution:
    - Added dual-format response handling.
    - Proper 403 error detection and reporting.
    - Graceful fallback: if auto-create fails, show manual setup instructions.
    - Added detailed error logging for debugging.
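Dual-format handling with explicit 403 reporting could look like the following sketch. It operates on an already-decoded status code and JSON payload, so it makes no assumptions about the HTTP client; `InstanceListError` and `normalize_instances` are hypothetical names.

```python
from typing import Any, List


class InstanceListError(Exception):
    """Raised when the instance list cannot be read, instead of failing silently."""


def normalize_instances(status_code: int, payload: Any) -> List[dict]:
    """Accept both the old { data: [...] } shape and the new bare-array shape."""
    if status_code == 403:
        raise InstanceListError(
            "403 Forbidden while listing instances: check the API key, "
            "or create the instance manually in the web UI"
        )
    if status_code != 200:
        raise InstanceListError(f"Unexpected status {status_code} while listing instances")
    if isinstance(payload, list):  # new format: direct array
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("data"), list):  # old format
        return payload["data"]
    raise InstanceListError("Unrecognized instance list format")
```

The caller can catch `InstanceListError` to log the details and show the manual setup instructions as the fallback.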
- SQLite FTS5 vs Vector Search Performance
  - The Challenge:
    - The initial design considered FAISS/vector embeddings.
    - The prototype showed vector search was overkill and slow.
    - 50K messages → the FAISS index took 2+ seconds to query.
    - FTS5 keyword search seemed "too simple".
  - The Breakthrough:
    - Two-step approach: FTS5 fast filter → Gemini semantic validation.
    - FTS5 returns the top 10 candidates in <10ms.
    - Gemini validates only those 10 (not the entire dataset).
    - Total: ~800ms (FTS5 10ms + Gemini 790ms) vs 2000ms+ (pure vector search).
  - Why It Works:
    - FTS5 eliminates 99.8% of irrelevant events instantly.
    - Gemini only needs to reason about 10 candidates.
    - Best of both worlds: speed + accuracy.
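The two-step approach can be sketched end to end with Python's built-in `sqlite3` module, assuming the SQLite build includes FTS5. The table name `events_fts`, its columns, the stopword list, and the `validate` callback (a stand-in for the Gemini call that returns a confidence between 0 and 1) are all assumptions made for this example.

```python
import re
import sqlite3
from typing import Callable, List, Tuple
from urllib.parse import parse_qs, urlparse

STOPWORDS = {"the", "and", "for", "with", "www", "com", "html"}


def extract_keywords(url: str, title: str) -> List[str]:
    """Collect distinct keywords from the URL path, query values, and page title."""
    parsed = urlparse(url)
    query_values = " ".join(v for vals in parse_qs(parsed.query).values() for v in vals)
    text = f"{parsed.path} {query_values} {title}".lower()
    seen, keywords = set(), []
    for token in re.findall(r"[a-z0-9]+", text):
        if len(token) >= 3 and token not in STOPWORDS and token not in seen:
            seen.add(token)
            keywords.append(token)
    return keywords


def fts_candidates(db: sqlite3.Connection, keywords: List[str], limit: int = 10) -> List[Tuple[int, str]]:
    """Step 1: fast keyword filter. Returns (rowid, title) for the best matches."""
    if not keywords:
        return []
    # Quote each keyword so it is treated as a literal term, and OR them together.
    match_expr = " OR ".join(f'"{k}"' for k in keywords)
    return db.execute(
        "SELECT rowid, title FROM events_fts WHERE events_fts MATCH ? ORDER BY rank LIMIT ?",
        (match_expr, limit),
    ).fetchall()


def match_events(
    db: sqlite3.Connection,
    url: str,
    title: str,
    validate: Callable[[str, str, str], float],
    threshold: float = 0.6,
) -> List[Tuple[int, str, float]]:
    """Step 2: semantic validation of the few candidates that survived step 1."""
    matches = []
    for event_id, event_title in fts_candidates(db, extract_keywords(url, title)):
        confidence = validate(event_title, url, title)
        if confidence > threshold:
            matches.append((event_id, event_title, confidence))
    return matches
```

Only the candidates returned by `fts_candidates` ever reach the validator, which is why the expensive model call stays bounded at 10 items regardless of how many events are stored.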
- QuickSave Compression: Token Budget vs Context Quality
  - The Challenge:
    - Gemini has a 1M token context window, but it's expensive.
    - Sending all 50K messages = $0.50+ per query.
    - We need to send enough context for accuracy while minimizing cost.
    - Initially: sent the last 100 messages → poor accuracy (60%).
  - The Innovation:
    - Implemented QuickSave CEP v9.1 compression.
    - S2A filter: rank events by signal strength → top 60 only.
    - Dense format: #ID|TYPE|"Title"|time instead of verbose JSON.
    - Token reduction: 40-55% fewer tokens for the same information density.
    - L2 edge detection: cross-event relationships compressed into metadata.
  - Results:
    - Accuracy improved from 60% to 94%.
    - Cost per query dropped from $0.0008 to $0.0003.
    - 2x more events in the same token budget.
- Hinglish and Informal Chat Language
  - The Problem:
    - WhatsApp messages are rarely formal English.
    - Mix of Hindi + English ("kal 5 baje milte hai yaar").
    - Typos, abbreviations, voice-to-text errors.
    - GPT-4 struggled with code-switching.
  - The Solution:
    - Switched to Gemini 3 Flash Preview (better multilingual support).
    - Added examples of Hinglish to the system prompt.
    - Relative date resolution: "kal" → tomorrow, "parso" → day after.
    - Aggressive spam filtering to avoid extracting junk.
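In Argus the relative-date resolution is done by Gemini, but the mapping it is expected to produce can be illustrated with a small rule-based sketch. Everything here (the word tables, the function name, resolving to midnight and ignoring times such as "5 baje") is an assumption for illustration only.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

DAY_OFFSETS = {"kal": 1, "tomorrow": 1, "parso": 2}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def resolve_relative_date(phrase: str, now: datetime) -> Optional[int]:
    """Map a relative date phrase to the Unix timestamp of midnight on the target day."""
    lowered = phrase.lower()
    target = None
    for word in re.findall(r"[a-z]+", lowered):
        if word in DAY_OFFSETS:
            target = now + timedelta(days=DAY_OFFSETS[word])
            break
        if word in WEEKDAYS:
            # Next occurrence of that weekday; the same weekday means one week ahead.
            days_ahead = (WEEKDAYS.index(word) - now.weekday()) % 7 or 7
            target = now + timedelta(days=days_ahead)
            break
    if target is None and "next week" in lowered:
        target = now + timedelta(days=7)
    if target is None:
        return None
    midnight = target.replace(hour=0, minute=0, second=0, microsecond=0)
    return int(midnight.timestamp())
```

Hard-coded tables like these break down quickly on real chat text (for instance, "kal" can also mean "yesterday" in Hindi depending on context), which is a good argument for leaving the resolution to the model and using rules only as a sanity check.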
- DOM Form Watching on Dynamic Insurance Pages
  - The Problem:
    - Insurance websites (ACKO, PolicyBazaar) load forms dynamically.
    - The content script is injected before the form exists, so querySelector returns null.
    - We needed a regex to extract "Honda Civic 2018" from messy form field values.
  - The Solution:
    - MutationObserver watches for new DOM nodes.
    - Debounced input event listeners (1.5s delay).
    - Regex patterns for car make/model/year extraction.
    - Fuzzy matching against WhatsApp message history.
    - Result: 85% accuracy on real-world insurance forms.
- Event Lifecycle State Machine Bugs
  - The Problem:
    - Events had 8 possible states (discovered, scheduled, reminded, etc.).
    - State transitions were inconsistent.
    - Ignored events would still trigger reminders.
    - Dismissed events would reappear on page refresh.
  - The Solution:
    - Formalized the state machine with explicit transition rules.
    - Ignored/dismissed events are excluded from Gemini context.
    - Status validation on every API call.
    - Added comprehensive state transition tests.
    - Result: zero invalid state transitions in production.
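An explicit transition table is one common way to formalize such a state machine. The sketch below covers only the states named in this writeup plus "snoozed"; the exact set of 8 states and the allowed transitions are not given, so the `ALLOWED` table is hypothetical.

```python
from typing import Dict, Set

# Hypothetical transition rules; terminal states map to an empty set.
ALLOWED: Dict[str, Set[str]] = {
    "discovered": {"scheduled", "ignored", "dismissed"},
    "scheduled": {"reminded", "snoozed", "completed", "ignored", "dismissed"},
    "reminded": {"snoozed", "completed", "dismissed"},
    "snoozed": {"reminded", "completed", "dismissed"},
    "completed": set(),
    "ignored": set(),
    "dismissed": set(),
}


class InvalidTransition(ValueError):
    pass


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not in the table."""
    if current not in ALLOWED:
        raise InvalidTransition(f"unknown status: {current!r}")
    if new not in ALLOWED[current]:
        raise InvalidTransition(f"{current!r} -> {new!r} is not allowed")
    return new


def should_remind(status: str) -> bool:
    """Only scheduled or snoozed events may trigger a reminder."""
    return status in {"scheduled", "snoozed"}
```

Routing every status change through one function like `transition` is what makes "status validation on every API call" cheap to enforce and easy to test.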
- Real-time Overlay Injection Without Breaking Websites
  - The Problem:
    - Injecting popup overlays into arbitrary websites.
    - Host site CSS would leak into our popup.
    - Our CSS would break the host site layout.
    - Z-index wars with other overlays.
  - The Solution:
    - Shadow DOM for complete style isolation.
    - All our CSS is scoped to the shadow root.
    - position: fixed for viewport-relative positioning.
    - High z-index (999999) to stay on top.
    - Graceful degradation if Shadow DOM is unavailable.
  - Result:
    - Works on 99% of websites without conflicts.
    - Zero reports of broken host page layouts.
Accomplishments that we're proud of
We didn't just build a feature; we solved three fundamental problems:
1. The Search Problem: You can't remember which chat had that restaurant recommendation.
2. The Context Problem: Traditional tools make you search; we predict what you need.
3. The Performance Problem: Vector databases are slow and expensive; FTS5 is fast and free.
What we learned
Simple Tech Often Beats "Advanced" Tech: We prototyped with FAISS vector embeddings because "that's what everyone uses." It was slow (2+ seconds), complex, and overkill. Switching to SQLite FTS5 gave us 100x faster searches (12-18ms) with zero infrastructure. Lesson: Don't reach for fancy tools first; solve the problem with the simplest tech that works, then optimize only if needed.
AI Token Costs Add Up Fast: The initial design sent verbose JSON to Gemini (7,500 tokens per query = $0.0008+ per request). Implementing QuickSave compression cut costs by 60% while improving accuracy. Lesson: Treat tokens like money, because they are. Compression isn't optional at scale.
Real Users Are Messy (And That's Okay): We trained on clean English, but real WhatsApp is Hinglish, typos, and voice-to-text errors. Accuracy was 60% until we added multilingual examples; now it's 94%. Lesson: Test with real, messy data early; academic datasets don't capture how people actually communicate.
Chrome Manifest V3 Is Hard: Service workers are terminated after roughly 30 seconds of inactivity, killing WebSocket connections seemingly at random. We spent days debugging invisible lifecycle issues before implementing reconnection with event replay. Lesson: Browser extensions aren't web apps; plan for worker death from day 1.
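One way to implement reconnection with event replay is to number every pushed event on the server and let the client report the last sequence number it saw when it reconnects. The writeup does not describe Argus's actual mechanism, so the `ReplayBuffer` class, its capacity, and the backoff parameters below are all assumptions.

```python
from collections import deque
from typing import Any, List, Tuple


class ReplayBuffer:
    """Server-side buffer of recent events, keyed by a growing sequence number."""

    def __init__(self, capacity: int = 100) -> None:
        self._events: deque = deque(maxlen=capacity)  # oldest events fall off
        self._next_seq = 1

    def publish(self, payload: Any) -> int:
        """Store an event and return the sequence number assigned to it."""
        seq = self._next_seq
        self._next_seq += 1
        self._events.append((seq, payload))
        return seq

    def replay_since(self, last_seen_seq: int) -> List[Tuple[int, Any]]:
        """Events the client missed while its service worker was dead."""
        return [(seq, payload) for seq, payload in self._events if seq > last_seen_seq]


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds for the client's reconnect attempts."""
    return min(cap, base * (2 ** attempt))
```

On reconnect the extension would send its last seen sequence number, and the server would push `replay_since(last_seen)` before resuming live delivery. Events older than the buffer's capacity are lost, so the capacity has to cover the longest expected outage.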
Two-Step Matching > Single-Step Everything: We tried pure keywords (fast, dumb) and pure Gemini (slow, smart). Hybrid FTS5 → Gemini validation was 5x faster than pure AI and 20% more accurate than keywords alone. Lesson: Combine complementary approaches: let fast heuristics filter, then let AI validate.
What's next for Argus-It never forgets
Mobile App (React Native): The Chrome extension only works on desktop, but most WhatsApp users are mobile-first. We're building iOS + Android apps with background message monitoring and local notifications. We expect this to reach 40% more users and enable true cross-device memory sync. Timeline: 2-3 weeks.
Multi-Platform Support (Telegram, Slack, Email): Our memory is fragmented across apps: WhatsApp, Slack, email, Telegram. We'll integrate the Telegram Bot API, Slack OAuth, and the Gmail API into one unified event index, giving complete memory across all communication channels. Timeline: 1-2 weeks per platform.
Voice Interface: Add browser speech recognition for hands-free queries like "Hey Argus, what did Rahul say about Goa?" with text-to-speech responses. We estimate this will improve engagement by 30%, and it makes Argus accessible to visually impaired users. Timeline: 1-2 weeks.
Smart Calendar Integration: Two-way Google Calendar sync with auto-event creation from WhatsApp, conflict detection across chat + calendar, and "Add to calendar" buttons on popups. This replaces manual calendar management, and we estimate it will increase utility by 18%. Timeline: 3-5 days.
Team/Enterprise Features: Build shared event pools with permissions, @mention detection in group chats, project-specific filtering, and admin dashboards. This opens a B2B revenue opportunity, with a target of 15% enterprise adoption. Timeline: 2-3 weeks.
Enhancement
We built a branded website to give people more information about Argus and to make the download more accessible.