Inspiration

We've all been there—opening TikTok "just to check" and suddenly 90 minutes have vanished. After yet another 2-hour scrolling session, we asked: "What if an AI agent could literally scroll for us?"

With Gemini Live API's real-time vision capabilities, we realized we could build something radical: an autonomous agent that watches your feed, understands what matters, and delivers a curated digest—so you never have to scroll again.

This isn't a productivity hack. It's a mental health intervention.


What it does

Unhooked is a Chrome extension that scrolls your social media feeds for you using Google's Gemini Live API.

The experience:

  1. Set your curator goal: "AI news, startup funding, cooking recipes"
  2. Click "Start Curating"
  3. The agent takes over—scrolling automatically while streaming your tab as video to Gemini Live
  4. Gemini watches in real-time and narrates observations via audio transcription
  5. The agent adapts: PAUSE on interesting content, SKIP ads instantly, SLOW for potential relevance
  6. After 3-10 minutes, you get a rich digest with Must-Read highlights, noise breakdown, and time saved

Result: Stay informed in 3 minutes instead of scrolling for an hour.


How we built it

Architecture:

Chrome Tab → Tab Capture (1 FPS) → Offscreen Document → WebSocket
    ↓
Gemini Live API (bidiGenerateContent)
    ↓
Audio Transcription → Scroll Commands → Content Script
    ↓
Adaptive Scroll State Machine (PAUSE/SLOW/FAST/UP)

Tech Stack:

  • Gemini Live API (bidiGenerateContent) for real-time video analysis
  • Chrome Manifest V3 (service worker, offscreen document, content script)
  • Tab Capture API → JPEG frames at 1 FPS
  • Google GenAI SDK (@google/genai)
  • generateContent for digest generation

Key Innovation: We parse Gemini's natural language observations ("This looks interesting, let's pause") into scroll commands, creating a true autonomous agent that acts on your behalf.


Challenges we ran into

1. Real-time video streaming latency — Solved by reducing to 1 FPS (640x360, 70% JPEG quality), cutting latency from 3-4s to <1s

2. Parsing natural language into commands — Built keyword-based parser with priority levels and auto-resume timeouts

3. State management across extension components — Background worker as single source of truth with heartbeat pings

4. Platform bot detection — Humanized scroll patterns with randomized speeds, variable intervals, and micro-scrolls

5. Token costs — Optimized to 6-8K tokens per 3-min session (~$0.0009) through frame skipping and prompt compression


Accomplishments that we're proud of

🏆 Genuinely autonomous — Takes physical action (scrolling) based on real-time vision, not just responding to prompts

🎥 <1 second latency — True real-time multimodal interaction using Gemini Live's audio transcription

No backend required — Pure Chrome extension using Google's public APIs

🎯 Measurable impact — Beta testers save 7-12 minutes per 3-minute session (3-4x compression), filter 75-85% noise

🛡️ Privacy-first — Everything local, no analytics, full transparency

💡 Novel use case — First autonomous browser control using Gemini Live API


What we learned

Technical:

  • 1 FPS is the sweet spot for scroll analysis (balances cost and completeness)
  • Natural language is a valid command interface—better than forcing JSON structure
  • outputAudioTranscription enables real-time agent narration with <1s latency
  • Manifest V3 requires careful message-passing architecture

Design:

  • Transparency builds trust—showing every PAUSE/SKIP decision made users confident
  • "Time saved" metric resonates more than technical stats
  • Casual digest language ("Found some solid AI updates") beats formal summaries

Philosophy:

  • AI agents should augment human intention, not replace agency
  • The future isn't chatbots—it's invisible agents handling tedious digital labor
  • Gemini Live API makes autonomous workflows possible today

What's next for Unhooked

Short-term:

  • Chrome Web Store launch with free/paid tiers
  • Voice digests using Gemini's audio output
  • User feedback loop to improve relevance

Medium-term:

  • Cross-platform digest (Twitter + Instagram + LinkedIn + Reddit in one)
  • Smart notifications for truly urgent content
  • Learning preferences over time

Long-term vision:

  • Universal attention filter for email, meetings, Slack, YouTube
  • Agent personas (Minimalist Monk, News Junkie, Social Butterfly)
  • B2B version for competitive intelligence

The goal: Compress the internet's information overload into the precise 10 minutes that actually matters to you.

Unhooked is just the beginning. The future is agentic—and we're building it.

Built With

Share this project:

Updates