Inspiration
We've all been there—opening TikTok "just to check" and suddenly 90 minutes have vanished. After yet another 2-hour scrolling session, we asked: "What if an AI agent could literally scroll for us?"
With Gemini Live API's real-time vision capabilities, we realized we could build something radical: an autonomous agent that watches your feed, understands what matters, and delivers a curated digest—so you never have to scroll again.
This isn't a productivity hack. It's a mental health intervention.
What it does
Unhooked is a Chrome extension that scrolls your social media feeds for you using Google's Gemini Live API.
The experience:
- Set your curator goal: "AI news, startup funding, cooking recipes"
- Click "Start Curating"
- The agent takes over—scrolling automatically while streaming your tab as video to Gemini Live
- Gemini watches in real-time and narrates observations via audio transcription
- The agent adapts: PAUSE on interesting content, SKIP ads instantly, SLOW for potential relevance
- After 3-10 minutes, you get a rich digest with Must-Read highlights, noise breakdown, and time saved
Result: Stay informed in 3 minutes instead of scrolling for an hour.
How we built it
Architecture:
Chrome Tab → Tab Capture (1 FPS) → Offscreen Document → WebSocket
↓
Gemini Live API (bidiGenerateContent)
↓
Audio Transcription → Scroll Commands → Content Script
↓
Adaptive Scroll State Machine (PAUSE/SLOW/FAST/UP)
Tech Stack:
- Gemini Live API (
bidiGenerateContent) for real-time video analysis - Chrome Manifest V3 (service worker, offscreen document, content script)
- Tab Capture API → JPEG frames at 1 FPS
- Google GenAI SDK (
@google/genai) - generateContent for digest generation
Key Innovation: We parse Gemini's natural language observations ("This looks interesting, let's pause") into scroll commands, creating a true autonomous agent that acts on your behalf.
Challenges we ran into
1. Real-time video streaming latency — Solved by reducing to 1 FPS (640x360, 70% JPEG quality), cutting latency from 3-4s to <1s
2. Parsing natural language into commands — Built keyword-based parser with priority levels and auto-resume timeouts
3. State management across extension components — Background worker as single source of truth with heartbeat pings
4. Platform bot detection — Humanized scroll patterns with randomized speeds, variable intervals, and micro-scrolls
5. Token costs — Optimized to 6-8K tokens per 3-min session (~$0.0009) through frame skipping and prompt compression
Accomplishments that we're proud of
🏆 Genuinely autonomous — Takes physical action (scrolling) based on real-time vision, not just responding to prompts
🎥 <1 second latency — True real-time multimodal interaction using Gemini Live's audio transcription
⚡ No backend required — Pure Chrome extension using Google's public APIs
🎯 Measurable impact — Beta testers save 7-12 minutes per 3-minute session (3-4x compression), filter 75-85% noise
🛡️ Privacy-first — Everything local, no analytics, full transparency
💡 Novel use case — First autonomous browser control using Gemini Live API
What we learned
Technical:
- 1 FPS is the sweet spot for scroll analysis (balances cost and completeness)
- Natural language is a valid command interface—better than forcing JSON structure
outputAudioTranscriptionenables real-time agent narration with <1s latency- Manifest V3 requires careful message-passing architecture
Design:
- Transparency builds trust—showing every PAUSE/SKIP decision made users confident
- "Time saved" metric resonates more than technical stats
- Casual digest language ("Found some solid AI updates") beats formal summaries
Philosophy:
- AI agents should augment human intention, not replace agency
- The future isn't chatbots—it's invisible agents handling tedious digital labor
- Gemini Live API makes autonomous workflows possible today
What's next for Unhooked
Short-term:
- Chrome Web Store launch with free/paid tiers
- Voice digests using Gemini's audio output
- User feedback loop to improve relevance
Medium-term:
- Cross-platform digest (Twitter + Instagram + LinkedIn + Reddit in one)
- Smart notifications for truly urgent content
- Learning preferences over time
Long-term vision:
- Universal attention filter for email, meetings, Slack, YouTube
- Agent personas (Minimalist Monk, News Junkie, Social Butterfly)
- B2B version for competitive intelligence
The goal: Compress the internet's information overload into the precise 10 minutes that actually matters to you.
Unhooked is just the beginning. The future is agentic—and we're building it.
Built With
- chrome
- gemini
- javascript
- live-api

Log in or sign up for Devpost to join the conversation.