🤖 AI Doc Tab Grouper: Innovating Tab Management with Gemini
About the Project
The AI Doc Tab Grouper is a Google Chrome extension designed to combat digital clutter by intelligently grouping open document-related tabs (Google Docs, Notion, GitHub, Miro, etc.). Using the powerful Gemini API on a Cloud Run backend, the extension analyzes the content and context of tabs to automatically create logical, project-based tab groups. This moves tab management from a tedious manual task to a seamless, automated process, allowing users to instantly regain focus and organization.
💡 Inspiration
The project was inspired by a common frustration: having dozens of open tabs, many of which are documents or project-related pages, making it impossible to switch context or even find the correct tab. Existing tab management tools rely on simple domain or title matching. We realized that true organization requires understanding the content and *context*. The rise of large language models (LLMs) like Gemini provided the perfect opportunity to build a solution that could read snippets of multiple documents, grasp the underlying themes, and propose intelligent groupings.
🏗️ How We Built It (The Architecture)
Our solution employs a three-part architecture:
Google Chrome Service Worker (`service-worker.js`):
- Manages the entire grouping lifecycle, triggered by a click on the extension icon.
- Identifies relevant document tabs based on a predefined list of URL patterns (e.g., `*://docs.google.com/*`, `*://*.atlassian.net/*`).
- Dynamically injects the content script (`content-script.js`) into all relevant tabs.
- Implements a 15-second timeout and robust state tracking for each tab (e.g., `INJECTED_WAITING_FOR_DATA`, `DATA_RECEIVED_READY_TO_GROUP`).
- Manages user feedback via the extension badge (showing counts like '7' and status like 'AI...').
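
To illustrate the URL-pattern step, here is a minimal sketch of how a list of match patterns could be turned into a tab filter. The function names (`patternToRegExp`, `filterDocTabs`) are hypothetical and the pattern parser is deliberately simplified; it is not taken from the project's `service-worker.js`.

```javascript
// URL patterns quoted in the write-up; a real list would likely be longer.
const DOC_URL_PATTERNS = ["*://docs.google.com/*", "*://*.atlassian.net/*"];

// Escape regex metacharacters (everything except "*", which is handled separately).
const escapeRegExp = (s) => s.replace(/[.+?^${}()|[\]\\]/g, "\\$&");

// Convert a simplified match pattern (scheme://host/path) into a RegExp.
// Assumption: only http/https schemes and "*." host prefixes are supported.
function patternToRegExp(pattern) {
  const parts = /^(\*|https?):\/\/([^/]+)(\/.*)$/.exec(pattern);
  if (!parts) throw new Error(`Unsupported pattern: ${pattern}`);
  const [, scheme, host, path] = parts;
  const schemeRe = scheme === "*" ? "https?" : scheme;
  let hostRe;
  if (host === "*") hostRe = "[^/]+";
  else if (host.startsWith("*.")) hostRe = "(?:[^/]+\\.)?" + escapeRegExp(host.slice(2));
  else hostRe = escapeRegExp(host);
  const pathRe = path.split("*").map(escapeRegExp).join(".*");
  return new RegExp(`^${schemeRe}://${hostRe}${pathRe}$`);
}

// Keep only tab-like objects whose URL matches at least one document pattern.
function filterDocTabs(tabs, patterns = DOC_URL_PATTERNS) {
  const regexes = patterns.map(patternToRegExp);
  return tabs.filter(
    (tab) => typeof tab.url === "string" && regexes.some((re) => re.test(tab.url))
  );
}
```

Inside an extension, `chrome.tabs.query({ url: DOC_URL_PATTERNS })` accepts match patterns directly, so a pure function like this is mainly useful for unit testing the pattern list outside the browser.
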
Google Chrome Content Script (`content-script.js`):
- Runs within the context of the webpage.
- Uses advanced DOM selectors (e.g., `[role="textbox"]`, `.kix-zoomservice`) to safely and accurately extract a text snippet (up to 500 characters) from the active document.
- Sends the extracted data (title, URL, snippet) back to the Service Worker via `chrome.runtime.sendMessage`.
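
The extraction step might look roughly like the following sketch, which tries selectors in priority order and returns the first one that yields enough text. The two leading selectors come from the description above; the `main` and `body` fallbacks, the minimum-length threshold, and the function name `extractSnippet` are assumptions made for illustration.

```javascript
// Selectors ordered from most to least preferred.
// "main" and "body" are illustrative fallbacks (assumptions).
const SNIPPET_SELECTORS = ['[role="textbox"]', ".kix-zoomservice", "main", "body"];
const MAX_SNIPPET_LENGTH = 500; // snippet cap stated in the write-up
const MIN_SNIPPET_LENGTH = 20; // assumed threshold for "too short"

// `root` is anything with a querySelector method, e.g. `document`.
function extractSnippet(root, selectors = SNIPPET_SELECTORS) {
  for (const selector of selectors) {
    const element = root.querySelector(selector);
    // Collapse whitespace so layout-only text does not count as content.
    const text = (element?.textContent ?? "").replace(/\s+/g, " ").trim();
    if (text.length >= MIN_SNIPPET_LENGTH) {
      return { ok: true, selector, snippet: text.slice(0, MAX_SNIPPET_LENGTH) };
    }
  }
  return { ok: false, error: "CONTENT_TOO_SHORT" };
}
```

In a content script, the result of `extractSnippet(document)` could be combined with `document.title` and `location.href` and passed to `chrome.runtime.sendMessage`. Taking `root` as a parameter keeps the function testable without a browser.
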
Cloud Run Backend (Python/Flask with Gemini API):
- Receives the aggregated tab data from the Service Worker.
- Feeds the list of tab objects into the Gemini 2.5 Flash model with a system prompt defining its role as an "expert AI Tab Grouping Agent."
- Utilizes Structured Output (JSON Schema) to guarantee the model returns a predictable, parsable format: a list of groups, each with a `group_title`, a `rationale`, and a list of `tab_ids`.
- The Service Worker then processes this structured JSON to create the new tab groups in Chrome.
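
As a sketch of the final step, the Service Worker could check the returned groups against the tabs it actually sent before touching the browser. The field names match the schema described above; `validateGroups`, `applyGroups`, and the assumption that `tab_ids` are Chrome's integer tab IDs are illustrative and not confirmed by the project.

```javascript
// Validate the backend response and drop tab IDs that are unknown or
// already assigned to an earlier group.
function validateGroups(payload, knownTabIds) {
  if (!Array.isArray(payload)) throw new TypeError("Expected a JSON array of groups");
  const known = new Set(knownTabIds);
  const assigned = new Set();
  const groups = [];
  for (const group of payload) {
    const { group_title, rationale, tab_ids } = group ?? {};
    if (typeof group_title !== "string" || typeof rationale !== "string" || !Array.isArray(tab_ids)) {
      throw new TypeError("Group is missing group_title, rationale, or tab_ids");
    }
    const ids = [];
    for (const id of tab_ids) {
      if (known.has(id) && !assigned.has(id)) {
        assigned.add(id);
        ids.push(id);
      }
    }
    // Skip groups that end up empty after filtering.
    if (ids.length > 0) groups.push({ group_title, rationale, tab_ids: ids });
  }
  return groups;
}

// Create one Chrome tab group per validated group.
// `api` is the `chrome` object in an extension, or a fake in tests.
async function applyGroups(groups, api) {
  for (const group of groups) {
    const groupId = await api.tabs.group({ tabIds: group.tab_ids });
    await api.tabGroups.update(groupId, { title: group.group_title });
  }
}
```

In the extension this would be called as `applyGroups(validateGroups(json, sentTabIds), chrome)`, where `sentTabIds` is a hypothetical list of the tab IDs included in the request; `chrome.tabGroups` requires the `tabGroups` permission in the manifest.
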
🎓 What We Learned
- Asynchronous Communication in Chrome Extensions: We mastered the crucial pattern of initiating an action in the Service Worker, injecting a script into multiple tabs, waiting for all asynchronous responses (or timeouts), and then aggregating the data back into the Service Worker to kick off the next stage (the API call).
- The Power of Structured Output: Using Gemini's ability to enforce a JSON schema was essential. It transformed an unreliable text-generation task into a highly reliable data-generation task, ensuring the final grouping logic was robust.
- Frontend Error Handling is Key: The most common issues were race conditions and unhandled errors. Implementing features like a lock (`isProcessing`) and a centralized, structured logging system (`Logger`) was vital for stability.
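
The first and third lessons can be sketched together: collect one result per tab with a timeout, and guard the whole run with a lock so a second click cannot start an overlapping run. This is a simplified illustration, not the project's code. `requestData` is a hypothetical function returning a promise of one tab's data, and the `TIMED_OUT_OR_FAILED` state name is an assumption; the other two state names and the 15-second limit come from the description above.

```javascript
const TAB_TIMEOUT_MS = 15_000; // 15-second limit from the write-up
let isProcessing = false; // lock against overlapping runs

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("TIMEOUT")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Wait for every tab to either answer or time out, then aggregate.
async function collectTabData(tabIds, requestData, timeoutMs = TAB_TIMEOUT_MS) {
  const states = new Map(tabIds.map((id) => [id, "INJECTED_WAITING_FOR_DATA"]));
  const settled = await Promise.allSettled(
    tabIds.map((id) =>
      // Promise.resolve().then(...) also captures synchronous throws.
      withTimeout(Promise.resolve().then(() => requestData(id)), timeoutMs)
    )
  );
  const results = [];
  settled.forEach((outcome, i) => {
    const id = tabIds[i];
    if (outcome.status === "fulfilled") {
      states.set(id, "DATA_RECEIVED_READY_TO_GROUP");
      results.push({ tab_id: id, ...outcome.value });
    } else {
      states.set(id, "TIMED_OUT_OR_FAILED"); // hypothetical state name
    }
  });
  return { results, states };
}

// Run one collection pass; skip if another pass is still in flight.
async function runGrouping(tabIds, requestData, timeoutMs = TAB_TIMEOUT_MS) {
  if (isProcessing) return { skipped: true };
  isProcessing = true;
  try {
    return { skipped: false, ...(await collectTabData(tabIds, requestData, timeoutMs)) };
  } finally {
    isProcessing = false; // always release the lock, even on errors
  }
}
```

Using `Promise.allSettled` means one slow or failing tab cannot block the others, and the `finally` block ensures an unexpected error does not leave the extension permanently locked.
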
🚧 Project Story: Core Architectural Challenges
This section details the most critical technical blockers encountered during development and the engineering solutions implemented to ensure a reliable, enterprise-grade user experience.
| Challenge | Impact on Logs/Code | Solution Implemented |
|---|---|---|
| 1. Suspended Tabs Block Data Collection | Background document tabs (like Google Docs) often enter a suspended (sleeping) state to conserve memory. Attempting to inject the content script via `chrome.scripting.executeScript` into a sleeping tab would indefinitely freeze the Service Worker, causing the entire app to stall. | The `pokingTabsForExecution()` Utility. We implemented a non-blocking `chrome.tabs.query` check before injection. This function effectively "pokes" the tab awake without freezing the main thread, so that the subsequent script injection succeeds and data is collected. |
| 2. LLM Output Reliability | Generative models often return non-JSON data, incorrectly formatted JSON, or highly variable structure, which breaks the programmatic tab grouping logic. `service-worker.js` would fail on `JSON.parse` errors or receive useless data. | Mandatory Structured Output (JSON Schema). In the Flask backend (`app.py`), we enforce a strict `response_schema` using Gemini's `GenerationConfig`. This forces the AI to return a clean, predictable JSON array, making the app's output reliable enough for programmatic use. |
| 3. Unreliable Snippet Extraction | Different document hosts (Google Docs, Notion, SharePoint) use unique and frequently updated HTML structures. Logs showed `CONTENT_TOO_SHORT` errors for many tabs, crippling the AI's ability to determine context. | Cascading DOM Scraper. In `content-script.js`, we implemented a tiered selector strategy. This prioritizes robust accessibility roles (`[role="textbox"]`) before falling back to less reliable vendor classes, yielding a usable snippet of up to 500 characters for the LLM. |
By overcoming these challenges, we transformed a promising idea into a resilient, production-ready AI solution that fundamentally improves the user's tab experience.