
🤖 AI Doc Tab Grouper: Innovating Tab Management with Gemini

About the Project

The AI Doc Tab Grouper is a Google Chrome extension designed to combat digital clutter by intelligently grouping open document-related tabs (Google Docs, Notion, GitHub, Miro, etc.). Using the powerful Gemini API on a Cloud Run backend, the extension analyzes the content and context of tabs to automatically create logical, project-based tab groups. This moves tab management from a tedious manual task to a seamless, automated process, allowing users to instantly regain focus and organization.

💡 Inspiration

The project was inspired by a common frustration: having dozens of open tabs, many of them documents or project-related pages, makes it nearly impossible to switch context or even find the right tab. Most existing tab management tools rely on simple domain or title matching. We realized that true organization requires understanding the content and *context*. The rise of large language models (LLMs) like Gemini provided the perfect opportunity to build a solution that could read snippets of multiple documents, grasp the underlying themes, and propose intelligent groupings.

🏗️ How We Built It (The Architecture)

Our solution employs a three-part architecture:

Google Chrome Service Worker (service-worker.js):
- Manages the entire grouping lifecycle, triggered by a click on the extension icon.
- Identifies relevant document tabs based on a predefined list of URL patterns (e.g., *://docs.google.com/*, *://*.atlassian.net/*).
- Dynamically injects the content script (content-script.js) into all relevant tabs.
- Implements a 15-second timeout and robust state tracking for each tab (e.g., INJECTED_WAITING_FOR_DATA, DATA_RECEIVED_READY_TO_GROUP).
- Manages user feedback via the extension badge (showing counts like '7' and status like 'AI...').
Google Chrome Content Script (content-script.js):
- Runs within the context of the webpage.
- Uses a prioritized set of DOM selectors (e.g., [role="textbox"], .kix-zoomservice) to extract a text snippet of up to 500 characters from the page's document.
- Sends the extracted data (title, URL, snippet) back to the Service Worker via chrome.runtime.sendMessage.
Cloud Run Backend (Python/Flask with Gemini API):
- Receives the aggregated tab data from the Service Worker.
- Feeds the list of tab objects into the Gemini 2.5 Flash model with a system prompt defining its role as an "expert AI Tab Grouping Agent."
- Utilizes Structured Output (JSON Schema) to guarantee the model returns a predictable, parsable format: a list of groups, each with a group_title, a rationale, and a list of tab_ids.
- Returns this structured JSON to the Service Worker, which uses it to create the new tab groups in Chrome.
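To make the URL-pattern filtering concrete, here is a small Python sketch of how match patterns such as `*://docs.google.com/*` select tabs. In the extension itself Chrome can do this natively, because `chrome.tabs.query` accepts match patterns in its `url` option; the translator below (`pattern_to_regex`, `is_document_tab`) is a hypothetical, simplified illustration that ignores ports and other edge cases of the real match-pattern grammar.

```python
import re

# The two example patterns quoted above; the full list lives in service-worker.js.
DOC_PATTERNS = ["*://docs.google.com/*", "*://*.atlassian.net/*"]

def pattern_to_regex(pattern):
    """Translate a simplified Chrome match pattern (scheme://host/path) into a regex."""
    scheme, rest = pattern.split("://", 1)
    host, _, path = rest.partition("/")
    # In match patterns, a "*" scheme means http or https only.
    scheme_re = "https?" if scheme == "*" else re.escape(scheme)
    if host == "*":
        host_re = "[^/]+"
    elif host.startswith("*."):
        # "*.atlassian.net" matches atlassian.net itself and any subdomain of it.
        host_re = r"(?:[^/]+\.)?" + re.escape(host[2:])
    else:
        host_re = re.escape(host)
    # A "*" in the path matches any run of characters.
    path_re = ".*".join(re.escape(part) for part in ("/" + path).split("*"))
    return re.compile(f"^{scheme_re}://{host_re}{path_re}$")

COMPILED = [pattern_to_regex(p) for p in DOC_PATTERNS]

def is_document_tab(url):
    """True if the URL matches any of the document patterns."""
    return any(rx.match(url) for rx in COMPILED)
```

For example, `is_document_tab("https://acme.atlassian.net/wiki/spaces/ENG")` is true, while a YouTube or `chrome://` URL is rejected.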

🎓 What We Learned

Asynchronous Communication in Chrome Extensions: We mastered the crucial pattern of initiating an action in the Service Worker, injecting a script into multiple tabs, waiting for all asynchronous responses (or timeouts), and then aggregating the data back into the Service Worker to kick off the next stage (the API call).
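The inject, wait, and aggregate pattern can be sketched with Python's `asyncio`, whose single-threaded event loop behaves much like the Service Worker's. This is an illustration under assumptions, not the code in `service-worker.js`: `collect_snippets` and `fetch_snippet` are hypothetical names, `fetch_snippet` stands in for injecting the content script and awaiting its message, the 15-second limit is treated as one deadline for the whole round, and `FAILED` and `TIMED_OUT` are invented states alongside the two named above.

```python
import asyncio
from enum import Enum

class TabState(Enum):
    INJECTED_WAITING_FOR_DATA = "INJECTED_WAITING_FOR_DATA"
    DATA_RECEIVED_READY_TO_GROUP = "DATA_RECEIVED_READY_TO_GROUP"
    FAILED = "FAILED"        # hypothetical: injection or messaging raised an error
    TIMED_OUT = "TIMED_OUT"  # hypothetical: no data before the deadline

async def collect_snippets(tab_ids, fetch_snippet, timeout_s=15.0):
    """Fan out one request per tab; wait until all answer or the deadline passes."""
    states = {tab_id: TabState.INJECTED_WAITING_FOR_DATA for tab_id in tab_ids}
    results = {}
    if not tab_ids:
        return results, states

    async def collect_one(tab_id):
        try:
            results[tab_id] = await fetch_snippet(tab_id)
            states[tab_id] = TabState.DATA_RECEIVED_READY_TO_GROUP
        except asyncio.CancelledError:
            raise  # deadline hit; the state is fixed up below
        except Exception:
            states[tab_id] = TabState.FAILED

    tasks = [asyncio.create_task(collect_one(tab_id)) for tab_id in tab_ids]
    _, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()
    # Let the cancellations finish before reading the final states.
    await asyncio.gather(*pending, return_exceptions=True)
    for tab_id, state in states.items():
        if state is TabState.INJECTED_WAITING_FOR_DATA:
            states[tab_id] = TabState.TIMED_OUT
    return results, states

# Usage with a fake fetcher: tab 1 answers, tab 2 fails, tab 3 never answers in time.
async def fake_fetch(tab_id):
    if tab_id == 2:
        raise RuntimeError("injection failed")
    if tab_id == 3:
        await asyncio.sleep(10)
    return {"title": f"Doc {tab_id}", "snippet": "..."}

results, states = asyncio.run(collect_snippets([1, 2, 3], fake_fetch, timeout_s=0.1))
```

Keeping whatever arrived by the deadline, instead of failing the whole round, lets grouping proceed with the tabs that did respond.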
The Power of Structured Output: Using Gemini's ability to enforce a JSON schema was essential. It transformed an unreliable text-generation task into a highly reliable data-generation task, ensuring the final grouping logic was robust.
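A sketch of the structured-output contract in Python. The field names (`group_title`, `rationale`, `tab_ids`) come from the architecture description; the exact schema in `app.py` is not shown, so `GROUPING_SCHEMA` is an assumption of what might be passed as `response_schema`. A schema constrains the shape of the output but not whether the IDs are real, so the hypothetical `parse_groups` helper also drops unknown tab IDs and keeps each tab in at most one group; the project description does not say it does this.

```python
import json

# Assumed response schema; field names come from the project description.
GROUPING_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "group_title": {"type": "string"},
            "rationale": {"type": "string"},
            "tab_ids": {"type": "array", "items": {"type": "integer"}},
        },
        "required": ["group_title", "rationale", "tab_ids"],
    },
}
REQUIRED = GROUPING_SCHEMA["items"]["required"]

def parse_groups(raw, known_tab_ids):
    """Parse the model's JSON and keep only groups that reference tabs we sent."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of groups")
    groups, seen = [], set()
    for item in data:
        if not isinstance(item, dict) or any(key not in item for key in REQUIRED):
            raise ValueError(f"malformed group: {item!r}")
        ids = []
        for tab_id in item["tab_ids"]:
            # Skip IDs we never sent and IDs already claimed by an earlier group.
            if tab_id in known_tab_ids and tab_id not in seen:
                ids.append(tab_id)
                seen.add(tab_id)
        if ids:
            groups.append({"group_title": item["group_title"],
                           "rationale": item["rationale"],
                           "tab_ids": ids})
    return groups
```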
Frontend Error Handling is Key: The most common issues were race conditions and unhandled errors. Implementing features like a lock (isProcessing) and a centralized, structured logging system (Logger) were vital for stability.
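The `isProcessing` lock and the structured `Logger` can be illustrated with the following Python sketch. `Logger`, `on_icon_clicked`, and the event names are hypothetical stand-ins for the extension's JavaScript. Because both the Service Worker and `asyncio` run on a single-threaded event loop, a plain boolean is enough, provided the check and the set happen with no `await` in between; the `finally` block releases the lock even when grouping fails.

```python
import asyncio
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")

class Logger:
    """Hypothetical structured logger: every event becomes one JSON line."""

    def __init__(self, name):
        self._log = logging.getLogger(name)
        self.events = []  # kept in memory so events can be inspected later

    def info(self, event, **fields):
        record = {"event": event, **fields}
        self.events.append(record)
        self._log.info(json.dumps(record))

log = Logger("tab-grouper")
is_processing = False  # mirrors the extension's isProcessing flag

async def on_icon_clicked(run_grouping):
    """Run one grouping pass unless another pass is already in flight."""
    global is_processing
    if is_processing:
        log.info("click_ignored", reason="already_processing")
        return False
    is_processing = True  # no await between the check and the set
    try:
        await run_grouping()
        log.info("grouping_finished")
        return True
    except Exception as exc:
        log.info("grouping_failed", error=str(exc))
        return False
    finally:
        is_processing = False

# Usage: two rapid clicks; the second arrives while the first is still running.
async def slow_grouping():
    await asyncio.sleep(0.05)

async def two_clicks():
    return await asyncio.gather(on_icon_clicked(slow_grouping),
                                on_icon_clicked(slow_grouping))

first, second = asyncio.run(two_clicks())
```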

🚧 Project Story: Core Architectural Challenges

This section details the most critical technical blockers we hit during development and the engineering solutions we implemented to make the extension reliable.

1. Suspended Tabs Block Data Collection
- Impact on logs/code: Background document tabs (like Google Docs) often enter a suspended (sleeping) state to conserve memory. Attempting to inject the content script via `chrome.scripting.executeScript` into a sleeping tab would freeze the Service Worker indefinitely, stalling the entire app.
- Solution implemented: The `pokingTabsForExecution()` utility. Before injection, it runs a non-blocking `chrome.tabs.query` check that "pokes" the tab awake without blocking the Service Worker, so the subsequent script injection succeeds and data is collected.

2. LLM Output Reliability
- Impact on logs/code: Generative models often return non-JSON data, malformed JSON, or highly variable structures, which break the programmatic tab grouping logic. `service-worker.js` would fail on `JSON.parse` errors or receive useless data.
- Solution implemented: Mandatory Structured Output (JSON Schema). In the Flask backend (`app.py`), we enforce a strict `response_schema` through Gemini's `GenerationConfig`. This forces the model to return a clean, predictable JSON array, making its output reliable enough for programmatic use.

3. Unreliable Snippet Extraction
- Impact on logs/code: Different document hosts (Google Docs, Notion, SharePoint) use distinct and frequently updated HTML structures. Logs showed `CONTENT_TOO_SHORT` errors for many tabs, crippling the AI's ability to determine context.
- Solution implemented: Cascading DOM Scraper. In `content-script.js`, we implemented a tiered selector strategy that tries robust accessibility roles (`[role="textbox"]`) before falling back to less stable vendor classes, greatly improving the odds of extracting a useful snippet of up to 500 characters for the LLM.
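The cascading scraper reduces to trying selectors in priority order and accepting the first one that yields enough text. The Python sketch below models that logic without a browser: `query` stands in for something like `document.querySelector(selector)?.innerText` in `content-script.js`, only the two selectors named above are listed (the real list is presumably longer), and the 50-character `MIN_CHARS` threshold for `CONTENT_TOO_SHORT` is a hypothetical value.

```python
from typing import Callable, Optional

# Priority order: accessibility roles first, vendor-specific classes after.
SELECTORS = ['[role="textbox"]', ".kix-zoomservice"]
MAX_CHARS = 500  # snippet cap from the project description
MIN_CHARS = 50   # hypothetical threshold below which a snippet is "too short"

def extract_snippet(query: Callable[[str], Optional[str]]) -> dict:
    """Return the first sufficiently long text found, trying selectors in order."""
    for selector in SELECTORS:
        text = query(selector)
        if text is None:
            continue  # selector matched nothing on this page
        text = " ".join(text.split())  # collapse whitespace and line breaks
        if len(text) >= MIN_CHARS:
            return {"status": "OK", "selector": selector, "snippet": text[:MAX_CHARS]}
    return {"status": "CONTENT_TOO_SHORT", "selector": None, "snippet": ""}
```

For a quick check, a plain dictionary can play the page: `extract_snippet(page.get)`, where `page` maps selectors to the text they would return.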

By overcoming these challenges, we transformed a promising idea into a resilient, production-ready AI solution that fundamentally improves the user's tab experience.

Built With

apis
chrome
cors
css
dom
flask
gcp
gemini
google-cloud-run
json
python
schema
sdk
vertex

Updates

Jason Menjivar started this project — Oct 17, 2025 02:35 AM EDT
