Inspiration
The primary inspiration for this project was the promise of truly on-device, private AI within the Chrome browser, driven by Gemini Nano. I realized that users constantly disrupt their workflow: copying text, switching to a chat app, asking for a summary, then copying the result back. My goal was to eliminate this friction entirely by embedding the full power of an LLM directly into the user's primary interface: the right-click context menu. I aimed to create a tool that made complex content analysis feel instantaneous and native to the web browsing experience.
What it does
The Nano-Navigator: Contextual Content Adapter is a comprehensive Chrome Extension that transforms selected text into actionable intelligence using the Gemini API. Accessible via a single right-click on any webpage text selection, the extension provides five distinct, high-value functions:
- Summarize (Key Points): Condenses dense paragraphs into an easy-to-read bulleted list.
- Rewrite (Simplify): Adjusts complex text to a simpler reading level (e.g., 5th grade).
- Proofread & Correct: Fixes grammar, spelling, and style, listing the changes made.
- Translate to...: Translates the selected text to any user-specified language.
- Custom Prompt: Lets the user execute any specific generative request (e.g., "Write a tweet" or "Explain this concept using analogies") against the selected context.
How we built it
I architected the project as a standard Manifest V3 Chrome Extension.
- Architecture: I used a single service_worker.js file to manage all the background tasks, including API key storage (chrome.storage.local) and setting up the custom right-click menus (chrome.contextMenus).
- UI/Interaction: All user interface elements, such as the loading indicators, result modals, and input fields, are handled by functions I injected into the active webpage's DOM using chrome.scripting.executeScript. This design is crucial for passing the user's selected text and ensuring a non-disruptive, contextual experience.
- The Gemini Bridge: Each feature (Summarize, Rewrite, etc.) is mapped to a specific system instruction and sent to the Cloud Gemini API via authenticated fetch requests, ensuring reliable and robust performance that was not possible with the experimental local APIs.
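The service-worker wiring described above might look like the following sketch (Manifest V3). The menu ids, titles, and function names are illustrative, not the project's actual identifiers; the `chrome.contextMenus`, `chrome.runtime.onInstalled`, and `chrome.scripting.executeScript` calls are the real extension APIs, so this only executes inside an extension context:

```javascript
// One context-menu entry per feature, shown only when text is selected.
const MENU_ITEMS = [
  { id: "summarize", title: "Summarize (Key Points)" },
  { id: "rewrite", title: "Rewrite (Simplify)" },
  { id: "proofread", title: "Proofread & Correct" },
  { id: "translate", title: "Translate to..." },
  { id: "custom", title: "Custom Prompt" },
];

// Runs inside the webpage after injection; in the real extension this
// would build the loading indicator / result modal in the page's DOM.
function showLoadingModal(feature, selectedText) {
  /* build modal DOM here */
}

// Register the menus once, on install.
function registerMenus() {
  for (const item of MENU_ITEMS) {
    chrome.contextMenus.create({ ...item, contexts: ["selection"] });
  }
}

// Wire the click handler: the user's selection travels to the injected
// function via `args`, keeping the experience contextual.
function wireHandlers() {
  chrome.runtime.onInstalled.addListener(registerMenus);
  chrome.contextMenus.onClicked.addListener((info, tab) => {
    chrome.scripting.executeScript({
      target: { tabId: tab.id },
      func: showLoadingModal,
      args: [info.menuItemId, info.selectionText],
    });
  });
}
```

Passing `info.selectionText` through `args` is what lets the injected UI work with the selected text without any copy-paste round trip.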
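The per-feature "Gemini Bridge" could be sketched as below. The `SYSTEM_INSTRUCTIONS` map, instruction wording, and function names are assumptions for illustration; the endpoint and payload shape follow the public Gemini REST API (`v1beta` `generateContent`):

```javascript
// Each context-menu feature maps to a dedicated system instruction.
const SYSTEM_INSTRUCTIONS = {
  summarize: "Summarize the user's text as a concise bulleted list of key points.",
  rewrite: "Rewrite the user's text at roughly a 5th-grade reading level.",
  proofread: "Correct grammar, spelling, and style, then list the changes made.",
  translate: "Translate the user's text into the requested target language.",
};

// Build the JSON body for a generateContent request.
function buildRequestBody(feature, selectedText) {
  return {
    system_instruction: { parts: [{ text: SYSTEM_INSTRUCTIONS[feature] }] },
    contents: [{ role: "user", parts: [{ text: selectedText }] }],
  };
}

// Send the selection to the Cloud Gemini API (gemini-2.5-flash).
async function callGemini(feature, selectedText, apiKey) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `gemini-2.5-flash:generateContent?key=${apiKey}`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildRequestBody(feature, selectedText)),
  });
  if (!res.ok) throw new Error(`Gemini API error: ${res.status}`);
  const data = await res.json();
  return data.candidates[0].content.parts[0].text;
}
```

Keeping the system instruction separate from the user's selection means each feature is just a different entry in the map, not a different code path.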
Challenges we ran into
The largest and most defining challenge I faced was the unavailability of the on-device Gemini Nano APIs (chrome.ai) in my development environment, even after enabling the required flags in Chrome Canary. The API object was consistently reported as undefined in both the main console and the service worker console. This forced a complete pivot in my API strategy:
- Pivot from On-Device to Cloud: I switched from relying on the local Nano model to using the Gemini 2.5 Flash Cloud API. This ensured the project could be completed and delivered a high-quality user experience, fulfilling the core contest requirement to use a Gemini API.
- API Key Management: The pivot necessitated building a secure mechanism to prompt the user for their API key and store it locally (chrome.storage.local), adding a layer of configuration complexity that the local API was designed to avoid.
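The key-management mechanism described above could be sketched like this, assuming it runs inside the extension's service worker where `chrome.storage.local` is available (the storage key name and helper names are illustrative):

```javascript
// Read the stored key; resolves undefined if the user hasn't set one yet.
function getApiKey() {
  return new Promise((resolve) => {
    chrome.storage.local.get("geminiApiKey", (items) =>
      resolve(items.geminiApiKey)
    );
  });
}

// Persist a key entered by the user.
function saveApiKey(key) {
  return new Promise((resolve) =>
    chrome.storage.local.set({ geminiApiKey: key.trim() }, resolve)
  );
}

// Mask the key for any status UI (pure helper, no chrome.* APIs).
function maskKey(key) {
  if (!key || key.length < 8) return "(not set)";
  return key.slice(0, 4) + "…" + key.slice(-4);
}
```

Storing the key in `chrome.storage.local` keeps it on the user's machine rather than in the extension's source, and masking it before display avoids leaking it in screenshots.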
Accomplishments that we're proud of
I am most proud of the project's robustness and functional completeness. Despite the initial technical roadblock with Gemini Nano's local API, I successfully rebuilt the core logic to harness the full power of the Cloud Gemini API. The resulting extension seamlessly delivers five distinct functions, all integrated into the native browser context menu. The rapid, resilient transition demonstrates strong problem-solving and adaptive development.
What we learned
I learned a critical lesson about working with cutting-edge, experimental technology: always have a functional contingency plan. While the initial vision centered on privacy-focused on-device AI, I successfully translated the functional requirements of the Prompt, Summarizer, Rewriter, Proofreader, and Translator APIs into structured system instructions for the powerful Cloud Gemini model. This reinforced the idea that the power of Gemini lies not just in its deployment method, but in its ability to execute complex, targeted tasks reliably through prompt engineering.
What's next for Nano-Navigator: Contextual Content Adapter
- Hybrid Architecture Relaunch: As the local Gemini Nano APIs stabilize, I plan to implement a true hybrid system: defaulting to the fast, private, on-device Nano model when available, and automatically falling back to the cloud model for users whose devices do not support the local model, or for complex tasks that exceed Nano's context window.
- Side Panel UI: Replace the injected modals with a persistent Chrome Side Panel UI to provide a cleaner interface for API key status, result history, and the custom prompt input.
- Multimodal Integration: I will explore the integration of visual processing by analyzing images on the page (if a future version of the Gemini API allows access to DOM elements for context) for tasks like object recognition or image-based translation.
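The hybrid fallback described in the roadmap could be sketched as simple feature detection plus routing. The `LanguageModel` global and the context limit below are assumptions, since the experimental on-device Prompt API surface has changed across Chrome releases:

```javascript
// Feature detection: assumes the on-device Prompt API exposes a global
// LanguageModel object when Gemini Nano is available.
function nanoAvailable() {
  return typeof LanguageModel !== "undefined";
}

// Route a request: prefer the fast, private on-device model, and fall
// back to the cloud for unsupported devices or prompts that exceed
// Nano's context window (the 6000-character limit is a placeholder).
function pickBackend(promptLength, nanoContextLimit = 6000) {
  if (!nanoAvailable() || promptLength > nanoContextLimit) return "cloud";
  return "on-device";
}
```

Because the routing decision is isolated in one function, the rest of the extension can stay identical whichever backend serves the request.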
Built With
- css
- gemini-cloud-api
- html
- javascript
- promptapi
- proofreaderapi
- rewriterapi
- summarizerapi
- translatorapi