Inspiration

As a developer who had recently ventured into Chrome extension development, I was immediately intrigued when Chrome announced its Built-in AI capabilities. Having already created two basic extensions, I saw this as an opportunity to push my boundaries and build something more ambitious. The idea for Gemini Spark was born from a simple question: "How can we make Google Workspace more powerful with AI?"

What it does

Gemini Spark is a Chrome extension that enhances Google Workspace applications using Chrome's native AI capabilities. It's designed to be a comprehensive productivity tool that seamlessly integrates with:

  • Gmail: Smart Compose, Smart Reply, Auto Reply, Smart Summarize, Smart Translate
  • Google Docs: AI Writer, Paraphrasing Tools, Document Insights, Translation
  • Google Sheets: Generate Formula and Formula Explainer
  • Google Slides: Slide Content Generator and Presentation Outline Creator
  • AI Chat: Generate text, write stories, and chat with AI
  • Other Tools: Tab Summarizer, Right-click Summarizer

How I Built It

  • I started by thoroughly studying the documentation for Chrome's Built-in AI Early Preview Program to understand the capabilities and limitations of the available APIs. I then implemented the core features with Chrome's AI APIs, focusing on the Gemini Nano model, which runs on-device.

  • For Gmail, I built the smart compose and automated response features on the built-in AI APIs, which required careful handling of real-time streaming responses. I also wrote URL detection functions so that the right-click context menu appears only when the user is on Gmail and has text highlighted.

  • In Google Docs, I implemented paraphrasing tools and real-time translation features. This required creating a modular architecture that could handle different types of text transformations while maintaining document formatting. The translation feature was particularly challenging, requiring specific language pack implementations for English ↔ Spanish and English ↔ Japanese pairs.

  • I used Rollup.js to bundle the JavaScript modules and @mozilla/readability to extract the main text content of web pages. I also added error handling throughout the application to cope with the experimental APIs' unpredictable behavior and keep the user experience smooth.

  • The development process was iterative, with continuous testing and refinement.
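
One way to get the Gmail-only, selection-only right-click behavior described above is to let Chrome filter the menu itself. The sketch below is an illustration, not the extension's actual code: the menu id, title and helper names are hypothetical. It relies on two real `chrome.contextMenus.create` options, `contexts: ["selection"]` and `documentUrlPatterns`, and keeps an explicit `isGmailUrl` check in the click handler as a second guard.

```javascript
// Hypothetical menu id and title; the real extension's names may differ.
const MENU_ID = "gemini-spark-summarize";

// True only for pages served by Gmail's web client.
function isGmailUrl(url) {
  try {
    return new URL(url).hostname === "mail.google.com";
  } catch {
    return false; // empty or malformed URL
  }
}

// Options for chrome.contextMenus.create():
// contexts: ["selection"] shows the item only when text is highlighted,
// documentUrlPatterns limits it to Gmail pages.
function buildMenuOptions() {
  return {
    id: MENU_ID,
    title: "Summarize with Gemini Spark",
    contexts: ["selection"],
    documentUrlPatterns: ["https://mail.google.com/*"],
  };
}

// Register only when running inside the extension's service worker.
if (typeof chrome !== "undefined" && chrome.contextMenus) {
  chrome.runtime.onInstalled.addListener(() => {
    chrome.contextMenus.create(buildMenuOptions());
  });

  chrome.contextMenus.onClicked.addListener((info, tab) => {
    // Second guard: right menu item, non-empty selection, Gmail page.
    if (info.menuItemId !== MENU_ID || !info.selectionText) return;
    if (!isGmailUrl(info.pageUrl || "")) return;
    // Hand info.selectionText to the AI feature here,
    // e.g. by messaging the content script running in tab.id.
  });
}
```

This assumes the script runs as the extension's background service worker and that the manifest requests the `contextMenus` permission. Letting `documentUrlPatterns` do the filtering means the menu item never appears on other sites, so the handler's own check is only a safety net.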
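
Handling streaming responses, as mentioned above, means rendering the model's answer while it is still being generated. The sketch below assumes only that the model returns an async iterable of string chunks, as the Prompt API's `promptStreaming()` stream does in Chrome; how the session is created changed between Chrome builds, so that part is omitted. Builds also differed in whether each chunk held the full text so far or only the new piece, so the hypothetical `mergeChunk` helper accepts both with a simple prefix heuristic.

```javascript
// Merges one streamed chunk into the text received so far.
// Heuristic: a chunk that starts with the full text so far is treated as
// cumulative (the whole response up to now); anything else is a delta.
function mergeChunk(soFar, chunk) {
  return chunk.startsWith(soFar) ? chunk : soFar + chunk;
}

// Reads an async iterable of string chunks (such as a promptStreaming()
// stream) and calls render() with the complete text after every chunk.
async function renderStream(stream, render) {
  let text = "";
  for await (const chunk of stream) {
    text = mergeChunk(text, chunk);
    render(text);
  }
  return text;
}
```

In the extension this might be called as `renderStream(session.promptStreaming(prompt), (t) => { outputEl.textContent = t; })`, where `session` and `outputEl` are assumed to exist. The prefix check can misread a delta that happens to repeat the entire text received so far; if the target Chrome build's behavior is known, a fixed mode is simpler.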

Challenges I ran into

  1. Initial Setup and Configuration Challenges

  2. Language Model Challenges

    • Attempted to implement Chinese and Spanish responses but discovered these weren't fully supported
    • Learned through the developer group that the model had recently been updated to focus on English responses for better security and quality
    • Had to limit our scope to supported language pairs (English ↔ Spanish and English ↔ Japanese) after understanding the current limitations
  3. Language API Implementation Issues

    • Initially attempted to implement translation across multiple language pairs without reading the documentation thoroughly
    • Discovered we needed to enable specific language pack limits
    • Finally implemented a working solution with default English ↔ Spanish and English ↔ Japanese language pairs
    • Learned the importance of thorough documentation review before implementation
  4. Untested Language Error

    • Frequently encountered the "model attempted to output text in an untested language" error
    • Text length affected whether the error appeared (for example, 12-word summaries failed, while 60+ words worked)
    • Apps Script code generation suffered particularly from this issue
    • Discovered through the community that this is a known limitation being actively addressed
    • Observed that the model sometimes hallucinated while generating code, producing incorrect results
    • Had to remove certain features from the final submission because of Gemini Nano's current limitations with code generation, but kept their code for future implementation
  5. Model Execution Issues

    • Encountered "The model was available but there was not an execution config available" error
    • Resolved by performing a clean installation of Chrome Canary
    • Learned the importance of maintaining a clean development environment
  6. API Availability Constraints

    • Initially planned to use the Writer and Rewriter APIs
    • Discovered through announcements that these APIs were not yet available for direct use and would be released later
    • Had to pivot and modify our feature set accordingly
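
Limiting translation to the supported pairs can be enforced before any API call, so users get a clear message instead of a failure deep inside an experimental API. A minimal sketch, assuming BCP 47 language tags and a hypothetical `SUPPORTED_PAIRS` allow-list matching the English ↔ Spanish and English ↔ Japanese pairs above; the translation call itself is left out because that API surface changed between Chrome builds.

```javascript
// Hypothetical allow-list mirroring the pairs enabled for the submission.
const SUPPORTED_PAIRS = new Set(["en>es", "es>en", "en>ja", "ja>en"]);

// Reduces tags like "en-US" or "JA" to their lowercase primary subtag.
function primaryLanguage(tag) {
  return String(tag).trim().toLowerCase().split(/[-_]/)[0];
}

// True when the source -> target pair is in the allow-list.
function isSupportedPair(source, target) {
  const key = `${primaryLanguage(source)}>${primaryLanguage(target)}`;
  return SUPPORTED_PAIRS.has(key);
}

// Throws a readable error before the translation API is ever called.
function assertSupportedPair(source, target) {
  if (!isSupportedPair(source, target)) {
    throw new Error(
      `Translation ${source} -> ${target} is not supported yet; ` +
        "use English <-> Spanish or English <-> Japanese."
    );
  }
}
```

Keeping the list in one place means adding a language pair later is a one-line change, once the corresponding language pack is available.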
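
The workarounds described above (skipping very short inputs and treating the "untested language" and "execution config" messages as known, recoverable failures) can be wrapped around whatever function performs the model call. This is a sketch under stated assumptions: `summarizeSafely`, `MIN_WORDS` and the `{ ok, summary, hint }` result shape are hypothetical, the 60-word threshold is only the empirical observation above rather than a documented limit, and errors are matched on message text because no stable error codes are assumed.

```javascript
// Empirical threshold from the observation that 12-word summaries failed
// while 60+ words worked; not a documented limit of the model.
const MIN_WORDS = 60;

// Known messages from the experimental model, matched as substrings.
const KNOWN_ERRORS = [
  { match: "untested language", hint: "Try a longer input, or rephrase the request in English." },
  { match: "not an execution config", hint: "The model is not ready; a clean Chrome Canary install fixed this in testing." },
];

function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Calls summarize(text) and turns failures into a result object the UI
// can display, instead of letting the promise reject unhandled.
async function summarizeSafely(text, summarize) {
  if (countWords(text) < MIN_WORDS) {
    return { ok: false, hint: `Select at least ${MIN_WORDS} words to summarize.` };
  }
  try {
    return { ok: true, summary: await summarize(text) };
  } catch (err) {
    const message = String((err && err.message) || err);
    const known = KNOWN_ERRORS.find((e) => message.includes(e.match));
    return { ok: false, hint: known ? known.hint : "The on-device model failed; please try again." };
  }
}
```

Returning a result object instead of rethrowing lets a popup or side panel show the hint directly; `summarize` can be any async function, for example one that calls the Summarization API.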

Accomplishments I'm proud of

  1. API Integration: Integrated multiple Chrome AI APIs into a cohesive product
  2. User Experience: Created an intuitive interface for complex AI features
  3. Performance Optimization: Achieved fast response times using on-device processing
  4. Error Handling: Developed robust error handling for experimental APIs
  5. Community Engagement: Learned a lot from the Chrome AI developer community

What I learned

  1. Technical Skills

    • Deep understanding of Chrome's AI APIs
    • Experience with real-time streaming responses
    • Expertise in content script injection and management
    • Advanced Chrome extension architecture design
    • Error handling for machine learning models
  2. Best Practices

    • Documentation-first approach
    • Community engagement importance
    • Feature prioritization strategies
    • Robust error handling
    • User-centric design

What's next for Gemini Spark: AI-Powered Chrome Extension with 15+ Features

  1. Enhanced Language Support

    • Expand to more language pairs
    • Implement robust translation features
  2. New Features

    • Integration with more Google Workspace applications
    • Advanced code generation capabilities
    • Better fine-tuning options
  3. Technical Improvements

    • Performance optimization for larger documents and text
    • Offline capability enhancements
    • Better error handling and recovery
  4. User Experience

    • More customization options
    • Enhanced accessibility features
  5. RAG Implementation

    • Retrieval-Augmented Generation for accurate responses
    • Custom knowledge bases support
    • Document-specific context awareness
    • Google Drive integration
  6. Advanced Features

    • Context-aware template and chat generation
    • Advanced formatting options for generated content

Built With

  • chrome-context-menus-api
  • chrome-generative-content-api
  • chrome-native-messaging-api
  • chrome-scripting-api
  • chrome-sidepanel-api
  • chrome-storage-api
  • chrome-tabs-api
  • css
  • gemini
  • gemininano
  • gmail
  • html
  • javascript
  • language-api
  • mozilla/readability
  • node.js
  • promptapi
  • rollup.js
  • summarization-api