About the Project
Inspiration: AI Browsers
I was very much inspired by Perplexity's Comet and OpenAI's Atlas because they are going to transform how we use the web and what the concept of a search engine means. They represent a paradigm shift from simple searching to intelligent contextual assistance.
I did notice a gap, however. Though Google does offer AI search modes, the user experience still involves tab-switching, one tab for the AI and another tab for the content being accessed.
The Vision: Work Seamlessly in the Browser
What if AI could be built directly into any webpage? This was the vision for creating Cloudey, a Chrome extension that enables contextual comprehension, voice interaction, and the ability to take autonomous actions.
The Technical Journey
At first, an AI assistant was not really the goal. Instead, I envisioned a small pill-shaped extension that would sit and contextually analyze tabs to make references. However, when expanding the idea, this approach became limiting. That is when the idea of using the side panel sparked, providing better structured space and enabling a full chat interface.
The biggest challenge was truly building an agentic system rather than a typical chatbot. I learned that real AI assistance requires context extraction, intent recognition, action planning, and autonomous execution.
Built using the Web Speech API for real-time voice recognition and the Chrome Extension APIs to utilize native browser capabilities. Getting microphone permissions to work reliably was a significant challenge, requiring careful handling of permission flows and audio input management. Building systems that can control the web autonomously also presented unique technical hurdles.
For UI design, I really wanted it to look good, something people would want to use. I drew heavy inspiration from components at 21st.dev and adapted them into regular JavaScript-compatible components.
The Result
A functioning AI assistant that can help you with what you're doing. Its core is powered by Gemini, which orchestrates various API calls including the Translate API, Write and Rewrite API as seen in the agentic features, and Web Speech API for the voice interaction capabilities.
Built With
- chrome
- chrome-devtools
- chrome-extension-api
- css
- extension-manifest-v3
- gemini-api
- gemini-nano
- git
- html
- javascript
- translate-api
- web-speech-api