About Mina
I've always wanted an AI assistant that truly lives in my browser and understands what I'm doing. That’s why I built Mina. It's an AI assistant built right into the Chrome Side Panel, designed to be the perfect companion for browsing. You can chat, get instant summaries of long articles, rewrite text you've highlighted, or even get an analysis of what's on your screen.
Inspiration
Honestly, my main inspiration was the challenge itself! But I also just really wanted this tool for my own use. I'm always drowning in tabs and articles, and I wanted something that could instantly tell me what's important on a page without me having to read the whole thing.
My "big idea" was for it to not just read text, but also see the screen. That’s where the screen analysis feature came from—letting the AI analyze charts, images, or UI designs directly. I wanted to build something that felt like a native part of Chrome, not just another popup.
How I Built It
Mina is a Manifest V3 extension, and I really dug deep into the new Chrome APIs to make it work.
- Side Panel API: This was the obvious choice for the UI. It makes Mina feel truly integrated, like it belongs right next to the webpage.
- Gemini API (Streaming): All the AI power comes from the Gemini API (I included both Flash and Pro models). I made sure to implement streaming, so the response types out word-by-word. It just feels so much more responsive and "live" that way.
- Context Menus API: A classic, but so useful. You can just right-click any text on a page to instantly 'Summarize' or 'Rewrite' it. This opens Mina with the selected text ready to go.
- Scripting API (for Smart Summaries): This was a cool part. Instead of just grabbing all the messy text from a page (like ads and menus), I use the Scripting API to inject a script that intelligently finds the real content (like the main <article> or [role="main"]). This gives the AI much cleaner text, so the summaries are way more accurate.
- Offscreen API & Screen Capture: This was the most complex feature, but I'm really proud of it. When you press 'Ctrl' and drag, it uses chrome.tabs.captureVisibleTab to get an image, then sends that image data and the crop coordinates to an Offscreen document. A small script there draws it on a canvas, crops it, and sends the final jpeg to the Gemini vision model. It took a while to get right!
- Storage API: This is used to safely store the user's Gemini API key and their preferences (like dark/light mode).
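To make the streaming bullet above more concrete, here is a minimal sketch of how streamed text could be pulled out of the response. It assumes the server-sent-events form of the Gemini streaming endpoint, where each event arrives as a "data: {json}" line. The SseTextExtractor class and the GeminiChunk interface are hypothetical names for illustration, not Mina's actual code.

```typescript
// Only the fields this sketch reads from each streamed chunk (assumed shape).
interface GeminiChunk {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Hypothetical helper: buffers raw network text and returns the text pieces
// from every complete "data: {...}" line received so far.
class SseTextExtractor {
  private buffer = "";

  push(rawChunk: string): string[] {
    this.buffer += rawChunk;
    const lines = this.buffer.split("\n");
    // The last element may be an unfinished line; keep it for the next push.
    this.buffer = lines.pop() ?? "";

    const pieces: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;
      const payload = trimmed.slice("data:".length).trim();
      if (payload === "") continue;

      let parsed: GeminiChunk;
      try {
        parsed = JSON.parse(payload) as GeminiChunk;
      } catch {
        continue; // Skip lines that are not valid JSON.
      }
      // Read text parts from the first candidate only.
      for (const part of parsed.candidates?.[0]?.content?.parts ?? []) {
        if (part.text) pieces.push(part.text);
      }
    }
    return pieces;
  }
}
```

In a side panel, the decoded output of response.body.getReader() (via TextDecoder) could be fed into push(), with each returned piece appended to the message bubble as it arrives.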
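The crop step in the screen capture bullet above can be sketched as a small pure function. The write-up doesn't say how the drag coordinates are mapped onto the captured image, so this assumes the drag is measured in CSS pixels and the captured image is scaled by the device pixel ratio, which is typical on high-DPI screens. The function name toImageCrop and its parameters are hypothetical.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Converts a drag selection (CSS pixels) into a crop rectangle in image pixels.
function toImageCrop(
  start: { x: number; y: number },
  end: { x: number; y: number },
  devicePixelRatio: number,
  image: { width: number; height: number },
): Rect {
  // Normalise so dragging in any direction gives the same rectangle.
  const left = Math.min(start.x, end.x);
  const top = Math.min(start.y, end.y);
  const right = Math.max(start.x, end.x);
  const bottom = Math.max(start.y, end.y);

  // Scale to image pixels and clamp to the image bounds.
  const clamp = (value: number, max: number) => Math.min(Math.max(value, 0), max);
  const x = clamp(Math.round(left * devicePixelRatio), image.width);
  const y = clamp(Math.round(top * devicePixelRatio), image.height);
  const x2 = clamp(Math.round(right * devicePixelRatio), image.width);
  const y2 = clamp(Math.round(bottom * devicePixelRatio), image.height);

  return { x, y, width: x2 - x, height: y2 - y };
}
```

In the offscreen document, a rectangle like this could be passed to the canvas context's drawImage(img, x, y, width, height, 0, 0, width, height), followed by canvas.toDataURL("image/jpeg") to produce the image that goes to the vision model.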
Challenges I Faced
My biggest headache? The chat UI animations. By far.
I had this specific vision: when Mina (called Leo at the time) is thinking, an icon and a "thinking" animation (...) should appear perfectly aligned. Then, as soon as the text response starts, the "thinking" animation must smoothly fade out while the actual text smoothly fades in.
Getting this transition to be perfectly stable, without any "jumping" or "glitching" as the content changed, was incredibly frustrating. I tried align-items: center (a disaster, everything jumped around), I tried position: absolute (which broke button clicks), and so many margin/padding hacks.
The final, stable solution was to go back to basics:
- Use align-items: flex-start to align everything to the top.
- Give both the icon (.message-icon) and the message bubble (.message-bubble) the exact same margin-top: 6px. This ensures they always start at the same pixel.
- Give the "thinking" animation (.thinking-animation) a min-height that matches the text's line height.
- Use CSS transitions on opacity to hide the text when .streaming is active, and hide the animation when .streaming is not active.
It finally works perfectly, and that smooth transition makes the whole experience feel professional.
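The four steps above can be sketched roughly as follows, with the stylesheet kept as a string next to the helper that flips the state. The .message-icon, .message-bubble, .thinking-animation and .streaming names and the 6px margin come from the description. The .message row, the .message-text element, the 1.5em line height and the 0.2s duration are assumptions for illustration.

```typescript
// Stylesheet text mirroring the four steps. Names marked "assumed" are not
// from the original write-up.
const chatMessageCss = `
.message { display: flex; align-items: flex-start; }   /* .message: assumed */
.message-icon, .message-bubble { margin-top: 6px; }
.message-bubble { line-height: 1.5em; }                /* value: assumed */
.thinking-animation {
  min-height: 1.5em;                                   /* matches line height */
  opacity: 0;
  transition: opacity 0.2s ease;                       /* duration: assumed */
}
.message-text {                                        /* .message-text: assumed */
  opacity: 1;
  transition: opacity 0.2s ease;
}
.message.streaming .thinking-animation { opacity: 1; }
.message.streaming .message-text { opacity: 0; }
`;

// Minimal structural type so the helper accepts a real Element or a test stub.
interface HasClassList {
  classList: { toggle(name: string, force?: boolean): boolean };
}

// Adds or removes the .streaming class; the CSS transitions do the fading.
function setStreaming(message: HasClassList, streaming: boolean): void {
  message.classList.toggle("streaming", streaming);
}
```

Because only opacity changes and both elements keep their layout box, toggling the class should not move the icon or the bubble.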
What I Learned
I learned a ton. This was my first real deep dive into the Side Panel and Offscreen APIs, and they are way more powerful than I realized.
I also truly learned that an AI is only as good as your instructions. My 'Summarize Page' feature was just "okay" at first. It only became great after I wrote a detailed prompt telling the AI to act as a "chief analyst" and to structure its output as a formal "Manager's Summary." Prompt engineering is everything.
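As a rough illustration of that kind of role-plus-structure prompt, here is a hypothetical prompt builder. The actual prompt isn't shown in this write-up, so the wording, the section list, and the buildSummaryPrompt name and maxChars limit are all assumptions.

```typescript
// Hypothetical prompt builder: the wording is illustrative, not Mina's prompt.
function buildSummaryPrompt(pageText: string, maxChars = 20000): string {
  // Trim whitespace and cap the length so very long pages stay manageable.
  const content = pageText.trim().slice(0, maxChars);
  return [
    "You are a chief analyst preparing a Manager's Summary.",
    "Structure the output as:",
    "1. A one-sentence bottom line.",
    "2. Three to five key points as bullets.",
    "3. Any open questions or risks.",
    "Use only the page content below.",
    "",
    "PAGE CONTENT:",
    content,
  ].join("\n");
}
```

Keeping the role and the output structure in one place like this makes it easy to tweak the instructions and compare summaries.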
And finally, I learned not to underestimate CSS! A responsive, stable UI that feels good to use is just as important as the AI logic itself. I'm really proud of how Mina turned out, and winning an award for it would be incredible.