Inspiration

The inspiration came from a desire to take the groove of a track or the intensity of a broadcast and use agentic AI to give it a living, breathing visual pulse. It's about taking the invisible soul of the audio and putting it right on the screen for the listener to experience in a whole new way.

What it does

This app is a seamless integration between a Chrome extension and a cloud-based AI agent that transforms browser audio into a live visual experience. It captures sound from your active tab—whether that's music, news, or videos—and sends it to the AI, which analyzes the mood and style to generate original artwork. This dynamic art is then continuously displayed and refreshed in a floating Picture-in-Picture window right on your screen.
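
As a rough illustration of the capture, analyze, and display loop described above, here is a minimal TypeScript sketch. It is not the extension's actual code: every name in it (`AudioChunk`, `Artwork`, `VisualizerDeps`, `rmsLevel`, `runVisualizer`) is hypothetical, the three stages are injected as plain functions so the loop can run without a browser, and skipping near-silent chunks is an assumption added for the example rather than something the project is known to do.

```typescript
// Hypothetical shapes for illustration; none of these names come from the project.
interface AudioChunk {
  samples: Float32Array; // mono PCM samples in the range [-1, 1]
  sampleRate: number;    // samples per second
}

interface Artwork {
  imageUrl: string; // where the generated image can be loaded from
  mood: string;     // label the agent assigned to the audio
}

interface VisualizerDeps {
  // In an extension this might wrap a tab-capture stream; here it is injected.
  captureChunk: () => Promise<AudioChunk>;
  // Stand-in for the request to the cloud-based agent.
  generateArt: (chunk: AudioChunk) => Promise<Artwork>;
  // In a browser this might swap the image shown in a Picture-in-Picture window.
  display: (art: Artwork) => void;
}

// Root-mean-square level of a chunk, used to detect near-silence.
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Runs a fixed number of capture -> generate -> display cycles and
// returns how many artworks were actually displayed.
async function runVisualizer(
  deps: VisualizerDeps,
  cycles: number,
  silenceThreshold = 0.01, // assumed cutoff; chunks quieter than this are skipped
): Promise<number> {
  let shown = 0;
  for (let i = 0; i < cycles; i++) {
    const chunk = await deps.captureChunk();
    if (rmsLevel(chunk.samples) < silenceThreshold) continue;
    const art = await deps.generateArt(chunk);
    deps.display(art);
    shown++;
  }
  return shown;
}
```

Passing the stages in as functions keeps the loop independent of any particular capture API or image model, so each piece can be replaced or faked in isolation.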

How we built it

We combined our past experience building several Chrome extensions with the Google ADK and Gemini CLI, which let us prototype the app quickly. From there, we iterated by tuning features and refining prompts until the app felt polished.

Challenges we ran into

We ran into Google's Acceptable Use Policy guardrails many times when attempting to generate imagery with Imagen 4. We had to engineer a prompt that stayed inside those limitations, and it took many attempts before we got it right.

Accomplishments that we're proud of

A very well-polished experience that works out of the box, with no settings or configuration needed once the Chrome extension is installed!

What we learned

This was the first application we developed using the ADK, so it was a learning experience in that regard.

What's next for Agentleman

If this application is well received, we plan to iterate and add features. For instance: settings that the user can tweak, the ability to save those settings, and a way for the user to change the style of the generated art.
