Inspiration
Honestly, this idea came from personal frustration. We have both shipped features that looked absolutely fine during development, only to get a message from someone saying a button was broken on their phone, a screen reader could not navigate the page, or the text was overlapping on a tablet. You spend hours manually clicking through pages, resizing browsers, checking contrast ratios, and running Lighthouse, and you still miss things. We thought there had to be a better way. What if an AI could just look at your website the way a real person does and tell you everything that is wrong?
What it does
QA Ghost is a visual UI testing agent that actually sees your website. You paste a URL and Gemini takes over. It opens your site, looks at the screen through screenshots, decides where to click and what to explore entirely on its own, and then tells you everything it found. It catches layout bugs, checks whether your site is accessible to people using screen readers, measures the performance metrics that affect your Google rankings, and checks how everything looks on a phone versus a desktop. It records the whole session as a video so you can watch exactly what the agent did. And when it finds bugs it does not just report them. It writes JavaScript fixes and applies them live, then shows you a before and after comparison down to the pixel. A Gemini voice summary narrates the full report at the end.
How we built it
We built the backend with FastAPI running on Google Cloud Run. The core of the project is a Google ADK agent that orchestrates the entire QA pipeline through four tools it can call: one for navigating and capturing screenshots, one for sending those screenshots to Gemini Vision for bug analysis, one for generating and injecting self-healing fixes, and one for producing the final voice summary. Playwright drives the actual browser, handles all the navigation, and records the session. Gemini 2.5 Flash does the heavy lifting on the visual analysis, writes the JavaScript fixes, and narrates the final report using the Kore TTS voice. axe-core runs the accessibility audit injected directly into the page. Results stream back to the frontend in real time using Server-Sent Events so you can watch the scan progress live.
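The four-tool pipeline can be sketched as a plain chain of function calls. This is a simplified, framework-free illustration with stubbed bodies and hypothetical function names; in the real app each step is a tool registered with Google ADK, and the Gemini-powered agent decides when to call each one.

```python
# Simplified sketch of the QA pipeline. All bodies are stubs; in production
# these are ADK tools that call Playwright, Gemini Vision, and TTS.

def navigate_and_capture(url):
    """Tool 1: drive the browser to the page and capture a screenshot (stubbed)."""
    return {"url": url, "screenshot": f"{url}/shot.png"}

def analyze_with_vision(capture):
    """Tool 2: send the screenshot to the vision model for bug analysis (stubbed)."""
    return [{"page": capture["url"], "issue": "low-contrast header"}]

def generate_fixes(bugs):
    """Tool 3: generate a JavaScript fix per bug to inject into the live page (stubbed)."""
    return [{"bug": b, "js": "/* generated fix */"} for b in bugs]

def summarize(bugs, fixes):
    """Tool 4: produce the final summary that the voice report narrates (stubbed)."""
    return f"Found {len(bugs)} issue(s), applied {len(fixes)} fix(es)."

def run_scan(url):
    """One pass of the pipeline; the real agent loops and picks its own order."""
    capture = navigate_and_capture(url)
    bugs = analyze_with_vision(capture)
    fixes = generate_fixes(bugs)
    return {"bugs": bugs, "fixes": fixes, "summary": summarize(bugs, fixes)}
```

The key difference from this linear sketch is that the real agent is not hardcoded to this order: Gemini chooses which tool to invoke next based on what it sees.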
Challenges we ran into
The trickiest part was getting Gemini to actually navigate instead of scrolling the page forever. The agent kept deciding to scroll rather than click into new pages, which meant we were scanning the same page repeatedly. We rewrote the agent prompt to make clicking the explicit priority on the first step and added a fallback that automatically picks an internal link and visits it if the agent ends up scanning zero pages.
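The zero-pages fallback boils down to picking the first unvisited same-origin link from the hrefs on the page. A minimal sketch, where the function name and exact filtering rules are our illustration rather than the shipped code:

```python
from urllib.parse import urljoin, urlparse

def pick_fallback_link(base_url, hrefs, visited):
    """Pick one unvisited same-origin link to visit when the agent scanned zero pages."""
    base = urlparse(base_url)
    for href in hrefs:
        url = urljoin(base_url, href).partition("#")[0]  # drop fragment-only jumps
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc != base.netloc:
            continue  # stay on the same site
        if url.rstrip("/") in {v.rstrip("/") for v in visited}:
            continue  # already scanned
        if url.rstrip("/") == base_url.rstrip("/"):
            continue  # the page we started on
        return url
    return None  # nothing internal left to explore
```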
The other big challenge was Cloud Run timing out mid-scan. A full QA scan takes a couple of minutes, and Cloud Run's default HTTP connection was dropping before the scan finished. We switched the entire scan endpoint to Server-Sent Events with keepalive pings every five seconds, which keeps the connection alive for as long as the scan needs.
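The keepalive pattern can be sketched as an async generator that emits an SSE comment frame whenever the scan has been quiet for five seconds. This is a simplified version with illustrative names, not the exact endpoint code:

```python
import asyncio

KEEPALIVE_SECS = 5  # ping interval; matches the five-second pings described above

async def sse_stream(events: asyncio.Queue):
    """Yield SSE frames, inserting comment pings while the scan is quiet."""
    while True:
        try:
            event = await asyncio.wait_for(events.get(), timeout=KEEPALIVE_SECS)
        except asyncio.TimeoutError:
            yield ": keepalive\n\n"  # SSE comment frame; clients ignore it
            continue
        if event is None:  # sentinel posted when the scan finishes
            break
        yield f"data: {event}\n\n"
```

In FastAPI this generator would be wrapped as `StreamingResponse(sse_stream(queue), media_type="text/event-stream")` while the scan task pushes progress events onto the queue.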
Getting the session recording to actually play in the browser was also painful. Playwright records in WebM, a format that lacks duration metadata, so the video player could not seek through it. We had to convert every recording to MP4 using imageio-ffmpeg's bundled binary inside the container.
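The conversion is a single ffmpeg invocation against the binary that imageio-ffmpeg ships. A hedged sketch, with flags chosen for a browser-seekable MP4; the exact options in our container may differ:

```python
import subprocess

def build_convert_cmd(ffmpeg_exe, webm_path, mp4_path):
    """ffmpeg arguments for a browser-seekable MP4 (flag choice is illustrative)."""
    return [
        ffmpeg_exe, "-y", "-i", webm_path,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",  # widely playable in browsers
        "-movflags", "+faststart",  # move index/duration metadata up front so seeking works
        mp4_path,
    ]

def convert_recording(webm_path, mp4_path):
    """Re-encode Playwright's WebM recording using imageio-ffmpeg's bundled binary."""
    import imageio_ffmpeg  # ships its own ffmpeg, so no system install in the container
    ffmpeg = imageio_ffmpeg.get_ffmpeg_exe()
    subprocess.run(build_convert_cmd(ffmpeg, webm_path, mp4_path), check=True)
```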
Accomplishments that we're proud of
We are genuinely proud that the agent navigates and makes decisions completely on its own. Watching Gemini look at a screenshot, decide to click the Travel category, navigate to that page, scan it for bugs, then move on to Mystery and Historical Fiction without any hardcoded instructions felt like a real moment. We are also proud of the self-healing engine. The fact that the agent finds a bug, writes a fix, injects it into the live page, and then captures a pixel-level before and after comparison is something we did not expect to work as well as it does. Building a product that combines Gemini Vision, Google ADK, Playwright, axe-core, Core Web Vitals, TTS, session recording, and PDF export into one seamless scan is something we are really happy with.
What we learned
The biggest thing we learned is that building an agent that reasons from what it sees is a completely different challenge from writing normal automation code. With Playwright scripts you tell the browser exactly what to do. With an agentic system powered by Gemini Vision you describe what you want and the agent figures out how to do it. Sometimes it surprises you in a good way. Sometimes it decides to scroll fifteen times when you wanted it to click. You have to design for unpredictability and build fallbacks for everything. We also learned that multimodal AI genuinely opens up a category of testing that did not exist before. You do not need to write selectors or maintain test files. The agent just looks at the page and tells you what a real user would notice.
What's next for QA Ghost
The next big step for us is making the entire experience feel more like a conversation than a report. Right now QA Ghost finds the bugs and tells you about them. But we want to take it further by adding a voice interface powered by Gemini Live where you can actually talk to the agent while it is scanning. Imagine watching the session recording and being able to say out loud "fix the contrast issue on the header" or "ignore the mobile layout warnings" and having the agent respond and apply those changes in real time. No clicking through menus, no editing config files, just talking to your QA engineer like you would talk to a colleague sitting next to you. We also want to let developers give voice instructions to shape how the fixes are generated, so instead of accepting whatever JavaScript the agent writes you could say "use CSS variables instead of inline styles" or "make this fix work for dark mode too" and the agent would adjust accordingly. The goal is to make QA Ghost feel less like a tool you run and more like a smart teammate you can have a back and forth with, one that sees your UI, finds the problems, and works through the solutions with you in a natural conversation.
Built With
- axe-core
- css
- fastapi
- gemini-2.5-flash
- google-adk
- google-cloud-run
- google-genai-sdk
- html
- imageio-ffmpeg
- javascript
- pillow
- playwright
- python
- server-sent-events
- uvicorn