Inspiration

Every website goes through quality checks before launch — performance via Lighthouse, SEO via Ahrefs, security via automated scanners. But when it comes to design validation — checking if CTAs actually stand out, if the theme is consistent across pages, or if the site matches the client's original vision — the industry still relies entirely on human reviewers manually scrolling through pages and eyeballing elements.

This process is slow, subjective, and doesn't scale. We asked ourselves: if AI can audit performance, why can't it audit design? That's how VibeAudit was born — an AI agent that sees your website the way a user does and gives you structured, actionable design feedback.

What it does

VibeAudit is an AI-powered website auditor that evaluates the non-functional, design-facing aspects of any website:

  • CTA Effectiveness — Are buttons visible? Is the copy compelling? Is contrast sufficient?
  • Theme Consistency — Do colors, fonts, and spacing stay coherent across pages?
  • Intent Alignment — Does the website actually match what the owner intended it to look and feel like?

The user provides a URL, sets how many pages to crawl, and describes their design intent in natural language. The AI agent then crawls the site breadth-first (BFS), scrolls through each page viewport by viewport, captures screenshots, and generates a scored report with specific issues and recommendations, each tied to the exact screenshot where the problem was found.
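The crawl step can be sketched as a plain breadth-first traversal that respects the page limit and never leaves the start domain. This is a minimal illustration, not VibeAudit's actual code: `fetch_links` is a hypothetical callable (in the real app, link extraction is backed by Playwright).

```python
from collections import deque
from urllib.parse import urlparse

def bfs_crawl(start_url, fetch_links, max_pages=10):
    """Discover pages breadth-first, staying on the start domain.

    `fetch_links(url)` is a hypothetical callable that returns the
    links found on a page.
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            # Skip external domains and pages already queued.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Because the link source is injected, the same traversal logic works against a live browser or a recorded link graph in tests.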

How we built it

  • Frontend: React 19 + Vite, with Socket.io for real-time streaming of the agent's activity during analysis. The report page uses html2pdf.js for one-click PDF export.
  • Backend: Python Flask with Flask-SocketIO for WebSocket communication. The crawling engine uses a BFS algorithm with networkx for page discovery.
  • AI Layer: Google Gemini models power the core analysis. Gemini 2.5 Flash Lite handles intent parsing — converting natural language into structured audit config. Gemini 3 Pro Preview drives the browser agent via the browser-use library + Playwright, performing viewport-by-viewport visual analysis.
  • Architecture: The agent scrolls through each page section by section (simulating real user behavior), captures screenshots at each viewport, identifies CTAs, evaluates theme elements, and maps every issue to the exact viewport where it was detected.
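The viewport-by-viewport pass described above boils down to computing a numbered list of scroll offsets for each page. Here is one hedged sketch of that bookkeeping (function and scheme are illustrative, not VibeAudit's internals); in the real agent, each offset would pair with a Playwright scroll and `page.screenshot` call, and findings would carry the viewport number.

```python
def viewport_offsets(page_height, viewport_height):
    """Return {viewport_number: scroll_offset} for full-page capture.

    Numbers are 1-based so issues in the report can cite the exact
    screenshot ("viewport 3") where they were detected. The last
    viewport is aligned to the page bottom, so it may overlap the
    previous one rather than run past the page.
    """
    if page_height <= viewport_height:
        return {1: 0}
    offsets = list(range(0, page_height - viewport_height, viewport_height))
    offsets.append(page_height - viewport_height)
    return {i + 1: off for i, off in enumerate(offsets)}
```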

Challenges we faced

  • Real-time streaming: Getting the AI agent's thoughts, actions, and progress to stream live to the frontend via WebSocket required careful event handling and state management on both ends.
  • Viewport-based analysis: Making the agent scroll systematically and associate issues with specific screenshots was non-trivial. We had to design a viewport numbering system and ensure screenshots mapped correctly to findings.
  • Intent parsing reliability: Users describe their website vision in wildly different ways. Getting Gemini to reliably extract structured config (website type, tone, audience, theme) from messy natural language took multiple iterations of prompt engineering and schema design.
  • BFS crawling scope control: Preventing the crawler from escaping to external domains, handling redirects, and respecting the max page limit while still discovering the most relevant pages required careful queue management.
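The intent-parsing challenge above is largely a validation problem: whatever JSON the model returns must be coerced into a known config shape. A minimal sketch of that guardrail, with illustrative field names and defaults (the real schema and tone vocabulary are assumptions here):

```python
import json
from dataclasses import dataclass

# Illustrative tone vocabulary; the real audit config differs.
ALLOWED_TONES = {"professional", "playful", "minimal", "bold"}

@dataclass
class AuditConfig:
    website_type: str
    tone: str
    audience: str
    theme: str

def parse_intent(raw: str) -> AuditConfig:
    """Coerce a model's JSON reply into a valid config, falling back
    to defaults for anything missing or malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    tone = str(data.get("tone", "")).lower()
    return AuditConfig(
        website_type=data.get("website_type", "unknown"),
        tone=tone if tone in ALLOWED_TONES else "professional",
        audience=data.get("audience", "general"),
        theme=data.get("theme", "unspecified"),
    )
```

Pairing a schema constraint on the model side with a normalizer like this on the backend means one malformed reply degrades to defaults instead of crashing the audit.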

What we learned

  • How to build AI agent pipelines that combine browser automation with structured LLM output
  • Real-time communication patterns with WebSocket (Socket.io) between Python and React
  • Prompt engineering for structured JSON output from Gemini models using schema constraints
  • The gap between automated technical audits and subjective design evaluation — and how AI is finally closing it
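The real-time streaming pattern mentioned above can be reduced to a typed event buffer between the agent and the socket layer. This is a simplified sketch, not the production code: the `emit` callback stands in for Flask-SocketIO's `socketio.emit`, and the event names are assumptions.

```python
import queue

# Event kinds streamed to the frontend during analysis (illustrative).
EVENT_TYPES = {"thought", "action", "progress"}

class AgentStream:
    def __init__(self, emit):
        self._emit = emit            # e.g. socketio.emit on the backend
        self._events = queue.Queue()

    def push(self, kind, payload):
        """Called from the agent loop; rejects unknown event kinds."""
        if kind not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {kind}")
        self._events.put({"type": kind, "data": payload})

    def drain(self):
        """Forward all buffered events to the client, in order."""
        while not self._events.empty():
            self._emit("agent_event", self._events.get())
```

Keeping the buffer between producer and socket decouples the agent loop from network hiccups, which was the crux of the state-management issue on both ends.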

Built With

react · vite · socket.io · python · flask · flask-socketio · networkx · google-gemini · browser-use · playwright · html2pdf.js