Inspiration

Landing pages fail quietly.
Founders, marketers, and builders spend hours tweaking copy, colors, and layouts—but still don’t know why users bounce or conversions stall. Most tools show metrics, not reasons.

We wanted to build something that behaves like a senior SEO and CRO expert sitting beside you, opening your landing page, seeing what users see, and telling you—clearly and honestly—what’s going wrong.

With the rise of multimodal AI, we realized Gemini could do more than generate text. It could look at a page, reason about it, and explain its impact on user behavior. That became the core idea behind LandingPageRoasterAI.


What it does

LandingPageRoasterAI takes a landing page URL and:

  • Opens the page in a real browser
  • Captures the visible content and layout
  • Analyzes the page using Gemini’s multimodal reasoning
  • Produces an expert-level audit with:
    • A bold, attention-grabbing roast
    • A realistic conversion score
    • Clear diagnosis of what’s hurting SEO and conversions
    • Actionable guidance on how to fix it

Instead of generic advice, the feedback is grounded in what’s actually visible on the page—copy, hierarchy, CTAs, trust signals, and UX clarity.
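The audit described above can be pictured as a small structured object. This is only an illustrative sketch; the field names (`roast`, `conversion_score`, `issues`, `fixes`) are our shorthand here, not the project's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class LandingPageAudit:
    """Illustrative shape of one audit result (names are hypothetical)."""
    roast: str                  # bold, attention-grabbing roast
    conversion_score: int       # realistic score, e.g. on a 0-100 scale
    issues: list[str] = field(default_factory=list)  # what's hurting SEO/conversions
    fixes: list[str] = field(default_factory=list)   # actionable guidance

audit = LandingPageAudit(
    roast="Your hero copy says everything and therefore nothing.",
    conversion_score=42,
    issues=["No visible CTA above the fold"],
    fixes=["Move the primary CTA into the hero section"],
)
```

Keeping the output this structured is what lets the frontend render the roast and the diagnosis as separate, predictable sections.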


How we built it

We designed the system as a grounded, agent-like pipeline:

  1. The frontend collects a landing page URL
  2. The backend uses Playwright to load the page like a real user
  3. We extract:
    • Visible page text
    • A full-page screenshot
  4. This real data is passed to Gemini for multimodal analysis
  5. Gemini reasons over the content and returns a structured audit and roast
  6. The frontend renders the results in a clear, readable format

By separating observation (what exists on the page) from judgment (why it fails or succeeds), we ensured the AI’s output is grounded, explainable, and consistent.


Challenges we ran into

One of the biggest challenges was realizing that AI cannot meaningfully judge a website from a URL alone. Without real page data, the model naturally converges to average, generic outputs.

We also faced challenges enforcing strict output formats from a large language model while still allowing expressive, human-like feedback. Solving this required carefully designing prompt contracts and output schemas so Gemini could reason freely without breaking the system.
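One way to picture such a contract: the prompt instructs the model to reply with JSON only, and the backend validates the reply before anything reaches the frontend. A minimal sketch, with hypothetical field names:

```python
import json

# Hypothetical contract: field name -> required Python type.
REQUIRED_FIELDS = {"roast": str, "conversion_score": int, "issues": list, "fixes": list}

def parse_audit(raw: str) -> dict:
    """Parse the model's JSON reply and reject contract violations."""
    data = json.loads(raw)
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"contract violation: field '{name}' missing or wrong type")
    return data

reply = '{"roast": "Bold claim, zero proof.", "conversion_score": 55, "issues": [], "fixes": []}'
audit = parse_audit(reply)
```

The roast text itself stays free-form, so the model can be expressive inside fields while the envelope stays machine-checkable.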

Balancing multimodal input size, response latency, and reliability within a hackathon timeframe was another key challenge.


Accomplishments that we're proud of

  • Built a fully grounded, multimodal Gemini application
  • Used real browser rendering instead of static assumptions
  • Created an AI system that explains why pages fail, not just what to change
  • Designed a stable AI-to-backend contract suitable for real products
  • Delivered an end-to-end working system within a short hackathon window

What we learned

  • Multimodal AI is most powerful when grounded in real data
  • Clear input–output contracts are essential when building with LLMs
  • “Reasoning” is not magic—it must be carefully constrained and validated
  • AI products succeed when they explain decisions, not just generate output

Most importantly, we learned that AI can act like a true expert—when given the right context and boundaries.


What's next for LandingPageRoasterAI

  • Add mobile-specific audits and device previews
  • Introduce section-level scoring and prioritization
  • Allow comparison between multiple landing pages
  • Expand beyond landing pages to onboarding flows and product pages
  • Enable continuous monitoring as pages evolve

Our long-term vision is to make expert-level UX and conversion insights accessible to anyone building on the web.
