Architecture of PriceHawk

🦅 PriceHawk — How We Built an AI That Shops Like a Human

Inspiration

It started with a frustrating Saturday afternoon.

I wanted to buy a pair of Sony headphones. I opened Amazon, noted the price, then opened Flipkart in another tab, then Croma in a third. By the time I had compared all three, the price on Amazon had changed. I had wasted 20 minutes doing something a computer should do in seconds.

But every existing price comparison tool I found had the same fatal flaw — they relied on official retailer APIs. Amazon's API is restricted. Flipkart's is nearly impossible to access. Croma has none. So these tools only showed a fraction of available listings, often outdated by hours.

That's when the idea hit: what if instead of asking stores for their data, we just... looked at their websites? Like a human does?

With Gemini 2.0 Flash's multimodal vision capabilities, this wasn't just possible — it was the perfect use case for the UI Navigator category. An agent that truly sees the web.

What I Learned

Building PriceHawk taught me more in a few days than weeks of regular development. The most important lessons:

1. Vision AI is genuinely resilient

When Amazon updated its CSS class names mid-development (which they do constantly), my DOM selectors broke instantly. But Gemini's vision kept working: it doesn't care what the HTML looks like; it just reads the pixels. This was the moment I truly understood why multimodal AI changes everything about web automation.

2. The math of parallel scraping

Running scrapers in parallel instead of sequentially makes a dramatic difference. If site $i$ takes $t_i$ seconds to load, then for $n = 3$ sites:

$$T_{\text{sequential}} = \sum_{i=1}^{n} t_i = t_1 + t_2 + t_3$$

$$T_{\text{parallel}} = \max(t_1, t_2, t_3)$$

With $t_1 = 12\,\text{s}$, $t_2 = 10\,\text{s}$, $t_3 = 8\,\text{s}$:

$$T_{\text{sequential}} = 30\,\text{s} \quad \text{vs} \quad T_{\text{parallel}} = 12\,\text{s}$$

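That's a 2.5× speedup just from starting all three scrapes at once and awaiting them together with Promise.allSettled(), which also means one failed site doesn't throw away the results from the other two. The user experience difference is enormous.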
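A minimal sketch of that pattern follows. The `ScrapeResult` type and the `fakeScrape` helper are hypothetical stand-ins (timers instead of real browser tabs), used only to show how Promise.allSettled() keeps the successful results when one site fails.

```typescript
// Hypothetical result shape; the real scraper's return type is not shown in this write-up.
type ScrapeResult = { store: string; price: number };

// Stand-in for a real scraper: resolves after `ms` milliseconds, or rejects when `fail` is true.
function fakeScrape(store: string, ms: number, price: number, fail = false): Promise<ScrapeResult> {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (fail) reject(new Error(`${store} blocked the request`));
      else resolve({ store, price });
    }, ms);
  });
}

// All tasks are already running when they are passed in, so the total wait is
// roughly the slowest task rather than the sum of all of them.
async function scrapeAll(tasks: Promise<ScrapeResult>[]): Promise<ScrapeResult[]> {
  const settled = await Promise.allSettled(tasks);
  const results: ScrapeResult[] = [];
  for (const outcome of settled) {
    // Rejected scrapes are skipped instead of failing the whole search.
    if (outcome.status === "fulfilled") results.push(outcome.value);
  }
  return results;
}
```

The scrapes start as soon as each promise is created, so building the array is what kicks off the parallel work; `scrapeAll` only waits for them and filters out failures.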

3. Bot detection is an arms race

Every major e-commerce site runs sophisticated headless browser detection. I learned about browser fingerprinting, canvas fingerprinting, and the navigator.webdriver flag that betrays automated sessions. The fix — --disable-blink-features=AutomationControlled combined with realistic headers and randomized delays — felt like a proper cat-and-mouse game.
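The pieces of that fix can be sketched as below. Only the launch flag comes from the text above; the header values and delay bounds are illustrative assumptions, and the comments note where each piece would plug into Playwright.

```typescript
// The flag named above. With Playwright it would be passed as
// chromium.launch({ args: LAUNCH_ARGS }).
const LAUNCH_ARGS: string[] = ["--disable-blink-features=AutomationControlled"];

// Illustrative header values (assumptions, not the project's actual ones).
// With Playwright they would go into browser.newContext({ extraHTTPHeaders: REALISTIC_HEADERS }).
const REALISTIC_HEADERS: Record<string, string> = {
  "Accept-Language": "en-IN,en;q=0.9",
  "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
};

// Pick a whole number of milliseconds in [minMs, maxMs].
// `rand` is injectable so the function can be tested deterministically.
function randomDelayMs(minMs: number, maxMs: number, rand: () => number = Math.random): number {
  return Math.floor(minMs + rand() * (maxMs - minMs + 1));
}

// Pause between page actions so request timing is less uniform.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

A call such as `await sleep(randomDelayMs(500, 1500))` between navigation steps would give each action a different pause; the 500 to 1500 ms range is an arbitrary example.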

4. Gemini quota math matters

The free tier gives 1,500 requests per day. With 3 sites × 2 screenshots each = 6 Gemini calls per search, I had a budget of:

$$\text{Max searches/day} = \left\lfloor \frac{1500}{6} \right\rfloor = 250 \text{ searches}$$

This forced me to implement MD5-based screenshot caching — if the same screenshot hash was seen before, skip the Gemini call entirely. Cache hit rate in testing reached ~40%, effectively pushing the budget to:

$$\text{Effective budget} = \left\lfloor \frac{1500}{6 \times (1 - 0.4)} \right\rfloor = 416 \text{ searches/day}$$
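A rough sketch of both ideas, the budget arithmetic and the hash-keyed cache, is shown here. The `ScreenshotCache` class and its `analyze` callback are hypothetical names; the callback stands in for the actual Gemini call, which is not shown in this write-up.

```typescript
import { createHash } from "node:crypto";

const DAILY_REQUESTS = 1500; // free-tier requests per day, as stated above
const CALLS_PER_SEARCH = 6;  // 3 sites × 2 screenshots

// Searches per day when a fraction `cacheHitRate` of calls is served from cache.
function searchesPerDay(cacheHitRate: number): number {
  return Math.floor(DAILY_REQUESTS / (CALLS_PER_SEARCH * (1 - cacheHitRate)));
}

// In-memory cache keyed by the MD5 hash of the screenshot bytes.
class ScreenshotCache<T> {
  private readonly store = new Map<string, T>();
  hits = 0;
  misses = 0;

  async getOrAnalyze(
    screenshot: Uint8Array,
    analyze: (bytes: Uint8Array) => Promise<T>,
  ): Promise<T> {
    const key = createHash("md5").update(screenshot).digest("hex");
    const cached = this.store.get(key);
    if (cached !== undefined) {
      this.hits++;
      return cached; // identical screenshot seen before: skip the model call
    }
    this.misses++;
    const result = await analyze(screenshot);
    this.store.set(key, result);
    return result;
  }
}
```

MD5 is used here only as a fast fingerprint for byte-identical images, not for security. A screenshot that differs by a single pixel hashes to a different key and counts as a miss.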

5. India-first thinking

When I ran the agent from India, US sites like eBay immediately served CAPTCHA pages. This wasn't a bug; it was a feature discovery. Indian stores (Flipkart, Croma) are far more accessible, show INR pricing that's locally relevant, and have less aggressive bot detection. The best product isn't always the most obvious one to build.

How I Built It

The Architecture

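PriceHawk has three layers working in concert: a React frontend, an Express API, and a Playwright agent that feeds screenshots to Gemini and writes price history to Firestore.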

User Query
    ↓
React Frontend  ──────────────────────────────────  Cloud Run
    ↓ HTTPS POST
Express API     ──────────────────────────────────  Cloud Run
    ↓ spawns
Playwright Agent  opens 3 real browser tabs in parallel
    ↓
Screenshots  ──→  Gemini 2.0 Flash (vision)
    ↓                      ↓
DOM extraction         AI analysis
    ↓                      ↓
         Merge & deduplicate
                ↓
         Firestore (price history)
                ↓
         Real-time polling → Frontend
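The "Merge & deduplicate" step in the diagram could look roughly like the sketch below. The `Listing` shape, the store-plus-normalized-title key, and the rule that the DOM result wins over the vision result are all assumptions made for illustration; the write-up does not specify how duplicates are actually resolved.

```typescript
// Hypothetical listing shape shared by both extraction paths.
type Listing = { store: string; title: string; price: number; source: "dom" | "vision" };

// Lowercase and collapse whitespace so cosmetic differences don't create duplicates.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/\s+/g, " ").trim();
}

// Combine both result sets, keep the first listing seen per (store, title),
// and return the survivors sorted by ascending price.
function mergeListings(dom: Listing[], vision: Listing[]): Listing[] {
  const byKey = new Map<string, Listing>();
  for (const listing of dom.concat(vision)) {
    const key = `${listing.store}|${normalizeTitle(listing.title)}`;
    // DOM results come first in the concatenation, so they win on conflict (an assumption).
    if (!byKey.has(key)) byKey.set(key, listing);
  }
  return Array.from(byKey.values()).sort((a, b) => a.price - b.price);
}
```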

The Frontend

The UI went through three complete redesigns. The final version — a "Mission Control" aesthetic with deep teal glassmorphism, animated smoke background, particle constellation canvas, and Playfair Display serif typography — was designed to feel like a premium intelligence tool rather than a utility app.

Every icon is a hand-crafted inline SVG. No icon library. No external dependencies. A crosshair for the logo (precision targeting), a CPU chip for AI (actual intelligence), a brain for analysis — each icon was chosen to reinforce the product's identity.

Challenges

Challenge 1: The Selector Graveyard

Amazon's product title is inside a <span> inside an <a> inside an <h2>, but the exact class names change by layout version, A/B test variant, and geographic region. I went through 8 selector iterations before landing on the solution: loop through every <span> inside <h2> and pick the longest text. In every layout I ran into, the longest span was the full product title.
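The longest-span heuristic reduces to a small pure function. In this sketch the input is a plain array of span texts so it can run anywhere; the function name `pickTitle` is hypothetical.

```typescript
// Return the longest non-empty text among the candidate spans, or null if there is none.
// In a page context the input could be collected with something like:
//   Array.from(h2.querySelectorAll("span"), (s) => s.textContent ?? "")
function pickTitle(spanTexts: string[]): string | null {
  let best: string | null = null;
  for (const raw of spanTexts) {
    const text = raw.trim();
    if (text.length === 0) continue; // skip empty or whitespace-only spans
    if (best === null || text.length > best.length) best = text;
  }
  return best;
}
```

Because it ignores class names entirely, the function keeps working when the markup around the spans changes, as long as the title is still the longest piece of text inside the heading.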

Challenge 2: Price Parsing Gone Wrong

Early versions extracted prices like ₹36,939.48 for an iPhone — obviously wrong. The issue: the scraper was grabbing a cumulative price field (total sold × price) rather than the listing price. Fixed by targeting .a-price .a-offscreen specifically, which is Amazon's accessibility-hidden price element that always contains exactly the display price.
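Once the right element is selected, its text still has to be turned into a number. The helper below is a sketch of that step; `parseInrPrice` is a hypothetical name, and it assumes the text contains at most one price written with comma grouping and an optional decimal part.

```typescript
// Extract the first number from a price string such as "₹1,29,900.00".
// Returns null when the text contains no digits (e.g. "Currently unavailable").
function parseInrPrice(text: string): number | null {
  // A digit, then any mix of digits and grouping commas, then an optional decimal part.
  const match = text.match(/\d[\d,]*(?:\.\d+)?/);
  if (match === null) return null;
  const value = Number(match[0].replace(/,/g, ""));
  return Number.isFinite(value) ? value : null;
}
```

Matching the first digit run, rather than stripping every non-digit character, avoids misreading a prefix like "Rs. 499" as 0.499.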

Challenge 3: The Quota Cliff

During demo testing, I hit the Gemini free tier limit mid-demo. The app silently returned 0 results with no explanation. This led to building the three-tier fallback system: Gemini 2.0 Flash → Gemini 1.5 Flash → JavaScript analysis. Now the app always delivers something useful, and shows a small badge indicating whether the analysis was "AI-powered" or "basic" — honest about its capabilities.
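The fallback chain can be sketched as a loop over analyzers tried in order. The `Analyzer` and `Analysis` types and the `analyzeWithFallback` function are hypothetical; in the real app the first two entries would wrap the Gemini 2.0 Flash and Gemini 1.5 Flash calls and the last would be the plain JavaScript analysis.

```typescript
// Badge shown in the UI, matching the two labels described above.
type Tier = "AI-powered" | "basic";

type Analysis = { summary: string; tier: Tier; usedAnalyzer: string };

// One step in the chain: a name for logging, the badge it earns, and the work itself.
type Analyzer = { name: string; tier: Tier; run: (input: string) => Promise<string> };

// Try each analyzer in order and return the first success.
// Errors (quota exhaustion, timeouts, ...) are collected rather than swallowed silently.
async function analyzeWithFallback(input: string, analyzers: Analyzer[]): Promise<Analysis> {
  const errors: string[] = [];
  for (const analyzer of analyzers) {
    try {
      const summary = await analyzer.run(input);
      return { summary, tier: analyzer.tier, usedAnalyzer: analyzer.name };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${analyzer.name}: ${message}`);
    }
  }
  throw new Error(`All analyzers failed: ${errors.join("; ")}`);
}
```

If the last analyzer is a local function that cannot hit a quota, the chain effectively always returns something, and the `tier` field gives the UI what it needs to show the "AI-powered" or "basic" badge.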

Challenge 4: Cloud Run Cold Starts

Playwright's Chromium takes 3–4 seconds to initialize. Combined with Cloud Run cold starts, first searches could take 45+ seconds. The solution was twofold: keeping a warm instance via scheduled health check pings, and redesigning the loading screen with a 4-stage animated pipeline that makes the wait feel intentional and exciting rather than broken.

What's Next

Price alerts — notify users when a product drops below their target price
Historical price intelligence — "this iPhone is 12% cheaper than its 30-day average"
Browser extension — highlight the cheapest retailer while you're already browsing
Voice search — Gemini Live API integration for hands-free price hunting

Closing Thought

The most valuable thing PriceHawk taught me is that the best AI applications don't replace human tasks — they do the boring parts so humans can do the interesting parts.

Nobody enjoys opening five tabs, comparing numbers, and second-guessing whether a price is good. That's mechanical. That's exactly what a vision AI agent should do.

The interesting part — deciding which product actually fits your life — that's still yours.

Built with Gemini 2.0 Flash · Google Cloud Run · Firestore · Playwright · React