About the project
Scoutlytics is the first Answer Engine Optimization (AEO) tool that moves beyond diagnosis into execution. The AEO tools we researched tell you what to fix. Scoutlytics fixes it: it generates deployment-ready assets in minutes, not weeks, and packages them into a professional PDF implementation brief.
Inspiration
AI search engines — ChatGPT, Perplexity, Google AI Overviews, Claude, You.com — now answer queries directly, citing specific web pages. Traditional SEO no longer guarantees visibility. Brands need to appear in the AI-generated answer, not below it.
The tools that exist today (Profound, Scrunch AI, Peec AI, Omnia) all stop at dashboards and recommendations. Recommendations require a strategist to interpret, a developer to implement, and a content writer to execute. That cycle takes 3–6 weeks minimum.
We asked: what if a single tool could collapse that entire pipeline — from live citation intelligence to deployment-ready fixes — into one automated run? That's Scoutlytics.
What it does
Input a domain and topic. Receive everything you need to get cited by AI search engines:
- Citation Discovery — queries You.com Search API with 5 query variants to find which URLs AI engines currently cite for your topic
- Live Content Extraction — fetches full page content from every cited URL via You.com's livecrawl mode (real-time, not cached)
- Pattern Analysis — a custom TypeScript engine clusters cited pages into structural archetypes, identifies gaps, and calculates a Citation Probability Score (0–100)
- Asset Generation — You.com Express Agent produces deployment-ready fixes: rewritten page copy, valid JSON-LD schema, FAQ sections, and content blocks — tailored to the specific gaps identified
- PDF Implementation Brief — all assets are rendered into a professional multi-page PDF via Foxit PDF Services, ready to hand off to a developer or content team
Total time: under 90 seconds.
How we built it
Architecture: Next.js 16 App Router with a 5-stage asynchronous pipeline. Each stage is a discrete API route designed to complete within Vercel's 10-second serverless function timeout. The client orchestrates stages sequentially from a loading page with live progress visualization.
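The client-side orchestration described above could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the stage names, the `StageRunner` type, and the retry count are all assumptions made for the example.

```typescript
// Hypothetical stage names; the write-up only says there are five stages.
type StageName = "discover" | "extract" | "analyze" | "generate" | "render";

// A runner triggers one stage (for example by calling its API route) and
// resolves when the stage has persisted its output.
type StageRunner = (analysisId: string) => Promise<void>;

interface StageResult {
  stage: StageName;
  ok: boolean;
  attempts: number;
  error?: string;
}

const STAGE_ORDER: StageName[] = ["discover", "extract", "analyze", "generate", "render"];

// Runs the stages one after another, retrying a failed stage up to
// maxAttempts times and stopping at the first stage that keeps failing.
async function runPipeline(
  analysisId: string,
  runners: Record<StageName, StageRunner>,
  maxAttempts = 2, // assumed retry budget
  onProgress: (result: StageResult) => void = () => {},
): Promise<StageResult[]> {
  const results: StageResult[] = [];
  for (const stage of STAGE_ORDER) {
    let result: StageResult = { stage, ok: false, attempts: 0 };
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        await runners[stage](analysisId);
        result = { stage, ok: true, attempts: attempt };
        break;
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        result = { stage, ok: false, attempts: attempt, error };
      }
    }
    results.push(result);
    onProgress(result); // lets a loading page update its progress display
    if (!result.ok) break; // later stages depend on earlier output
  }
  return results;
}
```

In a browser, each runner would typically wrap a `fetch` call to the corresponding stage route. Because every stage is retried on its own, one slow request does not force the whole analysis to restart.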
You.com APIs (4 capabilities):
- Search API — called with 5 query variants per analysis for citation discovery. This is the ground-truth layer; without it, we have no data on what AI engines actually cite.
- Search API (livecrawl) — extracts full live page content for every cited URL plus the user's domain. The structural signals parsed from these pages (headings, schema, FAQ, entities, word count) feed the pattern engine.
- Express Agent API — generates deployment-ready assets. Prompts are constructed dynamically from the gap analysis, so output is tailored to each run.
- Advanced Agent API — runs deep iterative research with streaming for complex topics, surfacing subtopic coverage and knowledge gaps.
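To illustrate how structural signals can be read from crawled markdown, here is a small sketch. It covers only headings, FAQ detection, and word count. The `StructuralSignals` shape, the function name, and the "three question headings" threshold are assumptions for the example, not the project's real heuristics.

```typescript
interface StructuralSignals {
  headingCount: number;
  questionHeadingCount: number;
  hasFaqSection: boolean;
  wordCount: number;
}

// Derives simple structural signals from markdown text.
// Limitation: lines inside fenced code blocks that start with "#" are
// also counted as headings in this simplified version.
function extractSignals(markdown: string): StructuralSignals {
  const headings = markdown
    .split(/\r?\n/)
    .map((line) => /^#{1,6}\s+(.*)$/.exec(line.trim()))
    .filter((match): match is RegExpExecArray => match !== null)
    .map((match) => (match[1] ?? "").trim());

  const questionHeadings = headings.filter((h) => h.endsWith("?"));

  // Assumed heuristic: an explicit FAQ heading, or at least three
  // headings phrased as questions.
  const hasFaqSection =
    headings.some((h) => /\b(faq|frequently asked questions)\b/i.test(h)) ||
    questionHeadings.length >= 3;

  // Count whitespace-separated tokens that contain a letter or digit,
  // so bare markdown markers such as "##" are not counted as words.
  const wordCount = markdown.split(/\s+/).filter((w) => /[A-Za-z0-9]/.test(w)).length;

  return {
    headingCount: headings.length,
    questionHeadingCount: questionHeadings.length,
    hasFaqSection,
    wordCount,
  };
}
```

Signals like these can then be compared between the cited pages and the user's own page to find gaps.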
Foxit APIs (2 services + fallback):
- PDF Services API — primary output pipeline. The brief is rendered as self-contained HTML → uploaded to Foxit → converted to PDF → polled → downloaded. The template includes structured sections, code blocks, before/after comparisons, and a branded cover page.
- Document Generation API — template-driven fallback if PDF Services is unavailable.
- DOCX fallback — if both Foxit services fail, generates a DOCX locally via the docx library. The user always gets a downloadable deliverable.
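The three-step fallback can be expressed as an ordered list of strategies that are tried in turn. The sketch below shows only that control flow. The `RenderStrategy` interface and strategy names are hypothetical, and the real Foxit and docx calls are left to the caller rather than guessed at here.

```typescript
// A strategy turns the brief's HTML into document bytes. The actual
// Foxit and docx integrations would live inside render().
interface RenderStrategy {
  name: string;
  format: "pdf" | "docx";
  render: (html: string) => Promise<Uint8Array>;
}

interface Deliverable {
  strategy: string;
  format: "pdf" | "docx";
  bytes: Uint8Array;
  failures: string[]; // errors from strategies that were tried first
}

// Tries each strategy in order and returns the first successful result.
async function renderWithFallback(
  html: string,
  strategies: RenderStrategy[],
): Promise<Deliverable> {
  const failures: string[] = [];
  for (const strategy of strategies) {
    try {
      const bytes = await strategy.render(html);
      return { strategy: strategy.name, format: strategy.format, bytes, failures };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      failures.push(`${strategy.name}: ${message}`);
    }
  }
  throw new Error(`All render strategies failed (${failures.join("; ")})`);
}
```

Passing the strategies in the order PDF Services, Document Generation, local DOCX reproduces the chain described above. Keeping the failure messages makes it possible to log why a fallback was used.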
Data layer: Hybrid persistence — in-memory Map for fast access during analysis, Supabase as the durable backing store for dashboard history.
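A minimal sketch of this hybrid pattern follows, assuming a generic `DurableStore` interface in place of the actual Supabase client calls, which are not shown in this write-up. The class and method names are illustrative.

```typescript
// Stand-in for the durable backing store. A real implementation would
// wrap the database client; that part is an assumption left out here.
interface DurableStore<T> {
  load(id: string): Promise<T | undefined>;
  save(id: string, value: T): Promise<void>;
}

// Write-through cache: reads hit the in-memory Map first and fall back
// to the durable store; writes go to both.
class HybridStore<T> {
  private readonly cache = new Map<string, T>();
  private readonly durable: DurableStore<T>;

  constructor(durable: DurableStore<T>) {
    this.durable = durable;
  }

  async set(id: string, value: T): Promise<void> {
    this.cache.set(id, value);
    await this.durable.save(id, value);
  }

  async get(id: string): Promise<T | undefined> {
    const cached = this.cache.get(id);
    if (cached !== undefined) return cached;
    const stored = await this.durable.load(id);
    if (stored !== undefined) this.cache.set(id, stored); // warm the cache
    return stored;
  }
}
```

The durable fallback matters in a serverless setting because an in-memory Map is not shared between function instances, so a later stage may run where the cache is empty.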
Scoring: Citation Probability Score is calculated from weighted structural signals:
$$S = S_{\text{base}} + S_{\text{citation}} + S_{\text{schema}} + S_{\text{faq}} + S_{\text{depth}} + S_{\text{headings}} + S_{\text{entities}}$$
where \( S_{\text{base}} = 10 \), \( S_{\text{citation}} \leq 30 \), \( S_{\text{schema}} \leq 15 \), \( S_{\text{faq}} \leq 12 \), \( S_{\text{depth}} \leq 15 \), \( S_{\text{headings}} \leq 12 \), \( S_{\text{entities}} \leq 6 \). The maximum values sum to exactly 100, so the total is capped at 100.
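As a sketch of how the formula can be evaluated, the function below clamps each component to its stated maximum and adds the base score. The caps come from the formula above. How each raw component value is derived from page signals is not specified here, so the inputs are treated as already-computed numbers, and the names are illustrative.

```typescript
const BASE_SCORE = 10;

// Per-component maximums, taken from the formula above.
const CAPS = {
  citation: 30,
  schema: 15,
  faq: 12,
  depth: 15,
  headings: 12,
  entities: 6,
} as const;

type Component = keyof typeof CAPS;
type ComponentScores = Partial<Record<Component, number>>;

// Sums the base score and each component, clamping every component to
// the range [0, cap] and the total to at most 100.
function citationProbabilityScore(raw: ComponentScores): number {
  let total = BASE_SCORE;
  for (const key of Object.keys(CAPS) as Component[]) {
    const value = raw[key] ?? 0; // a missing component contributes nothing
    total += Math.min(Math.max(value, 0), CAPS[key]);
  }
  return Math.min(total, 100);
}
```

For example, component values of 20 (citation), 15 (schema), 0 (faq), 10 (depth), 8 (headings), and 3 (entities) give 10 + 20 + 15 + 0 + 10 + 8 + 3 = 66.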
Challenges we ran into
- Vercel's 10-second timeout forced us to split what is logically one pipeline into 5 independent API routes, each with its own error handling and state persistence. The client-side orchestration had to be resilient to partial failures.
- You.com livecrawl returns markdown, not HTML, which means we couldn't extract existing schema markup directly from the crawled content. We had to build heuristic detection from the markdown structure instead.
- JSON-LD escaping — the Express Agent returns JSON with literal escape sequences (\n, \"). We wrote a multi-layer parsing pipeline: JSON.parse → double-encoded string detection → manual unescape fallback → clean re-stringify.
- @graph schema validation — standard JSON-LD validators choke on @graph wrapper structures. We built custom logic to detect @graph arrays, validate @context at the parent level, and iterate items individually.
- Foxit PDF rendering required engineering a self-contained HTML template with inline CSS that renders correctly across PDF conversion — no external stylesheets, no asset references, print-optimized page breaks.
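The JSON-LD handling described in this list might be sketched as follows. This is a simplified illustration under assumptions: the function names are hypothetical, the manual unescape step handles only the two escape sequences mentioned above, and requiring an @type on every item is an assumed check rather than a confirmed rule.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined; // JSON.parse never returns undefined on success
  }
}

// Layered parsing: plain parse, then double-encoded string detection,
// then a manual unescape fallback for literal \n and \" sequences.
function parseAgentJsonLd(raw: string): JsonObject | undefined {
  const text = raw.trim();
  let value = tryParse(text);
  if (typeof value === "string") value = tryParse(value); // double-encoded
  if (value === undefined) {
    const unescaped = text.replace(/\\n/g, "\n").replace(/\\"/g, '"');
    value = tryParse(unescaped);
  }
  return isObject(value) ? value : undefined;
}

// Checks @context on the parent, then each @graph item individually.
// A document without @graph is treated as a single item.
function validateJsonLd(doc: JsonObject): string[] {
  const errors: string[] = [];
  if (doc["@context"] === undefined) errors.push("missing @context");
  const graph = doc["@graph"];
  const items: unknown[] = Array.isArray(graph) ? graph : [doc];
  items.forEach((item, index) => {
    if (!isObject(item) || item["@type"] === undefined) {
      errors.push(`item ${index}: missing @type`);
    }
  });
  return errors;
}
```

Once parsing succeeds, `JSON.stringify(doc, null, 2)` produces the clean re-stringified output.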
Accomplishments that we're proud of
- Zero-to-deliverable in 90 seconds. A complete competitive analysis, gap identification, asset generation, and professional PDF brief — fully automated.
- Three-strategy PDF resilience. Foxit PDF Services → Foxit Document Generation → local DOCX. The user always gets their document.
- The pipeline is not synthetic. Every You.com API call serves a distinct, necessary function. Remove any one and the pipeline breaks. The same is true for Foxit — the PDF brief is the product, not a demo feature.
- Citation Probability Score provides a quantified, reproducible metric where the industry currently relies on qualitative guesses.
- Live, not cached. All content extraction uses You.com's livecrawl mode — judges can verify the data is real-time.
What we learned
- AEO is a real gap in the market. Every tool we researched stops at recommendations. The execution gap is where all the value is.
- You.com's API surface is surprisingly deep. The combination of Search + livecrawl + Express Agent + Advanced Agent covers the full spectrum from data retrieval to content generation. We didn't need any other AI provider.
- PDF generation is harder than it looks. Browser-rendered HTML and PDF-rendered HTML behave differently. Inline CSS, careful section breaking, and self-contained templates are non-negotiable for consistent output.
- Serverless constraints shape architecture. The 10-second timeout is a hard wall that forced better design — each stage is independently retry-able and the state model had to support partial progress.
What's next for Scoutlytics
- Scheduled monitoring — automated recurring analyses that track citation status over time and alert when a competitor gains or loses a citation
- CMS integrations — one-click deployment of generated assets directly to WordPress, Webflow, and Shopify
- Multi-language AEO — citation patterns differ across languages and regions; expanding query variants and content generation to non-English markets
- Browser extension — a lightweight overlay that shows citation probability scores while browsing any page
- Team collaboration — shared workspaces, role-based access, and approval workflows for enterprise AEO teams
Built With
- foxit
- next.js
- node.js
- react
- supabase
- tailwind
- typescript
- vercel
- you.com