SiteScanner: Your Automated SEO Co-Pilot
SiteScanner is a comprehensive, automated SEO auditing platform designed to demystify technical SEO for developers, marketers, and agencies. In minutes, it crawls an entire website, checks it against more than 100 data points, and delivers a well-designed, prioritized, and actionable report.
Inspiration
As developers and digital marketers, we've all been there: staring at a complex, jargon-filled SEO report from an expensive tool, wondering where to even begin. The data is overwhelming, the recommendations are vague, and the user interface feels like it was designed in the early 2000s.
We were frustrated with the choice between tools that were either too simplistic to be useful or too complex to be efficient. We saw a gap in the market for a tool that was both powerful and approachable.
Our inspiration was to build the SEO tool we always wanted:
- Automated: A "fire and forget" system that does the heavy lifting for you.
- Actionable: It doesn't just show you problems; it tells you why they're problems and prioritizes them by severity.
- Beautiful: A clean, modern, and intuitive dashboard that makes complex data easy to understand and a pleasure to use.
We wanted to create a co-pilot that empowers users to uncover every critical SEO issue and provides a clear roadmap to a healthier, higher-ranking website.
How We Built It
SiteScanner is a full-stack web application built with a Python backend and a modern React and TypeScript frontend, with a focus on performance, accuracy, and user experience. The architecture is designed as a pipeline that processes a user's request from initial URL submission to a final, detailed report.
The Analysis Pipeline
Our process is broken down into four key stages, as shown on our landing page:
Discovery: When a user submits a URL, our system first looks for a sitemap.xml file. It uses this as a starting point and performs an initial crawl to discover all crawlable pages, images, and other assets linked from the homepage.
Crawling & Data Extraction: We use a Python (Flask) backend with libraries like requests to fetch page content and BeautifulSoup4 to parse the raw HTML. This approach is efficient for traditional websites. During this phase, we extract:
- Raw HTML content
- HTTP status codes and headers
- All internal and external links
- All image tags and their attributes
- Structured data scripts (JSON-LD)
Multi-Point Analysis: The extracted data for each page is fed into a series of specialized analysis modules written in Python.
- On-Page SEO: Custom checkers parse the HTML to validate meta tags, heading structure (H1, H2, etc.), and alt text presence.
- Technical SEO: We analyze robots.txt, canonical tags, and meta robots tags. We also check for broken links (404s) and validate the syntax of any found Schema.org markup.
- Performance & Accessibility: We integrate directly with the Google PageSpeed Insights API. This provides us with real-world data on Core Web Vitals (LCP, FID, CLS), an overall accessibility score from Lighthouse, and actionable opportunities for asset optimization.
- Security: We check for the presence and correctness of HTTPS, scan for mixed content issues, and analyze key security headers.
Reporting: All findings are aggregated, scored, and prioritized in our Supabase database. The frontend, built with React, TypeScript, and Tailwind CSS, fetches this compiled data from Supabase and presents it in the interactive, easy-to-digest dashboard that is our core feature.
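To make the Discovery and On-Page SEO stages more concrete, here is a minimal sketch rather than SiteScanner's actual code. It uses only the Python standard library (xml.etree and html.parser stand in for the requests and BeautifulSoup4 stack described above) so it runs without dependencies. The function names, the OnPageParser class, and the issue labels are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


class OnPageParser(HTMLParser):
    """Collects the few tags that the checks in on_page_issues() look at."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.h1_count = 0
        self.has_meta_description = False
        self.images_missing_alt: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            if (attr.get("content") or "").strip():
                self.has_meta_description = True
        elif tag == "img" and "alt" not in attr:
            # Only a missing alt attribute is flagged; alt="" is allowed
            # because it is valid for decorative images.
            self.images_missing_alt.append(attr.get("src") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def on_page_issues(html: str) -> list[str]:
    """Run a handful of illustrative on-page checks and return issue labels."""
    parser = OnPageParser()
    parser.feed(html)
    issues = []
    if not parser.title.strip():
        issues.append("missing_title")
    if not parser.has_meta_description:
        issues.append("missing_meta_description")
    if parser.h1_count == 0:
        issues.append("missing_h1")
    elif parser.h1_count > 1:
        issues.append("multiple_h1")
    issues += [f"img_missing_alt:{src}" for src in parser.images_missing_alt]
    return issues
```

In a real crawl, the HTML string passed to on_page_issues() would come from the fetch step, and each URL returned by urls_from_sitemap() would be queued for fetching.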
Challenges We Faced
Handling Long-Running Processes: Crawling an entire website can be a long-running process that would exceed the timeout limits of standard serverless functions. Our biggest challenge was architecting a scalable backend system. We designed our Flask application to handle these intensive analysis tasks asynchronously, ensuring the process could complete reliably and update the database upon completion.
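The write-up does not say which mechanism the Flask application uses to run audits asynchronously. One common pattern, sketched below under that caveat, is to hand the audit to a background worker and immediately return a job ID that the frontend can poll. The in-memory jobs dictionary is a stand-in for the database table that would hold audit status, and run_audit, submit_audit, and the status values are hypothetical names.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# A small worker pool; the size is an arbitrary choice for illustration.
executor = ThreadPoolExecutor(max_workers=4)

# Stand-in for the database table that stores audit status and results.
jobs: dict[str, dict] = {}


def run_audit(url: str) -> dict:
    """Placeholder for the real crawl and analysis (hypothetical)."""
    return {"url": url, "pages_crawled": 0}


def _worker(job_id: str, url: str) -> None:
    """Run the audit and record either its result or its failure."""
    try:
        result = run_audit(url)
        jobs[job_id].update(status="done", result=result)
    except Exception as exc:
        jobs[job_id].update(status="failed", error=str(exc))


def submit_audit(url: str) -> str:
    """Queue an audit and return a job ID without waiting for it to finish."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "running", "result": None}
    executor.submit(_worker, job_id, url)
    return job_id
```

A request handler would call submit_audit() and respond right away with the job ID, so the HTTP request never has to outlive the crawl. A thread pool inside the web process is the simplest option; a dedicated task queue is the usual choice when jobs must survive restarts.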
Data Overload vs. Actionable Insight: Our crawler gathered thousands of data points. The initial versions of our dashboard were just as overwhelming as the tools we disliked. The real challenge was in prioritization and design. We spent a significant amount of time developing a scoring algorithm that weighs different issues (e.g., a missing H1 tag is more severe than an image missing alt text) to produce a single, meaningful "Health Score" and a prioritized to-do list.
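A weighted scoring scheme of this kind can be sketched as follows. The weights, the scale factor, and the issue names are invented for illustration and are not SiteScanner's actual values. The only detail taken from the description above is that a missing H1 weighs more than an image missing alt text.

```python
# Hypothetical severity weights: a higher weight means a more severe issue.
SEVERITY_WEIGHTS = {
    "broken_link": 8,
    "missing_h1": 5,
    "missing_meta_description": 3,
    "img_missing_alt": 1,
}
DEFAULT_WEIGHT = 1   # used for issue types not listed above
PENALTY_SCALE = 10   # points lost per unit of weighted issues per page


def prioritize(issue_counts: dict[str, int]) -> list[tuple[str, int]]:
    """Return (issue, weighted_impact) pairs, most impactful first."""
    weighted = [
        (issue, SEVERITY_WEIGHTS.get(issue, DEFAULT_WEIGHT) * count)
        for issue, count in issue_counts.items()
    ]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)


def health_score(issue_counts: dict[str, int], pages_crawled: int) -> int:
    """Collapse all issues into a single 0-100 score, normalized by site size."""
    total = sum(impact for _, impact in prioritize(issue_counts))
    per_page = total / max(pages_crawled, 1)
    return max(0, round(100 - per_page * PENALTY_SCALE))
```

For example, two pages missing an H1 and five images missing alt text across a 10-page site give a weighted total of 15, or 1.5 per page, which this sketch turns into a score of 85. Dividing by the number of pages crawled keeps large sites from being penalized simply for having more pages.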
Limitation: Analyzing JavaScript-Heavy Sites: Our current approach uses a server-side Python crawler, which is fast for static HTML sites. However, it only analyzes the initial HTML source and does not execute JavaScript, so content that frameworks like React or Vue render on the client is invisible to it. A key next step is to integrate a headless browser solution (like Puppeteer or Playwright) to overcome this and perform a true-to-browser analysis.
What We Learned
Throughout this hackathon, our team learned a tremendous amount about the intricate world of modern SEO.
- The User Experience of Data: We learned that how you present data is just as important as the data itself. A well-designed UI can turn a confusing spreadsheet of issues into a clear, motivating action plan.
- The Depth of Technical SEO: We gained a new appreciation for the sheer number of factors that influence a site's visibility, from security headers and social sharing tags to accessibility attributes and Core Web Vitals.
- The Power of APIs & BaaS: Integrating with trusted APIs like Google PageSpeed Insights and using a Backend-as-a-Service like Supabase allowed us to provide world-class data and build a robust application quickly, letting us focus on our core value proposition: the unified, actionable report.
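As a rough illustration of the PageSpeed Insights integration mentioned above, the sketch below builds a request URL for the public v5 runPagespeed endpoint and pulls a few fields out of a response that has already been decoded from JSON. The function names and the shape of the returned summary are hypothetical. The response fields it reads (loadingExperience.metrics and lighthouseResult.categories) follow the v5 API as we understand it and should be checked against the current API reference.

```python
from typing import Optional
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url: str, api_key: Optional[str] = None) -> str:
    """Build a runPagespeed request URL for a mobile audit."""
    params = {
        "url": page_url,
        "strategy": "mobile",
        "category": ["PERFORMANCE", "ACCESSIBILITY"],  # repeated query parameter
    }
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params, doseq=True)}"


def summarize_psi(response: dict) -> dict:
    """Extract field LCP and the Lighthouse accessibility score, if present."""
    metrics = response.get("loadingExperience", {}).get("metrics", {})
    categories = response.get("lighthouseResult", {}).get("categories", {})
    lcp = metrics.get("LARGEST_CONTENTFUL_PAINT_MS", {}).get("percentile")
    a11y = categories.get("accessibility", {}).get("score")
    return {
        "lcp_ms": lcp,
        # Lighthouse reports category scores on a 0-1 scale.
        "accessibility": round(a11y * 100) if a11y is not None else None,
    }
```

Fetching the URL (for example with requests.get(build_psi_url(...)).json()) is left out so the sketch stays free of network calls. Field data is not available for every site, which is why the parser tolerates missing keys.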
What's Next for SiteScanner
This hackathon is just the beginning. We have a clear vision for the future:
- Recurring, Scheduled Audits: Allow users to monitor their sites over time and receive alerts when new issues appear.
- Competitor Analysis: Enable users to run reports on competitor sites to identify strategic opportunities.
- Deeper Content Analysis: Integrate NLP to provide recommendations on keyword usage, content length, and readability.
- Integrations: Push issues directly to project management tools like Jira, Trello, or Asana to seamlessly integrate SEO into the development workflow.
Built With
- bolt.new
- supabase
- vercel
