Y Combinator MCP

Project Submission: YC Model Content Protocol (MCP)

Overview

The YC Model Content Protocol (MCP) is a structured interface for extracting and querying startup information from the Y Combinator website. It enables AI assistants to retrieve semantically rich, normalized data from YC startup profiles, making the site machine-readable and queryable without manual scraping or brittle parsing.

What It Does

MCP allows AI systems to:
• Extract structured data from YC startup pages
• Query specific content like founder bios, product descriptions, funding details, or tags
• Interpret page structure using semantic cues and hierarchical content modeling
• Receive updated content when startup profiles change

Why We Built It

AI assistants need reliable access to structured data to understand and reason about startups. The YC website provides high-value information, but it's designed for human readers. MCP bridges that gap by turning YC content into a clean, machine-readable format, purpose-built for retrieval, summarization, and analysis by AI systems.

How It Works

Content Extraction

A modular scraping layer identifies and retrieves consistent content blocks across YC startup profiles, including:
• Descriptions
• Founders and team info
• Tags (industry, location, stage)
• Metadata (batch, status, URL)
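
A minimal sketch of this extraction step, using only Python's standard-library `html.parser`. The project's scraper is not published here, so the class names in `FIELD_CLASSES` and the helper `extract_blocks` are hypothetical placeholders rather than YC's real markup; the sketch only shows the pattern of mapping known blocks to fields while tolerating nested and void tags.

```python
from html.parser import HTMLParser
from typing import Optional

# Hypothetical mapping from CSS class names to profile fields.
# The real YC markup is not documented in this submission.
FIELD_CLASSES = {
    "company-description": "description",
    "founder-name": "founders",
    "company-tag": "tags",
    "company-batch": "batch",
}

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProfileBlockParser(HTMLParser):
    """Collects text from elements whose class marks a known content block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: dict[str, list[str]] = {}
        self._stack: list[Optional[str]] = []  # field of each open element, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        field = next((FIELD_CLASSES[c] for c in classes if c in FIELD_CLASSES), None)
        # Nested elements (e.g. <b> inside a description) inherit the enclosing field.
        if field is None and self._stack:
            field = self._stack[-1]
        self._stack.append(field)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack and self._stack[-1]:
            self.blocks.setdefault(self._stack[-1], []).append(text)


def extract_blocks(html: str) -> dict[str, list[str]]:
    """Return the text fragments found under each known field."""
    parser = ProfileBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    sample = ('<div class="company-description">Builds <b>AI</b> tools</div>'
              '<span class="founder-name">Ada</span><span class="company-tag">AI</span>')
    print(extract_blocks(sample))
    # {'description': ['Builds', 'AI', 'tools'], 'founders': ['Ada'], 'tags': ['AI']}
```

Text nodes are kept as separate fragments here; joining and labeling them is left to the transformation step.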

Transformation Engine

The raw HTML is converted into a structured, assistant-ready format:
• Layout normalization
• Semantic labeling of content (e.g., pitch, traction, team)
• Hierarchical representation for context-aware queries
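
One way to sketch normalization, semantic labeling, and a hierarchical representation, assuming paragraphs have already been extracted as plain strings. The dataclasses, the keyword set `TRACTION_CUES`, and the rule that the first paragraph is the pitch are illustrative assumptions standing in for the project's heuristics, which the submission describes only as based on content type, order, and structure.

```python
import re
from dataclasses import asdict, dataclass, field
from typing import Iterable, Optional

# Illustrative keyword cues for traction; the project's real heuristics are not published.
TRACTION_CUES = {"revenue", "users", "customers", "growth", "arr", "mrr"}


@dataclass
class Section:
    label: str  # semantic label such as "pitch", "traction", or "details"
    text: str


@dataclass
class StartupProfile:
    name: str
    batch: Optional[str] = None
    tags: list[str] = field(default_factory=list)
    founders: list[str] = field(default_factory=list)
    sections: list[Section] = field(default_factory=list)


def label_paragraph(text: str, position: int) -> str:
    """Heuristic label from order and content: the first paragraph is treated as the pitch."""
    if position == 0:
        return "pitch"
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & TRACTION_CUES:
        return "traction"
    return "details"


def build_profile(name: str, paragraphs: Iterable[str], *, batch: Optional[str] = None,
                  tags: Iterable[str] = (), founders: Iterable[str] = ()) -> StartupProfile:
    """Normalize whitespace, label each paragraph, and nest the results under one profile."""
    profile = StartupProfile(name=name, batch=batch, tags=list(tags), founders=list(founders))
    cleaned = [" ".join(p.split()) for p in paragraphs if p.strip()]
    for position, text in enumerate(cleaned):
        profile.sections.append(Section(label=label_paragraph(text, position), text=text))
    return profile
```

Calling `asdict(profile)` turns the nested dataclasses into plain dictionaries, which an assistant-facing layer could serialize to JSON directly.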

Query API

The resulting content is exposed through a simple interface that lets assistants:
• Request content by URL
• Query specific fields or sections
• Detect updates and changes to existing profiles
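
The three query capabilities can be sketched as a small in-memory store. `ProfileStore`, its methods, and the example URL slug below are hypothetical; change detection here hashes a canonical JSON serialization of each profile, which is one simple technique and not necessarily the one the project uses.

```python
import hashlib
import json
from typing import Any, Optional


class ProfileStore:
    """Hypothetical in-memory query layer keyed by profile URL."""

    def __init__(self) -> None:
        self._profiles: dict[str, dict[str, Any]] = {}
        self._hashes: dict[str, str] = {}

    @staticmethod
    def _fingerprint(profile: dict[str, Any]) -> str:
        # Canonical JSON (sorted keys, fixed separators) so key order does not change the hash.
        canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def upsert(self, url: str, profile: dict[str, Any]) -> bool:
        """Store a freshly extracted profile; return True if it is new or has changed."""
        digest = self._fingerprint(profile)
        changed = self._hashes.get(url) != digest
        self._profiles[url] = dict(profile)
        self._hashes[url] = digest
        return changed

    def query(self, url: str, fields: Optional[list[str]] = None) -> dict[str, Any]:
        """Return the whole profile for a URL, or only the requested fields."""
        profile = self._profiles[url]
        if fields is None:
            return dict(profile)
        return {name: profile[name] for name in fields if name in profile}
```

For example, `store.upsert("https://www.ycombinator.com/companies/example-startup", {...})` returns `False` when a re-scrape yields identical content, so only genuinely changed profiles need to be propagated to assistants.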

Challenges

• Layout variability: While YC pages follow a general structure, edge cases required adaptive parsing and fallbacks
• Implicit semantics: Many content blocks lacked labels, so we implemented heuristics based on content type, order, and structure
• Performance: To support fast queries, we implemented lightweight caching and partial refresh strategies
• Context preservation: Relationships between content elements are preserved in a hierarchical model for better querying
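
A sketch of lightweight caching with partial refresh, assuming pages are fetched through a caller-supplied function. `TTLCache` and its parameters are hypothetical: entries expire after `ttl` seconds, and a refresh pass refetches only the expired entries instead of the whole dataset. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Callable


class TTLCache:
    """Per-URL page cache; an entry is reused until it is `ttl` seconds old."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # url -> (fetched_at, page)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no fetch
        page = self._fetch(url)  # miss or expired: fetch and restamp
        self._entries[url] = (now, page)
        return page

    def stale_urls(self) -> list[str]:
        """URLs whose cached entries have expired."""
        now = self._clock()
        return [url for url, (stamp, _) in self._entries.items() if now - stamp >= self._ttl]

    def refresh_stale(self) -> int:
        """Partial refresh: refetch only expired entries and return how many were refetched."""
        stale = self.stale_urls()
        for url in stale:
            self.get(url)
        return len(stale)
```

In production the `fetch` argument would wrap the real HTTP scraper; keeping it injectable also makes the cache trivial to test.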

What’s Next

• Change tracking: Versioning and diffing of startup data over time
• Interactive content support: Parsing of structured subcomponents like embedded lists, badges, or expandable sections
• User-context adaptation: Filtering or prioritizing content based on the assistant's use case (e.g., investor vs. applicant queries)
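
Change tracking could start from something as small as the sketch below, which keeps every distinct snapshot of a profile, reports field-level differences, and produces a line diff for long text with `difflib.unified_diff`. `ProfileHistory`, `diff_profiles`, and `pitch_diff` are hypothetical names; a fuller version would also persist snapshots with timestamps.

```python
import difflib
from typing import Any


def diff_profiles(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each changed field to (old, new); a field missing on one side shows as None."""
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


def pitch_diff(old: str, new: str) -> list[str]:
    """Line-level unified diff, useful for long text fields such as the pitch."""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                     fromfile="before", tofile="after", lineterm=""))


class ProfileHistory:
    """Keeps every distinct snapshot of one profile, oldest first."""

    def __init__(self) -> None:
        self.versions: list[dict[str, Any]] = []

    def record(self, snapshot: dict[str, Any]) -> bool:
        """Append the snapshot only if it differs from the latest version."""
        if self.versions and self.versions[-1] == snapshot:
            return False
        self.versions.append(dict(snapshot))
        return True

    def latest_changes(self) -> dict[str, tuple[Any, Any]]:
        """Field-level diff between the two most recent versions."""
        if len(self.versions) < 2:
            return {}
        return diff_profiles(self.versions[-2], self.versions[-1])
```

With snapshots stored this way, a question like "Has this startup changed its pitch since last week?" reduces to diffing last week's snapshot against the current one.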

Impact

MCP makes the YC website programmatically accessible without requiring manual scraping or brittle DOM logic. Assistants can now answer questions like:
• “Who founded this startup and what’s their background?”
• “Which YC companies in the current batch are focused on AI?”
• “Has this startup changed its pitch since last week?”

By turning unstructured pages into structured, protocol-compliant content, MCP unlocks the YC dataset for intelligent systems.

Built With

Submitted to

World's Biggest MCP Hackathon

Created by

I worked on the full-stack application: designing the front end, linking it with the backend, and implementing scraping functions for information retrieval.

ryanzambrano Zambrano
I did the uv setup, the initial Python scraping of the YC website, the initial FastAPI setup, and the MCP server for YC with 7 tools.

Morgan Rockett
Seth Nuzum
Mason Dierkes
