Inspiration

The AI boom has triggered a massive race to build new data centers, but the actual process of finding the right land is agonizingly slow. Siting multi-billion-dollar infrastructure normally requires hiring teams of GIS consultants, waiting weeks for environmental impact reports, and sifting through hundreds of opaque PDFs. We realized that all the necessary data, from high-voltage grid capacity to seismic fault lines, already exists in the public domain; it's just scattered across a dozen different government agencies. Seeing how disconnected this process is frustrated us. We built DataSiteAI because we wanted to prove that by aggregating this data into a single, intelligent pipeline, we could compress months of expensive, manual real estate consulting into milliseconds of computational analysis.

What it does

DataSiteAI is a full-stack spatial intelligence platform that scores and ranks land across the continental United States for data center suitability. Users click anywhere on our interactive map, and the system returns a composite suitability score out of 100. This score is calculated in real time from seven weighted categories: power infrastructure, water access, geological stability, climate, fiber connectivity, economic environment, and environmental sensitivity.

But we didn't stop at just visualizing data. We integrated a location-aware Gemini AI assistant directly into the map. When you select a parcel, Gemini reads the underlying risk scores and gives you a professional, consultant-level breakdown of why that specific plot is a golden opportunity or a risky investment. Finally, we map real, currently available land listings onto our highly scored cells, allowing users to go from geospatial analysis to purchasing in a single click.

How we built it

We built the backend using Python 3.13 and FastAPI for high-performance async endpoints, connected to a PostgreSQL/PostGIS database for spatial queries. We integrated 9 different public APIs (including FEMA, USGS, OpenStreetMap, and EPA), wrapping them in Redis caching to survive rate limits.

The core engine consists of 7 independent scorers that run concurrently via asyncio.gather. Each fetches domain-specific data and calculates a normalized raw score in $[0, 1]$, which is then multiplied by its weight from an isolated weights configuration to generate the final composite score.

On the frontend, we used React 19, Vite, and Tailwind CSS to build a glassmorphic, Tier-1 AI startup aesthetic. We used Leaflet to render the interactive map and 8 dynamic GeoJSON layers. Finally, we wired up the Google Gemini SDK to silently ingest the backend's scoring payload on every map click, acting as our localized expert chatbot.
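The concurrent scoring step can be sketched roughly as follows. This is a minimal illustration, not the project's actual engine: the category keys, the weight values, and the `score_category` stand-in (which returns a fixed placeholder instead of calling a public API) are all assumptions.

```python
import asyncio

# Hypothetical weights for the seven categories; the real values are not
# given in this writeup.
WEIGHTS = {
    "power": 0.25,
    "water": 0.15,
    "geology": 0.15,
    "climate": 0.10,
    "fiber": 0.15,
    "economic": 0.10,
    "environmental": 0.10,
}


async def score_category(name: str, lat: float, lon: float) -> tuple[str, float]:
    """Stand-in scorer. A real one would fetch domain data for (lat, lon)."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return name, 0.5        # fixed placeholder raw score in [0, 1]


async def composite_score(lat: float, lon: float) -> float:
    """Run all scorers concurrently and combine them into a 0-100 score."""
    results = await asyncio.gather(
        *(score_category(name, lat, lon) for name in WEIGHTS)
    )
    raw = dict(results)
    total_weight = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[name] * raw[name] for name in WEIGHTS)
    # Dividing by the total keeps the result on a 0-100 scale even if the
    # weights do not sum exactly to 1.
    return round(100 * weighted / total_weight, 1)
```

Calling `asyncio.run(composite_score(38.5, -98.0))` runs the seven placeholder scorers in one event loop; with real scorers, the total latency would be bounded by the slowest fetch rather than the sum of all of them.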

Challenges we ran into

Normalizing geospatial data in the wild was brutal. Translating dollar-per-acre prices, integer AQI values, and alphanumeric FEMA flood codes (like "AE" or "X") into comparable $[0, 1]$ risk scores without losing their intrinsic meaning required complex, domain-specific math. Second, respecting rate limits across 9 different external APIs simultaneously required careful async design and aggressive Redis caching to prevent the app from getting IP-banned.

We also lost valuable hackathon hours to a silent CORS failure caused by a classic port mismatch between our backend and frontend (:8001 vs :8000). Finally, maintaining architectural discipline was tough; it was incredibly tempting under pressure to hardcode weight adjustments directly inside endpoints to fix edge cases, but we forced ourselves to keep all scoring math strictly isolated from the routing logic.
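To give a sense of what this kind of normalization involves, here is a minimal sketch that maps the three input types mentioned above onto $[0, 1]$ risk values. The lookup values, the price bounds, and the function names are illustrative assumptions, not the mappings DataSiteAI actually uses.

```python
import math

# Hypothetical risk values per FEMA flood zone code (illustrative only).
FLOOD_ZONE_RISK = {
    "X": 0.1,   # outside the high-risk flood area
    "D": 0.5,   # hazard undetermined
    "A": 0.8,   # high-risk area
    "AE": 0.8,  # high-risk area with base flood elevations
    "VE": 1.0,  # high-risk coastal area
}


def clamp01(x: float) -> float:
    """Clamp a value into the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


def flood_risk(zone: str) -> float:
    """Categorical code -> risk; unknown codes fall back to a neutral 0.5."""
    return FLOOD_ZONE_RISK.get(zone.strip().upper(), 0.5)


def aqi_risk(aqi: int) -> float:
    """The US AQI scale runs from 0 to 500, so a linear rescale is enough."""
    return clamp01(aqi / 500)


def price_risk(usd_per_acre: float, low: float = 1_000.0, high: float = 100_000.0) -> float:
    """Log-scale prices so very expensive parcels do not flatten everything else.

    The low/high bounds are assumed values chosen for illustration.
    """
    if usd_per_acre <= 0:
        return 0.0
    span = math.log10(high) - math.log10(low)
    return clamp01((math.log10(usd_per_acre) - math.log10(low)) / span)
```

Each function handles one kind of input (categorical, bounded integer, heavy-tailed currency), which is why a single generic min-max rescale would not have been enough.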

Accomplishments that we're proud of

We are incredibly proud of the location-aware Gemini integration. It isn't just a basic chat wrapper; because we silently inject the exact geographic coordinates, bounding boxes, and raw risk scores into the prompt context, the AI actually understands the land you are looking at. Seeing it accurately explain the wind potential and substation proximity of a random field in Kansas was a massive win.

We're also proud of the sheer speed of the pipeline. Getting the flow of click $\rightarrow$ bounding box $\rightarrow$ live data fetch $\rightarrow$ score $\rightarrow$ GeoJSON generation to run end-to-end in milliseconds on live data required serious optimization. Lastly, completing a beautiful, glassmorphic UI that looks like a legitimate, funded enterprise product in just 24 hours is a huge testament to our team's execution.
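The context injection described above can be approximated like this. It is a sketch only: `build_site_context`, the payload field names, and the instruction wording are hypothetical, and the call into the Gemini SDK is deliberately left out because the writeup does not show it.

```python
import json


def build_site_context(
    lat: float,
    lon: float,
    bbox: tuple[float, float, float, float],
    scores: dict[str, float],
) -> str:
    """Serialize the clicked location and its raw scores into prompt text."""
    payload = {
        "lat": lat,
        "lon": lon,
        "bbox": list(bbox),  # assumed order: (min_lon, min_lat, max_lon, max_lat)
        "scores": scores,    # normalized raw scores in [0, 1]
    }
    return (
        "You are a data center site-selection analyst. "
        "Answer using only the site data below.\n"
        + json.dumps(payload, indent=2)
    )
```

The returned string would be prepended to the user's question before it is sent to the model, so the user never has to type coordinates or scores by hand.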

What we learned

We learned the immense architectural value of decoupling logic. By strictly separating the scoring functions from the weight configuration (weights.py), we created a robust system where the engine just does the math, and the user decides what matters.
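One way to picture this separation is a weights module that knows nothing about scoring and only validates and normalizes whatever priorities it is given. The sketch below assumes the category keys and default values; only the file name weights.py comes from the project itself.

```python
from typing import Optional

# Assumed contents of a weights.py-style module (values are illustrative).
DEFAULT_WEIGHTS = {
    "power": 0.25,
    "water": 0.15,
    "geology": 0.15,
    "climate": 0.10,
    "fiber": 0.15,
    "economic": 0.10,
    "environmental": 0.10,
}


def resolve_weights(overrides: Optional[dict[str, float]] = None) -> dict[str, float]:
    """Merge user overrides into the defaults and renormalize to sum to 1."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    merged = {**DEFAULT_WEIGHTS, **overrides}
    if any(w < 0 for w in merged.values()):
        raise ValueError("weights must be non-negative")
    total = sum(merged.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in merged.items()}
```

Because the scorers only ever see the resolved dictionary, changing what matters to a user means calling `resolve_weights({"power": 0.5})` rather than touching any scoring or routing code.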

We also learned exactly how difficult it is to make wildly different public data formats speak the same language. Geospatial data is inherently chaotic, and standardizing it requires patience and precision. Furthermore, we realized the absolute necessity of mock modes and caching in a hackathon environment; building dummy data responses early on allowed our frontend and backend to be developed in parallel without bottlenecking each other or burning through API quotas.
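A mock mode plus a cache can be as small as the sketch below. Note the substitution: it uses an in-memory TTL cache as a stand-in for Redis so the example is self-contained, and `MOCK_RESPONSES`, `TTLCache`, and `fetch_layer` are hypothetical names rather than the project's own.

```python
import time
from typing import Any, Callable

# Canned responses used when mock mode is on (illustrative data).
MOCK_RESPONSES = {
    "flood_zone": {"zone": "X"},
    "air_quality": {"aqi": 42},
}


class TTLCache:
    """Tiny in-memory cache with per-entry expiry; a stand-in for Redis."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the external call
        value = fetch()
        self._store[key] = (now, value)
        return value


def fetch_layer(
    name: str,
    cache: TTLCache,
    live_fetch: Callable[[str], Any],
    mock: bool = False,
) -> Any:
    """Return canned data in mock mode, otherwise a cached live response."""
    if mock:
        return MOCK_RESPONSES[name]
    return cache.get_or_fetch(name, lambda: live_fetch(name))
```

With `mock=True` the frontend can be built against stable responses before any real API integration exists, and with the cache in place repeated clicks on the same area cost one external request instead of many.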

What's next for DataSiteAI

The next major step is integrating real-time, nodal-level power grid capacity data. Knowing a high-voltage line is nearby is great, but knowing whether it actually has available capacity for a 500 MW data center is the holy grail of site selection. We also want to ingest local county zoning ordinances and tax incentive zones, using an LLM to automatically parse legal PDFs into geospatial layers.

From a product standpoint, we plan to add user accounts so enterprise clients can save sites, adjust the algorithm's scoring weights based on their specific company priorities (e.g., prioritizing 100% renewable energy over latency), and export automated, boardroom-ready PDF due-diligence reports.

