ClimateLens

Inspiration

Growing up in Palestine, I’d watch the morning haze settle over my town and wonder: What’s really happening to our climate? Raw datasets—IPCC PDFs, CSVs of CO₂ emissions, gigabytes of weather station logs—were siloed and buried. Even basic questions took hours of manual wrangling. I wanted a single tool where you could simply ask in plain English and get an instant, source-traceable answer. That’s how ClimateLens was born.


What It Does

  • Natural-language Q&A
    Ask ClimateLens any climate- or weather-related question:
    • “What were Palestine’s CO₂ emissions in 2020?”
    • “Top 5 CO₂ emitters in 2019.”
    • “What was the weather in France on 2020-07-07?”
  • Vector & text search
    • CO₂ data lives in MongoDB with Atlas Search indexes (exact, fuzzy, and vector).
    • IPCC report paragraphs are embedded and stored in MongoDB for sub-second vector search.
  • Answer synthesis
    We use Google Vertex AI’s Gemini LLM with function declarations to:
    1. Parse user intent → decide which “function” to call (e.g. get_emissions, get_weather, get_report).
    2. Fetch structured data from MongoDB (emissions, weather aggregates, IPCC text).
    3. Call Gemini again to stitch numbers and report snippets into a natural-language answer.
  • Traceable sources
    Every answer comes with a “Sources” panel linking back to the exact database record or IPCC paragraph.
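The routing step in "Answer synthesis" can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: only the function name get_emissions comes from the write-up, while the parameter names, the ToolResult shape, and the in-memory sample row (a made-up "Testland" record) are assumptions. It shows how a function call chosen by the model might be dispatched to a handler that returns both data and the source ids used for the "Sources" panel.

```typescript
// Hypothetical shapes: the real schemas and handlers are not shown in the write-up.
type FunctionCall = { name: string; args: Record<string, unknown> };
type ToolResult = { data: unknown; sources: string[] };

// A declaration in the JSON-schema style that LLM function calling generally expects.
const functionDeclarations = [
  {
    name: "get_emissions",
    description: "CO2 emissions for one country and year",
    parameters: {
      type: "object",
      properties: {
        country: { type: "string" },
        year: { type: "integer" },
      },
      required: ["country", "year"],
    },
  },
];

// Placeholder row standing in for a MongoDB collection (values are invented).
const SAMPLE_EMISSIONS = [
  { id: "emissions/Testland/2020", country: "Testland", year: 2020, value: 1.0 },
];

// One handler per declared function; each returns data plus traceable source ids.
const handlers: Record<string, (args: Record<string, unknown>) => ToolResult> = {
  get_emissions: (args) => {
    const rows = SAMPLE_EMISSIONS.filter(
      (r) => r.country === args.country && r.year === args.year
    );
    return { data: rows, sources: rows.map((r) => r.id) };
  },
};

// Route the model's chosen function call to the matching handler.
function dispatch(call: FunctionCall): ToolResult {
  const declared = functionDeclarations.some((d) => d.name === call.name);
  const handler = handlers[call.name];
  if (!declared || !handler) {
    throw new Error(`Unknown function: ${call.name}`);
  }
  return handler(call.args);
}
```

In a flow like the one described, the result of dispatch would be passed back to the model for the final natural-language answer, and the sources array would feed the "Sources" panel.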

Challenges

  • Monorepo & Docker
    Yarn workspaces + separate client/ and server/ lockfiles → tricky multi-stage Docker builds.
  • TypeScript quirks
    Getting strict TS checks to pass in the build we deploy to Cloud Run (moduleResolution, ESM vs. CJS) required careful tsconfig tweaking.
  • Vector search fallback
    Ensuring reliability when Gemini embeddings fail → fallback to full-text Atlas Search.
  • Geospatial weather lookup
    Geocoding + $nearSphere queries on hundreds of millions of records demanded efficient indexing and caching.
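The vector search fallback can be sketched as a small wrapper. This is one possible shape under stated assumptions, not the project's implementation: searchWithFallback, SearchDeps, and the mode field are hypothetical names, and the three injected functions stand in for the Gemini embedding call and the two Atlas Search queries.

```typescript
// All names here are hypothetical; the injected functions stand in for the
// embedding call and the MongoDB Atlas vector and full-text queries.
type Hit = { id: string; text: string };
type SearchResult = { hits: Hit[]; mode: "vector" | "text" };

interface SearchDeps {
  embed: (query: string) => Promise<number[]>;
  vectorSearch: (vector: number[]) => Promise<Hit[]>;
  textSearch: (query: string) => Promise<Hit[]>;
}

async function searchWithFallback(
  query: string,
  deps: SearchDeps
): Promise<SearchResult> {
  try {
    // Preferred path: embed the query, then run vector search.
    const vector = await deps.embed(query);
    const hits = await deps.vectorSearch(vector);
    return { hits, mode: "vector" };
  } catch {
    // Embedding or vector search failed: degrade to full-text search.
    const hits = await deps.textSearch(query);
    return { hits, mode: "text" };
  }
}
```

Returning the mode alongside the hits makes it possible to log or display how often the fallback path is taken.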

Accomplishments We’re Proud Of

  • Sub-2-second responses across emissions, weather, and report queries, live on Cloud Run + Firebase Hosting.
  • End-to-end traceability: every number and quote can be traced back to a MongoDB record or IPCC chunk.
  • Seamless UX: a single React/Vite SPA that feels like chatting with a climate expert.

What We Learned

  • Monorepo deployment: how to build and ship a workspace-based Node.js service with multi-stage Docker and GCP Cloud Build.
  • LLM function calling: designing precise function schemas for Gemini to invoke and return structured JSON.
  • MongoDB Atlas Search: building hybrid pipelines that mix vector kNN, fuzzy text search, and geospatial queries.
  • Firebase Hosting + Environment: injecting Cloud Run URLs at build time (.env.production) so the frontend always points at the right backend.
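For the hybrid search point above, the three query shapes might look roughly like this. The stage and operator names ($vectorSearch, $search with a fuzzy text operator, $nearSphere) are MongoDB's; the index names, field names, and default values are assumptions made for illustration, since the write-up does not show the real ones.

```typescript
// Index names, field names, and defaults below are assumptions for illustration.

// Aggregation stage for approximate nearest-neighbour search over embeddings.
function vectorStage(queryVector: number[], limit = 5) {
  return {
    $vectorSearch: {
      index: "report_vector_index", // hypothetical index name
      path: "embedding", // hypothetical field holding the vector
      queryVector,
      numCandidates: limit * 20, // candidates considered before the final cut
      limit,
    },
  };
}

// Aggregation stage for typo-tolerant text search.
function fuzzyTextStage(query: string) {
  return {
    $search: {
      index: "default", // hypothetical index name
      text: { query, path: "country", fuzzy: { maxEdits: 1 } },
    },
  };
}

// Filter for find(): $nearSphere needs a 2dsphere index on the field, and
// GeoJSON coordinates are ordered [longitude, latitude].
function nearStationFilter(lng: number, lat: number, maxMeters: number) {
  return {
    location: {
      $nearSphere: {
        $geometry: { type: "Point", coordinates: [lng, lat] },
        $maxDistance: maxMeters,
      },
    },
  };
}
```

The two stage builders return objects meant to be the first stage of an aggregation pipeline, while the geospatial builder returns a filter for a regular find() call.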

What’s Next for ClimateLens

  1. User authentication & personalization: save favorite queries, schedule daily climate digests.
  2. Global impact dashboard: visual heatmaps of emissions, live weather anomalies, sea-level rise projections.
  3. Open-source contributions: enable third-party data plug-ins (e.g. air-quality, deforestation, renewable energy).
  4. Multi-language support: let users ask in Arabic, Spanish, or French and get answers in the same language.

Join us: ClimateLens is a free, open-source effort to democratize climate insight. We welcome feedback, contributions, and partnerships!
