About the Project

Inspiration

The 24‑hour news cycle produces millions of headlines that are hard to parse in isolation. We wanted a single, visual dashboard that surfaces where things are happening and why they matter—without forcing users to read hundreds of articles. GlobaLens was born from that need: turning raw, real‑time data into instant global awareness.

What it does

  • Streams the GDELT public dataset hourly and plots each event on a 3D globe.
  • Summarises every article with Vertex AI, distilling paragraphs into bite‑sized insights.
  • Searches semantically using MongoDB Atlas Vector Search so you can ask, “election protests” and get results even if the articles never contain that exact phrase.
  • Filters the timeline with a Select Date Range option to explore events within custom time windows.
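For readers curious how the semantic search is wired up, here is a rough sketch of an Atlas `$vectorSearch` aggregation stage built in Python. The index name and field paths ("vector_index", "embedding") are illustrative placeholders, not our exact schema:

```python
def build_semantic_query(query_vector, limit=20):
    """Build a MongoDB Atlas $vectorSearch aggregation pipeline.

    `query_vector` is the embedding of the user's query text (e.g. from a
    Sentence-Transformers model). Index and field names are placeholders.
    """
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",      # name of the Atlas vector index
                "path": "embedding",          # field holding the article embedding
                "queryVector": query_vector,  # query embedding to match against
                "numCandidates": limit * 10,  # candidate pool for approximate k-NN
                "limit": limit,               # number of results returned
            }
        },
        # Project only the fields the frontend needs, plus the similarity score.
        {
            "$project": {
                "title": 1,
                "summary": 1,
                "lat": 1,
                "lon": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

The pipeline would be passed to `collection.aggregate(...)` on the backend; because the query is an embedding rather than keywords, "election protests" can match articles that never contain that phrase.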

How we built it

  1. Ingestion Pipeline – BigQuery + Cloud Functions fetch GDELT CSVs, enrich them, and store them in GCS.
  2. NLP Enrichment – Vertex AI generates summaries and sentiment, while a Sentence‑Transformers model creates embeddings.
  3. Storage Layer – MongoDB Atlas stores JSON docs plus a vector index for K‑NN search.
  4. Backend – Flask provides REST endpoints and vector‑search queries.
  5. Frontend – Vite/React renders an interactive globe (react‑globe.gl) with Tailwind styling and a chat‑style search panel.
  6. CI/CD – GitHub Actions builds Docker images and deploys Cloud Functions via Terraform.
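As a sketch of step 1, the ingestion stage boils down to turning raw tab-separated GDELT rows into plottable event documents. The column indices below follow the GDELT 2.0 event codebook as we used it; treat them as assumptions and double-check against the official codebook:

```python
import csv
import io

# Assumed GDELT 2.0 event-table column indices (0-based); verify against
# the official codebook before relying on them.
GLOBAL_EVENT_ID, DAY = 0, 1
AVG_TONE = 34
ACTION_GEO_LAT, ACTION_GEO_LONG = 56, 57
SOURCE_URL = 60

def parse_gdelt_rows(raw_tsv: str):
    """Yield plottable event docs from a raw GDELT export (tab-separated)."""
    reader = csv.reader(io.StringIO(raw_tsv), delimiter="\t")
    for row in reader:
        # Skip truncated rows and events without coordinates.
        if len(row) <= SOURCE_URL or not row[ACTION_GEO_LAT]:
            continue
        yield {
            "event_id": row[GLOBAL_EVENT_ID],
            "day": row[DAY],
            "tone": float(row[AVG_TONE]),
            "lat": float(row[ACTION_GEO_LAT]),
            "lon": float(row[ACTION_GEO_LONG]),
            "url": row[SOURCE_URL],
        }
```

In the real pipeline the resulting docs are enriched with Vertex AI summaries and embeddings before landing in GCS and MongoDB Atlas.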

Challenges we ran into

  • Real‑time scale – GDELT emits >100 K events/day; batching and indexing had to stay under free‑tier limits.
  • Unreliable data source – Fetching CSVs directly from GDELT became unreliable due to intermittent site outages.
  • Mixed‑language content – Ensuring summaries worked across 65+ languages required translation fallbacks.
  • Frontend performance – Rendering tens of thousands of points crashed browsers until we implemented dynamic level‑of‑detail and WebGL instancing.
  • Cold starts – Cloud Functions sometimes exceeded latency targets; we mitigated with min‑instances and caching.
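The caching mitigation in the last bullet can be sketched as a small in-process TTL cache. This is a deliberately tiny stand-in, not our production cache layer (a real deployment would more likely use Redis or Memorystore):

```python
import time
from functools import wraps

def ttl_cache(seconds: float):
    """Cache a function's results for `seconds`, keyed by its arguments.

    A minimal sketch: no eviction beyond expiry and not thread-safe.
    """
    def decorator(fn):
        store = {}  # args -> (expiry_timestamp, cached_value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = fn(*args)
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator
```

Wrapping a hot query path (e.g. the most common searches) this way keeps a warm instance from re-running expensive work, which softens the cold-start penalty for everyone behind it.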

Accomplishments that we’re proud of

  • Shipped an end‑to‑end pipeline in 48 hours (hackathon deadline!)
  • Achieved sub‑second semantic search over 30 K+ events.
  • Visualised linked protests across three continents—insights not obvious from headlines alone.
  • Maintained a zero‑ops serverless stack (no VMs to babysit).

What we learned

  • Vector databases turn search into discovery—you don’t know what you’re missing until embeddings connect the dots.
  • Geospatial + NLP is a powerful combo for storytelling.
  • Spending time on DX (dev experience)—pre‑commit hooks, Docker Compose—saves hours when teammates join late.

What’s next for GlobaLens

  • Event clustering & heat‑maps to highlight hotspots.
  • User alerts (email/SMS) for custom triggers like “earthquake > 6.5”.
  • Collaborative annotations so journalists can attach notes and share filtered views.
  • PWA & offline mode for low‑bandwidth regions.
  • Multi‑lingual UI with on‑device translation for privacy.

Built With

  • atlas
  • big-query
  • cloud-functions
  • cloud-scheduler
  • cloud-storage
  • flask
  • pydantic
  • python-3.11
  • react-18
  • react-globe.gl
  • sentence-transformers
  • mongodb-atlas-vector-search
  • tailwind-css
  • vertex
  • vite

Updates


Note for Judges & Viewers: We initially extracted data directly from the GDELT website via CSV downloads. However, the site became unreachable, disrupting our data pipeline. We transitioned to the GDELT dataset on Google BigQuery, but the data currently available there only goes up to June 14th, 2025, so the most recent events may not be reflected in our demo. The system is structured for real-time ingestion and analysis, and once full data access resumes it will operate as intended.

Best, GlobaLens Team
