Inspiration

The motivation behind DataLoom came from a common frustration: even with access to open datasets like the World Bank or IMF, most people — including students — struggle to derive meaningful insights without technical skills.

I envisioned a tool where users could simply ask questions about the world and receive clear, AI-generated answers backed by real data. The inspiration was to build a platform that democratizes data analysis and visualization through a seamless chat interface.

What it does

DataLoom is a semantic data insight engine that allows users to ask natural language questions (e.g., “What is the GDP growth rate of India from 2000 to 2020?”) and receive:

  • AI-generated answers
  • Data visualizations
  • Insights pulled from real CSV datasets

It uses OpenAI embeddings to convert both the stored insights and the user's question into vector form, then runs a similarity search across the uploaded datasets. The backend serves the insights, and the frontend (in progress) is designed to make data exploration intuitive and visual.
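To make that concrete, here is a minimal sketch of the query flow, assuming the stored documents carry a precomputed `embedding` list; the helper names (`embed`, `top_k`) and the document shape are illustrative rather than the actual DataLoom code:

```python
# Minimal query flow: embed the question, rank stored insight vectors
# by cosine similarity, and return the closest matches.
import numpy as np
from openai import OpenAI  # shown with the openai>=1.0 client

ai = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    resp = ai.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def top_k(question: str, docs: list[dict], k: int = 5) -> list[dict]:
    """Rank docs (each with an 'embedding' list) by cosine similarity."""
    q = embed(question)
    sim = lambda d: np.dot(q, d["embedding"]) / (
        np.linalg.norm(q) * np.linalg.norm(d["embedding"])
    )
    return sorted(docs, key=sim, reverse=True)[:k]
```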


How we built it

  1. Data: Used World Bank GDP datasets (3 CSVs) containing economic indicators and metadata.
  2. Processing: Loaded and cleaned the datasets using pandas in Google Colab.
  3. Embedding: Converted insights into vector representations using OpenAI’s text-embedding-ada-002 model.
  4. Database: Stored the cleaned data and embeddings in a cloud-hosted MongoDB Atlas cluster (steps 1–4 are sketched after this list).
  5. Backend: Created a lightweight Flask API (also sketched after this list) to:
    • Handle user queries
    • Search embeddings
    • Use GPT-4 to generate summarized insights
  6. Frontend (planned): React + Tailwind dashboard with a natural language search bar and interactive charts using Chart.js and Mapbox.
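Here is a compressed sketch of steps 1–4. The file name, column names, and row-to-text template are hypothetical stand-ins for the real datasets, and `skiprows=4` reflects the metadata rows World Bank CSV exports typically put above the header:

```python
# Sketch of ingestion: load a World Bank CSV, clean it with pandas,
# embed a per-row summary, and store text + vector in Atlas.
import os

import pandas as pd
from openai import OpenAI
from pymongo import MongoClient

ai = OpenAI()  # reads OPENAI_API_KEY from the environment
collection = MongoClient(os.environ["MONGODB_URI"])["dataloom"]["insights"]

# World Bank exports usually carry a few metadata rows above the header.
df = pd.read_csv("gdp_growth.csv", skiprows=4)
df = df.dropna(subset=["Country Name"])

for _, row in df.iterrows():
    text = f"GDP growth for {row['Country Name']} in 2020: {row.get('2020', 'n/a')}%"
    emb = ai.embeddings.create(model="text-embedding-ada-002", input=text)
    collection.insert_one({"text": text, "embedding": emb.data[0].embedding})
```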
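And a sketch of the step-5 Flask API, reusing `ai`, `embed`/`top_k`, and `collection` from the sketches above; the `/ask` route and prompt wording are illustrative, not the exact production endpoint:

```python
# Query endpoint: embed the question, pull the closest stored
# insights, and have GPT-4 summarize them for the user.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    question = request.json["question"]
    # A brute-force scan is fine at hackathon scale; Atlas Vector
    # Search would replace this for larger corpora.
    docs = list(collection.find({}, {"text": 1, "embedding": 1, "_id": 0}))
    matches = top_k(question, docs)
    context = "\n".join(d["text"] for d in matches)
    chat = ai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided data."},
            {"role": "user", "content": f"Data:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return jsonify({
        "answer": chat.choices[0].message.content,
        "sources": [d["text"] for d in matches],
    })
```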

Challenges we ran into

  • MongoDB SSL & Authentication Issues: Faced persistent OperationFailure errors and TLS handshake failures when connecting to MongoDB Atlas from Colab (a workaround is sketched after this list).
  • Environment Management: Securely managing API keys (OpenAI and MongoDB) across local and cloud environments.
  • Semantic Search Tuning: Making vector similarity reliably match the right data to the user's intent.
  • Time Constraints: Building the full pipeline (data, backend, future frontend) in just a few days on a minimal budget.
  • Debugging Colab Dependencies: Version-compatibility issues while installing the MongoDB and OpenAI client libraries in Colab.
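For anyone hitting the same Atlas TLS errors from Colab: one commonly used workaround (not necessarily the exact fix we shipped) is to hand pymongo an explicit CA bundle via certifi:

```python
# Workaround sketch for TLS handshake failures from Colab:
# point pymongo at certifi's CA bundle explicitly.
import os

import certifi
from pymongo import MongoClient

mongo = MongoClient(os.environ["MONGODB_URI"], tlsCAFile=certifi.where())
mongo.admin.command("ping")  # raises if TLS or auth is still broken
```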

Accomplishments that we're proud of

  • Successfully implemented end-to-end vector search on economic data using OpenAI embeddings.
  • Created a modular backend that can plug into any dataset and allow semantic querying.
  • Designed a future-ready architecture for a full-stack AI-powered data insight engine.
  • Overcame multiple environment and access issues to get MongoDB and OpenAI working together in a live pipeline.

What we learned

  • Practical experience with vector databases, semantic search, and embedding-based querying.
  • How to integrate AI APIs (OpenAI) with cloud databases (MongoDB Atlas) in real-world pipelines.
  • The importance of clean data preprocessing before performing semantic operations.
  • How much good backend design matters for flexible input/output between the user, the data layer, and the AI.

What's next for DataLoom

  • Frontend Launch: Develop a clean React UI for querying and displaying visual insights.
  • Add More Datasets: Support for environmental, population, and education datasets.
  • User Authentication: Add Google login and individual dashboards to save insights.
  • Insight Reports: Export results as PDF reports or slides.
  • Deploy Backend: Use Google Cloud Run or Render to deploy the Flask API publicly.
  • Fine-tuned AI Models: Improve prompt engineering and explore fine-tuning GPT responses with a domain-specific tone.

Built With

flask, google-colab, mongodb-atlas, openai, pandas, python
