Inspiration

The motivation behind DataLoom came from a common frustration: even with access to open datasets like the World Bank or IMF, most people — including students — struggle to derive meaningful insights without technical skills.

I envisioned a tool where users could simply ask questions about the world and receive clear, AI-generated answers backed by real data. The inspiration was to build a platform that democratizes data analysis and visualization through a seamless chat interface.

What it does

DataLoom is a semantic data insight engine that allows users to ask natural language questions (e.g., “What is the GDP growth rate of India from 2000 to 2020?”) and receive:

  • AI-generated answers
  • Data visualizations
  • Insights pulled from real CSV datasets

It uses OpenAI embeddings to convert both the stored insights and the user's question into vector form, then runs a similarity search across the uploaded datasets. The backend serves the insights, and the frontend (in progress) is designed to make data exploration intuitive and visual.
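To make that concrete, here is a minimal sketch of the query flow, assuming the stored documents carry a precomputed `embedding` list; the helper names (`embed`, `top_k`) and the document shape are illustrative rather than the actual DataLoom code:

```python
# Minimal query flow: embed the question, rank stored insight vectors
# by cosine similarity, and return the closest matches.
import numpy as np
from openai import OpenAI  # shown with the openai>=1.0 client

ai = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    resp = ai.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def top_k(question: str, docs: list[dict], k: int = 5) -> list[dict]:
    """Rank docs (each with an 'embedding' list) by cosine similarity."""
    q = embed(question)
    sim = lambda d: np.dot(q, d["embedding"]) / (
        np.linalg.norm(q) * np.linalg.norm(d["embedding"])
    )
    return sorted(docs, key=sim, reverse=True)[:k]
```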


How we built it

  1. Data: Used World Bank GDP datasets (3 CSVs) containing economic indicators and metadata.
  2. Processing: Loaded and cleaned the datasets using pandas in Google Colab.
  3. Embedding: Converted insights into vector representations using OpenAI’s text-embedding-ada-002 model.
  4. Database: Stored the cleaned data and embeddings in a cloud-hosted MongoDB Atlas cluster (steps 1–4 are sketched after this list).
  5. Backend: Created a lightweight Flask API (also sketched after this list) to:
    • Handle user queries
    • Search embeddings
    • Use GPT-4 to generate summarized insights
  6. Frontend (planned): React + Tailwind dashboard with a natural language search bar and interactive charts using Chart.js and Mapbox.
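Here is a compressed sketch of steps 1–4. The file name, column names, and row-to-text template are hypothetical stand-ins for the real datasets, and `skiprows=4` reflects the metadata rows World Bank CSV exports typically put above the header:

```python
# Sketch of ingestion: load a World Bank CSV, clean it with pandas,
# embed a per-row summary, and store text + vector in Atlas.
import os

import pandas as pd
from openai import OpenAI
from pymongo import MongoClient

ai = OpenAI()  # reads OPENAI_API_KEY from the environment
collection = MongoClient(os.environ["MONGODB_URI"])["dataloom"]["insights"]

# World Bank exports usually carry a few metadata rows above the header.
df = pd.read_csv("gdp_growth.csv", skiprows=4)
df = df.dropna(subset=["Country Name"])

for _, row in df.iterrows():
    text = f"GDP growth for {row['Country Name']} in 2020: {row.get('2020', 'n/a')}%"
    emb = ai.embeddings.create(model="text-embedding-ada-002", input=text)
    collection.insert_one({"text": text, "embedding": emb.data[0].embedding})
```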
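And a sketch of the step-5 Flask API, reusing `ai`, `embed`/`top_k`, and `collection` from the sketches above; the `/ask` route and prompt wording are illustrative, not the exact production endpoint:

```python
# Query endpoint: embed the question, pull the closest stored
# insights, and have GPT-4 summarize them for the user.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/ask", methods=["POST"])
def ask():
    question = request.json["question"]
    # A brute-force scan is fine at hackathon scale; Atlas Vector
    # Search would replace this for larger corpora.
    docs = list(collection.find({}, {"text": 1, "embedding": 1, "_id": 0}))
    matches = top_k(question, docs)
    context = "\n".join(d["text"] for d in matches)
    chat = ai.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided data."},
            {"role": "user", "content": f"Data:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return jsonify({
        "answer": chat.choices[0].message.content,
        "sources": [d["text"] for d in matches],
    })
```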

Challenges we ran into

  • MongoDB SSL & Authentication Issues: Faced persistent OperationFailure errors and TLS handshake failures when connecting to MongoDB Atlas from Colab (a workaround is sketched after this list).
  • Environment Management: Securely managing API keys (OpenAI and MongoDB) across local and cloud environments.
  • Semantic Search Tuning: Making vector similarity reliably match the right data to the user's intent.
  • Time Constraints: Building the full pipeline (data, backend, future frontend) in just a few days on a minimal budget.
  • Debugging Colab Dependencies: Version-compatibility issues while installing the MongoDB and OpenAI client libraries in Colab.
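For anyone hitting the same Atlas TLS errors from Colab: one commonly used workaround (not necessarily the exact fix we shipped) is to hand pymongo an explicit CA bundle via certifi:

```python
# Workaround sketch for TLS handshake failures from Colab:
# point pymongo at certifi's CA bundle explicitly.
import os

import certifi
from pymongo import MongoClient

mongo = MongoClient(os.environ["MONGODB_URI"], tlsCAFile=certifi.where())
mongo.admin.command("ping")  # raises if TLS or auth is still broken
```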

Accomplishments that we're proud of

  • Successfully implemented end-to-end vector search on economic data using OpenAI embeddings.
  • Created a modular backend that can plug into any dataset and allow semantic querying.
  • Designed a future-ready architecture for a full-stack AI-powered data insight engine.
  • Overcame multiple environment and access issues to get MongoDB and OpenAI working together in a live pipeline.

What we learned

  • Practical experience with vector databases, semantic search, and embedding-based querying.
  • How to integrate AI APIs (OpenAI) with cloud databases (MongoDB Atlas) in real-world pipelines.
  • The importance of clean data preprocessing before performing semantic operations.
  • How much good backend design matters for flexible input/output between the user, the data layer, and the AI.

What's next for DataLoom

  • Frontend Launch: Develop a clean React UI for querying and displaying visual insights.
  • Add More Datasets: Support for environmental, population, and education datasets.
  • User Authentication: Add Google login and individual dashboards to save insights.
  • Insight Reports: Export results as PDF reports or slides.
  • Deploy Backend: Use Google Cloud Run or Render to deploy the Flask API publicly.
  • Fine-tuned AI Models: Improve prompt engineering and explore fine-tuning GPT responses with a domain-specific tone.

Built With

flask, google-colab, mongodb-atlas, openai, pandas, python
