What it does

RetailX is an end-to-end AI retail analytics tool that:

Cleans and processes millions of retail transactions

Flags suspicious or potentially fraudulent transactions using custom detection logic

Uses GPT to generate insights from filtered data

Connects directly to Databricks, with no intermediate CSV exports

Visualizes real-time insights in a sleek Streamlit dashboard

Supports user filters by department, region, and fraud status

Provides exportable, GPT-enhanced business summaries
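
The department, region, and fraud-status filters could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Transaction` fields, the `is_flagged` column, and the `filter_transactions` helper are all hypothetical names, and the sample rows are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Transaction:
    order_id: int
    department: str
    region: str
    is_flagged: bool  # hypothetical output of the fraud-detection step


def filter_transactions(
    rows: Iterable[Transaction],
    department: Optional[str] = None,
    region: Optional[str] = None,
    flagged: Optional[bool] = None,
) -> List[Transaction]:
    """Return rows matching every filter that is set; None means 'no filter'."""
    return [
        row
        for row in rows
        if (department is None or row.department == department)
        and (region is None or row.region == region)
        and (flagged is None or row.is_flagged == flagged)
    ]


# Made-up sample rows, for illustration only.
SAMPLE = [
    Transaction(1, "produce", "West", False),
    Transaction(2, "produce", "East", True),
    Transaction(3, "dairy", "West", True),
]

flagged_west = filter_transactions(SAMPLE, region="West", flagged=True)
print([t.order_id for t in flagged_west])
```

Treating `None` as "no filter" lets one function serve any combination of dashboard selections. On the full dataset the same predicate would more likely be pushed down into the Databricks query than applied in app memory.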

How we built it

Databricks for processing 3M+ rows of Instacart data

PySpark & Pandas for data cleaning and ETL

Streamlit to build the interactive UI

OpenAI (GPT-4) for AI-powered fraud summaries

Deployment either locally or on Streamlit Cloud

Fully integrated pipeline with no manual CSV steps
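
One plausible way to feed filtered data to GPT is to send compact aggregates rather than raw rows. The sketch below only builds the prompt text and makes no API call; `summarize_flags`, `build_summary_prompt`, the row layout, and the prompt wording are assumptions for illustration, not the project's actual implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarize_flags(rows: Iterable[Dict[str, object]]) -> Dict[str, int]:
    """Count flagged rows per department, sorted by department name."""
    counts = Counter(
        str(row["department"]) for row in rows if row["is_flagged"]
    )
    return dict(sorted(counts.items()))


def build_summary_prompt(counts: Dict[str, int], total_rows: int) -> str:
    """Build a short text prompt from aggregates instead of raw rows."""
    lines: List[str] = [
        "You are a retail analyst. Summarize the fraud signals below",
        "for a non-technical business reader in three sentences.",
        f"Rows in the current filter: {total_rows}",
        "Flagged transactions by department:",
    ]
    lines += [f"- {dept}: {n}" for dept, n in counts.items()]
    return "\n".join(lines)


# Made-up rows standing in for the filtered query result.
rows = [
    {"department": "produce", "is_flagged": True},
    {"department": "dairy", "is_flagged": False},
    {"department": "produce", "is_flagged": True},
    {"department": "bakery", "is_flagged": True},
]
prompt = build_summary_prompt(summarize_flags(rows), total_rows=len(rows))
print(prompt)
```

Aggregating first keeps the prompt size independent of the number of rows in the filter, which matters when the underlying table has millions of records.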

Challenges we ran into

Handling large data in a lightweight, deployable app

Hitting GPT and Databricks API limits on the free tier

Keeping load times fast despite 3M+ records

Building a smooth user experience for non-technical users
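
A common way to cope with API limits like these is to retry with exponential backoff. The sketch below is a generic pattern, not the approach the project is confirmed to use: `RateLimitError` is a hypothetical stand-in for whatever error the provider's client raises, and `call_with_backoff` is an illustrative helper.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's 'too many requests' error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            # Small random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise ValueError("max_attempts must be at least 1")


# Demo: a fake request that is rate-limited twice, then succeeds.
calls = {"n": 0}


def flaky_request() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"


waits: List[float] = []
# Record the waits instead of actually sleeping, so the demo runs instantly.
result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, len(waits))
```

Passing `sleep` as a parameter keeps the helper easy to test. In the app, the default `time.sleep` would apply, and `fn` would wrap the real GPT or Databricks call.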

Accomplishments that we're proud of

A working end-to-end product built on a real dataset

Fully interactive Streamlit app with smart AI insights

A realistic simulation of what data teams build in practice

GPT integration without breaking performance

Clean, modular codebase with no hardcoded workarounds

What we learned

Scaling data pipelines with PySpark and Databricks

Best practices in merging AI with product thinking

Building polished dashboards for business users

Balancing performance and AI-powered customization

What's next for RetailX

PostgreSQL integration to store fraud alerts

User authentication for HR or retail managers

Scheduled batch fraud detection with Airflow

Visual anomaly detection using ML

Deploying a mobile version for retail field teams
