What it does
RetailX is an end-to-end AI retail analytics tool that:
Cleans and processes millions of retail transactions
Detects suspicious/fraudulent activity using intelligent logic
Uses GPT to generate insights from filtered data
Connects directly to Databricks (no CSVs)
Visualizes real-time insights in a sleek Streamlit dashboard
Supports user filters by department, region, and fraud status
Provides exportable, GPT-enhanced business summaries
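As an illustration of the filtering and flagging described above, here is a minimal sketch in plain Python. The detection rules RetailX actually uses are not described here, so the `Transaction` fields, the thresholds, and the function names below are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Transaction:
    # Hypothetical schema; the real RetailX columns may differ.
    order_id: int
    department: str
    region: str
    quantity: int
    amount: float


# Assumed thresholds, chosen only for illustration.
MAX_QUANTITY = 50
MAX_AMOUNT = 1000.0


def is_suspicious(tx: Transaction) -> bool:
    """Flag a transaction whose quantity or amount exceeds a fixed threshold."""
    return tx.quantity > MAX_QUANTITY or tx.amount > MAX_AMOUNT


def filter_transactions(
    rows: Iterable[Transaction],
    department: Optional[str] = None,
    region: Optional[str] = None,
    suspicious: Optional[bool] = None,
) -> List[Transaction]:
    """Keep rows matching every filter that is set; None means no filter."""
    result = []
    for tx in rows:
        if department is not None and tx.department != department:
            continue
        if region is not None and tx.region != region:
            continue
        if suspicious is not None and is_suspicious(tx) != suspicious:
            continue
        result.append(tx)
    return result


SAMPLE = [
    Transaction(1, "produce", "west", 3, 12.50),
    Transaction(2, "produce", "west", 120, 480.00),
    Transaction(3, "dairy", "east", 2, 1500.00),
    Transaction(4, "dairy", "west", 1, 4.25),
]

# Only the flagged rows, regardless of department or region.
flagged = filter_transactions(SAMPLE, suspicious=True)
```

In a dashboard, the three keyword arguments would typically be driven by the sidebar widgets, with an unset widget mapping to `None`.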
How we built it
Databricks for processing 3M+ rows of Instacart data
PySpark & Pandas for data cleaning and ETL
Streamlit to build the interactive UI
OpenAI (GPT-4) for AI-powered fraud summaries
Deployment that runs locally or on Streamlit Cloud
Fully integrated pipeline with no manual CSV steps
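One way the GPT step in a pipeline like this could stay small is to send aggregates rather than raw rows. The sketch below assumes that approach; it is not confirmed as what RetailX does. The row keys (`department`, `is_flagged`), the function names, and the prompt wording are hypothetical, and the actual OpenAI request is left out.

```python
from collections import Counter
from typing import Dict, List


def summarize_flags(rows: List[Dict[str, object]]) -> Dict[str, object]:
    """Reduce filtered rows to a few counts that fit in a short prompt."""
    flagged = [r for r in rows if r["is_flagged"]]
    by_department = Counter(str(r["department"]) for r in flagged)
    return {
        "total_rows": len(rows),
        "flagged_rows": len(flagged),
        # most_common() orders departments from most to least flagged.
        "flagged_by_department": dict(by_department.most_common()),
    }


def build_prompt(summary: Dict[str, object]) -> str:
    """Turn the summary into plain text for a chat model (wording is assumed)."""
    lines = [
        "You are a retail analyst. Summarize the fraud signals below for a business audience.",
        f"Total transactions in view: {summary['total_rows']}",
        f"Flagged transactions: {summary['flagged_rows']}",
        "Flagged transactions by department:",
    ]
    for department, count in summary["flagged_by_department"].items():
        lines.append(f"- {department}: {count}")
    return "\n".join(lines)


ROWS = [
    {"department": "dairy", "is_flagged": True},
    {"department": "produce", "is_flagged": True},
    {"department": "produce", "is_flagged": True},
    {"department": "bakery", "is_flagged": False},
]

prompt = build_prompt(summarize_flags(ROWS))
```

Because the prompt carries only counts, its size does not grow with the number of rows the user has filtered down to.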
Challenges we ran into
Handling large data in a lightweight, deployable app
Hitting free-tier API limits on both GPT and Databricks
Ensuring fast load times despite 3M records
Building a smooth user experience for non-technical users
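A common way to cope with rate limits like the ones mentioned above is to retry with exponential backoff. The sketch below shows that general technique only; `RateLimitError` and `call_with_backoff` are hypothetical names, and a real client would raise its own error type.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for whatever rate-limit error the real API client raises."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, doubling the wait after each rate-limit failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")


# Usage: a fake call that is rate-limited twice before succeeding.
attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("slow down")
    return "ok"


recorded_delays = []
result = call_with_backoff(flaky_call, sleep=recorded_delays.append)
```

Passing `sleep` as a parameter keeps the example testable without real waiting; in an app it would stay at the default `time.sleep`.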
Accomplishments that we're proud of
End-to-end working product built on a real dataset
Fully interactive Streamlit app with smart AI insights
True real-world simulation of what data teams build
GPT integration without breaking performance
Clean, modular codebase with no hardcoded workarounds
What we learned
Scaling data pipelines with PySpark and Databricks
Best practices in merging AI with product thinking
Building polished dashboards for business users
Balancing performance and AI-powered customization
What's next for RetailX
PostgreSQL integration to store fraud alerts
User authentication for HR or retail managers
Scheduled batch fraud detection with Airflow
Visual anomaly detection using ML
Deploying a mobile version for retail field teams