Inspiration

One of the most fascinating stories in retail data science comes from Target’s Pregnancy Prediction model.

Target’s analytics team once made headlines when they predicted a teenage girl’s pregnancy, based solely on her purchase behavior, before her own parents knew. By identifying subtle shifts in buying patterns (unscented lotion, vitamin supplements, cotton balls), they trained a predictive model that inferred pregnancy stages with remarkable accuracy.

That story stuck with me.

I thought:

If big retailers like Target can extract powerful insights like this, why can’t small businesses do the same, without a data science team or expensive tools?

That’s when it hit me: Maybe I could build a simplified version of what Target did a fully automated system that helps small retailers unlock trends, patterns, and actionable insights from their sales data.

And so, ShopSense was born. A fully automated pipeline that does everything:

  • Cleans raw sales data
  • Finds trends, spikes, and hidden correlations
  • Summarizes them in natural language using LLMs
  • Emails a beautiful PDF report, no code, no dashboards, just results

I wanted to democratize data-driven decision-making, and bring Target-style intelligence to every business owner’s inbox.

What it does

ShopSense is a fully automated, serverless data pipeline that transforms raw e-commerce sales data into insightful, AI-generated PDF reports. The entire process is event-driven and requires zero manual intervention after the initial setup.

Here’s the user journey:

  1. Upload Data: The user simply uploads their raw sales data (as a CSV file) into a designated Amazon S3 bucket.
  2. Automated Processing & Cataloging: The upload triggers a series of AWS services. An AWS Lambda function first cleans the data. This clean data then triggers AWS Glue to crawl the data, understand its structure, and create a queryable database catalog.
  3. AI-Powered Insight Generation: Once the data is cataloged, another Lambda function uses Amazon Athena to run queries against it. The results are then fed as context to a powerful Large Language Model (Mistral/DeepSeek/Gemini via OpenRouter) which generates deep, analytical insights based on the sales patterns, trends, and key metrics.
  4. Automated Report Delivery: The AI-generated insights are used to create a professional PDF report, which is saved to another S3 bucket. Finally, Amazon EventBridge runs on a schedule (e.g., every 10 minutes in my use case or weekly or monthly), finds all newly generated reports, compiles their links into a single email, and sends a comprehensive summary to the user.

In short, ShopSense automates the entire workflow from raw data to a finished intelligence report in your email.

How it was built

ShopSense is built entirely on a serverless architecture within Amazon Web Services (AWS), making it highly scalable, cost-effective, and low-maintenance.

  • Data Lake & Storage: I use Amazon S3 as the foundation. There are separate buckets/folders for raw data, cleaned data, query results, and the final PDF reports. S3's event notification system is the catalyst for the entire workflow.
  • Compute & Orchestration: AWS Lambda is the core of my project. I have multiple Lambda functions, each responsible for a specific task:
    • Processing raw data upon upload.
    • Initiating the AWS Glue crawler.
    • Querying with Athena and calling the OpenRouter LLM API.
    • Generating the PDF report from the AI's JSON output.
    • Sending the final summary email.
  • Data Cataloging: AWS Glue (specifically, the Glue Crawler) is used to automatically infer the schema of my cleaned sales data and create a metadata table in the Glue Data Catalog. This makes the data instantly queryable.
  • Data Querying: Amazon Athena allows me to run standard SQL queries on the data in S3 without needing to manage any servers. It's the perfect tool for extracting the specific sales statistics I need to feed to the AI.
  • AI & Intelligence: I use the OpenRouter API to access powerful Large Language Models like Mistral/DeepSeek/Gemini. By providing the queried data from Athena as context in my prompt, I can ask the model to perform complex analysis, identify trends, and summarize key findings.
  • Scheduling & Notifications: Amazon EventBridge acts as my scheduler, triggering the final Lambda function to collate report links and email the user. This ensures timely and consistent delivery of the sales intelligence.

The entire infrastructure is defined as "Infrastructure as Code," allowing for easy deployment and management.

Challenges ran into

One of the biggest challenges was robust error handling and chaining the services together seamlessly. An event-driven architecture is powerful, but if one link in the chain fails, the whole process can stop. For instance, I initially ran into UnboundLocalError in my Lambda function because the LLM API calls were failing (due to invalid model IDs), and the error wasn't being caught properly before the code tried to process a non-existent result. I solved this by implementing more resilient try-except blocks and ensuring my code could gracefully handle API failures.

Another challenge was optimizing the prompt for the LLM. Simply feeding it raw data wasn't enough. I had to experiment extensively with prompt engineering to get the model to produce consistently high-quality, relevant, and correctly formatted insights. This involved structuring the data context clearly and crafting very specific instructions for the analysis I wanted.

Accomplishments that I'm proud of

I am incredibly proud of creating a truly "hands-off" automated pipeline. The fact that a user only needs to perform one action—uploading a file—to receive a detailed, AI-generated report in their inbox is a huge accomplishment. It completely removes the manual, repetitive work that inspired the project.

I am also proud of the serverless-first approach. By leveraging services like Lambda, S3, and Athena, the operational cost of ShopSense is extremely low, scaling to zero when not in use. This makes it a financially viable solution for the small and medium-sized businesses I aimed to help.

What I learnt

This project was a deep dive into the power of event-driven architectures on AWS. I learned how to effectively chain together multiple serverless services, using S3 events and Lambda triggers as the "glue."

I also learned a great deal about the practical side of integrating LLMs into a data pipeline. It's not just about the API call; it's about data preprocessing, effective prompt engineering, and handling the variability of AI-generated output. I discovered that the quality of the insights is directly proportional to the quality of the context and the prompt you provide.

What's next for ShopSense

The future for ShopSense is focused on adding more intelligence and user control.

  • Interactive Dashboards: I plan to use the structured data in my S3 data lake to power a simple, web-based dashboard (perhaps using Amazon QuickSight or a custom web app) for users who want to explore their data visually in addition to the PDF reports.
  • Customizable Prompts: Allowing users to submit their own questions or define the kind of insights they care about most (e.g., "Focus on profit margins" or "Analyze customer return rates").
  • Broader Data Integration: Expanding beyond simple CSV uploads to connect directly to e-commerce platforms like Shopify or WooCommerce for even more seamless data ingestion.
  • Predictive Analytics: Leveraging more advanced machine learning models to move from historical analysis to forecasting future sales trends.

Built With

Share this project:

Updates