Inspiration

While working on data science projects, we repeatedly encountered challenges like:

  • Inconsistent formatting
  • Missing values
  • Duplicates
  • Non-numeric categorical data

What it does

  • Removal of missing values
  • Duplicate row elimination
  • Categorical encoding
  • Cleaned file storage & delivery
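The cleaning steps above can be sketched in a few lines of pandas (the column names and data here are hypothetical, just to illustrate each step):

```python
import pandas as pd

# Hypothetical raw data showing the problems the service handles
raw = pd.DataFrame({
    "age": [25, None, 25, 40],
    "city": ["NY", "LA", "NY", None],
})

cleaned = (
    raw.dropna()           # removal of missing values
       .drop_duplicates()  # duplicate row elimination
)
# Categorical encoding: one-hot encode the non-numeric column
cleaned = pd.get_dummies(cleaned, columns=["city"])
print(cleaned)
```

After dropping the row with a missing `age`, the row with a missing `city`, and the duplicate, a single fully numeric row remains.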

How we built it

We used a combination of a lightweight frontend and AWS serverless services: AWS Lambda (with a custom Lambda Layer providing Pandas) to clean uploaded CSVs, S3 presigned URLs for secure file upload and delivery, and AWS SES to email results back to the user.

Accomplishments that we're proud of

  • Built a full-stack cloud-native pipeline with automated CSV cleaning
  • Successfully deployed a functioning Lambda layer with Pandas support
  • Created a user-friendly interface that works across platforms
  • Delivered a tool that can be genuinely useful for both technical and non-technical users

Challenges we ran into

  • Packaging Pandas and boto3 inside Lambda due to size limitations
  • Creating a Lambda Layer with all required dependencies
  • Handling file size and memory constraints (/tmp limited to 512MB in Lambda)
  • Managing SES sandbox restrictions while testing email delivery
  • Dealing with CORS and frontend-backend integration

What we learned

  • How to create and deploy Lambda Layers with third-party Python packages
  • Working with presigned URLs and managing file uploads securely
  • Using AWS SES to programmatically send emails
  • How to keep frontend simple yet functional while integrating it with a cloud backend
  • Importance of good UI/UX in data-focused tools

What's next for Data cleaner as a service

  • Let users choose cleaning options (e.g., fill nulls, encode, normalize)
  • Support more formats like .xlsx, .json
  • Add data quality reports with stats and charts
  • Deploy on a public URL (e.g., Netlify or S3 static hosting)
  • Create an admin dashboard to monitor cleaning history and usage
