Inspiration
While working on data science projects, we repeatedly encountered challenges like:
- Inconsistent formatting
- Missing values
- Duplicates
- Non-numeric categorical data
What it does
- Missing value removal
- Duplicate row elimination
- Categorical encoding
- Cleaned file storage & delivery
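The four steps above can be sketched in Pandas; the function name and the choice of integer category codes for encoding are illustrative assumptions, not the project's exact implementation:

```python
import pandas as pd

def clean_csv(in_path: str, out_path: str) -> pd.DataFrame:
    """Clean a CSV: drop missing values, drop duplicates, encode categoricals, save."""
    df = pd.read_csv(in_path)
    # Drop rows containing missing values
    df = df.dropna()
    # Remove duplicate rows
    df = df.drop_duplicates()
    # Encode categorical (object-typed) columns as integer codes
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].astype("category").cat.codes
    # Store the cleaned file for delivery
    df.to_csv(out_path, index=False)
    return df
```

One-hot encoding via `pd.get_dummies` would be an equally valid choice for the encoding step, depending on the downstream model.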
How we built it
We used a combination of a simple web frontend and AWS serverless services (Lambda, S3, and SES).
Accomplishments that we're proud of
- Built a full-stack cloud-native pipeline with automated CSV cleaning
- Successfully deployed a functioning Lambda layer with Pandas support
- Created a user-friendly interface that works across platforms
- Delivered a tool that can be genuinely useful for both technical and non-technical users
Challenges we ran into
- Packaging Pandas and boto3 inside Lambda due to size limitations
- Creating a Lambda Layer with all required dependencies
- Handling file size and memory constraints (`/tmp` is limited to 512 MB in Lambda)
- Managing SES sandbox restrictions while testing email delivery
- Dealing with CORS during frontend-backend integration
What we learned
- How to create and deploy Lambda Layers with third-party Python packages
- Working with presigned URLs and managing file uploads securely
- Using AWS SES to programmatically send emails
- How to keep frontend simple yet functional while integrating it with a cloud backend
- Importance of good UI/UX in data-focused tools
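For the SES delivery step, building the raw MIME message (so the cleaned CSV can ride along as an attachment) can be sketched with the standard library; the sender, recipient, and subject below are placeholder assumptions:

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_delivery_email(sender: str, recipient: str,
                         csv_bytes: bytes, filename: str = "cleaned.csv") -> str:
    """Build a raw MIME message with the cleaned CSV attached."""
    msg = MIMEMultipart()
    msg["Subject"] = "Your cleaned CSV"
    msg["From"] = sender
    msg["To"] = recipient
    msg.attach(MIMEText("Attached is your cleaned file.", "plain"))
    # Attach the cleaned CSV as a downloadable file
    part = MIMEApplication(csv_bytes, _subtype="csv")
    part.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(part)
    return msg.as_string()
```

The resulting string is what you would pass to SES as `send_raw_email(RawMessage={"Data": raw})`; while the account is still in the SES sandbox, both sender and recipient addresses must be verified.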
What's next for Data cleaner as a service
- Let users choose cleaning options (e.g., fill nulls, encode, normalize)
- Support more formats like `.xlsx` and `.json`
- Add data quality reports with stats and charts
- Deploy on a public URL (e.g., Netlify or S3 static hosting)
- Create an admin dashboard to monitor cleaning history and usage