Inspiration

While working on data science projects, we repeatedly encountered challenges like:

  • Inconsistent formatting
  • Missing values
  • Duplicates
  • Non-numeric categorical data

What it does

  • Removal of missing values
  • Duplicate row elimination
  • Categorical encoding
  • Cleaned file storage & delivery
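The cleaning steps above can be sketched in a few lines of pandas (the column names and data here are hypothetical, just to illustrate each step):

```python
import pandas as pd

# Hypothetical raw data showing the problems the service handles
raw = pd.DataFrame({
    "age": [25, None, 25, 40],
    "city": ["NY", "LA", "NY", None],
})

cleaned = (
    raw.dropna()           # removal of missing values
       .drop_duplicates()  # duplicate row elimination
)
# Categorical encoding: one-hot encode the non-numeric column
cleaned = pd.get_dummies(cleaned, columns=["city"])
print(cleaned)
```

After dropping the row with a missing `age`, the row with a missing `city`, and the duplicate, a single fully numeric row remains.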

How we built it

We used a combination of a lightweight frontend and AWS serverless services: AWS Lambda (with a custom Lambda Layer providing Pandas) to clean uploaded CSVs, S3 presigned URLs for secure file upload and delivery, and AWS SES to email results back to the user.

Accomplishments that we're proud of

  • Built a full-stack cloud-native pipeline with automated CSV cleaning
  • Successfully deployed a functioning Lambda layer with Pandas support
  • Created a user-friendly interface that works across platforms
  • Delivered a tool that can be genuinely useful for both technical and non-technical users

Challenges we ran into

  • Packaging Pandas and boto3 inside Lambda due to size limitations
  • Creating a Lambda Layer with all required dependencies
  • Handling file size and memory constraints (/tmp limited to 512MB in Lambda)
  • Managing SES sandbox restrictions while testing email delivery
  • Dealing with CORS and frontend-backend integration

What we learned

  • How to create and deploy Lambda Layers with third-party Python packages
  • Working with presigned URLs and managing file uploads securely
  • Using AWS SES to programmatically send emails
  • How to keep frontend simple yet functional while integrating it with a cloud backend
  • Importance of good UI/UX in data-focused tools

What's next for Data cleaner as a service

  • Let users choose cleaning options (e.g., fill nulls, encode, normalize)
  • Support more formats like .xlsx, .json
  • Add data quality reports with stats and charts
  • Deploy on a public URL (e.g., Netlify or S3 static hosting)
  • Create an admin dashboard to monitor cleaning history and usage
