Inspiration
In many real-world domains such as healthcare, finance, and education, acquiring high-quality labeled data is extremely difficult because of privacy concerns, legal restrictions, or a simple lack of availability. While working on AI models for academic and hackathon projects, I faced repeated setbacks from small or incomplete datasets. This motivated me to build a tool that generates realistic, domain-specific synthetic data that is fast to produce, private, and highly customizable, filling that critical gap.
What It Does
The Smart Synthetic Data Generator allows users to:
- Select a domain (e.g., healthcare, finance, retail, education)
- Instantly generate structured, realistic datasets using intelligent field-type logic
- Export synthetic data as CSV for immediate use in model training or testing
It supports deep customization through schema JSON files and can be extended easily for new domains.
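To make the schema-driven flow concrete, here is a minimal sketch of how a domain schema could drive generation and CSV export. The schema layout (`domain`, `fields`, `type`, `min`, `max`, `options`) and the helper names `generate_value` and `generate_csv` are assumptions for illustration, not the project's actual format, and Python's standard `random` module stands in for the project's field-type logic so the example runs without extra dependencies.

```python
import csv
import io
import json
import random

# Hypothetical schema: the project's real JSON format is not shown in this write-up.
SCHEMA_JSON = """
{
  "domain": "healthcare",
  "fields": [
    {"name": "patient_id", "type": "integer", "min": 1000, "max": 9999},
    {"name": "blood_type", "type": "choice", "options": ["A", "B", "AB", "O"]},
    {"name": "heart_rate", "type": "float", "min": 50.0, "max": 120.0}
  ]
}
"""


def generate_value(field, rng):
    """Produce one value for a field description (assumed field types)."""
    kind = field["type"]
    if kind == "integer":
        return rng.randint(field["min"], field["max"])
    if kind == "float":
        return round(rng.uniform(field["min"], field["max"]), 1)
    if kind == "choice":
        return rng.choice(field["options"])
    raise ValueError(f"unsupported field type: {kind}")


def generate_csv(schema, n_rows, seed=None):
    """Generate n_rows synthetic rows from the schema and return them as CSV text."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    names = [field["name"] for field in schema["fields"]]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for _ in range(n_rows):
        writer.writerow({f["name"]: generate_value(f, rng) for f in schema["fields"]})
    return buffer.getvalue()


schema = json.loads(SCHEMA_JSON)
csv_text = generate_csv(schema, n_rows=5, seed=42)
print(csv_text)
```

Under this layout, supporting a new domain would mean writing another schema file, and supporting a new kind of field would mean adding one branch to `generate_value`.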
How I Built It
- Frontend & Hosting: Built using Streamlit for rapid prototyping and an interactive UI.
- Backend Logic: Python-based logic using Faker, Pandas, and custom field rules for realism.
- Schemas: Domain-specific .json schemas define field types and relations.
- Deployment: Deployed using platforms like Render/Streamlit Cloud for public access.
Code example:
row[field_name] = self.generate_field_value(field_type)
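For context, the line above could sit inside a generator class along the following lines. This is only a sketch under stated assumptions: the class name `SyntheticDataGenerator`, the methods `generate_row` and `generate_rows`, and the field types (`age`, `amount`, `boolean`, `code`) are hypothetical, and the standard `random` module is used here in place of Faker so the example is self-contained.

```python
import random
import string


class SyntheticDataGenerator:
    """Hypothetical generator; only generate_field_value is named in the write-up."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        # Map each (assumed) field type to a function that produces one value.
        self.generators = {
            "age": lambda: self.rng.randint(18, 90),
            "amount": lambda: round(self.rng.uniform(1.0, 5000.0), 2),
            "boolean": lambda: self.rng.random() < 0.5,
            "code": lambda: "".join(self.rng.choices(string.ascii_uppercase, k=6)),
        }

    def generate_field_value(self, field_type):
        """Look up the generator for a field type and call it."""
        if field_type not in self.generators:
            raise ValueError(f"unknown field type: {field_type!r}")
        return self.generators[field_type]()

    def generate_row(self, fields):
        """Build one row from a mapping of field name to field type."""
        row = {}
        for field_name, field_type in fields.items():
            row[field_name] = self.generate_field_value(field_type)
        return row

    def generate_rows(self, fields, n_rows):
        """Build a list of n_rows independent rows."""
        return [self.generate_row(fields) for _ in range(n_rows)]


fields = {
    "customer_age": "age",
    "balance": "amount",
    "is_active": "boolean",
    "branch_code": "code",
}
generator = SyntheticDataGenerator(seed=7)
rows = generator.generate_rows(fields, n_rows=3)
for row in rows:
    print(row)
```

Because each row is a plain dictionary, a list like `rows` can be passed directly to `pandas.DataFrame(rows)` and written out with `to_csv(index=False)` for the CSV export step.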
