🚀 Inspiration
In today’s AI-first world, access to high-quality data is essential, but real-world data is often locked behind privacy concerns, compliance constraints (such as GDPR and HIPAA), or business silos.
Many companies and researchers spend weeks cleaning, anonymizing, or even waiting for access to critical datasets. This slows down ML experimentation and product development.
We built SynForge to break that bottleneck.
Our goal was to design a domain-agnostic synthetic data generator powered by Generative AI that lets anyone generate high-fidelity, anonymized, and domain-specific datasets just by describing the schema.
Whether you're an ML engineer, a student, or a startup, SynForge lets you build datasets before data even exists.
🧠 What it does
SynForge is a full-stack Generative AI application that helps users:
- 🧾 Define a dataset schema in plain text (e.g. `name:string, age:int, salary:float`)
- 🔮 Automatically generate domain-specific synthetic data (finance, healthcare, retail, etc.)
- 🧠 Utilize Novita AI's powerful LLM APIs to synthesize intelligent values
- 🧹 Optionally anonymize the data while maintaining realism
- 📦 Download or preview generated datasets for use in training, testing, or POCs
- ☁️ Access everything through a live AWS-hosted web UI
You simply write the schema, select a domain, and click "Generate"; SynForge returns intelligent tabular data in seconds.
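As a rough illustration of the schema step, here is a minimal sketch of how a plain-text schema such as `name:string, age:int, salary:float` could be parsed into typed columns. The names `parse_schema` and `Column`, and the set of supported type names, are assumptions made for this example and are not taken from SynForge's code.

```python
from dataclasses import dataclass

# Assumed set of type names; the write-up only shows string, int, and float.
SUPPORTED_TYPES = {"string": str, "int": int, "float": float}


@dataclass(frozen=True)
class Column:
    name: str
    type_name: str


def parse_schema(text: str) -> list[Column]:
    """Parse 'name:string, age:int' into a list of Column objects."""
    columns = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas
        name, sep, type_name = part.partition(":")
        name, type_name = name.strip(), type_name.strip().lower()
        if not sep or not name:
            raise ValueError(f"expected 'name:type', got {part!r}")
        if type_name not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type {type_name!r} for column {name!r}")
        columns.append(Column(name, type_name))
    if not columns:
        raise ValueError("schema is empty")
    return columns
```

Rejecting unknown types up front, as this sketch does, would keep malformed schemas from consuming LLM credits.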
🛠️ How we built it
- Frontend: Built with React.js, styled with Tailwind CSS, and deployed to AWS S3 for static hosting.
- Backend: A FastAPI application deployed on AWS Lambda via the Serverless Framework.
- AI Integration: We used Novita AI’s `txt2table` LLM for generating smart values from schemas. We call their REST API via secure keys.
- Cloud Infrastructure:
  - Amazon API Gateway to expose endpoints
  - Amazon S3 for hosting
  - Amazon CloudWatch for logging and alerts
  - IAM for secure permissions
- Dev Tools: Built and scripted using CursorAI and Sora, and load-tested with Artillery.
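The AI integration above boils down to an authenticated REST call. The sketch below shows one way the backend might build such a request with the key read from the environment rather than hard-coded. The endpoint URL, the `NOVITA_API_KEY` variable name, the bearer-token scheme, and the payload fields are all placeholders assumed for illustration; none of them come from this write-up.

```python
import json
import os
import urllib.request

# Placeholder endpoint: the real Novita AI URL and payload shape are not given
# in this write-up, so both are assumptions here.
API_URL = "https://api.example.invalid/v1/generate"


def build_generation_request(schema: str, domain: str, rows: int) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the API key as a bearer token."""
    api_key = os.environ.get("NOVITA_API_KEY")  # assumed variable name
    if not api_key:
        raise RuntimeError("NOVITA_API_KEY is not set")
    body = json.dumps({"schema": schema, "domain": domain, "rows": rows}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen(req, timeout=...)`. Failing fast when the key is missing surfaces a misconfigured Lambda environment before any network call is made.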
🧗 Challenges we ran into
- 🔐 Secure credential management: Passing secrets (like Novita API keys) into AWS Lambda required dotenv overrides and secure env injection.
- 🐳 Docker & Serverless Compatibility: We faced timeouts with `serverless-python-requirements` until we fixed Docker permissions on macOS.
- 🌐 CORS policies: Frontend calls to Lambda APIs through API Gateway required careful handling of `CORS_ORIGINS` and request headers.
- 🧠 Prompt engineering for data synthesis: Generating structured tabular values from schema required tuning the request body to Novita’s models.
- ⚡ Limited credits: We had to optimize API usage due to token quota limits.
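To make the CORS challenge concrete, here is a small sketch of how a `CORS_ORIGINS` environment variable could be turned into an allow-list and checked against a request's `Origin` header. The comma-separated format, the function names, and the specific allowed methods and headers are assumptions for this example, not a description of SynForge's actual configuration.

```python
import os


def load_cors_origins(default: str = "") -> list[str]:
    """Read a comma-separated CORS_ORIGINS variable into a clean list."""
    raw = os.environ.get("CORS_ORIGINS", default)
    # Strip whitespace and trailing slashes so "https://a.com/ " matches "https://a.com".
    return [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]


def cors_headers(request_origin: str, allowed: list[str]) -> dict[str, str]:
    """Return response headers for an allowed origin, or an empty dict otherwise."""
    origin = request_origin.rstrip("/")
    if "*" in allowed:
        return {"Access-Control-Allow-Origin": "*"}
    if origin in allowed:
        return {
            "Access-Control-Allow-Origin": origin,
            "Access-Control-Allow-Headers": "Content-Type, Authorization",
            "Access-Control-Allow-Methods": "POST, OPTIONS",
            "Vary": "Origin",
        }
    return {}
```

In a FastAPI app the same list would typically be handed to `CORSMiddleware` through its `allow_origins` argument instead of building headers by hand; the manual version is shown only to make the matching rule visible.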
🏆 Accomplishments that we're proud of
- ✅ Successfully deployed a production-ready GenAI backend on AWS Lambda
- 🚀 Built and shipped a React UI that’s user-friendly, minimal, and responsive
- 🤖 Created a reusable schema-to-dataset pipeline using Novita’s GenAI models
- ⚙️ Monitored system usage with CloudWatch dashboards and alarms
- 🌐 Hosted the frontend and backend entirely on AWS, with Novita AI as the only external dependency
📚 What we learned
- Gained hands-on experience with Serverless deployments, Lambda environments, and FastAPI on AWS.
- Learned how to integrate GenAI APIs into real-world applications (beyond simple demos).
- Understood how to prompt engineer LLMs for tabular data, not just text generation.
- Appreciated the power of modular deployments using `serverless.yml` and environment-based builds.
- Mastered secure key management and CI/CD practices on a tight deadline.
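Prompting an LLM for tabular data usually means asking for a strict machine-readable format and then validating what comes back. The sketch below illustrates that idea under stated assumptions: the prompt wording, the JSON-array reply format, and the helper names `build_prompt` and `parse_rows` are hypothetical and do not reflect the request body actually sent to Novita's models.

```python
import json


def build_prompt(columns: list[tuple[str, str]], domain: str, rows: int) -> str:
    """Ask for a JSON array only, naming each column and its type explicitly."""
    spec = ", ".join(f"{name} ({type_name})" for name, type_name in columns)
    return (
        f"Generate {rows} realistic synthetic records for the {domain} domain.\n"
        f"Columns: {spec}.\n"
        "Respond with a JSON array of objects and nothing else."
    )


# Assumed mapping from schema type names to Python casts.
CASTS = {"string": str, "int": int, "float": float}


def parse_rows(reply: str, columns: list[tuple[str, str]]) -> list[dict]:
    """Parse the model reply and coerce every value to its declared type."""
    # Models sometimes wrap JSON in prose; keep only the outermost array.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in reply")
    records = json.loads(reply[start : end + 1])
    rows = []
    for record in records:
        # A missing column raises KeyError; a bad value raises ValueError.
        rows.append({name: CASTS[t](record[name]) for name, t in columns})
    return rows
```

Coercing each value to its declared type catches replies where the model returns numbers as strings or omits a column, which is the kind of mismatch that plain text generation never exposes.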
🔮 What’s next for SynForge
- 🔐 Add user authentication to save and export datasets per user
- 🧾 Support CSV/XLS exports, preview tables, and column validation
- 📚 Train custom models on private schemas using fine-tuning pipelines
- 🌍 Multilingual support for global users
- 📊 Data quality scoring & visual analytics to evaluate synthetic dataset realism
- ☁️ Integrate Amazon Bedrock to remove dependency on external APIs
We plan to evolve SynForge into a plug-and-play Synthetic Data as a Service (SDaaS) platform for startups, researchers, and enterprises.
Made with 💻, ☁️, and ⚙️ by Team SynForge
Built With
- amazon-cloudwatch
- amazon-lambda
- amazon-web-services
- pydantic