SynDator: Synthetic Data Generator

Inspiration

In the global workplace, development often happens across multiple geographical locations. Because of privacy rules, however, it is not always possible to share data with development teams. Without data, remote developers cannot validate the processes, jobs, and algorithms they build, which delays project delivery and increases budgets, reducing profitability. In some cases real data is not yet available at all, which makes it hard to proceed with development. To shorten delivery time and protect profitability in software development, we were motivated to build a tool for synthetic data generation.

What it does

SynDator: Synthetic Data Generator can generate data without any sample data. It can generate data that replicates a table schema, and it can also work without any table schema information.

How we built it

We built it using the power of the Snowflake Arctic LLM and Streamlit-in-Snowflake features.
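As a rough illustration of how a schema-driven request to an LLM could look, here is a minimal sketch. It is not the project's actual code: the function names `build_prompt` and `parse_csv_reply`, the prompt wording, and the choice of CSV as the reply format are all assumptions made for this example, and the call that sends the prompt to the model is deliberately left out.

```python
import csv
import io


def build_prompt(num_rows, columns=None, description=None):
    """Compose a text prompt asking an LLM for synthetic rows as CSV.

    Hypothetical helper: `columns` is a list of (name, type) pairs for the
    schema-based mode; `description` is free text for the schema-less mode.
    """
    if columns:
        schema = ", ".join(f"{name} {dtype}" for name, dtype in columns)
        subject = f"a table with columns: {schema}"
    elif description:
        subject = f"a dataset described as: {description}"
    else:
        raise ValueError("provide either columns or a description")
    return (
        f"Generate {num_rows} rows of realistic synthetic data for {subject}. "
        "Return only CSV with a header row and no commentary."
    )


def parse_csv_reply(reply):
    """Turn the model's CSV text into a list of dicts, one per row."""
    reader = csv.DictReader(io.StringIO(reply.strip()))
    return [dict(row) for row in reader]
```

For example, `build_prompt(2, columns=[("ID", "NUMBER"), ("NAME", "VARCHAR")])` produces a prompt that names both columns, and `parse_csv_reply` can then turn a CSV reply into rows ready to be loaded into a table. Asking for CSV keeps the reply compact, which matters when the model's output size is limited.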

Challenges we ran into

The main challenge we faced was the limited token size available with Snowflake Arctic. A larger token limit would have let us do more.
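One common way to work within a token limit is to split a large request into several smaller ones. The sketch below is an assumption about how that could be done, not a description of what we implemented: `plan_batches` is a hypothetical helper, and the tokens-per-row figure is an estimate the caller would have to supply.

```python
def plan_batches(total_rows, tokens_per_row, max_output_tokens):
    """Split a row count into batch sizes that fit an output token budget.

    All three inputs are caller-supplied estimates (hypothetical values);
    the function only does the arithmetic.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if tokens_per_row <= 0 or max_output_tokens < tokens_per_row:
        raise ValueError("token budget must fit at least one row")
    # Largest number of rows whose estimated tokens fit in one reply.
    rows_per_batch = max_output_tokens // tokens_per_row
    batches = []
    remaining = total_rows
    while remaining > 0:
        size = min(rows_per_batch, remaining)
        batches.append(size)
        remaining -= size
    return batches
```

Each batch size would then become the row count of one separate request to the model, with the results concatenated afterwards.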

Accomplishments that we're proud of

Our tool generates synthetic data from minimal information, which makes it really helpful for kick-starting development with data.

What we learned

During the hackathon, we gained hands-on experience working with Snowflake Arctic and its Replicate API. Although we didn't end up using the Replicate API in our final code, the knowledge we acquired about these technologies will undoubtedly prove valuable in future projects.

What's next for SynDator: Synthetic Data Generator

Generating synthetic data for multiple tables with referential integrity, business constraints, and real-world-like statistical distributions of values is on our roadmap.
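To make the referential integrity goal concrete, here is a minimal sketch of the idea: child rows only ever reference keys that exist in the parent table. This is an illustration of the roadmap item, not existing SynDator functionality; the function name and the `ORDER_ID` and `CUSTOMER_ID` column names are hypothetical.

```python
import random


def generate_child_rows(parent_ids, num_rows, seed=0):
    """Create child rows whose foreign key always points at a real parent.

    `parent_ids` stands in for the primary keys of an already generated
    parent table; column names here are made up for the example.
    """
    if num_rows > 0 and not parent_ids:
        raise ValueError("cannot reference an empty parent table")
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [
        {"ORDER_ID": i + 1, "CUSTOMER_ID": rng.choice(parent_ids)}
        for i in range(num_rows)
    ]
```

Sampling keys uniformly is the simplest choice; matching a realistic distribution of values would need weighted sampling instead.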

Built With

arctic
python
snowflake
streamlit

Submitted to

THE FUTURE OF AI IS OPEN

Created by

Ryo Shibuya
Data Superhero who loves designing, building, and evangelizing data architectures.
Jalindar Karande
Cloud Data Architect who loves to continuously explore cutting-edge technologies in the modern data stack.
Mukul Degweker
Sr. Advisor for Data, Analytics & AI delivering solutions across a wide variety of industries.
Vinayak Shinde
Engineering head delivering business value via a diverse technology stack, from applications to big data.
Ashay Dhavale
Cloud Data Architect who loves to explore cutting-edge technologies and design innovative business solutions.

Updates

Jalindar Karande started this project — May 21, 2024 12:15 PM EDT
