Inspiration

In a global workplace, development often happens across multiple geographical locations. However, privacy rules mean it is not always possible to share data with development teams. Without that data, validating the processes, jobs, and algorithms built by remote developers is delayed, which in turn delays project delivery and inflates budgets, reducing profitability. In some cases, real data is not yet available at all, which makes it hard to proceed with development. To address this and shorten delivery time in software development, we were motivated to build a tool for synthetic data generation.

What it does

SynDator: Synthetic Data Generator generates data without needing sample records. It can generate data that matches a given table schema, and it can also work without any table schema information.

How we built it

We built SynDator on the Snowflake Arctic LLM and the Streamlit in Snowflake feature.
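
The snippet below is a minimal sketch of how schema-driven generation with an LLM might look. It is not SynDator's actual code. The function names (`build_prompt`, `parse_rows`, `generate`), the prompt wording, and the CSV reply format are all assumptions made for illustration. The model call is passed in as a plain `complete` callable so the sketch stays independent of any particular Snowflake API.

```python
import csv
import io
from typing import Callable, Dict, List


def build_prompt(columns: Dict[str, str], n_rows: int) -> str:
    """Describe the table schema and ask the model for CSV rows only.

    `columns` maps column name to a type label, e.g. {"id": "INTEGER"}.
    The prompt wording is hypothetical, not the project's real prompt.
    """
    schema = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    header = ",".join(columns)
    return (
        f"Generate {n_rows} rows of realistic synthetic data for a table "
        f"with columns: {schema}. "
        f"Reply with CSV only, using this header line: {header}"
    )


def parse_rows(response: str, columns: Dict[str, str]) -> List[Dict[str, str]]:
    """Parse the model's CSV reply, dropping rows that do not fit the schema."""
    reader = csv.DictReader(io.StringIO(response.strip()), skipinitialspace=True)
    if reader.fieldnames != list(columns):
        return []  # header does not match the requested schema
    rows = []
    for row in reader:
        # DictReader marks extra fields with a None key and
        # missing fields with a None value; skip both cases.
        if None in row or None in row.values():
            continue
        rows.append(dict(row))
    return rows


def generate(
    columns: Dict[str, str],
    n_rows: int,
    complete: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build the prompt, call the model, and return the parsed rows.

    `complete` is any function that takes a prompt and returns text,
    for example a thin wrapper around an Arctic completion call.
    """
    return parse_rows(complete(build_prompt(columns, n_rows)), columns)
```

Keeping the model behind a callable makes the prompt building and parsing easy to test with a canned reply before wiring in a real model. All values come back as strings, so casting to the schema's types would be a separate step.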

Challenges we ran into

The main challenge we faced was the token limit available with Snowflake Arctic. A larger token limit would have let us do more.
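
One common way to work within a token limit is to request rows in several small batches instead of one large call. The sketch below illustrates that idea only; it is not something the project is known to do. The helper name `batch_sizes` and the `max_rows_per_call` parameter are assumptions, and a suitable value for the latter would have to be found by experiment for a given schema.

```python
from typing import Iterator


def batch_sizes(total_rows: int, max_rows_per_call: int) -> Iterator[int]:
    """Yield per-call row counts that add up to `total_rows`.

    `max_rows_per_call` is a hypothetical cap chosen so that one
    model reply stays under the token limit.
    """
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    if max_rows_per_call <= 0:
        raise ValueError("max_rows_per_call must be positive")
    remaining = total_rows
    while remaining > 0:
        size = min(remaining, max_rows_per_call)
        yield size
        remaining -= size
```

Each yielded size would drive one model call, and the parsed rows from all calls would be concatenated. Values that must be unique across batches, such as primary keys, would need extra handling.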

Accomplishments that we're proud of

Our tool generates synthetic data from minimal information, which makes it really helpful for kick-starting development when no data is available yet.

What we learned

During the hackathon, we gained hands-on experience working with Snowflake Arctic and the Replicate API. Although we didn't end up using the Replicate API in our final code, what we learned about these technologies will prove valuable in future projects.

What's next for SynDator: Synthetic Data Generator

Generating synthetic data for multiple tables with referential integrity, business constraints, and realistic statistical distributions of values is on our roadmap.

Built With

  • arctic
  • python
  • snowflake
  • streamlit