Inspiration
All of us are quite inspired and intrigued by Generative AI. Testing an open-source model like Snowflake Arctic was the perfect opportunity to explore some of our ideas. Given our background, we decided to focus on one of the model's key strengths: generating SQL queries. After much deliberation, we decided to build an agent that helps any user generate an optimized Data Vault 2.0 model.
What it Does
The assistant aims to generate SQL queries that can be used to create the skeleton of a Data Vault 2.0 in Snowflake. We have defined three steps:
Arctic Data Analyst: This is the initial phase where the user uploads CSV files containing the necessary tables for their Data Vault. The Arctic Data Analyst helps the user evaluate the current data landscape. It checks for missing tables, identifies relationships, and ensures that all necessary components are present. This phase involves a detailed analysis to help the user understand any gaps or issues in their data structure, setting a solid foundation for the next steps.
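As a minimal sketch of the kind of landscape check the Analyst performs, the hypothetical helper below flags columns shared between uploaded tables as candidate relationships. The function name and the list-of-columns representation are our own illustration; in the real assistant this analysis happens through prompts to the Arctic model.

```python
def find_candidate_relationships(tables):
    """tables maps a table name to its list of column names (e.g. the
    header row of each uploaded CSV). Returns (table_a, table_b, column)
    triples where two tables share a column name -- a rough hint of a
    join the Analyst can surface to the user."""
    names = sorted(tables)
    hits = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for col in sorted(set(tables[a]) & set(tables[b])):
                hits.append((a, b, col))
    return hits

# Two tiny header rows standing in for uploaded CSVs.
print(find_candidate_relationships({
    "customers": ["customer_id", "name"],
    "orders": ["order_id", "customer_id"],
}))
# → [('customers', 'orders', 'customer_id')]
```

A shared `customer_id` column like this is exactly the kind of gap-free relationship the Analyst confirms before the next phase begins.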
Arctic Data Architect: Building on the insights from the Data Analyst, this agent engages in a deeper conversation with the user. It compiles and synthesizes information from the initial analysis to produce a comprehensive report. This report includes all the detailed specifications and schemas required to generate SQL queries for the Data Vault 2.0 tables. The Data Architect ensures that the data model adheres to best practices and optimizes for performance and scalability.
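The Architect's report is driven by a system prompt that folds in the Analyst's findings and the user's answers. The template below is a hypothetical, simplified stand-in for the prompts we actually iterated on:

```python
def build_architect_prompt(analysis_summary: str, user_notes: str) -> str:
    """Compose the instruction sent to the Data Architect agent.
    Illustrative template only -- the production prompt wording differed."""
    return (
        "You are a Data Vault 2.0 architect. Using the analysis below, "
        "produce a report listing every Hub, Link, and Satellite table "
        "with its business keys, relationships, and descriptive attributes.\n\n"
        f"## Analysis\n{analysis_summary}\n\n"
        f"## User notes\n{user_notes}\n"
    )

prompt = build_architect_prompt(
    "customers and orders share customer_id",
    "orders arrive daily from the web shop",
)
print(prompt)
```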
Arctic Data Engineer: In the final phase, the Data Engineer receives the detailed report generated by the Data Architect. This agent is responsible for translating the report into actionable SQL queries. For every Hub, Link, and Satellite table outlined in the report, it produces the necessary SQL code. These queries are crafted to be executed directly on Snowflake, creating the structured and optimized Data Vault 2.0.
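To make the Data Engineer's output concrete, here is an illustrative template for one table type, a Hub, with the standard Data Vault 2.0 columns (hash key, business key, load timestamp, record source). In the assistant itself the DDL is generated by the model, not by a fixed template; the function and column names here are our own sketch:

```python
def hub_ddl(entity: str, business_key: str) -> str:
    """Render a Snowflake-style CREATE TABLE statement for a Data Vault
    2.0 Hub. Illustrative only; the real DDL comes from the model."""
    return (
        f"CREATE TABLE IF NOT EXISTS hub_{entity} (\n"
        f"    hub_{entity}_hashkey BINARY(32) NOT NULL PRIMARY KEY,\n"
        f"    {business_key} VARCHAR NOT NULL,\n"
        f"    load_date TIMESTAMP_NTZ NOT NULL,\n"
        f"    record_source VARCHAR NOT NULL\n"
        f");"
    )

print(hub_ddl("customer", "customer_id"))
```

Links and Satellites follow the same pattern, adding foreign hash keys and descriptive attributes respectively.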
How We Built It
We used Python and Streamlit to develop the solution. We collaborated on the same repository, incorporating ideas iteratively. To simplify implementation, we used Replicate to generate outputs from the model. Much of the work involved Prompt Engineering, testing different versions, and leveraging our expertise. Additionally, we coded and updated the UI using Streamlit, aiming for a simple, user-friendly interface.
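The Replicate integration boils down to assembling an input dict and calling the hosted Arctic model. The sketch below reflects that flow; the parameter names in `build_arctic_input` are typical of Replicate-hosted chat models but should be treated as assumptions and checked against the model page:

```python
import os

def build_arctic_input(system_prompt: str, user_prompt: str,
                       temperature: float = 0.2) -> dict:
    """Assemble the input payload sent to Replicate. Parameter names
    are assumptions; consult the model's schema for the exact fields."""
    return {
        "prompt": f"{system_prompt}\n\n{user_prompt}",
        "temperature": temperature,
    }

# The actual call needs a REPLICATE_API_TOKEN, so it is guarded here
# to keep the sketch runnable without credentials.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate
    output = replicate.run(
        "snowflake/snowflake-arctic-instruct",
        input=build_arctic_input("You generate Snowflake SQL.",
                                 "Create a Hub table for customers."),
    )
    print("".join(output))  # the model streams its reply in chunks
```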
Challenges We Ran Into
The main challenge was choosing the right parameters for the model, along with the system prompt. The model is very sensitive to changes in these settings and often produces random trailing output after completing a task. The Replicate layer may also be masking some underlying issues.
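One mitigation for the random trailing output is to post-process the reply before showing it to the user. The helper below is a simplified, hypothetical stand-in for that kind of guard: it cuts the reply at the last semicolon so chatter after the final SQL statement is dropped.

```python
def trim_after_sql(text: str) -> str:
    """Drop any trailing chatter after the last SQL statement by
    cutting at the final semicolon. Heuristic sketch, not a full fix
    for the model's sensitivity to parameter settings."""
    cut = text.rfind(";")
    return text[: cut + 1] if cut != -1 else text

reply = "CREATE TABLE hub_customer (hk BINARY(32)); Sure! Let me know if"
print(trim_after_sql(reply))
# → CREATE TABLE hub_customer (hk BINARY(32));
```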
Accomplishments That We're Proud Of
We are quite happy with the solution we delivered. Despite the limited time window, it feels like the start of a more complex and complete product. Being able to set up the UI and wire together the different Generative AI tasks with such ease is genuinely impressive.
What We Learned
We gained more experience with Streamlit, a tool that lets even non-front-end engineers stand up a UI quickly, which is incredibly impressive. Testing an open-source model also showed how valuable one can be for specific tasks in our daily business.
What's Next for Arctic Vault
We believe that fine-tuning the model could significantly improve performance, make specific tasks easier, and reduce volatility in the results. Starting from an already performant open-source model is a significant advantage, and we aim to realize its full potential.
Built With
- python
- replicate
- streamlit
