Snowflake Cortex FAA Incident Reports RAG Model

Image generated with OpenAI's DALL-E

Inspiration

Aircraft accidents have been in the news lately because of equipment issues. Could these issues be caught earlier through analysis of past incidents? The FAA produces very detailed investigation reports, but a single report can run to 80 pages or more. This RAG model lets analysts, or the FAA itself, review the reports quickly and take action.

What it does

We use Snowflake's LLMs to read all the FAA incident documents uploaded to a Snowflake internal stage. We then built a Streamlit application that acts as a front end to the documents, where users can ask questions.
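
Reports of 80 or more pages usually have to be split into smaller pieces before an LLM can search or read them. The sketch below shows one common way to do that, with overlapping character windows. It is an illustration only: the function name, chunk size, and overlap are assumptions, not the settings used in this project.

```python
def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.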

How we built it

We use the Snowflake LLMs that are available in a trial account and built a Streamlit app as the front-end interface. We grouped the documents by content, using an LLM to summarize each document and categorize it as a Human, Weather, or Aircraft incident. By repurposing the Snowflake workshop, we were able to build the entire process within 2 weeks.
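
The categorization step can be sketched roughly as follows. This is a minimal illustration, not our exact code: the prompt wording and both helper names are assumptions. The prompt would be sent to a Cortex LLM function (for example the SQL function SNOWFLAKE.CORTEX.COMPLETE), and the reply is then mapped back onto one of the three categories.

```python
CATEGORIES = ("Human", "Weather", "Aircraft")


def build_category_prompt(summary: str) -> str:
    """Build a classification prompt for one document summary (hypothetical wording)."""
    return (
        "Classify the following FAA incident summary into exactly one of these "
        f"categories: {', '.join(CATEGORIES)}.\n"
        "Reply with the category name only.\n\n"
        f"Summary: {summary.strip()}"
    )


def parse_category(reply: str) -> str:
    """Map a free-text model reply onto one known category, else 'Unknown'."""
    lowered = reply.lower()
    matches = [name for name in CATEGORIES if name.lower() in lowered]
    # Accept the reply only when it names exactly one category.
    return matches[0] if len(matches) == 1 else "Unknown"
```

Treating an ambiguous reply as "Unknown" instead of guessing keeps a mislabeled document from silently landing in the wrong group.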

Challenges we ran into

The biggest challenge was getting the LLM not to assume or "hallucinate" when answering questions about the uploaded documents. It took a while to get the LLM to actually look at our documents. In the early stages of development we assumed the LLM was reading them, but it was actually just generating descriptions based on the questions we asked.
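
One common way to keep a model tied to the uploaded documents is to paste the retrieved excerpts into the prompt and tell the model to answer only from them. The sketch below illustrates that idea under stated assumptions: the function name, the prompt wording, and the refusal sentence are hypothetical, and it is not the exact prompt used in the app.

```python
def build_grounded_prompt(question: str, excerpts: list[str]) -> str:
    """Build a prompt that restricts the answer to the supplied excerpts."""
    if excerpts:
        # Number the excerpts so an answer can point back to its source.
        context = "\n\n".join(
            f"[{i}] {text.strip()}" for i, text in enumerate(excerpts, start=1)
        )
    else:
        context = "(no excerpts were retrieved)"
    return (
        "You are assisting an FAA administrator reviewing incident reports.\n"
        "Answer using ONLY the numbered excerpts below. If they do not contain "
        "the answer, reply exactly: I could not find that in the documents.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

A quick check for the failure described above is to ask about something that is not in any document: a grounded setup should return the refusal sentence, not a plausible-sounding description.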

We also wanted to integrate the documents with more structured data, but ran out of time to join the documents by date with other data points, such as weather data.
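
The date join we had in mind could look something like the sketch below. It was not built for this project; the field names and the sample rows are made-up placeholders, and a real join would likely also need to match on location.

```python
from datetime import date


def join_incidents_with_weather(incidents, weather):
    """Left-join incident records to weather records on their 'date' field."""
    weather_by_date = {row["date"]: row for row in weather}
    joined = []
    for incident in incidents:
        match = weather_by_date.get(incident["date"])
        # Keep every incident; weather is None when no record exists for that date.
        joined.append({**incident, "conditions": match["conditions"] if match else None})
    return joined


# Made-up placeholder rows, not real FAA or weather data.
incidents = [
    {"doc": "report_a.pdf", "date": date(2024, 1, 5)},
    {"doc": "report_b.pdf", "date": date(2024, 1, 9)},
]
weather = [{"date": date(2024, 1, 5), "conditions": "fog"}]

joined = join_incidents_with_weather(incidents, weather)
```

In Snowflake itself the same idea would presumably be a SQL LEFT JOIN between a table of extracted incident dates and a weather table.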

Accomplishments that we're proud of

We are proud of getting an LLM to read uploaded documents and summarize them very quickly. Once we got the LLM to read our uploaded documents, it took very little time to develop the app and produce generated output from the FAA documents we had collected. We were impressed with the TOML runtime secrets feature during deployment and with the private key/public key authentication methods for connecting securely between Snowflake, Streamlit, and GitHub. Within GitHub, we were also challenged to use two-factor authentication and the collaboration settings to share our code.
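
As a rough illustration of the secrets setup, the sketch below validates key-pair settings read from a TOML secrets file before any connection is attempted. The section name, key names, and helper are assumptions for illustration and may not match our deployed configuration.

```python
# Hypothetical layout of .streamlit/secrets.toml (key names are assumptions):
#
#   [snowflake]
#   account = "..."
#   user = "..."
#   private_key = "..."   # private key text; never commit this file

REQUIRED_KEYS = ("account", "user", "private_key")


def key_pair_settings(secrets) -> dict:
    """Return the key-pair settings, failing early if any are missing."""
    missing = [key for key in REQUIRED_KEYS if not secrets.get(key)]
    if missing:
        raise KeyError("missing secrets: " + ", ".join(missing))
    if secrets.get("password"):
        # A password next to a private key usually signals a leftover setting.
        raise ValueError("key-pair setup should not also carry a password")
    return {key: secrets[key] for key in REQUIRED_KEYS}
```

In a Streamlit app this would presumably be called with the mapping that st.secrets exposes for the section, so the private key stays out of the GitHub repository.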

What we learned

We learned how easy it is to set up an LLM through Snowflake's interface and create a front-end web interface to interact with the documents. It took minimal effort to build the architecture, and the document summarization would be beneficial to our professional work. Snowflake's selection of LLMs is very diverse, and we saved time and cost by leveraging Snowflake infrastructure: we did not even use a fraction of the $400 trial account budget to complete this project. There is no need to install a vector database or buy a server with a GPU. The underlying infrastructure is built into Snowflake Cortex, and our cost is based on usage.

What's next for Snowflake Cortex FAA Incident Reports RAG Model

Next, we want to integrate this model with structured data and potentially join it with work order requests for aircraft or other items. Knowing the most common or most impactful equipment causes of incidents, we could then perform predictive maintenance. An incident can cost up to 3.1 million dollars, so this exercise would lower costs for the FAA and airports.

Built With

github
python
snowflake
sql
streamlit

Submitted to

RAG 'n' ROLL Amp up Search with Snowflake & Mistral

Created by

I worked with Simon on the backend logic and on configuring the Streamlit app to point to the documents that Simon uploaded. I also configured the Streamlit app to work on the community page, since Streamlit in Snowflake works slightly differently from the community version.

Michael Han
I worked with Michael on the video presentation: equipment, speech, and video organization. I gathered the final FAA incident documentation, started with the "WHY", and used an FAA administrator persona in the LLM prompt engineering. We would have considered using a multimodal LLM had the black box audio for the aviation incidents been made available to the public.

Simon Kah Fye Chung

Updates

Michael Han started this project — Jan 14, 2025 03:24 PM EST
