Auf-Log: Prompt-Centric Log File Analysis

Screenshots: file upload on mobile, summary on mobile, graph on desktop, question answering on desktop, more-information view on desktop.

Inspiration

Who doesn’t love log files? Well, at least the information contained in them is really valuable. However, reading through endless lines of monotonous computer-generated output is tiring, so we believe that even experts miss a significant amount of information.

What it does

Unsure what your log contains? Instead of scrolling for hours on end, just use Auf-Log for fast and easy log analysis! Our app enables the analysis of log files using natural language and plots. Users simply type a text prompt and ask the system a question like "are there any ssh errors". The system then not only gives an answer in natural language but also shows the relevant log lines, clustered by similarity. It works in near real time and runs on any modern machine with an internet connection.

How we built it

We built it with Python and Streamlit, in an agile fashion. Frontend and backend development were closely linked, which enabled user-centric development of features. We use the OpenAI GPT-3 API to understand natural language inputs and to generate natural language summaries of the results. We generate keywords from the natural language input, which are then used to filter the log. The resulting lines are clustered by pattern recurrence and passed to the GPT-3 API to generate a summarizing response. Users are shown the different clusters of log lines that matched their query and have the option to see all entries in each cluster.
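The filter-and-cluster step described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the function names (`filter_lines`, `template_of`, `cluster_by_template`) and the masking rules are assumptions made for this example, and the keyword generation and GPT-3 calls are left out entirely. The idea is that lines differing only in variable fields (numbers, PIDs, IP addresses) collapse onto the same template and therefore land in the same cluster.

```python
import re
from collections import defaultdict

# Hypothetical masking rules (an assumption for this sketch): replace the
# variable parts of a line so that lines differing only in those fields
# end up with an identical template. Order matters: IPs and PIDs are
# masked before the generic number rule.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\[\d+\]"), "[<PID>]"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def filter_lines(lines, keywords):
    """Keep only lines containing at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [ln for ln in lines if any(k in ln.lower() for k in lowered)]


def template_of(line):
    """Mask variable fields to obtain the recurring pattern of a line."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def cluster_by_template(lines):
    """Group lines by template; the largest clusters come first."""
    clusters = defaultdict(list)
    for ln in lines:
        clusters[template_of(ln)].append(ln)
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    log = [
        "Jan 10 12:00:01 host sshd[1234]: Failed password for root from 10.0.0.5 port 22",
        "Jan 10 12:00:07 host sshd[1240]: Failed password for root from 10.0.0.9 port 22",
        "Jan 10 12:01:00 host cron[200]: job started",
        "Jan 10 12:02:00 host sshd[1300]: Accepted publickey for alice from 10.0.0.7 port 22",
    ]
    for cluster in cluster_by_template(filter_lines(log, ["ssh"])):
        print(len(cluster), template_of(cluster[0]))
```

In a setup like this, one representative line per cluster (plus the cluster size) would be enough context for the summarizing request, which keeps the prompt short.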

And of course: With a lot of hard work, love for coding, and long nights.

Challenges we ran into

Even when using a distilled model on the GPU and batch processing wherever possible, the initial loading and pre-computation needed to handle chat messages swiftly took too long. We further tried to reduce the number of lines to process based on pattern recurrence, since sometimes only the timestamp and PID differ. Using Levenshtein distance, we significantly reduced the number of lines to search through without losing diversity. While we thought this approach was elegant and fun to implement, we had a business meeting with our stakeholders (ourselves) and decided to proceed with a hopefully faster search-based approach.
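For the curious, the Levenshtein-based reduction can be sketched like this. It is a simplified reconstruction under assumptions, not the code we ran: the function names and the `max_ratio` threshold of 0.2 are made up for the example. A line is dropped when its edit distance to an already kept line, relative to the longer of the two, falls below the threshold, so near-duplicates that differ only in a timestamp or PID collapse into one representative.

```python
def levenshtein(a, b):
    """Edit distance between two strings (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def reduce_lines(lines, max_ratio=0.2):
    """Greedily keep one representative per group of near-identical lines.

    max_ratio is a hypothetical threshold: two lines count as the same
    pattern if their edit distance is at most max_ratio times the length
    of the longer line.
    """
    representatives = []
    for ln in lines:
        for rep in representatives:
            longest = max(len(ln), len(rep)) or 1
            if levenshtein(ln, rep) / longest <= max_ratio:
                break  # close enough to an existing representative
        else:
            representatives.append(ln)
    return representatives


if __name__ == "__main__":
    log = [
        "12:00:01 sshd[1234]: Failed password for root",
        "12:00:07 sshd[1240]: Failed password for root",
        "12:01:00 cron[200]: job started",
    ]
    for line in reduce_lines(log):
        print(line)
```

Each new line is compared against every representative kept so far, and each comparison costs time proportional to the product of the two line lengths, so a pure-Python version like this gets slow on large logs.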

Accomplishments that we're proud of

We’re super happy that our prototype is working and that the main features we thought of are successfully implemented! Apart from that, we somehow managed to crash VM number 11.

What we learned

Most importantly, the user comes first. We would have loved to develop a super involved AI question-answering model, but due to performance issues this would have hurt the user experience. Further, we’ve learned about common NLP tasks like QA and computing embeddings, and what the challenges are when putting them into practice (that runtime - uuuuh, not good with so much data). With these problems, we had to do some creative thinking! Also, one of us has become a Streamlit professional over the last two days. That application is superbly built!

What's next for Auf-Log

A bright future! (And a list of features that would increase usability and speed: being able to de-select keywords, assign importance to them, and add new ones; hosting a generative language model on the GPU; and maybe even trying out the original approach again, this time with a fine-tuned distilled DeBERTa or something. But for us, it’s mainly sleeping.)