Docker Log Sentiment Analyzer

Inspiration

Traditional log analysis is time-consuming and unwieldy. Logs often contain a large volume of data, and sifting through them manually is laborious and error-prone.

Instead of spending valuable time parsing and understanding every individual log entry, the Docker Log Sentiment Analyzer aims to direct attention to what matters most. Prioritizing logs means spending time and focus on the most important and relevant entries, which is fundamental to resolving issues efficiently.

Within logs, you can often find "indicators": critical pieces of information that help identify the nature of the issue recorded in the log. These indicators may appear as error messages, warnings, timestamps, or specific patterns that offer insight into the underlying issue. The Docker Log Sentiment Analyzer identifies these indicators and classifies each log entry into one of the following bins: information, error, warning, critical, and debug.

What it does

When a Docker container is running, it generates logs that capture various events, including errors, warnings, and informational messages. Parsing Docker logs involves extracting data from these logs to make them more accessible and meaningful. Because a single log stream mixes several log levels, categorizing entries by level is a fundamental step in log analysis. Errors and warnings typically indicate problems, while informational and debug messages provide context. By categorizing logs into levels, users can focus their attention on the entries most likely to contain information about a problem's origin.
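As an illustration of the parsing step, here is a minimal sketch that splits a log line into a timestamp, a level, and a message. It assumes lines come from `docker logs --timestamps`, which prefixes each line with an RFC 3339 timestamp. The `ALIASES` table, the `ParsedLine` type, and the `parse_line` function are hypothetical names for this example, not the project's code, and a keyword match like this only covers lines that state their level explicitly.

```python
import re
from typing import NamedTuple, Optional

# Assumed mapping from common level tokens to the five bins used in this project.
ALIASES = {
    "fatal": "critical", "crit": "critical", "critical": "critical",
    "err": "error", "error": "error",
    "warn": "warning", "warning": "warning",
    "info": "information", "information": "information",
    "debug": "debug", "trace": "debug",
}

# Optional leading timestamp (as added by `docker logs --timestamps`), then the message.
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+)?\s*(?P<msg>.*)$")
# First explicit level token in the message, matched on word boundaries.
TOKEN_RE = re.compile(r"\b(" + "|".join(ALIASES) + r")\b", re.IGNORECASE)


class ParsedLine(NamedTuple):
    timestamp: Optional[str]
    level: Optional[str]  # None when the line carries no explicit level token
    message: str


def parse_line(line: str) -> ParsedLine:
    """Split one log line into timestamp, explicit level (if any), and message."""
    match = LINE_RE.match(line.strip())
    timestamp, message = match.group("ts"), match.group("msg")
    token = TOKEN_RE.search(message)
    level = ALIASES[token.group(1).lower()] if token else None
    return ParsedLine(timestamp, level, message)
```

Lines for which `level` comes back as `None` are the ones where a trained classifier is needed, since the level has to be inferred from the wording of the message.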

How we built it

Generating Logs using a Python Script: We first created log data through a Python script. This step involved generating a variety of log entries or events, which could mimic the types of logs commonly found in real-world applications. These logs may have encompassed information such as system events, user actions, errors, and other relevant data.
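A generator of this kind could look roughly like the sketch below. The templates, the `TEMPLATES` table, and the `generate_logs` function are invented for illustration; the actual script and its message formats are not described here.

```python
import random

# Hypothetical message templates per label; the project's real templates are not known.
TEMPLATES = {
    "information": ["Server started on port {n}", "User {n} logged in"],
    "debug": ["Cache lookup took {n} ms", "Loaded config entry {n}"],
    "warning": ["Disk usage at {n}%", "Retrying request, attempt {n}"],
    "error": ["Database connection failed after {n} retries", "Request {n} timed out"],
    "critical": ["Out of memory, killing process {n}", "Data corruption detected in block {n}"],
}


def generate_logs(count, seed=0):
    """Return `count` (message, label) pairs drawn from the templates above."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    labels = list(TEMPLATES)
    rows = []
    for _ in range(count):
        label = rng.choice(labels)
        message = rng.choice(TEMPLATES[label]).format(n=rng.randint(1, 99))
        rows.append((message, label))
    return rows
```

Because each message is produced from a template tied to a label, every generated entry arrives already labeled, which is what makes it usable as training data.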

Utilizing NLP and Machine Learning for Sentiment Analysis: After obtaining the log data, we used Natural Language Processing (NLP) and Machine Learning techniques to develop a sentiment analysis model. This model was trained to assess the sentiment conveyed by each log entry, which in our case corresponds to its log level. The labeled dataset consisted of 35,000 log entries. These entries were already categorized and vetted, meaning each log had a predefined sentiment label (information, error, warning, critical, or debug). The data was then divided into a training set (80%) and a test set (20%) to facilitate model training and evaluation.
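The split-and-train step can be sketched with the standard library alone. The model type is not specified above, so the bag-of-words Naive Bayes classifier below is a stand-in chosen for brevity, not the model we trained. `train_test_split`, `tokenize`, and `NaiveBayes` are names made up for this example.

```python
import math
import random
from collections import Counter, defaultdict


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle (text, label) rows and split them, 80/20 by default."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, rows):
        self.label_counts = Counter(label for _, label in rows)
        self.word_counts = defaultdict(Counter)
        for text, label in rows:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior for this label
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In practice a library such as scikit-learn would likely replace both pieces; the point of the sketch is only the shape of the workflow: split first, fit on the training portion, and keep the test portion untouched until evaluation.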

Testing the Model: Once the model was trained, we tested and refined its performance. Testing involved using the 20% reserved test data to evaluate the model's accuracy and its ability to correctly classify the sentiment of log entries.
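Evaluation on the held-out 20% can be summarized with overall accuracy plus per-label recall, which shows whether rare labels such as critical are being missed. The `evaluate` function below is a hypothetical helper for illustration; which metrics were actually tracked is not recorded here.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return (accuracy, recall_by_label) for parallel lists of labels."""
    total = Counter(y_true)  # how many test entries carry each true label
    correct = Counter()      # how many of those were predicted correctly
    for truth, predicted in zip(y_true, y_pred):
        if truth == predicted:
            correct[truth] += 1
    accuracy = sum(correct.values()) / len(y_true)
    recall = {label: correct[label] / n for label, n in total.items()}
    return accuracy, recall
```

Accuracy alone can look good on an imbalanced test set even when a minority label is rarely predicted correctly, which is why the per-label breakdown is worth reporting alongside it.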

Integration with Flask API and React Frontend: Upon achieving satisfactory results with the model, we integrated it into a Flask API. This API served as the backend for our application and enabled interaction with the model. Users could send log data to this API, and it would return sentiment analysis results for the provided logs. Additionally, we connected this Flask API to a React frontend, creating a user-friendly interface. The React frontend allowed users to input logs and receive sentiment analysis results, making the analysis accessible to end users in a visually appealing and intuitive way.
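The request and response handling of such an endpoint can be sketched as a plain function, so that it runs without a web server; in a Flask app this logic would sit inside a POST route. The JSON shape (`{"logs": [...]}`), the `handle_analyze` function, and the keyword-based `classify_stub` are all assumptions made for this example, and the stub merely stands in for the trained model.

```python
import json


def classify_stub(message):
    """Placeholder for the trained model's prediction call."""
    text = message.lower()
    for keyword, label in (("critical", "critical"), ("error", "error"),
                           ("warn", "warning"), ("debug", "debug")):
        if keyword in text:
            return label
    return "information"


def handle_analyze(body, classify=classify_stub):
    """Validate a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be valid JSON"}
    logs = payload.get("logs") if isinstance(payload, dict) else None
    if not isinstance(logs, list) or not all(isinstance(x, str) for x in logs):
        return 400, {"error": "'logs' must be a list of strings"}
    results = [{"log": line, "label": classify(line)} for line in logs]
    return 200, {"results": results}
```

Passing the classifier in as a parameter keeps the validation logic testable on its own and lets the real model be swapped in without touching the handler.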

Challenges we ran into

Gathering Logs: In the initial stages, we encountered obstacles when it came to the collection of logs because the essential data was not readily accessible.

Constructing a Tailored Dataset: Generating our dataset proved to be intricate, primarily because it was hard to determine how much data we needed and how to standardize it. We were aware that a dataset that was too small or inconsistent could lead to incomplete and unreliable test results, which made it harder to assess how comprehensive our dataset was.

Labeling the dataset: Labeling our log dataset was challenging because some entries are inherently ambiguous, making it difficult to assign a relevant and accurate label. Furthermore, after generating our initial dataset, we realized its classes were imbalanced, so we had to generate extra data points for the under-represented classes.
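One way to decide how many extra entries each under-represented class needs is to compare every class against the largest one. The `class_deficits` helper below is a hypothetical sketch of that bookkeeping; matching the largest class exactly is an assumption, and the actual target we aimed for may have differed.

```python
from collections import Counter


def class_deficits(labels):
    """Return how many extra examples each label needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}
```

The resulting numbers can then be fed back to the log generator, one request per label, until the deficits reach zero.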

Data Quantity and Variety: Another impediment we faced was finding a substantial volume of data that could cover diverse testing scenarios. Given that containers produce various log types across different platforms, ensuring diversity within our dataset presented a considerable challenge.

Accomplishments that we're proud of

Working with People of Different Skills: In retrospect, one of our most significant accomplishments was the way we brought together individuals with diverse skill sets. It was no easy task, but it made a world of difference. This diversity in expertise fueled innovation and helped us tackle challenges from various angles. It's amazing how this experience showcased our ability to communicate effectively, coordinate tasks seamlessly, and capitalize on each team member's strengths to reach our common goal.

Using Docker and AI/ML for Real-World Problem Solving: One standout achievement was our use of Docker and AI/ML to solve a real-world problem. We took the power of containerization with Docker and harnessed the capabilities of AI and machine learning to address a practical issue. Docker made application deployment a breeze, and our machine learning models did wonders in categorizing logs into levels by determining sentiments of logs. It highlighted our ability to bridge the gap between theory and real-world applications.

Presenting Work in an Easily Comprehensible Way: The art of presenting complex work in a way that's easily understood is often underrated. It's something we worked hard to achieve, breaking down intricate technical details into clear, simple language, both in writing and orally. Our ability to communicate effectively, especially to a broader audience, made all the difference. This accomplishment highlighted our storytelling skills, data visualization abilities, and our knack for conveying the significance of our work to various stakeholders.

What we learned

Docker and AI/ML Synergy: We learned how Docker simplifies deployment, ensuring consistency and accessibility in various environments.

Web App Deployment with Docker: Docker's versatility made it easier for us to develop the web app together by eliminating compatibility issues between our machines and streamlining packaging.

Effective Team Communication: Effective communication within the team improved collaboration, fostering a shared understanding and open dialogue.

Feedback-Driven Improvement: We discovered the value of sharing ideas and incorporating feedback, leading to project refinement and personal growth.

What's next for Docker Log Sentiment Analyzer

Expand Sample Data: We need to enrich our dataset with more diverse log data to improve the model's reliability and adaptability by covering a broader range of real-world scenarios and sources.

Experiment with Different Models: Enhance accuracy and intuitiveness by exploring various machine learning and NLP models to identify the best-performing one for our use case.

Implement Prioritization Mechanism: Develop an automated prioritization system that categorizes logs based on potential impact and traces the root causes of issues, streamlining and enhancing issue resolution efficiency within our system.
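Since this feature is still only planned, the following is a sketch of the simplest possible starting point: ordering classified entries by severity. The `SEVERITY` ranking and the `prioritize` function are assumptions for illustration, and root-cause tracing is out of scope for this example.

```python
# Assumed severity order, most urgent first; lower rank sorts earlier.
SEVERITY = {"critical": 0, "error": 1, "warning": 2, "information": 3, "debug": 4}


def prioritize(entries):
    """Sort classified entries so the most severe come first.

    Each entry is a dict with at least a "label" key. The sort is stable,
    so entries with the same label keep their original (chronological) order.
    """
    return sorted(entries, key=lambda entry: SEVERITY[entry["label"]])
```

A fuller mechanism would likely weigh more than the label, for example how often a message repeats or which service emitted it, but those signals would need to be designed and validated first.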

Built With

amazon-web-services
chatgpt
docker
figma
git
github
google-colab
python-(flask)
react

Updates

Amila De started this project — Nov 05, 2023 08:46 PM EST
