What it does

Our project detects anomalies in the cooling systems of diesel-powered trains. Working from a 2 GB dataset of operational data, our system preprocesses and analyzes temperature, pressure, and other operational metrics to identify deviations that may indicate impending failures. We used machine learning algorithms such as K-means clustering and Isolation Forest to separate noise from significant anomalies. The real-time dashboard we developed visualizes these anomalies so the rolling stock team can make informed decisions quickly, helping to improve train reliability and reduce downtime.
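To make the detection step concrete, here is a minimal sketch of how the two algorithms could be used side by side with scikit-learn. It is an illustration under stated assumptions, not our production code: the two feature columns (coolant temperature and pressure), the synthetic data, the 1% contamination rate, the cluster count, and the helper names fit_detectors and flag_anomalies are all hypothetical. It also assumes the detectors are fitted on a window of mostly healthy data and then applied to new readings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def fit_detectors(baseline, contamination=0.01, n_clusters=3, seed=0):
    """Fit both detectors on a window assumed to be mostly healthy."""
    scaler = StandardScaler().fit(baseline)
    scaled = scaler.transform(baseline)

    forest = IsolationForest(contamination=contamination, random_state=seed)
    forest.fit(scaled)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    kmeans.fit(scaled)
    # Distance to the nearest centroid; the cutoff is a high baseline quantile.
    baseline_dist = kmeans.transform(scaled).min(axis=1)
    cutoff = np.quantile(baseline_dist, 1.0 - contamination)
    return scaler, forest, kmeans, cutoff


def flag_anomalies(readings, scaler, forest, kmeans, cutoff):
    """Return two boolean masks: (Isolation Forest, K-means distance)."""
    scaled = scaler.transform(readings)
    forest_mask = forest.predict(scaled) == -1  # predict() marks outliers as -1
    kmeans_mask = kmeans.transform(scaled).min(axis=1) > cutoff
    return forest_mask, kmeans_mask


# Synthetic stand-in data: coolant temperature (deg C) and pressure (bar).
rng = np.random.default_rng(0)
baseline = rng.normal(loc=[85.0, 1.2], scale=[2.0, 0.05], size=(500, 2))
faults = np.array([[120.0, 0.4], [118.0, 0.5], [40.0, 2.5]])

detectors = fit_detectors(baseline)
forest_mask, kmeans_mask = flag_anomalies(faults, *detectors)
```

Requiring both masks to agree is one simple way to cut down on false alarms from noise, at the cost of missing anomalies that only one method picks up.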

How I built it

The project was built in four phases: data preprocessing, exploratory data analysis (EDA), anomaly detection, and visualization. We started by cleaning the data: removing null values and standardizing timestamps. Statistical and visual EDA techniques helped us understand the underlying patterns and guided our feature engineering. We then applied the K-means and Isolation Forest algorithms for anomaly detection. In the final phase we built a dashboard with Python's Dash and Plotly that provides real-time insight into the state of the train cooling systems. Integrating weather data allowed us to put anomalies in the context of environmental factors.
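The cleaning step described above might look roughly like the following pandas sketch. The column names (timestamp, coolant_temp_c, coolant_pressure_bar), the choice of UTC as the common time zone, and the helper name clean_readings are assumptions made for illustration; the real dataset has its own schema.

```python
import pandas as pd


def clean_readings(raw):
    """Parse timestamps to UTC, drop incomplete rows, and sort by time."""
    df = raw.copy()
    # Unparseable timestamps become NaT and are dropped with the other nulls.
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce", utc=True)
    df = df.dropna(subset=["timestamp", "coolant_temp_c", "coolant_pressure_bar"])
    return df.sort_values("timestamp").reset_index(drop=True)


# Tiny hypothetical sample: one bad timestamp and one missing temperature.
raw = pd.DataFrame({
    "timestamp": [
        "2023-01-01T00:00:10Z",
        "2023-01-01T00:00:00Z",
        "not a time",
        "2023-01-01T00:00:20Z",
    ],
    "coolant_temp_c": [85.1, 84.9, 86.0, None],
    "coolant_pressure_bar": [1.21, 1.20, 1.19, 1.22],
})
clean = clean_readings(raw)
```

Dropping rows is the bluntest way to handle nulls; depending on how the gaps are distributed, interpolation over short gaps could be a reasonable alternative.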

Challenges I ran into

One of the main challenges was managing and processing the large dataset efficiently, especially with irregular sampling times and duplicate entries. Integrating real-time weather data posed another significant challenge, requiring us to synchronize disparate data sources accurately. Additionally, distinguishing between noise and true anomalies in the dataset required fine-tuning our machine learning models, which was both a technical challenge and a learning opportunity.
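As a rough illustration of the synchronization problem, the sketch below removes duplicate timestamps, resamples irregular sensor readings onto a fixed grid, and attaches the most recent weather observation to each row with pandas merge_asof. The one-minute grid, the 30-minute tolerance, the column names, and the helper names regularize and attach_weather are assumptions chosen for the example, not the values we used.

```python
import pandas as pd


def regularize(sensor, freq="1min"):
    """Drop duplicate timestamps and resample onto a fixed time grid."""
    deduped = sensor.drop_duplicates(subset="timestamp", keep="first")
    grid = deduped.set_index("timestamp").sort_index().resample(freq).mean()
    return grid.reset_index()


def attach_weather(sensor, weather, tolerance="30min"):
    """Attach the latest weather observation at or before each sensor row."""
    return pd.merge_asof(
        sensor.sort_values("timestamp"),
        weather.sort_values("timestamp"),
        on="timestamp",
        direction="backward",
        tolerance=pd.Timedelta(tolerance),
    )


# Hypothetical irregular sensor readings, including one duplicate entry.
sensor = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-01 00:00:05", "2023-01-01 00:00:05",
         "2023-01-01 00:00:40", "2023-01-01 00:02:10"], utc=True),
    "coolant_temp_c": [85.0, 85.0, 87.0, 90.0],
})
# Hypothetical weather observations on a different, sparser schedule.
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2022-12-31 23:50:00", "2023-01-01 00:01:30"], utc=True),
    "ambient_temp_c": [11.5, 12.0],
})

regular = regularize(sensor)
merged = attach_weather(regular, weather)
```

A backward-looking join avoids leaking future weather into a reading, and the tolerance leaves the weather columns empty when the latest observation is too old to be meaningful.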

Accomplishments that I'm proud of

I am particularly proud that our team built a robust anomaly detection system that can operate in real time, a critical requirement for the operational efficiency of railway systems. The dashboard we created gives the rolling stock team clear, actionable insights. Effective collaboration and problem-solving under pressure were key to overcoming the project's technical challenges.

What I learned

This project deepened my understanding of time-series data analysis, machine learning algorithms, and real-time data visualization. Working with a large, real-world dataset sharpened my data preprocessing and feature engineering skills. I also learned the importance of cross-disciplinary collaboration, as integrating meteorological data required understanding beyond traditional data science.
