Inspiration

We were inspired by the inefficiency of reactive maintenance. We wanted to build a proactive, self-improving system that could predict machine failures before they happen, using the power of real-time data and a human in the loop to continuously improve the model. But serving on demand, to either machines or humans, can be tricky: demand may vary from very high during the day to very low at night.

If you use virtual resources in the cloud, autoscaling can help by adding more resources when needed and freeing them when demand decreases. However, autoscaling is not nimble enough to cope with sudden spikes. To handle such situations, our on-demand solution needed a message broker. LavinMQ lets us absorb demand spikes by buffering messages, an approach that is also more resource-efficient than overprovisioning.

What it does

The LavinMQ Guardian is a real-time predictive maintenance system. It ingests telemetry data from machinery, uses a machine learning model to predict potential failures, and automatically updates the model as human-labeled data becomes available. This ensures the model's predictions are always accurate and current.
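To make the prediction step concrete, here is a minimal sketch of how one telemetry message could be scored. The field names (`machine_id`, `temperature`, `vibration`), the thresholds, and the toy scoring rule are all illustrative assumptions; the real system uses a trained machine learning model instead.

```python
# Hypothetical sketch of the prediction step. A real deployment would load
# the trained model; here a toy rule stands in for it. All field names and
# thresholds are illustrative, not the project's actual schema.
import json

def score_telemetry(message: bytes) -> dict:
    """Turn one telemetry message into a failure-risk verdict."""
    reading = json.loads(message)
    # Toy model: risk grows with temperature and vibration.
    risk = min(1.0, 0.01 * reading["temperature"] + 0.5 * reading["vibration"])
    return {
        "machine_id": reading["machine_id"],
        "risk": risk,
        "alert": risk > 0.8,  # flag likely failures for the maintenance team
    }

msg = json.dumps({"machine_id": "m-7", "temperature": 70, "vibration": 0.5}).encode()
print(score_telemetry(msg))
```

In the running system, each message arrives from a LavinMQ queue rather than being constructed inline, and the verdict is published to a downstream queue for alerting and human labeling.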

How we built it

We used a microservices architecture with LavinMQ at the core. We set up separate services for data ingestion, human-in-the-loop labeling, model training, and model serving. Each service communicated with the others exclusively through dedicated LavinMQ queues, ensuring a decoupled and resilient system. We used Python for our machine learning and serving components, and a simple web interface for the human labeler.
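The queue topology above can be sketched in-process, with Python's `queue.Queue` standing in for the dedicated LavinMQ queues (in production each service holds its own AMQP connection to LavinMQ). The queue and field names are illustrative assumptions; the point is that each service touches only its own input and output queues, so any service can be restarted or scaled independently.

```python
# In-process sketch of the queue topology; queue.Queue stands in for
# dedicated LavinMQ queues. Queue names and payload fields are illustrative.
import queue

queues = {name: queue.Queue() for name in
          ("telemetry", "predictions", "labels", "model_updates")}

def ingestion_service(reading: dict):
    queues["telemetry"].put(reading)            # raw sensor data in

def serving_service():
    reading = queues["telemetry"].get()         # consume one reading
    # A placeholder risk score; the real service runs the current model here.
    queues["predictions"].put({"machine_id": reading["machine_id"], "risk": 0.2})

def labeling_service(machine_id: str, failed: bool):
    # Human-in-the-loop verdicts feed the training service via their own queue.
    queues["labels"].put({"machine_id": machine_id, "failed": failed})

ingestion_service({"machine_id": "m-7", "temperature": 65})
serving_service()
print(queues["predictions"].get())
```

Because services share nothing but queue names, swapping the in-process queues for LavinMQ is a transport change, not an architectural one.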

Challenges we ran into

Handling the continuous, high-volume stream of telemetry data was our biggest challenge. We also had to design a robust mechanism for the on-demand model updates to happen without interrupting the serving process. LavinMQ's high throughput and efficient connection management were key to overcoming these hurdles.
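One way to update the model without interrupting serving, sketched below under the assumption of a single-process Python server: the serving path reads the model through one reference, and the update consumer rebinds that reference atomically, so in-flight predictions finish on the old model while new requests use the new one. The class and method names are illustrative.

```python
# Sketch of a zero-downtime model swap: serving reads through a single
# reference that the update path rebinds atomically (attribute rebinding
# is atomic under CPython's GIL), so serving never pauses for an update.
class ModelServer:
    def __init__(self, model):
        self._model = model        # current model, swapped in place on update

    def predict(self, x):
        return self._model(x)      # serving never blocks on retraining

    def update(self, new_model):
        self._model = new_model    # atomic rebind: no lock, no downtime

server = ModelServer(lambda x: "prediction-v1")
print(server.predict(0))
server.update(lambda x: "prediction-v2")   # arrives via a model-update queue
print(server.predict(0))
```

In our setup the retraining service publishes the new model (or a pointer to it) on a dedicated LavinMQ queue, and the serving process consumes that queue and calls the swap.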

Accomplishments that we're proud of

We successfully implemented a fully automated, end-to-end model training and serving pipeline. We're particularly proud of the seamless, zero-downtime model update process, and the human-in-the-loop system that allows the model to continuously learn and improve on its own.

What we learned

We learned the importance of asynchronous communication in building scalable, real-time systems. LavinMQ's reliability and performance proved to be crucial for managing complex workflows and ensuring data integrity between different services.

What's next for LavinMQ Guardian

We plan to expand the system to handle more complex data types, such as video and audio streams, for advanced anomaly detection. We also want to develop a more sophisticated alerting and visualization dashboard for end-users, and explore integrating the system with existing industrial control systems.

Built With

LavinMQ, Python
