Inspiration

According to engineers at NMC, thousands of dollars' worth of GPU uptime in HPC datacenters is wasted every day due to preventable downtime during maintenance. The current process for resolving maintenance issues within datacenters is inefficient: repeatedly creating a Jira ticket for every issue is tedious, and some people even go around the system by sending quick "hey can you fix this" messages on Slack.

Work orders can take months to resolve but need to be done in days. A big inspiration was being able to use my knowledge of automation to solve a real problem.

Several parties benefit from this solution. Engineers no longer have to create Jira issues again and again, datacenter technicians no longer work with fragmented workflows, and companies can save thousands of dollars' worth of resources.

What it does

This solution takes the current tedious, inefficient system of Jira tickets and turns it into a streamlined, expedited process: work orders are created once and managed automatically. Maintenance is not only quick but also thoroughly documented. The solution provides end-to-end oversight of work orders and even marks issues as complete in Jira.
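As a rough illustration of the last step, marking an issue complete in Jira comes down to posting a workflow transition. The sketch below only builds the request and does not send it. It assumes the Jira Cloud REST API v3 transitions endpoint with basic auth; the site URL, issue key, transition ID, and credentials are all placeholders, since transition IDs depend on each project's workflow. It is not the project's actual integration code.

```python
import base64
import json
import urllib.request


def build_transition_request(base_url, issue_key, transition_id, email, api_token):
    """Build (but do not send) a request that moves a Jira issue to a new status."""
    url = f"{base_url.rstrip('/')}/rest/api/3/issue/{issue_key}/transitions"
    body = json.dumps({"transition": {"id": transition_id}}).encode("utf-8")
    # Jira Cloud basic auth uses "email:api_token", base64-encoded.
    token = base64.b64encode(f"{email}:{api_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Every value here is a hypothetical placeholder, including the transition ID.
req = build_transition_request(
    "https://example.atlassian.net", "DC-123", "31", "tech@example.com", "placeholder-token"
)
```

Passing `req` to `urllib.request.urlopen` would perform the transition against a real site. The valid transition IDs for an issue can be listed with a GET request to the same URL.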

How we built it

I started the project by building the API framework. I first built the endpoints that provide data and integrated them with Jira's API. I then created a dual-layer agentic RAG model that creates and provisions dynamic work orders. I used NVIDIA's Brev platform along with NIM and Nemotron to build one layer for embedding and one for generation. The embedding layer takes multimodal data such as technical specifications, Jira tickets, and technician notes and turns them into vector embeddings. The generative layer takes those vectors along with a user query and generates instructions on how maintenance can be approached. Finally, I finished the front end, using React and Vite to create the dashboard and the views for work orders and chat.
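The two layers described above can be sketched in a few lines. This is a minimal, self-contained illustration and not the project's code: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the `WorkOrderIndex` class and `build_prompt` helper are hypothetical names, and the sample documents are invented. The generation step is reduced to assembling the prompt that a generative model would receive.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class WorkOrderIndex:
    """Embedding layer: stores (source, text, vector) entries and ranks them."""

    def __init__(self):
        self.entries = []

    def add(self, source, text):
        self.entries.append((source, text, embed(text)))

    def retrieve(self, query, k=2):
        query_vector = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vector, e[2]), reverse=True)
        return ranked[:k]


def build_prompt(query, retrieved):
    """Generative layer input: retrieved context plus the user query."""
    context = "\n".join(f"[{source}] {text}" for source, text, _ in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with step-by-step maintenance instructions."


index = WorkOrderIndex()
index.add("spec", "GPU node power supply rated for 3000 W, hot swappable")
index.add("jira", "Rack 12 node 4 GPU fan failure reported, needs fan replacement")
index.add("note", "Coolant loop on row B was flushed last week")

query = "How do I fix the GPU fan failure on rack 12?"
top = index.retrieve(query, k=1)
prompt = build_prompt(query, top)
```

In a real deployment the `embed` function would call a hosted embedding model and `prompt` would be sent to the generative model; the retrieval and prompt assembly would keep the same shape.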

Frontend: built with React and Vite

Backend: built with Python 3.13, FastAPI, routes library, Jira API, LangGraph, NVIDIA NIM, Brev, OpenAI embedding, Nemotron

Challenges we ran into

There were various challenges throughout this project. A big one was scaling down the project. I initially planned on making a comprehensive solution to many inefficiencies, adding features such as cable/pipe optimization and edge computing with handheld work devices, but I realized that I was aspiring for too much and that maximizing efficiency mattered more than optimization for this problem.

The Jira API was also difficult because there were three versions, and the earlier two had many deprecated endpoints. Building the model, however, was the most challenging part: I was unfamiliar with building agentic RAG models, and combining two models for one purpose stretched my skills. Having to wait for new GPU deployments was also tedious.

Accomplishments that we're proud of

I am proud of the integration with Jira. It removes the need for engineers to learn new business practices or switch to a completely new platform by directly interfacing with and streamlining existing workflows.

What we learned

I learned a lot, especially about lightweight models, the Jira API, and RAG models. I found tooling especially interesting, as I had never thought about looping over the input context until a good response was generated.
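The looping idea can be sketched as a small retry loop: generate a draft, check it, and feed rejected drafts back in as extra context until one passes or a round limit is reached. Everything below is a hypothetical illustration. The `fake_generate` and `mentions_power_down` functions stand in for a real model call and a real quality check, and the function names are not taken from the project.

```python
def refine_until_acceptable(query, generate, is_acceptable, max_rounds=3):
    """Call generate repeatedly, feeding rejected drafts back in as context."""
    context = []
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = generate(query, context)
        if is_acceptable(draft):
            return draft, round_number
        context.append(f"Rejected draft {round_number}: {draft}")
    # No draft passed the check; return the last attempt.
    return draft, max_rounds


def fake_generate(query, context):
    # Hypothetical generator: adds a safety step only after seeing a rejection.
    if context:
        return "1. Power down node. 2. Replace fan."
    return "1. Replace fan."


def mentions_power_down(draft):
    # Hypothetical quality check: the instructions must include a power-down step.
    return "Power down" in draft


answer, rounds = refine_until_acceptable(
    "Fix fan on rack 12", fake_generate, mentions_power_down
)
```

Capping the number of rounds keeps the loop from running indefinitely when the check can never be satisfied.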

What's next for Reflex - Work Order Automation

This product has the potential to grow and further increase efficiency in the data center space. The features I was not able to implement would add to that efficiency, and the project can also grow to integrate with other systems.
