Inspiration

The inspiration for SafeStep came from standing on the manufacturing floor and watching the "Binder Gap" in real time. In high-stakes environments, Standard Operating Procedures (SOPs) are the difference between a safe shift and a catastrophic injury, yet they are almost always trapped in static PDFs or dusty physical binders. Workers can’t safely flip through pages while their hands are on a CNC machine or a robotic arm. I realized that for safety guidance to be effective, it has to be hands-free, heads-up, and high-fidelity.

What it does

SafeStep is a voice-interactive AI assistant designed for the "Hands-Busy" world. It allows managers to upload existing manuals or generate new ones from scratch using generative AI. Once a procedure is finalized, the system assembles high-clarity, context-aware audio instructions. Workers interact with the SOP via voice commands, moving to the next step only when they are ready. Every interaction is logged in a secure, audit-ready ledger, so there is a compliance record for each step a worker completes.
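The step-by-step flow described above can be pictured as a small state machine that also writes an audit entry for every command. This is a minimal sketch under stated assumptions: the command names, `SopSession`, and `AuditEntry` are hypothetical and do not come from the SafeStep codebase, and the in-memory `log` array stands in for the real ledger.

```typescript
// Hypothetical sketch: these names are illustrative, not SafeStep's actual code.
type VoiceCommand = "next" | "repeat" | "back";

interface AuditEntry {
  workerId: string;
  sopId: string;
  stepIndex: number; // step the worker was on when the command was heard
  command: VoiceCommand;
  timestamp: string; // ISO 8601
}

class SopSession {
  private index = 0;
  private readonly workerId: string;
  private readonly sopId: string;
  private readonly steps: readonly string[];
  // Assumption: an in-memory array stands in for the audit ledger.
  readonly log: AuditEntry[] = [];

  constructor(workerId: string, sopId: string, steps: readonly string[]) {
    if (steps.length === 0) throw new Error("An SOP needs at least one step");
    this.workerId = workerId;
    this.sopId = sopId;
    this.steps = steps;
  }

  currentStep(): string {
    const step = this.steps[this.index];
    if (step === undefined) throw new Error("Step index out of range");
    return step;
  }

  // Records the command first, then moves only when the worker asks to.
  handle(command: VoiceCommand): string {
    this.log.push({
      workerId: this.workerId,
      sopId: this.sopId,
      stepIndex: this.index,
      command,
      timestamp: new Date().toISOString(),
    });
    if (command === "next" && this.index < this.steps.length - 1) {
      this.index += 1;
    } else if (command === "back" && this.index > 0) {
      this.index -= 1;
    }
    // "repeat" (and out-of-range moves) leave the position unchanged.
    return this.currentStep();
  }
}
```

Logging before the position changes means each entry records the step the worker was actually on when they spoke, which is the fact an auditor would want.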

How we built it

SafeStep was built using a robust, hybrid cloud architecture designed for low latency and high scalability:

The Gemini API parses complex industrial manuals and translates the highly technical information into conversational scripts for workers.

ElevenLabs provides the high-fidelity audio engine, using custom prompting to ensure the tone and pacing cut through background noise.

MongoDB Atlas serves as the central data hub, storing multi-layered SOP documents and customized audio files for editing and improvement.

Cloudflare R2 acts as the object storage for audio files, allowing us to stream instructions to workers with minimal data egress fees and to set up easily on floor equipment.

A Node.js/Express backend paired with a React frontend creates an intuitive interface for managers to review and edit AI-generated scripts before they go live.
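To make the split between the data hub and the object store concrete, here is one way the records could be shaped. It is a sketch only, and it assumes the SOP document keeps each step's script plus a key pointing at the audio clip in object storage. The field names, the `draft`/`live` status, and the key layout are all hypothetical.

```typescript
// Hypothetical data shapes: field names and key layout are assumptions.
interface SopStep {
  index: number;
  script: string;   // conversational script for this step
  audioKey: string; // object-storage key of the rendered audio clip
}

interface SopDocument {
  sopId: string;
  version: number;
  title: string;
  status: "draft" | "live"; // drafts wait for manager review
  steps: SopStep[];
}

// Builds a predictable key so a clip can be located from the SOP alone.
function audioKeyFor(sopId: string, version: number, stepIndex: number): string {
  const padded = String(stepIndex).padStart(3, "0");
  return `sops/${sopId}/v${version}/step-${padded}.mp3`;
}

// Turns a list of reviewed scripts into a draft document.
function buildSopDocument(
  sopId: string,
  title: string,
  version: number,
  scripts: readonly string[],
): SopDocument {
  return {
    sopId,
    version,
    title,
    status: "draft",
    steps: scripts.map((script, index) => ({
      index,
      script,
      audioKey: audioKeyFor(sopId, version, index),
    })),
  };
}
```

Including the version in the key means an edited script gets a fresh clip instead of overwriting audio that a worker may be listening to.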

Challenges we ran into

The biggest challenge was the noise on the factory floor. Standard AI voices often sound robotic and get lost in the low-frequency hum of a manufacturing plant. I had to iterate heavily on the ElevenLabs prompting, adjusting stability, clarity, and pacing, to find a "voice" that felt like a calm, authoritative mentor and could cut through the noise. Additionally, building a "From-Scratch" generator that met strict OSHA and FDA 21 CFR compliance requirements meant fine-tuning my Gemini prompts so that the generated instructions met those standards and safeguarded workers.
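The voice tuning described above comes down to a few numeric settings sent with each text-to-speech request. The sketch below builds such a request without sending it. It assumes the ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with the `stability` and `similarity_boost` voice settings; the helper names and the idea of clamping values to the 0 to 1 range are illustrative. Pacing is left out, and the settings actually used for SafeStep are not given in this write-up.

```typescript
// Hypothetical helper: builds a request description, does not call the API.
interface VoiceTuning {
  stability: number;       // higher values give a steadier delivery
  similarityBoost: number; // the "clarity" control in the text above
}

interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Keeps a setting inside the 0..1 range.
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

function buildTtsRequest(
  voiceId: string,
  apiKey: string,
  modelId: string,
  text: string,
  tuning: VoiceTuning,
): TtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      text,
      model_id: modelId,
      voice_settings: {
        stability: clamp01(tuning.stability),
        similarity_boost: clamp01(tuning.similarityBoost),
      },
    }),
  };
}
```

Keeping the tuning in one typed object makes it easy to try several candidate voices against recorded floor noise and compare them side by side.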

Accomplishments that we're proud of

I successfully integrated a three-way AI and data pipeline (Gemini for text → ElevenLabs for voice → MongoDB for audit logging) that feels instantaneous. I'm particularly proud of the feature that allows a manager to take a 50-page technical manual and turn it into a 5-minute interactive audio guide in under 60 seconds. I also managed to keep the architecture localized to the systems that use it, so the application stays lightning-fast regardless of where the facility is located.

What we learned

Building SafeStep taught me that in industrial AI, reliability is a feature. I learned a significant amount about audio compression in high-noise environments and how to use semantic search in MongoDB to help managers find specific safety steps across thousands of documents. Most importantly, I learned that user-centered, customizable design is fundamental when creating a system that involves diverse individuals, work styles, and levels of understanding.
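For the semantic search mentioned above, a query could look roughly like the aggregation pipeline below. It assumes Atlas Vector Search and its `$vectorSearch` stage, with a query embedding computed elsewhere. The index name `sop_steps_vector`, the `embedding` field, and the projected fields are hypothetical.

```typescript
// Hypothetical sketch: index and field names are assumptions.
function buildSafetyStepSearch(
  queryVector: readonly number[],
  limit: number = 5,
): Record<string, unknown>[] {
  if (queryVector.length === 0) throw new Error("Query embedding is empty");
  if (!Number.isInteger(limit) || limit < 1) throw new Error("limit must be a positive integer");
  return [
    {
      $vectorSearch: {
        index: "sop_steps_vector",
        path: "embedding",
        queryVector: [...queryVector],
        // Consider more candidates than we return to improve recall.
        numCandidates: limit * 20,
        limit,
      },
    },
    {
      $project: {
        _id: 0,
        sopId: 1,
        stepIndex: 1,
        script: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}
```

The returned array would be passed to a collection's `aggregate` call. Projecting the score lets the manager interface rank matches or hide weak ones.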

What's next for SafeStep

The roadmap for SafeStep extends beyond audio.

Computer Vision Integration: Incorporating real-time form analysis to visually confirm a worker is wearing the correct PPE before the audio instruction begins.

Wearable Support: Deploying SafeStep to smart glasses and industrial headsets for an immersive learning experience.

Offline Mode: Developing a localized caching system so that even if a facility's Wi-Fi drops, the safety instructions never stop.
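Since this feature is still on the roadmap, the following is only a sketch of one possible shape for such a cache: clips are stored locally while the network is up, and a procedure is offered offline only once every clip is present. `StepAudioCache` and its methods are hypothetical.

```typescript
// Hypothetical sketch of a local clip cache; not an existing SafeStep component.
class StepAudioCache {
  private readonly clips = new Map<string, Uint8Array>();

  // Stores a downloaded clip under its storage key.
  put(key: string, audio: Uint8Array): void {
    this.clips.set(key, audio);
  }

  // Returns the cached clip, or undefined if it was never downloaded.
  get(key: string): Uint8Array | undefined {
    return this.clips.get(key);
  }

  // Lists the clips that still need downloading for a procedure.
  missing(keys: readonly string[]): string[] {
    return keys.filter((key) => !this.clips.has(key));
  }

  // A procedure is safe to run offline only when no clip is missing.
  isReadyOffline(keys: readonly string[]): boolean {
    return this.missing(keys).length === 0;
  }
}
```

Checking readiness for the whole procedure up front avoids the worse failure mode, where audio stops partway through a safety-critical sequence.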
