Inspiration
Field engineers face two massive obstacles when working with critical hardware: either they cannot access the internet, or they are strictly forbidden from doing so. Whether a technician is deep in a concrete basement, out at a remote site, or facing the classic IT paradox where the network they were hired to fix is entirely down, relying on a cloud connection is a liability. Even when WiFi is readily available, enterprise security policies often dictate a strict Zero Trust environment: snapping a photo of proprietary hardware inside a secure data center or hospital and uploading it to a public cloud AI is often a fireable security violation. In these high-stakes moments, the professional generally knows what they are doing. However, fatigue, stress, and the sheer variety of parts they handle can lead to skipped steps or overlooked baseline procedures. Fieldbook was not built to magically turn a novice into an expert or to replace a highly specific technical manual. It was built to fill knowledge gaps, remind technicians of standard operating procedures, and give experienced professionals immediate confidence to begin a repair. By taking full advantage of the Zetic Melange SDK, we realized we could build a tool that solves both problems: Fieldbook provides modern conversational AI assistance completely on the edge, guaranteeing zero reliance on WiFi and absolute data privacy.
What it does
Fieldbook is a 100 percent offline, on-device mobile assistant for field technicians. Instead of wasting time typing paragraphs of context into a generic chatbot to explain a situation, a technician simply points their camera at the part. The app instantly recognizes the hardware category and offers three contextual actions: Troubleshoot, Summarize background info, or Setup. Once an action is selected, Fieldbook launches a localized, interactive chatbot designed specifically for conciseness. It completely cuts out the conversational fluff. Because the app already knows the hardware and the user's intent, the chatbot immediately delivers to-the-point baseline reference material. It provides the standard diagnostic trees, safety checks, and foundational context the engineer needs to structure their workflow with certainty. It gives professionals exactly what they need to start a repair, without forcing them to read a generated novel in the field.
How we built it
We utilized the Zetic Melange SDK to run our entire pipeline locally on-device. To optimize for mobile hardware limitations, we designed a decoupled Visual RAG (Retrieval Augmented Generation) architecture:
- We use a base CLIP model. Rather than relying on open-ended generation to identify parts, we built a local dictionary of precomputed custom embeddings for highly specific industrial hardware.
- When the camera scans a part, CLIP generates an embedding and compares it against our local dictionary to find the closest match.
- We present the top 5 hardware choices to the technician. This prevents dangerous AI hallucinations and ensures the user always has the final say.
- Once the hardware string and the user's intent are confirmed, the app unloads the vision model and passes that text payload into a local LLM to generate the conversational support.
Challenges we ran into
Our biggest challenge was the hardware limitations of mobile devices. Initially, we considered using a unified Vision Language Model, but running a vision encoder, a projection layer, and a generative language model simultaneously in active memory causes severe thermal throttling and app crashes. We tested a small-scale version, and the app consistently crashed from memory exhaustion. We had to rethink the architecture entirely. We solved this by decoupling the models and using plain text as the bridge between the vision-identification phase and the generative-text phase, which let us manage the device's RAM efficiently by keeping only one model loaded at a time.
Accomplishments that we're proud of
We successfully implemented a two-model AI pipeline that works perfectly in airplane mode, offering a genuinely secure enterprise solution. We also created the precomputed custom embeddings, which force the AI to be highly accurate and deterministic, bypassing the hallucination risks typical of generative vision models.
What we learned
We learned that when building for mobile platforms, bigger is not always better. Unified VLMs are incredible, but for strict, high-stakes applications a modular, decoupled pipeline is far more resilient. We also learned how to aggressively manage memory states on mobile devices, and how embedding vectors can be manipulated locally to create lightweight, highly specific databases.
What's next for Fieldbook
Currently, Fieldbook relies on the small language model's internal baseline knowledge. Our immediate next step is to introduce a secondary, local Text RAG pipeline.
By storing embedded chunks of verified PDF manuals directly on the device, we will be able to inject exact pinouts, proprietary error codes, and highly specific model data into the offline chat.
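That planned Text RAG step could work roughly as follows: retrieve the manual chunks whose precomputed embeddings are closest to the question's embedding, then inject them into the local LLM prompt. This is a minimal sketch only; the chunk texts, labels, and 3-dimensional vectors below are invented toy data standing in for real embedding-model output.

```python
import math

# Hypothetical on-device manual store: each verified PDF manual is split into
# chunks, and each chunk is stored with a precomputed text embedding.
MANUAL_CHUNKS = [
    ("Error E-42 on the X200 drive indicates a DC bus undervoltage fault.", [0.9, 0.1, 0.1]),
    ("X200 terminal pinout: 1=L1, 2=L2, 3=L3, 4/5/6=motor U/V/W.",          [0.1, 0.9, 0.1]),
    ("Replace the cooling fan every 3 years of continuous operation.",      [0.1, 0.1, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def build_prompt(question, question_embedding, k=2):
    """Retrieve the k most relevant chunks and ground the LLM prompt in them."""
    ranked = sorted(MANUAL_CHUNKS,
                    key=lambda chunk: cosine(question_embedding, chunk[1]),
                    reverse=True)
    context = "\n".join(text for text, _ in ranked[:k])
    return f"Use only this manual excerpt:\n{context}\n\nQuestion: {question}"

# Toy stand-in for embedding the technician's question on-device:
prompt = build_prompt("What does error E-42 mean?", [0.95, 0.05, 0.1])
```

Because retrieval happens over a local store, the exact pinouts and error codes reach the chat without any network call, matching the rest of the offline pipeline.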
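The vision-identification step described under "How we built it" (comparing a camera-frame embedding against the precomputed local dictionary and surfacing the top 5 candidates for the technician to confirm) can be sketched the same way. All hardware labels and vectors here are invented toy data, not Fieldbook's actual dictionary.

```python
import math

# Hypothetical precomputed dictionary: hardware label -> embedding vector.
# In the app these vectors would come from the on-device CLIP image encoder.
HARDWARE_EMBEDDINGS = {
    "PLC module":       [0.9, 0.1, 0.0],
    "VFD drive":        [0.7, 0.6, 0.2],
    "network switch":   [0.1, 0.9, 0.3],
    "UPS battery":      [0.0, 0.2, 0.9],
    "servo controller": [0.5, 0.5, 0.5],
    "patch panel":      [0.2, 0.8, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_k_matches(query_embedding, k=5):
    """Rank dictionary entries by similarity and return the k best labels."""
    scored = [(cosine(query_embedding, vec), label)
              for label, vec in HARDWARE_EMBEDDINGS.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored[:k]]

# Toy stand-in for a camera-frame embedding; the technician then picks the
# correct label from these candidates, so identification is human-confirmed.
candidates = top_k_matches([0.8, 0.3, 0.1])
```

Keeping the match deterministic (a fixed dictionary plus a similarity ranking) is what lets the pipeline avoid generative-vision hallucinations: the model can only ever propose labels that exist in the curated dictionary.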