Inspiration

Factory floors have a problem: broken machines, no internet. A technician stands in front of a conveyor with a fault code E04 and a burnt relay, but no manual nearby. Maybe they have a blurry phone photo. Maybe a voice recording of what the machine sounded like before it died. They call a supervisor. They wait. Downtime costs money.

We wanted to build something that works offline. A local assistant that doesn't need the cloud, doesn't need connectivity, doesn't phone home with proprietary data. Something a tech can trust because every answer points back to: "here's the manual page," "here's what we fixed last time," "here's the part number."

What it does

FixFirst Edge is an offline multimodal search engine for industrial maintenance. You give it:

- Text: error code, symptom, machine ID
- Image: photo of the broken part or schematic
- Voice: technician's description (transcribed locally)
- Filters: machine type, model number, fault severity

It returns three linked pieces of evidence:

- Manual section: the exact PDF page and excerpt from the equipment manual
- Similar incident: a past fix on the same or similar machine (with downtime, parts replaced)
- Candidate part: the part number most likely to need replacement

Everything is traceable (not LLM prose) and runs entirely offline: no cloud, no API calls, no data leaving the plant.

How we built it

Architecture

- Frontend: Next.js 14 (App Router) + TypeScript + Tailwind. Single search box accepts text/image/voice and renders evidence in three columns.
- Backend: FastAPI (Python 3.11/3.12) wraps the Actian VectorAI DB gRPC client. Exposes /api/diagnose (primary) plus lower-level /api/search/* endpoints.
- DB: Actian VectorAI DB in Docker. Single incidents collection with three named vectors:

- text_vec (BAAI/bge-small-en-v1.5, 384 dims): manual text, incident descriptions
- image_vec (CLIP ViT-B-32, 512 dims): schematics, damage photos
- audio_text_vec (bge-small over whisper transcripts, 384 dims): voice notes
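
The three named vectors above can be written down as a small schema and checked before upsert. This is a minimal sketch in plain Python, not the Actian client API: NamedVectorSpec, NAMED_VECTORS, and validate_vectors are hypothetical names introduced here for illustration, and only the vector names, models, and dimensions come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedVectorSpec:
    """One embedding space attached to a record (hypothetical helper type)."""
    name: str
    model: str
    dims: int


# Names, models, and dimensions as listed above.
NAMED_VECTORS = {
    spec.name: spec
    for spec in (
        NamedVectorSpec("text_vec", "BAAI/bge-small-en-v1.5", 384),
        NamedVectorSpec("image_vec", "CLIP ViT-B-32", 512),
        NamedVectorSpec("audio_text_vec", "bge-small over whisper transcripts", 384),
    )
}


def validate_vectors(vectors):
    """Raise if a record carries an unknown vector name or a wrong dimension.

    `vectors` maps a named-vector key to its embedding (a list of floats).
    A record may carry any non-empty subset of the three spaces.
    """
    if not vectors:
        raise ValueError("a record needs at least one named vector")
    for name, values in vectors.items():
        spec = NAMED_VECTORS.get(name)
        if spec is None:
            raise KeyError(f"unknown named vector: {name}")
        if len(values) != spec.dims:
            raise ValueError(
                f"{name} expects {spec.dims} dims, got {len(values)}"
            )


# A record with text and image embeddings passes; a 384-dim image_vec would not.
validate_vectors({"text_vec": [0.0] * 384, "image_vec": [0.0] * 512})
```

A check like this catches the easy mistake of writing a 384-dim text embedding into the 512-dim image space before it reaches the database.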

Local models: sentence-transformers, faster-whisper (tiny.en), pdfplumber. All CPU, ~1.3 GB cached, no internet required.

Pipeline

Ingest:

- User drops PDFs, CSVs, images, voice files into data/raw/
- Backend parallelizes:
  - PDF → chunks (pdfplumber) → embed with text_vec
  - Image → CLIP embed into image_vec, tag with metadata (machine_type, fault_code, part_no)
  - CSV (incidents, parts, error codes) → parsed, metadata indexed (doc_type, machine_type, model_no, fault_code, severity, part_no)
  - Voice → transcribe (faster-whisper) → embed with audio_text_vec
- All → upserted to Actian with FilterBuilder on 6 indexed keyword fields

Search (text query example):

- Embed query with text_vec
- Hybrid RRF fusion:
  - Lane 1: dense ANN on text_vec (top-50)
  - Lane 2: extract fault_code/model_no from query → re-run ANN with strict filters on those fields (Actian-native)
  - Merge with reciprocal rank fusion (k=60)
- Return top hit

Diagnose (multimodal, the main flow):

- Image query: image_vec ANN → hit's metadata (model_no, fault_code) seeds text hybrid
- Text query: hybrid RRF directly
- Voice query: transcribe + hybrid RRF + fuse audio_text_vec ANN
- For each: extract manual_doc_type filter → 3 additional scoped retrievals (manual section, incident, part)
- Template evidence: manual excerpt + incident fix summary + part number

All of this runs in ~850 ms median, ~1100 ms p95 on a 16 GB laptop, CPU-only, over 3 PDF manuals, 30 incidents, 25 parts, 13 error codes, 6 images, 5 voice notes.
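
The hybrid step in the search flow above can be sketched in a few lines. Reciprocal rank fusion scores each document as the sum over lanes of 1 / (k + rank), with rank starting at 1 and k=60 as stated above. This is an illustrative sketch, not the project's code: the function names are hypothetical, the two lanes are plain lists of document IDs standing in for ANN results, and the fault-code pattern (an "E" followed by two digits) is an assumption based on the E04 example.

```python
import re
from collections import defaultdict

# Assumed fault-code shape, inferred from the "E04" example only.
FAULT_CODE_RE = re.compile(r"\bE\d{2}\b", re.IGNORECASE)


def extract_fault_code(query):
    """Return the first fault code found in the query, upper-cased, or None."""
    match = FAULT_CODE_RE.search(query)
    return match.group(0).upper() if match else None


def rrf_fuse(lanes, k=60):
    """Merge ranked lists of document IDs with reciprocal rank fusion.

    Each lane is ordered best-first. A document's fused score is the sum of
    1 / (k + rank) over every lane it appears in, with rank starting at 1.
    Ties are broken by document ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for lane in lanes:
        for rank, doc_id in enumerate(lane, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Lane 1: dense ANN results; lane 2: ANN re-run with a strict fault_code filter.
# The IDs below are made up for the example.
dense_lane = ["manual-p12", "incident-07", "part-R88"]
filtered_lane = ["incident-07", "incident-19"]

fused = rrf_fuse([dense_lane, filtered_lane])
top_hit = fused[0]  # "incident-07": it appears in both lanes
```

Because only ranks are used, the two lanes never need comparable similarity scores, which is what makes fusing a filtered lane with an unfiltered one straightforward.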

Challenges we ran into

Multimodal in one collection, not three

- Problem: Naive approach = separate collections for text, image, audio. Cross-collection joins = slow + complex.
- Solution: Actian's named vectors. One incidents document can carry all three embeddings. Query by modality. No joins. Single RRF pass over hybrid lanes.

Image search alone doesn't work for maintenance

- Problem: Two schematics look identical but have different fault codes. A burnt relay in a photo is just pixels, and you can't diagnose purely from pixels.
- Solution: Image → metadata bridge. CLIP finds the nearest image, extracts its metadata (model_no, fault_code), then runs text hybrid with that metadata as context. Image is entry point, not oracle.

Accomplishments that we're proud of

- Genuine offline-first design. Not "cloud with offline fallback." Everything local by default. Runs on a 16 GB laptop, CPU-only. Disconnect WiFi after ingest and the app doesn't blink.
- All three Actian features used intentionally. Named vectors aren't cosmetic. Filtered search isn't optional. Hybrid RRF isn't a nice-to-have. Each solves a real problem in the diagnosis flow.
- Multimodal without separate collections. One incidents collection, three embedding spaces, one RRF pipeline. Simpler, faster, more maintainable.
- Traceable evidence. Every answer points to a source. No hallucination surface. Technician can verify against actual manual or actual past incident.
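
The image → metadata bridge described under Challenges can be illustrated with a toy in-memory index. This is a sketch under stated assumptions, not the project's implementation: cosine and image_to_text_seed are hypothetical helpers, a brute-force scan stands in for the image_vec ANN query, and the 2-dimensional vectors and metadata values are made up (real CLIP vectors have 512 dims).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def image_to_text_seed(query_vec, indexed_images):
    """Find the nearest indexed image and return its metadata as text filters.

    Only model_no and fault_code are carried over, matching the bridge
    described above; the image itself is never used to diagnose.
    """
    if not indexed_images:
        raise ValueError("no indexed images to search")
    nearest = max(
        indexed_images,
        key=lambda record: cosine(query_vec, record["image_vec"]),
    )
    metadata = nearest["metadata"]
    return {
        key: metadata[key]
        for key in ("model_no", "fault_code")
        if key in metadata
    }


# Toy index with made-up vectors and metadata.
indexed_images = [
    {"image_vec": [1.0, 0.0],
     "metadata": {"model_no": "CV-200", "fault_code": "E04", "part_no": "R-88"}},
    {"image_vec": [0.0, 1.0],
     "metadata": {"model_no": "PX-9", "fault_code": "E11"}},
]

seed_filters = image_to_text_seed([0.9, 0.1], indexed_images)
# seed_filters is {"model_no": "CV-200", "fault_code": "E04"}
```

The returned filters would then feed the text hybrid search, so the final evidence still comes from manuals and incident records rather than from visual similarity.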

What we learned

- Vector DBs are infrastructure, not magic. Named vectors aren't about fitting more vectors in one place; they're about keeping related data coherent so your query logic doesn't have to. Same goes for filters. We learned to think of a vector DB as a structured index, not a black box.
- Hybrid retrieval is worth the complexity. Dense semantic search alone misses error codes and part numbers. Exact metadata matching alone misses symptom phrases. RRF over both lanes is genuinely better than either alone. The math is simple (reciprocal rank) but the insight is deep: let the algorithm decide which signal matters.
- Offline is a feature, not a constraint. We started thinking "how do we make offline work?" and ended up building a stronger product. No network latency, no data egress, no loss of auditability. Industrial sites prefer this. It's not a compromise.

What's next for Firstfix-edge-master

Short term (next 30 days)

- Real data pilot. Partner with a small manufacturing site to ingest real manuals, past incidents, parts catalogs. Validate latency and accuracy on real workflows.
- Fine-tune filtering. The current 6 indexed fields (doc_type, machine_type, model_no, fault_code, severity, part_no) are a start. Real data will reveal what other fields matter.
- Voice note UI. Currently records WAV files. Needs an in-UI transcription display so the technician can verify the transcript before searching.
