Inspiration
Financial scams are increasingly multimodal: a fake invoice with a QR code, a phishing email, and a follow‑up scam call are all parts of the same attack. We wanted a Snapdragon laptop to act like a local fraud analyst that watches what you see and hear, without sending any data to the cloud.
What We Built
FinCrime Multimodal NPU Engine is an on‑device system that fuses vision, audio, text, and OCR signals into a single explainable risk score. Frames and audio buffers are captured (CameraCapture, AudioCapture), passed through per‑modality inference engines (vision.py, audio.py, text.py, ocr.py), and then combined by a logistic fusion model in RiskFusionEngine. A Streamlit dashboard (app/ui/dashboard.py) exposes pages for real‑time detection, model performance, risk fusion, interactive demos, and architecture, so we can watch the engine “think” in real time.
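The logistic fusion step can be sketched roughly like this; the function name, weights, and bias below are illustrative assumptions, not the actual RiskFusionEngine interface:

```python
import math

def fuse_risk(scores: dict, weights: dict, bias: float = -2.0) -> float:
    """Logistic fusion: pass a weighted sum of per-modality risk
    scores (each in [0, 1]) through a sigmoid to get one risk score."""
    z = bias + sum(weights[m] * scores[m] for m in scores)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical modality scores and hand-tuned weights for illustration.
scores = {"vision": 0.8, "audio": 0.3, "text": 0.9, "ocr": 0.7}
weights = {"vision": 1.5, "audio": 1.0, "text": 2.0, "ocr": 1.2}
risk = fuse_risk(scores, weights)  # a value in (0, 1)
```

A logistic model keeps the fusion interpretable: each weight directly states how much a modality contributes to the final score.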
How We Built It & What We Learned
We used Qualcomm AI Hub to prepare NPU‑ready ONNX models, ONNX Runtime + QNN for inference, and modular Python components for each modality. We learned that on‑device AI is not just about a single model: you need clean capture, preprocessing, orchestration, fusion, and logging (EncryptedLogger) to build something trustworthy and explainable. The fusion tab and rule‑based fallbacks in text.py taught us how important explicit, interpretable logic is when you’re dealing with financial risk.
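A rule‑based fallback like the one in text.py can be sketched as a weighted keyword scorer; the patterns and weights here are made up for illustration and are not the project's actual rules:

```python
import re

# Illustrative scam-indicator patterns with weights; hypothetical, not from text.py.
SCAM_PATTERNS = [
    (r"\burgent(ly)?\b", 0.2),
    (r"\bwire transfer\b", 0.3),
    (r"\bgift card\b", 0.4),
    (r"\bverify your account\b", 0.3),
]

def rule_based_text_risk(text: str) -> float:
    """Sum the weights of matched patterns, capped at 1.0.
    Every point of the score can be traced back to a named rule."""
    t = text.lower()
    score = sum(w for pat, w in SCAM_PATTERNS if re.search(pat, t))
    return min(score, 1.0)
```

The appeal of this kind of fallback is exactly the explicit, interpretable logic mentioned above: when the model is unavailable, the system can still explain *why* a message looks risky.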
Challenges
The main challenges were wiring everything to run fully on‑device on Windows (getting onnxruntime, audio capture, and Streamlit to cooperate), designing a fusion model that is both simple and credible, and building a UI that is visually impressive yet still directly connected to real pipeline code. Balancing “hackathon‑fast” with a production‑style architecture was the hardest and most rewarding part.