Inspiration
Online scams have become nearly indistinguishable from legitimate popups and system alerts. They are especially harmful to seniors and young users who may trust what they see on screen. We wanted to build a tool that protects people in real time by detecting scams visually, not just through network or file scans. Lookout started from a simple question: What if your computer could understand what you see and warn you when something looks wrong?
What it does
Lookout watches your screen in real time and uses on-device AI to identify potential scams. It captures the screen, parses every visual and text element using OmniParser, and sends that structured data to a quantized Phi-3 Mini model that reasons about context. Within seconds, Lookout can tell if a popup or login prompt is legitimate or suspicious. All processing happens locally, which keeps user data completely private.
How we built it
We built the backend using Flask for lightweight local communication and MLX for on-device model execution on Apple Silicon. OmniParser handles OCR and layout extraction, providing both the text and spatial information of on-screen elements. The parsed data is analyzed by Phi-3 Mini, which runs quantized for speed and efficiency.
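As a rough illustration of how parsed elements might be handed to the language model, here is a minimal sketch that serializes text and bounding boxes into a prompt. The `ScreenElement` schema, the `build_prompt` helper, and the prompt wording are all assumptions made for this example. They are not OmniParser's actual output format or the prompt we shipped.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ScreenElement:
    """One parsed on-screen element (hypothetical schema for illustration)."""
    kind: str    # e.g. "text", "button", "icon"
    text: str    # OCR text, may be empty
    bbox: tuple  # (x1, y1, x2, y2), normalized to the range 0..1


def build_prompt(elements, max_elements=50):
    """Serialize elements in reading order into a compact prompt string."""
    # Sort top-to-bottom, then left-to-right, so the model sees a stable order.
    ordered = sorted(elements, key=lambda e: (round(e.bbox[1], 2), e.bbox[0]))
    # One JSON object per line keeps text and spatial information together.
    lines = [json.dumps(asdict(e)) for e in ordered[:max_elements]]
    header = "You are a scam detector. Each line below is one on-screen element."
    footer = "Answer SAFE or SUSPICIOUS, followed by one sentence of reasoning."
    return "\n".join([header, *lines, footer])
```

The resulting string would then be passed to the locally running model. Capping the element count is one simple way to keep the prompt short enough for a small quantized model.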
Our frontend is built with Electron and React, styled with Tailwind CSS for a clean and modern interface. The Electron app communicates with the local Flask server to trigger scans and display results instantly. Everything runs directly on the device with no external calls.
Challenges we ran into
At first, we relied only on standard OCR, which worked but missed important layout context. We later switched to OmniParser, which combines OCR with spatial understanding of on-screen elements.
However, OmniParser initially took far too long to process full screens. We profiled the pipeline and identified bottlenecks in the vision model. After experimenting with replacements and even attempting a full model swap, we ultimately decided to simplify. We focused only on YOLO for element detection and OCR for text extraction, which gave us faster, more consistent results.
We also dug deeper into OmniParser’s workflow to better understand its internal structure and identify inefficiencies. One big improvement came from filtering out empty or irrelevant text boxes, which significantly reduced processing time.
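The filtering step can be sketched roughly as follows. The box format (a dict with `text` and a normalized `bbox`), the function name `filter_text_boxes`, and both thresholds are assumptions chosen for illustration, not the exact rules we used.

```python
def filter_text_boxes(boxes, min_chars=2, min_area=0.0001):
    """Drop boxes whose text is blank or near-empty, or whose area is negligible.

    Each box is assumed to be a dict with "text" and "bbox", where bbox is
    (x1, y1, x2, y2) normalized to the range 0..1.
    """
    kept = []
    for box in boxes:
        text = box.get("text", "").strip()
        x1, y1, x2, y2 = box["bbox"]
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        if len(text) < min_chars:
            continue  # empty or single-character OCR noise
        if area < min_area:
            continue  # too small to be a meaningful element
        kept.append({**box, "text": text})
    return kept
```

Running a filter like this before the reasoning step means fewer elements are serialized and sent to the model, which is where the time savings would come from.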
A few other challenges we ran into: 1) The model was parsing its own on-screen warning messages during inference, leading to a recursive loop. We addressed this by making the alert window invisible to screenshots. 2) Duplicate alerts were published when the user stayed on the same window for an extended period. We created a cache that stores recent alerts and used the LLM-as-a-judge pattern to filter out duplicates.
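The duplicate-alert cache could look something like the sketch below. `AlertCache`, its time-to-live, and the injectable `is_duplicate` callable are hypothetical names for this example. `naive_judge` is only a stand-in so the code runs without a model. In the real pipeline that comparison would be delegated to the LLM.

```python
import time


class AlertCache:
    """Remembers recently published alerts and suppresses duplicates."""

    def __init__(self, is_duplicate, ttl_seconds=300.0, clock=time.monotonic):
        self._is_duplicate = is_duplicate  # judge: (previous, candidate) -> bool
        self._ttl = ttl_seconds
        self._clock = clock
        self._recent = []  # list of (timestamp, alert_text)

    def should_publish(self, alert):
        now = self._clock()
        # Forget alerts older than the time-to-live.
        self._recent = [(t, a) for t, a in self._recent if now - t < self._ttl]
        # Ask the judge whether any remembered alert covers this one.
        for _, previous in self._recent:
            if self._is_duplicate(previous, alert):
                return False
        self._recent.append((now, alert))
        return True


def naive_judge(previous, candidate):
    """Stand-in for the LLM judge: case-insensitive exact match only."""
    return previous.strip().lower() == candidate.strip().lower()
```

Passing the judge in as a callable keeps the cache logic testable on its own, and lets an exact-match check be swapped for a model call that can recognize differently worded alerts about the same scam.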
Accomplishments that we're proud of
We are proud that Lookout performs full-screen OCR, layout parsing, and AI reasoning locally in seconds. Achieving this entirely on-device means users get both privacy and performance. We are also proud of the simplicity of the interface and how seamlessly it connects advanced AI with real human protection.
What we learned
We learned how powerful local inference can be when optimized correctly. Combining tools like OmniParser, MLX, and quantized language models allowed us to create something that feels immediate and trustworthy. We also learned a lot about user trust and the importance of clear, privacy-focused design in security applications.
What's next for Lookout
With real-time detection and on-device reasoning already working, our next step is to expand what Lookout can understand and protect against.
We plan to add cross-modal awareness, allowing Lookout to correlate on-screen content with system context such as open applications or network activity. This would let it detect scams that appear legitimate visually but trigger suspicious background behavior.
We also want to develop adaptive user protection, where Lookout learns individual user habits over time to tailor alerts. For example, it could recognize a user’s normal login screens or payment workflows and flag deviations automatically.
Another direction is multi-language detection, so Lookout can identify scams written in other languages or mixed-language popups.
Finally, we want to build Lookout Lite, a mobile and browser extension version that uses distilled models for lightweight real-time protection anywhere users browse or work.
