Inspiration
The inspiration for Satark hit close to home. Recently, a family friend was scammed out of ₹50,000. She described a terrifying experience: after receiving a call from an “official,” she felt hypnotized. For two hours she stayed on the call, completely disconnected from her surroundings and unaware that the theft was happening in real time.
We built Satark to be the third ear that doesn’t panic, providing a rational reality check when the user is under psychological pressure.
What It Does
In Hindi, Satark means “alert.” It acts as a real-time guardian during voice and video calls, using a three-pronged AI approach to catch what a stressed human might miss:
The "Ear" (Native Audio): Using Gemini 2.5 Flash, it listens to the raw audio stream. Rather than waiting for speech to be turned into text, it interprets the scammer's intent and tone directly from the audio, flagging "Digital Arrest" scripts before they can take hold.
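A minimal sketch of the client-side audio preparation. It assumes the model expects raw 16-bit little-endian PCM at 16 kHz (the input format the Gemini Live API documentation describes, as we understand it) and that the browser delivers Float32 samples in [-1, 1] at a higher rate such as 48 kHz. The function names are hypothetical, the box-filter downsampler is a simple stand-in for a proper resampler, and base64 encoding plus the actual session calls are left out.

```typescript
// Hypothetical helpers: turn browser microphone samples into 16 kHz, 16-bit PCM
// chunks that could be streamed to a native-audio model as they fill.

/** Downsample by averaging each block of input samples (a simple box filter). */
function downsample(input: Float32Array, inRate: number, outRate: number): Float32Array {
  if (outRate > inRate) throw new Error("outRate must not exceed inRate");
  const ratio = inRate / outRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(input.length, Math.floor((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    out[i] = end > start ? sum / (end - start) : 0;
  }
  return out;
}

/** Convert Float32 samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range values.
 *  Typed arrays use the platform byte order, which is little-endian on common hardware. */
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

/** Split PCM into fixed-duration chunks (e.g. 100 ms) so each can be sent as soon as it fills. */
function chunkPCM(pcm: Int16Array, sampleRate: number, chunkMs: number): Int16Array[] {
  const size = Math.round((sampleRate * chunkMs) / 1000);
  const chunks: Int16Array[] = [];
  for (let i = 0; i < pcm.length; i += size) chunks.push(pcm.subarray(i, i + size));
  return chunks;
}
```

For example, 100 ms of 48 kHz audio (4800 samples) downsamples to 1600 samples, which `chunkPCM(pcm, 16000, 100)` returns as a single chunk. Small chunks keep the model listening continuously instead of waiting for a full utterance.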
The "Brain" (Factual Grounding): Using Gemini 3 Flash, it fact-checks the scammer on the fly. When a scammer says, "I'm sending a CBI warrant via WhatsApp," the AI uses Google Search Grounding to tell the user: "False. Government agencies do not use WhatsApp for legal warrants."
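The fact-checking step could be sketched as a request body for the Gemini REST `generateContent` endpoint with the `google_search` tool enabled, plus a helper that pulls the answer and its cited web sources out of `groundingMetadata`. The field names follow the public Gemini API documentation as we understand it and should be checked against the current reference; the prompt wording and the helper names (`buildFactCheckRequest`, `extractVerdict`) are hypothetical, and no network call is made here.

```typescript
// Hypothetical helpers around a search-grounded fact-check request.
// Assumed response shape: candidates[].content.parts[].text and
// candidates[].groundingMetadata.groundingChunks[].web.{uri,title}.

interface GroundingChunk { web?: { uri?: string; title?: string } }
interface Candidate {
  content?: { parts?: { text?: string }[] };
  groundingMetadata?: { groundingChunks?: GroundingChunk[] };
}
interface GenerateContentResponse { candidates?: Candidate[] }
interface Verdict { text: string; sources: { uri: string; title: string }[] }

/** Build a request body asking the model to verify one caller claim using Google Search. */
function buildFactCheckRequest(claim: string) {
  return {
    contents: [{
      role: "user",
      parts: [{
        text: `A caller claims: "${claim}". Is this how the agency actually operates? ` +
          "Answer TRUE or FALSE first, then one short sentence.",
      }],
    }],
    tools: [{ google_search: {} }], // enables grounding with Google Search
  };
}

/** Join the answer text and collect the web sources the answer was grounded on. */
function extractVerdict(res: GenerateContentResponse): Verdict {
  const cand = res.candidates?.[0];
  const text = (cand?.content?.parts ?? []).map((p) => p.text ?? "").join("").trim();
  const sources = (cand?.groundingMetadata?.groundingChunks ?? [])
    .filter((c) => c.web?.uri)
    .map((c) => ({ uri: c.web!.uri!, title: c.web!.title ?? c.web!.uri! }));
  return { text, sources };
}
```

Returning the sources alongside the text is what lets the HUD show evidence for a "False" verdict rather than just a warning.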
The "Eyes" (Visual Forensics): Using Gemini 3 Pro, it watches the video feed to spot deepfake glitches or inaccuracies in police uniforms that a panicked eye would never notice.
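One way to keep vision-model calls affordable is to sample frames instead of sending every one. The hypothetical `FrameSampler` below (an illustration, not Satark's documented logic) rate-limits frames and skips ones that barely changed since the last frame sent. It assumes RGBA byte arrays like those returned by a canvas `getImageData` call; both thresholds are placeholders.

```typescript
// Hypothetical frame sampler: send at most one frame per minIntervalMs,
// and only if it differs enough from the last frame that was sent.
class FrameSampler {
  private last: Uint8ClampedArray | null = null;
  private lastSentAt = -Infinity;
  private readonly minIntervalMs: number;
  private readonly minMeanDiff: number;

  constructor(minIntervalMs: number, minMeanDiff: number) {
    this.minIntervalMs = minIntervalMs;
    this.minMeanDiff = minMeanDiff;
  }

  /** Mean absolute difference over the RGB channels (alpha ignored), in 0..255. */
  private meanDiff(a: Uint8ClampedArray, b: Uint8ClampedArray): number {
    let sum = 0;
    let n = 0;
    for (let i = 0; i < a.length; i += 4) {
      for (let c = 0; c < 3; c++) {
        sum += Math.abs(a[i + c] - b[i + c]);
        n++;
      }
    }
    return n === 0 ? 0 : sum / n;
  }

  /** Returns true if this frame should be sent, and remembers it as the new reference. */
  offer(frame: Uint8ClampedArray, nowMs: number): boolean {
    if (nowMs - this.lastSentAt < this.minIntervalMs) return false;
    if (this.last && this.last.length === frame.length &&
        this.meanDiff(this.last, frame) < this.minMeanDiff) return false;
    this.last = frame.slice();
    this.lastSentAt = nowMs;
    return true;
  }
}
```

With `new FrameSampler(1000, 5)`, a frame offered 500 ms after the last one sent is dropped, and an unchanged frame is dropped even after the interval has passed.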
How We Built It
We designed a multimodal pipeline focused on minimizing latency:
Live API
We bypassed traditional lag by streaming audio directly to Gemini’s native models for semantic analysis.
Multimodal UI
We built a web-based Heads-Up Display (HUD) with Tailwind CSS. It’s designed to be understood at a glance, shifting from a calm Green to a warning Orange or a critical Red as the threat escalates.
Screen-Share Integration
To get around the strict permission silos of communication apps, we used the Screen Capture API as a universal “eyes-on” tool that works across platforms such as WhatsApp and Zoom.
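The HUD’s Green, Orange, and Red states could be driven by a small reducer like the one below. It assumes, hypothetically, that each of the three detectors (audio, fact-check, visual) reports a risk score between 0 and 1; the thresholds and the escalate-only rule are illustrative choices, not values from the project.

```typescript
// Hypothetical mapping from detector scores to HUD states.
type ThreatLevel = "green" | "orange" | "red";

interface Signals { audio: number; factCheck: number; visual: number }

const ORANGE_AT = 0.4; // placeholder threshold
const RED_AT = 0.75;   // placeholder threshold

/** The worst single signal drives the level, so one strong red flag is enough. */
function levelFor(s: Signals): ThreatLevel {
  const worst = Math.max(s.audio, s.factCheck, s.visual);
  if (worst >= RED_AT) return "red";
  if (worst >= ORANGE_AT) return "orange";
  return "green";
}

const RANK: Record<ThreatLevel, number> = { green: 0, orange: 1, red: 2 };

/** Only escalate during a call; a quiet moment after a red flag does not reset the HUD. */
function nextLevel(current: ThreatLevel, s: Signals): ThreatLevel {
  const candidate = levelFor(s);
  return RANK[candidate] > RANK[current] ? candidate : current;
}

// Tailwind background utility classes for each state.
const HUD_CLASS: Record<ThreatLevel, string> = {
  green: "bg-green-500",
  orange: "bg-orange-500",
  red: "bg-red-600",
};
```

Escalate-only behavior keeps the display from flickering back to Green during a pause in the scammer’s script; resetting would happen when the call ends.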
Challenges We Ran Into
The hardest part was moving from concept to code.
We initially attempted a native Flutter app but hit major roadblocks:
- Limitations on real-time camera and screen capture
- Automated snapshot restrictions
- Mobile OS security constraints
Direct integration with encrypted apps like WhatsApp was impossible due to limited API access. We pivoted to a screen-sharing approach, a creative workaround that allowed our AI to see and hear across any third-party app without deep system integration.
Accomplishments We’re Proud Of
- Achieved native audio-to-intent detection, eliminating the slow transcription → text analysis pipeline.
- Saved critical seconds, which matter most during an active scam.
- Successfully implemented Google Search Grounding, allowing the AI not just to flag a scam, but to prove it using live web evidence.
What We Learned
- Technology alone isn’t enough; psychology is half the battle.
- Scammers exploit fear, authority, and urgency.
- Multimodal AI (Audio + Vision + Search) is essential, as modern scams increasingly rely on deepfakes and sophisticated social engineering.
What’s Next for Satark
Our goal is to evolve Satark from a screen-shared prototype into a seamless system utility.
Planned next steps:
- Refine the Flutter implementation for high-frequency frame analysis.
- Improve deepfake detection accuracy.
- Build a Community Threat Map, where scam scripts detected by one user instantly update a shared database, protecting thousands of others in real time.
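The matching step behind such a shared database might look like the sketch below: a hypothetical in-memory registry that fingerprints each reported script with word 3-gram shingles and compares scripts by Jaccard similarity, so reworded variants of the same script add to one entry’s report count instead of creating duplicates. A real deployment would need a shared backend and a scalable index (for example, MinHash), and the 0.5 threshold is a placeholder.

```typescript
// Hypothetical scam-script registry; an in-memory array stands in for the shared database.

/** Lowercase, split on whitespace and punctuation, and collect word n-grams. */
function shingles(text: string, n = 3): Set<string> {
  const words = text.toLowerCase().split(/[\s.,!?;:"'()]+/).filter(Boolean);
  const out = new Set<string>();
  for (let i = 0; i + n <= words.length; i++) out.add(words.slice(i, i + n).join(" "));
  return out;
}

/** Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty. */
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let inter = 0;
  a.forEach((x) => { if (b.has(x)) inter++; });
  return inter / (a.size + b.size - inter);
}

interface ScriptEntry { id: number; shingles: Set<string>; reports: number }

class ThreatRegistry {
  private entries: ScriptEntry[] = [];
  private nextId = 1;
  private readonly threshold: number;

  constructor(threshold = 0.5) {
    this.threshold = threshold;
  }

  /** Report a detected script: bump the closest match above threshold, or register a new entry. */
  report(transcript: string): number {
    const sh = shingles(transcript);
    let best: ScriptEntry | undefined;
    let bestScore = 0;
    for (const e of this.entries) {
      const score = jaccard(sh, e.shingles);
      if (score > bestScore) { best = e; bestScore = score; }
    }
    if (best && bestScore >= this.threshold) {
      best.reports++;
      return best.id;
    }
    const entry: ScriptEntry = { id: this.nextId++, shingles: sh, reports: 1 };
    this.entries.push(entry);
    return entry.id;
  }

  reportsFor(id: number): number {
    return this.entries.find((e) => e.id === id)?.reports ?? 0;
  }
}
```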
Built With
- api
- audio
- gemini
- react
- tailwind
- typescript