Inspiration
Last month, my mum received a call from "Bank Support." The voice on the other end didn't just know her name; it sounded exactly like the branch manager she had spoken to weeks prior. She was seconds away from authorizing a transfer when I walked into the room. I noticed something subtle, a robotic "smoothness" in the pauses between sentences, and hung up.
It was a deepfake clone. She was lucky, but luck is not a security strategy.
That moment of panic sparked AspitaTech. We realized that in the "Zero-Trust Era," where Generative AI can clone a voice in 3 seconds, we cannot rely on our ears anymore. We need a firewall for human connection.
What it does
AspitaTech is a Biometric Voice Firewall that sits between you and your audio stream (Zoom, Teams, Phone). Instead of just listening for keywords, it analyzes the physics of the voice in real-time.
- Live Defense: Monitors audio streams and provides a "Traffic Light" overlay (Green = Verified Human, Red = Synthetic/AI).
- Physics Check: Detects "Jitter" (frequency perturbation) and "Shimmer" (amplitude perturbation)—micro-tremors that biological vocal cords produce but current AI models often fail to replicate.
- Forensic Lab: A drag-and-drop tool to audit suspicious audio files (like WhatsApp voice notes) for fraud evidence.
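To make the Physics Check concrete, here is a minimal sketch of the standard relative jitter and shimmer formulas, not AspitaTech's actual engine: jitter measures cycle-to-cycle variation in glottal period length, shimmer the same variation in per-cycle peak amplitude. The helper names and the example values are illustrative assumptions.

```python
import numpy as np

def jitter(periods):
    """Relative jitter: mean absolute difference between consecutive
    glottal periods, normalized by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer(amplitudes):
    """Relative shimmer: the same perturbation measure applied to
    per-cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# A perfectly periodic signal (typical of naive synthesis) has zero jitter;
# biological vocal cords never achieve this.
synthetic = [0.008] * 50  # fifty identical 8 ms periods
human = 0.008 + np.random.default_rng(0).normal(0, 4e-5, 50)

print(jitter(synthetic))  # 0.0
print(jitter(human) > jitter(synthetic))  # True
```

In practice the per-cycle periods and amplitudes would be extracted from the live stream first (e.g. via pitch tracking); the perturbation math itself stays this simple.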
⚙️ How we built it
We built the core engine using Python and Librosa for acoustic signal processing.
- The Brain: We used a hybrid approach. A fine-tuned Wav2Vec2 model detects semantic anomalies, while our custom Physics Engine calculates the biometric liveness score.
- The Interface: We used Streamlit for the frontend to create a "Glassmorphism" dashboard that looks like a high-end security tool.
- The Engineering: We wrote a custom `AudioRecorder` class that taps into the system loopback (WASAPI on Windows, PulseAudio on Linux) to analyze system audio without needing virtual cables.
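The hybrid scoring described above can be sketched as a weighted blend of the neural model's output and the physics-based liveness check. This is an illustrative assumption, not AspitaTech's actual formula: the weights, thresholds, and the typical human ranges (jitter roughly 0.5-1%, shimmer roughly 3-7%) are placeholders.

```python
def liveness_score(model_prob_human, jitter_pct, shimmer_pct,
                   w_model=0.6, w_physics=0.4):
    """Blend the Wav2Vec2-style model probability with a physics score.
    Jitter/shimmer values near zero suggest synthesis; values in the
    typical biological range push the physics score toward 1.
    Weights and reference ranges are illustrative assumptions."""
    physics = (min(jitter_pct / 0.5, 1.0) * 0.5
               + min(shimmer_pct / 3.0, 1.0) * 0.5)
    return w_model * model_prob_human + w_physics * physics

def traffic_light(score, threshold=0.7):
    """Map the blended score onto the dashboard's traffic-light overlay."""
    return "GREEN" if score >= threshold else "RED"

print(traffic_light(liveness_score(0.9, 0.8, 4.0)))   # GREEN: human-range tremor
print(traffic_light(liveness_score(0.4, 0.02, 0.1)))  # RED: unnaturally smooth
```

The point of the blend is that each detector covers the other's blind spot: a clone that fools the neural model still has to fake biological micro-tremor, and vice versa.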
Challenges we ran into
"The Cloud Has No Ears." Our biggest engineering headache was deploying to the cloud. Streamlit Cloud (Linux) has no physical sound card, which caused our audio drivers (PyAudio) to crash the entire server.
- The Fix: We wrote a "Cloud-Safe Protocol" that detects the environment. If it sees a server, it switches to a "Simulation Mode" (Dummy Recorder) to prevent crashes, while keeping the "Forensic File Audit" fully functional.
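The fallback logic might look like the following hypothetical sketch: probe for a usable audio backend at startup and hand back a dummy recorder when none exists. The class and function names here are assumptions for illustration, not the project's real code.

```python
import importlib.util

class DummyRecorder:
    """Simulation-mode stand-in: yields 16-bit silence instead of
    touching audio hardware, so the server never crashes."""
    def read_chunk(self, frames=1024):
        return bytes(frames * 2)  # frames * 2 bytes of PCM silence

class LoopbackRecorder:
    """Real recorder: would open a WASAPI (Windows) or PulseAudio
    (Linux) loopback stream; the body is elided in this sketch."""
    def read_chunk(self, frames=1024):
        raise NotImplementedError("requires a sound card")

def make_recorder():
    """Cloud-safe factory: fall back to simulation mode when PyAudio
    is missing or no audio devices are present. File-based forensic
    audits never touch this path and stay fully functional."""
    if importlib.util.find_spec("pyaudio") is None:
        return DummyRecorder()
    import pyaudio
    pa = pyaudio.PyAudio()
    try:
        if pa.get_device_count() == 0:
            return DummyRecorder()
    finally:
        pa.terminate()
    return LoopbackRecorder()
```

Probing for the module and the device count up front keeps the failure mode graceful: the dashboard still renders, just with live capture disabled.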
Accomplishments that we're proud of
- Building a Hybrid Detection System (AI + Physics) that catches deepfakes that purely neural-network detectors miss.
- Creating a Simulation Lab that lets users compare "Real vs. Fake" audio instantly, building trust in the system.
- Designing a UI that feels like a "FinTech Product," not just a script.
What's next for AspitaTech
- Shae.ai Integration: We want to deploy this as an API for wellness platforms, so that patient-facing interactions are verified to come from real human voices.
- Mobile SDK: Moving the inference engine to ONNX to run locally on Android/iOS for secure phone calls.