Inspiration
Communication is one of the most universal human struggles, yet most tools built to help are too expensive, too complex, or too slow to give feedback when it actually matters. The name Zariya means "medium" or "channel" in Urdu/Hindi, and that's exactly what we wanted to build: a bridge between what you mean to say and how the world actually hears you. We were inspired by the millions of people who freeze in interviews, lose confidence mid-pitch, or simply never get feedback on how they speak, not just what they say. We wanted to build something that works in real time, feels human, and is accessible enough to reach anyone, even on WhatsApp. Then we pushed further. What if the same voice-understanding layer could detect distress? What if someone in danger could trigger an SOS just by speaking? That's when Zariya became more than a communication coach: it became a safety net too.
What it does
Zariya is a real-time, AI-powered communication and safety platform with two core modes:
- Communication Coach
  - Records your speech via the mic and transcribes it in real time.
  - Analyzes your response for filler words, pauses, clarity, tone, and confidence.
  - Uses Google Gemini to generate structured feedback plus an improved version of your answer.
  - Plays back the enhanced response in a natural human voice via ElevenLabs TTS, so you hear how you should have said it.
  - Delivers a full feedback report directly to your WhatsApp via the ElevenLabs WhatsApp Agent, with no app download needed.
  - Fires Zapier Webhooks the moment a session completes, triggering automated downstream actions such as logging sessions, notifying contacts, or running follow-up workflows.
- Safety Layer (911 / SOS Mode)
  - Continuously listens for distress signals in speech: keywords, tone shifts, and panic patterns.
  - When triggered, instantly sends an SOS alert or "911" text message to a preconfigured emergency contact.
  - Zapier Webhooks power the SOS automation pipeline: the moment distress is detected, a webhook fires and Zapier handles alert routing, SMS dispatch, and emergency-contact notification without any manual intervention.
  - Works silently in the background, which makes it usable in high-risk or vulnerable situations.
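The safety layer is described only as keyword + tone detection. As a rough illustration of how those two signals could be combined, here is a minimal Python sketch. The phrase list, the `ToneFeatures` prosody ratios, the weights, and the threshold are all hypothetical assumptions, not Zariya's actual values; the sketch is designed so that a keyword alone triggers an SOS while a tone shift alone does not.

```python
import re
from dataclasses import dataclass

# Hypothetical distress phrases; a production list would need to be far larger
# and tuned per language.
DISTRESS_PHRASES = ("help me", "call 911", "i'm in danger", "someone is following me", "sos")

# Match whole phrases only, so "sos" does not fire inside a word like "sosa".
_DISTRESS_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in DISTRESS_PHRASES) + r")\b"
)

# Hypothetical weights and threshold: a keyword alone reaches the threshold,
# a tone shift alone does not.
KEYWORD_WEIGHT = 0.6
TONE_WEIGHT = 0.4
SOS_THRESHOLD = 0.6


@dataclass
class ToneFeatures:
    """Hypothetical prosody features, each a ratio to the speaker's own baseline."""
    pitch_variance_ratio: float  # 1.0 means "same as baseline"
    speech_rate_ratio: float


def distress_score(transcript: str, tone: ToneFeatures) -> float:
    """Combine a keyword hit and a tone shift into a score between 0 and 1."""
    score = 0.0
    if _DISTRESS_RE.search(transcript.lower()):
        score += KEYWORD_WEIGHT
    # Sharply raised pitch variability or speaking rate is treated as a panic pattern.
    if tone.pitch_variance_ratio > 1.5 or tone.speech_rate_ratio > 1.4:
        score += TONE_WEIGHT
    return score


def should_trigger_sos(transcript: str, tone: ToneFeatures) -> bool:
    return distress_score(transcript, tone) >= SOS_THRESHOLD
```

Lowering the threshold would catch more genuine emergencies at the cost of more false alarms, which is the central tuning decision for a feature like this.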
How we built it
- Frontend: a React-based web app with real-time mic input, audio visualization, and a clean feedback dashboard.
- Backend: Flask and Python, orchestrating the full pipeline:
  - Speech-to-Text: mic audio is streamed and transcribed in real time.
  - Speech Analysis: custom logic detects filler words, pause frequency, pacing, and confidence markers.
  - Google Gemini API: prompt-engineered to understand interview context, generate structured feedback, suggest improvements, and produce a rewritten response.
  - Speech-to-Speech: raw spoken answer → Gemini-refined text → ElevenLabs TTS → playback as natural audio.
  - ElevenLabs WhatsApp Agent: delivers the full feedback report as a WhatsApp message after the session.
  - Zapier Webhooks: custom webhooks connect Zariya's backend events to Zapier automation flows, including session-completion triggers, SOS alert routing, emergency SMS dispatch, and feedback report delivery. This let us automate complex multi-step workflows without building each integration from scratch.
  - SOS System: a keyword + tone detection layer that fires an emergency SMS to a saved contact when distress is detected, routed through the Zapier webhook pipeline for reliability and speed.
  - Knowledge Base: a structured knowledge base of interview tips and communication patterns that Gemini queries to generate contextually relevant, personalized feedback.
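The speech-analysis step (filler words, pause frequency, pacing) can be sketched in a few lines of Python. This assumes the speech-to-text stage returns word-level timestamps, which the write-up doesn't specify; the `Word`/`SpeechMetrics` types, the filler list, and the one-second pause threshold are hypothetical choices for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical filler inventory. Words like "like" and "actually" are also
# legitimate words, so a real analyzer would need context to avoid false positives.
FILLERS = ("um", "uh", "like", "you know", "basically", "actually")
_FILLER_RE = re.compile(r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b")


@dataclass
class Word:
    """One transcribed word with timestamps in seconds (assumes word-level STT output)."""
    text: str
    start: float
    end: float


@dataclass
class SpeechMetrics:
    filler_count: int
    fillers_per_100_words: float
    long_pauses: int
    words_per_minute: float


def analyze(words: list[Word], pause_threshold: float = 1.0) -> SpeechMetrics:
    if not words:
        return SpeechMetrics(0, 0.0, 0, 0.0)
    transcript = " ".join(w.text.lower() for w in words)
    filler_count = len(_FILLER_RE.findall(transcript))
    # A "long pause" is a silence between consecutive words longer than the threshold.
    long_pauses = sum(
        1 for prev, cur in zip(words, words[1:]) if cur.start - prev.end > pause_threshold
    )
    duration = words[-1].end - words[0].start
    wpm = len(words) / duration * 60 if duration > 0 else 0.0
    return SpeechMetrics(
        filler_count=filler_count,
        fillers_per_100_words=filler_count / len(words) * 100,
        long_pauses=long_pauses,
        words_per_minute=wpm,
    )
```

Normalizing fillers per 100 words, rather than reporting a raw count, keeps feedback comparable between short and long answers.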
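To show what "role context, output-format constraints, and knowledge-base injection" can look like in practice, here is a minimal sketch of prompt assembly plus validation of the model's reply. It deliberately omits the actual Gemini API call. The knowledge-base entries, the tag-overlap retrieval, the function names, and the JSON keys (`strengths`, `improvements`, `rewritten_answer`) are all hypothetical; tag overlap stands in for whatever lookup Zariya really uses.

```python
from __future__ import annotations

import json

# Hypothetical knowledge-base entries, each tagged with the topics it addresses.
KNOWLEDGE_BASE = [
    {"tags": {"behavioral", "teamwork"}, "tip": "Structure stories as Situation, Task, Action, Result (STAR)."},
    {"tags": {"filler", "clarity"}, "tip": "Replace filler words with a short, deliberate pause."},
    {"tags": {"pacing"}, "tip": "Slow down on key points instead of rushing to the end."},
]

# The output "contract" the model is asked to follow; the keys are hypothetical.
REQUIRED_KEYS = {"strengths", "improvements", "rewritten_answer"}


def retrieve_tips(tags: set[str], limit: int = 2) -> list[str]:
    """Rank entries by how many tags they share with the session (ties keep KB order)."""
    scored = [(len(tags & entry["tags"]), entry["tip"]) for entry in KNOWLEDGE_BASE]
    matches = [item for item in scored if item[0] > 0]
    matches.sort(key=lambda item: item[0], reverse=True)
    return [tip for _, tip in matches[:limit]]


def build_prompt(question: str, transcript: str, metrics: str, tags: set[str]) -> str:
    tips = "\n".join(f"- {tip}" for tip in retrieve_tips(tags))
    return (
        "You are an experienced interview coach.\n"  # role context
        f"Interview question: {question}\n"
        f"Candidate's transcribed answer: {transcript}\n"
        f"Delivery metrics: {metrics}\n"
        f"Coaching guidance to draw on:\n{tips}\n"  # knowledge-base injection
        "Reply with only a JSON object with keys \"strengths\" (list of strings), "
        "\"improvements\" (list of strings), and \"rewritten_answer\" (string)."  # output format
    )


def parse_feedback(raw: str) -> dict:
    """Validate the model's reply against the contract before it reaches the UI."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

Treating the reply format as a contract that is checked in code, rather than trusted, is what turns inconsistent LLM output into a recoverable error.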
Challenges we ran into
- Real-time audio latency was our biggest enemy. Streaming mic audio, transcribing it, sending it to Gemini, getting a response, and playing it back via ElevenLabs, all in near real time, required aggressive pipeline optimization and async handling to avoid lag that would break the experience.
- Gemini prompt engineering took far more iterations than expected. Generic prompts produced generic feedback. We had to carefully structure prompts with role context, output-format constraints, and knowledge-base injection to get responses that were actually specific, useful, and consistent.
- ElevenLabs voice sync: getting TTS playback to feel natural and stay timed correctly with the UI's feedback display required careful audio buffering and state management on the frontend.
- ElevenLabs WhatsApp Agent setup was nontrivial. Configuring the agent, mapping session outputs to structured WhatsApp messages, and handling delivery reliability took significant debugging.
- The ElevenLabs client session server going down mid-hackathon was a real crisis. We had to quickly build a fallback TTS layer to keep the demo alive while the service recovered, a stressful but valuable lesson in resilience engineering.
- Zapier Webhook reliability: making sure webhooks fired correctly under real-time conditions, with proper payload structures and error handling, required careful testing. Getting the SOS pipeline to trigger instantly and reliably, with zero tolerance for failure, was particularly high-stakes.
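The write-up mentions async handling to fight latency without detailing it. One common pattern is sketched below in Python's `asyncio`: run independent stages concurrently and put a deadline on the slowest optional one. The stage functions are hypothetical stand-ins that only sleep; in a real backend they would wrap the local analysis, the Gemini request, and the ElevenLabs request.

```python
import asyncio
from typing import Awaitable, Callable, Optional


# Hypothetical stand-ins for the real stages (local analysis, Gemini, ElevenLabs).
async def analyze_delivery(transcript: str) -> dict:
    await asyncio.sleep(0.01)  # simulates local processing
    return {"filler_count": transcript.lower().split().count("um")}


async def request_feedback(transcript: str) -> str:
    await asyncio.sleep(0.05)  # simulates LLM round-trip latency
    return "Improved: " + transcript


async def synthesize(text: str) -> bytes:
    await asyncio.sleep(0.02)  # simulates TTS latency
    return text.encode("utf-8")


async def process_answer(
    transcript: str,
    analyze: Callable[[str], Awaitable[dict]] = analyze_delivery,
    feedback: Callable[[str], Awaitable[str]] = request_feedback,
    tts: Callable[[str], Awaitable[bytes]] = synthesize,
    tts_timeout: float = 3.0,
) -> dict:
    # Analysis and the LLM call don't depend on each other, so run them concurrently:
    # the wait is roughly the slower of the two instead of their sum.
    metrics, improved = await asyncio.gather(analyze(transcript), feedback(transcript))
    # Bound the TTS stage so a slow voice service can't stall the whole response;
    # the text feedback is still returned if audio misses the deadline.
    try:
        audio: Optional[bytes] = await asyncio.wait_for(tts(improved), timeout=tts_timeout)
    except asyncio.TimeoutError:
        audio = None
    return {"metrics": metrics, "improved": improved, "audio": audio}
```

Passing the stages in as parameters keeps the orchestration testable without network calls.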
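The write-up doesn't describe how the fallback TTS layer was built. One simple shape for this kind of resilience is an ordered chain of providers, tried until one succeeds. In this Python sketch, `synthesize_with_fallback`, `TTSUnavailable`, and the provider functions are hypothetical. In practice the first provider would wrap the ElevenLabs call and a later one would wrap some secondary voice engine.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, function) pair; the function turns text into audio bytes.
Provider = Tuple[str, Callable[[str], bytes]]


class TTSUnavailable(Exception):
    """Raised when every provider in the chain has failed."""


def synthesize_with_fallback(text: str, providers: Sequence[Provider]) -> Tuple[str, bytes]:
    """Try providers in order and return (provider_name, audio) from the first that succeeds."""
    errors = []
    for name, synth in providers:
        try:
            return name, synth(text)
        except Exception as exc:  # any failure (network, quota, outage) moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise TTSUnavailable("all TTS providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the audio lets the app log, or show in the UI, when it is running on the backup voice.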
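For the webhook reliability problem (payload structure plus error handling), a minimal Python sketch of a backend-to-Zapier call with retries and exponential backoff is shown below. It uses only the standard library. The hook URL is a placeholder for a Zap's Catch Hook URL, and the payload fields, the event names, and `fire_event` itself are hypothetical, not Zariya's actual schema.

```python
import json
import time
import urllib.request
import uuid
from typing import Callable

# Placeholder: a real URL comes from a Zap's "Webhooks by Zapier" Catch Hook trigger.
ZAPIER_HOOK_URL = "https://hooks.zapier.com/hooks/catch/XXXX/YYYY/"


def post_json(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def fire_event(
    event_type: str,
    data: dict,
    send: Callable[[str, dict], int] = post_json,
    url: str = ZAPIER_HOOK_URL,
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Send one event, retrying with exponential backoff; raises if every attempt fails."""
    # The event_id is generated once and reused on every retry, so a downstream
    # step can recognize a duplicate delivery of the same event.
    payload = {
        "event_id": str(uuid.uuid4()),
        "event_type": event_type,  # e.g. "session_completed" or "sos_triggered"
        "sent_at": time.time(),
        "data": data,
    }
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(retries):
        try:
            status = send(url, payload)
            if 200 <= status < 300:
                return payload
            last_error = RuntimeError(f"unexpected HTTP status {status}")
        except OSError as exc:  # urllib's URLError/HTTPError and socket timeouts are OSErrors
            last_error = exc
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"webhook failed after {retries} attempts") from last_error
```

For the SOS path, the final `RuntimeError` should be surfaced to the caller rather than swallowed, since a silently failed alert is the worst outcome.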
Accomplishments that we're proud of
- Built a fully working speech-to-speech pipeline (mic input → transcription → Gemini refinement → ElevenLabs voice playback) end to end, in real time.
- Integrated the ElevenLabs WhatsApp Agent to deliver feedback reports entirely outside the app.
- Shipped a dual-purpose platform (communication coaching AND emergency SOS detection) in a single hackathon window.
- Built and connected Zapier Webhooks to automate the entire post-session and SOS alert pipeline, turning single backend events into multi-step real-world actions.
- Survived a live API outage and kept the demo running with a custom fallback.
- Built something that genuinely feels human: the voice feedback doesn't sound like a robot, it sounds like a coach.
What we learned
- Prompt engineering is an engineering discipline. Vague instructions to an LLM produce vague results; structuring prompts the way you'd structure an API contract made all the difference.
- Real-time audio is hard. Latency compounds at every step (transcription, inference, TTS, playback), and every millisecond matters when you're building something that has to feel instant.
- Third-party APIs will fail. Building without a fallback plan is a liability. Our ElevenLabs outage forced us to think in terms of redundancy, which made the final product more robust.
- Webhooks are underrated infrastructure. Using Zapier Webhooks for automation flows meant we could wire up complex multi-step actions (SOS alerts, session logging, WhatsApp delivery) in a fraction of the time it would take to build each integration manually.
- Accessibility is a design decision. Delivering reports via WhatsApp instead of forcing users into an app wasn't an afterthought; it was the right call, and it opened Zariya up to a much wider audience.
What's next for Zariya
- Mobile app with always-on background listening for the SOS layer, making it a true personal safety tool.
- Expanded Zapier automation library: pre-built Zaps for common workflows such as post-session calendar scheduling, Slack team notifications, CRM logging for enterprise users, and escalating SOS alerts to multiple contacts in sequence.
- Multi-language support: Zariya's name is rooted in South Asian languages, and its reach should be too.
- Personalized coaching profiles that track improvement over time, identify recurring weak spots, and set communication goals.
- Live interview simulation mode: an AI interviewer asks questions, Zariya coaches in real time, and a full session report is delivered after the call.
- Enterprise tier: team communication coaching for sales, HR, and leadership training.
- Deeper Gemini integration: multimodal analysis, including facial expression and body language, via camera input.