Why It Matters
Growing up in a large city is exciting. New York, San Francisco, Chicago: these sprawling metropolises have so many things to do, places to go, and people to meet. However, urban life comes with its own dangers. Even with a healthy degree of caution, or "street smarts", the possibility of stumbling into a bad situation is always present.
A classmate of ours talked with us about his childhood growing up nearby in Boston. Though he loved living and exploring the city as a child, he told us that he was often unsure of whether it was safe to do so. Is anything dangerous happening nearby? What places should I avoid? He wished that there had been some way to know.
His wish isn't unique: over 250,000 people use an app called Citizen every month. According to the New York Times, Citizen employs teams of people to listen to emergency radios, the source of the most up-to-date information on potentially dangerous incidents, and send out location-based alerts. However, expansion has been slow: since debuting in New York City in 2017, Citizen has expanded to just five cities.
People in urban areas want this information, but there are many, many cities without it.
What Villager Does Differently
Emergency radio is the best resource for live updates on crime and danger in the area, but no one has time to listen to a police scanner every day. This is where Villager comes in. We use Rev.ai's speech-to-text technology, natural language processing, and machine learning to analyze emergency radio audio and display the reported incidents on our web app. The information goes straight from the source to the user, with no human operators in the loop. This, we believe, will enable both faster alerts and faster expansion to cities all over the US than Citizen can provide.
How It Works
We access live audio streams provided by Broadcastify, then denoise and enhance them using a spectral gating noise reduction pipeline. The processed audio is fed into the Rev.ai API for speech-to-text transcription. A natural language processing framework then identifies key named entities in the transcript, such as threats (e.g. reported crimes and weapons) and named locations (for instance, street intersections). Finally, the named locations are converted to latitude-longitude coordinates using the Google Maps Places Search API.
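To make the entity-extraction step concrete, here is a minimal sketch of pulling threats and intersections out of a transcript. It is a deliberately simple stand-in (a keyword list and a regular expression), not the NLP framework we used; the names `THREAT_TERMS`, `INTERSECTION`, `Incident`, and `extract_incident` are hypothetical, and it assumes the transcript capitalizes street names.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; a trained entity recognizer would replace this lookup.
THREAT_TERMS = ("shots fired", "robbery", "burglary", "assault", "handgun", "knife")

# Matches intersections phrased like "Clark and Division" (assumes capitalized names).
INTERSECTION = re.compile(r"\b([A-Z][a-z]+) (?:and|&) ([A-Z][a-z]+)\b")


@dataclass
class Incident:
    threats: list
    locations: list


def extract_incident(transcript: str) -> Incident:
    """Return the threat keywords and street intersections found in a transcript."""
    lowered = transcript.lower()
    threats = [term for term in THREAT_TERMS if term in lowered]
    locations = [f"{a} and {b}" for a, b in INTERSECTION.findall(transcript)]
    return Incident(threats, locations)


incident = extract_incident(
    "Units respond, robbery in progress at Clark and Division, suspect has a knife"
)
print(incident.threats, incident.locations)
```

Each string in `locations` would then be sent to a geocoding service to obtain coordinates; that call is left out here because it needs an API key and network access.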
The threats and locations are then broadcast to a client that renders them in real time using the Google Maps API.
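As an illustration of what such a broadcast might carry, the sketch below serializes one incident into a JSON message. The function name `build_alert` and every field name are assumptions made for this example, not our actual wire format; the `lat`/`lng` keys are chosen to resemble the coordinate literals a map client typically expects.

```python
import json


def build_alert(threats, lat, lng, reported_at):
    """Serialize one geolocated incident as a JSON string (hypothetical schema)."""
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError("coordinates out of range")
    return json.dumps({
        "threats": sorted(set(threats)),        # de-duplicate repeated mentions
        "position": {"lat": lat, "lng": lng},   # where the client places the marker
        "reported_at": reported_at,             # ISO 8601 timestamp supplied by the caller
    })


# Example values are made up for illustration.
print(build_alert(["robbery", "knife", "knife"], 41.9038, -87.6313, "2019-09-14T22:15:00Z"))
```

Validating coordinates before sending keeps a bad geocoding result from silently placing a marker at a nonsensical position.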
At HackMIT, we processed two-thirds of a day's worth of audio for the Chicago metropolitan area, which is covered by 11 separate police radio feeds. From this, we estimate that the daily cost of running our platform per app would be less than $100, the bulk of which is the cost of speech-to-text processing.
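For readers who want to reason about this kind of estimate, the sketch below shows the shape of the arithmetic: billable minutes scale with the number of feeds and with the fraction of airtime that actually contains speech. Only the feed count comes from our run; `active_fraction` and `price_per_minute` are placeholder inputs, not measured figures or real pricing.

```python
def daily_transcription_cost(feeds, active_fraction, price_per_minute):
    """Estimate one day's speech-to-text cost across several radio feeds."""
    minutes_per_day = 24 * 60
    # Only the portion of airtime that contains speech needs to be transcribed.
    billable_minutes = feeds * minutes_per_day * active_fraction
    return billable_minutes * price_per_minute


# Placeholder inputs for illustration: 11 feeds, 10% of airtime active,
# and a hypothetical price of $0.05 per transcribed minute.
estimate = daily_transcription_cost(feeds=11, active_fraction=0.1, price_per_minute=0.05)
print(f"${estimate:.2f} per day")
```

The takeaway is that the estimate is most sensitive to how much silence can be skipped before audio reaches the transcription service.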
Challenges, Lessons, and the Future
We realized early on that the audio quality of radio transmissions is poor, so we had to do a lot of research on various audio processing techniques. We ended up building a custom pipeline that removes much of the background noise in the audio, thereby priming it for processing by the Rev.ai API.
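The general idea behind spectral gating can be sketched in a few lines of NumPy. This is a simplified illustration and not our actual pipeline: it assumes a separate noise-only clip is available, and the function name `spectral_gate` and its parameters (`frame`, `threshold_scale`) are hypothetical.

```python
import numpy as np


def spectral_gate(signal, noise_clip, frame=256, threshold_scale=1.5):
    """Zero out STFT bins whose magnitude falls below a per-frequency noise threshold."""
    hop = frame // 2
    # Periodic Hann window: copies overlapped by 50% sum to one.
    window = np.hanning(frame + 1)[:-1]

    def stft(x):
        n_frames = 1 + (len(x) - frame) // hop
        frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n_frames)])
        return np.fft.rfft(frames, axis=1)

    # Per-frequency threshold: mean noise magnitude times a safety factor.
    threshold = np.abs(stft(noise_clip)).mean(axis=0) * threshold_scale

    spec = stft(signal)
    gated = np.where(np.abs(spec) > threshold, spec, 0)

    # Overlap-add the gated frames back into a waveform.
    out = np.zeros(len(signal))
    for i, frame_td in enumerate(np.fft.irfft(gated, n=frame, axis=1)):
        out[i * hop:i * hop + frame] += frame_td
    return out


# Synthetic check: a pure tone buried in white noise.
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 32 * np.arange(8192) / 256)
noisy = tone + 0.05 * rng.standard_normal(8192)
cleaned = spectral_gate(noisy, 0.05 * rng.standard_normal(2048))
```

In this sketch the first and last half-frame are attenuated by the window, and any trailing samples that do not fill a whole frame are left at zero. A production pipeline would also smooth the mask over time and frequency to avoid audible artifacts.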
The other major challenge was making sure everything worked well together. This was partly mitigated by having a good division of labor, but we still lost quite a bit of sleep.
In the future, we'd like to see how well our product scales to more and more cities. Additionally, Broadcastify has a large archive of past transmissions, which we believe would be the basis of an incredibly interesting dataset for researching urban crime when processed with our app.