'There is nothing more precious to a parent than a child, and nothing more important to our future than the safety of all our children.' This quote by William J. Clinton aptly captures the dire need for us to be concerned with the safety of our children and to let them live in a 'Safe Space'. It is with much sorrow that we note the numerous recent school shootings in the United States, in addition to the alarming amount of bullying that takes place both in and out of school; child safety has never been a bigger concern. It is too difficult for adults to be aware of everything happening in a child's life while respecting the child's privacy, so we developed a solution that uses AI, natural language processing, and speech-to-text to address these vital concerns and allow children to live in a 'Safe Space'.
What it does
Our project 'Safe Space' transcribes conversations between children and uses natural language processing to capture their semantic meaning and represent the nature of the verbal exchange. It does so by classifying each phrase in a conversation on a linear scale that quantifies the 'danger' present in the conversation, i.e., how much negative talk is present. We define dangerous speech as any malicious speech that could adversely affect a child and result in physical or mental harm. Examples include words that carry strong negative connotations, such as 'hate', 'kill', and 'ugly', which strongly indicate a conversation that could harm a child mentally and possibly lead to physical harm as well.
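To make the idea of a linear danger scale concrete, here is a minimal sketch. It is not our fine-tuned model: the keyword table is a hypothetical stand-in for the classifier, and the 0 to 4 encoding of the scale, the function names, and the summary fields are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, Iterable

# Hypothetical keyword weights standing in for the fine-tuned classifier.
# Levels run from 0 (harmless) to 4 (most dangerous); this encoding is assumed.
KEYWORD_LEVELS: Dict[str, int] = {"ugly": 2, "hate": 3, "kill": 4}


def phrase_level(phrase: str) -> int:
    """Return the highest danger level of any known keyword in the phrase."""
    words = [w.strip(".,!?'\"") for w in phrase.lower().split()]
    return max((KEYWORD_LEVELS.get(w, 0) for w in words), default=0)


def conversation_summary(phrases: Iterable[str]) -> Dict[str, float]:
    """Summarise a conversation by its peak and average danger level."""
    levels = [phrase_level(p) for p in phrases]
    if not levels:
        return {"peak": 0, "average": 0.0}
    return {"peak": max(levels), "average": mean(levels)}


if __name__ == "__main__":
    print(conversation_summary(["Nice to see you", "I hate you!", "You are ugly."]))
```

A real classifier replaces `phrase_level`; the summary step stays the same because it only ever sees numeric levels.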
Our model, fine-tuned from a state-of-the-art classifier, parses the transcribed speech, extracts the semantic meaning, and notifies responsible adults so they can direct their attention appropriately when very harsh phrases are used. This approach conceals the actual contents of the conversation, respecting the privacy of all individuals while allowing capable adults to offer help and guidance and to prevent further altercations. Hence, this solution provides a 'Safe Space' for children: the contents of their conversations are not monitored by adults, but the semantic meanings are represented so that corrective action can be taken if there is any indication of current or potential verbal abuse.
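The privacy property described above can be sketched as a notification step that never forwards the transcript. This is an illustrative outline only: the `ClassifiedPhrase` and `Alert` types, the `build_alert` function, the `device_id` field, and the threshold value of 3 are hypothetical choices, not a description of the deployed prototype.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ALERT_THRESHOLD = 3  # assumed cut-off on an assumed 0-4 danger scale


@dataclass(frozen=True)
class ClassifiedPhrase:
    text: str   # transcribed words; must never leave the recording node
    level: int  # danger level assigned by the classifier


@dataclass(frozen=True)
class Alert:
    device_id: str
    level: int
    created_at: datetime


def build_alert(device_id: str, phrase: ClassifiedPhrase,
                now: Optional[datetime] = None) -> Optional[Alert]:
    """Return an alert for harsh phrases, or None when no action is needed."""
    if phrase.level < ALERT_THRESHOLD:
        return None
    # Only the level is copied; phrase.text is deliberately left behind.
    return Alert(device_id=device_id, level=phrase.level,
                 created_at=now or datetime.now(timezone.utc))


if __name__ == "__main__":
    print(build_alert("room-1", ClassifiedPhrase("some harsh words", 4)))
```

Because `Alert` has no text field, the adult-facing side cannot display what was said even by accident.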
How we built it
Our project is built from multiple modular components. These include, but are not limited to, AI for speech recognition and natural language processing, relational databases for data management, and graphical user interfaces with both web and local front ends and back ends. We connect these modules into an end-to-end solution and demo a working early-stage prototype that communicates the promise of this approach and the positive impact it could have.
We use AssemblyAI's Speech Recognition API to transcribe conversations from audio signals (speech) to text strings, rendering them in a computer-interpretable medium. To extract semantic meaning, we use one of Cohere's natural language processing models for text classification. We carefully collected and curated a custom dataset of over 200 phrases representing 5 levels of 'danger' in speech, which allows us to fine-tune one of Cohere's large NLP text classification models for our problem space. The text produced by AssemblyAI's speech-to-text API is preprocessed, cleaned, and filtered before it enters the NLP model, allowing us to combine two state-of-the-art AI solutions effectively in a single task. As both providers offer Python support, we chose Python as the best medium for connecting them.
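The flow from audio to danger levels can be outlined as follows. This is a sketch under stated assumptions: `transcribe` and `classify` are placeholder callables standing in for the AssemblyAI and Cohere calls (no real API signatures are shown), and the cleaning rules in `clean_transcript` are one plausible choice rather than the exact preprocessing we used.

```python
import re
from typing import Callable, List


def clean_transcript(text: str) -> List[str]:
    """Split a transcript into phrases, normalise case and spacing, drop empties."""
    phrases = re.split(r"[.!?]+", text)
    cleaned = [re.sub(r"\s+", " ", p).strip().lower() for p in phrases]
    return [p for p in cleaned if p]


def run_pipeline(audio: bytes,
                 transcribe: Callable[[bytes], str],
                 classify: Callable[[str], int]) -> List[int]:
    """Transcribe audio, clean the text, and return one danger level per phrase."""
    text = transcribe(audio)
    return [classify(phrase) for phrase in clean_transcript(text)]


if __name__ == "__main__":
    # Fake services so the sketch runs without network access or API keys.
    def fake_transcribe(_audio: bytes) -> str:
        return "Hello there.  I HATE   you! "

    def fake_classify(phrase: str) -> int:
        return 3 if "hate" in phrase else 0

    print(run_pipeline(b"", fake_transcribe, fake_classify))
```

Passing the two services in as callables keeps the pipeline testable offline and lets either provider be swapped without touching the cleaning step.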
Data Management and GUI
As important as the technological backbone is, it is just as important to have an effective way of managing data across the different parts of this project. To notify an individual, data has to travel from the device that records the signal to the place where it is presented. We handle all such operations using CockroachDB's database services, employing PostgreSQL in our project. To build the audio-collecting node, we use PySimpleGUI and PyAudio to stream and splice speech that is fed into AssemblyAI's speech-to-text API. After being processed by Cohere's NLP models, the data is uploaded to the CockroachDB/PostgreSQL databases and displayed on a web front end built using Django, HTML, and CSS. This is representative of the display a responsible adult would be presented with. There is also a more detailed view to confirm the effectiveness of our machine learning methods and to allow for debugging and comparison with human performance.
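The storage step between the recording node and the web front end might look like the sketch below. To keep it runnable anywhere, it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for CockroachDB/PostgreSQL; the `readings` table, its columns, and the function names are hypothetical, and the 0 to 4 level range is an assumed encoding of the five danger levels.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: only the danger level is stored, never the transcript.
SCHEMA = """
CREATE TABLE IF NOT EXISTS readings (
    id INTEGER PRIMARY KEY,
    device_id TEXT NOT NULL,
    level INTEGER NOT NULL CHECK (level BETWEEN 0 AND 4),
    recorded_at TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the stand-in database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_reading(conn: sqlite3.Connection, device_id: str,
                 level: int, recorded_at: str) -> None:
    """Insert one reading; the context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO readings (device_id, level, recorded_at) VALUES (?, ?, ?)",
            (device_id, level, recorded_at),
        )


def latest_readings(conn: sqlite3.Connection,
                    limit: int = 10) -> List[Tuple[str, int, str]]:
    """Return the most recent readings, newest first, for the adult-facing view."""
    cur = conn.execute(
        "SELECT device_id, level, recorded_at FROM readings ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    store = open_store()
    save_reading(store, "room-1", 1, "2022-01-01T10:00:00Z")
    save_reading(store, "room-1", 4, "2022-01-01T10:00:05Z")
    print(latest_readings(store, limit=1))
```

The front end would read from a query like `latest_readings`, so the only thing it can ever render is a device, a level, and a time.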
Challenges we ran into
The biggest challenge we ran into was connecting all the moving parts within the given timeframe. While we strongly believe our implementation demonstrates the potential good this solution could do, we found that building the front end and GUI was rather time-consuming. We approached this challenge by implementing only the components necessary for a proof-of-concept demonstration that shows our solution's ability to take advantage of these state-of-the-art AI technologies and clarifies how this platform could function.
Accomplishments that we're proud of
We are extremely pleased with our ability to integrate multiple APIs into an end-to-end solution that takes raw audio, streamed either live or asynchronously, processes it in real time using NLP, and provides semantic readings of the text that are accurate enough to let adults responsible for children monitor not the contents but the negativity of their speech. We were able to use APIs and technologies provided by more than 3 of the hackathon's sponsors, such as AssemblyAI, CockroachDB and Cohere, to create an end-to-end working solution to a very relevant real-world problem. This problem is currently a subject of great discussion, even more so after the unfortunate recent events that have prompted individuals all over North America to worry about the safety of their children and loved ones.
What we learned
This hackathon was a great learning experience for our team. We learned a great deal about communication, not just between teammates but also between the different modules of our project. At its most fundamental level, our project allows children to better communicate their problems to adults: it gives them the chance to send a cry for help without demanding the courage to speak out. We believe we made great use of our resources by compartmentalizing and collaborating with our teammates on multiple problems to deliver an effective solution. In terms of technical skills, it was our first time using these APIs, and more than 70% of the toolkits used in this project were completely new to the members of this team. Thus, this experience enhanced our collaborative teamwork skills in addition to our technical agility.
What's next for Safe Space
The dream of Project Safe Space is to revolutionize child care and child-adult communication. To do so, here is a short, non-exhaustive list of what we believe would take our project even further:
- Support for multiple peripheral audio devices
- Smartphone support for the adult's monitor/web-front-end
- Larger and more diverse datasets to increase robustness of model
- More visually appealing and salient GUI
- A self-learning mechanism where data is continually fed back to optimize performance