紫紫帮Violet Gang

Inspiration

As online chat becomes an integral part of daily life in an era shaped by Covid-19, a more comprehensive mechanism is needed to filter fraudulent or malicious content that could harm minors. Chat rooms can be a great place for kids, and especially teens, to hang out with others who share similar interests, thoughts, or experiences. But the same space can also be a dark alley infested with online predators. Because online chat content is largely unsupervised, there is an urgent need for a mechanism that regulates this gray area and protects the mental and physical health and safety of teenagers.

What it does

The platform detects language related to sexual harassment or attempted fraud, warns the users responsible, and censors the inappropriate messages. When one participant is suspected to be a minor, the detection mechanism is turned on and the other user is told to mind their behavior. When a user abuses language multiple times, the platform warns them with pop-up windows and then bans them.
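The warn-then-ban escalation described above could be sketched as follows. This is a minimal illustration, not the project's actual code: the `Moderator` class, the `record_violation` method, and the threshold of three strikes are all hypothetical, since the write-up only says "multiple times".

```python
from collections import defaultdict

# Hypothetical threshold: the write-up does not state how many
# violations lead to a ban.
WARN_LIMIT = 3


class Moderator:
    """Counts violations per user and escalates from warning to ban."""

    def __init__(self, warn_limit=WARN_LIMIT):
        self.warn_limit = warn_limit
        self.strikes = defaultdict(int)  # user id -> number of violations
        self.banned = set()

    def record_violation(self, user_id):
        """Register one flagged message and return 'warn' or 'ban'."""
        if user_id in self.banned:
            return "ban"
        self.strikes[user_id] += 1
        if self.strikes[user_id] >= self.warn_limit:
            self.banned.add(user_id)
            return "ban"
        return "warn"


moderator = Moderator()
actions = [moderator.record_violation("user-1") for _ in range(3)]
print(actions)
```

In a real chat room, the "warn" action would trigger the pop-up window and the "ban" action would block the account.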

How we built it

For the chat room platform, we developed a web application that runs on both mobile devices and computers. It is adapted from an existing chat room platform: we designed our own UI/UX and added the supervision mechanism.

For the filter, we build a hashmap that stores banned words and apply a DFA (deterministic finite automaton) algorithm to censor and replace them. The key to retrieving sensitive words lies in how they are stored and how fast they can be searched. For storage, we use HashMap and LinkedList data structures, which is much faster than a plain traversal of the word list. The sensitive-word database is built with keywords as nodes, and the further words that complete a banned phrase are attached as their subnodes, so shared prefixes are checked only once, which saves time.

For the search, the program iterates over the input and looks up the value that each key corresponds to in the hashmap. Sensitive words found in the input are marked, including sensitive words that are a single character long. A character that merely leads toward a sensitive word, without itself being marked as the end of one, does not trigger a false alarm.
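The storage and search steps above can be illustrated with a short sketch. It is written in Python, with nested dicts standing in for the HashMap, and it is only an approximation under stated assumptions: the function names `build_dfa` and `censor`, the `END` marker key, the `*` mask, and the sample word list are all hypothetical and not taken from the project.

```python
# Hypothetical marker key: its presence means a banned word ends at this node.
END = "__end__"


def build_dfa(banned_words):
    """Build a nested-dict state machine with one dict level per character."""
    root = {}
    for word in banned_words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # shared prefixes reuse one node
        node[END] = True
    return root


def censor(text, dfa, mask="*"):
    """Replace every banned word in `text` with the single-character `mask`."""
    chars = list(text)
    i = 0
    while i < len(chars):
        node = dfa
        match_end = -1
        j = i
        # Follow transitions for as long as the input keeps matching.
        while j < len(chars) and chars[j] in node:
            node = node[chars[j]]
            j += 1
            if END in node:
                match_end = j  # remember the longest complete word so far
        if match_end > i:
            chars[i:match_end] = mask * (match_end - i)
            i = match_end
        else:
            i += 1  # a prefix without an end marker is not a match
    return "".join(chars)


dfa = build_dfa(["scam", "scammer", "bad"])  # illustrative word list
print(censor("a scammer said bad things", dfa))
print(censor("sca is fine", dfa))
```

The second call leaves its input unchanged: the characters `s`, `c`, `a` lead toward a banned word, but none of those nodes carries the end marker, so nothing is flagged.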

Challenges we ran into

We are new to JavaScript and not familiar with calling certain complex functions, while a chat room requires a complex front-end design. As four CS and data science majors, none of us specializes in UI/UX design, so we learned it from scratch. We also ran into problems translating our UI/UX design into CSS.

Accomplishments that we are proud of

We implemented a DFA so that keyword detection is not defeated by other characters inserted in the middle of a word. The censor algorithm can process an inappropriate message in as little as 148 milliseconds, which makes real-time censoring possible. The protection of minors is a serious problem and a gray area that lacks regulation; building such a mechanism is meaningful because it helps fill that gap with more efficient, automatic regulation.
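One way keyword detection can survive characters inserted in the middle of a word is to stay in the same DFA state whenever a noise character appears. The sketch below assumes this approach; the project's exact handling is not described. The `IGNORED` character set, the function name `find_banned`, and the sample input are hypothetical.

```python
# Assumed noise characters; the original list is not given in the write-up.
IGNORED = set(" *-_.~")
END = "__end__"  # hypothetical end-of-word marker


def find_banned(text, banned_words):
    """Return (start, end) spans of banned words, skipping noise characters."""
    # Build the nested-dict state machine, one level per character.
    root = {}
    for word in banned_words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True

    spans = []
    i = 0
    while i < len(text):
        node, j, match_end = root, i, -1
        while j < len(text):
            ch = text[j]
            if ch in IGNORED and j > i:
                j += 1  # noise inside a candidate match: keep the same state
                continue
            if ch not in node:
                break
            node = node[ch]
            j += 1
            if END in node:
                match_end = j
        if match_end > i:
            spans.append((i, match_end))
            i = match_end
        else:
            i += 1
    return spans


message = "send s-c*a m now"
for start, end in find_banned(message, ["scam"]):
    print(message[start:end])
```

Here the disguised word `s-c*a m` is still reported as one span, while an unrelated word such as `scan` would not be, because its last character has no transition in the state machine.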

What we learned

We had no experience building the front end of a web application or designing UI/UX, but we tried both in this contest. We gained a deeper understanding of the process of building web applications and realized we have much to learn and explore. We also learned some algorithms for processing text with high efficiency and accuracy.

What’s next

Censoring inappropriate images with computer vision and deep learning techniques. Conducting real-time NLP on messages is difficult, but when suspicious messages appear, they can first be sent to human reviewers who check whether the context is appropriate for minors.