- The challenges facing underrepresented communities such as refugees, immigrants, and people of color (POC) fall into three categories: wrong information, no communication, and no trust.
- Underrepresented communities are not receiving accurate communication about the Census, their communities, COVID-19, and many other topics.
- When they do receive information, it is often late, mistranslated, or never translated and made accessible at all.
- Based on previous research from MIRA, many major immigrant communities are usually found on Facebook Messenger and WhatsApp. These include Vietnamese, Arabic, Haitian Creole, Khmer, Lao, Spanish-speaking, and Brazilian communities.
- Based on our earlier assumptions, these communities need to be reached in a more trustworthy way.
## What it does
- An AI-based chatbot for social media chat apps such as Facebook Messenger and WhatsApp. It is intended to spread information about the 2020 US Census and COVID-19 to everyone in need, especially non-English-speaking immigrants.
- This project aims to ensure that immigrants and refugees, especially undocumented people, are included in the Census. We reach users where they already are, on Facebook and WhatsApp, instead of putting them off with a suspicious text message.

## How we built it
- We implemented the bot with Python, Twilio, and Flask, using a TF-IDF retrieval approach.
- The project is deployed on Heroku so that it can serve users 24/7!
- We used TfidfTransformer from scikit-learn. TF-IDF (term frequency-inverse document frequency) weights each word by how often it appears in a given question. It then down-weights words that appear across many questions. As a result, common words count as less important and distinctive words count as more important.
- We created a list of questions and answers in three different languages and then ran TfidfTransformer on this list.
- When a user sends a message, it is matched to the most similar question, and the corresponding answer is sent back.
- If no question has a high enough similarity score, the user is forwarded to a human representative.
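
The retrieval step described above can be sketched in plain Python. This is a minimal, dependency-free sketch rather than the project's code. It reimplements smoothed TF-IDF weighting and cosine similarity by hand instead of calling scikit-learn's TfidfTransformer. The FAQ entries, the `FaqBot` class, the `HANDOFF` message, and the 0.3 similarity threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ entries for illustration only; the real bot's
# question/answer pool (in three languages) is not reproduced here.
FAQ = [
    ("When is the census deadline?", "Placeholder answer about the deadline."),
    ("Is my census information kept private?", "Placeholder answer about privacy."),
    ("¿Cómo respondo al censo?", "Respuesta de ejemplo sobre cómo responder."),
]

# Hypothetical hand-off message used when no question is similar enough.
HANDOFF = "Forwarding you to a human representative."


def tokenize(text):
    # Lowercase and keep runs of word characters; \w is Unicode-aware in Python 3.
    return re.findall(r"\w+", text.lower())


def build_idf(docs):
    # Count how many questions contain each term (document frequency).
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Smoothed IDF, the same formula scikit-learn uses with smooth_idf=True.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    tf = Counter(tokens)
    # Terms never seen in the FAQ pool carry no weight.
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    # L2-normalise so that a dot product equals cosine similarity.
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are already L2-normalised, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class FaqBot:
    def __init__(self, faq, threshold=0.3):
        self.answers = [answer for _, answer in faq]
        tokenized = [tokenize(question) for question, _ in faq]
        self.idf = build_idf(tokenized)
        self.vectors = [tfidf_vector(tokens, self.idf) for tokens in tokenized]
        self.threshold = threshold  # hypothetical cut-off; tune on real data

    def reply(self, message):
        query = tfidf_vector(tokenize(message), self.idf)
        scores = [cosine(query, vec) for vec in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Retrieval only: below the threshold, hand off to a person.
        if scores[best] < self.threshold:
            return HANDOFF
        return self.answers[best]
```

For example, `FaqBot(FAQ).reply("When is the census deadline?")` returns the deadline answer, while a message that shares no words with any stored question falls through to `HANDOFF`. With real data, the threshold would need tuning so that paraphrased questions still match and unrelated messages are passed to a person.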
## Challenges we ran into
There were two main obstacles. The first was a lack of training data. In machine learning, more adequate data usually produces better output. However, even after we tried to augment the data, it could not guarantee satisfying results. This led us to switch approaches: instead of a fully generative bot, we built a retrieval bot based on TF-IDF. This kind of bot selects the most likely answer from a fixed pool of answers. It works well when asked questions it knows; otherwise, human assistance is needed.

The second difficulty was selecting a suitable platform to deploy the project. We tried several services, including AWS, Google Cloud, and Heroku. None of them worked until we downsized the project enough to run on Heroku.

Defining the scope and goal of the project within one sprint was also hard. It was easy to dream big, but we decided to focus on meeting the most immediate needs of our partner, MIRA. Prioritization was a challenge, but it was also key to our deliverables for this project.
## Accomplishments that we're proud of
The extensive testing and outreach to immigrant communities and users. Their feedback on conversation language, bot personality, and attitudes toward chatbots in general shaped our design and development.
How easy and intuitive it is to install!
## What we learned
How to work directly with a partner like MIRA, and how to keep making progress while waiting for materials and communication so that we did not fall behind on deliverables.
How to collaborate to bring the project together in a diverse team with many different backgrounds, ethnicities, and languages.
The basics of cloud deployment through Heroku
In short, every day we spent on this project was a learning day.
## What's next for Social Media Chat Bot for 2020 Census Propagation
- We plan to improve our cloud deployment. As we collect more FAQs, our slug size is likely to outgrow Heroku's limit. In that case, we will move to a more cost-effective hosting platform.
- According to our contact at MIRA, her department hopes to add a chatbot to its website. We plan to bring what we have built to that website and to other social media apps such as Snapchat and WeChat. This would maximize our impact by reaching deeper and farther. It would also serve as a promotional campaign to increase knowledge among both users and organizations. We also plan to add a community announcement section.