Long short-term memory (LSTM) is an artificial recurrent neural network architecture used in deep learning. It is commonly used for time series prediction, speech recognition, music/rhythm learning, handwriting recognition, and sign language translation. It is also very difficult for the average software developer to visualize. Our team was inspired to tackle this problem by making this machine learning model as accessible as possible, so we built a natural language processing project with a data visualization.
What it does
When the user enters a review, the model analyzes whether each word carries positive or negative sentiment. It then outputs the review with each word highlighted green (positive) or red (negative).
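As a rough illustration of the highlighting step, here is a minimal sketch, not our actual front-end code. It assumes the model has already produced one score per word between 0 and 1; the function name `highlight_review`, the 0.5 threshold, and the inline-style markup are all hypothetical choices made for this example.

```python
from html import escape


def highlight_review(words, scores, threshold=0.5):
    """Wrap each word in a span colored green (positive) or red (negative).

    `scores` is assumed to hold one value in [0, 1] per word; the 0.5
    threshold is an illustrative assumption, not a tuned value.
    """
    if len(words) != len(scores):
        raise ValueError("words and scores must have the same length")
    spans = []
    for word, score in zip(words, scores):
        color = "green" if score >= threshold else "red"
        # Escape the word so user input cannot inject markup.
        spans.append(
            f'<span style="background-color: {color}">{escape(word)}</span>'
        )
    return " ".join(spans)


# Hypothetical scores, written by hand for the example.
html = highlight_review(
    ["great", "movie", "terrible", "ending"],
    [0.9, 0.6, 0.1, 0.3],
)
```

Escaping each word matters here because the review text comes straight from the user.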
How we built it
The model was built using Keras based on an IMDB movie review sentiment classification dataset. We created and packaged a BentoML service and were able to get it Dockerized.
Open Source Technologies
We defined and trained the model with Keras, served and containerized it with BentoML, and did the data pre-processing with scikit-learn.
Our team of three people in different time zones applied open source best practices, using GitHub for easier collaboration. We used GitHub Projects to plan our project at a high level, along with branches, pull requests, reviews of each other's code, a comprehensive README, and issues to track tasks.
Challenges we ran into
- Coordinating conflicting schedules and time zones. We missed being able to pair program.
- Deciding on a simple neural network architecture to implement and train
- Recovering the character-by-character breakdown was kinda hacky
- Finding a way to deploy the application. We tried PythonAnywhere, Heroku, and Firebase, and ultimately decided to use GitHub Pages
- GitHub merge conflicts
- Figuring out how to serve the model using BentoML, plus struggles with Docker
Accomplishments that we're proud of
It works! We are really proud that we managed to go from zero BentoML knowledge to a working BentoML deployment in the span of a hackathon. The hack is complete and the team achieved everything we wanted. For a live demonstration of our complete hack, see https://github.com/MLH-Fellowship/0.1.2-sentiment-analysis-visualization.
What we learned
- Building a data modeling pipeline end-to-end
- How to package and serve a BentoML model
- Implementing the Keras tokenizer outside of Keras
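To give a feel for the tokenizer point, here is a minimal pure-Python sketch of encoding a review the way the Keras IMDB dataset does, without importing Keras. It is not the code from our repository. It assumes the default conventions of `keras.datasets.imdb.load_data` (start marker 1, out-of-vocabulary marker 2, word ranks shifted by 3) and the default punctuation filter of the Keras `Tokenizer`; the function name `encode_review` and the toy `word_index` are hypothetical.

```python
# Characters the Keras Tokenizer strips by default (its `filters` argument).
KERAS_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

# Defaults of keras.datasets.imdb.load_data, assumed here.
START_CHAR = 1   # marks the beginning of every review
OOV_CHAR = 2     # stands in for out-of-vocabulary words
INDEX_FROM = 3   # offset added to every word rank


def encode_review(text, word_index, num_words=None):
    """Turn raw text into a list of IMDB-style integer indices."""
    # Lowercase, replace filtered characters with spaces, split on whitespace.
    table = str.maketrans({ch: " " for ch in KERAS_FILTERS})
    tokens = text.lower().translate(table).split()

    encoded = [START_CHAR]
    for token in tokens:
        rank = word_index.get(token)
        if rank is None:
            encoded.append(OOV_CHAR)
            continue
        index = rank + INDEX_FROM
        # Words beyond the vocabulary cap are treated as out-of-vocabulary.
        if num_words is not None and index >= num_words:
            index = OOV_CHAR
        encoded.append(index)
    return encoded


# Toy vocabulary for illustration only; the real one would come from
# the dataset's word index.
toy_word_index = {"the": 1, "movie": 17, "great": 84}
encoded = encode_review("The movie was GREAT!", toy_word_index)
# encoded == [1, 4, 20, 2, 87]
```

The resulting list would still need to be padded or truncated to the input length the model was trained with before prediction.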
What's next for Movie Review Sentiment Analysis
- Train on other datasets to see how sentiment on other topics is shown, since only the IMDB dataset has been used so far
- Make the web UI cleaner