"Tell me a story" -- these are words we hear from children, friends, and even Tinder dates. With a finite number of stories in the world, it is valuable to find ways to construct new stories to tell those around us. What better way to create a totally random story than to mix a bunch of unrelated topics together? Even better, what if a machine did it?
This is an amazing area to explore for children. Their imagination has no limit, and the more unusual the story, the better. Telling strange new stories would reinforce a creative mindset from a young age, and creativity drives the greatest innovations we know today. By giving children unusual stories to spark their imagination, we help them grow sharper and build a more memorable childhood.
What it does
StoryGen does exactly that: it generates a variety of unique stories from one or more images. It analyzes the image(s) the user uploads, determines the best corresponding caption(s), and uses those captions as a prompt to generate multiple creative stories, which are displayed in a simple, appealing interface. If a user uploads multiple images at once, the model generates stories that relate to all of them together!
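The flow above can be sketched in a few lines. This is a minimal, hypothetical outline of the pipeline only: `caption_image` and `generate_story` are placeholder stand-ins for the captioning and GPT-2 components described under "How we built it", not the real implementations.

```python
def caption_image(image_path: str) -> str:
    # Hypothetical stand-in for the CNN + LSTM captioning model.
    return f"a scene from {image_path}"

def generate_story(prompt: str) -> str:
    # Hypothetical stand-in for GPT-2 story generation.
    return f"Once upon a time, {prompt} ..."

def storygen(image_paths, n_stories=3):
    # A single prompt is built from ALL uploaded images, so a multi-image
    # upload yields stories that weave the images together.
    prompt = " and ".join(caption_image(p) for p in image_paths)
    return [generate_story(prompt) for _ in range(n_stories)]

stories = storygen(["cat.jpg", "moon.jpg"], n_stories=3)
print(len(stories))  # 3
```

The key design point is that captions from every image are merged into one prompt before generation, which is what makes the multi-image stories coherent.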
How we built it
The backend was built using state-of-the-art machine learning and NLP models and architectures. Specifically, the image captioning model uses a CNN encoder and an LSTM decoder; the encoder is ResNet-152, pretrained on the ILSVRC-2012-CLS image classification dataset. Story text generation uses OpenAI's GPT-2, a large transformer-based language model trained on an enormous amount of text from across the Internet. The front-end is built mainly with Firebase and Google Cloud.
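For the curious, here is a rough sketch of that encoder-decoder captioning architecture in PyTorch. Everything here is illustrative: a tiny conv stack stands in for the pretrained ResNet-152 backbone, and the vocabulary size and dimensions are made up, not the project's actual values.

```python
# Sketch of a CNN-encoder / LSTM-decoder captioning model.
import torch
import torch.nn as nn

class EncoderCNN(nn.Module):
    def __init__(self, embed_size):
        super().__init__()
        # Placeholder backbone; the real model used ResNet-152 pretrained
        # on ILSVRC-2012-CLS with its classification head replaced.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(16, embed_size)

    def forward(self, images):
        return self.fc(self.backbone(images))  # (batch, embed_size)

class DecoderRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # The image feature vector acts as the first "token" of the sequence.
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)  # (batch, seq_len + 1, vocab_size)

encoder = EncoderCNN(embed_size=64)
decoder = DecoderRNN(embed_size=64, hidden_size=128, vocab_size=1000)
images = torch.randn(2, 3, 224, 224)        # dummy batch of 2 RGB images
captions = torch.randint(0, 1000, (2, 10))  # dummy caption token ids
logits = decoder(encoder(images), captions)
print(logits.shape)  # torch.Size([2, 11, 1000])
```

At inference time, decoding would run step by step, feeding each predicted word back into the LSTM until an end token is produced; the resulting caption then becomes the GPT-2 prompt.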
Challenges we ran into
Finding efficient and effective image captioning and text generation models was difficult. In particular, most text generation models were unsuited to generating long passages: their output was often disfluent, ungrammatical, or nonsensical. Combining the image captioning and text generation components of the model was also challenging. On the front-end, we ran into difficulties with Firebase and Google Cloud, and it took time to connect the back-end to the front-end and get everything streamlined and working well.
Accomplishments that we're proud of
We managed to build a unique, creative, and decently working story generation bot using state-of-the-art machine learning and NLP technologies, one that can generate multiple different stories from a single image or from several images at once. We effectively combined image captioning with story text generation and merged the back-end and front-end smoothly.
What we learned
Steven: Learned to use OpenAI's state-of-the-art text generation model, GPT-2. Also gained further exposure to areas of NLP and computer vision, including image captioning.
Melody: Finally learned how to use Firebase. Only took two full years.
What's next for StoryGen
The generated story text can still be improved, but this would be a larger-scale research effort, since it means improving upon state-of-the-art text generation models. The image captioning can also be improved by training on larger datasets for longer and by incorporating more advanced machine learning and NLP techniques such as attention. More detailed, specific captions may in turn lead to better story text. Lastly, the front-end can be made more automated and aesthetically polished.