Introduction:
We are implementing something new: a plot synopsis generated from the most important words extracted from a picture. We arrived at this idea because we wanted to combine what we have learned in deep learning with what we learned in computer vision and artificial intelligence. An image is passed in, we identify the most important words based on that image, and we then generate a body of text from those words. We chose to write a synopsis because we have not seen this done before, and we are motivated by existing papers on content generation. This problem spans multiple categories, including classification, structured prediction, and content creation.
Challenges:
The hardest part of the project so far has been formatting inputs and outputs so that everything fits together within our pipeline, since the components were all implemented independently. We had to modify our image-processing component to let users input their own images, rather than only allowing images from the training/testing sets. In addition, each component of our model (the caption generator and the summary decoder) requires its own dataset, so we had to account for this large amount of data and preprocessing when structuring our repo to keep each component modular. At this point, we have completed the image-processing and encoding components and must still build the decoder. This has been the most involved component of the project, since we are constructing its architecture entirely ourselves rather than referring to external sources.
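To illustrate what keeping the components modular can look like, here is a hypothetical sketch of the pipeline's boundaries. All names and the toy logic are illustrative stand-ins, not our actual trained models; the point is that each stage consumes only the previous stage's output, so components can be developed and swapped independently:

```python
STOPWORDS = {"a", "the", "along"}  # toy stopword list for the sketch

def caption_image(image_path):
    # Stand-in for the trained caption generator.
    return "a dog runs along a sunny beach"

def extract_keywords(caption):
    # Keep the most informative words; a real system would rank them.
    return [w for w in caption.split() if w not in STOPWORDS]

def generate_synopsis(keywords):
    # Stand-in for the summary decoder still under construction.
    return "A story about " + ", ".join(keywords) + "."

def run_pipeline(image_path):
    # Each stage only sees the previous stage's output, which keeps
    # the components independently testable.
    return generate_synopsis(extract_keywords(caption_image(image_path)))

print(run_pipeline("my_photo.jpg"))
# → A story about dog, runs, sunny, beach.
```

Because each function boundary matches a dataset boundary (caption data vs. summary data), each component can be trained and evaluated on its own before being wired into the full pipeline.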
Insights:
We have finished the first part of the project: captioning photos. We have accuracy and perplexity scores from training this model. We have also included a feature that lets someone upload their own photo, for which the model returns a caption; manual checks show it produces a cohesive caption that accurately describes the photo. This portion of the project is performing exactly as expected. The second part does not yet have concrete results: we have downloaded and preprocessed the data and built the encoder, which runs as expected, but we are still working on the decoder. Once it is done, we will have perplexity metrics, and it will print a summary that can be manually checked for accuracy.
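For reference, the perplexity we report is the exponential of the average negative log-likelihood per token, so lower is better and a uniform guess over N choices gives perplexity N. A minimal standalone sketch (not tied to our model code):

```python
import math

def perplexity(token_log_probs):
    # Perplexity = exp(mean negative log-likelihood per token).
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# A model assigning probability 0.25 to each of 4 tokens scores
# perplexity 4 -- equivalent to guessing uniformly among 4 words.
print(perplexity([math.log(0.25)] * 4))
```

In practice the log-probabilities come from the decoder's softmax over the vocabulary at each time step, averaged over the evaluation set.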
Plan:
Yes, we believe we are on track for our project. We are halfway done and have a clear understanding of what we need to complete moving forward; currently, we need to dedicate more time to the decoder and the synopsis generator. We believe our original plan remains feasible, so we are not making adaptations. We have ideas for how to expand if we finish earlier than planned, but we think our original plan has enough parts to keep us challenged throughout the project.