Creating a more immersive way to enjoy images

What it does

Takes an image, detects its sentiment and recognizes the objects in it. An RNN then writes a poem about the image. The RNN is trained on famous poets of the past such as Shakespeare, Robert Frost and T. S. Eliot. An RBM generates music for the image based on its sentiment. The RBM is trained on the work of famous musicians and tries to convey feelings of happiness, sadness, fear and suspense.
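The write-up does not say how the RBM represents music or how it is trained. The following is a minimal sketch, assuming each training example is a short binary piano roll (one bit per pitch per time step) and that one RBM is trained per mood. The class, toy data and sizes are hypothetical. It shows the two steps the paragraph implies: learning with one step of contrastive divergence (CD-1), then generating new material by Gibbs sampling from random noise.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (hypothetical sketch).

    Each visible vector is assumed to be a flattened binary piano roll
    of shape (n_steps, n_pitches), where 1 means "note on".
    """

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def sample(self, probs):
        # Draw independent Bernoulli samples with the given probabilities.
        return (self.rng.random(probs.shape) < probs).astype(float)

    def train(self, data, epochs=500, lr=0.1):
        n = data.shape[0]
        for _ in range(epochs):
            # Positive phase: hidden units driven by the training data.
            ph0 = self.hidden_probs(data)
            h0 = self.sample(ph0)
            # Negative phase: one Gibbs step down to the visibles and back up.
            pv1 = self.visible_probs(h0)
            ph1 = self.hidden_probs(pv1)
            # CD-1 updates: data statistics minus reconstruction statistics.
            self.W += lr * (data.T @ ph0 - pv1.T @ ph1) / n
            self.b_v += lr * (data - pv1).mean(axis=0)
            self.b_h += lr * (ph0 - ph1).mean(axis=0)

    def generate(self, n_gibbs=200):
        # Start from random noise and let the Gibbs chain settle.
        v = self.sample(np.full(self.b_v.shape, 0.5))
        for _ in range(n_gibbs):
            h = self.sample(self.hidden_probs(v))
            v = self.sample(self.visible_probs(h))
        return v


N_STEPS, N_PITCHES = 4, 8


def roll(notes):
    """Turn one pitch index per time step into a flat binary piano roll."""
    r = np.zeros((N_STEPS, N_PITCHES))
    for t, p in enumerate(notes):
        r[t, p] = 1.0
    return r.ravel()


# Toy "happy" training set (hypothetical): a few rising patterns.
happy = np.array([roll([0, 2, 4, 7]), roll([0, 4, 7, 7]), roll([2, 4, 7, 7])])
rbm = RBM(n_visible=N_STEPS * N_PITCHES, n_hidden=16)
rbm.train(happy)
melody = rbm.generate().reshape(N_STEPS, N_PITCHES)
```

In a setup like the one described, one such model would be trained per mood, and the mood detected in the image would select which model to sample from. Converting the generated piano roll to MIDI or audio is left out of this sketch.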

How we built it

We created two deep generative models: an RNN and an RBM. We then used the Microsoft Cognitive Services API to extract objects and emotions from an image. This API is called from the iPhone app. As soon as the iPhone receives the tags and emotions, it sends them to our servers for text and music generation. The iPhone and the servers communicate through REST APIs that we built with Flask. On top of that, we create a GIF from 20 frames of the image. From just one picture, the user gets a combined experience of text, music and imagery. What makes it even more interesting is that the text and music are generated dynamically by the computer, and the poems imitate the styles of different poets.
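The write-up describes the phone-to-server link only at a high level. Below is a minimal Flask sketch of such an endpoint, assuming the iPhone POSTs JSON containing the Cognitive Services tags, a detected mood and an optional poet choice. The route `/generate`, the JSON field names and the two `generate_*` stubs are hypothetical placeholders for the trained RNN and RBM, not the project's actual code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Moods the music models were trained on, per the write-up (labels assumed).
MOODS = {"happiness", "sadness", "fear", "suspense"}
# Hypothetical keys for the poet-specific RNNs.
POETS = {"shakespeare", "frost", "eliot"}


def generate_poem(tags, poet):
    # Placeholder: a real server would sample from the poet's RNN, seeded with the tags.
    return f"[{poet}-style poem about {', '.join(tags)}]"


def generate_music(mood):
    # Placeholder: a real server would sample from the mood's RBM and return a file or URL.
    return f"{mood}.mid"


@app.route("/generate", methods=["POST"])
def generate():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True) or {}
    tags = payload.get("tags", [])
    mood = payload.get("emotion")
    poet = payload.get("poet", "frost")
    if not tags or mood not in MOODS or poet not in POETS:
        return jsonify(error="expected non-empty 'tags', a known 'emotion' and 'poet'"), 400
    return jsonify(poem=generate_poem(tags, poet), music=generate_music(mood))
```

For the phone to reach a laptop on the same network, the app could be served with something like `app.run(host="0.0.0.0", port=5000)`. Sending JSON over HTTP lets Flask handle request parsing and message framing, work that a raw-socket approach leaves to the developer.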

Challenges we ran into

  1. Deep neural nets require a lot of computational power, so training takes a long time. Since we were building separate models for different poets and different song emotions, time was a huge constraint.
  2. Establishing communication between our servers (laptops) and the iPhone was very complicated. We initially used sockets, but we could not get them to work.
  3. After working on this problem for a few hours, we came up with the idea of using Flask to build our own REST APIs.
  4. Learning something as unfamiliar to us as REST APIs was a challenge.
  5. After building all the models, getting everything to work together perfectly was difficult, but we managed it.

Accomplishments that we're proud of

  1. Two of our members did not have much front-end development experience, but they learned Flask and REST APIs and were able to integrate them with their existing back-end code.
  2. We worked as a team on different concepts and implementations and ultimately merged everyone's code without much difficulty, which shows how well we communicated.
  3. Initially we did not plan to create GIFs for a given image. The idea came from one of our teammates, and it ended up shaping our entire application.
  4. We created a new take on the GIF, one that comes with generated music and text.
  5. Most importantly, we were able to use modern, state-of-the-art techniques to bring the art of legendary poets and musicians into what many consider the most popular form of media: the image. This is our version of 'Hack the Past'.

What we learned

  1. We learned the importance of working together as a team and communicating at every stage. This keeps every member aware of the progress the others are making.
  2. Setting a deadline for every task boosts the performance of all team members and also leaves enough time to validate and evaluate the project at the end.

What's next for Deep Sense

  1. Our main goal will be to entertain users with a fun, intuitive and interesting UI that not only presents the information in an engaging way but also revives the memory of poets and musicians who are no longer alive, by dynamically generating poems and songs in their style.
  2. Along with the existing RBMs and RNNs, we could introduce a recommendation engine that suggests poets based on the user's location. For example, a user in Cambridge, MA might see suggestions like Robert Frost, while a user in India might see Rabindranath Tagore.
  3. We will improve the app's performance for a better overall user experience.
  4. In terms of research, scientists could use our data to measure correlations between different forms of data (image, text and music) and develop better models of how different forms of media interact.
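The location-based recommendation idea above could start as a simple nearest-neighbour lookup. A minimal sketch follows, assuming each poet is tied to a single hypothetical anchor location (the coordinates below are rough illustrations, not curated data) and that poets are ranked by great-circle (haversine) distance from the user.

```python
import math

# Hypothetical anchor locations (latitude, longitude in degrees, approximate).
POET_LOCATIONS = {
    "Robert Frost": (42.37, -71.11),         # Cambridge, MA (the write-up's example)
    "Rabindranath Tagore": (22.57, 88.36),   # Kolkata, India
    "William Shakespeare": (52.19, -1.71),   # Stratford-upon-Avon, England
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def recommend_poets(user_location, k=2):
    """Return the k poets whose anchor location is closest to the user."""
    ranked = sorted(POET_LOCATIONS, key=lambda p: haversine_km(user_location, POET_LOCATIONS[p]))
    return ranked[:k]


# A user near Boston gets Frost; a user in New Delhi gets Tagore.
print(recommend_poets((42.36, -71.06), k=1))
print(recommend_poets((28.61, 77.21), k=1))
```

A real engine would probably blend distance with the user's history and other signals; pure distance is simply the smallest baseline that reproduces the Cambridge and India examples.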
