Deep Sense


Inspiration

Creating a more exclusive experience of enjoying images

What it does

Deep Sense takes an image, detects its sentiment and recognizes the objects in it. An RNN then writes a poem about the image. The RNN is trained on famous poets of the past such as Shakespeare, Robert Frost and T. S. Eliot. An RBM generates music from the image based on its sentiment. The RBM is trained on music by famous musicians and tries to convey feelings of happiness, sadness, scariness and suspense.
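The write-up does not say how an emotion result is routed to one of the music models. The sketch below assumes the emotion response is a list of detected faces, each with a `scores` dictionary keyed by emotion names such as `happiness`, `sadness`, `fear` and `surprise`, which is the shape the now-retired Microsoft Emotion API returned. It sums the scores across faces, picks the strongest mapped emotion and returns a model name. The mapping table, the function name and the fallback model are all hypothetical.

```python
# Hypothetical mapping from emotion-score keys to the four music models
# named in the write-up (happiness, sadness, scary, suspense).
EMOTION_TO_MUSIC_MODEL = {
    "happiness": "happiness",
    "sadness": "sadness",
    "fear": "scary",
    "surprise": "suspense",
}


def pick_music_model(faces, default="happiness"):
    """Return the music model for the strongest mapped emotion across all faces.

    `faces` is assumed to look like [{"scores": {"happiness": 0.9, ...}}, ...].
    Emotions without a music model (e.g. "neutral") are ignored; if nothing
    maps, the hypothetical `default` model is used.
    """
    totals = {}
    for face in faces:
        for emotion, score in face.get("scores", {}).items():
            if emotion in EMOTION_TO_MUSIC_MODEL:
                totals[emotion] = totals.get(emotion, 0.0) + score
    if not totals:
        return default
    strongest = max(totals, key=totals.get)
    return EMOTION_TO_MUSIC_MODEL[strongest]
```

Summing across faces lets a group photo be judged as a whole rather than by whichever face happened to be detected first.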

How we built it

We created two deep generative models: an RNN and an RBM. We then used the Microsoft Cognitive Services API to extract objects and emotions from an image; the API is called from the iPhone app. As soon as the iPhone receives the tags and emotions, it sends them to the servers for text and music generation. The iPhone and the servers communicate through REST APIs built with Flask. On top of that, we create a GIF from 20 frames of the image. From just a picture, we give the user an experience that combines all the senses in the form of text, music and image. What makes it even more interesting is that the text and music are generated dynamically by the computer, and the poems are written in the styles of different poets.
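The endpoints and field names are not given in the write-up. As a sketch only, with hypothetical field names (`tags`, `emotion`, `poem`, `midi_base64`), these are the kinds of JSON bodies that could travel between the app and a Flask server. JSON cannot carry raw bytes, so the generated MIDI file is base64-encoded in the response.

```python
import base64
import json


def build_request(tags, emotion):
    """App side: JSON body to POST to the server (field names are hypothetical)."""
    return json.dumps({"tags": tags, "emotion": emotion})


def build_response(poem, midi_bytes):
    """Server side: poem text plus the MIDI file as a base64 string."""
    return json.dumps({
        "poem": poem,
        "midi_base64": base64.b64encode(midi_bytes).decode("ascii"),
    })


def parse_response(body):
    """App side: recover the poem and the raw MIDI bytes."""
    data = json.loads(body)
    return data["poem"], base64.b64decode(data["midi_base64"])
```

In a Flask route, the server would read the request body, run the two models and return a body like the one `build_response` produces.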

Challenges we ran into

Deep neural nets require a lot of computational power, so training takes a long time. Since we were creating separate models for different poets and for different song emotions, time was a huge constraint.
Establishing communication between our servers (laptops) and the iPhone was very complicated. We initially used sockets, but they didn't work.
After working on that problem for a few hours, we came up with the idea of using Flask and creating our own REST APIs.
Learning something as unfamiliar as REST APIs was a challenge.
After creating all the models, getting everything to work together perfectly was difficult, but we managed it.

Accomplishments that we're proud of

Two of our members did not have much expertise in front-end development, but they learned Flask and REST APIs and were able to merge them with their existing back-end code.
We were able to work together as a team on different concepts and implementations and ultimately merge everyone's code without much difficulty, which shows how well we communicated as a team.
Initially we didn't think of creating GIFs for a given image. It was one of our teammates' ideas, and it ended up shaping our entire application.
We created a new take on the GIF that includes generated music and text.
Most importantly, we have been able to use modern, state-of-the-art techniques to bring the art of legendary poets and musicians back through what is considered the most popular form of media: the image. This is our version of 'Hack the Past'.

What we learned

We learned the importance of working together as a team and communicating at every point. This gives every member an idea of the progress the others are making.
Setting a deadline for every task boosts the performance of all team members and also leaves enough time to validate and evaluate the project at the end.

What's next for Deep Sense

Our main goal will be to entertain users with a fun, intuitive and interesting UI that not only presents the information in an engaging way but also brings back the memory of poets and musicians who are no longer alive, by dynamically generating poems and songs in their style.
Along with the existing RBMs and RNNs, we could introduce a recommendation engine that suggests poets to users based on their location. For example, if the user is in Cambridge, MA, suggestions like Robert Frost might pop up; in India, suggestions like Rabindranath Tagore might appear.
We will also improve the performance of our app for a better overall user experience.
In terms of research, scientists could use our data to measure correlations between different forms of data (image, text and music) and develop better models of interaction between different forms of media.
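The location-based recommendation engine could start as a simple nearest-neighbour lookup. The sketch below assumes a hand-made table linking places to poets: the Cambridge and India entries follow the examples in the text, while the Stratford-upon-Avon row, the approximate coordinates and the function names are hypothetical. It uses the haversine great-circle distance to find the place closest to the user's latitude and longitude.

```python
import math

# Hypothetical table: (place, latitude, longitude, poet). Coordinates are approximate.
POET_LOCATIONS = [
    ("Cambridge, MA", 42.37, -71.11, "Robert Frost"),
    ("Kolkata, India", 22.57, 88.36, "Rabindranath Tagore"),
    ("Stratford-upon-Avon, UK", 52.19, -1.71, "William Shakespeare"),
]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def suggest_poet(lat, lon):
    """Return the poet linked to the known place nearest to (lat, lon)."""
    _, _, _, poet = min(
        POET_LOCATIONS, key=lambda row: haversine_km(lat, lon, row[1], row[2])
    )
    return poet
```

For example, a user in Boston would be closest to the Cambridge row and get Robert Frost. A real engine would likely rank several poets and mix in the user's listening or reading history.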

Built With

flask
ios
iphone-sdk
keras
microsoft-cognitive-services
python
swift
tensorflow


Created by

I worked on the development of the iOS app using Swift. I mainly worked on the back end of the iOS app, implementing the Microsoft Cognitive Services APIs (Computer Vision and Sentiment Analysis) in Swift. Then I helped implement the RESTful API developed in Python (Flask). Later, I worked on the communication between the server and the iPhone, where I created and played MIDI (audio) files in iOS. I also contributed heavily to the main design of the app.

Challenges:
Implementing Microsoft Computer Vision and Emotion Analysis for iOS was a real challenge. Microsoft doesn't provide any Swift documentation for it. However, a GitHub project was really useful, and the MS evangelists were really helpful as well!
Also, working with MIDI audio (and sending it over the network) was a real challenge, as it was something really new for me.
Lastly, helping to implement the RESTful API was a challenge for the same reason.

Fabian Vergara
Changing the world one line of code at a time
I worked on creating the Restricted Boltzmann Machine to generate music.
(Dan Shiebler's blog posts on RBMs and RNNs really inspired me to explore this technique further.)
The Microsoft Cognitive Services were used to detect the sentiments in the image, and I created a deep RBM, essentially a stacked RBM architecture, for music generation. Using it, I trained 5 models on sentiments such as:
1. Happiness
2. Sadness
3. Scary
4. Suspense
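The deep RBM code is not part of the write-up. As a minimal sketch of its building block only, here is a single Bernoulli RBM in plain Python, trained with one-step contrastive divergence (CD-1) and sampled by Gibbs sampling. Deeper variants are usually built by stacking such layers. Feeding it flattened windows of a binary piano roll, and training one model per sentiment on songs with that mood, is an assumption about how the per-emotion models could work, not a description of the team's implementation. The class and parameter names are hypothetical.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BernoulliRBM:
    """Minimal binary RBM trained with CD-1 (pure Python, for illustration)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights; biases start at zero.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible  # visible biases
        self.b_h = [0.0] * n_hidden   # hidden biases

    def _sample(self, probs):
        """Draw a binary vector with P(unit = 1) given by probs."""
        return [1 if self.rng.random() < p else 0 for p in probs]

    def hidden_probs(self, v):
        return [_sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [_sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step: compare data statistics with one-step reconstruction statistics."""
        ph0 = self.hidden_probs(v0)
        h0 = self._sample(ph0)
        v1 = self._sample(self.visible_probs(h0))
        ph1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.b_h[j] += lr * (ph0[j] - ph1[j])

    def gibbs_sample(self, v, steps=1):
        """Alternate hidden/visible sampling starting from v; returns a binary vector."""
        for _ in range(steps):
            v = self._sample(self.visible_probs(self._sample(self.hidden_probs(v))))
        return v
```

To generate music, one would start from a random or seed piano-roll window, run `gibbs_sample` for a number of steps and read the visible units back as notes. A practical version would use NumPy or TensorFlow and train on mini-batches.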

Later I worked on a REST API in Flask that collects data and sends it to the front end, mainly the MIDI files, which were encoded before being sent.

I guess the biggest challenge here was integrating with the front end to transfer the MIDI files. We tried a lot of techniques.

fenil doshi
I worked on the front end of the app and on connecting the iOS app to the back end. On the front end, I implemented the AVFoundation camera to let the user take 20 frames of an image in 0.5 seconds. I also worked with Fabian to decode the base64 data back into MIDI, which was one of our biggest challenges.
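Base64 decoding often breaks on small transport details: line breaks inserted by an encoder, `=` padding stripped somewhere along the way, or the URL-safe alphabet (`-` and `_` instead of `+` and `/`). The app did this step in Swift. To keep the sketches in one language, here is the same idea in Python, assuming the server sends a Standard MIDI File as a base64 string. It normalizes the string, decodes it and checks the `MThd` header before the bytes go to a player. The function name is hypothetical.

```python
import base64
import struct


def decode_midi_payload(text):
    """Decode a base64 string into MIDI bytes and sanity-check the MThd header.

    Returns (data, format, n_tracks, division); raises ValueError if the
    decoded bytes do not start with a valid Standard MIDI File header.
    """
    cleaned = "".join(text.split())                         # drop spaces and line breaks
    cleaned = cleaned.replace("-", "+").replace("_", "/")   # accept the URL-safe alphabet
    cleaned += "=" * (-len(cleaned) % 4)                    # restore stripped padding
    data = base64.b64decode(cleaned)
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("payload is not a Standard MIDI File")
    length, fmt, n_tracks, division = struct.unpack(">IHHH", data[4:14])
    if length != 6:
        raise ValueError("unexpected MThd chunk length")
    return data, fmt, n_tracks, division
```

Checking the header right after decoding separates "the transfer corrupted the data" from "the player rejected a valid file", which is useful when debugging an integration like this one.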

Ahmed Bekhit
I worked on the poem generation part of the back end. I used Keras with TensorFlow as the backend. The models were trained on .txt files containing poems by famous poets.
Once the poems could be generated, I was responsible for writing a server program that interacts with the iOS device. This was done using Flask and my own REST API.
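The write-up does not say whether the poem models work on characters or words. Assuming a character-level setup, which is a common way to train Keras RNNs on raw .txt files, the corpus is cut into fixed-length windows that each predict the next character, and generation samples from the model's output distribution with a temperature. Both steps are sketched below in plain Python. The RNN itself is left out, and the window length, stride and function names are hypothetical.

```python
import math
import random


def make_training_windows(text, seq_len=40, step=3):
    """Slice a corpus into (input indices, next-character index) pairs."""
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    inputs, targets = [], []
    for start in range(0, len(text) - seq_len, step):
        inputs.append([char_to_idx[c] for c in text[start:start + seq_len]])
        targets.append(char_to_idx[text[start + seq_len]])
    return chars, inputs, targets


def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Pick an index from a probability vector.

    Lower temperature sharpens the distribution (safer, more repetitive text);
    higher temperature flattens it (more surprising text).
    """
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]
```

At generation time, the model's predicted distribution for the current window would be passed as `probs`, the sampled character appended to the poem, and the window slid forward by one character. Training one model per poet on that poet's corpus would give the per-poet styles described above.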

Challenges:
1. Collecting datasets of poems by different poets was time-consuming.
2. Training separate models for each poet was challenging (RNN training is computationally expensive and therefore takes a long time).

Rohit Saha