Inspiration

Prolonged, continuous reading can be exhausting and can blunt the emotions a text is meant to convey. It is common to tire while reading lengthy texts and, as a result, misinterpret the literature. What if we could enhance a reader's experience to match the sensation of the text, so that they connect better with the literature? Pursuing this idea, Audiofy accompanies a reader with suitable background music so that the sentiments of the text are delivered in a more enriching and effective manner.

What it does

Audiofy pairs literature with music: it generates music based on the emotions conveyed by the text. The application focuses specifically on novels and generates music that follows changes in the distribution of emotion words. The process can be divided into three phases:

  • Text recognition using optical character recognition (OCR)

  • Sentiment analysis of the captured text

  • Audio retrieval based on the detected emotion
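As a rough sketch, the three phases chain together like this. All function names, keywords, and file paths below are hypothetical stand-ins for the real components (pytesseract, expertai-nlapi, the LSTM model, and the curated playlist):

```python
# Illustrative sketch of Audiofy's three phases as plain functions.

def extract_text(page_image):
    # Phase 1: OCR. Stubbed here; the real app runs pytesseract on a
    # preprocessed OpenCV image.
    return page_image  # pretend the "image" already is its text

def analyse_emotion(text):
    # Phase 2: emotion analysis. A toy keyword lookup stands in for the
    # sentiment score plus the LSTM fallback.
    lowered = text.lower()
    if any(w in lowered for w in ("grief", "tears", "loss")):
        return "sadness"
    if any(w in lowered for w in ("laughed", "delight", "joy")):
        return "joy"
    return "neutral"

def retrieve_audio(emotion):
    # Phase 3: pick a track from an emotion-tagged playlist
    # (hypothetical file names).
    playlist = {
        "joy": "tracks/upbeat_01.mp3",
        "sadness": "tracks/slow_piano_02.mp3",
        "neutral": "tracks/ambient_03.mp3",
    }
    return playlist[emotion]

def audiofy_page(page_image):
    # End-to-end: image in, emotion-matched track out.
    return retrieve_audio(analyse_emotion(extract_text(page_image)))
```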

The application can be used in two different modes:

  • GO LIVE MODE: A real-time implementation that lets a user leverage the application while reading a physical novel, not just an ebook. The user only needs a video-capturing device (preferably a mobile phone), which captures real-time images of the novel's pages and processes them in the background.

  • EBOOK MODE: This mode is best for anyone who wants an exceptional experience with ebooks. Yes, our application can audiofy an ebook as well: just upload a PDF and you are good to go. Happy reading!

How we built it

Audiofy is a Django web app composed of three parts: text extraction (OCR), emotion analysis, and music generation. Text extraction and emotion analysis run in parallel on two separate threads. Music is played in the browser using JavaScript's built-in audio play() method.
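A minimal sketch of the two-thread setup, with stubs standing in for the OCR and emotion steps (the real app wires these into Django views; the page names and stub results here are illustrative):

```python
import threading
import queue

# Pages flow from the OCR thread to the emotion thread through a queue,
# so text extraction and emotion analysis can overlap.
pages = queue.Queue()
emotions = []

def ocr_worker(page_images):
    # Producer: stub for pytesseract extracting text from each page image.
    for img in page_images:
        pages.put(f"text of {img}")
    pages.put(None)  # sentinel: no more pages

def emotion_worker():
    # Consumer: stub for the sentiment-analysis step on each page's text.
    while (text := pages.get()) is not None:
        emotions.append((text, "joy"))

t1 = threading.Thread(target=ocr_worker, args=(["page1.jpg", "page2.jpg"],))
t2 = threading.Thread(target=emotion_worker)
t1.start(); t2.start()
t1.join(); t2.join()
```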

  • Text extraction

    • For paper books, we capture an image of the page with an IP webcam; for ebooks, we take a screenshot of the PDF page the user is viewing.
    • We preprocess the image with OpenCV and feed it into the OCR engine.
    • Pytesseract performs OCR and extracts the text.
  • Emotion Analysis

    • Using expertai-nlapi, we compute a sentiment score for the text.
    • If the sentiment score does not show strong polarity, we fall back to an LSTM-based deep learning model to predict the emotion.
  • Music generation

    • Finally, based on the predicted emotion, we play a song from our curated playlist of emotion-tagged tracks.
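The sentiment-score-then-LSTM routing can be sketched as follows. The threshold value, score range, emotion labels, and helper names are assumptions for illustration, not the actual implementation:

```python
# Sketch of the emotion-routing step: use the sentiment score when its
# polarity is strong, otherwise fall back to the LSTM prediction.

STRONG_POLARITY = 5.0  # hypothetical cut-off on the sentiment score

def sentiment_score(text):
    # Stand-in for the expert.ai sentiment API (assumed roughly -100..100).
    return 40.0 if "wonderful" in text else 1.0

def lstm_predict(text):
    # Stand-in for the TensorFlow LSTM model used when polarity is weak.
    return "suspense"

def predict_emotion(text):
    score = sentiment_score(text)
    if abs(score) >= STRONG_POLARITY:
        # Strong polarity: map the score's sign directly to an emotion.
        return "joy" if score > 0 else "sadness"
    # Weak polarity: defer to the fine-grained LSTM model.
    return lstm_predict(text)
```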

Challenges we ran into

Every project has challenges to overcome, and ours was no exception. The following demanded real effort and brainstorming:

  • Version incompatibility between the frameworks and languages used in our project. For example, we realized at the last moment that the ExpertAI client version we had installed wasn't compatible with the Python version on our machine.

  • Virtual collaboration was challenging at first, as it wasn't easy to convey our ideas clearly. We overcame this in a short span by holding regular meetings and maintaining a GitHub repository.

Accomplishments that we're proud of

Over the course of this project, our team reached some milestones we are really proud of:

  • Ebook Mode was added at the last minute and required some serious brainstorming and effort. Thanks to the seamless collaboration of all the teammates, we integrated this new feature in a very short span of time.

  • Once the application was up, we asked some of our peers to review it and give honest feedback on its usability. The positive feedback was really encouraging and made our efforts feel worthwhile. They also suggested some enhancements, which we did our best to incorporate into the app.

  • We are really proud of our strategy and even load distribution, with each of us contributing in our area of expertise.

What we learnt

On the engineering side, we learnt how to turn an idea into a product that solves a common problem we ourselves faced. Developing an entire end-to-end application as a team was a great learning experience.

Working through the various design options for Audiofy was another valuable experience.

On the technical side, we learnt about several areas of natural language processing, such as sentiment analysis, as well as optical character recognition. We explored the expertai-nlapi package and its sentiment-analysis APIs, learnt how to build deep learning models with TensorFlow, learnt about image preprocessing with OpenCV, and learnt how to integrate a model into a Django-based web application.

What's next for Audiofy

Sentiment analysis is an exciting research direction, with many real-world applications where discovering people's opinions supports better decision making. As people increasingly express their opinions on the Web, the need to analyse opinionated online content keeps growing, and there is already a large body of research on detecting sentiment in text. Several additions could further enhance the user experience:

  • Detect actions performed in the text, such as running, water splashing, birds chirping, or horse riding, and inject matching sound effects.

  • Take into account the environment portrayed by the writer, such as rain, thunderstorms, wind, or crowded places like restaurants, to enrich the sound effects.

  • Track the user's eye movements to estimate reading speed and adjust the audio's tempo accordingly.

  • Incorporate a broader range of emotions into the application.
