Inspiration
I got this inspiration from my old school best friend, who is visually impaired. We always used to spend time together. He would constantly ask me what was happening around him, and I would describe it so that he could build an imaginary world around himself.
I had been thinking about this for the past couple of days, and it struck me that an application that takes photos with a camera, translates the scenery into text, and explains to the user in real time what is happening around him would be wonderful, and would help visually impaired people lead a new life. So I set this as my dream project, to gift to him on his upcoming birthday.
Though it is not possible to build a real-time application in this limited time, I have created a bot that takes an image from the user and outputs a textual understanding of the image as a caption. I think it will be a stepping stone toward my dream project.
What it does
It takes an image as input from the user, understands the image, learning both from the image itself and from previous training, and finally produces a textual understanding of the image as a caption.
How we built it
--> We used 15,000 images and their respective captions to build this project.
--> First, we cleaned the captions and images.
--> Then we removed unnecessary words and numbers from the captions to make them precise.
--> We filtered words according to a threshold frequency, i.e. the number of times a word appears across the captions.
--> Finally, we got 1,845 keywords out of roughly 40,000 words in the dataset.
--> Then we used image preprocessing techniques to convert each image into a feature vector.
--> We used tokenization and word embedding techniques to process the captions.
--> Then we created descriptions for the training data by building dictionaries that map each image to its corresponding captions.
--> We generated additional training data to make our model more accurate using data generator functions.
--> Then we extracted image features from the image dataset, created a deep CNN ResNet model, and trained it using Google Cloud machine learning APIs and clusters.
--> Finally, we deployed the model with some backend scripts and HTML, using Twilio APIs.
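The caption cleaning and threshold-frequency filtering steps above can be sketched roughly like this (the function names and the threshold value of 10 are illustrative assumptions, not our exact code):

```python
import re
from collections import Counter

def clean_caption(caption):
    # lowercase, strip punctuation and digits, drop one-letter tokens
    caption = caption.lower()
    caption = re.sub(r"[^a-z ]", " ", caption)
    words = [w for w in caption.split() if len(w) > 1]
    return " ".join(words)

def build_vocab(captions, threshold=10):
    # keep only words that appear at least `threshold` times in the corpus
    counts = Counter(w for c in captions for w in c.split())
    return {w for w, n in counts.items() if n >= threshold}
```

Filtering on a frequency threshold is how a raw vocabulary of ~40,000 words shrinks to the ~1,845 keywords mentioned above: rare words contribute little signal but inflate the output layer of the model.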
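The dictionary that maps each image to its captions can be built as below; wrapping each caption in start/end markers (here the hypothetical tokens `startseq`/`endseq`) is a common way to tell the decoder where a sequence begins and ends:

```python
def build_descriptions(pairs):
    # pairs: iterable of (image_id, caption) tuples; group captions per image
    # and wrap each caption in start/end tokens for the caption decoder
    descriptions = {}
    for image_id, caption in pairs:
        descriptions.setdefault(image_id, []).append(
            "startseq " + caption + " endseq")
    return descriptions
```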
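The data generator step expands each image-caption pair into many training samples: given the image feature and the first i words, the model learns to predict word i+1. A minimal sketch of that expansion (helper name and tuple layout are assumptions for illustration):

```python
def make_training_pairs(feature, caption_ids):
    # expand one encoded caption into (image feature, partial sequence,
    # next word) samples; the model predicts each word from the image
    # plus the words that precede it
    samples = []
    for i in range(1, len(caption_ids)):
        samples.append((feature, caption_ids[:i], caption_ids[i]))
    return samples
```

In practice this expansion is done lazily inside a generator so the full expanded dataset never has to fit in memory at once.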
Challenges we ran into
We ran into challenges each and every minute:
--> We faced a big challenge while preprocessing the captions: removing unwanted words from sentences full of improperly aligned symbols and numbers.
--> It was difficult to find a method to extract keywords from the whole dataset, but we finally came up with the threshold-frequency method.
--> Then we faced more and more problems while training the model, as it did not meet our expectations; we retrained it 18 times, vigorously tuning hyperparameters, until we finally reached the expected model accuracy. It ate most of our time.
--> The next challenge we faced was creating an embedding matrix for our captions, which was a little tricky to solve.
--> We faced small problems during deployment, but we were able to overcome them by using Twilio APIs.
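The embedding matrix mentioned above is typically built by looking each vocabulary word up in a set of pretrained word vectors, leaving zero rows for words the pretrained set does not cover. A minimal sketch, assuming a `word_index` dict from the tokenizer and a `pretrained` dict of word-to-vector mappings (both names are illustrative):

```python
import numpy as np

def build_embedding_matrix(word_index, pretrained, dim=50):
    # rows follow the tokenizer's word indices (index 0 is reserved);
    # words missing from the pretrained vectors remain zero rows
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():
        vec = pretrained.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix
```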
Accomplishments that we're proud of
--> We were able to create the model we envisioned within a short span of 24 hours.
--> The model we created was more accurate than we expected.
--> I was able to cross the halfway point in the development of my dream project.
--> The deployment of the project went smoothly, and the web pages look attractive.
What's next for Image Annotation
The future development of this project would be a mobile application that takes photos of the user's surroundings, translates them into a textual understanding, and explains to the user through voice what is happening around him.
Another idea I have is image segmentation based on the scenery, such as mountains, forests, parties, etc.
Built With
- api
- flask
- google-cloud
- html
- machine-learning
- python
- tensorflow
- twilio