We wanted to build something people would genuinely benefit from. During the pandemic, people with disabilities have been among the most affected, so we focused our attention on improving their online experience.
What it does
SMARTLook is a Chrome extension that communicates with an external REST API, backed by a GPU, that runs a deep learning image captioning model. The server takes the website on which the extension is run, extracts the images from it, and runs the model on them to create a caption (a short description) for each image. The Chrome extension then replaces the alt attribute of every HTML image on the page with this short description.
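To make the exchange between the server and the extension concrete, here is a minimal sketch of how the server side could package captions so the extension can match each one to an image and write it into the alt attribute. The function name `build_caption_response`, the `caption_fn` callable standing in for the model, and the JSON shape (`{"captions": {url: caption}}`) are all assumptions made for illustration, not the project's actual API.

```python
import json
from typing import Callable, Dict, Iterable, Tuple


def build_caption_response(
    images: Iterable[Tuple[str, bytes]],
    caption_fn: Callable[[bytes], str],
) -> str:
    """Return a JSON string mapping each image URL to its caption.

    `images` yields (url, raw_bytes) pairs. `caption_fn` is a hypothetical
    stand-in for the image captioning model.
    """
    captions: Dict[str, str] = {}
    for url, data in images:
        # Caption each distinct URL only once, since model inference is costly.
        if url in captions:
            continue
        captions[url] = caption_fn(data).strip()
    # Assumed response shape: the extension looks up each img src in "captions".
    return json.dumps({"captions": captions})
```

Keying the response by image URL would let the extension loop over the page's images and set each alt attribute from a simple lookup, without relying on the order in which images were processed.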
How we built it
We created the REST API in Django and the landing page in plain HTML and Bootstrap (hosted on GitHub Pages, link: link). We used deep learning models (the first one in TensorFlow, the second one in PyTorch) and the Google Chrome Extensions API. We tested everything on localhost because we didn't have the resources for an external server with a GPU sufficient for the model. The model currently hosted on GitHub is model 1 (the smaller one), because we didn't have the resources for model 2 (1.5 GB of weights).
Challenges we ran into
The model is large and we didn't have enough resources for it (although it is very well trained and gives very good results). At first we tried to train a small model ourselves, but it didn't produce good results (it was trained on the COCO dataset for a few hours). We ended up using a model from a publication with pretrained weights. We also had no prior experience creating Chrome extensions.
Accomplishments that we're proud of
Creating a working Chrome extension and the whole pipeline logic behind it: submitting the page as MHTML to the external server, parsing it, extracting the JPGs and PNGs, passing them to the deep learning model, and returning the results.
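The parsing and extraction step of this pipeline can be sketched with Python's standard library alone, since an MHTML file is a MIME multipart document. This is an illustrative sketch, not the project's actual code: the function name `extract_images` is hypothetical, and it assumes each image part carries a `Content-Location` header with its original URL, as browser-saved MHTML typically does.

```python
import email
from typing import List, Tuple

# Only the two formats mentioned in the pipeline: JPG and PNG.
IMAGE_TYPES = {"image/jpeg", "image/png"}


def extract_images(mhtml_bytes: bytes) -> List[Tuple[str, bytes]]:
    """Return (url, raw_bytes) pairs for JPEG and PNG parts of an MHTML file."""
    message = email.message_from_bytes(mhtml_bytes)
    images: List[Tuple[str, bytes]] = []
    for part in message.walk():
        if part.get_content_type() not in IMAGE_TYPES:
            continue
        # Assumption: Content-Location holds the image's original URL.
        url = (part.get("Content-Location") or "").strip()
        # decode=True undoes the base64 transfer encoding.
        data = part.get_payload(decode=True)
        if data:
            images.append((url, data))
    return images
```

Each returned pair can then be handed to the captioning model, with the URL kept alongside so the caption can be matched back to the right image on the page.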
What we learned
Lots of things about deep learning for image captioning and about how to create Chrome browser extensions.
What's next for SMARTLook
Hosting it on an external server with sufficient resources for the deep learning model. We also need to train the model further: building one on our own was very challenging, and although the model from the publication works, we didn't have GPUs good enough for it, so we need to develop it further with proper equipment.