SMARTLook

Inspiration

We wanted to build something that people would genuinely benefit from. During the pandemic, people with disabilities have been among the most affected, so we focused our attention on improving their online experience.

What it does

SMARTLook is a Chrome extension that communicates with an external REST API, which runs a deep learning image captioning model on a GPU. The backend takes the website on which the extension is run, extracts the images from it, and runs the model on them to create a caption (a short description) for each image. The extension then replaces the alt attribute of every HTML image on the page with this short description.

How we built it

We built the REST API in Django and the landing page in plain HTML and Bootstrap (hosted on GitHub Pages, link: link). We used a deep learning model in PyTorch (our second model is in PyTorch, the first one is in TensorFlow) together with the Google Chrome Extensions API. We tested everything on localhost because we did not have the resources for an external server with a GPU powerful enough for the model. The model currently hosted on GitHub is model 1 (the smaller one), because we did not have the resources for model 2 (1.5 GB of weights).

Challenges we ran into

The model is large, and we did not have many resources for it (although it is very well trained and gives very good results). At first we tried to train a small model ourselves, but it did not give good results (it was trained on the COCO dataset for a few hours). We ended up using a model from a publication with pretrained weights. We also had no experience creating Chrome extensions.

Accomplishments that we're proud of

Creating a working Chrome extension and the whole pipeline logic behind it: submitting MHTML to the external server, parsing it, extracting the JPGs and PNGs, passing them to the deep learning model, and returning the results.
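The parsing and extraction steps of this pipeline can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not necessarily the code in our repository: it assumes the MHTML arrives as raw bytes, treats it as a MIME multipart message, and keeps only the JPEG and PNG parts. The names extract_images and caption_images, and the caption_fn callback standing in for the deep learning model, are hypothetical.

```python
import email
from email import policy

# MIME types for the two image formats the pipeline extracts.
IMAGE_TYPES = {"image/jpeg", "image/png"}


def extract_images(mhtml_bytes):
    """Return (content_location, content_type, raw_bytes) for each JPEG/PNG part.

    MHTML is a MIME multipart document, so the standard email parser can
    read it. Each embedded resource is one part of the message.
    """
    message = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    images = []
    for part in message.walk():
        content_type = part.get_content_type()
        if content_type in IMAGE_TYPES:
            # Content-Location usually carries the original URL of the resource.
            location = str(part.get("Content-Location", ""))
            # decode=True undoes the base64 transfer encoding.
            images.append((location, content_type, part.get_payload(decode=True)))
    return images


def caption_images(images, caption_fn):
    """Map each image location to a caption.

    caption_fn is a hypothetical stand-in for the captioning model:
    it takes raw image bytes and returns a string.
    """
    return {location: caption_fn(data) for location, _type, data in images}
```

Keying the captions by Content-Location would let the extension match each description back to the image with the same source URL, though that matching strategy is an assumption of this sketch.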

What we learned

Lots of things about deep learning for image captioning and how to create Chrome browser extensions.

What's next for SMARTLook

Hosting it on an external server with sufficient resources for the deep learning model. We also need to train the model further: it was very challenging to build on our own, and although the model from the publication works, we did not have GPUs good enough for it, so we need to develop it further with proper equipment.

GitHub repo

https://github.com/Ache17/oxford_hack

Built With

bootstrap
chromeextensionsapi
django
javascript
pytorch

Submitted to

Oxford Hack 2020

Created by

I created the frontend of the project. I used JavaScript and the Chrome Extensions API to get the images from the page, and RESTful APIs to communicate with the backend. I also created the landing page for the project.

Alex Constantin
I worked on training the deep learning model and integrating it into the backend so that the API can get a text description for each image. I also worked on creating and styling the website for the extension.

Andrej Velichkovski
I created the backend (a Django REST API) and a prototype of the extension. I also worked on the deep learning side (I created the second model, from the GitHub description), integrated it into the backend, and adapted the code for it.

Maciej Lewandowski

Updates

Alex Constantin started this project — Nov 15, 2020 12:46 PM EST
