Inspiration
A friend of mine is working on a side project that involves monitoring crops. He wants to build a system that can process pictures of the crops and find defects, classify different weeds, and find early signals of crop damage. However, he does not know how to write code, nor is he familiar with artificial intelligence, particularly computer vision. So he, and many others who have great use cases for computer vision, are unable to complete their projects because they don't know how to program.
What it does
A system where you can import images - either from your own library or from Google Images - into the platform, label them using an easy-to-use interface, and train the model with the click of a single button. The system also makes the model available for use in the browser or via an API.
Generally, there are 4 steps to any computer vision pipeline:
1) Gather data
2) Label the data
3) Find the appropriate model and train it
4) Deploy the model
All 4 of these steps are included in the EZ CV system. The ultimate goal is that an individual does not need to know how to program or know any artificial intelligence to get cutting-edge computer vision results.
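The four steps above can be sketched as a simple pipeline. This is a hypothetical illustration: the function names and the dict-based stand-in "model" are assumptions, not the actual EZ CV code.

```python
# Hypothetical sketch of the 4-step pipeline: gather -> label -> train -> deploy.
def gather_data(query, limit):
    # In EZ CV this scrapes Google Images; here we just fabricate file names.
    return [f"{query}_{i}.jpg" for i in range(limit)]

def label_data(images):
    # In EZ CV the user draws bounding boxes in the browser; start with
    # an empty list of boxes per image.
    return {img: [] for img in images}

def train_model(labeled):
    # Stand-in for training an object-localization model on the labels.
    return {"num_examples": len(labeled)}

def deploy_model(model):
    # Stand-in for exposing the trained model behind an API endpoint.
    return f"/api/predict?examples={model['num_examples']}"

images = gather_data("weeds", 3)
endpoint = deploy_model(train_model(label_data(images)))
```

Each step hides a real subsystem (scraper, labeling UI, trainer, API server), but the data flow between them is this linear.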
How I built it
There were a few major components to this project: the user interface, the server, and the algorithm to train the ML model.
The frontend was built in plain HTML and CSS with AngularJS as the MVC framework. I used the HTML5 Canvas API to create the ability to draw a bounding box over images. The search bar is powered by an amazing open source script (https://github.com/hardikvasa/google-images-download) that scrapes Google Images.
The server was built in Python using Flask.
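A minimal sketch of the kind of Flask server described here, storing the bounding-box labels the frontend sends. The route names and the in-memory dict are assumptions for illustration, not the actual EZ CV API.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
labels = {}  # image_id -> list of bounding boxes (hypothetical in-memory store)

@app.route("/labels/<image_id>", methods=["POST"])
def add_label(image_id):
    # A box arrives as JSON, e.g. {"class": "weed", "x": 1, "y": 2, "w": 3, "h": 4}
    box = request.get_json()
    labels.setdefault(image_id, []).append(box)
    return jsonify({"count": len(labels[image_id])})

@app.route("/labels/<image_id>", methods=["GET"])
def get_labels(image_id):
    # Return all boxes recorded for this image (empty list if none)
    return jsonify(labels.get(image_id, []))
```

A real deployment would persist labels to disk or a database rather than a dict, but this captures the frontend/server contract.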
Due to the time constraints, I decided to focus on a single problem within computer vision: object localization. This problem focuses on finding the types (car, bird, etc.) and locations of objects. To train such a model, the input data consists of images, and the goal is to predict bounding boxes around the objects. So to focus on this problem, the UI only enables creating bounding boxes on images.
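To make the bounding-box setup concrete, here is the standard intersection-over-union (IoU) metric used to score a predicted box against a ground-truth box in object localization. The `(x, y, w, h)` box format and the function name are illustrative choices, not necessarily EZ CV's internals.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corners of the intersection rectangle
    ix1 = max(ax, bx)
    iy1 = max(ay, by)
    ix2 = min(ax + aw, bx + bw)
    iy2 = min(ay + ah, by + bh)
    # Clamp to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0
```

For example, two identical boxes score 1.0, and two 10x10 boxes offset by (5, 5) overlap in a 5x5 patch, giving 25 / 175 ≈ 0.14.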
Challenges I ran into
Within the frontend, the Canvas API is very finicky, so that required a fair amount of debugging.
I decided to use Google Compute Engine for the server and for training the computer vision algorithms. This was my first time using Google Cloud, so I had to learn to navigate the console, understand the configurations, and debug errors in launching instances. I particularly struggled with networking: I wanted to expose the server on its external IP, so I had to add a few additional permissions.
Accomplishments that I'm proud of
I am surprised that I finished the entire front-end within the hackathon. The front-end contains most of the functionality I envisioned, which is really fulfilling, knowing that I reached a goal I set prior to the hackathon.
What I learned
How to manage a project. There were many moving parts in this project and many frames of mind to work in (design, user experience, server, AI/computer vision), as well as frequent switching between languages (HTML, CSS, JavaScript, Python) and libraries/frameworks (AngularJS, Keras/TensorFlow).
What's next for EZ CV
I would love to use this myself: over the past year of computer vision research and project development, I have often faced annotating data and training models, a tedious process that can be black-boxed. I hope to bring more computer vision algorithms to the service: pure classification, semantic segmentation, instance segmentation, optical flow, and many more! Finally, I hope to help peers like the friend who inspired me use the service to solve their computer vision problems.