Crowd Sourcd

Inspiration

We chose to work on a crowd-sourced tool for data labeling, to simplify the data pipeline for deep learning models.

Nowadays we have amazing tools at our disposal: language translation, movie recommendation, computer vision, autonomous driving, and much more! But all these systems are very data-hungry, and a deep learning engineer can spend most of their time on data collection and preparation. With the right tools, like ours, this time can be reduced.

But a lot of data is not enough on its own: we also need to ensure that its quality is high. From this perspective, an open-source, crowd-sourced platform is an opportunity to work toward gold-standard data quality and toward automated systems that are fairer, more inclusive, and more resilient to bias.

Thus, Crowd-Sourced is a free platform that lets users worldwide contribute to labeling textual and graphical data, as an open-source way of collecting human-labeled, gold-standard data for machine learning models.

This project comes with basic UI support, an easy-to-use interface, and a set of ready-to-use interactions. Most importantly, all of the content is served to the user for labeling, so you can put whatever data you like in the database without worrying about how it will be served and labeled.

What it does

It's a platform that lets users help label datasets quickly, in either graphical or textual formats, by giving them a window that shows each item along with different ways of entering labels. The final label is then decided by consensus from the majority of the votes. In short, it offers an easy and quick way to label a dataset, much like Google's reCAPTCHA does with its image labeling feature.
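
The majority-vote step can be illustrated with a minimal Python sketch. This is not the project's actual code: the function name `consensus_label`, the `min_votes` threshold, and the choice of a strict majority (more than half of the votes) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def consensus_label(
    votes: Sequence[Hashable], min_votes: int = 3
) -> Optional[Hashable]:
    """Return the label backed by a strict majority of votes, else None.

    `min_votes` is a hypothetical threshold: with fewer votes than this,
    no final label is decided yet.
    """
    if len(votes) < min_votes:
        return None  # not enough votes collected for this item
    # most_common(1) gives the single most frequent label and its count
    label, count = Counter(votes).most_common(1)[0]
    # Strict majority: the winning label needs more than half of all votes
    return label if count > len(votes) / 2 else None


if __name__ == "__main__":
    print(consensus_label(["cat", "cat", "dog"]))   # cat
    print(consensus_label(["cat", "dog", "bird"]))  # None (no majority)
```

Returning `None` for ties and under-voted items keeps undecided data out of the final label set until more users have voted.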

The web application was built with the following aspects in mind:

🎁 Modern – Project created using the latest features of React (state management using Hooks)
💻 Responsive – Highly responsive, reusable UI components that change depending on the provided props; the UI library used is Material UI, which provides responsive components out of the box
🚀 Fast – Buttery smooth experience thanks to a lightweight implementation that follows ReactJS best practices
⚙️ Maintainable – The project is built with Docker Compose, which makes it easy to add and remove services and keeps the code easy to extend

How we built it

This section lists the technologies that were used in the making of this awesome project! They are as follows:

Makefile ❤️ scripts for automating many of the processes
Black (formatting) ❤️ Flake8 (linting)
Flask ❤️ MongoEngine (ODM) ❤️ VirtualEnv
React ❤️ Material UI ❤️ yarn
ESLint with React to catch bugs early
GitHub ❤️ issue and pull request templates
MongoDB as the database
Docker/Docker Compose
Linux ❤️ wget ❤️ zip for automating dataset generation and setup
Python ❤️ requests lib, for using API from Unsplash

Challenges we ran into

Quite a lot:

Deciding on the details of the workflow
Deciding on the technology, and making it easy for everyone to follow the issues and the work that needed to be done
Deciding on the UI, with the whole team keeping up with the steep learning curve and the evolving idea
Trimming the idea down to something as close to an MVP as possible
Maintaining best practices with branches, GitHub issues & PRs
Making sure everyone was on the same page
Dealing with hidden bugs in Docker Compose, MongoDB, and especially the server
Having to deploy the frontend somewhere
Linting and formatting to make sure the code quality was high
Simplifying many processes by using a Makefile

Accomplishments that we're proud of

We're proud of a couple of things:

Very rapid development
Rapid learning and understanding of the solution
Quickly adapting to a workflow
Not getting overwhelmed by the feeling that we wouldn't make it
Including so many technologies, stacks, and overall ideas to get this MVP out there
Discussing very frequently and keeping in touch with everyone to make sure good progress was made
Using GitHub's issue/PR/branch system. As of now, we have 6 closed issues, 2 open issues, and 16 closed PRs, with a total of 70 commits
Overall, having fun!

What we learned

A lot, for all of us:

On the technical side: JS, Python, automation, Bash scripting, dataset generation, API calls, Makefiles
On the people side: communicating across time zones, and deciding on a single solid idea and building on top of it

What's next for CrowdSource

Expanding a bit more on the idea: letting users upload their own data, and letting third-party websites use this functionality in their own web applications, a bit like Google's reCAPTCHA. We might go on to add a personal profile for each user, with proper authentication, so that users can upload their own datasets to be labeled, and then grow into a more expansive platform that deals with audio and other data formats as well.

Built With

bash
docker
docker-compose
flaskapi
github
javascript
linux
makefile
markdown
material
mongodb
mongoengine
python
react
script
scss

Submitted to

MLH Fellowship Orientation Hackathon - Batch 3

Created by

Worked on the frontend with ReactJS and Material UI, and made a very easy-to-use interface that follows many React best practices.

Also worked on Docker Compose for integrating the frontend, the backend, and the MongoDB database, contributed some server-side automation for the data, set up the base architecture of the project, and worked on the README.md

Saif Ul Islam
I worked on the back-end, dealing mainly with Flask and MongoDB. This was my first time building a web server and using these tools, so I definitely learned a lot with the help of my teammates.

I also worked on the design of the API exchange schemas, managed the GitHub issue board and set up our domain name + client deployment.

Private user

Worked on data collection. Learned mainly about MongoDB and Flask; this was my first time using both tools.

Giancarlo Fissore

Updates

Saif Ul Islam started this project — Jun 11, 2021 11:30 PM EDT