Introduction

What: Industrial data labelling-as-a-service

Why: Labelled data of sufficient quality is required to effectively utilize machine learning

How: Crowdsource the energy industry’s data labelling jobs to independent subject matter experts

Who: Energy companies want their data labelled & industrial subject matter experts want to prove their expertise

Vision

Enabling and accelerating digital transformation of the energy industry by providing labelled data.

Machine learning and data labelling

Machine learning (ML) is already a billion-dollar industry. Tens of thousands of people across the globe work on labelling the data sets that machine learning algorithms use for training. Labelling itself is a $150 million industry (2018). Heavy-asset industries, and the energy industry specifically, are lagging behind in their digital transformation journeys: applications of machine learning are limited to pilot projects and proofs of concept. One reason industrial data labelling has not yet been executed at scale is a shortage of subject matter experts.

Crowdsourcing industrial data labelling

With hundreds of thousands of people losing or set to lose their jobs in the energy industry in the wake of the oil price collapse and COVID-19, there is a need for innovative business models that can quickly provide relevant work directly to highly skilled energy professionals. By crowdsourcing the labelling of industrial sensor data, energy companies enhance their data sets, which allows for more accurate results in existing data-driven processes and enables machine learning. The World Economic Forum has estimated the value of digital transformation in the Oil & Gas industry alone at $1.6 trillion. Labelled data is essential to get anywhere close to this number.

Solution

A web-based application where companies post data labelling jobs and subject matter experts around the world complete them. The workflow is:

  1. Experts test their data labelling abilities on a sample data set. We compare their labelling accuracy against a small labelled data set provided by the customer.
  2. If they perform well on the test, we release the entire data set to them, anonymizing company details where necessary to preserve confidentiality.
  3. Upon completion of the data labelling job, the labeller gets paid.
  4. The data labelling results are used to generate credentials on Hedera Hashgraph. The credentials objectively measure data analysis and labelling skills, and any other party can verify them immediately through distributed ledger technology.
  5. The data is released only to the labellers who qualify for the job, which limits who gets access. Additionally, the company offering the labelling job may choose to remain anonymous.
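The qualification step (scoring a candidate against the customer's small labelled set, then deciding whether to release the full data set) could be sketched roughly as follows. This is a minimal illustration, not the project's actual backend: the class and method names, the exact-match accuracy metric, and the 0.7 pass threshold are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names and threshold are illustrative assumptions.
class QualificationCheck {

    /** Fraction of reference samples whose label the candidate matched exactly (0.0 to 1.0). */
    static double accuracy(Map<String, String> reference, Map<String, String> candidate) {
        if (reference.isEmpty()) {
            throw new IllegalArgumentException("reference set must not be empty");
        }
        int correct = 0;
        for (Map.Entry<String, String> entry : reference.entrySet()) {
            // A sample the candidate skipped has no label and therefore counts as wrong.
            if (entry.getValue().equals(candidate.get(entry.getKey()))) {
                correct++;
            }
        }
        return (double) correct / reference.size();
    }

    /** The full data set is released only when the score reaches the threshold. */
    static boolean qualifies(double accuracy, double threshold) {
        return accuracy >= threshold;
    }

    public static void main(String[] args) {
        // Customer-provided reference labels, keyed by sample id (made-up data).
        Map<String, String> reference = new LinkedHashMap<>();
        reference.put("s1", "normal");
        reference.put("s2", "anomaly");
        reference.put("s3", "normal");
        reference.put("s4", "anomaly");

        // Labels submitted by the candidate for the same samples.
        Map<String, String> candidate = new LinkedHashMap<>();
        candidate.put("s1", "normal");
        candidate.put("s2", "anomaly");
        candidate.put("s3", "anomaly"); // disagrees with the reference
        candidate.put("s4", "anomaly");

        double score = accuracy(reference, candidate);
        // prints "accuracy=0.75 qualifies=true"
        System.out.println("accuracy=" + score + " qualifies=" + qualifies(score, 0.7));
    }
}
```

Exact-match accuracy is the simplest possible metric. For sensor data with rare events, a class-aware measure such as precision and recall on the anomaly class might be a better gate, since a candidate who labels everything "normal" could still score well on plain accuracy.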

Built with

Java

Hedera Hashgraph / DID and Verifiable Credentials

React/JavaScript

Open source software (Gearbox.js, react-table, antd, Open Industrial Data)

Appendix

Presentation material: Slides

Backend code: GitHub

Frontend code: GitHub

Figma Prototype: Figma
