What: Industrial data labelling-as-a-service
Why: Labelled data of sufficient quality is required to effectively utilize machine learning
How: Crowdsource the energy industry’s data labelling jobs to independent subject matter experts
Who: Energy companies want their data labelled & industrial subject matter experts want to prove their expertise
Enabling and accelerating digital transformation of the energy industry by providing labelled data.
Machine learning and data labelling
Machine learning (ML) is already a billion-dollar industry. Tens of thousands of people across the globe work on labelling the data sets that machine learning algorithms use for training. Labelling itself is a $150 million industry (2018). Heavy-asset industries, and the energy industry specifically, are lagging behind in their digital transformation journeys - applications of machine learning are limited to pilot projects and proofs of concept. One of the reasons that industrial data labelling has not yet been executed at scale is a shortage of subject matter experts.
Crowdsourcing industrial data labelling
With hundreds of thousands of people losing or set to lose their jobs in the energy industry in the wake of the oil price collapse and COVID-19, there is a need for innovative business models that can quickly provide relevant work directly to highly skilled energy professionals. By crowdsourcing the labelling of industrial sensor data, energy companies enhance their data sets - allowing for more accurate results in existing data-driven processes and enabling machine learning. The World Economic Forum has estimated the value of digital transformation in the Oil & Gas industry alone at $1.6 trillion. Labelled data is essential to get anywhere close to this number.
A web-based application where companies post data labelling jobs and subject matter experts around the world complete them. The workflow:
- Experts test their data labelling abilities on a data set. We compare their labelling accuracy against a small labelled data set provided by the customer.
- If they perform well on the test, we release the entire data set to them, anonymizing company details where necessary to preserve confidentiality.
- Upon completion of the data labelling job, the labeller gets paid.
- The data labelling results are used to generate credentials on Hedera Hashgraph. The credentials objectively measure data analysis and labelling skills, and any other party can verify them immediately by leveraging distributed ledger technology.
- The data is only released to labellers who qualify for the job, limiting who gets access. Additionally, the company offering the labelling job may choose to remain anonymous.
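The qualification test in the workflow above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the function name `score_labeller`, the `TestResult` type, the dictionary-based label format and the default threshold of 0.9 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    accuracy: float  # fraction of reference samples labelled correctly
    qualified: bool  # True if accuracy meets the job's threshold


def score_labeller(reference: dict[str, str],
                   submitted: dict[str, str],
                   threshold: float = 0.9) -> TestResult:
    """Compare a labeller's answers with the customer's reference labels.

    Both arguments map a sample id to a label (hypothetical format).
    """
    if not reference:
        raise ValueError("reference set must not be empty")
    # A sample the labeller skipped, or labelled differently, counts as wrong.
    correct = sum(1 for sample_id, label in reference.items()
                  if submitted.get(sample_id) == label)
    accuracy = correct / len(reference)
    return TestResult(accuracy=accuracy, qualified=accuracy >= threshold)


# Small labelled test set provided by the customer (made-up values).
reference = {"s1": "normal", "s2": "fault", "s3": "normal", "s4": "fault"}
# Labels submitted by a candidate labeller (made-up values).
submitted = {"s1": "normal", "s2": "fault", "s3": "fault", "s4": "fault"}

result = score_labeller(reference, submitted, threshold=0.75)
print(result)
```

Here three of the four submitted labels match the reference, so the accuracy is 0.75 and the labeller just meets the 0.75 threshold. A real job would probably need a larger test set and possibly a per-class metric, since plain accuracy can hide poor performance on rare fault labels.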
Hedera Hashgraph / DID and Verifiable Credentials
Open source software (Gearbox.js, react-table, antd, Open Industrial Data)
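To make the credential idea more concrete, the sketch below assembles an unsigned credential in the general shape of the W3C Verifiable Credentials data model. It is an illustration under stated assumptions, not the project's code: the `DataLabellingCredential` type, the `jobId` and `accuracy` fields, the function name and the `did:example:` identifiers are hypothetical, and the signing step and the anchoring on Hedera are deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_labelling_credential(issuer_did: str, labeller_did: str,
                               job_id: str, accuracy: float) -> dict:
    """Assemble an unsigned credential payload (no proof, no ledger write)."""
    return {
        "@context": ["https://www.w3.org/2018/credentials/v1"],
        # The second type is a hypothetical, application-specific type; a real
        # deployment would also need a JSON-LD context that defines it.
        "type": ["VerifiableCredential", "DataLabellingCredential"],
        "issuer": issuer_did,
        "issuanceDate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "credentialSubject": {
            "id": labeller_did,
            "jobId": job_id,       # hypothetical field
            "accuracy": accuracy,  # hypothetical field
        },
    }


credential = build_labelling_credential(
    issuer_did="did:example:platform",        # placeholder DID
    labeller_did="did:example:labeller-123",  # placeholder DID
    job_id="job-42",
    accuracy=0.93,
)
print(json.dumps(credential, indent=2))
```

A verifier would additionally need a cryptographic proof from the issuer and a way to resolve the issuer's DID, which is where the distributed ledger comes in.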
Presentation material: Slides
Backend code: GitHub
Frontend code: GitHub
Figma Prototype: Figma