DataChain

Inspiration

Research lets the scientific method take its fullest form: building on previous knowledge and discoveries to produce bigger and brighter ideas that aid the world around us. Contributions from others in the community promote a clearer understanding of our complex world and allow humanity to keep innovating.

The current system of access fees and journal selection lets peer review identify the research that contributes most, but it also puts high cost barriers between much of the public and that research. We believe data should be open access: research that is free and accessible to the public, especially government research funded by taxpayers. Making research openly available helps contributions be shared more easily and ideas spread more easily.

What it does

Knowledge should be accessible, double-checked, and able to be expanded on. Our software takes these fundamentals and puts them front and center. It stores machine learning models and datasets on a blockchain. Researchers can contribute to and improve the datasets, with parts of each dataset saved across the blockchain. In addition, due to the nature of the blockchain, the data stored on it is publicly available and spread across computers around the world. As researchers improve the data, the blockchain acts as a timeline showing each improvement along the way.
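
To make the timeline idea concrete, here is a minimal sketch of a hash-linked chain of dataset revisions. It is an illustration only, not our actual implementation: the block fields (`author`, `data_digest`, `prev_hash`), the helper names, and the use of SHA-256 are all assumptions made for this example.

```python
import hashlib
import json
import time

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block
HASHED_FIELDS = ("index", "timestamp", "author", "data_digest", "prev_hash")


def block_hash(block):
    """Hash the block's contents (everything except its own hash) with SHA-256."""
    payload = json.dumps({k: block[k] for k in HASHED_FIELDS}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_revision(chain, author, data, timestamp=None):
    """Append a block recording one revision of a dataset (hypothetical layout)."""
    block = {
        "index": len(chain),
        "timestamp": time.time() if timestamp is None else timestamp,
        "author": author,
        "data_digest": hashlib.sha256(data).hexdigest(),
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_PREV,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain):
    """Check that every block links to its predecessor and has not been altered."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block):
            return False
    return True


# Two researchers contribute successive revisions of the same dataset.
chain = []
add_revision(chain, "alice", b"label,value\ncat,1\n")
add_revision(chain, "bob", b"label,value\ncat,1\ndog,2\n")
print(is_valid(chain))
```

Because each block stores the hash of the one before it, editing an earlier revision changes its hash and breaks every later link, which is what lets the chain serve as a trustworthy history of who improved what, and when.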

How we built it

It's built on ideas similar to Bitcoin; however, there are no coins or tokens involved. Using the decentralized nature of blockchains and the way they stack history block by block, we were able to create a platform that allows research to be not only made public, but also contributed to and built upon. We built it in Python.

Challenges we ran into

One of the main challenges we came across was figuring out the best way to store a model on the network and tie it to an author. Storing whole models would take up a lot of space on someone's computer, which would defeat the purpose of this software. Our solution uses Merkle trees and a custom bytecode to store parts of the model across multiple computers. An unforeseen benefit is that a model can be downloaded in parts from multiple computers at once, which can be faster than downloading it from one central computer.
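
The Merkle tree part of that idea can be sketched as follows. This is a simplified illustration under stated assumptions, not our actual code: the chunk size, the SHA-256 hash, the Bitcoin-style convention of duplicating the last node on odd levels, and every function name are hypothetical, and the custom bytecode is not modeled at all. The point it shows is that a chunk fetched from any computer can be checked against a single root hash.

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).digest()


def split_chunks(blob, size):
    """Cut a serialized model into fixed-size pieces (the last may be shorter)."""
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def next_level(level):
    """Hash adjacent pairs; duplicate the last node if the level is odd (assumed)."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        level = next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to rebuild the root from one leaf."""
    proof = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # the other node in this pair
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        level = next_level(level)
        index //= 2
    return proof


def verify_chunk(chunk, proof, root):
    """Recompute the path from a downloaded chunk up to the root and compare."""
    node = sha256(chunk)
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


# Stand-in for a serialized model: five distinct 1 KiB chunks.
model_bytes = b"".join(bytes([i]) * 1024 for i in range(5))
chunks = split_chunks(model_bytes, 1024)
leaves = [sha256(c) for c in chunks]
root = merkle_root(leaves)

# A peer serves chunk 3 together with its proof; the downloader checks it.
proof = merkle_proof(leaves, 3)
print(verify_chunk(chunks[3], proof, root))
```

With this layout, only the root needs to be stored with the author's record. Chunks can then come from different computers in parallel, and each one is verified independently before the model is reassembled.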

Accomplishments that we're proud of

The best part of this project was developing a model that was successfully stored and evaluated on our blockchain. Having a visible proof of concept made this an exciting experience.

What we learned

From this experience, we learned more about the mechanics of cryptocurrencies and how they can be bridged with the expansive field of data science. This interdisciplinary mix is not well known, but it was something we were excited to experiment with.

What's next for DataChain

Making more people aware of the software will make the research accessible to more communities. As next steps, we hope to see how data science companies can benefit from this open software, and to create a version that lets researchers develop machine learning models in a decentralized manner using all the machines on the network.

Built With

blockchain
data
python
tensorflow

Updates

James Spann started this project — Feb 17, 2018 07:51 PM EST
