Given the ongoing popularity of cryptocurrencies and cryptocurrency mining, we were inspired to test just how good their hash functions are. We wanted to see whether a machine learning classification algorithm could find the answer to a given mining problem more efficiently than the current method of mining: brute force.
What it does
Given a string, our classifier (GaussianNB, DecisionTree, or RandomForest) guesses one nonce that, when combined with the string and fed into the hash function our dataset was generated with, solves it.
Due to some issues we ran into, and likely also due to the strength of the SHA-256 algorithm, our method is about as good as brute force with respect to the accuracy of a single guess. This is to be expected if there is in fact no pattern, if our classifier was not trained on enough data, or both.
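For context, with a 1-leading-zero target a single random guess succeeds with probability 1/16, because each hex digit of an ideal SHA-256 digest is uniform over 16 values. The sketch below estimates that brute-force baseline empirically. It assumes the string and nonce are joined by plain concatenation before hashing, and `solves` and `single_guess_rate` are hypothetical helper names, not NonceSolver's actual code.

```python
import hashlib
import random

DIFFICULTY = 1  # number of leading hex zeros required, matching our dataset

def solves(message: str, nonce: int, difficulty: int = DIFFICULTY) -> bool:
    """Return True if sha256(message + str(nonce)) starts with `difficulty` hex zeros.

    Assumption: message and nonce are combined by simple string concatenation.
    """
    digest = hashlib.sha256((message + str(nonce)).encode("utf-8")).hexdigest()
    return digest.startswith("0" * difficulty)

def single_guess_rate(trials: int = 20000, seed: int = 0) -> float:
    """Estimate how often one random nonce guess solves a message."""
    rng = random.Random(seed)
    hits = 0
    for i in range(trials):
        message = f"block-{i}"  # hypothetical message strings
        guess = rng.randrange(2**32)  # one random guess per message
        hits += solves(message, guess)
    return hits / trials

print(f"expected: {1 / 16 ** DIFFICULTY:.4f}")
print(f"observed: {single_guess_rate():.4f}")
```

A classifier that has learned nothing useful should land near the expected value; beating it consistently would be the signal of a real pattern.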
How I built it
Since we did not have a dataset to work with, we had to generate our own. Because this is akin to Bitcoin mining, it took a few hours to generate a few thousand rows of training data. While Bitcoin currently requires about 16 zeros at the front of the hash, we required only 1 leading zero in our resulting hash so that we could generate enough data in time.
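A minimal sketch of this kind of dataset generation is shown here. It assumes the hashed input is the message followed by the decimal nonce, and that the label is the smallest nonce that meets the target; both are illustrative choices, and `find_nonce` and `generate_rows` are hypothetical names.

```python
import csv
import hashlib
import io

def find_nonce(message: str, difficulty: int = 1, start: int = 0) -> int:
    """Brute-force the smallest nonce >= start whose hash meets the target.

    Assumption: the hashed input is message + decimal nonce, UTF-8 encoded.
    """
    target = "0" * difficulty
    nonce = start
    while True:
        digest = hashlib.sha256(f"{message}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def generate_rows(messages, difficulty: int = 1):
    """Yield (message, nonce) training rows, one solved nonce per message."""
    for message in messages:
        yield message, find_nonce(message, difficulty)

# Write a tiny dataset to an in-memory CSV (a real run would write to a file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["message", "nonce"])
for row in generate_rows([f"block-{i}" for i in range(5)]):
    writer.writerow(row)
print(buffer.getvalue())
```

Searching from 0 upward makes each label deterministic, which keeps the dataset reproducible; a miner starting from a random nonce would produce different (but equally valid) labels.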
After generating our dataset, we pursued a few different routes, with different team members tackling each one. Two members worked on generating data for other hashing algorithms, as well as a dataset that required 2 leading zeros on the hashed value. Meanwhile, we started analyzing the data to look for a trend that could be used to determine the nonce for a given string.
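One way to cover both extensions (a second leading zero and other hash functions) is to parameterize the search on difficulty and algorithm. This sketch uses Python's `hashlib.new`, which accepts names such as `"sha256"`, `"sha3_256"`, and `"blake2b"`. The algorithms listed are illustrative only (which ones the team actually tried is not recorded here), and the concatenation scheme is the same assumption as before.

```python
import hashlib
import statistics

def attempts_to_solve(message: str, difficulty: int, algorithm: str = "sha256") -> int:
    """Count nonces tried (starting at 0) until the digest has `difficulty` leading hex zeros."""
    target = "0" * difficulty
    nonce = 0
    while True:
        h = hashlib.new(algorithm)  # algorithm chosen by name, e.g. "sha3_256"
        h.update(f"{message}{nonce}".encode("utf-8"))
        if h.hexdigest().startswith(target):
            return nonce + 1
        nonce += 1

for algorithm in ("sha256", "sha3_256", "blake2b"):
    for difficulty in (1, 2):
        counts = [attempts_to_solve(f"block-{i}", difficulty, algorithm) for i in range(200)]
        print(f"{algorithm:9} difficulty={difficulty}: mean attempts {statistics.mean(counts):.1f}")
```

Across these algorithms, the average should sit near 16 tries for one zero and near 256 for two, which is why the 2-zero dataset grows so much more slowly.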
Challenges I ran into
The first challenge we ran into was getting a dataset to train our algorithm on. Real datasets seemed difficult to obtain, so we figured out a way to create our own.
Our other challenges were mostly time-based. To generate larger and more useful datasets (with more leading zeros), we would simply need more time to let the computers run.
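The time cost can be made concrete under the assumption of an ideal hash whose hex digits behave like independent uniform draws. With $k$ required leading hex zeros, each nonce succeeds with probability $p$, and the number of tries $N$ until success is geometric:

```latex
p = 16^{-k}, \qquad \mathbb{E}[N] = \frac{1}{p} = 16^{k}
\qquad\Rightarrow\qquad
\mathbb{E}[N]\big|_{k=1} = 16,\quad
\mathbb{E}[N]\big|_{k=2} = 256,\quad
\mathbb{E}[N]\big|_{k=16} = 2^{64} \approx 1.8 \times 10^{19}
```

Each extra zero multiplies the expected work per row by 16. Bitcoin actually compares the hash against a numeric target rather than counting zeros, so the last figure is only an approximation.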
Our biggest challenges came when analyzing the data. The classification algorithms we were using required our features to be integers or floats. Since our features were strings, we had to find a way to convert them without losing information. We ended up hashing them, but we believe there may be a better approach; we could not find one in the limited time.
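A minimal sketch of a hash-based string-to-number conversion is shown here. It assumes SHA-256 truncated to 6 bytes. The exact hash and truncation the team used are not stated, and `string_to_feature` is a hypothetical name. Six bytes (48 bits) keeps every value exactly representable as a float64, which matters because many classifiers convert input arrays to floating point.

```python
import hashlib

def string_to_feature(s: str, n_bytes: int = 6) -> int:
    """Map a string to a non-negative integer by hashing it.

    Hypothetical helper: takes the first `n_bytes` of the SHA-256 digest and
    reads them as a big-endian unsigned integer. With n_bytes <= 6 the result
    is below 2**53, so converting it to float64 loses no precision.
    Equal strings always map to the same value; similar strings do not map
    to similar values.
    """
    digest = hashlib.sha256(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:n_bytes], "big")

print(string_to_feature("block-1"))
print(string_to_feature("block-2"))  # unrelated to the value above
```

A stable hash from `hashlib` is preferable to Python's built-in `hash()` here, because `hash()` on strings is randomized per process unless `PYTHONHASHSEED` is fixed. The downside of any hashing approach is that it deliberately destroys the structure a classifier would need to find a pattern.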
Accomplishments that I'm proud of
Learning more about how cryptocurrencies work was very exciting.
What I learned
I learned more about how cryptocurrency mining works. I also learned that SHA-256 is very good at making strings unrecognizable, although I have not completely ruled out that there is a trend in the data.
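The "unrecognizable" property is usually described as the avalanche effect: changing one input character flips roughly half of the output bits. The short sketch below measures this by counting differing bits between two SHA-256 digests. `bit_difference` is a hypothetical helper, and the input strings are arbitrary examples.

```python
import hashlib

def bit_difference(a: str, b: str) -> int:
    """Count how many of the 256 SHA-256 output bits differ between two inputs."""
    da = int.from_bytes(hashlib.sha256(a.encode("utf-8")).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b.encode("utf-8")).digest(), "big")
    return bin(da ^ db).count("1")  # popcount of the XOR = number of differing bits

# Inputs differing in a single character typically give digests that differ
# in roughly half of their 256 bits.
for a, b in [("block-1", "block-2"), ("hello", "hellp")]:
    print(a, b, bit_difference(a, b))
```

If near-identical strings produced near-identical digests, a classifier would have something to learn from. Differences clustered around 128 bits are exactly what makes that so hard.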
What's next for NonceSolver
One of the next steps for NonceSolver is to generate larger datasets and train on them. We would also like to convert strings into integers in a way that our classifier can more easily recognize, to try classification algorithms other than the ones we used (potentially ones that accept strings as features), and to attempt the problem with hashing algorithms other than SHA-256.
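A possible alternative to hashing the input string is a fixed-length vector of its UTF-8 byte values, one feature per position. With this encoding, similar strings share most of their features. This is only a sketch of one option, not something NonceSolver implemented. `string_to_byte_vector`, the default length of 32, and padding with 0 are all illustrative assumptions.

```python
from typing import List

def string_to_byte_vector(s: str, length: int = 32, pad: int = 0) -> List[int]:
    """Encode a string as a fixed-length list of UTF-8 byte values (0-255).

    Hypothetical alternative to hashing: unlike a hash, nearby strings share
    most of their features, so a classifier sees per-character structure.
    Strings longer than `length` bytes are truncated; shorter ones are padded.
    """
    data = s.encode("utf-8")[:length]
    return list(data) + [pad] * (length - len(data))

print(string_to_byte_vector("block-1", length=10))
# -> [98, 108, 111, 99, 107, 45, 49, 0, 0, 0]
```

Each row then becomes a fixed number of numeric columns, the usual 2-D shape that classifiers like GaussianNB or RandomForest expect. Whether that exposes any learnable pattern through SHA-256 is still an open question.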