I want all the Ethereum data all of the time. Using the node's RPC interface (especially if one is viewing trace data) is so slow that any sort of iterative data analysis is nearly impossible. I've partially solved this problem in a simple way. My hackathon project tries to explain a single simple idea I've used to greatly improve the speed with which I can scan the trace-level Ethereum data.
What it does
Over the past year, I've solved various parts of what I call the trace data problem: broad data analysis across the entire chain is difficult and profoundly slow if one uses the node's RPC directly. I use data gathered soon after the 5,000,000th Ethereum block (about a month ago) to explain how I am able to scan the Ethereum blockchain more than an order of magnitude faster than using the RPC directly. The data itself provided the key insight I needed to properly focus my efforts on solving the problem revealed in the visualization.
About a month ago, soon after the 5,000,000th Ethereum block, I created a data set that I promptly ignored. This weekend I used that data to create visualizations that help me explain what I call the trace data problem and my solution to it. I believe the slowness one encounters when trying to scan traces from the millions of transactions on the chain is caused entirely by the Fall 2016 DDoS attacks. My visualization makes a strong argument for why this is the case.
How I built it
I used QuickBlocks (http://quickblocks.io) to create the raw data prior to the hackathon. This weekend, I first used C++ to collate, summarize, sanitize, and clean up the data in preparation for doing visualizations. I then used Microsoft Excel and plotly.com to build the visualizations.
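To give a feel for the collate-and-summarize step, here is a minimal sketch. It is not my actual code (that was written in C++); it is Python for brevity, and it assumes a hypothetical input of (block number, trace count) pairs. The function name `summarize_traces` and the bucket size are made up for illustration. It drops obviously bad rows and rolls the rest up into fixed-size block ranges, which is the kind of table one can paste into Excel or plotly.

```python
from collections import defaultdict


def summarize_traces(rows, bucket_size=100_000):
    """Roll per-block trace counts up into fixed-size block ranges.

    rows: iterable of (block_number, trace_count) pairs.
          This input format is an assumption made for illustration.
    Returns a list of (bucket_start, n_blocks, n_traces, traces_per_block),
    sorted by bucket_start.
    """
    # bucket_start -> [number of blocks seen, total traces seen]
    totals = defaultdict(lambda: [0, 0])
    for block_number, trace_count in rows:
        # Sanitize: skip rows that cannot be valid chain data.
        if block_number < 0 or trace_count < 0:
            continue
        start = (block_number // bucket_size) * bucket_size
        totals[start][0] += 1
        totals[start][1] += trace_count

    return [
        (start, n_blocks, n_traces, n_traces / n_blocks)
        for start, (n_blocks, n_traces) in sorted(totals.items())
    ]
```

Plotting the traces-per-block column against the bucket start is one simple way to see where in the chain's history the trace volume jumps.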
Challenges I ran into
I have no prior experience using data visualization tools, so much of my time this weekend was spent exploring various tools and finally choosing plotly.com. I then spent the rest of the time learning how to use plotly to help me explain the solution I am proposing to the trace data problem.
Accomplishments that I'm proud of
I've been able to gather Ethereum-related data for some time. I'm proud that this weekend I was finally able to learn how to better visualize the data. Visualizations provide easy insights into the Ethereum data that can aid in future decision making for the Ethereum community as a whole. Having easy and quick access to the data is the first step.
What I learned
Data visualization is more about telling a story and using data to support the story than it is about showing pretty pictures--but there's nothing better than an informative pretty picture.
What's next for The Ethereum Trace Data Problem
I would like to see the Ethereum community shift towards a more data-driven development mode. QuickBlocks can aid in that effort. One of the things I show in my presentation is that the solution to the Fall 2016 DDoS (the state-clearing hard fork) had a secondary effect on the trace data problem almost equal in magnitude to the original attack. In other words, the cure was almost as bad as the malady (from the perspective of pulling that data).
I'm not saying the community should have done anything differently; we needed to clean up the 20,000,000 empty accounts. But I am saying that deep, accurate, fast data from the chain should be a key part of future development efforts and of proposed hard forks such as PoS and sharding.