Our challenge was to let engineers see more of their data than a default line chart allows. As sensors record more data points (we were asked to analyze roughly 130,000), line charts not only run into scaling and unit issues but also clump points together, which reduces the information that can be read from any single point. Since a typical full HD display is only 1,920 pixels wide, we would need 68 full HD screens side by side to show each point as its own pixel. Our solution was to turn this one-dimensional trend into a multi-dimensional graphic that makes any particular point easier to reach.
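The screen estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the rounded figure of 130,000 points; the constant names are our own.

```python
import math

DATA_POINTS = 130_000      # rounded size of the data set
SCREEN_WIDTH_PX = 1_920    # width of one full HD display in pixels

# Screens needed if every data point gets its own pixel column
screens_needed = math.ceil(DATA_POINTS / SCREEN_WIDTH_PX)

# Points that would share one pixel column on a single screen
points_per_pixel = DATA_POINTS / SCREEN_WIDTH_PX

print(screens_needed)              # 68
print(round(points_per_pixel, 1))  # 67.7
```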
What it does
The data is laid out as a matrix. The oldest data point sits in the bottom left corner, and each row runs left to right through 288 five-minute increments, so one row holds one day’s worth of data. To fit the 130,000+ data points, the matrix has 453 rows. For each measured output, we created an interactive chart in which viewers can enlarge any section, letting engineers inspect individual data points as well as specific time frames. Each data point is also shaded on a color scale, based on a logarithmic scaling of its value relative to the average. This layered heat map gives surveillance engineers another way to observe the data visually rather than numerically.
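The layout and shading described above could be sketched roughly as follows. This is not our original script: the function names are hypothetical, and two details are assumptions, namely padding an incomplete final day with `None` and reading the color value as `log10(value / mean)` while skipping non-positive readings.

```python
import math

POINTS_PER_DAY = 288  # 24 h * 60 min / 5 min per reading


def to_day_matrix(values, width=POINTS_PER_DAY):
    """Split a chronological sequence into rows of `width` readings.

    Row 0 holds the oldest day. An incomplete last row is padded
    with None (an assumption, so that every row has equal length).
    """
    values = list(values)
    rows = [values[i:i + width] for i in range(0, len(values), width)]
    if rows and len(rows[-1]) < width:
        rows[-1].extend([None] * (width - len(rows[-1])))
    return rows


def log_ratio_to_mean(matrix):
    """Map each positive reading to log10(value / mean of positive readings).

    Missing or non-positive cells become None, since a logarithm is
    undefined for them; how such readings should be handled is left open here.
    """
    valid = [v for row in matrix for v in row if v is not None and v > 0]
    mean = sum(valid) / len(valid)
    return [
        [math.log10(v / mean) if v is not None and v > 0 else None for v in row]
        for row in matrix
    ]


# Small synthetic example: 600 readings span two full days and part of a third
matrix = to_day_matrix(range(1, 601))
shaded = log_ratio_to_mean(matrix)
```

A matrix like `shaded` could then be handed to a heat map as its grid of color values. With Plotly's heat map trace, the first row is normally drawn at the bottom, which would match the oldest-day-at-the-bottom layout, though that is worth checking against the plotting setup actually in use.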
How we built it
After settling on a general solution to the problem, we split into two groups: half of us worked on the front end while the other two worked on the back end. Both groups found it easy to adapt pre-existing objects and libraries to our needs. Those of us on the front end found a template for a polished, interactive webpage, which we divided into sections for our different outputs. The free ‘.tech’ domain given to competitors and GitHub’s “Pages” worked hand in hand to host our site. Those on the back end found a service called plot.ly that greatly helped with formatting the data set and displaying it as a reworked layered heat map. After formatting the data with Python and writing a few quick functions, we were able to upload our data through their client and easily access it on our webpage.
Challenges we ran into
The two main issues we ran into were querying the data and handling outliers, particularly in the temperature datasets. We hosted our website on GitHub and failed to realize that the hosted files are static, which prevented us from running scripts and greatly inconvenienced our plans to allow querying. In addition, although plot.ly is a publicly accessible software platform, it limits how much a public account can use it. Because we could perform only 100 image exports and store only 25 files, we were unable to format the charts to the extent that we wanted.
Accomplishments that we’re proud of
Considering that we are all first-year students at Rice University and that this is our first hackathon, we are very pleased that we managed to create something functional. We are also proud that our website is cross-platform, has a custom domain name, and is aesthetically appealing.
What we learned
Things we learned include: division of labor, caffeine, terminal, bash, Python, GitHub, troubleshooting, time management, usability of external software, and how powerful it really is to use things that are already made (don’t reinvent the wheel, just roll it).
What's next for SLB Datasets
Next, we should load the data set into a database such as MySQL and query it, so that we can manipulate and view the data in a more abstract manner. In addition, for each output from each sensor, we currently have an individual Python file that formats the data and applies plot.ly to it. This process could be streamlined by using a single file for all outputs.
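As a rough illustration of the querying idea, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, so that it runs without a database server. The table layout, the function names, and the sample readings are all made up for the example. Because the sensor output is a query parameter, one function can serve every output, which is also the spirit of the single-file cleanup.

```python
import sqlite3


def build_db(readings):
    """Load (sensor, timestamp, value) tuples into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, ts TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", readings)
    conn.commit()
    return conn


def query_window(conn, sensor, start, end):
    """Return (ts, value) rows for one sensor with start <= ts < end, oldest first."""
    cur = conn.execute(
        "SELECT ts, value FROM readings "
        "WHERE sensor = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (sensor, start, end),
    )
    return cur.fetchall()


# Illustrative sample data only; ISO-style timestamps sort correctly as text
sample = [
    ("temperature", "2016-01-01T00:00", 21.5),
    ("temperature", "2016-01-01T00:05", 21.7),
    ("pressure", "2016-01-01T00:00", 101.3),
    ("temperature", "2016-01-02T00:00", 22.0),
]
conn = build_db(sample)
rows = query_window(conn, "temperature", "2016-01-01T00:00", "2016-01-02T00:00")
print(rows)
```

The same SQL would carry over to MySQL with little change, although the connection setup and the parameter placeholder style depend on the client library chosen.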