The US Energy Information Administration provides a large dataset of over 12,000 houses across the country, spanning 900 different household factors (e.g., number of rooms, size of the garage, average temperature) related to annual energy consumption. The data seemed very useful but too expansive to work with directly, so we wanted to see if we could make it easier to find which factors influence your energy consumption the most.
What it does
We built a website where a user can submit some basic information about their household and living situation. That information is sent to our backend, where it is processed by our pre-trained deep neural network. The network then predicts the household's expected annual energy usage in kilowatt-hours (kWh) from those inputs, drawing on the more than 12,000 samples from previous years that we analyzed.
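As a rough illustration of the kind of regression network involved, here is a minimal Keras sketch; the layer sizes and the 12-feature input are hypothetical stand-ins, not our actual architecture:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_features: int) -> keras.Model:
    # Small feed-forward regressor: features scaled to [0, 1] go in,
    # a single linear output (predicted annual kWh) comes out.
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1),  # linear activation, as is standard for regression
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

model = build_model(n_features=12)
# One sample of 12 normalized features -> one (untrained) kWh prediction
prediction = model.predict(np.random.rand(1, 12), verbose=0)
```

In a real pipeline the model would be trained on the normalized survey data and the output mapped back to kWh before being returned to the user.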
How we built it
The website was built with Vue.js, the backend was written in Flask, and the machine learning was done with Keras, a high-level Python library that runs on top of TensorFlow. Additional regression analysis was conducted in R.
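A stripped-down sketch of how a Flask endpoint like ours can receive the user's features as JSON and hand them to a model; the route name, field names, and placeholder prediction are illustrative, not our actual code:

```python
from flask import Flask, jsonify, request
import pandas as pd

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # The client POSTs a flat JSON object of household features,
    # e.g. {"num_rooms": 5, "sqft": 1800, ...}.
    payload = request.get_json()
    features = pd.DataFrame([payload])  # one-row DataFrame, one column per feature
    # In the real app: kwh = float(model.predict(features.to_numpy()))
    kwh = 10500.0  # placeholder so the sketch runs standalone
    return jsonify({"predicted_kwh": kwh, "n_features": features.shape[1]})
```

Wrapping the JSON dict in a list (`pd.DataFrame([payload])`) is what turns a single request into the one-row frame a model expects.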
Challenges we ran into
Training the neural network on a multi-dimensional vectorized input was difficult because it required normalizing each input to the range [0, 1], retrieving an output, and then mapping that output back to its original range. We also ran into trouble converting the JSON payload of the client's POST request into the pandas DataFrame we needed to feed into the DNN.
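The scaling round-trip can be sketched with min-max normalization in pandas; the column names and numbers below are made up for illustration:

```python
import pandas as pd

def fit_minmax(df: pd.DataFrame):
    # Record each column's range from the training data.
    return df.min(), df.max()

def normalize(df, lo, hi):
    # Scale every feature (and the target) into [0, 1], column-wise.
    return (df - lo) / (hi - lo)

def denormalize(value, lo, hi):
    # Map a model output back to its original units (e.g. kWh).
    return value * (hi - lo) + lo

train = pd.DataFrame({"sqft": [800, 1600, 2400],
                      "kwh":  [6000, 9000, 15000]})
lo, hi = fit_minmax(train)
scaled = normalize(train, lo, hi)   # every column now spans [0, 1]

pred_scaled = 0.5                   # hypothetical network output
pred_kwh = denormalize(pred_scaled, lo["kwh"], hi["kwh"])  # -> 10500.0
```

The key detail is reusing the same per-column min and max for both directions, so the network's [0, 1] prediction lands back in real kWh.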
We needed to reduce the dimensionality of the dataset from 900 distinct variables to a manageable number that a user could reliably enter into a web app. We tackled this problem with two approaches. The first was a machine learning technique called Principal Component Analysis (PCA), which finds uncorrelated components within the dataset and reduces its dimensionality along combinations of the original variables. The second was to run a series of regressions between each variable and the observed energy usage, to find the variables with the highest correlation values. From these two lists we then picked the set of variables we thought would best fit our project.
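Both approaches can be sketched with scikit-learn and pandas on synthetic stand-in data (we actually did the regressions in R; the feature names, variance threshold, and synthetic target here are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for the survey data: 200 households x 6 of the 900 factors.
X = pd.DataFrame(rng.random((200, 6)), columns=[f"f{i}" for i in range(6)])
# Synthetic "annual kWh" driven mostly by f0, slightly by f3.
y = 3 * X["f0"] + 0.5 * X["f3"] + rng.normal(0, 0.1, 200)

# Approach 1: PCA — how many uncorrelated components capture
# (say) 95% of the variance in the feature set?
pca = PCA().fit(X)
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cum_var, 0.95)) + 1

# Approach 2: rank each feature by |correlation| with the target
# and keep the strongest few.
corrs = X.apply(lambda col: col.corr(y)).abs().sort_values(ascending=False)
top_features = list(corrs.index[:3])
```

Combining the two lists, as the writeup describes, trades a little predictive power for features a user can actually type into a form.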
Accomplishments that we're proud of
Incorporating machine learning and statistical analysis on a large dataset was both challenging and rewarding for all of us, because we had never done it before and learned a lot along the way!
What we learned
We learned how to use machine learning libraries in Python and what to consider when training a neural network to perform regression.
What's next for Uncover Your Usage
After improving our model's accuracy and front-end UI, we can expand our product to take in a larger set of features and better help users determine the causes of their continually increasing energy bills.