Inspiration

For this project, our team wanted to learn more about data science and the process of mining and analyzing data. Because we were working within the Education track at HackMIT this year, we decided to examine the role money plays in America's education system. These insights matter because analysis like this could inform public policy decisions and shape how Americans view the current education system. That, in turn, could change which candidate's education policy they believe is best for the nation and influence their vote in the upcoming election.

What It Does

This project analyzes how the revenues of U.S. states have changed over time and explores the relationships within that data.

How We Built It

We did this analysis in R using its built-in functions. We also took advantage of the tidyverse collection of packages, dplyr in particular.

Challenges We Ran Into

This project showed us that the logic of R is very similar to that of other programming languages we have used in the past, but that its syntax is very different. We were often frustrated by the syntax of the commands and by the format in which variables had to be passed to functions to produce our desired result. Learning R was fun and built our skills as data scientists, since R is very prevalent in the field. Still, we believe we could have achieved the same results in Python in a fraction of the time this project took us.

What We've Learned

Aside from the insights we drew from our data, we learned a lot more about data science by doing this project for HackMIT this year. Not only did our team learn a new programming language, R, but we also learned more about this growing field and whether we see ourselves exploring opportunities in it down the line.

From a data science perspective, this project taught us the importance of tidying data to reduce bias in the results. Null values must be handled appropriately so that the analysis is meaningful. When we started hacking, we decided to include every value from the dataset we pulled, which meant the U.S. territories were included in our analysis. However, these territories were represented inconsistently compared to the 50 states and the District of Columbia, which skewed the summary statistics and the regression models we fitted to the data. By removing them, we got higher correlation coefficients (r values) in our regression models, indicating a better fit. This project also showed us that visualizing data is a great way to make a preliminary estimate, by eye, of whether a relationship might exist. If a trend does appear visually, a full analysis can then examine whether it is statistically significant.
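The tidying steps described above can be illustrated with a minimal sketch. It is not our R code: it is written in plain Python, the column names (`state`, `enroll`, `revenue`), the territory labels, and every number in the toy rows are made up for illustration, and the real dataset's layout may differ. It drops rows with missing values, removes the territories, and compares the Pearson correlation coefficient before and after.

```python
from math import sqrt

# Hypothetical labels for non-state regions; the real dataset may name them differently.
TERRITORIES = {"PUERTO_RICO", "GUAM", "AMERICAN_SAMOA"}

def drop_missing(rows):
    """Keep only rows where both numeric fields are present (not None)."""
    return [r for r in rows if r["enroll"] is not None and r["revenue"] is not None]

def drop_territories(rows):
    """Keep only rows whose region is not in the territory list."""
    return [r for r in rows if r["state"] not in TERRITORIES]

def pearson_r(rows):
    """Pearson correlation coefficient between enroll and revenue."""
    xs = [r["enroll"] for r in rows]
    ys = [r["revenue"] for r in rows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up toy rows, NOT real figures: one row has a null value and one is a territory.
rows = [
    {"state": "MARYLAND", "enroll": 10.0, "revenue": 1.0},
    {"state": "GEORGIA", "enroll": 20.0, "revenue": 2.0},
    {"state": "MASSACHUSETTS", "enroll": 30.0, "revenue": 3.0},
    {"state": "OHIO", "enroll": None, "revenue": 2.5},
    {"state": "GUAM", "enroll": 40.0, "revenue": 0.5},
]

complete = drop_missing(rows)          # null handling first
states_only = drop_territories(complete)  # then remove the territories

r_all = pearson_r(complete)
r_states = pearson_r(states_only)
print(f"r with territories: {r_all:.3f}")
print(f"r without territories: {r_states:.3f}")
```

In this toy data the single territory row is an outlier, so removing it raises r. Whether the same happens on real data depends on how the territories are actually reported.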

Special Thanks

Ultimately, we want to thank the organizers of HackMIT for allowing our group to come together, learn new skills, and examine something that has affected each of us individually: the U.S. education system. Because the members of our team are from different states (Jacqueline from Maryland, Joshua from Georgia, and Ethan from Massachusetts), we learned more not only about how funding works for our own school systems but also about the role it plays in the national statistics. Through our participation in the Education track, we were able not only to learn more ourselves but also to produce some meaningful analysis of the education system as a whole!
