Inspiration

  • A member of the US Congress has tasked our team with designing a strategy to allocate resources to increase educational attainment.
  • We used data from the U.S. Department of Education Office for Civil Rights (OCR) to examine how school discipline affects educational attainment among students in poverty.

What it does

  • We created several visualizations showing how Advanced Placement (AP) enrollment and suspensions relate to poverty rates in Texas public schools, alongside two comparison states, California and Florida.

How we built it

  • For the data challenge text file, we counted the number of school districts in the United States for 2017-2018 using two different datasets. I used Excel, and Maddie used Python.
  • To complete tasks 1 and 2, my partner and I used PostgreSQL for the first time, loading the .csv files into it as relational tables. This allowed us to query and aggregate the data across multiple tables using shared variables. We later connected the database to Tableau, where I built all of the visualizations.
  • To upload the .csv files into PostgreSQL, my partner and I used ChatGPT to generate the SQL table definition for each file from its own column names and data types. Maddie did most of this work.
  • Some datasets had missing values; in those cases, Maddie used Python to clean the data and remove rows with null values before we imported it into PostgreSQL.
  • In addition to the school-level CRDC (Civil Rights Data Collection) .csv files provided, we used the SAIPE (Small Area Income and Poverty Estimates) dataset to determine poverty rates for school districts and compare them with other variables in our data. Maddie brought the SAIPE dataset into our SQL database, and I did the comparison in Tableau.

Challenges we ran into

Neither of us had used SQL before, but it was necessary for working with relational datasets, so it presented a significant learning curve. Maddie was also unable to contribute on the second day due to personal complications, so Margot had to complete the remaining tasks alone. With more time, we would have used R to finish cleaning the data (performing a join) for task 4, in order to analyze how income relates to academic performance in high school. We also would have done more data cleaning to convert variables that use negative numbers as labels into formats better suited to analysis.

Accomplishments that we're proud of

We successfully imported all of the data into PostgreSQL and were able to run queries and joins with it. With help from the mentors, I also learned how to perform joins and more complex mutate functions in R. In addition, I learned how to connect PostgreSQL to Tableau, which made building visualizations much easier.

What we learned

  • The benefits of using PostgreSQL as a database management system.
  • The importance of starting with the context rather than the tasks, so that we could better guide our work toward the final product.

What's next for Rowdy Datathon - Educational Attainment

More data science courses, and perhaps more use of SQL in the future!
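The district count from the data challenge (described under How we built it) can be sketched in Python as below. This is a minimal illustration, not the team's actual script: the helper `count_districts`, the column name `LEAID`, and the inline sample data are hypothetical, and the sketch assumes each file identifies districts by an ID column.

```python
import csv
import io


def count_districts(csv_text: str, id_column: str = "LEAID") -> int:
    """Return the number of distinct, non-blank district IDs in a CSV file's text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    district_ids = set()
    for row in reader:
        # row.get(...) may return None if a row is shorter than the header.
        value = (row.get(id_column) or "").strip()
        if value:
            district_ids.add(value)
    return len(district_ids)


# Hypothetical sample data: a school-level file (several schools per district)
# and a district-level file (one row per district).
school_level = """LEAID,SCHID,SCH_NAME
0100001,00001,School A
0100001,00002,School B
0100002,00003,School C
"""
district_level = """LEAID,LEA_NAME
0100001,District One
0100002,District Two
"""

school_count = count_districts(school_level)
district_count = count_districts(district_level)
print(school_count, district_count)
```

Counting distinct IDs rather than rows matters for a school-level file, where each district appears once per school; comparing the two counts is a quick consistency check between datasets.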
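The database workflow described under How we built it (build a table definition from each file's column names and types, load the .csv, then join on a shared district variable such as the SAIPE poverty rate) can be sketched with Python's built-in `sqlite3` module standing in for PostgreSQL. This is an assumption-laden illustration: the table names, the columns `LEAID`, `SUSPENSIONS`, `AP_ENROLLED`, `ENROLLMENT`, and `POVERTY_RATE`, the helpers `create_table_sql` and `load_csv`, and all sample values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data (not real CRDC or SAIPE values).
SCHOOLS_CSV = """LEAID,SCHID,SUSPENSIONS,AP_ENROLLED,ENROLLMENT
0100001,00001,5,20,400
0100001,00002,3,10,300
0100002,00003,12,2,250
"""
SAIPE_CSV = """LEAID,POVERTY_RATE
0100001,9.5
0100002,31.2
"""


def create_table_sql(table: str, columns: dict[str, str]) -> str:
    """Build a CREATE TABLE statement from column names and SQL types."""
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"CREATE TABLE {table} ({column_defs})"


def load_csv(conn: sqlite3.Connection, table: str, columns: dict[str, str], csv_text: str) -> None:
    """Create `table` and insert every CSV row, keeping only the declared columns."""
    conn.execute(create_table_sql(table, columns))
    names = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(row[name] for name in columns) for row in reader]
    # Table and column names come from this script, not user input, so f-strings are safe here.
    conn.executemany(f"INSERT INTO {table} ({names}) VALUES ({placeholders})", rows)


# Join school-level counts to district poverty rates and aggregate per district.
QUERY = """
SELECT s.LEAID,
       p.POVERTY_RATE,
       SUM(s.SUSPENSIONS) AS suspensions,
       SUM(s.AP_ENROLLED) AS ap_enrolled,
       SUM(s.ENROLLMENT) AS enrollment
FROM schools AS s
JOIN saipe AS p ON p.LEAID = s.LEAID
GROUP BY s.LEAID, p.POVERTY_RATE
ORDER BY p.POVERTY_RATE DESC
"""

conn = sqlite3.connect(":memory:")
load_csv(conn, "schools", {"LEAID": "TEXT", "SCHID": "TEXT", "SUSPENSIONS": "INTEGER",
                           "AP_ENROLLED": "INTEGER", "ENROLLMENT": "INTEGER"}, SCHOOLS_CSV)
load_csv(conn, "saipe", {"LEAID": "TEXT", "POVERTY_RATE": "REAL"}, SAIPE_CSV)
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

District IDs are declared as `TEXT` so leading zeros survive the load. In PostgreSQL itself the same `JOIN` and `GROUP BY` would apply, though a bulk load would more typically use `COPY` than row-by-row inserts.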
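The null-row cleaning, together with the recoding of negative-number labels that we did not have time to finish, might look roughly like this in Python. It is a sketch under stated assumptions: the helper `clean_rows` and the column names are hypothetical, and it assumes that any negative value in a numeric column is a label (for example, a code for missing or not-applicable data) rather than a real measurement.

```python
import csv
import io


def clean_rows(csv_text: str, required: list[str], numeric: list[str]) -> list[dict]:
    """Drop rows with a blank required field; recode negative numeric labels as None."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip any row where a required column is missing or blank (a "null row").
        if any(not (row.get(col) or "").strip() for col in required):
            continue
        for col in numeric:
            value = (row.get(col) or "").strip()
            if not value:
                row[col] = None
                continue
            number = float(value)
            # Assumption: negative values are labels/codes, not real measurements.
            row[col] = None if number < 0 else number
        cleaned.append(row)
    return cleaned


# Hypothetical sample: the second row has a blank SCHID, the third uses -9 as a label.
SAMPLE = """LEAID,SCHID,SUSPENSIONS
0100001,00001,5
0100001,,3
0100002,00003,-9
"""

cleaned = clean_rows(SAMPLE, required=["LEAID", "SCHID"], numeric=["SUSPENSIONS"])
for row in cleaned:
    print(row)
```

Recoding a label to `None` (rather than dropping the row) keeps the school in the data while excluding the label from sums and averages; dropping such rows entirely is the stricter alternative.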
