Inspiration
Although we are proficient programmers, we set out to prove that being a valuable data analyst primarily consists of many more important skills such as data visualization and story telling. The hardest part in data analysis, we know, is sorting through the gigabytes (or sometimes terabytes) of information for trends that improve business outcomes. Power BI was our inspiration, being a low-code option for powerful charts and insights. It also doubles as your database, eliminating the need to learn databases such as Postgres or languages like SQL for modestly sized datasets. It cuts down on time, effort, and knowledge needed for business insights. The PowerQuery editor also saves your transformations on the data.
What it does
We took the advanced track and analyzed enrollment vs poverty, distribution of loan stats, and number of students enrolled in advanced mathematics vs mortgages. We offer a selection of filters to present the data in various views through online analytical processing. It is published to the Power BI cloud services for viewing in a web browser, where it can easily integrate with other Microsoft products such as Azure Data Factory, or Microsoft PowerPoint.
How we built it
We used a Python script to fix newlines in one of the source files. After that, we proceeded to learn about the datasets and load them into Power BI desktop utilizing the PowerQuery editor and M language. This tool allows you to create many intricate transformations such as type conversions, math, and text extraction with the click of a button. Simultaneously, we checked the columns in the tables which could facilitate relationships. In Power BI, relationships are used to join data while also filtering the view; thus, we needed to carefully evaluate the design of our table relationships and the direction the relationships pointed in. After this, we could easily drag and drop the columns into our graphs. We used DAX to create many custom formulae while exploring the columns and correlations within the dataset.
Challenges we ran into
We attempted to do many custom calculations using DAX in PBI. However, we could not get the filter context to pass into the DAX expression properly. It was confusing trying to navigate some of the table trees, as they were done quickly rather than properly. Part of this was due to not having complete listings of IDs and codes.
Accomplishments that we're proud of
We did not use SQL and only 5 lines of Python code to fix errors! We also did not use any database technologies. Most of our work was simple point-and-click that anyone can accomplish if you look for the button hard enough. We are proud to have deviated from the traditional route we have been taught in school.
What we learned
The best way to approach a data analysis problem is to fully understand what each of the acronyms and columns mean, and to chart out the potential relationships. Knowledge was really half the battle and the solution to all of our problems. We also learned how confusing DAX expressions become once you start reaching into data from other tables with many-to-many cardinalities!
What's next for Educational Attainment Exploratory Data Analysis in Power BI
We would like to do a lot more data validation and subsequent data cleaning. We identified many anomalies in the dataset that didn't seem right to us. These could be outliers; however they could also be the result of incorrectly recording the data. We would also like to do more robust data modeling. We would like to make fully fledged dimension and fact and dimension tables - a star schema or even a more robust snowflake schema - and have complete listings of many of the IDs and codes found in the dataset. This would enable us to optimize online analytical processing to handle data at scale and explore the data even further.
Built With
- powerbi
Log in or sign up for Devpost to join the conversation.