Our team laid the groundwork for this project by first surveying the most readily available and accessible data sets. We settled on FIFA World Cup data in CSV format because it was both free and already well organized. With global interest in soccer increasing, we aimed to use a large body of historical data (every World Cup match played through 2014) to provide context for which countries and regions are considered contemporary giants of the soccer world.

We used HTML and JavaScript with Google GeoChart to create a world map displaying the data by country. Our script read the analysis results from the CSV file and converted them into a format that could be displayed on the map.
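The conversion step can be illustrated with a minimal sketch. It is written in Python to match the analysis code, although our map script itself was JavaScript. The function name `csv_to_geochart_rows`, the "Win rate" column label, and the assumption of a two-column, header-less CSV are all hypothetical. The target shape, a header row followed by one `[country, value]` row per country, is the kind of array that GeoChart's `google.visualization.arrayToDataTable` accepts.

```python
import csv
import io


def csv_to_geochart_rows(csv_text):
    """Turn 'country,value' CSV text into a header row plus data rows.

    Assumes (hypothetically) two columns and no header line in the input.
    """
    rows = [["Country", "Win rate"]]  # header row; the label is an assumption
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        country, value = row
        rows.append([country, float(value)])
    return rows
```

Serializing the result with `json.dumps` gives an array literal that a page script could pass to the chart.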

In the analysis portion, the extract_data function first goes through every line in the WorldCupMatches CSV file and extracts every country that competed in the World Cup through 2014. These names are added to a dictionary. The function then loops through the file once more and determines whether each country won or lost each match by comparing its goals scored with the other team's. If the teams are tied, the function then checks for penalties: the team that scored more penalty kicks is counted as the winner. When penalty data was not provided, neither team received a win or loss point. These win/loss values were summed for each country. Win rate was calculated as wins / (wins + losses) and assigned to the respective country. A few countries had 0 wins and 0 losses; these were given an extra loss point to avoid division by zero. Each country and its win rate, rounded to 4 decimal places, were then written to an external CSV file and used as the data set for the map visualization.
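The win-rate rules above can be sketched as follows. This is an illustrative reconstruction, not the actual extract_data code: the function name `win_rates`, the dictionary keys (`home`, `away`, `home_goals`, `away_goals`, `home_pens`, `away_pens`), and the use of `None` for missing penalty data are assumptions, and it works on already-parsed match records rather than on the CSV file itself.

```python
def win_rates(matches):
    """Compute wins / (wins + losses) per team, rounded to 4 decimals.

    `matches` is an iterable of dicts with the hypothetical keys
    home, away, home_goals, away_goals, and optionally
    home_pens / away_pens (None or absent when no penalty data exists).
    """
    record = {}  # team -> [wins, losses]
    for m in matches:
        home, away = m["home"], m["away"]
        # Register both teams even if this match ends up undecided.
        record.setdefault(home, [0, 0])
        record.setdefault(away, [0, 0])

        h, a = m["home_goals"], m["away_goals"]
        if h == a:
            # Tied on goals: fall back to the penalty shoot-out, if recorded.
            h, a = m.get("home_pens"), m.get("away_pens")
            if h is None or a is None or h == a:
                continue  # no usable penalty data: no win or loss awarded

        winner, loser = (home, away) if h > a else (away, home)
        record[winner][0] += 1
        record[loser][1] += 1

    rates = {}
    for team, (wins, losses) in record.items():
        if wins == 0 and losses == 0:
            losses = 1  # extra loss point avoids division by zero
        rates[team] = round(wins / (wins + losses), 4)
    return rates
```

A team whose only matches were draws without penalty data ends up with a win rate of 0.0 under this rule, the same as a team that lost every match.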

We also analyzed the data in terms of how much each country has scored. We first read the CSV data with pandas. We then summed the goals each country scored and divided that total by the sum of the home-team and away-team goals. This gives us an idea of each team's share of the scoring in its matches and of how likely it is to score in future matches.
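A minimal pandas sketch of this scoring ratio follows. It assumes the denominator is the total number of goals (by both teams) in the matches a country played, which is one reading of the description above. The function name `goal_share` and the column names `home`, `away`, `home_goals`, and `away_goals` are hypothetical stand-ins for whatever the CSV actually uses.

```python
import pandas as pd


def goal_share(df):
    """Goals scored by each team divided by all goals in that team's matches.

    Expects the hypothetical columns home, away, home_goals, away_goals.
    """
    cols = ["team", "scored", "conceded"]
    # View each match once from the home side and once from the away side.
    as_home = df.rename(
        columns={"home": "team", "home_goals": "scored", "away_goals": "conceded"}
    )[cols]
    as_away = df.rename(
        columns={"away": "team", "away_goals": "scored", "home_goals": "conceded"}
    )[cols]
    per_team = pd.concat([as_home, as_away], ignore_index=True)

    totals = per_team.groupby("team")[["scored", "conceded"]].sum()
    all_goals = totals["scored"] + totals["conceded"]
    # A team whose matches were all goalless gets NaN (0 / 0) here.
    return (totals["scored"] / all_goals).round(4)
```

The result is a Series indexed by team name, so it can be written out with `to_csv` in the same country-and-value layout as the win-rate file.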
