Inspiration

Player performance, social network interaction and international transfers in professional soccer make an exciting field to work in for YB fans. Understanding how these building blocks depend on each other, and analysing real data to do so, is a great inspiration.

Social media and text analysis are currently very challenging topics. We were inspired to build solutions that work for German texts rather than sticking to the omnipresent English samples.

What it does

The tool automatically scrapes social media profiles and selected news sources (Blick, 20 Minuten, SRF and YB Spielberichte so far; it can be extended to arbitrary news sources).

The news articles are analysed in two ways:

  • Entities are extracted automatically, i.e. who the article is about and which organizations and places appear in the text
  • The overall wording is evaluated and scored as positive or negative
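The two analysis steps listed above could look roughly like the following minimal sketch. It is not the project's actual algorithm: it assumes a simple gazetteer lookup for entities and a word-list (lexicon) approach for sentiment. The names `GAZETTEER`, `SENTIMENT_LEXICON`, `extract_entities` and `score_sentiment` are hypothetical, and the lexicon entries and their weights are made-up placeholders.

```python
import re

# Hypothetical toy lexicon: German word (lower-case) -> polarity in [-1, 1].
# The weights are illustrative placeholders, not tuned values.
SENTIMENT_LEXICON = {
    "stark": 0.8,
    "souverän": 0.7,
    "sieg": 0.9,
    "schwach": -0.7,
    "niederlage": -0.9,
    "fehler": -0.6,
}

# Hypothetical gazetteer: surface form -> entity type.
# "Mustermann" is a placeholder name, not a real player.
GAZETTEER = {
    "YB": "ORG",
    "Bern": "LOC",
    "Mustermann": "PER",
}

# \w matches Unicode letters in Python 3, so umlauts stay inside tokens.
TOKEN_RE = re.compile(r"\w+")


def extract_entities(text):
    """Return sorted (surface form, type) pairs for gazetteer entries in the text."""
    tokens = set(TOKEN_RE.findall(text))
    return sorted((name, kind) for name, kind in GAZETTEER.items() if name in tokens)


def score_sentiment(text):
    """Average the lexicon scores of all known words; 0.0 if no word is known."""
    words = (token.lower() for token in TOKEN_RE.findall(text))
    scores = [SENTIMENT_LEXICON[word] for word in words if word in SENTIMENT_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0
```

A call such as `extract_entities("Mustermann spielte stark, YB feiert den Sieg in Bern.")` returns the three gazetteer hits, and `score_sentiment` on the same sentence averages the scores of "stark" and "Sieg". A word-list approach ignores negation and context, so it is only a starting point for German texts.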

Performance

How I built it

We used Python and some of the "default" data science tools on top of it. We collected as much data as we could in the short time available. Analyses were conducted in data science notebooks (Jupyter) and then transferred into a REST API written in Python and a frontend written in HTML5 and JavaScript.
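To illustrate the step from notebook analysis to a REST API, here is a minimal sketch that uses only Python's standard library. The write-up does not say which framework the project used, so `http.server` is an assumption made for the example. The `/sentiment` endpoint, the `player` query parameter, the `PLAYER_SCORES` table and its values are all hypothetical placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory results with placeholder values; in practice these
# would be produced by the notebook analyses.
PLAYER_SCORES = {"mustermann": {"articles": 3, "mean_sentiment": 0.4}}


def build_response(path):
    """Map a request path to an (HTTP status, JSON-serialisable body) pair."""
    parsed = urlparse(path)
    if parsed.path != "/sentiment":
        return 404, {"error": "unknown endpoint"}
    player = parse_qs(parsed.query).get("player", [""])[0].lower()
    if player not in PLAYER_SCORES:
        return 404, {"error": "unknown player"}
    return 200, {"player": player, **PLAYER_SCORES[player]}


class SentimentHandler(BaseHTTPRequestHandler):
    """Serves GET /sentiment?player=<name> as JSON."""

    def do_GET(self):
        status, body = build_response(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), SentimentHandler).serve_forever()
```

Calling `serve()` starts the server, after which a frontend could request `http://127.0.0.1:8000/sentiment?player=Mustermann`. Keeping the routing in the plain function `build_response` makes it testable without opening a socket.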

Challenges I ran into

Several sites that present aggregated statistics are very hard to harvest automatically. News sites were rather easy in comparison.

Sentiment analysis of non-English texts is a challenge.

Real statistical evidence for the findings could not be generated in the given time.

The available soccer statistics favour offensive players: they count goals and the like, but no statistics reflect good defensive play.

Accomplishments that I'm proud of

The overall positive or negative "mood" of the press reports matches the performance of the players quite well. There is a bias here: players who score goals feature more prominently in the press and in public opinion.

What I learned

We learned how to analyse media texts and correlate the extracted entities with the players to generate an overall view of each player.

What's next for Team 15 Kosmonauten Mobi-Challenge

We will try to collect a few more datasets and calculate statistical evidence for some of the findings. We also plan to present the sentiment algorithm in a blog post, to share the non-English example with the data science community.
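One possible way to compute such statistical evidence is sketched below, assuming the goal is to test whether per-player press sentiment and a per-player performance figure move together. It computes a Pearson correlation and a permutation-based p-value. The function names `pearson` and `permutation_p_value` and the choice of test are assumptions for illustration, not something the project has already done.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, rounds=2000, seed=0):
    """Estimate how often a random pairing yields |r| at least as large as observed."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point noise does not hide exact ties.
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (rounds + 1)
```

Here `xs` could hold the mean sentiment per player and `ys` a performance figure such as goals per game. The permutation test avoids distributional assumptions, which suits the small samples a hackathon dataset is likely to provide.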

Perhaps we will try to develop score values for some of the soccer-specific vocabulary.
