Inspiration
We were inspired to focus our efforts on domestic youth development by a quote from Jurgen Klinsmann. In 2014 the former USMNT coach told FIFA.com: "Maybe we can find someone kicking a ball around the streets ... Maybe there is a Messi hiding somewhere here in the States. Who knows?"
With the US failing to qualify for the 2018 World Cup, attention turned quickly from the problems on the field to the state of the US development system. Were American kids with talent or potential talent being located and developed by the current system? Or was the "pay to play" system skewing the focus of youth development towards kids in the suburbs and large cities who can easily afford to participate in high-level soccer?
Sharing Klinsmann's hope that the national soccer program can succeed, we set out to answer this question: given a county's population and income, how much USMNT youth talent should we expect it to produce, and which underrepresented areas could yield great talent, perhaps an American Messi?
What it does
Our application shows which counties in America contribute more players than expected to the US Men's U-15 through U-23 national teams, which contribute about as many as expected, and which contribute fewer than expected. The expectations are based on US Soccer call-up rosters, county population, and median household income. The counties that contribute fewer players than expected are where the opportunities lie to find the first American international superstar soccer player.
How we built it
We started by writing a scraper that crawled ussoccer.com and compiled the names and hometown cities of all of the players called up for the US Men's U-15 through U-23 teams. We then mapped those players to their home counties. We gathered the populations and median household incomes of all US counties from the Census Bureau and appended those to the player data. Then we ran regression analyses on those data to estimate how many US Men's youth team players a typical US county produces, given its population and median household income. Finally, we plotted the residuals on a county-level map of the United States to show which counties were contributing more youth players than expected, which were contributing as expected, and which were contributing fewer than expected.
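The regression and residual steps can be sketched roughly as follows. This is a minimal illustration on synthetic data, not our actual notebook: the column names, the rescaled predictors, the plain OLS specification, and the one-standard-deviation cutoff for "as expected" are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the county table; every number here is invented.
rng = np.random.default_rng(0)
n_counties = 200
counties = pd.DataFrame({
    "fips": [f"{i:05d}" for i in range(1, n_counties + 1)],  # placeholder codes
    "population": rng.integers(10_000, 2_000_000, size=n_counties),
    "median_income": rng.integers(30_000, 120_000, size=n_counties),
})

# Fake call-up counts loosely tied to population and income, plus noise.
rate = 2e-6 * counties["population"] + 3e-5 * (counties["median_income"] - 30_000)
counties["players"] = rng.poisson(rate.to_numpy())

# Rescale predictors so the fit is numerically well behaved.
counties["pop_millions"] = counties["population"] / 1e6
counties["income_thousands"] = counties["median_income"] / 1e3

# Ordinary least squares: expected players given population and income.
model = smf.ols("players ~ pop_millions + income_thousands", data=counties).fit()
counties["expected_players"] = model.fittedvalues
counties["residual"] = model.resid  # actual minus expected

# Assumed cutoff: within one residual standard deviation counts as "as expected".
threshold = counties["residual"].std()
counties["status"] = np.select(
    [counties["residual"] > threshold, counties["residual"] < -threshold],
    ["more than expected", "fewer than expected"],
    default="as expected",
)

print(counties[["fips", "players", "expected_players", "residual", "status"]].head())
```

The `status` column is what would be joined to county shapes for mapping. Counties labeled "fewer than expected" are the ones worth a closer look.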
Challenges we ran into
We ran into a few challenges: irregular screen scraping, organizing data from different sources into the same geographic unit, joining the data, and accounting for numerical similarities and anomalies.
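To give a flavor of the joining problem, here is a small sketch in pandas. It assumes a hypothetical city-to-county lookup table and a Census extract whose county codes lost their leading zeros. The player names, the lookup table, and the population figures are placeholders invented for the example.

```python
import pandas as pd

# Scraped roster rows with inconsistent hometown formatting (placeholder names).
players = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player D"],
    "hometown": ["Dallas, TX", " dallas, tx", "Fresno, CA", "Springfield, ??"],
})

# Hypothetical lookup from a normalized "city, state" key to a 5-digit county code.
city_to_county = pd.DataFrame({
    "city_key": ["dallas, tx", "fresno, ca"],
    "county_fips": ["48113", "06019"],
})

# Census-style table where codes were read as integers, dropping leading zeros.
census = pd.DataFrame({
    "fips": [48113, 6019, 1001],
    "population": [1000, 2000, 3000],  # placeholder values, not real figures
})

# Normalize the join keys on both sides.
players["city_key"] = players["hometown"].str.strip().str.lower()
census["county_fips"] = census["fips"].astype(str).str.zfill(5)

# Left join with an indicator column so unmatched hometowns are easy to review.
matched = players.merge(city_to_county, on="city_key", how="left", indicator=True)
unmatched = matched.loc[matched["_merge"] == "left_only", "player"].tolist()

# Count players per county and attach the counts to every county, including zeros.
counts = (
    matched.dropna(subset=["county_fips"])
    .groupby("county_fips")
    .size()
    .rename("players")
    .reset_index()
)
county_table = census.merge(counts, on="county_fips", how="left")
county_table["players"] = county_table["players"].fillna(0).astype(int)

print(unmatched)
print(county_table)
```

Keeping counties with zero players in the table matters, since the regression needs to see the counties that produced no call-ups as well as those that did.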
Accomplishments that we're proud of
We accomplished several things. We developed information about youth development that is predictive and not merely descriptive. We used real-world data to reveal parts of America where even we did not know there could be more international-level youth players. We all learned more about Jupyter and regression analysis. And we all learned more about teamwork and friendship, keeping it together even with the lack of time and sleep. It was a good time.
What we learned
Through running regression analyses, the data brought us to these insights:
- Population matters. The population of a county aligns with the number of prospects that will be called to the national team. Thus, attention and resources should be allocated to populous areas that have underperformed expectations.
- Income matters more. Much more. Youth players are more likely to be called for duty with the national team at every level as the income of their home county rises. The wealth of that area has a statistically significant effect on a prospect's chances. As a statistical overview of what we all know to be a significant issue for the national program, our project creates a benchmark of the reality and expectations for youth players who have the financial ability to play high-level youth soccer and for those young players who lack the opportunity to be discovered.
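One way to compare how much two predictors matter is to put them on the same scale before fitting and then read the coefficients and p-values side by side. The sketch below does that with statsmodels on synthetic data. The data, the column names, and the choice of z-scoring are assumptions for illustration, so the printed numbers say nothing about the real counties.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic counties; all values are invented for the example.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "population": rng.integers(10_000, 2_000_000, size=n).astype(float),
    "median_income": rng.integers(30_000, 120_000, size=n).astype(float),
})
rate = 1e-6 * df["population"] + 4e-5 * (df["median_income"] - 30_000)
df["players"] = rng.poisson(rate.to_numpy())

# z-score each predictor so a coefficient means "players per one standard deviation".
for col in ["population", "median_income"]:
    df[col + "_z"] = (df[col] - df[col].mean()) / df[col].std()

fit = smf.ols("players ~ population_z + median_income_z", data=df).fit()

# Standardized coefficients and their p-values, one row per predictor.
summary = pd.DataFrame({"std_coef": fit.params, "p_value": fit.pvalues}).drop(index="Intercept")
print(summary)
```

A larger standardized coefficient with a small p-value is the kind of evidence behind a statement like "income matters more", though the exact model behind our insights may differ from this sketch.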
What's next for American Messi
The USSF can use our work and augment it with additional data and analysis. Our model makes predictions that can be tested in the real world: the USSF can re-examine past scouting efforts or send additional scouting resources to the underrepresented counties of America to find international-caliber men's soccer players.
Built With
- pandas
- python
- scrapy
- statsmodels
- us-census-bureau


