Soccer Dad at Soccer Quantified provides a nice cautionary tale about the overreliance on statistics in trying to predict the outcome of individual matches like those in the World Cup. He links to my "Statistics Are Just Numbers" post, and after reading his post I feel compelled to comment on the subtle differences between the statistics I have used during this World Cup.
There are two types of statistics that I have used to predict performance in this tournament - longer term demographic data within the Soccernomics model and shorter term team performance data within the Footballer-Rating.com model. They both serve good purposes at different stages of the tournament. Let me explain.
Soccernomics Model in Group Play
The Soccernomics regression uses demographic data and match results over decades to predict the most likely outcome of a match between two teams. That is to say, it will tend to make better predictions over larger sample sizes where individual matches have less impact (i.e. not win-or-go-home). Team adrenaline and short-term boosts have less of an impact in group play matches than in knockout round matches, both because more games must be played and because each team knows that it must perform over those games. This makes the Soccernomics model better suited for group play, although many statisticians might argue that three group play matches per team and 48 total matches might still be a smaller-than-ideal sample size for such a model. I will let my last post be the judge of that.
There is an additional benefit of using the Soccernomics model in the group play rounds. It not only provides a reasonable prediction over time, but it also plugs a gaping hole - no player rating data exists for such matches. As much as we'd like to have a more direct method to compare the teams in their group play matches, there is very little short-term data available to make such comparisons. The high-pressure qualifying matches that might be the closest environment to that faced by World Cup teams are completed six months prior to the start of the final tournament. Team composition, chemistry, and momentum have changed greatly in those six months. The friendlies that most teams play as warm-ups to the tournament are just that - they are good warm-up matches, but they are nowhere near as competitive as a World Cup match. Thus, we must turn to longer-term demographic data at the outset of the tournament to get a reasonable idea of which teams are going to make it out of group play and which teams aren't.
Footballer-Rating Model in Knockout Rounds
The knockout rounds are a different matter. After three matches of group play action, we can quantify how well a team did against World Cup caliber teams in a World Cup setting. We can begin to understand which teams are lucky to be in the knockout rounds based upon their play to date, and which ones got there by dominating their competition. At this point, a player or team-based metric that evaluates actual performance should provide a better gauge of near-term performance than a long-term, demographics-based model. Enter Footballer-Rating's player metric model, which got seven of the eight Round of 16 matches correct, compared to the Soccernomics model, which got five of the eight correct. Statistically, the chance that the Soccernomics model predicts the outcome of the knockout matches more accurately than a coin flip is only 16%, while the Footballer-Rating method's chance is 76%. Why is that?
Recall from my last post that the ability of the Soccernomics model to accurately predict a "not lose" situation greatly increases once predicted goal differentials rise above 0.5. The problem is that most of the disparity in team performance based upon demographics is gone by the time the knockout rounds start. There were only two matches in the Round of 16 where Soccernomics predicted goal differentials larger than 0.5, and in both cases the predicted winner actually lost the match. In one case the Footballer-Rating prediction greatly favored the opposite team, and in the other case it predicted a virtual draw (which did occur during regulation and extra time). Combine these low goal differentials with the fact that the knockout rounds cannot produce a tie, and we see that the Soccernomics model is not a good fit for knockout round predictions. The Footballer-Rating model, which looks at recent play at the same tournament, naturally provides better predictions.
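For readers who want to check a knockout record against coin flipping themselves, here is a minimal sketch of one conventional way to do it: a one-sided binomial tail, i.e. the probability of getting at least that many picks right out of eight by guessing. The choice of test and the helper name `prob_at_least` are my assumptions for illustration; this is not necessarily how the 16% and 76% figures above were computed, and it need not reproduce them.

```python
from math import comb


def prob_at_least(correct: int, matches: int, p: float = 0.5) -> float:
    """Chance of at least `correct` right picks in `matches` tries
    when each pick is right with probability `p` (0.5 = coin flip)."""
    return sum(
        comb(matches, k) * p**k * (1 - p) ** (matches - k)
        for k in range(correct, matches + 1)
    )


# Round of 16 records quoted in the post (out of 8 matches).
records = {"Footballer-Rating": 7, "Soccernomics": 5}

for model, correct in records.items():
    tail = prob_at_least(correct, 8)
    print(f"{model}: {correct}/8 correct, "
          f"coin-flip chance of doing at least this well = {tail:.3f}")
```

The smaller the tail probability, the harder it is to explain the record by luck alone. With only eight matches, though, either number should be read loosely.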
Quarterfinal Predictions
Based upon the accuracy of the two models in the two different phases of the tournament, I will be using the Footballer-Rating model for the rest of the tournament. However, it must be said that there is one shortcoming with that model - the authors of the website that compiles the team ratings are not updating it with knockout round match data. I would have liked to see such updates, as team performance can change throughout the knockout rounds as the quality of opponents continues to rise. Nonetheless, Figure 1 shows the Footballer-Rating team average scores from the group play stage.
Figure 1: Footballer-Rating average team scores from group play matches
From a previous post we know that a gap of 0.7 or greater means that the favored team's chance of winning that match is 75%. With each match exceeding this threshold, we're likely looking at a semifinal round with Brazil, Ghana, Argentina, and Paraguay. Given that Ghana's predicted gap is the smallest and that Uruguay has a stiff defense, I would expect the most likely upset to come from that match. I think Germany has a reasonable shot at slowing down Argentina. The Argentines are flying so high right now that they simply must regress to the mean at some point, and teams that perform so well and all of a sudden hit their first serious challenge often choke up a little bit. However, while I think the Argentina/Germany match may be close, I don't think the Germans have enough to beat Argentina.
So, here's my prediction for semifinalists: Brazil, Uruguay, Argentina, Spain. It's been a CONMEBOL dominated World Cup. Why stop now?