Thursday, February 24, 2011

Comparing Econometric Models of the English Premier League: Reconciling the TPI and Soccernomics Data Sets

Note: This is a re-post of analysis I did in January 2011 for the Transfer Price Index blog. I am posting it here to complete my series of posts on squad transfer costs, and to set up a forthcoming series of posts on the impact of starting XI transfer costs on table position.

I’ve participated in many discussions since my original post on the relationship between a squad’s current transfer cost and its table position. Much of the debate has centered on the predictive power of the Soccernomics wage data versus my analysis using current transfer costs. Many readers on The Tomkins Times have come to the same general conclusions I have: each analysis has its valid points and different uses, and the two are not necessarily in conflict with each other.

I’ve also had the pleasure of discussing the two studies with none other than Stefan Szymanski. I plan on keeping much of our conversation private, but you can get a sense of his respect for the overall Transfer Price Index approach and the differences in the two data sets via his review of Pay As You Play. Stefan’s review is a positive one, summarized best in the following observation.

“[I]n a fascinating new book Paul Tomkins, Graeme Riley and Gary Fulcher have developed a method of converting transfer fee data into a squad valuation… With every squad member given a value, this can then be used to compare spending to performance in the league. It is a true labour of love, collecting all the transfer fee values for Premier League clubs going back to the beginning of the 1990s.”
Szymanski closes out his review with this glowing recommendation:

“The book is a treasure trove of interesting financial facts and would make a great gift for any football statto…”
What’s interesting is how strongly the Soccernomics wage data correlates with the TPI’s cost of the starting XI. Both of Stefan’s metrics in the column are relative measures (RW for wages and R£XI for relative starting squad cost), and he observes a 90% correlation between them. Unfortunately, the Evening Standard did not include the very compelling graph Stefan generated as part of his review of Pay As You Play. Luckily, Stefan has supplied us with that graph and it is reproduced below.


The graph clearly demonstrates the correlation between the two metrics, the weakness of the models at either end of the table, and the strength of the model in the middle of the table. Stefan’s observation that the models over-predict the resources needed for top table positions has been invaluable in explaining the discrepancy between regression predictions and historical data on Champions League qualification, which will be discussed in an upcoming post.

Stefan’s review rightfully points out the reliability of the publicly audited wage data versus the TPI’s privately compiled transfer data. At the same time, I would stand by the TPI as the most comprehensive and meticulously compiled set of transfer data within the English Premier League era. It was indeed a “labour of love” for the authors, a labour that continues to pay dividends in our financial understanding of the league.

Beyond the quality of the data and its impact on any resultant statistical analysis, Stefan’s data set has a bit of an advantage over the TPI. The Soccernomics wage data looks at overall team wages, thus taking into account the total cost of operating the squad in current British pounds. Combine this with the fact that wages are a dynamic measure adjusted over time by team and player, while the TPI is a static inflation of a one-time transfer fee, and we see why wages may be a better predictor of actual team success. It’s also no surprise that the £XI metric correlates very well with that wage data, as it takes into account all the players who have made it on the pitch and how much time they spent on it. There’s no dead weight contributing nothing to the team’s performance on the pitch, good or bad.

Indeed, analysis by Graeme Riley and me supports this point statistically. Graeme looked at the rank order of squad and XI transfer costs versus table position, while I looked at each team’s squad and XI transfer costs expressed as multiples of the average. Both Graeme and I calculated these for each team, and then quantified the correlation of each metric to finishing position for each individual season via the square of the Pearson product-moment correlation coefficient (the commonly seen R² value in a regression plot). In Graeme’s analysis, the order of £XI had a higher R² value than the order of Sq£ in 16 of the 18 seasons. In my analysis, M£XI had a higher R² value than MSq£ in 14 of the 18 seasons. In the final comparison, I looked at the average and standard deviation of the R² values for each metric (order of £XI, order of Sq£, M£XI, and MSq£) to determine which provides the best, most consistent prediction of table position over the 18 seasons. M£XI had the lowest overall standard deviation (14.7%) and the highest overall average (45.4%), indicating it provided the best fit to table position (although that average is far lower than the R² values in the long-term analyses in my original post and in Soccernomics). Ultimately, this confirms my preference for relative measures, especially multiples of averages, and explains why I prefer to look at long-term averages rather than individual seasons.
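For readers who want to try the season-by-season comparison themselves, here is a minimal sketch of the calculation described above. It is not the spreadsheet Graeme and I actually used. The function names are hypothetical, the figures in the usage example are invented purely for illustration, and I am assuming that the "multiple" is a team's cost divided by that season's league average and that the standard deviation is the sample standard deviation.

```python
from statistics import mean, stdev


def r_squared(xs, ys):
    """Square of the Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def multiples_of_average(costs):
    """Express each team's cost as a multiple of the season average (assumed definition)."""
    average = sum(costs) / len(costs)
    return [cost / average for cost in costs]


def summarize(seasons):
    """Average and sample standard deviation of the per-season R² values.

    `seasons` is a list of (costs, positions) pairs, one pair per season.
    """
    values = [
        r_squared(multiples_of_average(costs), positions)
        for costs, positions in seasons
    ]
    return mean(values), stdev(values)


# Invented four-team "seasons", for illustration only (not TPI data).
example_seasons = [
    ([3.0, 2.0, 1.0, 0.5], [1, 2, 3, 4]),
    ([2.5, 1.0, 1.5, 0.8], [1, 2, 3, 4]),
]
average_r2, spread_r2 = summarize(example_seasons)
print(f"average R² = {average_r2:.3f}, standard deviation = {spread_r2:.3f}")
```

Running `summarize` once per metric (order of £XI, order of Sq£, M£XI, MSq£) and comparing the resulting pairs reproduces the shape of the comparison: a higher average with a lower standard deviation points to the more consistent metric.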

On the other hand, the TPI data I used in my original analysis only considered the impact of the total cost of paid transfers on team performance, and neglected free transfers as well as trainees. It also doesn’t account for utilization rate. It essentially looks at a reduced data set from the full squad or starting XI, and the graph below quantifies how much of a reduced data set non-free transfers represent over the history of the Premier League.


The graph above shows the cumulative percentage of three types of players within the league each year as categorized within the TPI – trainees, free transfers, and the rest of the players. The vast majority of this final category consists of transfers with confirmed fees, while the rest of it consists of a small number of players whose transfer fees couldn’t be confirmed. The graph is cumulative, so to understand the percentage of free transfers for any single year one must identify the free transfer value on the graph and then subtract the corresponding trainee value from it. As an example, the cumulative percentage (represented by the upper value of the red zone) in 2001-02 is approximately 30% while the league share of trainees is about 20%. This means that free transfers made up about 10% of the league in 2001-02.
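The subtraction described above can be written out as a small sketch. The function name and dictionary keys are hypothetical, and I am assuming the stack runs from trainees at the bottom, through free transfers, to all remaining players, with the three categories summing to 100%.

```python
def shares_from_cumulative(trainee_top, free_transfer_top):
    """Convert cumulative graph readings (in %) into individual league shares.

    trainee_top:       top of the trainee band, which is the trainee share itself
    free_transfer_top: top of the free-transfer band (trainees + free transfers)
    Assumes the remaining players fill the stack up to 100%.
    """
    return {
        "trainees": trainee_top,
        "free_transfers": free_transfer_top - trainee_top,
        "other_transfers": 100 - free_transfer_top,
    }


# Approximate 2001-02 readings quoted above: trainees 20%, cumulative 30%.
print(shares_from_cumulative(20, 30))
# → {'trainees': 20, 'free_transfers': 10, 'other_transfers': 70}
```

The 70% left over for the final category is the same figure that drives the rest of the discussion.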

What is clearly seen via the graph is that paid transfers have consistently accounted for nearly 70% of the Premier League’s players since its inception. That’s not to say 70% of the players transfer teams each year, but rather that at some point in their past they were purchased by the team they played for that season. What has changed over the league’s eighteen years is the share of trainees within it. This share has plummeted from nearly 30% of league player classifications in 1992/93 to below 20% by last season. Much of this change is due to an increasing number of free transfers, which were opened up by the European Court of Justice’s 1995 Bosman ruling. Free transfers have gone from only 2% of league player classifications in 1995 to nearly 10% last season. Overall, transfers of any variety came to represent 80% of league players by the 2009/10 season. In many regards, the Premier League is a microcosm of the increasingly globalized world it operates within: greater international ownership and investment, greater employee mobility, fewer employees staying with a single firm from “graduation” to retirement, and increased dominance by a few brands within the marketplace.

What this all means is that any analysis of league performance on a squad basis that uses the TPI is going to miss nearly 30% of the players in the league. Given that fact and the reasonably good R-squared value my regression analysis achieved, I would consider the relationship to be a strong one. Ultimately, a study by Stefan Szymanski, similar to this one where he statistically examined the causality of the wage/performance correlation, would be fascinating. We might then determine whether it was transfer fees, wages, or table position that drove the relationship with the other two. That is a very advanced analysis best left to a statistician of Stefan’s caliber.

At the end of the day, what Stefan’s analysis, my analysis, and the overall TPI database show is that one must pay, and pay big, to compete for the top few spots in the Premier League. A club must pay dearly for the right to even negotiate wages with the 70% of players who end up on its squad via a paid transfer, and then it must be willing to pay dearly again to keep the talent needed to challenge for a top spot. Each metric, whether it’s based upon £XI or MSq£, has its use in quantifying the role of ever-increasing transfer budgets in a club’s success. Generally, I concur with Paul Tomkins’ assessment that “Sq£ is the only predictive tool, but £XI is surely the better retrospective analyzer.”

To a certain degree this all makes sense, as we want a somewhat meritocratic system where excellence is financially rewarded. It gives us pause, however, when the same teams can dominate everyone else each year by outspending their rivals, sometimes even with money that did not originate in the soccer world in which each team operates.
