I just wrapped up a three-week break from making statistics-related posts. While I was not posting on the blog, I certainly wasn't sitting idle. I treated the break much like a professor might treat a sabbatical - as a chance to explore things the daily routine does not allow, all in the pursuit of improving a core skill set or level of understanding of a topic. I used the break as a chance to begin compiling new statistics databases and to learn new statistical analysis techniques.
One of those statistical techniques will be featured heavily in the coming weeks, especially when it comes to Premier League match analysis. In the past I have used binary logistic regression (BLR) to look at how match statistics impact the likelihood of a team winning or not winning a match. A BLR can only predict one of two outcomes, which provides a bit of a limitation when soccer matches can end in one of three outcomes - loss, tie, or win. I was challenged by several commenters to explore other statistical models that would allow the prediction of probabilities for all three match outcomes.
Such a modeling environment is called ordinal logistic regression (OLR) (see this example for a more mathematical, but readable, treatment beyond what Wikipedia provides). As the name suggests, this type of regression model uses the order of the outcomes (low/medium/high, loss/tie/win, etc.) to build the model based upon the factors' impacts on the likelihood of falling into one of the outcome "bins". The assumptions that must be satisfied and the math behind the model are a bit more complex than a BLR, but the insights an OLR can provide are far more powerful. When applied to the EPL match data that I have courtesy of DogFace, an OLR yields a model that predicts the probability of losing, tying, or winning a match based upon how a team performed relative to the opposition in that match.
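To make the idea of outcome "bins" concrete, here is a minimal sketch of one common form of OLR, the proportional-odds model, in which the cumulative probability of each ordered outcome is a logistic function of a cutpoint minus a weighted sum of the match statistics. The function names, coefficients, and cutpoints below are made up for illustration; they are not the fitted values behind the plots in this post, and the post does not say which form of OLR was actually fitted.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def outcome_probabilities(x, beta, cutpoints):
    """Proportional-odds model: P(Y <= j) = logistic(cutpoint_j - x.beta).

    Returns probabilities ordered from the lowest outcome to the highest,
    here [loss, tie, win].
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    # Cumulative probabilities for loss and loss-or-tie, then 1.0 for all outcomes.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # probability of landing in this bin
        previous = c
    return probs


# Hypothetical coefficients: [shot differential, card fantasy point differential].
beta = [0.05, -0.04]
# Hypothetical cutpoints separating loss|tie and tie|win (must be increasing).
cutpoints = [-0.6, 0.5]

p_loss, p_tie, p_win = outcome_probabilities([2.0, 3.0], beta, cutpoints)
print(f"loss={p_loss:.3f} tie={p_tie:.3f} win={p_win:.3f}")
```

Because a single set of coefficients shifts all of the cumulative probabilities together, a worse card differential in this sketch moves probability away from winning and toward losing, which is the behavior the ordered outcomes are meant to capture.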
An example of such analysis can be found in the two plots below. These plots utilize data from the 2005/06 through 2009/10 seasons for all clubs in the EPL, and set the values for shot, shots-on-goal, corner, and foul differentials to their averages by venue (home vs. away). A sweep from the minimum to maximum values for red and yellow card fantasy points is then performed. The result is the two graphs below that show the effects on the probability of losing, tying, or winning a match based upon card differential (home on top, away on bottom).
These two graphs give us a much clearer picture of what goes on in matches home and away. In home matches, the crossover point where the odds of losing actually exceed the odds of winning doesn't happen until a differential of about 8 fantasy points (the equivalent of nearly three yellow cards or a yellow and a red card). For away matches it happens a lot earlier, and in fact on the exact opposite side of the neutral line at -8. Clearly, venue plays a large role in determining the odds of winning.
Even more important than the OLR itself are the subsequent calculations that can come from it. If we know the odds of all three outcomes, an expected point total can be calculated from the following equation.
Expected Points = 3*P(win) + 1*P(tie) + 0*P(loss)
This now means match statistics can be boiled down to a language with which we're all comfortable: match points. This is much more intuitive than odds of losing, tying, or winning and provides a very direct comparison between teams, referees, or other factors of interest.
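As a quick illustration of the expected points equation, here is a small sketch. The function name and the probabilities passed to it are invented for the example and are not taken from the graphs in this post.

```python
def expected_points(p_win, p_tie, p_loss):
    """Expected Points = 3*P(win) + 1*P(tie) + 0*P(loss)."""
    total = p_win + p_tie + p_loss
    if abs(total - 1.0) > 1e-9:
        raise ValueError("the three outcome probabilities must sum to 1")
    return 3 * p_win + 1 * p_tie + 0 * p_loss


# Hypothetical match: 50% chance to win, 25% to tie, 25% to lose.
print(expected_points(0.50, 0.25, 0.25))
```

The loss term contributes nothing to the total, but keeping it in the function makes it easy to check that the three probabilities form a complete set.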
Applying the above equation to the graphs for home and away performance generates a much simpler graph - what were two graphs with three lines each now become a single graph with two lines. Regression equations for the nearly linear relationship between fantasy points and expected match points are also shown.
The regression equations now provide a direct relationship between cards and points. For every yellow card (3 fantasy points) a team's expected match points are lowered by 0.1, and for every red card (6 fantasy points) a team's expected match points are lowered by 0.2. The percentage reduction in points will vary based upon the current fantasy point differential, but let's look at the example of a team playing to a neutral fantasy point differential. Playing at home they would expect 1.7 points while away they would expect 1.1 points. This means that for a home team playing to that level, an additional yellow card represents a 6% reduction in expected points while a red card lowers expected points by 12%. For away teams, the impact is even bigger. An incremental yellow card reduces their expected points by 9% and each red card reduces the expected points by 18%.
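The percentages above are just the per-card point loss divided by the baseline expected points. A short sketch of that arithmetic, using the rounded figures quoted in this post (the variable and function names are my own for the example):

```python
# Rounded figures quoted in the post.
BASELINE_POINTS = {"home": 1.7, "away": 1.1}   # expected points at a neutral differential
POINTS_LOST = {"yellow": 0.1, "red": 0.2}      # expected points lost per card


def percent_reduction(venue, card):
    """Share of baseline expected points lost to one additional card, in percent."""
    return 100.0 * POINTS_LOST[card] / BASELINE_POINTS[venue]


for venue in ("home", "away"):
    for card in ("yellow", "red"):
        print(f"{venue} {card}: {percent_reduction(venue, card):.0f}%")
```

Since the point loss per card is roughly the same home and away, the larger percentage hit for away teams comes entirely from their lower baseline.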
Keep in mind that this data does not include the most recent EPL season. DogFace has been gracious enough to provide me with such data, so I will be updating my database for each club's OLR terms. Over the coming weeks of the offseason I will make several posts exploring the impacts such match statistics have on a number of teams' match outcome odds, as well as update my referee analysis to reflect the new data and the new approach I have outlined here.
Stay tuned...