Wednesday, April 20, 2011

Why Shot Differential Actually Hurts a Club's Chances of Winning a Match

Over several posts about Arsenal's odds of winning matches I have highlighted the fact that a club's increase in shot differential versus the opposition actually lowers that club's odds of winning the match.  This may seem a bit counterintuitive, but it actually makes sense upon a deeper dive into the data behind the binary logistic regression (BLR) model.  This post examines those details and provides the explanation.

The Effect of Shot Differential on Match Odds

Recall that the BLR model I have constructed for the wider league and individual teams consists of the following inputs:
  • A constant term
  • Venue (home/away)
  • Shot differential
  • Shots-on-goal differential
  • Foul differential
  • Red and yellow card fantasy point differential
It turns out that for the league, all terms in the BLR are statistically significant at the 0.05 level except for foul differential.  When individual teams are examined, the statistically significant terms vary by team; for the team models, any term with a p-value < 0.10 is treated as significant to ensure a reasonable number of terms are included in each team's model.

To understand how shot differential impacts the average team's odds of winning a home or away match, the BLR values for shots-on-goal and fantasy point differentials were set to their averages by venue while the shots differential was varied from the minimum to the maximum value by venue.  The resultant plot of data is shown below.


The graph shows that the odds of winning decrease as shot differential goes up, both home and away.  The slope terms of the regression equations indicate that home teams pay a bigger penalty for increasing shots: only three additional shots are required to lower the odds of winning by 2% at home, while it takes four additional shots away to have the same effect.  The dashed lines, which represent the bounds of the 95% prediction intervals for the two venues, do not cross, which indicates that the difference in odds between home and away is statistically significant.  For a given shot differential, the average home team is more likely to win than the average away team.  Similar relationships were seen for every team that had a statistically significant shots coefficient in its BLR model.
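
The sweep described above can be sketched in a few lines of Python.  This is only an illustration, not the model behind the graph: the coefficient values, the venue averages, and the helper names `win_probability` and `sweep` are all hypothetical, chosen only so that the shots coefficient is negative and the shots-on-goal coefficient is positive.  It assumes the standard logistic form and includes only the terms mentioned in the sweep.

```python
import math

# Hypothetical coefficients (NOT the fitted values from this post).
COEF = {"const": -0.9, "venue": 0.6, "shots": -0.03, "sog": 0.25, "fantasy": -0.05}

# Hypothetical venue averages for the differentials that are held fixed.
AVERAGES = {
    "home": {"sog": 1.0, "fantasy": -0.5},
    "away": {"sog": -1.0, "fantasy": 0.5},
}


def win_probability(venue, shot_diff):
    """Modelled chance of winning with the other differentials at their venue averages."""
    avg = AVERAGES[venue]
    logit = (
        COEF["const"]
        + COEF["venue"] * (1 if venue == "home" else 0)
        + COEF["shots"] * shot_diff
        + COEF["sog"] * avg["sog"]
        + COEF["fantasy"] * avg["fantasy"]
    )
    return 1.0 / (1.0 + math.exp(-logit))


def sweep(venue, lowest, highest):
    """Vary shot differential from lowest to highest, one shot at a time."""
    return [(b, win_probability(venue, b)) for b in range(lowest, highest + 1)]


for venue in ("home", "away"):
    # Print every fifth point of the sweep to keep the output short.
    for shot_diff, p in sweep(venue, -10, 10)[::5]:
        print(f"{venue:4s}  shot differential {shot_diff:+3d}  p = {p:.3f}")
```

With a negative shots coefficient, each printed series falls as shot differential rises, and the venue term keeps the home series above the away series.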

Some Binary Logistic Regression Nerdery

Before explaining why increased shot differential impacts the odds of winning a match negatively, a little digression into the basic theory of binary logistic regression is in order.  A BLR model based upon the data I have from DogFace is of the following form:

$$\ln\left(\frac{p}{1-p}\right) = C + \alpha A + \beta B + \delta D + \varepsilon E + \varphi F + \gamma G$$

where:
p = probability of winning a match (loosely called the "odds" throughout this post)
C = constant
α = BLR coefficient associated with venue (home vs. away)
A = venue (1 = home, 0 = away)
β = BLR coefficient associated with shot differential
B = shot differential
δ = BLR coefficient associated with shot-on-goal differential
D = shot-on-goal differential
ε = BLR coefficient associated with corner differential
E = corner differential
φ = BLR coefficient associated with foul differential
F = foul differential
γ = BLR coefficient associated with fantasy point differential
G = fantasy point differential (red cards = 6 points, yellow cards = 3 points)

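If the equation is re-arranged to eliminate the natural logarithm term on the left hand side of the equation, the relationship between p and the rest of the terms in the model becomes:

$$\frac{p}{1-p} = e^{C} \, e^{\alpha A} \, e^{\beta B} \, e^{\delta D} \, e^{\varepsilon E} \, e^{\varphi F} \, e^{\gamma G}$$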

If the chain of exponentials on the right hand side of the equation is represented by the term x, and then the equation is re-arranged to isolate p, the model becomes:

$$p = \frac{x}{1 + x}$$

Thus, p increases as x (the chain of exponentials) increases.  The reverse is also true - a lower x value produces a lower p value.
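
As a quick numerical check of this relationship, the sketch below builds x as a product of exponentials and converts it to p, assuming the standard logistic form p = x / (1 + x).  The coefficient and input values are made up for illustration, and the function names are my own.

```python
import math


def chain_of_exponentials(terms):
    """x: the product of e^(coefficient * value) over all terms in the model."""
    x = 1.0
    for coefficient, value in terms:
        x *= math.exp(coefficient * value)
    return x


def probability(x):
    """Standard logistic relationship between x and p."""
    return x / (1.0 + x)


# Hypothetical (coefficient, value) pairs; the constant is paired with a value of 1.
terms = [(-0.9, 1), (0.6, 1), (-0.03, 4), (0.25, 2)]
x = chain_of_exponentials(terms)
print(f"x = {x:.3f}, p = {probability(x):.3f}")

# p rises with x: x = 1 is a coin flip, and larger x pushes p toward 1.
for x_value in (0.25, 1.0, 4.0):
    print(f"x = {x_value:4.2f} -> p = {probability(x_value):.2f}")
```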

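Isolating the two terms involving shots means examining the impacts of β, δ, B, and D.  Together they multiply x by e^(β × B) × e^(δ × D) = e^(β × B + δ × D).  If (β × B) + (δ × D) < 0, the net effect will be a shots metric contribution to x that is less than one, and thus lowers p.  If (β × B) + (δ × D) > 0, the net effect will be a shots metric contribution to x that is greater than one, and thus raises p.  When β × B is negative and δ × D is positive, the first case is the one where the magnitude of β × B exceeds δ × D.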
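
A short worked example may help here.  The two shots terms multiply x by e^(β × B + δ × D), so what matters is the sign of that sum.  The β and δ values below are hypothetical, picked only to have a negative shots coefficient and a positive shots-on-goal coefficient; the function name is also my own.

```python
import math

BETA = -0.05   # hypothetical shots coefficient (negative)
DELTA = 0.30   # hypothetical shots-on-goal coefficient (positive)


def shots_factor(shot_diff, sog_diff, beta=BETA, delta=DELTA):
    """Combined multiplier that the two shots terms apply to x."""
    return math.exp(beta * shot_diff + delta * sog_diff)


# Ten extra shots, only one extra on goal: the sum is negative, so x shrinks.
wasteful = shots_factor(shot_diff=10, sog_diff=1)

# Four extra shots, two extra on goal: the sum is positive, so x grows.
accurate = shots_factor(shot_diff=4, sog_diff=2)

print(f"wasteful: factor = {wasteful:.3f}")
print(f"accurate: factor = {accurate:.3f}")
```

A factor below one pulls p down and a factor above one pushes it up, regardless of what the other terms in the model are doing.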

Not All Shots Are Created Equal

How is it that increasing shot differential ends up lowering a club's odds of winning?  It is because not all shots are created equal: it is actually shots-on-goal differential that increases a club's odds of winning.

The β term in the BLR must be negative for a team's odds to decrease with increasing shot differential as observed in the first graph in this post.  Indeed, this is the case for the league and club models for which shot differential is statistically significant.  The key is that it must be offset by shots that are on goal.  It turns out that the δ term for the league and all teams for which it is statistically significant is positive.

Not only is the shots-on-goal coefficient positive, its magnitude is much larger than that of the negative shots coefficient.  Thus, the impact of a shot-on-goal is greater than that of a shot.  One sign of a team's strength is the ratio of δ to β - a larger-magnitude δ to β ratio means that the team gets a larger benefit from shots-on-goal relative to the penalty it pays for shots.
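
To make the ratio concrete, here is a small sketch with hypothetical coefficients for two made-up clubs.  One way to read the ratio, which follows from the form of the model rather than from anything measured here, is as a break-even point: with everything else held fixed, the two shots terms cancel when β × B + δ × D = 0, that is, when B / D = -δ / β (for a positive D).

```python
# Hypothetical (beta, delta) pairs; not fitted values for any real club.
clubs = {
    "Club X": (-0.04, 0.32),
    "Club Y": (-0.08, 0.32),
}


def delta_to_beta_ratio(beta, delta):
    """Ratio of the shots-on-goal coefficient to the shots coefficient."""
    return delta / beta


def break_even_shots_per_sog(beta, delta):
    """Shot differential per unit of shots-on-goal differential at which
    the two shots terms cancel (beta * B + delta * D = 0)."""
    return -delta / beta


for name, (beta, delta) in clubs.items():
    ratio = delta_to_beta_ratio(beta, delta)
    limit = break_even_shots_per_sog(beta, delta)
    print(f"{name}: delta/beta = {ratio:.1f}, break-even B/D = {limit:.1f}")
```

In this toy data the club with the larger-magnitude ratio can afford more extra shots per extra shot-on-goal before the shots penalty cancels the shots-on-goal benefit.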

A table of such ratios for each team that has both statistically significant β and δ terms is shown below, as well as the league's ratio.  The table is arranged in descending order of magnitude (the negative values reflect the fact that the shots-on-goal coefficient is positive while the shots coefficient is negative).


The Impact of the Ratio of Shot Differential to Shots-on-Goal Differential on Match Odds

The plots below demonstrate the effects of the ratio of shot differential to shots-on-goal differential on match odds for the Big Four.  The first graph shows the impact on odds for a home match, while the second graph shows the impact for an away match.  Shots-on-goal differential is held constant at the average for each team by venue, while the shots differential is swept by a multiple of the average.  All other match statistics - differential for corners, fouls, and fantasy points - were also set to the average for each team by venue.
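
A minimal sketch of this kind of sweep, assuming the standard logistic form and hypothetical numbers throughout: shots-on-goal differential stays at a made-up venue average while shot differential is set to a multiple of it.  The `baseline` argument is my own shorthand for the constant, venue, and all other averaged terms combined.

```python
import math


def win_probability(ratio, sog_avg, beta, delta, baseline):
    """Chance of winning when shot differential = ratio * average shots-on-goal differential."""
    shot_diff = ratio * sog_avg
    logit = baseline + beta * shot_diff + delta * sog_avg
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical club: every number here is invented for illustration.
club = {"sog_avg": 2.0, "beta": -0.06, "delta": 0.30, "baseline": 0.2}

for ratio in range(0, 9, 2):
    p = win_probability(ratio, **club)
    print(f"shots : shots-on-goal = {ratio}:1 -> p = {p:.3f}")
```

Because β is negative and the average shots-on-goal differential is positive in this example, the printed probability falls steadily as the ratio grows.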



A few general conclusions can be drawn from the graphs:
  • Liverpool and Manchester United experienced some of the highest negative ratios, where their odds of winning a match approach 100%.
  • Arsenal's superior ratio shows up in the shallower slope of their line in both home and away matches, although they have a lower probability of winning a match at their average form compared to the other three clubs.
  • Match odds are far more linear away than at home.
  • Chelsea seems to be the most robust to positive ratios given their higher overall odds at ratios greater than 3:1, which is where clubs begin to wipe out any advantage from shots-on-goal.
  • Manchester United and Liverpool have very similar odds at home, while Manchester United has far superior odds away (~10% better odds regardless of ratio).
So, not all shots are created equal.  Every additional shot-on-goal that a club realizes versus the competition helps raise its odds of winning.  Indiscriminately taking a shot just to do so accomplishes nothing: it wastes an opportunity because the shot never really had a chance of going in the goal, and statistically it actually lowers the team's odds of winning the match.  The key is getting the highest possible ratio of shots-on-goal differential to shot differential to gain the best chance of winning the match.
