Ranking Tutorial: Little Details that Matter

Home Field

The question of home field advantage deserves some mention early on. There are three ways to treat home field advantage: as a score adjustment, as a schedule strength adjustment, or ignoring it altogether. The last option is unacceptable, given that teams clearly perform better at home than they do on the road. In my many years of ranking various sports and leagues, I have solved for the home field factor and always found it to be significantly positive. (If home field advantage didn't measurably help teams play better, I would find negative values as often as positive.)

Of the remaining options, adjusting game scores is also troublesome. If the home field factor is 3.5 points, does this mean a 3-point win becomes a loss, while a 4-point win remains a win? What if the team with the 3-point win scored a touchdown with PAT in the final 30 seconds? Had they known (and cared) that I would consider a 3-point home win a loss, wouldn't they have gone for a 2-point conversion instead? In short, a win needs to remain a win because that's what the teams are worried about.

This leaves the final option of treating home field as an adjustment to the quality of one's opponent. This intuitively makes sense; playing the #25 team on the road may be like playing the #10 team at home. Likewise, a team that played an excess of road games indeed faced tougher competition than they would have with a balanced schedule.

This may seem like an obvious and trivial point, but even the most widely accepted computer ratings do not agree on it.

Other Sports

In hockey and soccer, there are many possessions, each ending with either a score (1 point) or not (0 points). In other sports, more than one point value per possession is possible. (Note: in baseball, one possession is one inning, not one at-bat.) There are two ways of addressing this complication. One is to consider all possible combinations of scores that would create S points; to do so is quite difficult but possible. A simpler solution is to divide S1 and S2 by a "typical" number of points per score. Tests of the full-blown probabilities indicate that the "typical" number of points is not the average, but rather the average weighted by the number of points scored. For example, if a football team scores field goals (3 points) and touchdowns (7 points) with equal frequency, the scaling factor is (3*3 + 7*7)/(3 + 7) = 5.8. Using actual frequencies of scoring types and factoring in safeties, missed PATs, and two-point conversions, the value is 6.2. In basketball, I use a value of 2.15; in baseball 2.65. (To take the factorial of a non-integer, replace it with the gamma function.)
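The points-weighted average described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the dictionary layout are assumptions made for the example, not part of the rating code itself.

```python
def typical_points_per_score(score_types):
    """Points-weighted average of points per score.

    score_types maps a point value to the relative frequency of that
    scoring type (hypothetical input format chosen for this sketch).
    Each scoring type is weighted by the points it contributes.
    """
    total_points = sum(p * f for p, f in score_types.items())
    weighted_sum = sum(p * p * f for p, f in score_types.items())
    return weighted_sum / total_points


# Field goals and touchdowns with equal frequency, as in the text.
factor = typical_points_per_score({3: 1.0, 7: 1.0})
print(round(factor, 2))  # 5.8
```

S1 and S2 would then be divided by this factor before the binomial statistics are applied.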

Overtime Games

Another note involves the treatment of overtime games. My research has found that overtime results are 50-50 propositions. In other words, the better team wins only 50% of overtime games. Perhaps it is 50.5%, but the deviation from 50% is too small to be accurately measured. Thus G(sa,sb) is set to zero for an overtime game, regardless of the final score.

Strategy Adjustments

Implicit in the binomial statistics is the assumption that the probability of scoring on any possession is constant over the course of the game. This is not true, as players and coaches will adjust their tactics based on how the game is progressing. This is modeled in two different ways. The first way reflects the changes in coaching strategy -- the winning team will try to protect the lead by playing conservatively, while the losing team will try any measure to get back into the game. Effectively, this means that both teams' scoring odds are lowered by the winning team's strategy changes, and both are raised by the losing team's strategy changes. In football and basketball, the changes made by the losing team tend to outweigh those made by the winning team; this raises F since a team in either of those sports can effectively prolong the game. The defensive changes tend to be more important in the other sports, lowering F values.

A second element of adjustments for a lead is the tendency of players to play up or down somewhat to the level of their opponent. This is seen empirically by the fact that it is much easier to predict a basketball game's margin of victory than it is to predict the total score. In other words, a team with a 15-point lead will tend to let up a little bit, which prevents the lead from getting much bigger. Unfortunately, this lowers the leading team's X value and raises the opposing team's X value. For the purposes of this section, we can model this also by changes (reductions) to F. In most sports the change is quite small (around 0-30%); in basketball it is nearly a factor of two.

Note that this correction assumes that all teams use similar measures to avoid running up the score. This is true for the most part, but it also shows why margin of victory should not be used in postseason selections such as the BCS or NCAA basketball tournament. If teams know that margin of victory is important, they will try to run up the score, thus destroying the validity of a margin-of-victory rating.

Preseason Rankings

A discussion of the need for and calculation of priors is given on the "constructing a ranking" page. As described there, priors are used to constrain the overall set of team rankings to be within a typical range.

An alternate use of priors is to enhance the numerical stability early in the season, when insufficient data exists to draw significant statistical conclusions. If one has a guess of how good the team actually is, then the team's rating should equal that guess before games have been played and move away from that as data is collected. I treat this prior data the same as games, in that a team that has not yet played some minimum number of games has its schedule padded with ties against a team of its guessed strength. Thus, if the minimum is 6 games (the value used for my football ratings) and a team with a preseason rating of +0.9 has played 4 games, then 2 games are added that are treated as ties against a team of strength 0.9. This adjustment is made in the three main ratings (standard, simple, and predictive) and for the improved RPI. An analogous adjustment is made in the college pseudo-poll. Teams with priors in use have a "P" shown at the end of their rating lines.
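The schedule padding described above might look like the following Python sketch. The tuple layout for a game and the function name are hypothetical, chosen only to make the example self-contained; the default minimum of 6 games is the football value quoted in the text.

```python
def pad_with_prior(games, prior_rating, min_games=6):
    """Pad a short schedule with ties against a team of the prior strength.

    games: list of (opponent_rating, result) tuples, where result is
    "W", "L", or "T" (an assumed representation for this sketch).
    Returns a new list; the input list is not modified.
    """
    missing = max(0, min_games - len(games))
    return list(games) + [(prior_rating, "T")] * missing


# A team with a preseason rating of +0.9 that has played 4 games.
played = [(0.2, "W"), (1.1, "L"), (-0.4, "W"), (0.5, "W")]
padded = pad_with_prior(played, prior_rating=0.9)
print(len(padded) - len(played))  # 2 tie games added
```

Once a team reaches the minimum number of games, nothing is added and the rating depends on actual results only.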

Schedule and Conference Strength

Schedule strengths have been calculated several ways, but what is the best way of doing it? Consider two cases. Team 1 is an outstanding team that plays most of its games against mediocre teams (i.e. teams ranked near the middle of the set). Team 2 is also an outstanding team, but it plays half of its games against other outstanding teams and half against horrible teams. According to an RPI ranking, the two teams would have the same (or nearly identical) schedule strengths, since the RPI uses the straight average of the opponents' winning percentages. However, it is clear that Team 2 challenged itself much more. Given average luck, Team 2 probably beat all of the horrible teams but only half of the excellent teams, thus winning 75% of its games. Team 1, on the other hand, probably won 90% of the games against mediocre teams.

Thus the key factor is the team's most likely winning percentage against its schedule. To calculate this, use the principles described above. Given the team's ranking, its opponents' rankings, and the home field advantage, the probability of winning a given game is:

   P(win) = integral(x=-inf,dr) exp(-0.5*(x^2)) / sqrt(2*pi),

where dr is the difference between the team's ranking and its opponent's ranking, accounting for home field advantage.

One must then make this calculation for each game the team has played and sum the results, giving the expected number of wins. Dividing by the number of games gives the team's average winning percentage against its schedule. Setting P(win) to this value and solving for dr by inverting the integral calculation thus gives the "dr" value of a "typical" opponent, where "typical" means an opponent that, if played in every single game at a neutral site, would give the same average winning percentage as the set of opponents a team actually played. I define this as the schedule strength.
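As a rough illustration of the procedure just described, here is a Python sketch that uses the standard library's NormalDist for the integral and its inverse. Several things are assumptions of the sketch rather than statements about the actual ratings: the function names, the game format, the sign convention for the home field term (positive when the team being rated is at home), and the example ratings. It returns both the typical dr and the implied rating of the typical opponent, since either could be quoted as the schedule strength.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def win_probability(team, opponent, home_adj=0.0):
    """P(win) = standard normal CDF evaluated at dr."""
    dr = team - opponent + home_adj
    return _STD_NORMAL.cdf(dr)


def schedule_strength(team, games):
    """Return (typical_dr, typical_opponent_rating).

    games: list of (opponent_rating, home_adj) pairs, with home_adj
    positive at home, negative on the road, zero at a neutral site.
    """
    expected_wins = sum(win_probability(team, opp, adj) for opp, adj in games)
    avg_pct = expected_wins / len(games)
    # Invert the integral; fails if avg_pct saturates to exactly 0 or 1.
    typical_dr = _STD_NORMAL.inv_cdf(avg_pct)
    return typical_dr, team - typical_dr


# Hypothetical ratings echoing the Team 1 / Team 2 example.
team = 1.28
team1_games = [(0.0, 0.0)] * 10                        # all mediocre opponents
team2_games = [(1.28, 0.0)] * 5 + [(-1.28, 0.0)] * 5   # half excellent, half horrible
print(schedule_strength(team, team1_games))
print(schedule_strength(team, team2_games))
```

With these made-up numbers both schedules have the same straight average opponent rating, yet the second yields a noticeably stronger typical opponent, which is the distinction the RPI misses.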

Conference ratings are calculated in a similar manner. A conference's rating equals the rating of a team that would be expected to go 0.500 against the teams in the conference with all games played at a neutral site. Again, this calculation is most sensitive to teams in the middle of the conference, since games against those teams are the ones that can go either way.
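One way to find the 0.500 rating numerically is a simple bisection, sketched below in Python. The function name, the tolerance, and the choice of bisection are assumptions for illustration; any root finder would do, because the expected winning percentage rises steadily with the hypothetical team's rating.

```python
from statistics import NormalDist


def conference_rating(member_ratings, tol=1e-9):
    """Rating of a team expected to go 0.500 against the members,
    with every game at a neutral site (no home field term)."""
    phi = NormalDist().cdf

    def expected_pct(r):
        return sum(phi(r - m) for m in member_ratings) / len(member_ratings)

    # The answer lies between the weakest and strongest member.
    lo, hi = min(member_ratings), max(member_ratings)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_pct(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# One very strong team barely moves the result (hypothetical ratings).
print(conference_rating([0.0, 0.0, 0.0, 5.0]))
```

In this made-up conference the straight average rating is 1.25, but the 0.500 rating comes out far lower, because a near-certain loss to the one dominant team counts about the same no matter how dominant that team is.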

Because the schedule strength depends on a team's own strength, two teams playing identical schedules will not have the same schedule strength rating. While this is intentional (what matters is how the schedule affects the team playing that schedule), there may be times in which it is important to have all schedules rated on the same scale. To do this, a second schedule measure is provided, expected losses (ELOSS). For pro sports, ELOSS gives the number of losses an average team would have if playing the team's schedule. For college sports, ELOSS gives the number of losses an average ranked (top 25 except for hockey, which is top 15) team would have against the schedule.
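A minimal Python sketch of an expected-losses calculation follows, reusing the same normal-distribution win probability as the schedule strength. The function name, the game format, and the reference rating of 0.0 for an "average" team are assumptions of the sketch; the rating of an average ranked team for the college version is not given here, so it would have to be supplied by the caller.

```python
from statistics import NormalDist


def expected_losses(reference_rating, games):
    """Sum of loss probabilities for a reference team playing the schedule.

    games: list of (opponent_rating, home_adj) pairs, with home_adj
    positive when the scheduled team is at home (assumed convention).
    """
    phi = NormalDist().cdf
    return sum(1.0 - phi(reference_rating - opp + adj) for opp, adj in games)


# Hypothetical schedule rated against an average (0.0) team.
schedule = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]
print(round(expected_losses(0.0, schedule), 3))  # 2.0
```

Because the reference rating is the same for every schedule, two teams with identical schedules get identical ELOSS values, unlike the schedule strength above.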

Head-to-Head Games

A common complaint about computer rankings is that they appear to completely overlook head-to-head game results, and will happily put one team immediately ahead of a team it lost to. As I have shown in a study, this is indeed the statistically correct treatment. Given three games between three teams, each going 1-1, it is more likely that there were two minor upsets and one favorite winning than that there was one significant upset and two favorites winning.

That said, if one wishes to accurately rank a pair of teams, the head-to-head result should be given greater importance. The reason has nothing to do with perceived importance of such games, but rather the fact that teams match up better against some teams than they do against others. Return to the three-team example, where team A beat team B, B beat team C, and C beat A. If matchup effects do not exist, team A is more likely than not worse than team B and better than team C. However, accounting for matchups, team A is probably better than team B, in the sense that they would probably win the rematch.

In my rankings, I account for this factor only in the probable superiority (standard) ranking, where it can be done fairly easily because the ranking is based on team-by-team comparisons.

Note: if you use any of the facts, equations, or mathematical principles on this page, you must give me credit.