There is plenty of common wisdom when it comes to evaluating teams and criticizing computer rankings. One of the most common claims is that certain games should be weighted more heavily than others, for a few reasons.

The most obvious case is a rematch: if two teams are playing each other again, one would expect their previous meetings to be more significant than their other games. The reasoning here is straightforward and very plausible: a computer model in which each team has a single fixed strength will ignore any variations in style. For example, a basketball team with good perimeter shooters will have an easier time against a team with poor perimeter defenders and good inside defenders than it would against an equally good opponent for whom the reverse is true.

The second case in which common wisdom rates games to be of unequal importance is "big games". In other words, when comparing several very good teams, you would be inclined to consider games against other very good opponents to be more telling about how good each team actually is. A team that is highly rated without having played another very good opponent is thus considered suspect.

With the available score data, it is possible to test these cases.


Predicting Rematches

This will be a fairly math-intensive section, since the tests I am running are based on a variant of the equations provided in the ratings information. For those of you who followed along in those pages, I felt you should see the exact math that went into this. For those who don't want to plow through math, I invite you to skip past the final equation in this section and resume reading.

Let's suppose that matchups are important: rather than a team's odds of winning being merely a function of the two teams' strengths, there is an additional term for the particular matchup between these two teams. Expanding the formalism from the Game Function tutorial, we would write that the probability of team A beating team B is given by:

   CP(Ra-Rb+Mab),

where "Ra" and "Rb" are the two teams' strengths and "Mab" is the adjustment for the matchup in question.

Recalling from the tutorial that the game function "G" translates into the rating difference with a 1-sigma randomness of one, this means that in a game played between teams A and B, the resulting game function should equal:

   Gab = Ra-Rb+Mab +/- 1,

where the "+/-1" term tells the 1-sigma amount that "Gab" may change because of randomness.

Now if the teams play twice, the expected value of Gab will be unchanged, but the randomization will be different. Rules of error propagation show that two random drawings, each with an uncertainty of +/-1, are equivalent to a single random drawing with an uncertainty of +/- sqrt(2). Thus we expect the difference between the values of "Gab" from the two games to equal:

   Gab1-Gab2 = (Ra-Rb+Mab +/- 1) - (Ra-Rb+Mab +/- 1) = 0 +/- sqrt(2).

Squaring this difference and averaging over many such rematches gives:
   (Gab1-Gab2)^2 = 2.

In other words, the average squared difference between the two game functions for the same pair of opponents should equal exactly two. This can be tested easily against my extensive collection of scores used to compute rankings, and indeed it is confirmed. (In actually carrying out this test, I had to correct the G values for game locations, of course. Because home field advantages could vary widely from venue to venue, I ignored rematches played at the same location.)
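For readers who prefer a numerical check, a small Monte Carlo sketch (purely synthetic data, not the actual test) confirms that the matchup term cancels in a rematch:

   import random

   def avg_sq_rematch_diff(n_pairs=100000):
       """Average of (Gab1-Gab2)^2 over simulated rematches."""
       total = 0.0
       for _ in range(n_pairs):
           ra, rb = random.gauss(0, 1), random.gauss(0, 1)
           mab = random.gauss(0, 0.5)  # a matchup effect, if one exists
           g1 = ra - rb + mab + random.gauss(0, 1)
           g2 = ra - rb + mab + random.gauss(0, 1)
           total += (g1 - g2) ** 2
       return total / n_pairs

   print(avg_sq_rematch_diff())  # ~2.0, regardless of the matchup width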

Now for the second part of the test, which involves three-team sets. Instead of having two teams playing twice, let us examine what happens when three teams all play each other in three games. From the above equation, we get:

   Gab = Ra-Rb+Mab +/- 1
   Gac = Ra-Rc+Mac +/- 1
   Gbc = Rb-Rc+Mbc +/- 1,

where the definitions are the same as above. Using a little bit of algebra and the fact that three random drawings, each with an uncertainty of +/-1, are equivalent to one random drawing with an uncertainty of +/- sqrt(3), we get:
   Gab-Gac+Gbc = (Ra-Rb+Mab+/-1) - (Ra-Rc+Mac+/-1) + (Rb-Rc+Mbc+/-1)
               = Mab-Mac+Mbc +/- sqrt(3).

In simpler terms, the team strengths cancel out, but because the matchup effect should be different in all three games (because of the particulars of each pair of teams), the expected combination of game outcome functions is non-zero. The expectation value of the square of this combination thus equals:
   (Gab-Gac+Gbc)^2 = (Mab-Mac+Mbc)^2 + 3.

OK, we're almost there. Instead of worrying about one particular set of three teams, we instead need to figure out how to use the formula above over our entire data set. To do this, we need to know the distribution of the values of Mab. Naturally they must center around zero, and the central limit theorem suggests that the distribution should be Gaussian. Using "M" as the one-sigma width of that Gaussian distribution, and noting that the three matchup terms are independent (so their variances simply add), we expect that the average value of:

   (Mab-Mac+Mbc)^2 = 3 M^2.

Now that we have this calculated, we know that, on average, the following equation will be true:
   (Gab-Gac+Gbc)^2 = 3 ( 1+M^2 ).
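A quick Monte Carlo sketch (again synthetic data, illustrative only) confirms this relation:

   import random

   def avg_sq_triplet(m_width=0.5, n_sets=100000):
       """Average of (Gab-Gac+Gbc)^2 over simulated three-team sets."""
       total = 0.0
       for _ in range(n_sets):
           ra, rb, rc = (random.gauss(0, 1) for _ in range(3))
           mab, mac, mbc = (random.gauss(0, m_width) for _ in range(3))
           gab = ra - rb + mab + random.gauss(0, 1)
           gac = ra - rc + mac + random.gauss(0, 1)
           gbc = rb - rc + mbc + random.gauss(0, 1)
           total += (gab - gac + gbc) ** 2
       return total / n_sets

   print(avg_sq_triplet(0.5))  # ~3 * (1 + 0.25) = 3.75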

Using "<x>" to denote the average value of "x", we can combine the above equation with that from the first sample to give:

   M^2 = 2/3 * <(Gab-Gac+Gbc)^2> / <(Gab1-Gab2)^2> - 1
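In code, the estimator could be computed from the two samples as follows (a sketch; the data structures are my own assumptions, not those of the actual analysis):

   def estimate_m_squared(rematch_pairs, triplets):
       """rematch_pairs: list of (Gab1, Gab2) tuples;
       triplets: list of (Gab, Gac, Gbc) tuples."""
       mean_pair = sum((g1 - g2) ** 2
                       for g1, g2 in rematch_pairs) / len(rematch_pairs)
       mean_trip = sum((gab - gac + gbc) ** 2
                       for gab, gac, gbc in triplets) / len(triplets)
       return (2.0 / 3.0) * mean_trip / mean_pair - 1.0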

Putting the above into plain English, we have two samples. The first sample consists of pairs of teams that play each other twice. Because the teams are the same both times, whatever peculiarities exist in the matchup are the same in both games, and thus the expected difference between the two outcomes is merely two games' worth of randomness. In the second sample, we have sets of three teams that each play the other two once. In this case, the matchup details do not cancel out, and thus the expected scatter among the three outcomes equals three games' worth of randomness plus a term measuring how important matchups really are.

I have to admit that I was very surprised by the results. For every sport, and in every level of play, I found that M^2 was consistent with being zero. Averaging over my complete data set to reduce the effects of random noise, I measure a value of M^2 = -0.007 +/- 0.003 -- in other words, a 99.9% confidence upper limit on M of 0.03.

The implication of this somewhat counterintuitive result is that matchup particulars can be safely ignored when analyzing team strengths, as well as when making predictions of future matchups. In other words, it is folly to give more importance to previous games between the two teams than you give to other games. To the extent that matchups may have an effect, it is less than a 1-2% change in the win probability.
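To see where the 1-2% figure comes from, note that an upper-limit matchup term of 0.03 shifts an otherwise even game only slightly (again assuming CP is the standard normal CDF):

   from math import erf, sqrt

   cp = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
   print(cp(0.03) - cp(0.0))  # ~0.012: about a 1% shift in win probability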


Games Against Similar Opponents

The second test I have run uses very similar principles to the second calculation from the preceding section. Taking my sample of three-team sets, I have divided them into situations in which the three teams are of similar ability, and those in which they are of different ability. (In running this test, I used only the college data, as the difference between good and bad teams is much greater in college than in pro sports.) The hypothesis I am testing is whether the similar-strength sets are more predictable than the different-strength sets.
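A sketch of how such a split might look in code (the spread threshold and data layout are illustrative assumptions, not the actual analysis):

   def split_and_compare(triplets, spread_threshold=0.5):
       """triplets: list of ((Ra, Rb, Rc), (Gab, Gac, Gbc)) tuples.
       Returns the mean squared combination for similar-strength and
       different-strength three-team sets."""
       similar, different = [], []
       for (ra, rb, rc), gs in triplets:
           spread = max(ra, rb, rc) - min(ra, rb, rc)
           (similar if spread < spread_threshold else different).append(gs)
       def mean_sq(group):
           return sum((gab - gac + gbc) ** 2
                      for gab, gac, gbc in group) / len(group)
       return mean_sq(similar), mean_sq(different)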

As you are probably guessing, I again measured a null result. In other words, I can learn as much from a mismatch as I can from a game between comparably-matched teams. Again, it should be noted that this is not because teams play exactly the same at all times, but rather because the random effects in a single game are much greater than any small effect from the matchup.


When Should Weights Be Used?

In light of the above information, I don't want to give the impression that games are always of equal importance. So here are two situations in which a sound computer ranking system would need to weight games differently.

First, if I am trying to evaluate how good a team is right now, I would want to give a slightly higher weight to the more recent games. The reason is that, although team strengths change only a tiny amount during the season, injuries, personnel transactions, and a team's development during a season mean that the team's most recent game is more indicative of its current capabilities than its first game was. Because of this, my predictive rankings do give higher weight to more recent games (though again, I should emphasize that this effect is small).
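One simple way to implement such a weight is an exponential decay by game age, as in this sketch (the decay rate here is an arbitrary illustration, not the value my rankings use):

   from math import exp

   def game_weight(weeks_ago, decay_per_week=0.02):
       """Weight a game by recency; recent games count slightly more."""
       return exp(-decay_per_week * weeks_ago)

   print(game_weight(0))   # 1.00 for this week's game
   print(game_weight(15))  # ~0.74 for a game early in the season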

The more important situation arises if you are trying to rank teams based on wins and losses only. A win-loss system that weights all games equally is basically the RPI, which is a very crude and inaccurate ranking system and cannot be recommended for use. In a sound win-loss-based system, by contrast, "big wins" and "bad losses" are the most important games in the calculation, while wins over inferior teams and losses to superior teams are given the least importance. (Thus, the "common wisdom" regarding the need to test a team against tough competition is accurate inasmuch as the average person does not know how to rate a team accurately from scores, and instead needs to look primarily at wins and losses.)
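To illustrate why unexpected results carry the most information in a win-loss-only setting, consider a simple Elo-style update (a sketch, not the system described here), in which the rating change is proportional to the difference between the actual and expected outcome:

   from math import erf, sqrt

   def cp(x):
       return 0.5 * (1.0 + erf(x / sqrt(2.0)))

   def rating_update(r_team, r_opp, won, k=0.1):
       """Rating change proportional to (actual - expected outcome)."""
       expected = cp(r_team - r_opp)
       return k * ((1.0 if won else 0.0) - expected)

   print(rating_update(0.0, 1.0, won=True))  # big win: ~+0.084
   print(rating_update(1.0, 0.0, won=True))  # expected win: ~+0.016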


Summary

Effects of matchups -- either particular matchups between two teams, or merely based on quality of competition -- are effectively zero in pro and Division I sports. This doesn't mean that such factors are foreign to the game, but rather that their impact on the game is so much less than the amount of randomness in a game that they can safely be ignored.

The upshot is that, when predicting future outcomes based on scores, one should give no extra weight to previous games between the two teams in question, or to games against better competition.



Note: if you use any of the facts, equations, or mathematical principles introduced here, you must give me credit.

copyright ©2005 Andrew Dolphin