The main page says that the statistics shown are inherently unreliable.
I don't understand. At first blush it appears that the statistics are no less reliable than the ratings themselves. In fact, the statistics might be more reliable than any single rating, given that they are calculated over a large population of games.
If this is the case, then perhaps the statement could be changed to speak of the reliability of the ratings, rather than of the statistics, which are calculated after the ratings have been used to perform the initial sorting of players into bands for tournaments.
Because with the predominant McMahon tournament system, the players who get games against better-ranked players are not a random sample of all players (they are the ones who win their first games, i.e. more likely the stronger ones). This may have a sizeable effect in the lower ranks, where a slow-adjusting rating system meets fast-improving players. Just think of it as a sort of unavoidable systematic error.
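To make the claimed selection effect concrete, here is a rough simulation sketch. Everything in it is an assumption made for illustration: each player has a hidden offset between true strength and listed rank (normally distributed, in ranks), equally ranked players are paired in round one, and the win probability is a logistic function of the true strength difference. The name `simulate_first_round` is hypothetical and nothing here is taken from the actual rating data.

```python
import math
import random


def simulate_first_round(num_pairs=5000, spread=1.0, seed=0):
    """Pair equally ranked players and record who wins round one.

    Each player has a hidden offset (true strength minus listed rank).
    Returns (mean offset of all players, mean offset of round-one winners).
    """
    rng = random.Random(seed)
    all_offsets = []
    winner_offsets = []
    for _ in range(num_pairs):
        a = rng.gauss(0.0, spread)
        b = rng.gauss(0.0, spread)
        # Assumed model: logistic win probability in the true strength difference.
        p_a_wins = 1.0 / (1.0 + math.exp(-(a - b)))
        winner = a if rng.random() < p_a_wins else b
        all_offsets.extend([a, b])
        winner_offsets.append(winner)
    mean_all = sum(all_offsets) / len(all_offsets)
    mean_winners = sum(winner_offsets) / len(winner_offsets)
    return mean_all, mean_winners


if __name__ == "__main__":
    mean_all, mean_winners = simulate_first_round()
    print(f"mean hidden offset, all players:     {mean_all:+.3f}")
    print(f"mean hidden offset, round-1 winners: {mean_winners:+.3f}")
```

Under these assumptions the round-one winners, who are the ones paired upwards next, are on average stronger than their listed rank suggests, while the population as a whole is not. How large the effect is in real tournaments depends on how far the ratings lag behind true strength, which this toy model simply assumes.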
A rating system is an attempt to model events. The model associates numeric values with the two objects interacting in an event, based upon the outcome of the event. The goal is to predict the outcome of future events between any two objects that have associated numeric values. The way to determine the usefulness of the model is to compare the predicted outcome of each event with the observed outcome.
The events may be collisions between balls, with the associated numeric values being velocity and mass. The events may be games between two players, with the associated numeric values being called "ratings". The main differences are that our methods of measuring the mass and velocity of the balls are distinct from the events, and that we generally do not allow the balls to change their mass (at all) or their velocity except through collisions. So the rating problem is a little harder.
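The comparison of predicted with observed outcomes described above could be sketched roughly as follows. The prediction curve used here is the classic Elo-style logistic formula with a scale of 400, which is an assumption chosen for illustration and not necessarily the formula behind the ratings on the main page. The function names and the shape of the game records are likewise hypothetical.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b, scale=400.0):
    """Predicted probability that A beats B (Elo-style logistic curve, assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def calibration(games, num_bins=10):
    """Compare predicted and observed results.

    games: iterable of (rating_a, rating_b, a_won) tuples.
    Returns a list of (mean predicted, observed win fraction, game count),
    one entry per non-empty prediction bin, in order of increasing prediction.
    """
    bins = defaultdict(lambda: [0.0, 0, 0])  # sum of predictions, wins, games
    for rating_a, rating_b, a_won in games:
        p = expected_score(rating_a, rating_b)
        index = min(int(p * num_bins), num_bins - 1)
        bins[index][0] += p
        bins[index][1] += 1 if a_won else 0
        bins[index][2] += 1
    return [
        (total_p / n, wins / n, n)
        for _, (total_p, wins, n) in sorted(bins.items())
    ]
```

If the model were useful, the mean prediction and the observed win fraction in each bin would be close. A systematic gap in one direction would point at the ratings (or at the way games were selected) rather than at the arithmetic of the statistics.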
Regarding "getting games against better ranked players" not being a random sample, there are immediate objections. First, it seems similar to the survivor effect in life insurance: sometimes the best predictor of living to 60 is having lived to 50. This difficulty does not appear to harm the insurance companies' ability to predict life expectancies over the total population of policy holders. Second, every game in which a player meets a better-ranked opponent is also a game in which a player meets a lower-ranked opponent. It is these very events that should increase the reliability of the ratings most quickly, compared to having players play games against equally ranked players only. Third, the inaccuracy is in the ratings rather than in the statistics of the observed events (games).
The basic objection is this:
What do I want to know?
Suppose I want to know how, on average, two random players of strength 2k and 1d will perform against each other. Can I find that information in this table? No, because the selection of the 2k and the 1d in this table is not random.
Suppose I want to know how, on average, two players of strength 2k and 1d, selected by MMS, will perform against each other. Can I find that information in this table? No, because the table mixes games from McMahon, Swiss, round-robin and other tournament formats.
This is simply not an unbiased sample.
If we had more specific data, such as the tournament format for each game, which round it was played in, etc., then we could use it to generate reliable information on specific queries. With only this table we cannot.
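As a sketch of what such a specific query might look like if the extra data existed: the record fields `format`, `round` and `weaker_won` below are hypothetical, and the sample records are made up purely to show the shape of the data, not taken from any real tournament.

```python
from collections import defaultdict


def win_rates_by_stratum(games):
    """Fraction of games won by the weaker-ranked player, per (format, round).

    games: iterable of dicts with keys 'format', 'round', 'weaker_won'.
    """
    totals = defaultdict(lambda: [0, 0])  # wins by weaker player, games
    for game in games:
        key = (game["format"], game["round"])
        totals[key][0] += 1 if game["weaker_won"] else 0
        totals[key][1] += 1
    return {key: wins / n for key, (wins, n) in totals.items()}


# Made-up records, for illustration only.
sample = [
    {"format": "mcmahon", "round": 2, "weaker_won": True},
    {"format": "mcmahon", "round": 2, "weaker_won": False},
    {"format": "swiss", "round": 1, "weaker_won": False},
]

if __name__ == "__main__":
    for key, rate in sorted(win_rates_by_stratum(sample).items()):
        print(key, rate)
```

Splitting the games this way would let a question like "2k against 1d under MMS in round two" be answered from games that were actually selected that way, instead of from one pooled table.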