Velobici/Rating

Rating WME in progress

Current Page (20060716)

Chinese: 等级 děng jí
Japanese:
Korean:

In the context of Go, rating should not be treated as a synonym for rank. However, people sometimes use a rating to determine a rank.

tderz: I fully agree with the above.

A rating is a calculated measure of playing strength. The most commonly used rating in Europe is GoR. However, other formulae exist, for example on Yahoo and at the AGA.

One starting point for studying rating algorithms is Elo.

Another, less traditional example is the KGS algorithm, although one could argue that it produces ranks rather than ratings.

Notes

Computation

periodic: new performance ratings are calculated for all players at regular intervals, using the opponents' previous ratings. (lae 11)
continuous: each player's new performance rating is combined with that player's previous performance rating using a weighted average, based upon the number of games played in the tournament versus the total number of games played by that player. (lae 11)
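
The continuous method can be sketched as a game-weighted average. This is a minimal sketch, assuming that "the total number of games played by that player" includes the games of the current tournament; the function name combine_performance is hypothetical.

```python
def combine_performance(prev_rating: float, prev_games: int,
                        event_rating: float, event_games: int) -> float:
    """Hypothetical helper: fold an event's performance rating into the
    player's previous performance rating, weighting each by its game count."""
    total_games = prev_games + event_games  # assumed to include this event
    if total_games <= 0:
        raise ValueError("at least one game is required")
    # Same as prev_rating + (event_games / total_games) * (event_rating - prev_rating)
    return (prev_games * prev_rating + event_games * event_rating) / total_games


# 40 earlier games at 1800, then a 10-game tournament performed at 2000:
print(combine_performance(1800.0, 40, 2000.0, 10))  # 1840.0
```

With this weighting, a player with many previous games moves only a little per event, while a newcomer's rating is dominated by recent results.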

Statistical Basis

paired-comparison scaling: a main method of quantitative psychology (lae 4)
goal is to obtain an irreflexive ordering over the set of stimulus items (players) with relation > and obtaining
the requirements for doing so are 1) results for all possible combinations of items and 2) a "large" number of results for each combination (lae 5). Neither requirement is met in practice.
probabilistic strict ordering of ratings: a strict ordering is unobtainable because results violate the transitivity criterion and draws violate antisymmetry. (lae 5)
A classic paired-comparison scaling model: the Thurstone model assumes that the fluctuations in a particular player's results from game to game are a random variable following a normal distribution. (lae 6)
Another paired-comparison scaling model: the Bradley-Terry model uses the logistic distribution in place of the normal distribution. The Bradley-Terry model is more tractable than the Thurstone model. (lae 7,8)
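
To make the difference between the two models concrete, the sketch below computes the probability that the higher-rated player wins, given a rating difference d, under each model. The 200-point standard deviation of a single performance is taken from the Elo section of this page; the base-10 logistic curve with a 400-point scale is an assumption borrowed from common Elo practice, and the function names are hypothetical.

```python
import math

def thurstone_p(diff: float, sigma: float) -> float:
    """Normal model: each performance ~ Normal(rating, sigma), so the
    difference of two performances has standard deviation sigma * sqrt(2)."""
    z = diff / (sigma * math.sqrt(2))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z

def bradley_terry_p(diff: float, scale: float) -> float:
    """Logistic model: a closed form, no error function needed."""
    return 1 / (1 + 10 ** (-diff / scale))

for d in (0, 100, 200, 400):
    print(d, round(thurstone_p(d, 200), 3), round(bradley_terry_p(d, 400), 3))
```

The two curves nearly coincide for small differences and diverge in the tails, where the logistic curve gives the underdog somewhat better chances. The logistic curve's closed form is likely part of what makes Bradley-Terry easier to work with.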

Elo Rating System

Based upon the Thurstone model.
The Elo Rating System multiplies each difference between scale values (ratings) by the product of the square root of 2 and the standard deviation, and uses a linear transformation to place the mean at 2000 points. The standard deviation of an individual performance is defined to be 200 points. (lae 10)
Ratings do not change during an event (tournament). (lae 10)
Performance Rating Formula: Rp = Rc + D(p) where Rp is the new performance rating, Rc is the average of the performance ratings of the opposition prior to the tournament, p is the average of the sum of the probabilities of defeating each of the opponents, and D(p) is the difference between the tournament results and the probabilistic result. (lae 10)
Elo is a continuous computation of performance rating under these assumptions:

that the average competition used for the previous performance rating (previous Rc) has the same numeric value as the competition in the event being added to the performance rating (current Rc) (lae 11)
that the average of the sum of the probabilities of defeating each of the previous opponents (previous p) is within 3 standard deviations of the average of the sum of the probabilities of defeating each of the current event's opponents (current p) (lae 11)
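
The performance rating formula Rp = Rc + D(p) can be sketched under the normal (Thurstone) model with the 200-point standard deviation given above, so that a rating difference D corresponds to an expected score Φ(D / (200·√2)). This sketch reads p as the fraction of available points the player actually scored and D(p) as the rating difference whose expected score equals p; that reading, and the helper name performance_rating, are assumptions rather than something these notes spell out.

```python
import math
from statistics import NormalDist

SIGMA = 200.0  # standard deviation of one performance, as stated above

def performance_rating(opponent_ratings: list[float], points: float) -> float:
    """Hypothetical helper for Rp = Rc + D(p).

    Rc: mean pre-event rating of the opponents.
    p:  fraction of the available points actually scored (assumed reading).
    D:  rating difference whose expected score under the normal model is p.
    """
    games = len(opponent_ratings)
    rc = sum(opponent_ratings) / games
    p = points / games
    if not 0.0 < p < 1.0:
        raise ValueError("D(p) is unbounded for a 0% or 100% score")
    # Invert p = Phi(D / (sqrt(2) * SIGMA)) for D.
    d = math.sqrt(2) * SIGMA * NormalDist().inv_cdf(p)
    return rc + d


# 3 points from 4 games against a field averaging 2000:
print(round(performance_rating([1900.0, 2000.0, 2100.0, 2000.0], 3.0)))  # 2191
```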

From these assumptions, and some mathematics, the rating formula becomes Rn = Ro + K(W - We), where Rn is the new rating, Ro is the previous rating, K is 4 times the standard deviation divided by the number of games to date, W is the number of games won plus half the number of draws, and We is the expected value of W, based upon the rating differences with the competition over the course of the event. (lae 12)
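
The update rule Rn = Ro + K(W - We) can be sketched as below. It keeps the normal (Thurstone) expectation with the 200-point standard deviation and takes K = 4 × 200 / (games to date) as stated; whether "games to date" includes the current event is left to the caller, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

SIGMA = 200.0  # standard deviation of one performance

def expected_score(own: float, opponent: float) -> float:
    """Normal-model expectation for one game; the difference of two
    performances has standard deviation sqrt(2) * SIGMA."""
    return NormalDist().cdf((own - opponent) / (math.sqrt(2) * SIGMA))

def elo_update(old: float, opponents: list[float], points: float,
               games_to_date: int) -> float:
    """Rn = Ro + K (W - We) with K = 4 * SIGMA / games_to_date.

    points is W (wins plus half the draws); We uses the pre-event rating
    for every game, since ratings do not change during an event.
    """
    k = 4 * SIGMA / games_to_date
    we = sum(expected_score(old, r) for r in opponents)
    return old + k * (points - we)


# Two games against equal opposition, 1.5 points, 40 games to date (K = 20):
print(elo_update(2000.0, [2000.0, 2000.0], 1.5, 40))  # 2010.0
```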

Elo Rating System has problems

Applying the rating calculation after a set of games is not mathematically equivalent to applying it after each game. For example, a player rated 2027 who plays two opponents, one rated 2800 and one rated 1100, will lose rating points (0.22K) if the rating calculation is applied to both games at once rather than to each game separately, even if the 2027 player achieves the expected results of a loss against the 2800 player and a win against the 1100 player. (The actual result W is 1.00; the expected result We for the two games, using the average rating of the two opponents, 1950, is 1.22.)
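
The arithmetic in this example can be reproduced with a short script. The text does not say which expected-score curve it used; the figures match the logistic curve 1/(1 + 10^(-D/400)), so that curve is assumed here, and logistic_expected is a hypothetical name.

```python
def logistic_expected(own: float, opponent: float) -> float:
    """Expected score for one game on the logistic curve (assumed)."""
    return 1 / (1 + 10 ** ((opponent - own) / 400))

player, opponents, actual = 2027, [2800, 1100], 1.0

# Per game: sum the expectation against each opponent separately.
we_separate = sum(logistic_expected(player, r) for r in opponents)

# At once: one expectation against the average opponent, times the game count.
average = sum(opponents) / len(opponents)
we_combined = len(opponents) * logistic_expected(player, average)

print(f"separate: W - We = {actual - we_separate:+.2f}")  # about -0.01
print(f"combined: W - We = {actual - we_combined:+.2f}")  # about -0.22
```

Computed game by game, We is about 1.01, so the player loses only about 0.01K; averaging the opposition first inflates We to about 1.22 because the expected-score curve is not linear in the rating difference.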

Elo on Rating Systems

Often people who are not familiar with the nature and limitations of statistical methods tend to expect too much of the rating system. Ratings provide merely a comparison of performances, no more and no less. The measurement of the performance of an individual is always made relative to the performance of his competitors and both the performance of the player and of his opponents are subject to much the same random fluctuations. The measurement of the rating of an individual might well be compared with the measurement of the position of a cork bobbing up and down on the surface of agitated water with a yard stick tied to a rope and which is swaying in the wind.

Todo

Read Batchelder and Bershad (1979), "The Statistical Analysis of a Thurstonian Model for Rating Chess Players", Journal of Mathematical Psychology 19, 39-60.
Read http://www.ratingtheory.com/
Read "Elo rating as a tool in the sequential estimation of dominance strengths"
Read "A Psychometric Analysis of Chess Expertise"

More

Chess has ties (draws), and their number increases as the ratings of the two players increase; this significantly affects rating systems.
Additional problems are time-order and space-order errors. (lae 5)

Common Misconceptions

Ratings are not a measure of strength. They do not provide reproducible fixed points of strength the way a 100-meter dash time or a pole-vault height does. (lae 10)
Rating systems do not use a standard unit of measure; hence differences among ratings are self-consistent, but the numerical value of a difference is valid only within that rating system. (lae 10)
Ratings are not uniform over time; rather, they drift. Today's 1800 rating is not equal to tomorrow's 1800 rating.

Limitations of Ratings

Ratings are meaningful only within the same system of ratings and same pool of rated players.
A single rating system applied to two populations of players that interact infrequently will produce ratings for the two groups that are not comparable.
Rating systems use the arithmetic difference between two ratings to give the probability that the higher-rated player will win the game.
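
Because only the difference between two ratings enters the prediction, adding the same constant to every rating in a pool changes nothing, which is one way to see why rating numbers from separately anchored pools cannot be compared. A minimal sketch, assuming the logistic curve with a 400-point scale (the notes above do not specify a curve); win_probability is a hypothetical name.

```python
def win_probability(r_high: float, r_low: float) -> float:
    """Depends only on r_high - r_low (logistic curve assumed)."""
    return 1 / (1 + 10 ** (-(r_high - r_low) / 400))

# Shifting both ratings by the same offset leaves the prediction unchanged,
# so the absolute number carries no meaning outside its own pool.
for offset in (0, 300, -500):
    print(offset, round(win_probability(1700 + offset, 1500 + offset), 3))
```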

Difficulties with Rating Systems

Change in players' abilities over time. (lae 9)
Fluctuation in players' performance not due to a change in ability. (lae 9)
Change in the player population due to arriving and departing players. (lae 9)

References

-- The Legacy of Arpad Elo (PDF)
-- The Method of Paired Comparisons
-- Paired Comparison Model at Springer-Verlag
-- Bradley-Terry Model at Springer-Verlag
-- Empirical formula for creating error bars for the method of paired comparison
-- Stronger Opposition May Make It Easier to Win (H.A. David, 1998)

Velobici/Rating last edited by velobici on August 21, 2009 - 15:55
