Elo Rating

    Keywords: Tournament

In the early 1960s, Arpad Elo developed the Elo rating system.

It was the first rating system that had probabilistic underpinnings. Originally, Elo developed it for the game of chess, and chess federations around the world adopted it quickly. It became popular and common for many other games too, including Go, Scrabble, table tennis, etc.


Elo Ratings

Game federations do not use identical (parameters for) rating systems. They attach different titles to a rating, and they have different rule sets to determine an initial rating for new participants.

Usually, an average amateur player's rating ranges between 1300 and 1700 Elo points.

[ext] U.S. Chess Federation's classes are:

  Elo rating    class    members
  -----------   -------  -------
  2200 - 2800   Master      4 %
  2000 - 2200   Expert      8 %
  1800 - 2000   Class A    12 %
  1600 - 1800   Class B    18 %
  1400 - 1600   Class C    18 %
  1200 - 1400   Class D    20 %
     0 - 1200   Class E    20 %

[ext] World Chess Federation's top ratings are:

  Elo rating    title
  -----------   --------------------------
  2650 - 2800   world champions
  2500 - 2650   international grandmasters

Winning Probabilities

The rating indirectly represents the probability of winning against other rated players. This probability depends only on the difference between the two players' ratings as follows:

   rating     probability
  difference  of winning
  ----------  -----------
     400       .919
     300       .853
     200       .758
     100       .637
      50       .569
       0       .500
     -50       .431
    -100       .363
    -200       .242
    -300       .147
    -400       .081

This probability is the area under the bell-shaped normal curve to the left of the rating difference, with 200 * sqrt(2) points taken as one standard deviation. The table shows some sample points on this curve; interpolating between them is adequate for good approximations in rating calculations.
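
The lookup described above can be sketched in a few lines. This is only an illustration, not any federation's official procedure: the function names are made up here, the table rows are copied from the table above, and the closed-form variant assumes a plain normal curve with a standard deviation of 200 * sqrt(2), which reproduces the table values only approximately.

```python
import math

# Sample points copied from the table above: (rating difference, probability).
TABLE = [(-400, 0.081), (-300, 0.147), (-200, 0.242), (-100, 0.363),
         (-50, 0.431), (0, 0.500), (50, 0.569), (100, 0.637),
         (200, 0.758), (300, 0.853), (400, 0.919)]

SIGMA = 200 * math.sqrt(2)  # one standard deviation, as stated in the text


def win_probability_normal(diff):
    """Normal CDF of the rating difference (assumed model, SD = SIGMA)."""
    return 0.5 * (1.0 + math.erf(diff / (SIGMA * math.sqrt(2))))


def win_probability_table(diff):
    """Linear interpolation between table rows; clamps outside +/-400."""
    if diff <= TABLE[0][0]:
        return TABLE[0][1]
    if diff >= TABLE[-1][0]:
        return TABLE[-1][1]
    for (d0, p0), (d1, p1) in zip(TABLE, TABLE[1:]):
        if d0 <= diff <= d1:
            return p0 + (p1 - p0) * (diff - d0) / (d1 - d0)


if __name__ == "__main__":
    # Compare both estimates for a few differences that are not table rows.
    for diff in (75, 150, 250):
        print(diff,
              round(win_probability_table(diff), 3),
              round(win_probability_normal(diff), 3))
```

Clamping outside the range of -400 to 400 is a simplification chosen for this sketch; the table itself says nothing about larger differences.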


Determining an appropriate initial rating

One method is: A new participant plays three initial games against opponents with already established ratings. These games, for example, count as:

  • won game: new member's rating = opponent's rating + 200 points
  • draw game: new member's rating = opponent's rating
  • lost game: new member's rating = opponent's rating - 200 points

These initial game results are averaged and used for the new member's initial rating.

Example: A new member loses a game against a 1700-opponent, draws against a 1400-opponent and wins against a 1300-opponent. The result is an initial rating of 1467 = ( (1700-200) + 1400 + (1300+200) ) / 3.
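
As a minimal sketch, the averaging rule above can be written as a small function. The name initial_rating and the (opponent rating, result) input format are assumptions made for illustration; the 200-point offset is the example value from the list above, and federations may use other rules.

```python
def initial_rating(games, bonus=200):
    """Average the provisional ratings earned in the initial games.

    games: list of (opponent_rating, result) pairs, where result is
    "win", "draw" or "loss". bonus is the offset for a win or a loss.
    """
    offsets = {"win": bonus, "draw": 0, "loss": -bonus}
    provisional = [opponent + offsets[result] for opponent, result in games]
    return sum(provisional) / len(provisional)


if __name__ == "__main__":
    # The example from the text: loss vs. 1700, draw vs. 1400, win vs. 1300.
    example = [(1700, "loss"), (1400, "draw"), (1300, "win")]
    print(round(initial_rating(example)))  # 1467
```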


Converting Elo Ratings into Go Ranks

The Elo system can be modified to implement Go ranks at a Go server.

Internally, DGS uses:

  Points   Go rank
  ------   -------
   2300      3 dan
   2200      2 dan
   2100      1 dan
   2000      1 kyu
   1900      2 kyu
   1800      3 kyu
   1500      6 kyu
   1000     11 kyu
    500     16 kyu
      0     21 kyu
   -100     22 kyu
   -200     23 kyu
   -300     24 kyu
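
The rows above are spaced 100 points per rank, with 2100 points as 1 dan and 2000 points as 1 kyu. A hedged sketch of the conversion follows; the function name go_rank is hypothetical, and rounding down to the nearest table row for in-between values is an assumption, since the table does not say how DGS treats them.

```python
import math


def go_rank(points):
    """Map rating points to a Go rank using the 100-points-per-rank table.

    Assumption: values between two rows are rounded down to the lower row.
    """
    step = math.floor(points / 100)  # 21 -> 1 dan, 20 -> 1 kyu, 0 -> 21 kyu
    if step >= 21:
        return f"{step - 20} dan"
    return f"{21 - step} kyu"


if __name__ == "__main__":
    for points in (2300, 2100, 2000, 1500, 0, -300):
        print(points, go_rank(points))
```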


Elo Rating Discussion

Tim Brent: Originally 2000 in the Chess rating was a base point, based upon a 50% score at the US Open. The original idea was using Chess to find out if mental activity decreases with aging.

PurpleHaze: This is not correct. Elo attempted to make his rating system line up with the existing USCF system that assumed a bell curve centered at 1500.

Frs: What does the Elo rating system have to do with an age-dependent decrease of mental activity?

Tim Brent: He had a theory that you could use success in chess as a basis for showing the effects of aging on mental activity, i.e. a player who could play at a 2400 level in his forties is now playing in his fifties at a 2260 level. Could it be proof that his cognitive ability went down 6 percent over that period? (Of course this theory doesn't consider that the aforementioned player might simply have started losing against a group of stronger players.)


TDerz

Depth of something ranked with ELO

The range of ELO ratings also says something about the depth of the game. The total depth of a game is defined by the two end points of the range of skill: the total beginner and the theoretical best play by an infallible, almighty creature.

Neither is easy to establish. Is someone who has just heard the rules already a beginner, thereby setting the lowest standard, or does it take several games until one has absorbed the rules of a game and is able to play on one's own? At the other end of the range one simply has to take the best player at a given time. In Go, the total beginner who can nevertheless play on their own according to the simple rules can safely be set at 30 kyu. Theoretical best play could correspond to the strength of an imaginary 13 dan, according to measurements of standard deviations among professional games.

Even taking only 20 kyu and 9 dan as endpoints makes Go the deepest game so far.[1] The rating difference of 2900 ELO points between Gu Li and a 20 kyu with 100 ELO points is a difference in insight into the game of 29 times the standard deviation (100 ELO points).

Chess, in comparison, has a similar upper endpoint (Garry Kasparov, once at 2851 points, s.a.), yet the standard deviation is set at 200 ELO points. The comparison is more difficult because of draws; however, it results in a depth of chess of (only) 14 layers of standard deviation if the total beginner in chess had a rating of zero ELO points (which s/he has not, AFAIK).
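
The arithmetic behind these depth figures is simply the rating range divided by the assumed standard deviation. A small sketch follows; the function name is made up for illustration, and the top Go rating of 3000 is only implied by the text (a 2900-point gap above a 20 kyu at 100 points).

```python
def depth_in_layers(top_rating, beginner_rating, standard_deviation):
    """Number of standard deviations between the beginner and the top player."""
    return (top_rating - beginner_rating) / standard_deviation


if __name__ == "__main__":
    # Go: 2900 points of range at 100 points per standard deviation.
    print(depth_in_layers(3000, 100, 100))   # 29.0
    # Chess: 2851 down to a hypothetical beginner at 0, 200 points per SD.
    print(depth_in_layers(2851, 0, 200))     # about 14.3
```

The result depends entirely on the chosen endpoints and on the standard deviation assumed for each game, which is what the rest of this discussion argues about.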

PurpleHaze: Experience shows that an adult of average intelligence will attain a rating of about 800 on the day they learn the rules. Small children and the handicapped may have lower ratings. Historically the lowest ratings recorded are in the New York Public School Championship - Kindergarten Division, where they have in fact reached 0.

tderz: [ext] FIDE ratings do not display players below ELO 1600. That confirms what you say, and I've read several times that a novice's or beginner's rating would be around 1000. That implies, IMO, that chess is less deep than Go: 10 layers of depth vs. 29.

I would guess that this is due to Elo-type rating systems only measuring strength relative to other players instead of absolutely. They have trouble providing an accurate measure for underrepresented parts of the player base. FIDE and USCF ratings are in this case a bit unreliable, as chess beginners are commonly not yet members of a chess federation and thus not well represented. Considering FICS, where everyone is rated and the player base is a bit more diverse (yet still lacking in complete beginners), the weakest players are only rated at around 500 (and a true beginner would most likely be weaker still). -- Flower, 2006-11-21

Paul Clarke: This came up a few years ago, when someone (Laszlo Mero?) analysed the 'depth' of a number of games and came up with 14 for chess (using the same argument as above) and 45 for Go (using 1 stone difference as one unit of depth, I think). I'm very sceptical that you can use these figures to show anything about the intellectual depth of the games; here's part of a Usenet article I posted at the time:

Consider a game that I've just invented: 'Gotac'. Gotac = 'Go, Toss a Coin': you play a game of go, following which the loser tosses a coin. If the coin comes down tails, he still loses, if it comes down heads the game is a draw. The best player in the world will score about 75% against the worst, so the game has only 1 level. However, anything you know about go strategy and tactics can be applied to the game and will improve your chances of winning, so it's at least as complex as go.

Comparing chess and go: if you play slightly better than your opponent in go, you will win by a small margin; in chess, you may well draw. In go, if you make a mistake against a weaker opponent, you will usually get chances to catch up; in chess, it may instantly lose you the game. Thus, there's a better chance of superior skill winning a game of go than a game of chess. This, I suspect, explains much or all of the difference in the number of levels.
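
The 75% figure in the Gotac example above can be checked with a short calculation. This sketch assumes the usual scoring of 1 for a win, 0.5 for a draw and 0 for a loss, a fair coin, and no draws in the underlying game of go; the function name is hypothetical.

```python
def gotac_expected_score(p_go_win, p_heads=0.5):
    """Expected Gotac score for a player who wins the go game with p_go_win."""
    # We win at go: the opponent tosses, and heads turns our win into a draw.
    score_if_go_win = (1 - p_heads) * 1.0 + p_heads * 0.5
    # We lose at go: we toss, and heads rescues a draw.
    score_if_go_loss = (1 - p_heads) * 0.0 + p_heads * 0.5
    return p_go_win * score_if_go_win + (1 - p_go_win) * score_if_go_loss


if __name__ == "__main__":
    print(gotac_expected_score(1.0))  # 0.75: the best player against the worst
    print(gotac_expected_score(0.5))  # 0.5: evenly matched players
```

Under these assumptions the score is 0.25 + 0.5 * p_go_win, so every result is squeezed into the range from 25% to 75%, whatever the difference in go skill.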

tderz: I've read several of László Mérő's books, including the one you mention (Kognition, Intuition und komplexes Denken, Rowohlt, 446 S., Euro 12,90, ISBN: 3499614197). I wrote an article about his depth calculation in reply to an article by [ext] Tim Krabbé, 214. 30 May 2003: Chess and Go, or directly at [ext] http://www.xs4all.nl/~timkr/admag/go.htm. I cannot retrieve my own contribution at this moment; it might have been only by e-mail on the Dutch mailing list (I found it, it's in Dutch[2], and Dieter[3] also commented in the thread). Similar: [ext] rec.games.fo. Concerning your example (Gotac):

  • I think you would see the same Go rating, just very much compressed into the winning range 50-75%. If you ran ELO on it, the same levels or layers of standard deviation would reappear. (This is just my guess, I'm too weak at math, but I'm convinced that overlaying some totally random event on a normal distribution still gives a normal distribution; only the volatility would change.) I.e. the best player in the world would indeed win 75% (I did not check your calculation, I'm too lazy or stupid), and 29 standard deviations of the mean strength away you have the weakest players, scoring only 50%. The term 'level' must thus be properly defined. I too am often confused about where this is put: chess (75%?), Go (67%, and there are other figures). I seem to remember that László Mérő puts it at 75% too.

Flower: I am a bit baffled by the claim that the standard deviation of a player's performance (in Go) would be 100 as opposed to 200 (in chess). In [ext] this table, if the SD were 100 I would expect a loss chance of about 32% when playing against a player 1 stone stronger. To my mind this table indicates that the SD of one's performance (in Go) might be near 200, as in chess, instead of 100. This would be quite significant, as the 'depth' of Go would then be reduced to about 20 (30k to 9p --> 3900 Elo delta --> /200 = 19.5), compared with about 13 in chess (2850 - 200 = 2650 --> /200 = 13.25). -- Flower, 2006-11-21

tderz: Flower, you are correct: a rating difference of about 100 is meant to give winning chances of 1/3 and 2/3 to the weaker and the stronger player, respectively. This is the basis of ELO in Go.

In this empirical table the actual proportion of wins, Nw/Ng (the winning chance), against an opponent just one grade stronger ranges from 30% (for 4 dans) to 46.6% (for 11 kyus).

From 17k to 2k these figures are above 40%. From 2d to 5d it's 34 to 28% (I leave the 6 dans out because there are too few 7 dan opponents and games). In the dan ranks this table strongly supports a winning chance of only 1/3 per rating difference of 100.

As H. Hiddema states below, many Go tournaments are played under the McMahon system, which might heavily distort the statistics by (preferentially) pairing strongly performing, weaker-ranked players with weakly performing, stronger-ranked players.
This could explain the unexpectedly high winning rate in the kyu range. However, there always exists a McMahon bar, which prevents the highest-ranked player (e.g. the only 6d) from winning simply through his higher MM score (1 point over a 5d) even when their results are the same. Hence, in a big tournament with lots of 6d around, the bar would be set at e.g. 4 dan (mutatis mutandis for weaker, smaller tournaments).
The undisturbed 2-6 dan figures strongly support a 1/3 winning probability.

If the outcome of this indicates that Go is indeed a very deep game, many Go players will simply nod.
Most of the discussion on ranking, ELO and rating revolves around the chicken-and-egg question:

  • which comes first?
    • should the ranking (kyu, dan) correlate with the rating differences?
    • or should the rating be the basis of everything, hence also of demotion (in kyu, dan) after some weak tournaments?

Velobici: This is too much to ask of rating systems. A rating system is nothing more than a mapping of a sparse set of paired comparison results, in the case of Weiqi game results, over a set of numbers in an attempt to produce a preference order and a measure of how much one is preferred over the other. The same methods are used in quantitative psychology for paired comparison results of food tastes, or aromas (read: perfumes), etc.

See the seminal paper: Thurstone, L.L., The method of paired comparisons for social values. Journal of Abnormal and Social Psychology, 1927, 21, 384-400. And an excellent survey work: The Method of Paired Comparisons (2nd ed.). by Herbert A. David

Consider that people's playing strength changes over time. Should this be reflected in the rating system by discounting ratings for players that either submit results to the rating system infrequently or have not submitted a result in a comparatively long period of time? Now ask which rating systems have this feature, other than KGS and Glicko.

  • while many mathematical formulas hold quite well and predict the outcomes statistically correctly for small rating differences (RD), can they say something valuable for an RD equal to 900?
    • is the number of handicap stones well related to rating differences of 100?
    • should handicap games be taken at all (directly) into the ELO tables (if those are used mainly for even game statistics)?

... and so on.

Herman Hiddema: In response to Flower: The standard deviation in the EGF rating algorithm is not actually a constant. On the front page of the [ext] GoR page, see the winning probabilities in table II. Note that the table you give ([ext] this one) is not reliable due to the fact that most go tournaments use the McMahon system.


[1] Bill: My guess is that the deepest game up to now is the form of shogi (Japanese chess) played on a 25x25 board.

[2] tderz (translated from Dutch): "("It is reasonable to assume that human thinking is unreasonable!" (László Mérő in: Die Logik der Unvernunft)) There, on the basis of ELO-like considerations, the depth of a field is expressed as the number of levels with a certain winning percentage of level n over level n+1 (see also Jan van der Steen's ELO-info page and [ext] http://chesslinks.org/hof/elo.html , [ext] http://www.gtryfon.demon.co.uk/bcc/Java/gradingintro.htm , and many others). In his paper Jan gives a winning percentage of 69% for the player one rank stronger (delta = 100 Elo points) over the weaker one. The chess sites give a difference of 200 Elo points ( [ext] http://home.clear.net.nz/pages/petanque/ratings/descript.htm ) per standard deviation 'of performances in a single game', but I find that hard to compare because of almost 50% draws, White's advantage, etc. The 'Percentage Expectancies from Rating Differences' are then again given as about 65% for a difference of 100 points (so comparable with Go?, leaving draws aside for a moment?). Now take Kasparov with, say, ELO 2800 and a Kisei or Meijin with (Go) ELO 2900? points, and count the Go and chess ratings downwards until you reach the hard-to-define beginner's level. The field with the higher exponent n of winning percentages 0.67^n is the deeper one. Well, the outcome is (for me) beyond doubt: Go, even if only 20 kyu is taken for a beginner (n=30). For chess you take 800-1200 (n = 14-18) for beginners, but even if you took 100, n is only 26. Of course my argument has weak points for both games: the standard deviation and the winning percentages are not the same for all playing strengths (but that holds for both Go and chess). The better ability of stronger chess players to make draws (the Petrosians) probably has a greater influence."

[3] quoted from Dieter (translated from Dutch): "I agree with TDerz and point out two major fallacies that are implicit in this article.

1. "There are more possibilities in Go, ergo it is harder for computers that rely solely on computing power, ERGO Go is not intrinsically deeper." The fallacy lies in the fact that no explanation whatsoever is given for the fact that humans are indeed able to beat the brute-force computer at Go. If Go too were a game in which playing level depends on computing power, humans would be just as handicapped as the computer, and the computer would win out all the same. Humans clearly draw playing strength from qualities other than computing power, which argues for the claim that Go is in fact deeper. One could even take this as the definition of "depth".

2. "Good chess players quickly become good Go players, ergo the qualities called upon in both games are the same." This overlooks the following possibility. Hypothesis: for chess you need qualities A and B; for Go, A, B, C and D. Whoever excels in A and B therefore has a good chance of excelling at both games.
Only if good Go players are also good chess players to the same degree can one speak of a reciprocal relationship between the qualities. This latter fallacy is nothing less than a reversal of the logical arrow, which is rather surprising coming from Tim Krabbé. (Dieter)"


This is a copy of the living page "Elo Rating" at Sensei's Library.
(OC) 2007 the Authors, published under the OpenContent License V1.0.