Elo Rating
The Elo rating system was developed by Arpad Elo in the early 1960s for chess and has since been adopted for many other games, including Go. The name is not an acronym and therefore should not be written in uppercase as “ELO”.
Background
History
The Elo rating was the first rating system to be based on probability theory. Elo originally developed it for chess, and chess federations around the world adopted it quickly. It has since become common in many other games too, including Go, Scrabble and table tennis.
Winning Probabilities in Arpad Elo's Original Work
The rating indirectly represents the probability of winning against other rated players. This probability depends only on the difference between the two players' ratings as follows:
rating difference | probability of winning |
---|---|
400 | .919 |
300 | .853 |
200 | .758 |
100 | .637 |
50 | .569 |
0 | .500 |
-50 | .431 |
-100 | .363 |
-200 | .242 |
-300 | .147 |
-400 | .081 |
This represents the area under the standard bell-shaped curve where ``200 * sqrt(2)`` points are taken as one standard deviation. The table shows some sample points on this curve, adequate for good approximations of rating calculations by interpolation.
Note: These are the probabilities used in Elo's original implementation of this rating system for chess players; see the section on Go for its adaptation to Go.
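The relationship between the table and the bell curve can be sketched in a few lines of Python. This is only an illustration, assuming a normal distribution of the rating difference with a standard deviation of ``200 * sqrt(2)`` points as stated above; the function name `win_probability` is made up for this example. The computed values land within about 0.005 of the table entries, so the table was presumably produced with slightly different rounding or constants.

```python
import math

# Assumption taken from the text: one standard deviation of the
# rating difference is 200 * sqrt(2) points.
SIGMA = 200 * math.sqrt(2)

def win_probability(rating_difference: float) -> float:
    """Area under the standard normal curve up to rating_difference / SIGMA."""
    z = rating_difference / SIGMA
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Print a few sample points to compare with the table.
for diff in (400, 200, 100, 0, -100, -200, -400):
    print(f"{diff:5d}  {win_probability(diff):.3f}")
```

Because the curve is symmetric, the probabilities for a difference of +D and -D always add up to 1, just as in the table.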
Determining an appropriate initial rating
One method is as follows: a new participant plays three initial games against opponents with established ratings. Each game counts, for example, as:
- won game: new member's rating = opponent's rating + 200 points
- draw game: new member's rating = opponent's rating
- lost game: new member's rating = opponent's rating - 200 points
These initial game results are averaged and used for the new member's initial rating.
Example: A new member loses a game against a 1700-opponent, draws against a 1400-opponent and wins against a 1300-opponent. The result is an initial rating of 1467 = ( (1700-200) + 1400 + (1300+200) ) / 3.
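The averaging method above can be written down as a short Python sketch. The function name `initial_rating` and the input format are assumptions made for this illustration; the 200-point offsets are the example values from the list above, and federations may use different ones.

```python
def initial_rating(results):
    """Estimate an initial rating from (opponent_rating, outcome) pairs.

    outcome is one of "win", "draw" or "loss".
    """
    # Example offsets from the text: +200 for a win, 0 for a draw, -200 for a loss.
    offsets = {"win": 200, "draw": 0, "loss": -200}
    estimates = [opponent + offsets[outcome] for opponent, outcome in results]
    # The initial rating is the average of the per-game estimates.
    return round(sum(estimates) / len(estimates))

games = [(1700, "loss"), (1400, "draw"), (1300, "win")]
print(initial_rating(games))  # 1467
```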
Elo Ratings and titles in Chess
Game federations do not all use the same rating system or the same parameters. They attach different titles to a rating, and they have different rules for determining the initial rating of new participants.
Usually, an average amateur player's rating ranges between 1300 and 1700 Elo points.
U.S. Chess Federation's classes are:
Elo rating | class | members |
---|---|---|
2200 - 2800 | Master | 4 % |
2000 - 2200 | Expert | 8 % |
1800 - 2000 | Class A | 12 % |
1600 - 1800 | Class B | 18 % |
1400 - 1600 | Class C | 18 % |
1200 - 1400 | Class D | 20 % |
0 - 1200 | Class E | 20 % |
World Chess Federation's top ratings are:
Elo rating | title |
---|---|
2650 - 2800 | world champions |
2500 - 2650 | international grandmasters |
Elo (or modified Elo) in Go
- see further
The formula for the Elo rating system as employed by the EGF uses different parameters from those used by Arpad Elo for chess. It takes into account that in even games, 6d beats 5d more often than 29k beats 30k. See EGF Rating System for details.
Converting Elo Ratings into Go Ranks
Many Go servers use a modified Elo system internally, and represent it as Go ranks externally. Many of them also account for the fact that the traditional handicap system is not linear, as described at Rank And Handicap#Advantage for White. So a traditional 2 stone handicap is understood to actually only be worth 1.5 stones. Some servers such as DGS use finer grained handicaps, such as changing the komi slightly in addition to using handicap stones.
Internally, DGS uses:
Points | Go rank |
---|---|
2300 | 3 dan |
2200 | 2 dan |
2100 | 1 dan |
2000 | 1 kyu |
1900 | 2 kyu |
1800 | 3 kyu |
1500 | 6 kyu |
1000 | 11 kyu |
500 | 16 kyu |
0 | 21 kyu |
-100 | 22 kyu |
-200 | 23 kyu |
-300 | 24 kyu |
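The DGS table follows a simple pattern: 100 points per rank, with 2100 points at 1 dan and 2000 points at 1 kyu. A minimal Python sketch of that mapping follows. The function name `dgs_points_to_rank` is hypothetical, and rounding values between the listed points down to the lower rank is an assumption of this example, not something the table specifies.

```python
import math

def dgs_points_to_rank(points: float) -> str:
    """Map DGS-style rating points to a Go rank, 100 points per rank."""
    # Assumption: points between two table rows round down to the lower rank.
    steps = math.floor(points / 100)
    if steps >= 21:
        # 2100 points is 1 dan, 2200 is 2 dan, and so on.
        return f"{steps - 20} dan"
    # 2000 points is 1 kyu, 1900 is 2 kyu, ..., 0 is 21 kyu, -100 is 22 kyu.
    return f"{21 - steps} kyu"

for points in (2300, 2100, 2000, 1500, 0, -300):
    print(points, dgs_points_to_rank(points))
```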
See also
- Rating Systems
- A two dimensional measure of chess performance
- A Short Introduction to Elo Ratings
- A Simple Explanation of Elo Ratings
- Wikipedia's Elo Rating description was so much based on Chess that I had to write something on Go.
- Sonas Rating Formula – Better than Elo? Discusses and criticises FIDE Elo and attempts improvements: a different K-factor (24 proposed instead of 10; it is 32 in Go) and the inclusion of faster time-control games, which receive less weight than a classical game.
- Rating Theory Homepage
- The Working of the FIDE Rating System
- Whole History Rating - A modern rating system claiming to far exceed the accuracy of Elo and solving some glitches.
- FIDE Titles and EGF Go Ratings
Discussion
Elo Rating Discussion
Tim Brent: Originally 2000 in the Chess rating was a base point, based upon a 50% score at the US Open. The original idea was using Chess to find out if mental activity decreases with aging.
PurpleHaze: This is not correct. Elo attempted to make his rating system line up with the existing USCF system that assumed a bell curve centered at 1500.
Frs: What does the Elo rating system have to do with an age-dependent decrease of mental activity?
Tim Brent: He had a theory that you could use success in chess as a basis for showing the effects of aging on mental activity, i.e. a player who could play at a 2400 level in his forties is now playing in his fifties at a 2260 level. Could it be proof that his cognitive ability went down 6 percent over that period? (Of course this theory doesn't consider that the aforementioned player might simply have started losing against a group of stronger players.)
Depth of something ranked with Elo
The range of Elo ratings also says something about the depth of the game. The total depth of a game is defined by the two endpoints of the range of skills: the total beginner and the theoretical best play by an infallible, almighty creature.
Neither is easy to establish. Is someone who has just heard the rules already a beginner, thereby setting the lowest standard, or does it take several games until one has absorbed the rules and is able to play on one's own? At the other end of the range, one simply has to take the best player at a given time. The total beginner who can nevertheless play on their own according to the simple rules can in Go safely be set at 30 kyu. Theoretical best play could correspond to the strength of an imaginary 13 dan, according to measurements of standard deviations among professional games.
OmarSyed: I have always thought that the lower end of the rating range should be based on random play. Thus a program which plays Go randomly should have a rating of exactly 0. Then other programs which have a simple evaluation function can be played against the random program and each other to establish their ratings. Then perhaps the best of these simple programs might be good enough to win 20% against beginners. The rating of this beginner level program can be fixed so that it becomes a source and a sink and serves as an anchor for the rating system. This has been done for the game Arimaa and the beginner program has a rating of 1000 while beginners start with a rating of 1300.
Taking only 20 kyu and 9 dan as endpoints already makes Go the deepest game so far.^{[1]} The rating difference of 2900 Elo points between Gu Li and a 20 kyu rated 100 Elo points is a difference in insight into the game of 29 standard deviations (100 Elo points each).
Chess, in comparison, has a similar upper endpoint (Garry Kasparov, once rated 2851 points), yet its standard deviation is set at 200 Elo points. The comparison is harder because of draws, but it gives chess a depth of (only) 14 standard deviations if the total beginner in chess had a rating of zero Elo points (which he or she does not have, AFAIK).
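The depth figures above are just a rating range divided by a standard deviation. As a rough Python sketch (the function name `depth` is made up for this example, and the top Go rating of 3000 is only implied by the 2900-point gap above a 100-point beginner):

```python
def depth(top_rating: float, beginner_rating: float, std_dev: float) -> float:
    """Number of standard deviations between the beginner and the top player."""
    return (top_rating - beginner_rating) / std_dev

# Go: 2900-point gap, 100 points per standard deviation.
go_depth = depth(3000, 100, 100)
# Chess: Kasparov's 2851 down to a hypothetical zero-rated beginner, 200 points per SD.
chess_depth = depth(2851, 0, 200)

print(go_depth)     # 29.0
print(chess_depth)  # 14.255
```

The result depends heavily on the chosen endpoints and on the standard deviation assumed for each game, which is exactly what the discussion on this page disputes.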
PurpleHaze: Experience shows that an adult of average intelligence will attain a rating of about 800 on the day they learn the rules. Small children and the handicapped may have lower ratings. Historically the lowest ratings recorded are in the New York Public School Championship - Kindergarten Division, where they have in fact reached 0.
tderz: FIDE ratings do not display players below Elo 1600. This confirms your point, and I've read several times that a novice's or beginner's rating would be around 1000. That implies, IMO, that Chess is less deep than Go: 10 layers of depth vs. 29.
I would guess that this is because Elo-type rating systems only measure strength relative to other players rather than absolutely. They struggle to provide an accurate measure for underrepresented parts of the player base. FIDE and USCF ratings are a bit unreliable here, as chess beginners are commonly not yet members of a chess federation and thus not well represented. On FICS, where everyone is rated and the player base is a bit more diverse (yet still lacking in complete beginners), the weakest players are only rated at around 500 (and a true beginner would most likely be weaker still). -- Flower, 2006-11-21
Paul Clarke: This came up a few years ago, when someone (Laszlo Mero?) analysed the 'depth' of a number of games and came up with 14 for chess (using the same argument as above) and 45 for Go (using 1 stone difference as one unit of depth, I think). I'm very sceptical that you can use these figures to show anything about the intellectual depth of the games; here's part of a Usenet article I posted at the time:
Consider a game that I've just invented: 'Gotac'. Gotac = 'Go, Toss a Coin': you play a game of go, following which the loser tosses a coin. If the coin comes down tails, he still loses, if it comes down heads the game is a draw. The best player in the world will score about 75% against the worst, so the game has only 1 level. However, anything you know about go strategy and tactics can be applied to the game and will improve your chances of winning, so it's at least as complex as go.
Comparing chess and go: if you play slightly better than your opponent in go, you will win by a small margin; in chess, you may well draw. In go, if you make a mistake against a weaker opponent, you will usually get chances to catch up; in chess, it may instantly lose you the game. Thus, there's a better chance of superior skill winning a game of go than a game of chess. This, I suspect, explains much or all of the difference in the number of levels.
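The 75% figure in the Gotac example can be checked with a small Python sketch. It assumes the usual scoring convention of 1 for a win, 0.5 for a draw and 0 for a loss, which the post does not state explicitly; the function name `gotac_expected_score` is invented for this illustration.

```python
def gotac_expected_score(p_go_win: float, p_heads: float = 0.5) -> float:
    """Expected Gotac score for a player who wins the go game with probability p_go_win."""
    # If the player wins the go game, the opponent tosses the coin:
    # tails keeps the win (1 point), heads turns it into a draw (0.5 points).
    score_if_go_won = (1 - p_heads) * 1.0 + p_heads * 0.5
    # If the player loses the go game, the player tosses the coin:
    # tails keeps the loss (0 points), heads turns it into a draw (0.5 points).
    score_if_go_lost = (1 - p_heads) * 0.0 + p_heads * 0.5
    return p_go_win * score_if_go_won + (1 - p_go_win) * score_if_go_lost

print(gotac_expected_score(1.0))  # 0.75
print(gotac_expected_score(0.5))  # 0.5
print(gotac_expected_score(0.0))  # 0.25
```

With a fair coin the expected score is 0.25 + 0.5 * p, so every result from go is squeezed into the range 25% to 75%, even though the underlying skill is unchanged.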
tderz: I've read several of László Méro's books and also that one you mention (Kognition, Intuition und komplexes Denken, Rowohlt, 446 S., Euro 12,90, ISBN: 3499614197). I wrote an article about his depth calculation in reply to an article of Tim Krabbé, 214. 30 May 2003: Chess and Go or directly on http://www.xs4all.nl/~timkr/admag/go.htm. I cannot retrieve my own contribution at this moment, it might have been only by e-mail on the Dutch mailing list (I found it, it's in Dutch^{[2]} and Dieter^{[3]} also commented in the thread). Similar rec.games.fo. Concerning your examples (GoTac?):
- I think you would see the same Go ratings, just very much compressed into the scoring range 50-75%. If you ran Elo on it, the same levels or layers of standard deviation would reappear (this is just my guess, as I'm too weak at math, but I'm convinced that overlaying some totally random event on a normal distribution still leaves a normal distribution; only the volatility would change). That is, the best player in the world would indeed score 75% (I did not check your calculation, I'm too lazy or stupid), and 29 standard deviations of strength away you have the weakest players, scoring only 50%. The term 'level' must thus be properly defined. I too am often confused about where this is put: Chess (75%?), Go (67%, and there are other figures). I seem to remember that László Méro puts it at 75% too.
Flower: I am a bit baffled by the claim that the standard deviation of a player's performance (in Go) would be 100 as opposed to 200 (in chess). In this table, if the SD were 100, I would expect a loss chance of about 32% when playing against a player one stone stronger. To my mind this table indicates that the SD of one's performance (in Go) might be near 200, as in chess, instead of 100. That would be quite significant, as the 'depth' of Go would then be reduced to about 20 (30k to 9p --> 3900 Elo delta --> /200 = 19.5), compared to about 13 in chess (2850 - 200 = 2650 --> /200 = 13.25). -- Flower, 2006-11-21
tderz: Flower, you are correct: a rating difference of about 100 is meant to give winning chances of 1/3 and 2/3 to the weaker and the stronger player respectively. This is the basis of Elo in Go.
In this empirical table the actual proportion of wins, Nw/Ng (the winning chances), ranges from 30% (for 4 dans) to 46.6% (for 11 kyus) against an opponent just one grade stronger.
From 17k to 2k these figures are above 40%. From 2d to 5d they are 34% to 28% (I leave the 6 dans out because there are too few 7 dan opponents and games). In the dan ranks this table strongly supports a winning chance of only 1/3 per rating difference of 100.
As H. Hiddema states below, many Go tournaments are played under the McMahon system, which might heavily distort the statistics by preferentially pairing strongly performing, lower-ranked players with weakly performing, higher-ranked players.
This could explain the unexpectedly high winning rate in the kyu range. However, there always exists a McMahon bar, which prevents the highest ranked player (e.g. the only 6d) from winning simply through his higher MM score (1 point over a 5d) even when their results were the same. Hence, in a big tournament with lots of 6d around, the bar would be set at e.g. 4 dan (mutatis mutandis for weaker, smaller tournaments).
The undisturbed 2-6 dan figures strongly support a 1/3 winning probability.
If the outcome of this indicates that Go is indeed a very deep game, many Go players will simply nod.
Most of the discussion on ranking, Elo and rating turns on the chicken & egg question:
- which comes first?
- should the ranking (kyu, dan) correlate with the rating differences?
- or should the rating be the basis of everything, hence also demotion (in kyu, dan) after some weak tournaments?
Velobici: This is too much to ask of rating systems. A rating system is nothing more than a mapping of a sparse set of paired comparison results, in the case of Weiqi game results, over a set of numbers in an attempt to produce a preference order and a measure of how much one is preferred over the other. The same methods are used in quantitative psychology for paired comparison results of food tastes, or aromas (read: perfumes), etc.
See the seminal paper: Thurstone, L.L., The method of paired comparisons for social values. Journal of Abnormal and Social Psychology, 1927, 21, 384-400. And an excellent survey work: The Method of Paired Comparisons (2nd ed.). by Herbert A. David
Consider that people's playing strength changes over time. Should this be reflected in the rating system by discounting the ratings of players who either submit results infrequently or have not submitted a result for a comparatively long time? Now ask which rating systems have this feature, other than KGS and Glicko.
- many mathematical formulas hold quite well and predict outcomes statistically correctly for small rating differences (RD), but can they say something valuable for an RD of 900?
- is the number of handicap stones well related to rating differences of 100?
- should handicap games be entered (directly) into the Elo tables at all, if those are used mainly for even-game statistics?
... and so on.
Herman Hiddema: In response to Flower: The standard deviation in the EGF rating algorithm is not actually a constant. On the front page of the GoR page, see the winning probabilities in table II. Note that the table you give (this one) is not reliable, because most Go tournaments use the McMahon system.
[1] Bill: My guess is that the deepest game up to now is the form of shogi (Japanese chess) played on a 25x25 board.
[2] tderz (translated from the Dutch, based on the translation by PJT):
- (“It is rational to assume that human thought is irrational”, László Mérö in: Die Logik der Unvernunft / The Logic of Irrationality)
- There, on the basis of Elo-like reasoning, the depth of a domain is expressed as the number of levels with a given win rate of level n over level n+1 (see also Jan van der Steen's Elo-info page and http://chesslinks.org/hof/elo.html , http://www.gtryfon.demon.co.uk/bcc/Java/gradingintro.htm , and many others).
- In his paper Jan gives a win rate of 69% for the player one rank stronger (delta = 100 Elo points) over the weaker one.
- The chess sites give a difference of 200 Elo points ( http://home.clear.net.nz/pages/petanque/ratings/descript.htm ) per standard deviation 'of performances in a single game', but I find that hard to compare, given almost 50% draws, the advantage for White, etc.
- The 'Percentage Expectancies from Rating Differences' in turn give about 65% for a 100-point difference (thus comparable to Go, leaving draws aside?).
- Now take Kasparov with, say, Elo 2800 and a Kisei or Meijin with (Go) Elo ~2900, and count the Go and chess ratings down until you reach the hard-to-define beginner's level. The domain with the higher exponent n in the win rate ``0.67^n`` is the deeper one.
- Well, the result is (for me) undoubtedly Go, even if a beginner is taken to be only 20 kyu (``n = 30``).
- For chess you take 800-1200 for beginners (``n = 14`` to ``18``), but even if you took 100, ``n`` is only 26.
- Of course my argument has weak points for both games: the standard deviation and the win rates are not the same at all playing strengths (but that holds for both Go and chess). The better ability of stronger chess players to make draws (the Petrosians) probably has a greater influence.
[3] quoted from Dieter (translated from the Dutch, based on the translation by PJT): “I agree with TDerz and point out two serious fallacies implicit in this article.
- 1. “Go offers more possibilities, hence it is harder for computers that rely only on computational power, hence Go is not intrinsically deeper.”
- The fallacy lies in the fact that no explanation at all is given for the fact that a human is able to beat the brute-force computer at Go. If Go were also a game in which playing strength depends on computational power, then humans would be just as handicapped as computers, and computers would win all the same. Evidently humans draw their playing strength from qualities other than computational power, which supports the claim that Go is deeper. One could even take this as a definition of “depth”.
- 2. “Good chess players quickly become good Go players, hence the qualities both games call on are the same.”
- This ignores the following possibility. Hypothesis: chess requires qualities A and B; Go requires A, B, C and D. Whoever excels in A and B therefore has a good chance of excelling at both games.
- Only if good Go players are also good chess players to the same extent can one speak of a reciprocal relationship between the qualities. This latter fallacy is nothing less than a reversal of the logical arrow, which is rather surprising coming from Tim Krabbé. (Dieter)”