# Elo Rating

Path: <= Rank =>
Keywords: Tournament

In the early 1960s, Arpad Elo developed the Elo rating system. Although it is named after him, many people can't help but write the name in all caps (ELO), as if it were an acronym of some sort.

It was the first rating system that had probabilistic underpinnings. Originally, Elo developed it for the game of chess, and chess federations around the world adopted it quickly. It became popular and common for many other games too, including Go, Scrabble, table tennis, etc.

### Elo Ratings and titles in Chess

Game federations do not use identical (parameters for) rating systems. They attach different titles to a rating, and they have different rule sets to determine an initial rating for new participants.

Usually, an average amateur player's rating ranges between 1300 and 1700 Elo points.

U.S. Chess Federation's classes are:

| Elo rating | Class | Members |
|---|---|---|
| 2200 - 2800 | Master | 4 % |
| 2000 - 2200 | Expert | 8 % |
| 1800 - 2000 | Class A | 12 % |
| 1600 - 1800 | Class B | 18 % |
| 1400 - 1600 | Class C | 18 % |
| 1200 - 1400 | Class D | 20 % |
| 0 - 1200 | Class E | 20 % |

World Chess Federation's top ratings are:

| Elo rating | Title |
|---|---|
| 2650 - 2800 | world champions |
| 2500 - 2650 | international grandmasters |

### Winning Probabilities in Arpad Elo's Original Work

The rating indirectly represents the probability of winning against other rated players. This probability depends only on the difference between the two players' ratings as follows:

| Rating difference | Probability of winning |
|---|---|
| 400 | .919 |
| 300 | .853 |
| 200 | .758 |
| 100 | .637 |
| 50 | .569 |
| 0 | .500 |
| -50 | .431 |
| -100 | .363 |
| -200 | .242 |
| -300 | .147 |
| -400 | .081 |

This represents the area under the standard bell-shaped curve where 200 * sqrt(2) points are taken as one standard deviation. The table shows some sample points on this curve, adequate for good approximations of rating calculations by interpolation.
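As a rough check of the description above, the following sketch evaluates the normal curve with 200 * sqrt(2) points as one standard deviation. The function name `win_probability` is only an illustration and not part of any federation's software; the computed values agree with the table to within a few thousandths, not exactly.

```python
import math

# One standard deviation of the rating difference, as stated in the text.
SIGMA = 200 * math.sqrt(2)

def win_probability(rating_diff: float) -> float:
    """Area under the normal curve up to rating_diff (the normal CDF)."""
    return 0.5 * (1.0 + math.erf(rating_diff / (SIGMA * math.sqrt(2))))

# Print the same sample points as the table.
for diff in (400, 300, 200, 100, 50, 0, -50, -100, -200, -300, -400):
    print(f"{diff:5d}  {win_probability(diff):.3f}")
```

Because the curve is symmetric, the probabilities for a difference of +d and -d always add up to 1, just as in the table.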

Note: These are the probabilities used in Elo's original implementation of this rating system for chess players. The Elo rating system as employed by the EGF uses different parameters for the formula. It also accounts for the idea that in even games, 6d beats 5d more often than 29k beats 30k. See EGF Rating System for details.

### Determining an appropriate initial rating

One method is this: a new participant plays three initial games against opponents with already established ratings. These games count, for example, as follows:

• won game: new member's rating = opponent's rating + 200 points
• draw game: new member's rating = opponent's rating
• lost game: new member's rating = opponent's rating - 200 points

These initial game results are averaged and used for the new member's initial rating.

Example: A new member loses a game against a 1700-opponent, draws against a 1400-opponent and wins against a 1300-opponent. The result is an initial rating of 1467 = ( (1700-200) + 1400 + (1300+200) ) / 3.
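The averaging in this example can be sketched as follows. The function name `initial_rating` and the outcome labels are made up for illustration; the 200-point bonus is the example value from the list above, and real federations may use other rules.

```python
def initial_rating(results, bonus=200):
    """Average the per-game performance values of a new member.

    results: list of (opponent_rating, outcome) pairs,
             where outcome is 'win', 'draw', or 'loss'.
    """
    adjust = {"win": bonus, "draw": 0, "loss": -bonus}
    performances = [opponent + adjust[outcome] for opponent, outcome in results]
    return sum(performances) / len(performances)

# The example from the text: loss vs. 1700, draw vs. 1400, win vs. 1300.
games = [(1700, "loss"), (1400, "draw"), (1300, "win")]
print(round(initial_rating(games)))  # 1467
```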

### Converting Elo Ratings into Go Ranks

Many Go servers use a modified Elo system internally, and represent it as Go ranks externally. Many of them also account for the fact that the traditional handicap system is not linear, as described at Rank And Handicap#Advantage for White. So a traditional 2 stone handicap is understood to actually only be worth 1.5 stones. Some servers such as DGS use finer grained handicaps, such as changing the komi slightly in addition to using handicap stones.

Internally, DGS uses:

```
Points   Go rank
------   -------
  2300     3 dan
  2200     2 dan
  2100     1 dan

  2000     1 kyu
  1900     2 kyu
  1800     3 kyu

  1500     6 kyu
  1000    11 kyu
   500    16 kyu
     0    21 kyu

  -100    22 kyu
  -200    23 kyu
  -300    24 kyu
```
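The table follows a simple pattern of 100 points per rank, with 2100 points as 1 dan and 2000 points as 1 kyu. A minimal sketch of that mapping is shown below. The function name `points_to_rank` is hypothetical, and the handling of values between the listed points (rounding down to the rank just reached) is an assumption, not something the table or DGS documentation confirms.

```python
def points_to_rank(points: int) -> str:
    """Map internal rating points to a Go rank, 100 points per rank.

    Assumption: values between the listed points round down
    to the rank already reached.
    """
    steps = points // 100          # floor division also works for negative points
    if steps >= 21:                # 2100 points and above are dan ranks
        return f"{steps - 20} dan"
    return f"{21 - steps} kyu"     # 2000 points is 1 kyu, 0 points is 21 kyu

for p in (2300, 2100, 2000, 1500, 0, -300):
    print(p, points_to_rank(p))
```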

### Elo Rating Discussion

Tim Brent: Originally 2000 in the Chess rating was a base point, based upon a 50% score at the US Open. The original idea was using Chess to find out if mental activity decreases with aging.

PurpleHaze: This is not correct. Elo attempted to make his rating system line up with the existing USCF system that assumed a bell curve centered at 1500.

Frs: What does the Elo rating system have to do with an age-dependent decrease of mental activity?

Tim Brent: He had a theory that you could use success in chess as a basis for showing the effects of aging on mental activity, i.e. a player who could play at a 2400 level in his forties is now playing in his fifties at a 2260 level. Could it be proof that his cognitive ability went down 6 percent over that period? (Of course this theory doesn't consider that the aforementioned player might simply have started losing against a group of stronger players.)

### Depth of something ranked with Elo

The range of Elo ratings also says something about the depth of a game. The total depth of a game is defined by the two end points of the range of skills: the total beginner and the theoretical best play by an infallible, almighty creature.

Neither is easy to establish. Is someone who has just heard the rules already a beginner, thereby setting the lowest standard, or does it take several games until one has absorbed the rules and is able to play on one's own? At the other end of the range one simply has to take the best player at a given time. In Go, the total beginner who can nevertheless play on their own according to the simple rules can safely be set at 30 kyu. Theoretical best play could correspond to the strength of an imaginary 13 dan, according to measurements of standard deviations among professional games.

OmarSyed: I have always thought that the lower end of the rating range should be based on random play. Thus a program which plays Go randomly should have a rating of exactly 0. Then other programs which have a simple evaluation function can be played against the random program and each other to establish their ratings. Then perhaps the best of these simple programs might be good enough to win 20% against beginners. The rating of this beginner level program can be fixed so that it becomes a source and a sink and serves as an anchor for the rating system. This has been done for the game Arimaa and the beginner program has a rating of 1000 while beginners start with a rating of 1300.

Taking only 20 kyu and 9 dan as endpoints already makes Go the deepest game so far.[1] A rating difference of 2900 Elo points between Gu Li and a 20 kyu rated at 100 Elo points amounts to a difference in insight into the game of 29 times the standard deviation (100 Elo points).

Chess, in comparison, has a similar upper endpoint (Garry Kasparov, who once reached 2851 points), yet the standard deviation is set at 200 Elo points. The draws make a comparison more difficult, but the result is a depth of (only) 14 layers of standard deviation for chess, assuming the total beginner in chess had a rating of zero Elo points (which, as far as I know, is not the case).
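The depth figures above are simply a rating gap divided by the size of one layer. A minimal sketch of that arithmetic, using only the numbers quoted in this discussion (the function name `depth_in_layers` is made up for illustration):

```python
def depth_in_layers(rating_gap: float, layer_size: float) -> float:
    """Number of standard-deviation layers that fit into a rating gap."""
    return rating_gap / layer_size

# Go: a gap of 2900 points, one layer taken as 100 points.
go_depth = depth_in_layers(2900, 100)

# Chess: 2851 points down to an assumed beginner rating of 0,
# one layer taken as 200 points.
chess_depth = depth_in_layers(2851 - 0, 200)

print(f"Go: {go_depth:.1f} layers, chess: {chess_depth:.1f} layers")
```

The result depends entirely on which endpoints and which layer size one accepts, which is exactly what the rest of this discussion argues about.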

PurpleHaze: Experience shows that an adult of average intelligence will attain a rating of about 800 on the day they learn the rules. Small children and the handicapped may have lower ratings. Historically the lowest ratings recorded are in the New York Public School Championship - Kindergarten Division where they have in fact reached 0.

tderz: FIDE rating lists do not show players below Elo 1600. That confirms your point, and I have read several times that a novice's or beginner's rating would be around 1000. In my opinion that implies that chess is less deep than Go: 10 layers of depth vs. 29.

I would guess that this is because Elo-type rating systems only measure strength relative to other players rather than absolutely. They have trouble providing an accurate measure for underrepresented parts of the player base. FIDE and USCF ratings are a bit unreliable in this respect, as chess beginners are commonly not yet members of a chess federation and thus not well represented. On FICS, where everyone is rated and the player base is a bit more diverse (yet still lacking in complete beginners), the weakest players are rated only at around 500 (and a true beginner would most likely be weaker still). -- Flower, 2006-11-21

Paul Clarke: This came up a few years ago, when someone (Laszlo Mero?) analysed the 'depth' of a number of games and came up with 14 for chess (using the same argument as above) and 45 for Go (using 1 stone difference as one unit of depth, I think). I'm very sceptical that you can use these figures to show anything about the intellectual depth of the games; here's part of a Usenet article I posted at the time:

Consider a game that I've just invented: 'Gotac'. Gotac = 'Go, Toss a Coin': you play a game of go, following which the loser tosses a coin. If the coin comes down tails, he still loses, if it comes down heads the game is a draw. The best player in the world will score about 75% against the worst, so the game has only 1 level. However, anything you know about go strategy and tactics can be applied to the game and will improve your chances of winning, so it's at least as complex as go.

Comparing chess and go: if you play slightly better than your opponent in go, you will win by a small margin; in chess, you may well draw. In go, if you make a mistake against a weaker opponent, you will usually get chances to catch up; in chess, it may instantly lose you the game. Thus, there's a better chance of superior skill winning a game of go than a game of chess. This, I suspect, explains much or all of the difference in the number of levels.
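The Gotac figure of 75% can be reproduced with a short calculation. The sketch below also converts that score into a rating gap, under the assumption that the normal curve from Elo's original table (200 * sqrt(2) points per standard deviation) applies; that conversion is an illustration and not part of the Usenet article. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def gotac_expected_score(p_win_go: float) -> float:
    """Expected Gotac score for a player who wins the underlying Go game
    with probability p_win_go. The Go winner scores 1 (tails) or 0.5 (heads);
    the Go loser scores 0 (tails) or 0.5 (heads)."""
    score_if_go_won = 0.5 * 1.0 + 0.5 * 0.5   # 0.75
    score_if_go_lost = 0.5 * 0.0 + 0.5 * 0.5  # 0.25
    return p_win_go * score_if_go_won + (1 - p_win_go) * score_if_go_lost

def rating_gap_for_score(score: float) -> float:
    """Rating difference that yields this expected score under the
    assumed normal model with sigma = 200 * sqrt(2)."""
    sigma = 200 * math.sqrt(2)
    return NormalDist().inv_cdf(score) * sigma

best_vs_worst = gotac_expected_score(1.0)   # best player always wins the Go game
print(best_vs_worst, round(rating_gap_for_score(best_vs_worst)))
```

Under these assumptions the whole Gotac rating range spans a bit less than 200 points, which matches the claim that the game has only about one level.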

tderz: I've read several of László Méro's books, including the one you mention (Kognition, Intuition und komplexes Denken, Rowohlt, 446 pages, Euro 12,90, ISBN: 3499614197). I wrote an article about his depth calculation in reply to an article by Tim Krabbé, 214. 30 May 2003: Chess and Go, or directly at http://www.xs4all.nl/~timkr/admag/go.htm. At first I could not retrieve my own contribution; it might have gone out only by e-mail on the Dutch mailing list (I have since found it: it's in Dutch[2], and Dieter[3] also commented in the thread). Similar rec.games.fo. Concerning your example (Gotac):

• I think you would see the same Go ratings, just very, very much compressed into the winning range 50-75%. If you ran Elo on it, the same levels or layers of standard deviation would reappear. (This is just my guess. I'm too weak at math to prove it, but I am convinced that overlaying some totally random event on a normal distribution still leaves a normal distribution; only the volatility would change.) That is, the best player in the world would indeed win 75% (I did not check your calculation, I'm too lazy or stupid), and 29 standard deviations of mean strength away you have the weakest players, scoring only 50%. The term 'level' must thus be properly defined. I, too, am often confused about where this is put: chess (75%?), Go (67%, and there are other figures). I seem to remember that László Méro puts it at 75% too.

Flower: I am a bit baffled by the claim that the standard deviation of a player's performance would be 100 in Go as opposed to 200 in chess. In this table, if the SD were 100, I would expect a loss chance of about 32% when playing against a player 1 stone stronger. To my mind this table indicates that the SD of one's performance in Go might be near 200, as in chess, instead of 100. That would be quite significant, as the 'depth' of Go would then be reduced to about 20 (30k to 9p --> 3900 Elo delta --> /200 = 19.5), compared to about 13 in chess (2850-200 = 2650 --> /200 = 13.25). -- Flower, 2006-11-21

tderz: Flower, you are correct: a rating difference of about 100 is meant to give winning chances of 1/3 and 2/3 to the weaker and the stronger player, respectively. This is the basis of Elo in Go.

In this empirical table the actual proportion of wins, Nw/Ng (the winning chance), ranges from 30% (for 4 dans) to 46.6% (for 11 kyus) against an opponent just one grade stronger.

From 17k to 2k these figures are above 40%. From 2d to 5d they run from 34% down to 28% (I leave the 6 dans out because there are too few 7 dan opponents and games). In the dan ranks this table strongly supports a winning chance of only 1/3 per rating difference of 100.

As H. Hiddema states below, many Go tournaments are played under the McMahon system, which might heavily distort the statistics by preferentially pairing strongly performing, lower-ranked players with weakly performing, higher-ranked players.
This could explain the unexpectedly high winning rate in the kyu range. However, there is always a McMahon bar, which prevents the highest-ranked player (e.g. the only 6d) from winning simply because of his higher McMahon score (1 point more than a 5d) even when their results are the same. Hence, in a big tournament with lots of 6d players around, the bar would be set at, e.g., 4 dan (mutatis mutandis for weaker, smaller tournaments).
The undisturbed 2-6 dan figures strongly support a winning probability of 1/3.

If the outcome of this indicates that Go is indeed a very deep game, many Go players will simply nod.
Most of the discussion on ranking, Elo and rating revolves around the chicken-and-egg question:

• which comes first?
• should the ranking (kyu, dan) correlate with the rating differences?
• or should the rating be the basis of everything, hence also of demotion (in kyu, dan) after some weak tournaments?

Velobici: This is too much to ask of rating systems. A rating system is nothing more than a mapping of a sparse set of paired comparison results, in the case of Weiqi game results, over a set of numbers in an attempt to produce a preference order and a measure of how much one is preferred over the other. The same methods are used in quantitative psychology for paired comparison results of food tastes, or aromas (read: perfumes), etc.

See the seminal paper: Thurstone, L.L., The method of paired comparisons for social values. Journal of Abnormal and Social Psychology, 1927, 21, 384-400. And an excellent survey work: The Method of Paired Comparisons (2nd ed.). by Herbert A. David

Consider that people's playing strength changes over time. Should this be reflected in the rating system by discounting the ratings of players who either submit results infrequently or have not submitted a result for a comparatively long time? Now ask which rating systems have this feature, other than KGS and Glicko.

• while many mathematical formulas hold quite well and predict outcomes statistically correctly for small rating differences (RD), can they say anything valuable for an RD of 900?
• is the number of handicap stones well related to rating differences of 100?
• should handicap games be taken (directly) into the Elo tables at all, if those are used mainly for even-game statistics?

... and so on.

Herman Hiddema: In response to Flower: The standard deviation in the EGF rating algorithm is not actually a constant. On the front page of the GoR page, see the winning probabilities in table II. Note that the table you give (this one) is not reliable, because most Go tournaments use the McMahon system.

[1] Bill: My guess is that the deepest game up to now is the form of shogi (Japanese chess) played on a 25x25 board.

[2] tderz: "("It is reasonable to assume that human thinking is unreasonable!" (László Mérö in: Die Logik der Unvernunft)) There, on the basis of Elo-like conditions, the depth of a field is expressed as the number of levels with a certain winning percentage of level n over level n+1 (see also Jan van der Steen's ELO-info page and http://chesslinks.org/hof/elo.html , http://www.gtryfon.demon.co.uk/bcc/Java/gradingintro.htm , and many others). In his paper Jan mentions a winning percentage of 69% for the player one rank stronger (delta = 100 Elo points) over the weaker one. The chess sites mention a difference of 200 Elo points ( http://home.clear.net.nz/pages/petanque/ratings/descript.htm ) per standard deviation 'of performances in a single game', but I find that hard to compare because of the nearly 50% draws, White's advantage, etc. The 'Percentage Expectancies from Rating Differences' are then again given as about 65% for a difference of 100 points (so comparable with Go, leaving draws aside for the moment?). Now take Kasparov with, say, Elo 2800 and a Kisei or Meijin with (Go) Elo 2900? points, and count the Go and chess ratings downward until you reach the hard-to-define beginner level. The field with the higher exponent n of winning percentages 0.67^n is the deeper one. Well, the outcome is (for me) beyond doubt: Go, even if only 20 kyu is taken for a beginner (n=30). For chess you take 800-1200 (n = 14-18) for beginners, but even if you were to take 100, n is only 26. Of course my argument has weak points for both games: the standard deviation and the winning percentages are not the same for all playing strengths (but that holds for both Go and chess). The better ability of stronger chess players to make draws (as Petrosian did) probably has a greater influence."

[3] quoted from Dieter: "I agree with tderz and point out two major fallacies that appear implicitly in this article.

1. "In Go there are more possibilities, ergo it is harder for computers that rely solely on computing power, ERGO Go is not intrinsically deeper." The fallacy lies in the fact that no explanation at all is given for why humans are in fact able to beat the brute-force computer at Go. If Go, too, were a game in which playing level depended on computing power, humans would be just as handicapped as the computer, and the computer would win out all the same. Humans clearly draw their playing level from qualities other than computing power, which argues for the proposition that Go is in fact deeper. One could even take this as the definition of "depth".

2. "Good chess players quickly become good Go players, ergo the qualities called upon in both games are the same." This overlooks the following possibility. Hypothesis: for chess you need qualities A and B; for Go, A, B, C and D. Whoever excels in A and B therefore has a good chance of excelling in both games.
Only if good Go players are also good chess players to the same degree can one speak of a reciprocal relationship between the qualities. This last fallacy is nothing less than a reversal of the logical arrow, which is rather surprising coming from Tim Krabbé. (Dieter)"

Elo Rating last edited by Liso on October 20, 2014 - 07:57