For a long time now, the European Go Federation (EGF) has used the Go Rating system (GoR). In brief, GoR was developed in the Czech Republic, designed so that handicap games could be taken into account, and tested on the GoGoD database. Given that the GoGoD population is fairly static in terms of strength, this was an interesting choice. How well is GoR working, and what are its strengths and weaknesses?
Most rating systems are either inflationary or deflationary, and this is often one of the first questions asked about any particular system. A common view is that GoR is deflationary: three times in its history (counting its creation), the anti-inflation mechanism has been adjusted upwards. We will return to this question later. GoR also has a built-in mechanism to stop very large losses of rating points: it is impossible to lose more than 100 points in a tournament, although it is possible to gain more than 100. As a side effect, this cap is an inflationary mechanism.
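To see why an asymmetric cap is inflationary, consider a hypothetical player whose raw tournament results alternate between +150 and -150. Without the cap his rating would be unchanged; with it, each bad tournament is cut to -100. This is a minimal sketch, assuming the cap simply truncates the total loss per tournament at 100 points as described above; `capped_change` is a hypothetical helper, not EGF code.

```python
def capped_change(raw_change: float, max_loss: float = 100.0) -> float:
    """Truncate a tournament's total rating loss at max_loss; gains pass through."""
    return max(raw_change, -max_loss)


# Hypothetical results: five good tournaments (+150) and five bad ones (-150).
swings = [150.0, -150.0] * 5
raw_total = sum(swings)
capped_total = sum(capped_change(c) for c in swings)
print(raw_total, capped_total)  # 0.0 250.0
```

His opponents keep the points they won from him in the bad tournaments, so in this example the cap leaves roughly 50 extra points in the system for every capped tournament.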
Compared to chess, Go has a very large range of playing strengths, and it is possible, especially among weaker players, to gain a great deal of strength in a short period of time. For example, a player whose strength is initially recorded as 100 but who improves to 800 before his next tournament will deflate the rating system if he is entered at his original rating, because the points he wins are taken from his opponents. This is why GoR has a rank reset mechanism. In many countries, however, the way this mechanism is applied in tournament practice leaves something to be desired.
There is an artificial floor of 100 rating points in GoR, which corresponds to the rank of 20kyu. Not all countries implement this minimum rank, as they have tournament players weaker than this level, and this causes considerable fog in the lower echelons of the system. Anyone entering a tournament with a rank below 20kyu will have their entry rank changed to 20kyu. It is not unknown for somebody professing a rank of 30kyu to have a rating of 18kyu. The reasons given for not amending the system to correct this are rather suspect.
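The clamping of entry ranks can be sketched as follows. This assumes a nominal mapping anchored at 20kyu = 100 with 100 rating points per rank; `kyu_to_rating` and `entry_rating` are hypothetical helper names used only for illustration.

```python
RATING_FLOOR = 100  # 20kyu, the artificial floor described above


def kyu_to_rating(kyu: int) -> int:
    """Nominal rating for a kyu rank, assuming 20kyu = 100 and 100 points per rank."""
    return 100 + 100 * (20 - kyu)


def entry_rating(declared_kyu: int) -> int:
    """Entry rating after ranks weaker than 20kyu are raised to 20kyu."""
    return max(RATING_FLOOR, kyu_to_rating(declared_kyu))


for kyu in (30, 25, 20, 18):
    print(f"{kyu}kyu -> {entry_rating(kyu)}")
```

A 30kyu, a 25kyu and a 20kyu all enter at 100, so the system cannot tell them apart at entry; only their results separate them afterwards.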
Because GoR is an Elo-based system, one might expect its ratings to follow a bell-shaped curve. This is broadly true, although the active rating distribution (see the Flickr graph) shows an artifact created by the artificial floor. Over time, the shape of the curve has changed, which should not come as a surprise. The FIDE rating distribution is slowly expanding, at a rate related to its increasing population, whereas the GoR distribution is slowly flattening (see the Flickr graph).
GoR is a relatively young rating system, created in 1996. Initial ranks were taken from the various and diverse systems present in individual countries, so it seems reasonable to assume that the system had an inherent instability when it was first created. For example, whilst a Czech or French 1dan may have a narrow range of strength, in some other countries the same rank could cover a wide range of strengths. Ranks were not consistent, and we have good reason to believe that they are still not consistent.
We can say that there should be three common types of player: improving, stable and declining.
It should not be necessary to discuss the expected changes in their ratings. Can we identify some of these player types in the European Go Database (EGD)? I have found this quite difficult for type 3.
(+ denotes a win against, - denotes a loss to, = denotes a draw with)
1000 strength player, 1000+ 1000-, new rating 1001.12
2000 strength player, 2000+ 2000-, new rating 2000.432
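This illustrates the effect of the inflation parameter: a win and a loss against equal opposition leave a small net gain, and that gain is more noticeable for weaker players because their ratings can move further per game.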
Deflationary effect of a player initially ranked 8k (rating 1205) who is not promoted to 6k (1400), with an imagined strength of 1413
Entry rating 1205, 1200+ 1300+ 1400+ 1500-, new rating 1316.147, opponents' lost points -99.079
Entry rating 1400, 1200+ 1300+ 1400+ 1500-, new rating 1437.685, opponents' lost points -38.765
Entry rating 1400, 1400+ 1500- 1400+ 1500-, new rating 1419.702, opponents' lost points -19.084
Net Loss Condition
Generally, a higher-rated player beating a lower-rated player causes a net loss of GoR points.
2000 player beats 1950, exit ratings 2010.632, 1939.045, net loss 0.323
Net Gain Condition
This is the reverse of the above: a lower-rated player beating a higher-rated player causes a net gain.
1950 player beats 2000, exit ratings 1968.045, 1983.632, net gain 1.677
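The worked examples above can be reproduced with a short script. This is a minimal sketch, assuming the pre-2021 GoR formula as I understand it: the expected score is 1/(e^(D/a) + 1) - epsilon/2 with epsilon = 0.016, D is the rating difference, the spread a = 200 - (R - 100)/20 is taken from the lower of the two ratings (assumed to bottom out at 70), and the maximum change per game (con) is interpolated linearly from the table below. The table and the rule for a are reconstructed from memory, but together they reproduce the player ratings and net changes quoted above to within rounding. Every game is evaluated from the entry rating, and the 100-point loss cap and the 100 floor follow the text. The names `con`, `expected`, `tournament` and `net_change` are hypothetical.

```python
import math

# Pre-2021 GoR con values at ratings 100, 200, ..., 2700 (reconstructed from memory).
CON = [116, 110, 105, 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 51,
       47, 43, 39, 35, 31, 27, 24, 21, 18, 15, 13, 11, 10]
EPSILON = 0.016   # inflation parameter
FLOOR = 100.0     # 20kyu floor
MAX_LOSS = 100.0  # per-tournament loss cap described in the text


def con(r):
    """Linear interpolation in the con table, clamped to 100..2700."""
    x = (min(max(r, 100.0), 2700.0) - 100.0) / 100.0
    i = min(int(x), len(CON) - 2)
    return CON[i] + (CON[i + 1] - CON[i]) * (x - i)


def expected(own, opp):
    """Expected score of `own` against `opp`, reduced by epsilon/2.
    The spread `a` comes from the lower of the two ratings."""
    a = max(70.0, 200.0 - (min(own, opp) - 100.0) / 20.0)
    return 1.0 / (math.exp((opp - own) / a) + 1.0) - EPSILON / 2


def tournament(entry, games):
    """New rating after one tournament; games = [(opponent_rating, score), ...].
    Every game is evaluated from the entry rating, not sequentially."""
    change = sum(con(entry) * (s - expected(entry, opp)) for opp, s in games)
    return max(FLOOR, entry + max(change, -MAX_LOSS))


def net_change(winner, loser):
    """Rating points created (+) or destroyed (-) by a single game."""
    return ((tournament(winner, [(loser, 1)]) - winner)
            + (tournament(loser, [(winner, 0)]) - loser))


# Worked examples from the text: (entry rating, games, figure quoted above).
EXAMPLES = [
    (1000, [(1000, 1), (1000, 0)], 1001.12),
    (2000, [(2000, 1), (2000, 0)], 2000.432),
    (1205, [(1200, 1), (1300, 1), (1400, 1), (1500, 0)], 1316.147),
    (1400, [(1200, 1), (1300, 1), (1400, 1), (1500, 0)], 1437.685),
    (1400, [(1400, 1), (1500, 0), (1400, 1), (1500, 0)], 1419.702),
]
for entry, games, quoted in EXAMPLES:
    print(entry, round(tournament(entry, games), 3), quoted)
print(round(net_change(2000, 1950), 2), round(net_change(1950, 2000), 2))  # -0.32 1.68
```

Between equal opponents the two expected scores sum to 1 - epsilon, so a win plus a loss nets exactly con × epsilon: 70 × 0.016 = 1.12 at 1000 and 27 × 0.016 = 0.432 at 2000. The net loss when the favourite wins comes from con being larger for the lower-rated player.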
This considers the case of a player (his number is marked with a star) who has an out-of-date rating of about 1000. We see what happens when he beats five players in his first tournament in three years. The tournament is A class. Numbers are rounded up. There is a somewhat amusing step between 1350 and 1500 for Case 1. In Case 1, the player's results are submitted as normal; in Case 2, the organiser (or EGF member) resets his rank before submission. Again, this flogs the dead horse of the big internal differences between the rating systems within the EGF.
I think there is certainly evidence that GoR can be a deflationary system. For example, add a lot of new players (as in the Hikaru no Go boom) and do not let them reset their ranks, forcing them to rank up only by reaching the official rating level. This will unquestionably produce a deflationary effect.
Beyond anecdotal evidence, what can we say here?
tapir: I would start at the other end. The available rating points 1) increase with each game by the famous factor epsilon in the rating formula, 2) increase with rating resets, 3) increase when a player receives an initial rank, and 4) decrease when a player leaves (through death or stopping playing), taking his rating points with him. Items 3 and 4 combined make for a net loss of rating points over time; 1 is the main source of new rating points. (Rating points at different levels aren't equal: the scale is compressed at higher ranks to fit the arbitrary 100 points per rank, so there is no net loss of "uncompressed rating points" when a game is played.)
I am not sure there is evidence to back up any claim that the EGD is deflationary or inflationary; what it surely is, however, is highly localized. A surge of new players (with a low entry level in tournament play) will put deflationary pressure on local ratings (the available points are spread more thinly; say, France 2007-2010), while a numerically stagnating but actively playing population (small losses of rating points outweighed by the increase per game; say, the Netherlands) will produce inflation in comparison to other countries. International exchange of rating points might be too slow to counteract local effects that are still ongoing. I believe this is illustrated rather well by the international results of the Netherlands dropping from net positive to net negative over the period, and by French results turning from a small net positive to the extremes of 2007-2011 (winning percentages of 60% and more).
:: Valid points. An attempt can be made to see how the inflationary parts balance the deflationary parts. This is not an easy task. Localisation is an annoying problem too :) As this article progresses I hope to touch on those.
Recently I thought a little more about how one could start to discover the properties of the GoR world. Creating a series of simulations appears to be the natural way forward.
The plan is to create a small imaginary GoR world with a population whose ratings follow the familiar, roughly Gaussian distribution we'd expect, and whose locations are arranged according to central place theory. The population will play in a series of tournaments, and we will look at how their ratings diverge from their strength.
The first step is to look at a static model. What happens when we imagine that strength does not change over time? Then we can introduce three basic types of player: improving, stable and declining. What happens now? Finally, we can implement something akin to rating resets, to assess how this affects the population.
There are other variables to consider, such as the frequency of tournaments and the activity of the playing population. The introduction of new players and the exit of established players should also matter. Geoff Kaniuk's research suggests that the pairing algorithm used in tournaments has an effect on the rating distribution, so I believe that we should stick to one type of tournament.
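As a starting point for the static model, the sketch below runs a small imaginary population through a series of tournaments of one fixed type and measures how far ratings drift from unchanging strengths. It is a minimal sketch under several assumptions: the rating update is the GoR-style formula reconstructed from the worked examples (con table, epsilon = 0.016, spread a from the lower rating, 100-point loss cap, 100 floor); game results are drawn from true strength using the same curve without the epsilon shift; strengths are normal with mean 1200 and standard deviation 400 (values chosen arbitrarily); and the pairing rule (sort by rating plus a little noise, pair neighbours) is a crude stand-in for a real pairing system. Geography (central place theory), new players and leavers are left out. All names are hypothetical.

```python
import math
import random

# GoR-style update as reconstructed for the worked examples (assumption).
CON = [116, 110, 105, 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 51,
       47, 43, 39, 35, 31, 27, 24, 21, 18, 15, 13, 11, 10]
EPSILON = 0.016
FLOOR = 100.0
MAX_LOSS = 100.0


def con(r):
    """Linear interpolation in the con table, clamped to 100..2700."""
    x = (min(max(r, 100.0), 2700.0) - 100.0) / 100.0
    i = min(int(x), len(CON) - 2)
    return CON[i] + (CON[i + 1] - CON[i]) * (x - i)


def fermi(own, opp):
    """Win probability of `own` against `opp` before the epsilon shift."""
    a = max(70.0, 200.0 - (min(own, opp) - 100.0) / 20.0)
    return 1.0 / (math.exp((opp - own) / a) + 1.0)


def play_tournament(ids, ratings, strengths, rng, rounds=5):
    """One tournament with a single fixed pairing rule. Results are drawn
    from true strength; rating changes are computed from entry ratings."""
    entry = {i: ratings[i] for i in ids}
    change = {i: 0.0 for i in ids}
    for _ in range(rounds):
        # Sort by entry rating plus noise so opponents vary between rounds.
        order = sorted(ids, key=lambda i: entry[i] + rng.gauss(0.0, 50.0))
        for p, q in zip(order[::2], order[1::2]):
            p_wins = rng.random() < fermi(strengths[p], strengths[q])
            for me, opp, score in ((p, q, 1.0 if p_wins else 0.0),
                                   (q, p, 0.0 if p_wins else 1.0)):
                exp_score = fermi(entry[me], entry[opp]) - EPSILON / 2
                change[me] += con(entry[me]) * (score - exp_score)
    for i in ids:
        ratings[i] = max(FLOOR, entry[i] + max(change[i], -MAX_LOSS))


def simulate(n_players=300, n_tournaments=40, field=40, seed=1):
    """Static model: strengths never change and ratings start equal to them.
    Returns (mean rating - strength, largest |rating - strength|)."""
    rng = random.Random(seed)
    strengths = [max(FLOOR, rng.gauss(1200.0, 400.0)) for _ in range(n_players)]
    ratings = list(strengths)
    for _ in range(n_tournaments):
        ids = rng.sample(range(n_players), field)  # even field: everyone plays
        play_tournament(ids, ratings, strengths, rng)
    drift = [r - s for r, s in zip(ratings, strengths)]
    return sum(drift) / n_players, max(abs(d) for d in drift)


mean_drift, worst = simulate()
print(f"mean drift {mean_drift:.1f}, largest gap {worst:.1f}")
```

With no entrants or leavers, every game between equally rated players adds con × epsilon points and the cap and floor can only add more, so the mean drift should come out positive in this static world. Improving and declining players can be modelled by changing `strengths` between tournaments, and a rating reset by setting a player's entry in `ratings` close to his strength before a tournament.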