Holigor's Rating Of Go Players
I recently moved the rating to a different host. Because of the misfortunes of the talkcity site I did not update it for some time. There are gaps, but the recent international tournaments (Toyota Denso Cup and Samsung Cup) are now included.
As of mid-July, the most remarkable thing is that Cho U has moved to fifth position in the rating. This is partly because of Mok's poor recent results. Though Cho U holds only a minor NHK title, his overall record is quite good. His high position might be an indication of something. At the same time last year Kato Masao was the highest of the Japanese. Now Kato is about 20th because of his disastrous Meijin league record, which to some extent devalued his Honinbo exploits.
The data on the matches are taken from the Go news site at http://www.kyoto.zaq.ne.jp/momoyama/news/news.html
If you click on the names of the players you can see their record.
The algorithm is described and explained at the site.
The rating certainly depends on the input information, and would change if other tournaments were added.
The table has some additional information. You can see how well the players perform with Black and White, for example.
The following is the description of the initial standing (August 2001).
Yi Ch'ang-ho is the leader, of course. Look at his record. It is impressive.
The next two places are surprising. The information on these two youngsters (Gu Li and Pak Yeong-hun) is not complete, though. Perhaps they do not do so well in other tournaments that I don't know about. Let's consider them new hopefuls and watch their progress.
Other remarkable features are the high position of Cho Chikun despite the loss of major titles, the not so good record of Ma Xiaochun and the quite low position of Rui Naiwei. She has a great win/loss ratio, but most of her wins were against other ladies and her record against men is not so great this year.
Don't pay much attention to the lower part of the table. The system cannot reliably rate players with few games.
Anonymous: I looked at your rating system, and I noticed a potential flaw. Isn't it possible that winning against a weak opponent can hurt your rating? Similarly, can't losing against a strong opponent benefit your rating?
HolIgor: It is not a fault. It should be so. Imagine a situation where two players have an equal number of wins. Naturally, the player who played against stronger opponents should have the higher rating. There should not be any doubt about it.
Another: But at the same time, it certainly should not be so: if two players are equally ranked, we don't expect to think of one of them as getting weaker than the other if he/she then proceeds to win several games while the other does nothing. Every win generally comes about by outplaying your opponent, and thus should probably (always) be a confirmation of strength and an improvement to the rating.
And I have to repeat that this system is not ELO. The players do not have current ratings that change iteratively with each win or loss. The rating is recalculated each time from all the data in the database. This week's rating is not a correction to last week's rating. One should not use the terms "increase" or "decrease" of the rating, because the absolute value of the rating is meaningless in terms of the strength of the players. It reflects the depth of the distribution only. What we are looking for is the difference of ratings, that is, the positions of the players in the table. During the last year the rating of the leader (Yi Ch'ang-ho) oscillated in a wide range (about 1000 points), while at the same time the number 2 was always about 200-300 points behind.
Returning to the question of weak players, I would like to say that initially I thought that all ratings would fit into a range of 1000 points. That is the limit when the system is closed and everyone plays everyone else. In practice that did not happen, because the players at the bottom of the table cannot even hope to play Yi Ch'ang-ho or Yi Se-tol in an official competition. They cannot hope to play even the players who usually lose to Yi Ch'ang-ho. They usually play the players who lose to the players who lose to the top. So, in practice, the rating table has a depth of approximately 5000 points. This value varies as the connectivity of the players in the table improves or deteriorates.
If the distribution were within a range of 1000 points, a weaker player would not move up significantly after a loss to somebody good. As it is, yes, one can move up significantly after a loss to a very strong player, just because the strong player agreed to play a game against a weaker one. But that happens at the bottom, where the positions are not reliable. At the top the players have a lot of games, and the drop in rating after an easy win is just what it has to be: your average opponents were weaker, so you go a little bit down, but just a little bit.
Andre Engels: I don't agree that that is "what it has to be". I think it is ridiculous to make the assumption that the fact that someone has won a game would make you think he is weaker than you thought without knowledge of the game.
Charles Matthews: It would be a paradox of inference, certainly. I have only heard about the theory of ratings, not read anything intelligible. I believe 'ideal' inference is replaced by something simpler to compute, in general. Can anyone explain further?
Anonymous: The question I was trying to raise is "What makes a good rating system?" Holigor's system has the flaw "If a top player decides to play an extra game against a much weaker player, his rating is unfairly decreased." Holigor correctly points out that the top players tend to play mostly among each other, diluting this effect. As long as the players do not take Holigor's ratings seriously, this will be accurate. However, there is the potential for a player to manipulate his ranking by only entering events with very strong opposition.
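To make the disputed effect concrete, here is a minimal sketch. It does not use Holigor's algorithm, which is described on his site and not reproduced here. Instead it assumes a simple stand-in index, the average opponent rating plus 400*(wins - losses)/games, and the ratings 2700 and 1500 are made-up numbers. An index built on the average strength of the opponents can behave like this:

```python
def performance_index(results):
    """Toy stand-in index (an assumption, not Holigor's formula).

    results: list of (opponent_rating, won) pairs.
    Returns average opponent rating + 400 * (wins - losses) / games.
    """
    n = len(results)
    avg_opp = sum(rating for rating, _ in results) / n
    wins = sum(1 for _, won in results if won)
    losses = n - wins
    return avg_opp + 400 * (wins - losses) / n


# Hypothetical top player: ten wins against opponents rated 2700.
top = [(2700, True)] * 10
before = performance_index(top)

# The same player then wins one extra game against a 1500-rated opponent.
after = performance_index(top + [(1500, True)])

print(before, after)  # the index drops after the extra win
```

With many games against strong opponents the drop from one easy win becomes smaller, which is the dilution described above.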
Any rating system has flaws. If the ratings are used to give out something valuable, then people will try to exploit the rating system. Consider the following rating system. "A player's rating equals the total number of games won." Of course, there are lots of flaws with this system. Surprisingly, it is the system used in Contract Bridge, especially in the ACBL. (The "masterpoint" system isn't exactly like this, but for all practical purposes it is.) The advantage of such a system is that it does reward people for something you want to encourage, namely playing more.
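The "total number of games won" system is simple enough to write down directly. A minimal sketch, with hypothetical players A, B and D and made-up results, shows how it rewards activity more than strength:

```python
from collections import Counter


def win_count_ratings(games):
    """games: iterable of (winner, loser) pairs. Rating = number of wins."""
    ratings = Counter()
    for winner, loser in games:
        ratings[winner] += 1
        ratings[loser] += 0  # make sure players without wins are listed too
    return dict(ratings)


# Made-up data: A beats B three times and plays nothing else;
# B and D play each other seventeen times.
games = [("A", "B")] * 3 + [("B", "D")] * 8 + [("D", "B")] * 9
ratings = win_count_ratings(games)
print(ratings)
```

Here A never loses to B, yet ends up below both B and D simply because A played fewer games.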
SAS: Perhaps Holigor could consider using the NNGS rating code. The NNGS system is similar to Holigor's system in the sense that it doesn't require assigning initial ratings. But it has the advantage of being based on probability theory.
HolIgor: I've seen the description of their system. It tackles a problem far more complex than what I wanted to do. To make a rating system for a go server you have to solve the problem of scaling the rating to the proper handicap. Therefore it is a little bit messy, involves seeds and assigns an arbitrary probability of a win to a given difference of rank. The first version of my rating did not involve anything arbitrary, though it did not solve the problem of relating the handicap to the rank difference. Later on I introduced an exponential function to decrease the effect that was mentioned in this discussion. It has an arbitrary coefficient, though there are limits to it. The aging of the results and the number of games needed to ripen the rating are arbitrary as well.
Anonymous: Adjusting for handicap isn't as hard as it sounds. You would just need a large sample of games and do a statistical analysis. Most rating systems make an assumption "If player A gives 3 stones to B, and if B gives 2 stones to C, then A should give 5 stones to C." This assumption is not exactly true in practice, but with a large sample of games you could properly account for it.
Noname: This rating system seems quite great. Although there are some possible ways for a player to manipulate this system, this isn't the official rating of the non-existing World Go Federation. :)
Anon: It seems the ranking page has not been updated for over 2 months. I for one hope Holigor will be able to update it soon...
HolIgor: I had some problems with the ftp to the website. You reminded me; I checked and it seems to be working now.
PurpleHaze: How does your system compare with that of Jeff Sonas? (see: http://www.chessmetrics.com/ and http://www.chessbase.com/newsdetail.asp?newsid=562)
HolIgor: It does not. And it does not intend to. My system just makes an ordered list of players according to an index that is meaningless outside the context of the system.
If one wants to make a rating system of one's own, one should not have to be concerned about the starting ratings of the players. That is what, in my opinion, is difficult about ELO and similar systems. So I invented a system without initial ratings. The system is the simplest thing possible. There are drawbacks, though. For example, a perfect player who won one game against everybody in the list would not come out 1st as the result.
Dalf: Well, at least if you iterate the ELO ratings several times, i.e. start from an equal ELO rating for everyone, update the ratings according to the games, end up with some ELO ratings, then restart the process using those ELOs as initial values, you'll avoid the problem of initial ratings, and this should give a decent result, roughly equivalent to sound probabilistic approaches, I would guess. The nice thing about probability-based systems is that they give you the probability of the outcome of a match, which directly quantifies how much "stronger" the strong player is (there is a difference between winning 90% of the time and winning 55% of the time). But looking at the "ProGoR" page, I see that the changes in the rating of one player over time can be really wide... much more than the changes in rating of top players in chess. So there is much uncertainty about rating anyway.
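Dalf's suggestion can be sketched roughly as follows. The expected-score formula is the standard Elo one; the K-factor of 16, the starting value of 1500, the number of passes and the sample games between hypothetical players A, B and C are all assumptions chosen for illustration.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_pass(games, ratings, k=16.0):
    """Run once through all games, updating ratings after each game."""
    ratings = dict(ratings)
    for winner, loser in games:
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


def iterated_elo(games, passes=50, start=1500.0, k=16.0):
    """Start everyone at the same rating, then feed the result of each
    pass back in as the initial ratings of the next pass."""
    players = {p for game in games for p in game}
    ratings = {p: start for p in players}
    for _ in range(passes):
        ratings = elo_pass(games, ratings, k)
    return ratings


# Made-up results as (winner, loser) pairs.
games = [("A", "B"), ("A", "B"), ("B", "A"),
         ("B", "C"), ("B", "C"), ("C", "B"),
         ("A", "C")]
final = iterated_elo(games)
for player in sorted(final, key=final.get, reverse=True):
    print(player, round(final[player]))
```

One caveat of this sketch: a player who never loses keeps drifting upward as the number of passes grows, so the number of passes is itself an arbitrary choice.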
HolIgor: Top go players play from 50 to 80 games each year. Top chess players play far fewer. When I first learned that the go players in Japan play on Thursdays I thought that they took it easy; I played more. Yet, in fact, a game each week is a lot. Plus there are international tournaments, and sometimes they play on Sundays. As for chess, while the game is still popular at the amateur level, at the level of grandmasters the game is stagnating. A small number of games means that ratings don't change much.
Dalf: OK, that's interesting. I also found on ProGoR the rating histories of the players, which illustrate that. Now that means that the rating is more accurate, which is all the more surprising (i.e. you can be lucky and win 9 out of 10 in a year instead of a normal 5 out of 10, but winning 45 out of 50 games by being lucky is more difficult).
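The point about luck can be put in numbers with a binomial tail. This is only a sketch: it assumes every game is an independent coin flip with a 50% chance of winning, which is a simplification.

```python
from math import comb


def prob_at_least(wins, games, p=0.5):
    """Probability of winning at least `wins` out of `games` independent
    games when each game is won with probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


p_short = prob_at_least(9, 10)   # 11/1024, roughly 1 in 93
p_long = prob_at_least(45, 50)   # roughly 2 in a billion

print(p_short, p_long)
```

Under these assumptions a lucky 9 out of 10 is unusual but possible, while a lucky 45 out of 50 is practically excluded.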
As for comparing with chess, I wouldn't say that the rating of the grandmasters is precisely "stagnating" (nor the game itself), but rather that it is stable: it is almost impossible for a chess player to perform 100 points below his rating. For Kasparov, for instance, it would mean playing like a player from the lower part of the top 10, which never happened, since he barely ever lost against such players (similarly for #2 and #3, Kramnik and Anand); for a top-10 player it would mean playing like a player from the lower part of the top 100, which almost never happens.
Still, interestingly, ratings can change fast: the youngster Bacrot gained 70 points in one year, and older players like the veteran Korchnoi have dropped from the top 20 to outside the top 100 (in 5 years).
Now comparing to ProGoR ratings, 100 ELO = 40 ProGoR for equal win expectancy. The surprise is that jumps of 40 ProGoR are not uncommon for go players (Yi Ch'ang-ho -50 in 1998, Cho Chikun +40 in 2002, Nie Weiping -50 in 2 years, Kobayashi Koichi -64 in 2001, Ma Xiaochun -90 in 3 years, O Rissei -42 in 2 years, Gu Li -30 in 2003, +65 in 2004, and so on... I love numbers :-)). This could be because ProGoR ratings change faster, but as you said, there is a higher number of games, so it should be a real change of strength (or there is some other explanation). This means that the Cho Chikun of 2002 had a 2/3 chance of winning against the Cho Chikun of 2001 (equivalent to 100 ELO), a dramatic change of strength (compared to chess players).
Or maybe there are other factors, but the interesting thing (to me :-)) is that those big yearly changes are bigger than the differences in strength between top players (-45 means Yi Ch'ang-ho playing like a #18 player, that is, like Cho Hun-hyeon). The beauty of that being that Holigor's rating is likely as good as the ProGoR rating :-)
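The "2/3 chance, equivalent to 100 ELO" figure can be checked against the standard Elo expected-score formula. The sketch below covers the Elo side only; the conversion of 100 ELO to 40 ProGoR is taken from the text above and is not verified here.

```python
def elo_win_expectancy(diff):
    """Standard Elo expected score for a player rated `diff` points
    above the opponent."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


e100 = elo_win_expectancy(100)
print(round(e100, 3))  # about 0.64, a little under 2/3
```

So a 100-point Elo gap corresponds to winning about 64% of the time, close to the 2/3 quoted above.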
HolIgor: Chess also has draws. In go Yi Ch'ang-ho wins about 75% of his games, so even he loses one game in 4. In go Yamashita Keigo lost 8 games in a row. This is impossible in chess.