The section Converting Elo Ratings into Go Ranks states that there is a linear relationship between Elo rating E and the dan/kyu rank system D (∆E=100*∆D for DGS). How can this be? This seems a very questionable assumption to me. Elo is based on probability of winning. Give a 5d one extra handicap stone against a 6d (as a birthday present), and in her able hands her winning probability should increase far more than if you give that stone to a 29k playing against a 30k.
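To make the linearity assumption concrete, here is what it implies under the classic Elo curve (a sketch only; DGS's actual formula may differ, and the rating numbers below are made up for illustration): a fixed 100-point gap predicts the same winning probability anywhere on the scale, whether at 6d vs 5d or 30k vs 29k.

```python
def elo_win_prob(r_a, r_b, scale=400):
    """Classic Elo expected score of player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / scale))

# With a fixed 100 points per rank, the model predicts the *same*
# win probability for any one-rank gap, top or bottom of the scale:
p_6d_vs_5d = elo_win_prob(2600, 2500)   # hypothetical rating numbers
p_30k_vs_29k = elo_win_prob(100, 0)
print(round(p_6d_vs_5d, 3), round(p_30k_vs_29k, 3))  # both ≈ 0.64
```

This is exactly the property being questioned: the model knows only rating differences, not where on the scale the players sit.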
It's only a model. (On second thought, let's not go to Camelot. It is a silly place.)
At the lower end of the rank spectrum, game results are more often decided by luck, which pushes the average result towards 50/50 regardless of the relative strengths of the players, so the results (and by extension Elo ratings) say very little about the players' actual strength.
Somewhat funnily, these games are typically (in GoR, FIDE and USCF, at least) the ones given the biggest K factor, so they affect the rating the most. This does nothing to improve the accuracy of Elo at the lower end, but it does help quickly developing players gain points faster.
In summary, the lower end of an Elo scale is utterly useless, while some would claim that the upper end is slightly less so. :-)
-Bass
Without getting into the discussion about Luck in Go (we can just go by our intuitive understanding of "luck" here), I agree with the gist of your second paragraph. But I don't see why that should only apply to the low end. Even between a 6d and a 1d, it stands to reason that the former, on average, relies more on skill than the latter. Of course, the effect will be less, but my point is still valid: I see no justification for assuming that they are the same. So how can we equate the two?
In your third paragraph, you're raising a different issue. The problem here is caused by basing K on rank, instead of the number of data points (that is, games played), as is usual in statistics.
We can equate the cases by using a magic word, like this: "Assume the expected performance of a go player can be described by a single number on a logarithmic scale", and then working from that assumption.
If you do not find this premise plausible, then you are questioning the very foundation of the Elo system, and you would also be quite correct in doing so.
This is what I meant with the first and last paragraphs of my previous message.
Cheers,
-Bass
PS. You may be interested in this: http://www.kaggle.com/c/chess
Hmm, I don't see how that follows from your assumption. Maybe that's because I don't understand the assumption. If it weren't for the word "logarithmic", I would understand it as: "(1) For each player A, we can assign an expected performance E(A) such that all expected performances form a totally ordered set." This assumption may indeed not be true, but it's essential for any rating system. Any rating system is only a model, there's nothing wrong with that, and it's certainly not "utterly useless" for that.
In the above understanding, the added word "logarithmic" becomes trivial: If you have any scale, you can easily map it to a logarithmic scale; statement (1) is invariant under that operation.
But maybe what you meant was more than that. The obvious interpretation would be the Elo definition, or some generalization of it, maybe something along the lines of "a rating system based on the logarithm of winning probabilities". But then the statement would be a tautology: "If we can assume something like Elo works, then Elo works."
Your link is interesting, but of course it doesn't pertain to this discussion about dan/kyu, because there's no equivalent to the Go handicap system in chess. My question was not about generally criticizing Elo, but specifically about the connection between the two systems. Even if they find the ideal replacement for Elo, it will still be a probability-based system. My question is: Can such a system ever scale linearly with the handicap system?
I don't think anything can scale in proportion to the handicap system, since there is a kink in the handicap scale. If the players' ranking difference is exactly one kyu, the handicap (black plays first, white gets no komi) is only half a move. Every further kyu difference adds a whole move to the handicap, so the handicap system is not consistent:
Assume that A is exactly one kyu stronger than B, that B is exactly one kyu stronger than C, and that the value of a one-kyu handicap (black plays first, white gets no komi) equals one komi.
Now, if the scale were linear, then in a game where A plays against C the handicap should be the handicap B gives to C plus the handicap A gives to B. This equals twice the komi. What C gets instead is two stones, while white gets no komi, so the handicap roughly equals 3 x komi.
If the handicap scale were to be made linear, black should receive the komi in addition to any handicap stones. (Or alternatively, black should only get a handicap stone for every two kyus of difference, with white getting komi when the difference is even.)
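The kink can be put into numbers. This toy calculation (my own sketch, using the assumptions from the argument above: the first move without komi is worth one komi, each further whole move is worth two komi) compares the traditional handicap's value to what a linear scale would require:

```python
def traditional_handicap_value(d, komi=7.0):
    """Point value of the traditional handicap for a rank gap d >= 1.

    Assumption from the argument above: first move + no komi is worth
    one komi, and each extra stone is a whole move, worth two komi.
    """
    return komi * (1 + 2 * (d - 1))

def linear_handicap_value(d, komi=7.0):
    """What a linear scale (one komi per rank of difference) would give."""
    return komi * d

for d in (1, 2, 3):
    print(d, traditional_handicap_value(d), linear_handicap_value(d))
```

At a one-rank gap the two scales agree, but at two ranks the traditional handicap is worth three komi where a linear scale would require two, and the gap keeps widening from there.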
Because the handicap system is internally inconsistent, I mentally substituted the one stone handicap with "replace the player with someone who has 100 ranking points more" in my previous comments, but forgot to mention I had done so. Sorry about that.
Excellent point! I'm aware of this kink, which is why I used 5d vs 6d and 29k vs 30k above, but I didn't realize that that very same point answers my question: If the dan/kyu system isn't linear, then it can't be equated with a linear system, plain and simple. I'm changing the sentence that I had misunderstood to indicate otherwise.
(BTW, I just saw that we already have a description of that kink here: Rank And Handicap#Criticisms on the traditional handicap scheme.)
Oh, and in the earlier assumption the emphasis was meant to be on the "described by a single number" bit.
The logarithmic thing is just how most people intuitively think about playing strength. That is, if A beats B 7 games out of 10 and B beats C 7 times out of 10, then, with noise (luck) ignored, most people would expect A to win approximately nine games out of ten (1-(3/10)^2) when playing against C.
If you choose the scale to be logarithmic, you can get this result _and_ you get to keep the nice result that the rating difference between A and B is the same as between B and C, so you can immediately tell that the strength difference is the same in both pairs. Other than that, it does not really matter much.
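The "rating differences add, odds multiply" property can be sketched with the logistic (Bradley-Terry) model, which underlies Elo-style ratings. Note that it gives 49/58 ≈ 0.84 for A against C — close to, though not exactly, the intuitive 1-(3/10)^2 = 0.91:

```python
import math

def rating_diff(p):
    """Rating difference implied by win probability p (logistic model, base e)."""
    return math.log(p / (1 - p))

def win_prob(diff):
    """Inverse: win probability implied by a rating difference."""
    return 1 / (1 + math.exp(-diff))

d_ab = rating_diff(0.7)       # A beats B 7 times out of 10
d_bc = rating_diff(0.7)       # B beats C 7 times out of 10
p_ac = win_prob(d_ab + d_bc)  # differences simply add on the log scale
print(round(p_ac, 3))         # 49/58 ≈ 0.845
```

Because the differences add, equal rating gaps mean equal strength gaps anywhere on the scale, which is the "nice result" mentioned above.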
Ignoring the noise will of course bring problems, because in real life there is always noise. This is why several improvements that take noise into account have been suggested, hence the link.
Cheers,
-Bass
Ok I've updated the main page with some more info. Sebastian asked for more explanation. A difficulty of designing rating systems for Go is the extra requirement to align them to traditional dan/kyu ranks. Here is how I think of the way EGF's system was designed.
First, assign arbitrary Elo-like numbers to each rank. The EGF chose to set the average 1 dan at 2100, with each rank spanning 100 points. Then, from game statistics, they estimated the probabilities of players of different ranks winning even games. This sets the parameter 'a' in the EGF formula, which varies with the average rank of the players (unlike regular Elo, where it is fixed). This fitting process is what makes the transformation from Elo numbers to ranks work.
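The shape of such a formula can be sketched as follows. The logistic form matches the published EGF expectancy, but the linear ramp for 'a' below is an illustrative stand-in of my own, not the EGF's actual fitted table:

```python
import math

def a_param(avg_rating):
    """Illustrative stand-in for EGF's 'a': larger (flatter curve) at low
    ratings, smaller (steeper curve) at high ratings.  The real values
    come from a fitted table; this linear ramp is just a sketch."""
    return max(70.0, 200.0 - avg_rating / 20.0)

def win_expectancy(r_low, r_high):
    """Winning expectancy of the lower-rated player in an even game,
    using an EGF-shaped logistic with a rating-dependent 'a'."""
    d = r_high - r_low
    a = a_param((r_low + r_high) / 2)
    return 1 / (math.exp(d / a) + 1)

# The same 100-point (one-rank) gap matters more at the top of the
# scale than at the bottom, because 'a' shrinks as ratings grow:
print(round(win_expectancy(2100, 2200), 3),   # around 1d vs 2d
      round(win_expectancy(100, 200), 3))     # deep kyu territory
```

Letting 'a' vary is exactly the fitting degree of freedom described above: it lets one curve family match the observed even-game results at every part of the rank scale.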
For handicaps, we get the alignment essentially from the definition of rank differences: a 5 dan vs a 2 dan with a 3-stone handicap plus reverse komi should have 50/50 win chances. Remember that players initially got their ranks from this definition, and that in the paragraph above we used players with those ranks, but playing even games, to fit the probabilities for even games.
So everything is nice, but of course all rating systems have problems: there may be three players who beat each other in a cycle (A>B>C>A), and assigning a single number cannot describe that situation. In addition, in Go some players may be better at handicap games than others. For handicaps we start from the axiom that if A gives B 3 stones, and B gives C 3 stones, then A should give C 6 stones. Everything tends to work out somewhat, but it's not perfect for all situations.