Rationales Behind Rating Systems
Most people who play Go are also more or less concerned with ranks or ratings[1]. There are various ideas about how Go players' ratings ought to be determined, ranging from rather arbitrary rank assignments to strict mathematical optimizations. Each particular system has its pros and cons, its friends and foes.
This page is meant to facilitate constructive discussion by giving a systematic overview of the key concepts, without advocating any of them or putting any down. (Please do not argue about them here, but use the dedicated discussion pages or forums.)
Determinants
There are two competing issues a rating system has to deal with. Sufficiently large ensembles of players are amazingly complex, and often, simplifying one will obfuscate the other:
Rating -> Results
A fundamental aspect of ratings is that they help balance games with unknown players[2]. Related questions include "Which handicap and komi is appropriate against player xyz?" and "I want to play an exciting even game. Whom should I challenge?"
The better a system predicts one's winning chances against particular opponents, the more interesting, on average, the games will be for both sides. Players who treat ratings mainly as a means to high-quality, well-balanced games (which they see as the real meaning of play) won't be satisfied with a system in which opponents' ratings don't reliably reflect their real strength.
Results -> Rating
In the Go world, ratings are also status symbols[2]. To many Go players, questions like, "How can I improve my rating?" or "What do I need to achieve to get to the next rank?" are important.
Rating systems that give a clear view of what is required for a definite advance work best here. Players who mainly see winning games as a means to increase their rating (which they then perceive as the real success) won't appreciate a system in which rating changes don't correspond in an understandable way to success or failure.
Even though these aspects compete, they also depend on each other. Ratings that tell nothing about actual strength would hardly be attractive or worth fighting for. Likewise, if results had no influence on ratings, the latter could neither indicate winning chances nor allow for balanced game settings.
Other aspects
Stability
Just like consistency across continents or internet servers, stability over time adds a lot to the usefulness of a rating system. The most volatile ratings are found in temporary scales, for example the scores occurring during Swiss tournaments, where drift doesn't really matter. At larger scales, however, stability is crucial, because both main goals rely on social memory, which doesn't like frequent adjustments.
Feasibility
Aside from the criteria above, resource constraints impose hard restrictions on the choice of rating systems.
Ratings that take years to compute on the available hardware are hardly helpful. A system that requires a committee of 500 professionals to judge each amateur's rank will not be very practical either. On the other hand, not all advanced rating systems imply such unrealistic preconditions. It is quite possible that a more complex system really does a better job than a simpler one.
Variants
The most commonly used rating systems include, in order of increasing emphasis on good estimates (at the cost of being decreasingly fathomable):
Wins
This is one of the simplest rating algorithms. Each player is assigned a score according to the number of wins (or, alternatively, the win tally) in his/her record. Many championships in various sports use something along these lines. The scores are immediately understandable. However, since a winner is awarded an entire point regardless of whether the defeated opponent was extremely tough or weak, the scores don't tell much about actual strength.
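As a minimal sketch of such a win count, assuming results are given as (winner, loser) pairs (the function name and data layout are made up for illustration):

```python
from collections import Counter

def win_scores(results):
    """Count wins per player from an iterable of (winner, loser) pairs."""
    scores = Counter()
    for winner, loser in results:
        scores[winner] += 1   # one full point per win, whoever the opponent was
        scores[loser] += 0    # make sure players without wins are listed too
    return dict(scores)

games = [("alice", "bob"), ("alice", "carol"), ("carol", "bob")]
print(win_scores(games))
```

Note that alice's score would be the same had she beaten two much stronger players, which is exactly the weakness described above.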
ELO
Also puts strong emphasis on understandability, but yields more meaningful ratings. Takes into account the game outcomes as well as their probabilities. See ELO Rating.
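The idea of weighing an outcome against its probability can be sketched as follows. The logistic curve with a 400-point scale and the factor K = 32 are the conventions commonly quoted for chess and are used here only as assumptions; Go rating systems typically choose their own parameters.

```python
def expected_score(r_a, r_b):
    """Estimated winning chance of A against B (assumed 400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0):
    """Return the new ratings of A and B after one game.

    score_a is 1 for a win of A, 0 for a loss, 0.5 for a draw (jigo).
    k is an assumed step size; real systems pick their own value.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta

# An upset (the lower-rated player wins) moves the ratings further
# than an expected result does.
print(elo_update(1500, 1700, 1))
print(elo_update(1700, 1500, 1))
```

The same `expected_score` function serves the "Rating -> Results" direction: it turns a rating difference into an estimated winning chance.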
Glicko
More complicated than ELO, yet generally better at prediction. Takes into account the outcomes and their probabilities, as well as the reliability of the estimates gained thereby. See Glicko Rating.
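A rough sketch of how reliability enters the calculation, following the published Glicko-1 update formulas for a single rating period. Each player carries a rating `r` and a rating deviation `rd` (large `rd` means an unreliable estimate). The growth of `rd` during inactivity is left out here, and the function names are illustrative.

```python
import math

Q = math.log(10) / 400.0

def g(rd):
    """Damping factor: results against uncertain opponents count less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)

def glicko_update(r, rd, games):
    """Update (r, rd) from a list of (opp_r, opp_rd, score) tuples."""
    if not games:
        return r, rd
    info = 0.0    # accumulates 1/d^2, the information gained from the games
    total = 0.0   # accumulates damped (actual - expected) scores
    for opp_r, opp_rd, score in games:
        g_j = g(opp_rd)
        e_j = 1.0 / (1.0 + 10 ** (-g_j * (r - opp_r) / 400.0))
        info += Q**2 * g_j**2 * e_j * (1.0 - e_j)
        total += g_j * (score - e_j)
    denom = 1.0 / rd**2 + info
    return r + Q / denom * total, math.sqrt(1.0 / denom)

# One win against a well-established opponent, two losses against less certain ones.
print(glicko_update(1500, 200, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

The deviation shrinks with every game played, so a newcomer's rating moves quickly at first and then settles.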
ML
Aims at the best prediction available at any point in time, and accepts a somewhat obscure response scheme in exchange. Takes into account game outcomes, their probabilities and the reliability of estimates, as well as base data updates. See http://www.gokgs.com/help/rmath.html?helpLocale=en_US or KGS Rating Math.
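Reading "ML" as maximum likelihood, the core idea can be illustrated with a plain Bradley-Terry fit: find the strengths under which the whole record of games is most probable, and refit whenever the record changes. This is only a toy sketch, not the KGS algorithm; it ignores time weighting and handicaps, and it assumes every player has at least one win and one loss.

```python
from collections import defaultdict

def fit_strengths(results, iterations=200):
    """Fit relative strengths to (winner, loser) pairs by iterative rescaling.

    The model assumes P(i beats j) = p[i] / (p[i] + p[j]).
    Returns strengths normalized to sum to 1.
    """
    wins = defaultdict(int)
    games = defaultdict(int)          # games[(i, j)]: games played between i and j
    players = set()
    for winner, loser in results:
        wins[winner] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1
        players.update((winner, loser))
    p = {x: 1.0 for x in players}
    for _ in range(iterations):
        new_p = {}
        for i in players:
            denom = sum(n / (p[i] + p[j])
                        for (a, j), n in games.items() if a == i)
            new_p[i] = wins[i] / denom
        total = sum(new_p.values())
        p = {x: v / total for x, v in new_p.items()}
    return p

record = [("a", "b")] * 3 + [("b", "a")] + [("b", "c")] * 3 + [("c", "b")]
print(fit_strengths(record))
```

Because every game in the record influences every strength, adding one new result can shift the ratings of players who were not involved in it, which is the "somewhat obscure response scheme" mentioned above.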
--blubb
[1] Where ranks are derived from an underlying fine-grained rating scale, it obviously suffices to consider the ratings alone. Ranks that are not based on such a fine-grained scale can be suitably mapped to integers to obtain a simple (though a bit crude) rating scale.
[2] This holds at least within the community to which the rating system in question is applied, and possibly beyond.