KGS Rating Math
For KGS 3.0, the rating system returned to the old formula (meaning the minimum win probability introduced in 2.6.8 is back to 0), while keeping the 2.6.8 constants.
- k varies depending on your rank:
- k=0.85 for 30k-5k
- k=???? for 4k-1d
- k=1.30 for 2d+
- The half life of games varies depending on your rank:
- 15 days for 30k-15k
- ?? days for 14k-1k
- 45 days for 1d+
There is some documentation about KGS ranks in the official help pages at http://www.gokgs.com/help/rank.html?helpLocale=en_US and http://www.gokgs.com/help/rmath.html?helpLocale=en_US.
Basically KGS assumes that the expected win rate of two players is:
P(A wins) = 1 / ( 1 + exp(k*(RankB-RankA)) )
where k varies from 0.85 to 1.3, depending on the ratings of the players, and RankB-RankA is adjusted by 1 for every handicap stone and by a small amount for each point of komi.
Using this formula, KGS continually recalculates the most likely rating for every player based on the games they have played. Old games decrease in weight exponentially, with a half-life ranging from 15 to 45 days (weak players have short half-lives so their old results don't affect them as much). Games older than 180 days are not considered.
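This weighting scheme can be sketched in a few lines (the function and parameter names are mine; the 180-day cutoff is the one mentioned above):

```python
def game_weight(age_days, half_life_days):
    """Weight of a game in the rating calculation, by its age in days."""
    if age_days > 180:
        return 0.0   # games older than 180 days are ignored entirely
    return 0.5 ** (age_days / half_life_days)
```

For example, with the 45-day half-life a game played 45 days ago counts half as much as one played today, and one played 90 days ago counts a quarter as much.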
rank diff   expected win rate           expected win rate
            (k=0.85, for 30k-5k)        (k=1.30, for 2d+)
0.0         50%                         50%
0.5         60%                         66%
1.0         70%                         79%
1.5         78%                         88%
2.0         85%                         93%
2.5         89%                         96%
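The formula and the table above are easy to reproduce; here is a minimal sketch (the function name is mine):

```python
import math

def win_probability(rank_a, rank_b, k):
    """P(A beats B) in an even game under the KGS formula."""
    return 1.0 / (1.0 + math.exp(k * (rank_b - rank_a)))
```

For instance, win_probability(3.0, 2.0, 1.3) gives roughly 0.79, matching the 1.0-rank row for 2d+.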
Handicapping ensures that you win around 50% of your games if your rank is accurate. However, there is enough room within each rank for some players to be noticeably stronger than others of the same rank, and the handicap system actually favors white by 0.5 stones. At the extreme end, suppose you are a 2.99 dan (on the verge of 3d) and play a 1.00 dan (a very weak 1d). KGS will suggest an H1 game (komi = 0.5), but this still leaves an effective rank difference of 2.99 - 1.00 - 0.5 = 1.49, and white will be expected to win 88% of the time (78% for 5k and below). This is an extreme example, but even if you play all your games against players of your own rank, you will typically need to prove that you are 0.5 stones stronger than them to promote. So for 2d+ you need to win 66% of the time, and 60% for 5k and below.
- KGS formula: P(A wins) = 1 / ( 1 + exp(k*(RankB-RankA)) )
- Elo formula: P(A wins) = 1 / ( 1 + 10^((RatingB-RatingA)/400) )
- Set RankB-RankA = 1 and solve for RatingB-RatingA:
- RatingB-RatingA = k/ln(10)*400
- For 2d+ this is 226 Elo per rank
- For 30k-5k this is 148 Elo per rank
- Note that the EGF uses a modified version of Elo that sets 100 rating points per rank and then varies parameters to make the win rates match expectations.
- The AlphaGo paper uses 230 Elo per rank; this looks like a rounding artifact from using a 79% win rate between ranks (which corresponds to k of roughly 1.325) instead of starting from k=1.3.
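The conversion above can be sketched as follows (function names are mine):

```python
import math

def elo_per_rank(k):
    """Elo points per KGS rank: set exp(k * 1) = 10**(D / 400) and solve for D."""
    return 400.0 * k / math.log(10)

def k_from_win_rate(p):
    """Recover k from the stronger player's win rate at a 1-rank gap."""
    return math.log(p / (1.0 - p))
```

A 79% win rate between adjacent ranks gives k of about 1.325, and elo_per_rank of that is about 230, which supports the rounding-artifact explanation.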
Here is some math showing how your rank on KGS would react to you being a 2.5 (average 2d) and going from a steady 50% win rate to some other win rate.
Some assumptions / simplifications implied:
- This is based on the information at http://www.gokgs.com/help/rmath.html?helpLocale=en_US, and a few things wms told me directly:
- The weight of a game on KGS decreases exponentially over time, with a half-life of 45 days. In these calculations I dropped all games older than 180 days (is this right?)
- KGS uses k=0.8, giving a 69% win rate in an even game between players 1 rank apart.
- These are all assumed to be even games against a 2.5 (an average 2d). It makes a big difference if you play a 2.0 (weak 2d) or a 2.9 (strong 2d).
- A game between an average 2d and average 3d at 0.5 komi does not yield a 50% win rate. It's actually 60% for the 3d since 0.5 komi is only a half stone handicap. Generally in any handicap game, white should win 60% of the time. Again this example just assumes all even games for easier math.
- Does not take into account the movements of other players, which have an impact on you (i.e. "rank drift").
- Does not take into account the "confidence" factor, which can cause games with players of uncertain ranks to have less weight.
- Assumes you play rated games at a constant rate. Note that with this assumption, it does not matter what that rate actually is, except that it is great enough to make KGS' "confidence" factor not have any impact.
Suppose you played for 6 months as a 2.5 (an average 2d), and then suddenly became 3.5 (an average 3d) in strength and therefore started winning 79% of your games. In 45 days, KGS will rate you as a 2.5 + 0.51 = 3.01 (weak 3d).
Days played at new strength   Increase in your rating
  0                           0.00
 15                           0.21
 30                           0.38
 45                           0.51
 60                           0.62
 75                           0.71
 90                           0.78
105                           0.84
120                           0.89
135                           0.93
150                           0.96
165                           0.98
180                           1.00
Assume you played 1 game a day as an average player of your rank for 180 days. Then you get inspired and play a whole bunch of even games (still with average players of your rank) in one day and win them all. How will those games affect your rating?
Games won in a row   Increase (2d and above)   Increase (15k and below)
 0                   0.00                      0.00
 1                   0.02                      0.10
 2                   0.05                      0.20
 3                   0.07                      0.29
 4                   0.09                      0.37
 5                   0.12                      0.44
 6                   0.14                      0.52
 7                   0.16
 8                   0.18
 9                   0.20
10                   0.22
11                   0.24
12                   0.26
13                   0.27
14                   0.29
15                   0.31
16                   0.32
17                   0.34
18                   0.36
19                   0.37
20                   0.39
21                   0.40
22                   0.42
23                   0.43
24                   0.45
25                   0.46
26                   0.47
27                   0.49
28                   0.50
So to gain the 0.5 ranks typically needed to promote, it takes 28 straight wins for 2d and above, but only 6 for 15k and below.
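The winning-streak table can be reproduced, to within rounding, with a simplified model (my own reconstruction, not the actual KGS code): one even game per day for 180 days against opponents of exactly your strength (expected result 0.5 each), weights halving every half-life, and the new strength chosen so that the weighted average win probability equals the weighted win ratio.

```python
import math

def streak_gain(wins, k, half_life):
    """Approximate rank gain after `wins` same-day wins over equal opponents.

    Model assumptions (not the exact KGS code): 180 prior days of one even
    game per day against opponents of exactly your strength, with expected
    result 0.5 each, weights halving every `half_life` days, and the
    opponents' linearized strength normalized to 1.
    """
    # total weight of the 180 days of old games (ages 1..180)
    w_old = sum(0.5 ** (age / half_life) for age in range(1, 181))
    # weighted win ratio after adding `wins` fresh games (weight 1, result 1)
    r = (0.5 * w_old + wins) / (w_old + wins)
    # solve r = s / (s + 1) for the linearized strength s, convert to ranks
    s = r / (1.0 - r)
    return math.log(s) / k
```

With k=1.3 and a 45-day half-life this lands within about 0.01 of the 2d+ column, and with k=0.85 and a 15-day half-life it matches the 15k column.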
Where this page speaks about "ratings", it doesn't refer to the standard ratings of players, but to effective ratings. (This also applies to strengths, which are defined in the next section.) So what does "effective" stand for?
Let's say, your opponent is a strong 3k (-3.1 AGA or 1840 EGF). If you give him/her an advantage of 0.5 stones (that is, s/he takes black) and get a komi of 5.5 moku (this default komi is debatable), his effective rating is also -3.1 AGA or 1840 EGF.
But if you use a different handicap and/or komi, your opponent's effective playing power, compared to yours, will be affected. His/her effective rating in that game needs to take account of that change, to allow proper calculation of outcome probabilities. Each additional handicap stone that you give to the opponent effectively makes him/her one rank stronger. If it's you who receives handicap, his/her effective rating decreases by one rank per stone. The same applies to komi, insofar as it differs from 5.5 moku to white (again, this number is debatable): your opponent's effective rating is increased/decreased by one rank per 11 moku of komi you give/take.
So, if you play white, giving 4 stones (which gives black an advantage of 3.5 stones over white) while still getting 5.5 moku, his/her effective rating in that game is (5.5 - 5.5)/11 + (3.5 - 0.5) = 3 grades better than his/her standard rating. The mentioned "strong 3k" player will effectively be a strong 1d (1.9 AGA or 2140 EGF) then to you.
If you play white and give your opponent 4 stones, but without komi this time, Black's effective rating is (5.5 - 0.0)/11 + (3.5 - 0.5) = 3.5 grades stronger than his/her standard rating. Therefore, s/he will effectively be close to an average 2d (2.4 AGA or 2190 EGF).
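The adjustment rule can be sketched as a small function (my own code; `black_advantage` is black's edge in stones as described above):

```python
def effective_rank_shift(black_advantage, komi, default_komi=5.5):
    """Ranks added to black's standard rating, per the rule above.

    `black_advantage` is black's edge in stones: 0.5 for simply taking black
    in an even game, or (stones - 0.5) in a handicap game (e.g. 3.5 for H4).
    The trailing -0.5 makes the baseline (black, default komi) come out to 0.
    """
    return (default_komi - komi) / 11.0 + (black_advantage - 0.5)
```

This reproduces both worked examples: H4 at 5.5 komi gives a shift of 3 ranks, and H4 without komi gives 3.5 ranks.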
Usually it's considered the best choice to set handicap and komi so that your opponent is effectively as strong as you are. However, it's possible to use different, even weirdly different, settings; e. g., you can give a stronger player handicap. The tables and formulas on this page still apply, as long as you remember to use the effective ratings (or effective strengths, respectively).
blubb: I am confident the rating system can be clarified enough to help people with some basic math knowledge to actually understand its main principles.
The most basic way is to present tables of values like you can see above, even if they can show some exemplary cases only. Another attempt is to simplify the math formulas as far as possible. If you want to go beyond http://www.gokgs.com/help/rank.html?helpLocale=en_US, but feel deterred by the somewhat cryptic explanation at http://www.gokgs.com/help/rmath.html?helpLocale=en_US, read on.
First, one can define a "(linearized) strength" s of players:
s := e^(k*r) ,
where r is the common (logarithmized) rating. E. g., a value of k = 0.8 gives
s = 2.22^r .
Using this, the winning probability of player A against player B can be expressed as
              s(A)
   P(A,B) = ----------- ,
            s(A) + s(B)
Roughly speaking, this equation says:
If A is twice as strong as B, A is expected to beat B twice as often as B beats A.
(Important note: This refers to pure strengths. In non-even games, effective strengths have to be used here, as calculated from effective ratings.)
When we look at the rating calculation procedure (a so-called maximum-likelihood estimation), it turns out that it can be transformed to a rather simple formula as well.
Let player A's record consist of N games with various opponents (B1,...,Bx,...,BN). (It doesn't matter whether the opponents are all different or whether, e. g., B4, B5 and B11 are the same person. They're just numbered in the order of the games.)
The players B1 to BN have the (linearized) strengths s(B1) to s(BN), respectively.
Let qx indicate the result of game no. x as follows (ties are not taken into account here):
qx := 1, if A won game no. x,
      0, if A lost game no. x.
Similarly, we write rx for the winning probability of player A in game no. x:
                     s(A)
   rx := P(A,Bx) = ------------ .
                   s(A) + s(Bx)
Now, what does it mean to calculate the strength of player A? Because rx depends on A's strength, we can call it a function of s(A), in other words, rx = rx (s(A)).
The correct strength value then is the one which ensures that the equation

   Avg(rx) = Avg(qx)

holds: the correct strength makes the average predicted winning probability equal to the actual overall winning ratio.
In fact, the above paragraph doesn't cover how ratings are really calculated. If, for instance, B8 is an opponent who has not been playing for a long time, it's rather uncertain what the result of game no. 8 actually tells about A's strength, because B8's strength itself is very uncertain.
That's why so-called "weights" wx are assigned to the games. They are intended to weigh each particular game result according to its "meaningfulness". A weight is a number between 0 (games which are not evaluable at all) and 1 (the most evaluable games).
So, including weights, the correct strength value is the one which ensures that

   Avg(wx * rx) = Avg(wx * qx)

holds. Taking wx, rx and qx as the components of vectors w, r and q, this can be written as the dot product

   0 = w * (r - q) .
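Solving this condition for a single player's strength is a one-dimensional root-finding problem; here is a minimal sketch (my own code, not KGS'). Since the weighted residual grows monotonically in s(A), a simple bisection works:

```python
def solve_strength(games, lo=1e-9, hi=1e9, iters=200):
    """Find s(A) such that sum(wx * (rx - qx)) = 0.

    `games` is a list of (opponent_strength, result, weight) triples,
    with result qx in {0, 1}. The residual is monotonically increasing
    in s, so we bisect (geometrically, since strengths live on a
    multiplicative scale).
    """
    def residual(s):
        return sum(w * (s / (s + sb) - q) for sb, q, w in games)
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if residual(mid) < 0.0:
            lo = mid   # predicted win rate too low -> strength must be higher
        else:
            hi = mid
    return (lo * hi) ** 0.5
```

For example, two wins and one loss against opponents of strength 1 (all weights 1) yields a strength of 2, matching the "twice as strong wins twice as often" reading above.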
Currently, weight depends on
- the time since the game has been finished
- the opponent's current rank confidence.
By the way, whilst the common kyu/dan ranks get harder to increase the better you are, the strength behaves a little more according to the effort you spend on improving. However, there's a great drawback to strengths: it's irksomely difficult to tell the appropriate number of handicap stones between two players of given linearized strengths.
In addition to the assumptions above, assume
- a constant playing rate
- a certain, constant mix of effective strengths of opponents.
Before the leap,
- your performance was constant over a long time
- your (weighted) win ratio was stable at q0
- your strength was in its equilibrium at s(0).
After the leap,
- your (weighted) win ratio switches to q1
- your strength starts moving like s(t).
Please note that the following formulas are continuous approximations, which fit better the higher the (constant) playing rate is.
If the mix of opponent strengths remains the same as before, and you manage to keep the new win ratio, your recognized strength gradually changes according to
   s(t)     1 + qW * E(t)
   ----  =  -------------
   s(0)     1 + qL * E(t)

where

   t    = time passed since your sudden improvement
   qW   = q1 / q0               (quotient of win ratios after and before the leap)
   qL   = (1-q1) / (1-q0)       (quotient of loss ratios after and before the leap)
   E(t) = 2^(t/t_half) - 1      (aging factor)
Calculating your rank development with k = 1.3, a half life of 30 days, an old win ratio of q0 = 0.5 and a new win ratio of q1 = 1.0, we get
   qW = 1.0 / 0.5 = 2.0
   qL = 0.0 / 0.5 = 0.0

                           1 + 2.0 * E(t)
   r(t) - r(0) = (1/k) ln( -------------- )
                           1 + 0.0 * E(t)

               = ln(1 + 2.0 * (2^(t/(30d)) - 1)) / 1.3
which simplifies to
r(t) - r(0) = ln(2^(1 + t/(30d)) - 1) / 1.3
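The general formula can be evaluated numerically; here is a sketch (the defaults are taken from the example above):

```python
import math

def rating_gain(t_days, k=1.3, half_life=30.0, q0=0.5, q1=1.0):
    """r(t) - r(0) after a sudden jump from win ratio q0 to q1 (continuous model)."""
    qW = q1 / q0                            # quotient of win ratios
    qL = (1.0 - q1) / (1.0 - q0)            # quotient of loss ratios
    E = 2.0 ** (t_days / half_life) - 1.0   # aging factor
    return math.log((1.0 + qW * E) / (1.0 + qL * E)) / k
```

At t = 30 days (one half-life) with the example values this gives ln(3)/1.3, i.e. about 0.85 ranks, in agreement with the simplified closed form above.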
(We could fill in dates on these and a summary of the changes)
- pre 2.6.4
- 2.6.4
- 3.0
- python implementation of some parts of KGS rating algorithm