KGS Rating Math
For KGS 3.0, the rating system returned to the old formula (meaning the minimum win probability introduced in 2.6.8 is back to 0), while keeping the 2.6.8 constants.
- k varies depending on your rank:
- k=0.85 for 30k-5k
- k=???? for 4k-1d
- k=1.30 for 2d+
- The half life of games varies depending on your rank:
- 15 days for 30k-15k
- ?? days for 14k-1k
- 45 days for 1d+
There is some documentation about KGS ranks in the official help pages at http://www.gokgs.com/help/rank.html?helpLocale=en_US and http://www.gokgs.com/help/rmath.html?helpLocale=en_US.
Basically KGS assumes that the expected win rate of two players is:
P(A wins) = 1 / ( 1 + exp(k*(RankB-RankA)) )
where k varies from 0.85 to 1.3, depending on the ratings of the players, and RankB-RankA is adjusted by 1 for every handicap stone and by a small amount for each point of komi.
Using this formula, KGS continually recalculates the most likely rating for every player based on the games they have played. Old games decrease in weight exponentially, with a half-life ranging from 15 to 45 days (weak players have short half-lives so their old results don't affect them as much). Games older than 180 days are not considered.
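This weighting scheme can be sketched in a few lines (the function and parameter names are mine; the 180-day cutoff is the one mentioned above):

```python
def game_weight(age_days, half_life_days):
    """Weight of a game in the rating calculation, by its age in days."""
    if age_days > 180:
        return 0.0   # games older than 180 days are ignored entirely
    return 0.5 ** (age_days / half_life_days)
```

For example, with the 45-day half-life a game played 45 days ago counts half as much as one played today, and one played 90 days ago counts a quarter as much.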
rank diff   expected win rate           expected win rate
            (k=0.85, for 30k-5k)        (k=1.30, for 2d+)
0.0         50%                         50%
0.5         60%                         66%
1.0         70%                         79%
1.5         78%                         88%
2.0         85%                         93%
2.5         89%                         96%
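The formula and the table above are easy to reproduce; here is a minimal sketch (the function name is mine):

```python
import math

def win_probability(rank_a, rank_b, k):
    """P(A beats B) in an even game under the KGS formula."""
    return 1.0 / (1.0 + math.exp(k * (rank_b - rank_a)))
```

For instance, win_probability(3.0, 2.0, 1.3) gives roughly 0.79, matching the 1.0-rank row for 2d+.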
Handicapping ensures that you win around 50% of your games if your rank is accurate. However, there is enough room within each rank for some players to be noticeably stronger than others of the same rank, and the handicap system actually favors white by 0.5 stones. At the extreme end, suppose you are a 2.99 dan (on the verge of 3d) and play a 1.00 dan (a very weak 1d). KGS will suggest an H1 game (komi = 0.5), but this still leaves an effective rank difference of 2.99 - 1.00 - 0.5 = 1.49, and white will be expected to win 88% of the time (78% for 5k and below). This is an extreme example, but even if you play all your games against players of your own rank, you will typically need to prove that you are 0.5 stones stronger than them to promote. So for 2d+ you need to win 66% of the time, and 60% for 5k and below.
- KGS formula: P(A wins) = 1 / ( 1 + exp(k*(RankB-RankA)) )
- Elo formula: P(A wins) = 1 / ( 1 + 10^((RatingB-RatingA)/400) )
- Set RankB-RankA = 1 and solve for RatingB-RatingA:
- RatingB-RatingA = k/ln(10)*400
- For 2d+ this is 226 Elo per rank
- For 30k-5k this is 148 Elo per rank
- Note that the EGF uses a modified version of Elo that sets 100 rating points per rank and then varies parameters to make the win rates match expectations.
- The AlphaGo paper uses 230 Elo per rank; this looks like a rounding artifact from using a 79% win rate between ranks (which corresponds to k of roughly 1.325) instead of starting from k=1.3.
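The conversion above can be sketched as follows (function names are mine):

```python
import math

def elo_per_rank(k):
    """Elo points per KGS rank: set exp(k * 1) = 10**(D / 400) and solve for D."""
    return 400.0 * k / math.log(10)

def k_from_win_rate(p):
    """Recover k from the stronger player's win rate at a 1-rank gap."""
    return math.log(p / (1.0 - p))
```

A 79% win rate between adjacent ranks gives k of about 1.325, and elo_per_rank of that is about 230, which supports the rounding-artifact explanation.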
Here is some math showing how your rank on KGS would react to you being a 2.5 (average 2d) and going from a steady 50% win rate to some other win rate.
Some assumptions / simplifications implied:
- This is based on the information at http://www.gokgs.com/help/rmath.html?helpLocale=en_US, and a few things wms told me directly:
- The weight of a game on KGS decreases exponentially over time, with a half-life of 45 days. In these calculations I dropped all games older than 180 days (is this right?)
- KGS uses k=0.8, giving a 69% win rate in an even game between players 1 rank apart.
- These are all assumed to be even games against a 2.5 (an average 2d). It makes a big difference if you play a 2.0 (weak 2d) or a 2.9 (strong 2d).
- A game between an average 2d and average 3d at 0.5 komi does not yield a 50% win rate. It's actually 60% for the 3d since 0.5 komi is only a half stone handicap. Generally in any handicap game, white should win 60% of the time. Again this example just assumes all even games for easier math.
- Does not take into account the movements of other players, which have an impact on you (i.e. "rank drift").
- Does not take into account the "confidence" factor, which can cause games with players of uncertain ranks to have less weight.
- Assumes you play rated games at a constant rate. Note that with this assumption, it does not matter what that rate actually is, except that it is great enough to make KGS' "confidence" factor not have any impact.
Suppose you played for 6 months as a 2.5 (an average 2d), and then suddenly became 3.5 (an average 3d) in strength and therefore started winning 79% of your games. In 45 days, KGS will rate you as a 2.5 + 0.51 = 3.01 (weak 3d).
Days played at new strength   Increase in your rating
  0                           0.00
 15                           0.21
 30                           0.38
 45                           0.51
 60                           0.62
 75                           0.71
 90                           0.78
105                           0.84
120                           0.89
135                           0.93
150                           0.96
165                           0.98
180                           1.00
Assume you played 1 game a day as an average player of your rank for 180 days. Then you get inspired and play a whole bunch of even games (still with average players of your rank) in one day and win them all. How will those games affect your rating?
Games won in a row   Increase (2d and above)   Increase (15k and below)
 0                   0.00                      0.00
 1                   0.02                      0.10
 2                   0.05                      0.20
 3                   0.07                      0.29
 4                   0.09                      0.37
 5                   0.12                      0.44
 6                   0.14                      0.52
 7                   0.16
 8                   0.18
 9                   0.20
10                   0.22
11                   0.24
12                   0.26
13                   0.27
14                   0.29
15                   0.31
16                   0.32
17                   0.34
18                   0.36
19                   0.37
20                   0.39
21                   0.40
22                   0.42
23                   0.43
24                   0.45
25                   0.46
26                   0.47
27                   0.49
28                   0.50
So to gain the 0.5 ranks typically needed to promote, it takes 28 straight wins for 2d and above, but only 6 for 15k and below.
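The winning-streak table can be reproduced, to within rounding, with a simplified model (my own reconstruction, not the actual KGS code): one even game per day for 180 days against opponents of exactly your strength (expected result 0.5 each), weights halving every half-life, and the new strength chosen so that the weighted average win probability equals the weighted win ratio.

```python
import math

def streak_gain(wins, k, half_life):
    """Approximate rank gain after `wins` same-day wins over equal opponents.

    Model assumptions (not the exact KGS code): 180 prior days of one even
    game per day against opponents of exactly your strength, with expected
    result 0.5 each, weights halving every `half_life` days, and the
    opponents' linearized strength normalized to 1.
    """
    # total weight of the 180 days of old games (ages 1..180)
    w_old = sum(0.5 ** (age / half_life) for age in range(1, 181))
    # weighted win ratio after adding `wins` fresh games (weight 1, result 1)
    r = (0.5 * w_old + wins) / (w_old + wins)
    # solve r = s / (s + 1) for the linearized strength s, convert to ranks
    s = r / (1.0 - r)
    return math.log(s) / k
```

With k=1.3 and a 45-day half-life this lands within about 0.01 of the 2d+ column, and with k=0.85 and a 15-day half-life it matches the 15k column.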
Where this page speaks about "ratings", it doesn't refer to the standard ratings of players, but to effective ratings. (This also applies to strengths, which are defined in the next section.) So what does "effective" stand for?
Let's say, your opponent is a strong 3k (-3.1 AGA or 1840 EGF). If you give him/her an advantage of 0.5 stones (that is, s/he takes black) and get a komi of 5.5 moku (this default komi is debatable), his effective rating is also -3.1 AGA or 1840 EGF.
But if you use a different handicap and/or komi, your opponent's effective playing power, compared to yours, will be affected. His/her effective rating in that game needs to take account of that change, to allow proper calculation of outcome probabilities. Each additional handicap stone that you give to the opponent effectively makes him/her one rank stronger. If it's you who receives handicap, his/her effective rating decreases by one rank per stone. The same applies to komi, insofar as it differs from 5.5 moku to white (again, this number is debatable): your opponent's effective rating is increased/decreased by one rank per 11 moku of komi you give/take.
So, if you play white, giving 4 stones (which gives black an advantage of 3.5 stones over white) while still getting 5.5 moku, his/her effective rating in that game is (5.5 - 5.5)/11 + (3.5 - 0.5) = 3 grades better than his/her standard rating. The mentioned "strong 3k" player will effectively be a strong 1d (1.9 AGA or 2140 EGF) then to you.
If you play white and give your opponent 4 stones, but without komi this time, Black's effective rating is (5.5 - 0.0)/11 + (3.5 - 0.5) = 3.5 grades stronger than his/her standard rating. Therefore, s/he will effectively be close to an average 2d (2.4 AGA or 2190 EGF).
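The adjustment rule can be sketched as a small function (my own code; `black_advantage` is black's edge in stones as described above):

```python
def effective_rank_shift(black_advantage, komi, default_komi=5.5):
    """Ranks added to black's standard rating, per the rule above.

    `black_advantage` is black's edge in stones: 0.5 for simply taking black
    in an even game, or (stones - 0.5) in a handicap game (e.g. 3.5 for H4).
    The trailing -0.5 makes the baseline (black, default komi) come out to 0.
    """
    return (default_komi - komi) / 11.0 + (black_advantage - 0.5)
```

This reproduces both worked examples: H4 at 5.5 komi gives a shift of 3 ranks, and H4 without komi gives 3.5 ranks.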
Usually it's considered the best choice to set handicap and komi so that your opponent is effectively as strong as you are. However, it's possible to use different, even weirdly different, settings; e. g., you can give a stronger player handicap. The tables and formulas on this page still apply, as long as you remember to use the effective ratings (or effective strengths, respectively).
blubb: I am confident the rating system can be clarified enough to help people with some basic math knowledge to actually understand its main principles.
The most basic way is to present tables of values like you can see above, even if they can show some exemplary cases only. Another attempt is to simplify the math formulas as far as possible. If you want to go beyond http://www.gokgs.com/help/rank.html?helpLocale=en_US, but feel deterred by the somewhat cryptic explanation at http://www.gokgs.com/help/rmath.html?helpLocale=en_US, read on.
First, one can define a "(linearized) strength" s of players:
s := e^(k*r) ,
where r is the common (logarithmized) rating. E. g., a value of k = 0.8 gives
s = 2.22^r .
Using this, the winning probability of player A against player B can be expressed as
              s(A)
   P(A,B) = ----------- ,
            s(A) + s(B)
Roughly speaking, this equation says:
If A is twice as strong as B, A is expected to beat B twice as often as B beats A.
(Important note: This refers to pure strengths. In non-even games, effective strengths have to be used here, as calculated from effective ratings.)
When we look at the rating calculation procedure (a so-called maximum-likelihood estimation), it turns out that it can be transformed to a rather simple formula as well.
Let player A's record consist of N games with various opponents (B1,...,Bx,...,BN). (It doesn't matter whether the opponents are all different or whether, e. g., B4, B5 and B11 are the same person. They're just numbered in the order of the games.)
The players B1 to BN have the (linearized) strengths s(B1) to s(BN), respectively.
Let qx indicate the result of game no. x as follows (ties are not taken into account here):
qx := 1, if A won game no. x,
      0, if A lost game no. x.
Similarly, we write rx for the winning probability of player A in game no. x:
                     s(A)
   rx := P(A,Bx) = ------------ .
                   s(A) + s(Bx)
Now, what does it mean to calculate the strength of player A? Because rx depends on A's strength, we can call it a function of s(A), in other words, rx = rx (s(A)).
The correct strength value then is the one which ensures that the equation

   Avg(rx) = Avg(qx)

holds: the correct strength makes the average predicted winning probability equal to the actual overall winning ratio.
In fact, the above paragraph doesn't cover how ratings are really calculated. If, for instance, B8 is an opponent who has not been playing for a long time, it's rather uncertain what the result of game no. 8 actually tells about A's strength, because B8's strength itself is very uncertain.
That's why so-called "weights" wx are assigned to the games. They are intended to weigh each particular game result according to its "meaningfulness". A weight is a number between 0 (games which are not evaluable at all) and 1 (the most evaluable games).
So, including weights, the correct strength value is the one which ensures that

   Avg(wx * rx) = Avg(wx * qx)

holds. Taking wx, rx and qx as the components of vectors w, r and q, this can be written as the dot product

   0 = w * (r - q) .
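Solving this condition for a single player's strength is a one-dimensional root-finding problem; here is a minimal sketch (my own code, not KGS'). Since the weighted residual grows monotonically in s(A), a simple bisection works:

```python
def solve_strength(games, lo=1e-9, hi=1e9, iters=200):
    """Find s(A) such that sum(wx * (rx - qx)) = 0.

    `games` is a list of (opponent_strength, result, weight) triples,
    with result qx in {0, 1}. The residual is monotonically increasing
    in s, so we bisect (geometrically, since strengths live on a
    multiplicative scale).
    """
    def residual(s):
        return sum(w * (s / (s + sb) - q) for sb, q, w in games)
    for _ in range(iters):
        mid = (lo * hi) ** 0.5
        if residual(mid) < 0.0:
            lo = mid   # predicted win rate too low -> strength must be higher
        else:
            hi = mid
    return (lo * hi) ** 0.5
```

For example, two wins and one loss against opponents of strength 1 (all weights 1) yields a strength of 2, matching the "twice as strong wins twice as often" reading above.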
Currently, weight depends on
- the time since the game has been finished
- the opponent's current rank confidence.
By the way, whilst the common kyu/dan ranks get harder to increase the better you are, the strength behaves a little more according to the effort you spend on improving. However, there's a great drawback to strengths: it's irksomely difficult to tell the appropriate number of handicap stones between two players of given linearized strengths.
In addition to the assumptions above, assume
- a constant playing rate
- a certain, constant mix of effective strengths of opponents.
Before the leap,
- your performance was constant over a long time
- your (weighted) win ratio was stable at q0
- your strength was in its equilibrium at s(0).
After the leap,
- your (weighted) win ratio switches to q1
- your strength starts moving like s(t).
Please note that the following formulas are continuous approximations, which fit better the higher the (constant) playing rate is.
If the mix of opponent strengths remains the same as before, and you manage to keep the new win ratio, your recognized strength gradually changes according to
   s(t)     1 + qW * E(t)
   ----  =  -------------
   s(0)     1 + qL * E(t)

where

   t    = time passed since your sudden improvement
   qW   = q1 / q0               (quotient of win ratios after and before the leap)
   qL   = (1-q1) / (1-q0)       (quotient of loss ratios after and before the leap)
   E(t) = 2^(t/t_half) - 1      (aging factor)
Calculating your rank development with k = 1.3, a half life of 30 days, an old win ratio of q0 = 0.5 and a new win ratio of q1 = 1.0, we get
   qW = 1.0 / 0.5 = 2.0
   qL = 0.0 / 0.5 = 0.0

                           1 + 2.0 * E(t)
   r(t) - r(0) = (1/k) ln( -------------- )
                           1 + 0.0 * E(t)

               = ln(1 + 2.0 * (2^(t/(30d)) - 1)) / 1.3
which simplifies to
r(t) - r(0) = ln(2^(1 + t/(30d)) - 1) / 1.3
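The general formula can be evaluated numerically; here is a sketch (the defaults are taken from the example above):

```python
import math

def rating_gain(t_days, k=1.3, half_life=30.0, q0=0.5, q1=1.0):
    """r(t) - r(0) after a sudden jump from win ratio q0 to q1 (continuous model)."""
    qW = q1 / q0                            # quotient of win ratios
    qL = (1.0 - q1) / (1.0 - q0)            # quotient of loss ratios
    E = 2.0 ** (t_days / half_life) - 1.0   # aging factor
    return math.log((1.0 + qW * E) / (1.0 + qL * E)) / k
```

At t = 30 days (one half-life) with the example values this gives ln(3)/1.3, i.e. about 0.85 ranks, in agreement with the simplified closed form above.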
(We could fill in dates on these and a summary of the changes)
- pre 2.6.4
- 2.6.4
- 3.0
- python implementation of some parts of KGS rating algorithm