KGS Game Result Weighting

    Keywords: Tesuji

(Moved from KGS Wishlist / Game Handling)

rubilia: Weight ratings according to the average time per move rather than to the total playing time of a game, and use a continuous function to do so.

wms: This simply makes no statistical sense, rubilia. Either games are slow enough to predict the strength of a player in "normal" speed games, or they aren't. If they are, they should be counted. If they aren't, they shouldn't. I just don't see why I would add inaccurate information to the rating system at any weight. Making it weighted less doesn't make it any more accurate, just possibly less damaging - but leaving it out completely will be even less damaging, so that's what I intend to do.

rubilia: I agree in the case where your sample is infinite. Then an element (that is, a game) either contributes useful data to the evaluation (positive correlation) and can be included with full weight, or it contributes no useful data (zero or negative correlation) and should not get any weight at all. However, ratings are calculated from a finite set of results. I don't think this is the right place to go into theoretical details, and I am not quite familiar with English prob&stat vocabulary either. I'll give a (somewhat artificial) example, so the consequences should be easy to see. (Sorry if that's "too much numbers". I want to keep the conversation as low-level as possible in order to allow non-math people to follow.)

My point is not about absolute ranks, but about the spread of rank differences. Imagine there are exactly two distinct types of games with different certainty of outcomes, that is, with different variances (represented by P-functions with different k values, in terms of the KGS Math help page). Assume that neither the age of the games nor the opponents' rank confidence influences the weight, and that type B games have half the k value of type C games. The variance of the rank difference is the integral of ((1/k) * ln(p / (1-p)))^2 dp over p = 0 ... 1 (the mean being 0); this scales as 1/k^2, so type B has a variance 4 times bigger than type C (whose variance I call varC).
Now let's say a particular user's record consists of 10 games each of type B and type C. If you don't weight, you won't want to take the type B games into account at all - they would eliminate a considerable part of the accuracy. The resulting rank difference variance of all B- and C-games together would be ((10*1 + 10*4)/(10+10)^2) * varC = 1/8 * varC, while you can get a narrower variance of (10*1/10^2) * varC = 1/10 * varC by evaluating the ten C-games only.
But by treating the average outcome of four B-games like a single C-game's result, they can be included very well: just assign B-type games a weight of 1/4 and C-games a weight of 1, so you can treat the whole set as 12.5 games of variance varC each. The resulting rank difference variance is even better: ((10*(1*1^2) + 10*(4*(1/4)^2))/(10*1 + 10*1/4)^2) * varC = 1/12.5 * varC. That way, the B-games contribute valuable information, resulting in more accuracy than the C-games can provide alone. (The arithmetic is worked through in the sketch after this post.)
(I have chosen "B" and "C" to indicate that there could also be "A-games", deserving no weight > 0.)
For optimal weighting, the product (weighting factor * variance) needs to be constant across all game types. Of course, that also applies if there are no distinct types but rather a continuous range of parameters. The weighting function suggested below is an attempt to roughly approximate the bigger influence of "luck" in speedy games. (By the way, games faster than m _are_ left out completely.)
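
Below is a minimal Python sketch of the arithmetic in rubilia's example, under the standard assumption that the variance of a weighted mean is sum(n_i * w_i^2 * var_i) / (sum(n_i * w_i))^2 for n_i games of type i; the counts, variances, and weights are the ones given above, with everything expressed in units of varC. The function name is made up for illustration.

    # Variance of a weighted mean over game types, in units of varC.
    # counts[i] games of type i, each with variance variances[i] and weight weights[i].
    def weighted_mean_variance(counts, variances, weights):
        num = sum(n * w * w * v for n, v, w in zip(counts, variances, weights))
        den = sum(n * w for n, w in zip(counts, weights)) ** 2
        return num / den

    counts, variances = [10, 10], [4.0, 1.0]  # ten B-games (varB = 4*varC), ten C-games

    print(weighted_mean_variance(counts, variances, [1.0, 1.0]))   # unweighted: 0.125 = 1/8
    print(weighted_mean_variance([0, 10], variances, [1.0, 1.0]))  # C-games only: 0.1 = 1/10
    print(weighted_mean_variance(counts, variances, [0.25, 1.0]))  # weighted: 0.08 = 1/12.5

The weights 1/4 and 1 are inverse-variance weights, i.e. exactly the choice that keeps (weighting factor * variance) constant, which is why they give the smallest resulting variance of the three.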

rubilia: The ability to recognize that, and where, the opponent has moved, to shift the mouse to a point and click there, and, not least, to have a fast PC and internet connection, is of no importance to what I'd call an appropriate rank, and it hardly correlates with a player's strength. Hence a weighting function that depends on how large a share of the players' total activity is presumably spent on those matters is advisable. I assume something like w := max(0, (T-N*m)/T) would work fine, where m stands for the time practically needed to perform a move without spending any time on thinking, N is the total number of moves played in the game, and T is the total playing time, making (T-N*m) the time "left for thinking". Maybe 2 seconds per move (s/move) would be a good starting point for m. With this value of m, a game lasting 80 minutes and consisting of 240 moves (that is, a 20 s/move game) would be weighted 0.9 times the achievable maximum, a 5 s/move game would get 0.6, and a game of 2 s/move or less would get zero weight. The practical minimum of m could also be estimated by experiment: let a representative set of people place as many stones per minute as they can; m is then approximately their average time per move.
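
A sketch of that weighting function in Python, transcribed directly from the formula above; the parameter names T, N, and m follow rubilia's definitions, with the suggested m = 2 s/move as the default:

    # w = max(0, (T - N*m) / T): the share of the total playing time T that is
    # left for thinking, after subtracting the mechanical time m needed just to
    # place each of the N moves. All times in seconds.
    def game_weight(T, N, m=2.0):
        return max(0.0, (T - N * m) / T)

    print(game_weight(T=80 * 60, N=240))  # 20 s/move -> 0.9
    print(game_weight(T=5 * 240, N=240))  #  5 s/move -> 0.6
    print(game_weight(T=2 * 240, N=240))  #  2 s/move -> 0.0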

There are a lot of interesting ideas here, but I think applying Ockham's razor is important. That is something both scientists and programmers should always keep in mind. In my experience, the current system is good at finding good matches.


This is a copy of the living page "KGS Game Result Weighting" at Sensei's Library.
(OC) 2004 the Authors, published under the OpenContent License V1.0.