KGS Game Result Weighting
Keywords: Tesuji
(Moved from KGS Wishlist / Game Handling)
rubilia: Weight ratings according to the average time per move rather than to the total playing time of a game, and use a continuous function to do so.
wms: This simply makes no statistical sense, rubilia. Either games are slow enough to predict the strength of a player in "normal" speed games, or they aren't. If they are, they should be counted. If they aren't, they shouldn't. I just don't see why I would add inaccurate information to the rating system at any weight. Making it weighted less doesn't make it any more accurate, just possibly less damaging - but leaving it out completely will be even less damaging, so that's what I intend to do.
rubilia: I agree for the case that your sample is infinite. Then a game either contributes useful data to the evaluation (positive correlation) and can be included with full weight, or it doesn't contribute useful data and should not get any weight at all (zero or negative correlation). However, ratings are calculated from a finite set of results. I don't think this is the right place to go into the details of the theory, and I am not very familiar with English probability and statistics vocabulary either. I'll give a (somewhat artificial) example, so the consequences should be easy to see. (Sorry if that's "too many numbers". I want to keep the conversation as low-level as possible so that non-math people can follow.)
- My point is not about absolute ranks, but about the spread of rank differences. Imagine there are exactly two distinct types of games with different certainty of outcomes, that is, with different variances (represented by P-functions with different k values, in terms of the KGS Math help page). Neither the age of games nor the opponents' rank confidence shall influence the weight. Type B shall have half the k value of type C. The variance of the rank difference is integral(( (1/k) * ln(p / (1-p)) - 0)^2) dp, where p = 0 ... 1. This is proportional to 1/k^2, hence type B has a variance 4 times bigger than type C (which I call varC).
- Now let's say, a particular user's record consists of 10 games of type B and type C each. If you don't weight, you won't want to take account of type B games - they would eliminate a considerable part of accuracy. The resulting rank diff variance of all B- and C-games together would be ((10*1 + 10*4)/(10+10)^2) * varC = 1/8 * varC, while you can get a narrower variance of (10*1/10^2) * varC = 1/10 * varC evaluating the ten C-games only.
- But taking the average outcome of four B-games for a single C-game's result, they can be included very well: just assign B-type games a weight of 1/4, and C-games a weight of 1, so you can treat the whole set as 12.5 games of consistently varC each. The resulting rank diff variance is even better: ((10*(1*1^2) + 10*(4*(1/4)^2))/(10*1 + 10*1/4)^2) * varC = 1/12.5 * varC. That way, the B games contribute valuable information, resulting in more accuracy than the C-games can provide alone.
- (I have chosen "B" and "C" to indicate that there could be "A-games", not deserving any weight > 0.)
- For optimal weighting, the product (weighting factor * variance) needs to be constant for all game types. Of course, that also applies if there are no distinct types but rather a continuous range of parameters. The weighting function suggested below is an attempt to roughly approximate the bigger influence of "luck" in speedy games. (By the way, games faster than m _are_ left out completely.)
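The arithmetic in the B/C example above can be checked with a short sketch. It assumes the game results are independent and uses the standard formula for the variance of a weighted mean, sum(w^2 * var) / (sum(w))^2. The function name `weighted_mean_variance` and the tuple layout are illustrative only and are not part of KGS.

```python
from fractions import Fraction


def weighted_mean_variance(groups):
    """Variance of a weighted mean of independent results.

    groups: iterable of (count, variance, weight) tuples, where variance
    is expressed in units of varC (hypothetical layout for this sketch).
    """
    numerator = sum(n * var * w**2 for n, var, w in groups)
    denominator = sum(n * w for n, var, w in groups) ** 2
    return numerator / denominator


VAR_C = Fraction(1)  # variance of a type C game, in units of varC
VAR_B = Fraction(4)  # type B has half the k value, so 4 times the variance

# 10 B-games and 10 C-games, all with weight 1
unweighted = weighted_mean_variance(
    [(10, VAR_C, Fraction(1)), (10, VAR_B, Fraction(1))]
)
# the 10 C-games only
c_only = weighted_mean_variance([(10, VAR_C, Fraction(1))])
# B-games weighted 1/4, C-games weighted 1
weighted = weighted_mean_variance(
    [(10, VAR_C, Fraction(1)), (10, VAR_B, Fraction(1, 4))]
)

print(unweighted, c_only, weighted)  # prints: 1/8 1/10 2/25
```

The last value, 2/25, equals 1/12.5, matching the figure in the example: with weights proportional to 1/variance, the 20 games are worth 12.5 games of variance varC each.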
rubilia: The ability to recognize that the opponent has moved and where, then to move the mouse to a point and click the button there and, not least, to have a fast PC and internet connection, is of no importance to what I'd call an appropriate rank, and hardly correlates with players' strength. Hence a weighting function that depends on how big a part of the players' total activity is taken up by those matters is advisable. I assume something like w := max(0, (T-N*m)/T) would work fine, where m stands for the time that is practically needed to perform a move without spending any time on thinking. N is the total number of moves played in the game and T is the total playing time, making (T-N*m) the time "left for thinking". Maybe 2 seconds per move (s/move) would be a good starting point for m. Using this m value, a game lasting 80 minutes and consisting of 240 moves (i.e. a 20 s/move game) would get a weight of 0.9 times the maximum that could be achieved. A 5 s/move game would get 0.6, and a game of 2 s/move or less would get zero weight. The very minimum of m could also be estimated by experiment: let a representative set of people try to place as many stones per minute as they can, then m is approximately the average time per move.
- Rakshasa: It seems more logical (and much simpler to implement) to require ranked games to be blitz or slower. The code should already be there for detecting ultra-blitz. Then no one can complain about ranks being unfair because of weights. (This is done in 2.5.8.)
- rubilia: Weighting games at 100% or 0% only is simpler, that's true. About the "more logical" issue, though, I cannot agree. In fact, at KGS each game is already weighted continuously (with the weight being anything between 0% and 100%), depending on the opponent's rank confidence (as well as on how old the game is). There are games which get 0.10 (10%) of the maximum possible weight, while other games get 0.93 (93%). I have never heard complaints about this weighting, but I have heard complaints about ranks being unfair because the weighting function doesn't depend on game speed.
- Rakshasa: Those weights are related to how long ago the game was played and the weight of the players. It's not that weight I'm talking about here; each game has a constant base weight. (It's worth a single win or loss.) If time changes the base weight, you suddenly end up with more games with way longer time than is needed. Is it fair that a game with an arbitrarily long game time gets weighted a lot more? (They might not even spend half the time.) What I think is fair about the ultra-blitz cutoff is that those games are mostly used by either time cheaters or those who want to have fun playing blitz.
- rubilia: If you think carefully about the w function given above, you will recognize that it does respect these two components, but in a gradual instead of an "all or nothing" way. The "not-possible-to-think-in" time stands for the part of the activity which shouldn't contribute to the rating. What KGS calls "ultra-blitz", e.g. a 3 s/move game, mainly consists of that kind of time, hence it would get a very low weight or none at all. On the other hand, the w-difference between, say, 10 s/move (80%) and 20 s/move (90%) is rather small (10%), but there is one, because the 10 s/move game is slightly more influenced by factors that are not worth rating. Compared to the huge weighting differences which can occur because of different opponents' rank confidences (which KGS users usually don't know anything about), the w-differences between reasonably timed games are hardly noticeable. I would not expect serious players to play slower without thinking deeper just in order to make the result count slightly more in the rank calculation.
- Rakshasa: What you seem to miss is that if a player only plays games with the same time then all of those games will have the same weight. If ultra-blitz only counts for 10% of normal games, then he'll just have to play a few more games before he gets a solid rank.
- rubilia: That's exactly what is intended by weights. The influence of non-go-related stuff is higher in faster games, and so is the variability of outcomes. Hence more than one fast game is needed to give the combined result the same reliability that a single slow game's result provides. -- And again, m could be adjusted to any value, e.g. 10 s, giving even a 10 s/move game no weight at all. I just want to point out that, wherever the threshold is set, (the result of) a game which is only a few seconds slower shouldn't get the same weight as a long one.
- Reuven: It'd create another problem - What if one finds long games more tiring than blitz? (rubilia: Do very slow games have a bigger variance than medium speed ones? Can't believe that.) Do you think it'd be appropriate to introduce another variable? Checking the time against the average time, or perhaps against the time settings one performed best at? (Provided it's not blitz?) Should this also be affected by the time of day? (Would I be pushing it, suggesting to connect electrodes to try to determine one's state of mind, mood? ;)
- rubilia: I suppose you're right, there are more influences on k (and on the "rank diff vs. result frequency" distribution in general) than opponents' rank confidence and playing speed. I haven't seen any of them ever implemented, though. :)
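For reference, rubilia's proposed weighting function w := max(0, (T-N*m)/T) from the discussion above can be written as a minimal sketch. The function name `game_weight`, the parameter names, and the handling of a non-positive total time are assumptions made for illustration; this is not how KGS actually weights games.

```python
def game_weight(total_seconds, num_moves, m=2.0):
    """Speed-based weight w = max(0, (T - N*m) / T).

    total_seconds: total playing time T of the game, in seconds
    num_moves:     total number of moves N played in the game
    m:             assumed time per move needed without any thinking
                   (2 s/move is the starting point suggested above)
    """
    if total_seconds <= 0:
        # Assumption for this sketch: a game with no recorded time gets no weight.
        return 0.0
    thinking_time = total_seconds - num_moves * m
    return max(0.0, thinking_time / total_seconds)


# The cases mentioned in the discussion, each with 240 moves:
for seconds_per_move in (20, 10, 5, 2):
    total = seconds_per_move * 240
    print(seconds_per_move, "s/move ->", round(game_weight(total, 240), 2))
```

With m = 2 this gives weights of about 0.9, 0.8, 0.6 and 0 for 20, 10, 5 and 2 s/move respectively. Setting m = 10, as mentioned in the thread, gives a 10 s/move game zero weight.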
There are a lot of interesting ideas here, but I think applying Ockham's razor is important. This is something both scientists and programmers should always keep in mind. In my experience, the current system is good at finding good matches.
This is a copy of the living page "KGS Game Result Weighting" at Sensei's Library.
2004 the Authors, published under the OpenContent License V1.0.