Scoring Estimator Considered Harmful

    Keywords: Theory, Software

I (crux) have been surprised to learn that some people actually take the KGS score estimator seriously and use it as a tool for evaluating positions and studying. All I can say is: Don't. It is a very crude algorithm with no understanding of the game. You might as well ask a magic 8-ball, or a random number generator.

People claim that it lets you quickly get an idea of the influence in a position (whatever that may mean). There are, however, numerous problems with the algorithm, among them:

  • it considers all stones equal, disregarding strength or weakness;
  • it seems to grant a single stone a 52-point area of influence (a rough "circle" of radius 4) and counts all of it as territory, a large overestimate of the value of a single stone;
  • counting all "influence" as points disregards the viability of invasions in the affected areas.
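To make these objections concrete, here is a minimal sketch of the kind of stone-blind influence counter described above. It is not the KGS algorithm, whose details this page does not give: the function name naive_estimate, the Manhattan-distance "circle", the default radius of 4 and the majority rule are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]


def naive_estimate(stones: Dict[Point, str], size: int = 19, radius: int = 4) -> int:
    """Return Black's margin under a toy influence rule (hypothetical, not KGS's).

    Every stone, strong or weak, "influences" all points within `radius`
    (Manhattan distance). A point counts as territory for whichever colour
    has more stones influencing it; ties count for nobody.
    """
    margin = 0
    for x in range(size):
        for y in range(size):
            black = white = 0
            for (sx, sy), colour in stones.items():
                if abs(sx - x) + abs(sy - y) <= radius:
                    if colour == "B":
                        black += 1
                    else:
                        white += 1
            if black > white:
                margin += 1   # counted as Black territory, invasions ignored
            elif white > black:
                margin -= 1   # counted as White territory, invasions ignored
    return margin


if __name__ == "__main__":
    # A lone stone in the open is credited with its whole neighbourhood.
    print(naive_estimate({(9, 9): "B"}))
```

Under these assumptions a lone stone in the open is credited with 41 points, so the real estimator evidently uses a somewhat different neighbourhood. The flaw is the same, though: nothing in the loop asks whether a stone is alive, connected, or open to invasion.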

This leads to grotesque results. Two examples demonstrate the basic problem.

[Diagram]

Black to move

As Black, would you prefer a move at a, or at b? Obviously, the answer is b, since you wouldn't want to approach strong stones.[1] The scoring estimator disagrees: B+43.5 for a move at a, and B+30.5 for a move at b. If you tried to interpret the output, you'd have to conclude that the program considers b to be overconcentrated, while it believes a erases the influence of White's shimari. This is nonsense, of course, and goes to show that it's a mistake to assume there is any meaning in the output.[2]

[Diagram]

Black to move - how much is a worth?

This position may be contrived, but consider a move at a. How many points is it worth?

Hopefully everyone will agree that it is worth more than the one point the scoring estimator is willing to grant it (W+11.5 before, W+10.5 after). The algorithm is too stupid to tell that the corner is strengthened by a move at a (or even to realize that it needs strengthening in the first place).

In case it isn't obvious yet, by assuming that the scoring estimator's output has any meaning, you are very likely to hurt your game. Please don't do it, and please don't encourage beginners to do it.


Hu: I think the first writer of this page doth protest too much. This began as a minor controversy on the KGS Wishlist page. The writer has completely misinterpreted the way I use it (because I gave no full and detailed explanation, which was not possible in the small space on that page). I was going to respond in full here, but now I can't be bothered when the writer has ignored my statement that "I am aware of the score estimator's limitations, I can see and account for those limitations when I use it, and I do my own counting in games". The writer has erected a straw man and demolished it. The examples provided above are nothing like what I use it for. I use it as a labor-saving device, primarily to compare fully resolved positions in the endgame, where the faults described above are simply not an issue. Yes, I know how to count myself. Similarly I could walk across the city if I had the time, but prefer to bicycle, knowing full well that there will be some aspects of the scenery I will miss that way. Sorry.

dnerra: Hu, I certainly did not assume that you would use the score estimator to e.g. measure the value of b in the first diagram. Nevertheless, I still think you underestimate its problems a little: I have several times seen it calculate a wrong score at the very end of the game, despite guessing all group status wrong [typing error?, do you mean right?]. I don't know where this comes from. Either prisoners are not counted correctly, or maybe some of the marked intersections are only counted as likely and not certain territory -- I don't know.

I agree it would be nice to be able to use it as an automatic counting mechanism. Btw, GNU Go's score estimation should be better, but, apart from maybe Sente Goban on Mac, it hasn't been integrated in a GUI yet as far as I know.

Hu: I was referring to crux, the first writer, not you, dnerra. Yes, there are imperfections, even at the end of the game, but for the third time, readers please note, I use it for comparisons, and almost always the imperfections are subtracted out in the comparisons. If the estimate of the resolution of branch A is B+17.5 (including 2 points of error) and the resolution of branch B is B+23.5 (including 2 points of error), then the difference cancels out the error. Further, because I am using it as a labor-saving device during reviews, where speed is useful, I don't mind a few imperfections. Despite what some might feel, I am not stupid enough to use it for counting a difference for something truly important like data for an SL page.

crux: I would have thought that taking the difference doubles the error margin but IANAM. I'm pretty sure they don't cancel out though.
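Both views can be written down. In the sketch below the symbols are introduced here for illustration and are not from the discussion: the estimator reports \hat{s}_A and \hat{s}_B for the two branches, the true scores are s_A and s_B, and the errors are e_A and e_B. The error cancels only if it is the same in both branches; if all that is known is a bound on each error, the worst case for the difference is twice that bound.

```latex
% Assumed notation: \hat{s} = estimator output, s = true score, e = error.
\begin{align*}
\hat{s}_A &= s_A + e_A, \qquad \hat{s}_B = s_B + e_B \\
\hat{s}_B - \hat{s}_A &= (s_B - s_A) + (e_B - e_A) \\
e_A = e_B &\;\Rightarrow\; \hat{s}_B - \hat{s}_A = s_B - s_A
  && \text{(a shared error cancels)} \\
|e_A|,\,|e_B| \le \varepsilon &\;\Rightarrow\; |e_B - e_A| \le 2\varepsilon
  && \text{(the worst case doubles)} \\
\hat{s}_B - \hat{s}_A = 6,\ \varepsilon = 2 &\;\Rightarrow\; 2 \le s_B - s_A \le 10
  && \text{(Hu's figures, 23.5 and 17.5)}
\end{align*}
```

If the two errors are instead independent random quantities with the same standard deviation σ, the difference has standard deviation σ√2, which is neither full cancellation nor exact doubling.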

(I'd prefer to continue this discussion by email, since what follows doesn't need to be public, but I haven't found your email address. This page is directed not just at you, but also at beginners who use the program unthinkingly, believing that they get sensible answers.)

Anyhow, I wouldn't have made such an issue if I hadn't seen you use it in early middlegame positions in games I was watching on KGS. Specifically, Hu-gostones, where you used it at moves 13 and 23 in a way not unlike these diagrams. I pointed out at the time that the estimator doesn't understand the position, and you replied that it has a reasonable understanding of influence. I was hoping that by showing these examples I could help you see that it doesn't. In these early positions, the error margin is more likely to be on the order of 30-50 points.

Hu: Well then, crux, I apologize. I hadn't realized I had used it that recently in that way. I guess I must have internalized your lesson. Thank you for pointing it out at the time and for making this page. I apologize for taking exception to it, as your intentions were honorable.

PatrickB: In fact, there's a bigger point here that needs mentioning. As Kageyama says, improvement takes effort piled on effort piled on effort. Every time someone stronger than score est (say, 10k or better) opens the score randomizer, they're missing an opportunity to practice and improve their skills at estimation and counting. I hear lots of 5k players say "I can't count." or "I'm not good at counting." Well, there's one way to get better. Practice. Don't pass up chances to practice - watching other people's games is a *prime* opportunity to practice counting.

Hu, your original post and your statement above said you use score est to compare the relative value of positions. I think crux showed quite convincingly in his examples how score est is really wrong in doing this even in simple positions in which a 10k can determine the relative value of moves. Calling crux's argument a strawman in that case is selling it short.

Score est is useful for players 10-15k and weaker, as that's about how strong score est is. Using it after that point, however, runs the dire risk of misestimating the cost of a weak group, the value of influence, and the value of endgame moves, as well as missing an opportunity to practice counting.


[1]

Bill: These stones are not strong enough for that proverb. They are quite approachable.


[2]

Bill: You can compensate to a degree for a stupid influence calculator with a Difference Game approach. To compare two plays, let one player make one play and the other player make the other, and vice versa, and compare results. In this case, for instance, we get this pair of boards.

[Diagram]

Play comparison, Ba - Wb



[Diagram]

Play comparison, Bb - Wa



Which does Black prefer?
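The comparison Bill describes can be sketched in code. This is one reading of his description, not his method verbatim: the names difference_comparison and toy_estimate are hypothetical, and the stand-in estimator (nearest stone by Manhattan distance owns the point) is an assumption so that the example runs. Any estimator that returns Black's margin could be passed in instead.

```python
from typing import Callable, Dict, Tuple

Point = Tuple[int, int]
Board = Dict[Point, str]               # occupied point -> "B" or "W"
Estimator = Callable[[Board], float]   # returns Black's estimated margin


def difference_comparison(board: Board, a: Point, b: Point,
                          estimate: Estimator) -> Tuple[float, float]:
    """Estimate both boards of the pair: (Black a, White b) and (Black b, White a)."""
    ba_wb = dict(board)
    ba_wb[a], ba_wb[b] = "B", "W"
    bb_wa = dict(board)
    bb_wa[b], bb_wa[a] = "B", "W"
    return estimate(ba_wb), estimate(bb_wa)


def toy_estimate(board: Board, size: int = 9) -> int:
    """Stand-in estimator (an assumption): each point goes to the nearest stone's colour."""
    def nearest(colour: str, x: int, y: int) -> float:
        return min((abs(sx - x) + abs(sy - y)
                    for (sx, sy), c in board.items() if c == colour),
                   default=float("inf"))

    margin = 0
    for x in range(size):
        for y in range(size):
            db, dw = nearest("B", x, y), nearest("W", x, y)
            if db < dw:
                margin += 1
            elif dw < db:
                margin -= 1
    return margin


if __name__ == "__main__":
    position: Board = {(2, 2): "W", (6, 6): "B"}
    first, second = difference_comparison(position, (2, 6), (6, 2), toy_estimate)
    # Black prefers a if the first figure is larger, b if the second is.
    print(first, second)
```

Because both boards hold the same number of stones of each colour and go through the same estimator, any error common to both drops out of the comparison. Errors specific to one board do not.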


This is a copy of the living page "Scoring Estimator Considered Harmful" at Sensei's Library.
(OC) 2014 the Authors, published under the OpenContent License V1.0.