Scoring Estimator Considered Harmful
    Keywords: Theory, Software

I (crux) have been surprised to learn that some people actually take the KGS scoring estimator seriously and use it as a tool for evaluating positions and studying. All I can say is: Don't. It is a very crude algorithm with no understanding of the game. You might as well ask a magic 8-ball, or a random number generator.

People claim that it lets you quickly get an idea of the influence in a position (whatever that may mean). There are, however, numerous problems with the algorithm, among them:

  • it considers all stones equal, disregarding strength or weakness
  • it appears to grant a single stone a 52-point area of influence (a rough "circle" with radius 4) and counts all of it as territory, which looks like a rather large overestimate of the value of a single stone
  • counting all "influence" as points disregards the viability of invasions in the affected areas

This leads to grotesque results. Two examples demonstrate the basic problem.

[Diagram]
Black to move

As Black, would you prefer a move at a, or at b? Obviously, the answer is b, since you wouldn't want to approach strong stones. The scoring estimator disagrees: B+43.5 for a move at a, and B+30.5 for a move at b. If you were to try to interpret the output, you'd have to conclude that the program considers b to be overconcentrated, while it believes a erases the influence of White's shimari. This is nonsense, of course, and goes to show that it's a mistake to assume there is any meaning in the output.


[Diagram]
Black to move - how much is a worth?

This position may be contrived, but consider a move at a. How many points is it worth?

Hopefully everyone will agree that it is worth more than the one point the scoring estimator is willing to grant it (W+11.5 before, W+10.5 after). The algorithm is too stupid to tell that the corner is strengthened by a move at a (or even to realize that it needs strengthening in the first place).



In case it isn't obvious yet, by assuming that the scoring estimator's output has any meaning, you are very likely to hurt your game. Please don't do it, and please don't encourage beginners to do it.


HolIgor: I thought that the scoring estimators are disabled when you are playing. Anyway, I agree that the algorithm is too crude and unreliable.


Hu: I think the first writer of this page doth protest too much. This began as a minor controversy on the KGS Wishlist page. The writer has completely misinterpreted the way I use it (due to my lack of a full and detailed explanation, which was not possible in the small space on that page). I was going to respond in full here, but now I can't be bothered, since the writer has ignored my statement that "I am aware of the score estimator's limitations, I can see and account for those limitations when I use it, and I do my own counting in games". The writer has erected a straw man and demolished it. The examples provided above are nothing like what I use it for. I use it as a labor-saving device, primarily to compare fully resolved positions in the endgame, where the faults described above are simply not an issue. Yes, I know how to count myself. Similarly, I could walk across the city if I had the time, but I prefer to bicycle, knowing full well that there will be some aspects of the scenery I will miss that way. Sorry.

dnerra: Hu, I certainly did not assume that you would use the score estimator to e.g. measure the value of b in the first diagram. Nevertheless, I still think you underestimate its problems a little: I have several times seen it calculate a wrong score at the very end of the game, despite guessing all group status wrong [typing error?, do you mean right?]. I don't know where this comes from. Either prisoners are not counted correctly, or maybe some of the marked intersections are only counted as likely and not certain territory -- I don't know.

I agree it would be nice to be able to use it as an automatic counting mechanism. Btw, GNU Go's score estimation should be better, but, apart from maybe Sente Goban on Mac, it hasn't been integrated in a GUI yet as far as I know.

Hu: I was referring to crux, the first writer, not you, dnerra. Yes, there are imperfections, even at the end of the game, but for the third time, readers please note: I use it for comparisons, and almost always the imperfections are subtracted out in the comparisons. If the estimate of the resolution of branch A is B+17.5 (including 2 points of error) and that of branch B is B+23.5 (including 2 points of error), then the difference cancels out the error. Further, because I am using it as a labor-saving device during reviews, where speed is useful, I don't mind a few imperfections. Despite what some might feel, I am not stupid enough to use it for counting a difference for something truly important like data for an SL page.

crux: I would have thought that taking the difference doubles the error margin but IANAM. I'm pretty sure they don't cancel out though.
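
Whether the errors cancel or add up depends on whether the two estimates share the same error. The following is a minimal sketch using the 2-point figure from Hu's example as an assumed worst-case bound; the names `ERROR_BOUND` and `diff_error_range` are made up for illustration and say nothing about how the KGS estimator actually errs.

```python
from itertools import product

ERROR_BOUND = 2.0  # assumed worst-case error of a single estimate, in points


def diff_error_range(error_pairs):
    """Smallest and largest error of (estimate_B - estimate_A)
    over the given (error_A, error_B) combinations."""
    diffs = [err_b - err_a for err_a, err_b in error_pairs]
    return min(diffs), max(diffs)


extremes = (-ERROR_BOUND, ERROR_BOUND)

# Case 1: both branches are off by the same amount (a shared, systematic error).
shared = diff_error_range([(e, e) for e in extremes])

# Case 2: each branch can be off independently, in either direction.
independent = diff_error_range(product(extremes, repeat=2))

print(shared)       # (0.0, 0.0)  -> the error cancels
print(independent)  # (-4.0, 4.0) -> the worst case doubles
```

So a shared error cancels in the difference, as Hu says, while independent errors can double in the worst case, as crux suspects. For independent random errors with equal standard deviation σ, the standard deviation of the difference is σ√2.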

(I'd prefer to continue this discussion by email, since what follows doesn't need to be public, but I haven't found your email address. This page is directed not just at you, but also at beginners who use the program unthinkingly, believing that they get sensible answers.)

Anyhow, I wouldn't have made such an issue if I hadn't seen you use it in early middlegame positions in games I was watching on KGS. Specifically, Hu-gostones, where you used it at moves 13 and 23 in a way not unlike these diagrams. I pointed out at the time that the estimator doesn't understand the position, and you replied that it has a reasonable understanding of influence. I was hoping that by showing these examples I could help you see that it doesn't. In these early positions, the error margin is more likely to be on the order of 30-50 points.

Hu: Well then, crux, I apologize. I hadn't realized I had used it that recently in that way. I guess I must have internalized your lesson. Thank you for pointing it out at the time and for making this page. I apologize for taking exception to it, as your intentions were honorable.

PatrickB: In fact, there's a bigger point here that needs mentioning. As Kageyama says, improvement takes effort piled on effort piled on effort. Every time someone stronger than score est (say, 10k or better) opens the score randomizer, they're missing an opportunity to practice and improve their skills at estimation and counting. I hear lots of 5k players say "I can't count." or "I'm not good at counting." Well, there's one way to get better. Practice. Don't pass up chances to practice - watching other people's games is a *prime* opportunity to practice counting.

Hu, your original post and your statement above said you use score est to compare the relative value of positions. I think crux showed quite convincingly in his examples how score est is really wrong in doing this even in simple positions in which a 10k can determine the relative value of moves. Calling crux's argument a strawman in that case is selling it short.

Score est is useful for players 10-15k and weaker, as that's about how strong score est is. Using it after that point, however, runs the dire risk of misestimating the cost of a weak group, the value of influence, and the value of endgame moves, as well as missing an opportunity to practice counting.

DougRidgway: Another question: what would make the score estimator more useful? More interactive, would be my vote. Let me fix the group statuses (useful for when it gets them wrong, or for quick what-if scenarios) and fudge the boundaries of implied influence and territory. This stuff requires go knowledge, and it's what humans do well. Let the computer calculate the area of multiple complex irregular shapes to a fraction of a percent, that's what computers do well. The combination could be quite powerful, I think.
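
The division of labor suggested here could look roughly like the sketch below: the human supplies the group statuses by marking dead stones, and the computer does the counting. It uses plain area scoring in the Tromp-Taylor style, where an empty region counts for a color only if it borders stones of that color alone. The function name `score_area`, the board encoding and the sample position are all assumptions made for illustration, not part of KGS or any existing program.

```python
def score_area(rows, dead=frozenset()):
    """Area-score a position after removing stones the user marked dead.

    rows: list of equal-length strings, 'X' = Black, 'O' = White, '.' = empty.
    dead: set of (row, col) coordinates the user judges to be dead.
    Returns (black_area, white_area); regions touching both colors count for nobody.
    """
    board = [list(row) for row in rows]
    for r, c in dead:
        board[r][c] = "."          # the human's verdict overrides the board
    height, width = len(board), len(board[0])
    area = {"X": 0, "O": 0}
    seen = set()
    for r in range(height):
        for c in range(width):
            if board[r][c] != ".":
                area[board[r][c]] += 1   # living stones count as area
                continue
            if (r, c) in seen:
                continue
            # Flood-fill this empty region and note which colors border it.
            size, borders, stack = 0, set(), [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < height and 0 <= nx < width:
                        if board[ny][nx] == ".":
                            if (ny, nx) not in seen:
                                seen.add((ny, nx))
                                stack.append((ny, nx))
                        else:
                            borders.add(board[ny][nx])
            if len(borders) == 1:
                area[borders.pop()] += size
    return area["X"], area["O"]


position = [
    ".XO..",
    ".XO..",
    "OXO..",   # the lone white stone on the left is hopeless
    ".XO..",
    ".XO..",
]
naive = score_area(position)                      # stone treated as alive
corrected = score_area(position, dead={(2, 0)})   # human marks it dead
print(naive, corrected)  # (5, 16) (10, 15)
```

Without the human's input, the stray white stone spoils Black's whole left side; once it is marked dead, the count matches what a player would expect. The go knowledge stays with the human, and only the bookkeeping is automated.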


MarkD: This discussion leads us to the question: How to write an accurate scoring algorithm? Maybe that's worth a new page to discuss it.

dnerra: It's about as difficult as writing a good Go program. You need to have a life-and-death solver to do it well. And if you can accurately score positions, you can also pretty accurately value moves.

Evand: The only difference between a good score estimator and a good go playing program is the ability to propose moves worth evaluating; writing a metamachine that uses a scoring or playing program and can act as the other is fairly trivial. I have one that I may release at some point if / when I get it cleaned up and if there is interest.

Anonymous: There's a pretty standard argument that shows that writing a score estimator is equally difficult to writing a good computer go player. If you had a good computer go player, it could be used to estimate the value of a position, by playing the game to completion from the position we're trying to evaluate. Conversely, if you had a good score evaluator, you could use that to severely limit the branching factor when searching the game tree, enabling a deep search into the game tree and writing a good computer player.
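
The two directions of this argument can be sketched in a few lines. Since a real go engine will not fit here, the sketch uses a toy game as a stand-in (players alternately take a coin from either end of a row, and the score is the difference in coins taken); every name in it is hypothetical. The estimator-to-player direction is shown in its simplest form, a one-ply lookahead, whereas the argument above has a deeper pruned search in mind.

```python
# Toy game state: (coins, score_diff, player), with player = +1 or -1 and
# score_diff counted from the first player's point of view.

def legal_moves(state):
    coins = state[0]
    if not coins:
        return []
    return ["L", "R"] if len(coins) > 1 else ["L"]


def play(state, move):
    coins, diff, player = state
    if move == "L":
        coin, rest = coins[0], coins[1:]
    else:
        coin, rest = coins[-1], coins[:-1]
    return rest, diff + player * coin, -player


def estimator_from_player(choose_move):
    """Estimate a position by letting the player play both sides to the end."""
    def estimate(state):
        while legal_moves(state):
            state = play(state, choose_move(state))
        return state[1]  # final score difference
    return estimate


def player_from_estimator(estimate):
    """Choose the move leading to the position the estimator rates best for the mover."""
    def choose_move(state):
        player = state[2]
        return max(legal_moves(state),
                   key=lambda move: player * estimate(play(state, move)))
    return choose_move


def greedy_player(state):
    """A weak stand-in 'engine': always grab the larger end coin."""
    coins = state[0]
    return "L" if coins[0] >= coins[-1] else "R"


start = ((3, 9, 1, 2), 0, 1)
estimate = estimator_from_player(greedy_player)
improved_player = player_from_estimator(estimate)

print(estimate(start))         # -5: greedy play from here loses by 5
print(greedy_player(start))    # L
print(improved_player(start))  # R: the lookahead avoids exposing the 9
```

Each construction is only as good as its ingredient: the estimate inherits the greedy player's blind spots, and the derived player is only as reliable as the estimates it compares.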

Chess has a relatively straightforward "score estimator", and usually a bad position is rapidly converted into a material advantage.

Evand: Yep, that's about exactly what my program does. The hard part is getting moves other than the best one suggested by the go playing program in some intelligent fashion; that's currently what I'm working on. The hard part about turning a score estimator into a player is restricting the search to reasonable moves when your score estimator is slow and can't suggest moves itself. Currently I'm using gnugo to play; its top suggestion is frequently better than its second-best move, so the trick is finding other moves worth trying.



This is a copy of the living page "Scoring Estimator Considered Harmful" at Sensei's Library.
(OC) 2004 the Authors, published under the OpenContent License V1.0.