Akin to WidestPath, this is the move 1 that won the most games, followed by the move 2 that won the most games given move 1, etc. -- Evpsych
To clarify: we take the replies with the best winning percentage? Rather than with the absolutely largest number of wins? That way, a rare reply which was 100% successful will sometimes bring the path to an end. -- Charles Matthews
Bill: We could take as our measure the product of the number of wins and the percentage of wins. (A kind of chi-square statistic.)
WilliamNewman: Yes, perhaps something like that anyway. You could figure out what baseline assumption you'd consider unsurprising -- perhaps that black ought to win 50% of games -- then look for the results which are most surprising given that assumption. Black winning 2 out of 2 games is not very surprising (despite 100% win percentage), 29 out of 30 games is very surprising, 520 out of 1000 games is not very surprising again.
Charles So, what recipe for the 'most significant path in database'?
WilliamNewman: A possible rough and ready engineer's implementation of the approach I described above would be to consider only the variations with a sizeable number of games, so that a Gaussian distribution is a reasonable approximation. (The exact distribution -- no Gaussian approximation -- is actually quite tractable too, and I think it might be called the binomial distribution, but right now I'm not going to try either to do it off the top of my head or to find it.) Then the standard deviation is the square root of the number of samples, and the surprisingness is the number of standard deviations by which the number of wins exceeds the expected number of wins: surprise=(observedwins-expectedwins)/sqrt(ngames). So 29 out of 30 is surprising because the number of wins exceeds the expected number by 14, while the standard deviation is only about 5.5, so it's almost three standard deviations. (30 is small enough, and 29 is a sufficiently extreme result, that it's pretty suboptimal to use the Gaussian approximation here, but it's still less wildly wrong than some of the other ways you could do this.:-) 520 out of 1000 isn't very surprising because the number of extra wins is 20, while the standard deviation is about 32, so it's only about 0.6 standard deviations, and random fluctuations of that size are extremely common. (At one point I did a lot of work on statistical mechanics and Monte Carlo simulations, so I know math for this, but it wasn't done using the traditional terminology for statistical interpretation of experiments, so I don't know the traditional names. For all I know what I've just described might be equivalent to the chi-square statistic mentioned above.)
Bill: How much weight do we want to give to expert opinion? None? If the pros choose a play 3% of the time, despite its excellent winning record, what does that mean? Do we simply ignore that fact?
Here is the score my proposed method gives in the 3 cases mentioned above:
Plainly it gives substantial weight to expert opinion.
Alternatively, we can make pairwise comparisons, as in this 2x2 layout:
P1 P2 T + 520 29 549 - 480 1 481 T 1000 30 1030
We can estimate a t-score with this formula:
t = sqrt(1030) x (|(520)(1) - (480)(29)| - 1030/2) ---------------------------------------------- sqrt((1000)(30)(549)(481))
t = 4.65
Suppose that we take the more frequent play unless t > 2. Plainly we would choose P2 in this situation.
This gives some weight to expert opinion, but much less.
WilliamNewman: It depends on what Charles means by "most significant path", I guess. The formula I gave was intended to address the question "was this path good for black?" Or, "did black achieve a statistically significant advantage on this path?" The formula doesn't really assign any weight to how often pros choose the path, but that might be a feature not a bug. My interpretation of the original remark by Evpsych is that after seeing the WidestPath summaries of how many pros chose each path, he had become curious about (as an independent question) which paths worked best.
Correct. We have several versions of the question now, all of which are interesting. A good move is one for which there was a statistically significant large proportion of wins given the moves leading up to it (after rotation, flipping, and color inversion). I'm interested in ideas for formulas to balance significance and success. -- Evpsych
A simple alternative which might also be interesting is to choose the move that won the most games literally; this will be quite close to WidestPath but with a bias toward success. Also, maybe we can do a few paths for different player strength ranges. (P.S. I like Monte Carlo simulations also. And Common Lisp.) -- Evpsych
Charles My interest, such as it is, would be as an automation of the process by hand of sifting out the interesting from the routine material.
Illich: Maybe we can just take all moves which were played in more than X% (e.g. X=15) cases and choose the one with highest win ratio...
This will follow the highest win percentage, but it will not try some really obscure paths.
JanvanRongen: Intuition tells us that there is always a best move, but the best path in a database (based on winning percentages) does not seem to be so easy to define. I have tried a number of approaches, and none of them seem to work for various reasons.
(i) the odd man out spoils it. Just following the highest winning % doesnt give a satisfactory answer. In my database the first move at 11,4 has three games with a total winning percentage of 100% (three games in the US Masters tournament). OK for the first move we can ignore this, there are 27,000 other games. But after 5 or six moves the set is already so small that it is not statistically very clear what the odd man is.
(ii) the best path doesnt lead to the best position. Period: 1980-2003. After Black opening at 4,4 the opposing komoku at 17,4 seems to be one of the best replies (only B+40,7%). Then B plays another corner and W plays the remaining corner. The value of that sequence is still 50% for white, but if you look at the value of the _position_ it is a lot lower (W+40%). That is because only very strong pro players such as Cho and Sakata dared to play 17,4 (a move that Kajiwara told us is a losing move). But if we look at the value of the position after 4 moves we are looking at a lot more games by other players too.
(Evpsych: Interesting. We can solve it by taking the best move from each position regardless of how the position was arrived at.)
(iii) winning percentage of the move does not seem to be the right measure. Say black in a certain position has possibilities X (60%) and Y (50%) both on 20 games. White has an answer X1 to X that scores 100% in 8 games and an answer X2 that scores 0% in 12 games. To Y there are two answers Y1 and Y2, both 50%. If we had all possible games in our database, Y would be a better move, and X the losing move. Because of this effect, most best path algorithms I tried are a seesaw of B+, W+, B+, W+, etc. So a better measure seems to be the highest value after the best possible opponent's reply (a minmax or maxmin, doesn't that look familiar?). Unfortunately this is too much work with Kombilo and I do not yet have the time to write a program to play with this measure.
Charles Matthews Time to reconsider this. I'd still like to automate some aspect of using a database such as Kombilo. Looking around at some aspects of statistics and information theory, it seems that a significant formula might be of the shape
sum over i of the W(i)log W(i) and L(i)log L(i)
where W(i) and L(i) are the winning and losing proportions of move number i in a given position.
See also best path (of all).