Database Search
Path: StatisticalAnalysis   · Prev: WinningStatistics   · Next: TengenStatistics
   

Databases of Go games seem to become useful once they hold around 10000 high-level (pro) games. Fortunately a number of such databases are now available, and players everywhere can begin to benefit from them.

There is something of a knack to posing good questions to a database equipped with a good search engine, such as Kombilo. In general one needs to choose a sub-board of appropriate size containing some given stones. For a joseki, for example, you need to define a corner area large enough that nearby stones already played have no impact on the result. To find standard life and death positions, one often needs to use wild-card stones intelligently.

Charles Matthews
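
As a rough illustration of what a sub-board search with wild-card stones involves, here is a minimal sketch in Python. It is not Kombilo's API or algorithm; the board encoding (`.`, `X`, `O`), the wild-card symbol `*`, and the function names `matches_at` and `find_pattern` are all assumptions made for this example. A real search tool would typically also need to handle rotations, reflections and colour swaps, which this sketch ignores.

```python
# Hypothetical illustration only; not how Kombilo or any real tool works.
EMPTY, BLACK, WHITE, ANY = ".", "X", "O", "*"


def matches_at(board, pattern, row, col):
    """Return True if pattern fits board with its top-left corner at (row, col)."""
    for r, pattern_row in enumerate(pattern):
        for c, want in enumerate(pattern_row):
            # A wild-card point matches anything: empty, black or white.
            if want != ANY and board[row + r][col + c] != want:
                return False
    return True


def find_pattern(board, pattern):
    """List every (row, col) at which the sub-board pattern occurs."""
    height, width = len(pattern), len(pattern[0])
    hits = []
    for row in range(len(board) - height + 1):
        for col in range(len(board[0]) - width + 1):
            if matches_at(board, pattern, row, col):
                hits.append((row, col))
    return hits


# A tiny 5x5 board stands in for a full 19x19 position.
board = [
    ".....",
    "..X..",
    ".XO..",
    "..X..",
    ".....",
]

# 2x2 sub-board: a black stone directly above a white stone,
# with the right-hand column left as wild cards.
pattern = [
    "X*",
    "O*",
]

hits = find_pattern(board, pattern)
print(hits)
```

Making the sub-board larger, or replacing wild cards with required empty points, narrows the search; that is the trade-off described above between a pattern that is too small to be meaningful and one so specific that few games match.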


Stefan: There's something I don't understand about these database searches, and now seems like the right time to ask the experts. I guess the simplest way to phrase my problem is whether it's statistically sound to draw conclusions like this or not. At first sight the number of games looks big enough to make conclusions relevant, but have we checked whether we needn't adjust for a number of potentially disturbing factors?

Some factors that I can think of:

- A large portion of games involving a certain pattern tend to be generated by a small number of players, as Dave points out above. They don't call it Kobayashi style for nothing. :-) Dave compares the percentage with player X's win/loss rate with the pattern under analysis, but how about X's overall win/loss?

- The result of some of the games depends on some other clear reason: one player's position collapsing, an obvious, pinpointable losing move, etc. Wouldn't you filter these out first? In a population of 200 games, only 4 games with such a calamity could swing the winning percentages from 51/49 to 49/51. Admittedly the percentages from database searches on these pages have often been more pronounced than that, but you see my point.
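
The arithmetic behind the 200-game example can be checked with a few lines of Python. This is only a sketch: the function name `win_percentage` is made up here, and it assumes "swing" means that 4 results would have gone the other way without the calamity. Simply excluding those 4 games gives a different, smaller shift.

```python
def win_percentage(wins, games):
    """Winning percentage for one side, as a number between 0 and 100."""
    return 100.0 * wins / games


games = 200
wins = 102          # 51% of 200 games
calamities = 4      # games decided by an unrelated blunder (assumed to be wins)

before = win_percentage(wins, games)
# Assumption 1: each of those games would otherwise have been a loss.
flipped = win_percentage(wins - calamities, games)
# Assumption 2: those games are simply filtered out of the sample.
excluded = win_percentage(wins - calamities, games - calamities)

print(before, flipped, excluded)
```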

There are several math buffs hanging around here, so somebody must be able to reassure me.

Charles: Well, if I don't like the statistics I see, I do a slightly different search, until I get something better.

Seriously, if you are using statistical inference to try to prove something, the sheer number of games you are sampling becomes the dominant factor. Anything based on 500 games is much more serious than anything based on 50. And 50 games in one pattern is a minimum, really. Below that threshold you tend to believe what the strongest players play. That will be good enough anyway for strong amateurs.

The other factor that matters is getting a representative sample of games, across time and across the various schools of pros (Four Houses and Outside the Ki-ins). So a search through the Nihon Ki-in database (35000 games mainly 1965-1995) will be biased in ways that smaller databases with more varied geography avoid.

Anyway, the point is to find stuff that's interesting. I'm often looking for the stuff they don't play (blocked configurations), rather than the main lines; but the latter are now much more accessible.


Dieter: Earlier, I refrained from making a comparable comment. I'm supposed to be a maths buff but my days with the statistics sword go back some time. My question: given a pattern that does not give any particular disadvantage (supposing we know that), what's the probability that its observed winning percentage lies within 5 percent above or below the average winning percentage (supposedly 50-50)? If that probability is about 95%, I think we can't reject the possibility that a pattern under study gives an equal result.

dnerra: That depends on the number of games, of course. If you take n random games in which the true chances are 50-50, then 95% of the time the winning percentage for White will lie in the interval 50 +- (100 / sqrt[n])%. That is, for 100 games it's 40-60, and for 1000 games still 47-53.
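
The rule of thumb above, and Dieter's question about a 5 percent margin, can be explored with a short Python sketch. The function names are invented for this example. It assumes every game is an independent 50-50 coin flip, which real game collections only approximate. `normal_interval` uses the usual normal approximation with z = 1.96, of which 100 / sqrt[n] is a rounded version, and `prob_within` computes the exact binomial probability.

```python
import math


def rule_of_thumb_interval(n):
    """95% interval for the winning percentage using 50 +- 100/sqrt(n)."""
    half_width = 100.0 / math.sqrt(n)
    return 50.0 - half_width, 50.0 + half_width


def normal_interval(n, z=1.96):
    """Same interval from the normal approximation for a fair coin (p = 0.5)."""
    half_width = 100.0 * z * math.sqrt(0.25 / n)
    return 50.0 - half_width, 50.0 + half_width


def prob_within(n, margin_pct=5.0):
    """Exact probability that a truly 50-50 pattern shows a winning
    percentage within margin_pct points of 50 over n independent games."""
    lowest = math.ceil(n * (50.0 - margin_pct) / 100.0)
    highest = math.floor(n * (50.0 + margin_pct) / 100.0)
    favourable = sum(math.comb(n, k) for k in range(lowest, highest + 1))
    return favourable / 2 ** n


for n in (50, 100, 500, 1000):
    print(n, rule_of_thumb_interval(n), normal_interval(n), prob_within(n))
```

Under these assumptions, the probability of landing within 5 points of 50% is well short of 95% at 100 games and very close to 1 at 1000 games, which is another way of reading the intervals quoted above.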



This is a copy of the living page "Database Search" at Sensei's Library.
(OC) 2004 the Authors, published under the OpenContent License V1.0.