TieBreaker/Testing Tiebreakers

Sub-page of TieBreaker

Herman Hiddema: Instead of endlessly bickering back and forth over who considers what an advantage, how good or bad tie breakers are, etc, I propose to run a test. Input would be welcome. The method I envision is the following:

  1. I generate a random group of players, say about 100 (but this can be randomized), distributed over the ranks 20k-7d in the same way they are distributed in EGF tournaments (Using the file all.hst from the zipfile [ext] http://gemma.ujf.cas.cz/~cieply/GO/hisgor.zip at the EGF rating site, I can measure exactly how often players of a certain grade appear at tournaments)
  2. For each player, I generate a "secret" rating, normally distributed around his nominal rating (EGF style, 20k = 100, 1k = 2000, 1d = 2100, 7d = 2700) with an increasing standard deviation the weaker a player is (again, SD measured from all.hst). This "secret" rating is the players actual playing strength.
  3. For this group of players, I simulate a McMahon tournament of a number of rounds (variable, I can run many test).
  4. In each round, I generate pairings according to Christof Gerlach's "maximum weight perfect matching algorithm" (originally by Edmonds)
  5. For each pairing, I calculate the likelyhood that player A (the weaker player) wins according to the EGF formula
    SE(A) = 1 / [e^(D/a) + 1]
    Where "The weaker player" is the one with the lower "secret" rating (lower actual strength) and D is the "secret" rating difference. (see [ext] http://gemma.ujf.cas.cz/~cieply/GO/gor.html for further details)
  6. For each paring I generate a random number between 0 and 1, check whether it is bigger or smaller than SE(A), and based on this generate a simulated result for the game.
  7. At the end of the tournament, We will have a listing for all players with the ability to calculate MMS, SOS(-x), DC, SODOS, etc.
  8. Using this list, I can compare tiebreakers. I will consider a tiebreaker as better if the resulting sorted list is "closer" to the list that results from sorting it by actual strenght (ie, "secret" rating).

RobertJasiek:

  • Discussion of advantages is difficult but useful because it has not been done to its "end" yet. Regardless, tests are useful, too.
    • Hiddema: It seem to me that the current discussion largely revolves around different people having different ideas about what makes a tie breaker "good", or what is an "advantage" in the first place.
  • To evaluate your final test tournament standings, you should apply previously set aims. E.g., either you would want to evaluate the top few players only or all the participants. For the latter purpose, Jeff Boscole has already done some simulations, evaluated them in a maximum likelihood sense, but not explained that to everybody enough yet. At least you might want to avoid double work. However, I would like to see studies of the aim "top few players only" for different sizes of "few".
    • HIddema: After a test run, all of these can be output, we do not have to choose. Do you have any further info on Jeff Boscole's work? Google gave me very little :-)
    • RobertJasiek: Search rec.games.go for the keywords to be expected. Otherwise ask himself. If you have too much time, post to rec.games.go and discuss it with him but be prepared to meet 90% philosophical l'art pour l'art.
      • Jeff Boscole: Sorry if you need to wade through more literature. Not finding much myself, I thought to be breaking new ground -- maybe confirmed by the amount of CPU time for simulating 500,000 tournaments on a 1.4 Ghz machine. I applied tests much the same way Mr. Hiddema wants to try with economy calculation tricks on match outcome probability and least-squares for goodness of fit with a secret expectation. One UBASIC program was published on rec.games.go. Pairing methods are also crucial toward reducing tournament entropy, to include loose -avoidance- of adjacency pairing. Of methods tried, best tiebreaker after MMS appears to be a sum: SOS + SODOS .
    • Ok, I have read the relevant thread ([ext] part 1 and [ext] part 2). Highly interesting research. I didn't encounter much of what I considered "philosophical l'art pour l'art" as Robert calls it. Generally a very thorough bit of research, I thought. The research did not consider all tie breakers (especially not DC), but quite a lot of them nonetheless. It also tested something called 'partial SOS', which seems similar to CUSS.
  • Why do you want to take an arbitrary probability function (of the EGF rating system) as a presupposition? Better use none! Otherwise, invent and test many such functions, but interpreting that would be very tough. So better start from none. Thereby you do not create prejudiced outcomes.
    • Hiddema: What do you mean by using "none"? Do you mean any game has a 50-50 chance of being won by either player? Such a thing would, I think, make the test unrealistic, especially in the top group. Other functions for generating game outcomes could be used, it would be trivial to plug them in. Do you have other suggestions?
    • RobertJasiek: Nothing more special.
    • Geoff Kaniuk: I have published an article which provides a good model for the probability of win between players of any grade difference. You can find it at [ext] http://www.kaniuk.co.uk/articles/egd-analysis/pwin/prob-win.pdf. This model can be used to provide realistic simulations of the kind that Herman is suggesting.

Bass:

This is a biased test: it assumes there is a scalar quantity that describes a player's winning probability, while it may just as well be true that in reality rings exist so that A beats B beats C beats A.

Here we see Bass denigrating the idea that there is a scalar quantity that describes a player's winning probability. Please compare with Bass comments on DirectComparison/Discussion where Bass states: Here's the evidence: the swiss system makes statements about the relative order of players who haven't faced each other and bases these arguments on comparing single scalar number. You cannot make single scalar numbers go into a loop no matter how hard you try. This means chaining is assumed. You should not be ashamed of chaining, everybody assumes it. It is even everything the knockout system is all about. So Bass is able to argue the matter both ways.

Herman Hiddema: Contrary to how it may appear, this is not contradictory. It is true that Swiss/McMahon tournaments and rating systems assume that skill in go is a scalar. What Bass argues in the context of tie breakers is that if you make that assumption, you must stick with it. On the other hand he argues that the assumption that skill in go is a one-dimensional thing may be false to begin with. This is an option, I have never heard of any proof that skill in go is scalar.

RobertJasiek: There are examples though that skill is non-scalar. Relations to thinking time or handicaps are just examples.

Herman Hiddema: Yes, another example, related purely to skill at go: Player A is good at shinogi. He likes to build some territory, then invade his opponents moyo's. Player B has a cosmic style and likes to build large moyo's. B is weak against player A, because A usually succeeds very effectively in destroying his moyo's. Player C is very territorial, he loves playing on the third line and grabbing more and more territory. He is weak against B, because he lets B's moyo's get to big, and is no good at invading them. But he is strong against A, because without being able to use his shinogi skills, A is less effective at grabbing territory, so C usually ends up with more points.

In the context of tournaments, ratings, tiebreakers, etc. There is an explicit assumption that skill is linear. We hold the tournament to get a winner, we use McMahon and tie breakers to rank the players. For this reason, The above experiment is still interesting in that context.

That such a scalar quantity exists is a fundamental assumption of all rating systems. The goal of the rating system is to attempt to approximate that unknown scalar quantity by applying statistical methods to a small set of data. That set of data is the win/loss results of games played. See the "System Description" section and the equations therein on the [ext] http://gemma.ujf.cas.cz/~cieply/GO/gor.html page

However, this test should give good results for tiebreakers used in the tournament systems which rely on this assumption themselves, like the !Swiss and McMahon systems. So by all means, go ahead.

Yes, the above is true. Skill in go may well be a multidimensional phenomenon. In the context of McMahon/Swiss and Tiebreakers however, any test becomes pointless if we do not assume that skill is measurable along a single dimension.

This is just an uninformed guess, but might it not be a better test to use results from real tournaments so that the point system in whole is used to guess the results obtained during the tournament? Then, by changing the point system's tie breaker or leaving it out altogether, one would gain information on the relative usefulness of the tie breakers.

This is another, useful test. In this test however, we do not have any information about a player's true skill, whatever that may be. As such, there is nothing to test against.


Fun with programming

A randomly generated list of players, with grade, MMS and rating:

 1.  Walden, Lawrence      7d      24      2679
 2.  Oconnell, Richard     6d      24      2612
 3.  Jackson, William      6d      24      2585
 4.  Huntington, Wyatt     6d      24      2566
 5.  Donaldson, Anthony    5d      24      2527
 6.  Mclaren, Clarence     5d      24      2495
 7.  Mccormick, Thomas     4d      24      2478
 8.  Sallee, William       5d      24      2470
 9.  Smith, Tyrone         5d      24      2451
 10. Wilson, Darrin        4d      24      2446
 11. Miner, Jeff           5d      24      2438
 12. Callahan, Ezra        4d      24      2422
 13. Harris, William       3d      23      2357
 14. Maldonado, Kevin      3d      23      2343
 15. Purcell, Shawn        3d      23      2305
 16. Francisco, Devin      3d      23      2294
 17. Corrigan, Armando     3d      23      2290
 18. Reynolds, Frank       3d      23      2277
 19. Furman, John          3d      23      2226
 20. Gomez, David          2d      22      2289
 21. Mccarter, Julio       2d      22      2202
 22. Dunaway, James        2d      22      2192
 23. Adams, Clarence       2d      22      2186
 24. Gibson, Larry         2d      22      2182
 25. Whitford, Frankie     1d      21      2177
 26. Chamness, James       1d      21      2141
 27. Chalmers, Maurice     1d      21      2135
 28. Crowley, James        1d      21      2108
 29. Thomas, Billy         1d      21      2107
 30. Pope, John            1d      21      2078
 31. Garcia, Tyrone        1d      21      2054
 32. Hayes, Johnathan      1d      21      2026
 33. Schmidt, Roger        1d      21      2026
 34. Fancher, James        1k      20      2065
 35. Schaub, Rodney        1k      20      2058
 36. Byrd, Kenneth         1k      20      2053
 37. Lam, Derek            1k      20      2043
 38. Walker, James         1k      20      2038
 39. Warren, Steven        1k      20      2036
 40. Ackerman, Samuel      1k      20      2036
 41. Mcdaniel, Guillermo   1k      20      1983
 42. English, Richard      1k      20      1971
 43. Pieper, Mario         1k      20      1955
 44. Breen, Joel           1k      20      1933
 45. Perez, Todd           2k      19      2061
 46. Porter, Ronald        2k      19      2008
 47. Sullivan, Joseph      2k      19      1982
 48. Lambert, John         2k      19      1961
 49. Mills, Michael        2k      19      1931
 50. Meraz, James          2k      19      1915
 51. Walker, Frederick     2k      19      1908
 52. Russell, Ralph        2k      19      1868
 53. Harris, Boyd          2k      19      1845
 54. Mallette, Gordon      2k      19      1817
 55. Rogers, James         3k      18      1908
 56. Wood, Al              3k      18      1838
 57. Boyce, George         3k      18      1805
 58. Wilson, Roberto       3k      18      1752
 59. Scott, Richard        3k      18      1674
 60. Schultz, James        4k      17      1743
 61. Novak, Robert         4k      17      1715
 62. Spencer, Larry        4k      17      1679
 63. Sommers, Christopher  4k      17      1676
 64. Minor, Edward         4k      17      1574
 65. Burnett, George       5k      16      1687
 66. Wilson, Ronald        5k      16      1643
 67. Saucier, Mark         5k      16      1622
 68. Chambers, Oliver      5k      16      1569
 69. Wilson, Michael       5k      16      1532
 70. Jones, Mark           5k      16      1512
 71. Brown, Juan           6k      15      1538
 72. Coe, Lucas            6k      15      1526
 73. Dorsey, John          6k      15      1478
 74. Walls, Keith          6k      15      1464
 75. Hunter, Derrick       6k      15      1423
 76. Messina, Anthony      6k      15      1422
 77. Martin, Rex           6k      15      1384
 78. Ely, James            7k      14      1427
 79. Ahn, Nathan           7k      14      1425
 80. Pugh, George          7k      14      1383
 81. Massie, Francisco     7k      14      1358
 82. Long, Parker          7k      14      1263
 83. Rodriguez, Alex       8k      13      1449
 84. Ford, Robert          8k      13      1398
 85. Smith, William        8k      13      1262
 86. Gooch, Joel           8k      13      1242
 87. Ahner, Harry          9k      12      1193
 88. Shapiro, Ali         10k      11      1169
 89. Flores, Rosendo      10k      11      1133
 90. Moore, Wade          10k      11       940
 91. Poole, Bryan         11k      10       866
 92. Sanchez, Michael     12k       9       973
 93. Hyatt, Mario         12k       9       835
 94. Mora, Jose           13k       8       812
 95. Hilbert, Arturo      13k       8       720
 96. Morrison, Keith      14k       7       872
 97. Howes, Rodney        14k       7       743
 98. Madison, Stephen     14k       7       676
 99. Richards, Todd       14k       7       585
 100.Burns, Louis         15k       6       546b

TieBreaker/Testing Tiebreakers last edited by 82.21.107.54 on April 29, 2011 - 12:50
RecentChanges · StartingPoints · About
Edit page ·Search · Related · Page info · Latest diff
[Welcome to Sensei's Library!]
RecentChanges
StartingPoints
About
RandomPage
Search position
Page history
Latest page diff
Partner sites:
Go Teaching Ladder
Goproblems.com
Login / Prefs
Tools
Sensei's Library