TieBreaker/Testing Tiebreakers

Sub-page of TieBreaker

Herman Hiddema: Instead of endlessly bickering back and forth over who considers what an advantage, how good or bad tie breakers are, etc, I propose to run a test. Input would be welcome. The method I envision is the following:

I generate a random group of players, say about 100 (but this can be randomized), distributed over the ranks 20k-7d in the same way they are distributed in EGF tournaments (Using the file all.hst from the zipfile http://gemma.ujf.cas.cz/~cieply/GO/hisgor.zip at the EGF rating site, I can measure exactly how often players of a certain grade appear at tournaments)
For each player, I generate a "secret" rating, normally distributed around his nominal rating (EGF style, 20k = 100, 1k = 2000, 1d = 2100, 7d = 2700) with an increasing standard deviation the weaker a player is (again, SD measured from all.hst). This "secret" rating is the players actual playing strength.
For this group of players, I simulate a McMahon tournament of a number of rounds (variable, I can run many test).
In each round, I generate pairings according to Christof Gerlach's "maximum weight perfect matching algorithm" (originally by Edmonds)
For each pairing, I calculate the likelyhood that player A (the weaker player) wins according to the EGF formula
SE(A) = 1 / [e^(D/a) + 1]
Where "The weaker player" is the one with the lower "secret" rating (lower actual strength) and D is the "secret" rating difference. (see http://gemma.ujf.cas.cz/~cieply/GO/gor.html for further details)
For each paring I generate a random number between 0 and 1, check whether it is bigger or smaller than SE(A), and based on this generate a simulated result for the game.
At the end of the tournament, We will have a listing for all players with the ability to calculate MMS, SOS(-x), DC, SODOS, etc.
Using this list, I can compare tiebreakers. I will consider a tiebreaker as better if the resulting sorted list is "closer" to the list that results from sorting it by actual strenght (ie, "secret" rating).

Herman Hiddema: In studying the literature, I have found that Christoph Gerlach, writer of MacMahon - the Program, has already suggested a test such as the one above in his thesis (german, written 1994) in section 2.2.6.4.

RobertJasiek:

Discussion of advantages is difficult but useful because it has not been done to its "end" yet. Regardless, tests are useful, too.
- Hiddema: It seem to me that the current discussion largely revolves around different people having different ideas about what makes a tie breaker "good", or what is an "advantage" in the first place.
To evaluate your final test tournament standings, you should apply previously set aims. E.g., either you would want to evaluate the top few players only or all the participants. For the latter purpose, Jeff Boscole has already done some simulations, evaluated them in a maximum likelihood sense, but not explained that to everybody enough yet. At least you might want to avoid double work. However, I would like to see studies of the aim "top few players only" for different sizes of "few".
- HIddema: After a test run, all of these can be output, we do not have to choose. Do you have any further info on Jeff Boscole's work? Google gave me very little :-)
- RobertJasiek: Search rec.games.go for the keywords to be expected. Otherwise ask himself. If you have too much time, post to rec.games.go and discuss it with him but be prepared to meet 90% philosophical l'art pour l'art.
  - Jeff Boscole: Sorry if you need to wade through more literature. Not finding much myself, I thought to be breaking new ground -- maybe confirmed by the amount of CPU time for simulating 500,000 tournaments on a 1.4 Ghz machine. I applied tests much the same way Mr. Hiddema wants to try with economy calculation tricks on match outcome probability and least-squares for goodness of fit with a secret expectation. One UBASIC program was published on rec.games.go. Pairing methods are also crucial toward reducing tournament entropy, to include loose -avoidance- of adjacency pairing. Of methods tried, best tiebreaker after MMS appears to be a sum: SOS + SODOS .
- Ok, I have read the relevant thread ( part 1 and part 2). Highly interesting research. I didn't encounter much of what I considered "philosophical l'art pour l'art" as Robert calls it. Generally a very thorough bit of research, I thought. The research did not consider all tie breakers (especially not DC), but quite a lot of them nonetheless. It also tested something called 'partial SOS', which seems similar to CUSS.
Why do you want to take an arbitrary probability function (of the EGF rating system) as a presupposition? Better use none! Otherwise, invent and test many such functions, but interpreting that would be very tough. So better start from none. Thereby you do not create prejudiced outcomes.
- Hiddema: What do you mean by using "none"? Do you mean any game has a 50-50 chance of being won by either player? Such a thing would, I think, make the test unrealistic, especially in the top group. Other functions for generating game outcomes could be used, it would be trivial to plug them in. Do you have other suggestions?
- RobertJasiek: Nothing more special.
- Geoff Kaniuk: I have published an article which provides a good model for the probability of win between players of any grade difference. You can find it at http://www.kaniuk.co.uk/articles/egd-analysis/pwin/prob-win.pdf <unarchived dead link>. This model can be used to provide realistic simulations of the kind that Herman is suggesting.

Bass:

This is a biased test: it assumes there is a scalar quantity that describes a player's winning probability, while it may just as well be true that in reality rings exist so that A beats B beats C beats A.

Here we see Bass denigrating the idea that there is a scalar quantity that describes a player's winning probability. Please compare with Bass comments on DirectComparison/Discussion where Bass states: Here's the evidence: the swiss system makes statements about the relative order of players who haven't faced each other and bases these arguments on comparing single scalar number. You cannot make single scalar numbers go into a loop no matter how hard you try. This means chaining is assumed. You should not be ashamed of chaining, everybody assumes it. It is even everything the knockout system is all about. So Bass is able to argue the matter both ways.

Herman Hiddema: Contrary to how it may appear, this is not contradictory. It is true that Swiss/McMahon tournaments and rating systems assume that skill in go is a scalar. What Bass argues in the context of tie breakers is that if you make that assumption, you must stick with it. On the other hand he argues that the assumption that skill in go is a one-dimensional thing may be false to begin with. This is an option, I have never heard of any proof that skill in go is scalar.

RobertJasiek: There are examples though that skill is non-scalar. Relations to thinking time or handicaps are just examples.

Herman Hiddema: Yes, another example, related purely to skill at go: Player A is good at shinogi. He likes to build some territory, then invade his opponents moyo's. Player B has a cosmic style and likes to build large moyo's. B is weak against player A, because A usually succeeds very effectively in destroying his moyo's. Player C is very territorial, he loves playing on the third line and grabbing more and more territory. He is weak against B, because he lets B's moyo's get to big, and is no good at invading them. But he is strong against A, because without being able to use his shinogi skills, A is less effective at grabbing territory, so C usually ends up with more points.

In the context of tournaments, ratings, tiebreakers, etc. There is an explicit assumption that skill is linear. We hold the tournament to get a winner, we use McMahon and tie breakers to rank the players. For this reason, The above experiment is still interesting in that context.

That such a scalar quantity exists is a fundamental assumption of all rating systems. The goal of the rating system is to attempt to approximate that unknown scalar quantity by applying statistical methods to a small set of data. That set of data is the win/loss results of games played. See the "System Description" section and the equations therein on the http://gemma.ujf.cas.cz/~cieply/GO/gor.html page

However, this test should give good results for tiebreakers used in the tournament systems which rely on this assumption themselves, like the !Swiss and McMahon systems. So by all means, go ahead.

Yes, the above is true. Skill in go may well be a multidimensional phenomenon. In the context of McMahon/Swiss and Tiebreakers however, any test becomes pointless if we do not assume that skill is measurable along a single dimension.

This is just an uninformed guess, but might it not be a better test to use results from real tournaments so that the point system in whole is used to guess the results obtained during the tournament? Then, by changing the point system's tie breaker or leaving it out altogether, one would gain information on the relative usefulness of the tie breakers.

This is another, useful test. In this test however, we do not have any information about a player's true skill, whatever that may be. As such, there is nothing to test against.

[edit]

Fun with programming

A randomly generated list of players, with grade, MMS and rating:

1.  Walden, Lawrence      7d      24      2679
2.  Oconnell, Richard     6d      24      2612
3.  Jackson, William      6d      24      2585
4.  Huntington, Wyatt     6d      24      2566
5.  Donaldson, Anthony    5d      24      2527
6.  Mclaren, Clarence     5d      24      2495
7.  Mccormick, Thomas     4d      24      2478
8.  Sallee, William       5d      24      2470
9.  Smith, Tyrone         5d      24      2451
10. Wilson, Darrin        4d      24      2446
11. Miner, Jeff           5d      24      2438
12. Callahan, Ezra        4d      24      2422
13. Harris, William       3d      23      2357
14. Maldonado, Kevin      3d      23      2343
15. Purcell, Shawn        3d      23      2305
16. Francisco, Devin      3d      23      2294
17. Corrigan, Armando     3d      23      2290
18. Reynolds, Frank       3d      23      2277
19. Furman, John          3d      23      2226
20. Gomez, David          2d      22      2289
21. Mccarter, Julio       2d      22      2202
22. Dunaway, James        2d      22      2192
23. Adams, Clarence       2d      22      2186
24. Gibson, Larry         2d      22      2182
25. Whitford, Frankie     1d      21      2177
26. Chamness, James       1d      21      2141
27. Chalmers, Maurice     1d      21      2135
28. Crowley, James        1d      21      2108
29. Thomas, Billy         1d      21      2107
30. Pope, John            1d      21      2078
31. Garcia, Tyrone        1d      21      2054
32. Hayes, Johnathan      1d      21      2026
33. Schmidt, Roger        1d      21      2026
34. Fancher, James        1k      20      2065
35. Schaub, Rodney        1k      20      2058
36. Byrd, Kenneth         1k      20      2053
37. Lam, Derek            1k      20      2043
38. Walker, James         1k      20      2038
39. Warren, Steven        1k      20      2036
40. Ackerman, Samuel      1k      20      2036
41. Mcdaniel, Guillermo   1k      20      1983
42. English, Richard      1k      20      1971
43. Pieper, Mario         1k      20      1955
44. Breen, Joel           1k      20      1933
45. Perez, Todd           2k      19      2061
46. Porter, Ronald        2k      19      2008
47. Sullivan, Joseph      2k      19      1982
48. Lambert, John         2k      19      1961
49. Mills, Michael        2k      19      1931
50. Meraz, James          2k      19      1915
51. Walker, Frederick     2k      19      1908
52. Russell, Ralph        2k      19      1868
53. Harris, Boyd          2k      19      1845
54. Mallette, Gordon      2k      19      1817
55. Rogers, James         3k      18      1908
56. Wood, Al              3k      18      1838
57. Boyce, George         3k      18      1805
58. Wilson, Roberto       3k      18      1752
59. Scott, Richard        3k      18      1674
60. Schultz, James        4k      17      1743
61. Novak, Robert         4k      17      1715
62. Spencer, Larry        4k      17      1679
63. Sommers, Christopher  4k      17      1676
64. Minor, Edward         4k      17      1574
65. Burnett, George       5k      16      1687
66. Wilson, Ronald        5k      16      1643
67. Saucier, Mark         5k      16      1622
68. Chambers, Oliver      5k      16      1569
69. Wilson, Michael       5k      16      1532
70. Jones, Mark           5k      16      1512
71. Brown, Juan           6k      15      1538
72. Coe, Lucas            6k      15      1526
73. Dorsey, John          6k      15      1478
74. Walls, Keith          6k      15      1464
75. Hunter, Derrick       6k      15      1423
76. Messina, Anthony      6k      15      1422
77. Martin, Rex           6k      15      1384
78. Ely, James            7k      14      1427
79. Ahn, Nathan           7k      14      1425
80. Pugh, George          7k      14      1383
81. Massie, Francisco     7k      14      1358
82. Long, Parker          7k      14      1263
83. Rodriguez, Alex       8k      13      1449
84. Ford, Robert          8k      13      1398
85. Smith, William        8k      13      1262
86. Gooch, Joel           8k      13      1242
87. Ahner, Harry          9k      12      1193
88. Shapiro, Ali         10k      11      1169
89. Flores, Rosendo      10k      11      1133
90. Moore, Wade          10k      11       940
91. Poole, Bryan         11k      10       866
92. Sanchez, Michael     12k       9       973
93. Hyatt, Mario         12k       9       835
94. Mora, Jose           13k       8       812
95. Hilbert, Arturo      13k       8       720
96. Morrison, Keith      14k       7       872
97. Howes, Rodney        14k       7       743
98. Madison, Stephen     14k       7       676
99. Richards, Todd       14k       7       585
100.Burns, Louis         15k       6       546b

TieBreaker/Testing Tiebreakers last edited by bugcat on August 22, 2021 - 02:03

RecentChanges · StartingPoints · About

Edit page ·Search · Related · Page info · Latest diff