Bass: On point 1, I sort of agree. Changing the page name to "Bass explains SOS" would be a good idea for a number of other reasons too. On point 2: Yes, this is the foundation on which the "SOS in other tournament systems" section is built. Maybe it would not hurt to be explicit about this one.
Happy new year to all rule experts!
I have to admit I am always suspicious when something is presented as obvious. And as this page was referred to as a mathematical proof, I would like to comment.
The first point is granted. If MMS measures anything, then better ability is required to reach a certain score against players with better MMS. However, combining the earlier propositions into "Sum of opponents' MMSs is an indicator of a better ability to collect MMS." does not strike me as particularly obvious. That is because the ability requirements are to a high degree random (commonly referred to as the SOS-lottery), and SOS over-emphasises early games relative to later ones.
Imagine a pre-paired tournament. While the difficulty of your games is obviously measured to some degree by SOS, you have no chance to influence it; that is, SOS is rather limited in its expressive value. While in Swiss tournaments a lot of randomness is introduced by the early, more or less meaningless rounds (imagine the WAGC), in McMahon tournaments the timing of losses is critical. Imagine, e.g., a different order of the top-player (say top 5) pairings in the last EGC, with the resulting changes in who from the rest of the field gets paired against the stronger players (with the stronger players still pretty much guaranteed to win).
Let us calculate an SOL (Sum of the round numbers of Lost games) tie breaker for the top of the last EGC table; the highest number wins.
Pl Name           Rank Co Club W  R1     R2     R3     R4     R5     R6     R7     R8     R9     R10    MMS SOS
1  Kim Eunkuk     7D   KR Seo  8  14+/b0 10+/w0 7+/b0  5-/w0  25+/b0 11+/w0 3+/b0  6+/w0  2+/b0  4-/w0  42  408 1
2  Hwang In-seong 7D   KR FaM  8  33+/b0 8+/w0  6+/b0  3-/w0  14+/w0 4+/b0  5+/b0  15+/w0 1-/w0  13+/w0 42  408 0
3  Kim Joon-Sang  7D   KR Seo  8  27+/w0 11+/w0 4+/b0  2+/b0  5-/b0  9+/w0  1-/w0  13+/b0 23+/b0 7+/b0  42  407
4  Oh Chi-Min     7D   KR Ber  8  24+/b0 23+/b0 3-/w0  17+/w0 12+/w0 2-/w0  26+/b0 11+/w0 10+/b0 1+/b0  42  403
5  Jun Sang-Youn  7D   KR Seo  7  52+/w0 9+/b0  13+/b0 1+/b0  3+/w0  6-/b0  2-/w0  7-/w0  20+/w0 17+/b0 41  407
6  Dinerchtein A. 3P   RU Kaz  7  19+/w0 17+/b0 2-/w0  24+/b0 10+/b0 5+/w0  25+/w0 1-/b0  7-/w0  16+/w0 41  404
7  Taranu Catalin 5P   RO Buc  7  26+/w0 51+/b0 1-/w0  12-/b0 53+/w0 14+/b0 8+/w0  5+/b0  6+/b0  3-/w0  41  402
8  Mero Csaba     6D   HU Bud  7  28+/w0 2-/b0  16+/w0 23+/w0 9-/b0  55+/w0 7-/b0  21+/w0 18+/b0 10+/w0 41  400
9  Pop Cristian   7D   RO Buc  7  29+/b0 5-/w0  73+/b0 51+/w0 8+/w0  3-/b0  10-/b0 50+/b0 15+/w0 45+/b0 41  396
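Kim Eunkuk: 2 losses (rounds 4, 10), SOL = 14
Hwang In-seong: 2 losses (rounds 4, 9), SOL = 13
Kim Joon-Sang: 2 losses (rounds 5, 7), SOL = 12
Oh Chi-Min: 2 losses (rounds 3, 6), SOL = 9
Jun Sang-Youn: 3 losses (rounds 6, 7, 8), SOL = 21
Dinerchtein A.: 3 losses (rounds 3, 8, 9), SOL = 20
Taranu Catalin: 3 losses (rounds 3, 4, 10), SOL = 17
Mero Csaba: 3 losses (rounds 2, 5, 7), SOL = 14
Pop Cristian: 3 losses (rounds 2, 6, 7), SOL = 15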
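The SOL values can be recomputed mechanically from the crosstable rows. A minimal sketch, assuming each round entry has the form opponent number, then `+` (win) or `-` (loss), then a slash followed by colour/handicap information that SOL ignores; byes, jigo and unplayed rounds are not handled:

```python
def sol(results):
    """SOL: sum of the round numbers of the games a player lost.

    `results` is one crosstable row of round entries such as "14+/b0":
    opponent number, '+' (win) or '-' (loss), then colour/handicap after
    the slash, which SOL ignores. Byes and jigo are not handled.
    """
    total = 0
    for round_no, entry in enumerate(results.split(), start=1):
        outcome = entry.split("/")[0]  # e.g. "14+" or "5-"
        if outcome.endswith("-"):
            total += round_no
    return total


rows = {
    "Kim Eunkuk": "14+/b0 10+/w0 7+/b0 5-/w0 25+/b0 11+/w0 3+/b0 6+/w0 2+/b0 4-/w0",
    "Oh Chi-Min": "24+/b0 23+/b0 3-/w0 17+/w0 12+/w0 2-/w0 26+/b0 11+/w0 10+/b0 1+/b0",
    "Jun Sang-Youn": "52+/w0 9+/b0 13+/b0 1+/b0 3+/w0 6-/b0 2-/w0 7-/w0 20+/w0 17+/b0",
    "Pop Cristian": "29+/b0 5-/w0 73+/b0 51+/w0 8+/w0 3-/b0 10-/b0 50+/b0 15+/w0 45+/b0",
}

for name, results in rows.items():
    print(name, sol(results))
# Kim Eunkuk 14
# Oh Chi-Min 9
# Jun Sang-Youn 21
# Pop Cristian 15
```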
That is, apart from Cristian Pop, who was unlucky in pairing (in rounds 3 and 4 he played the 73rd and 51st players, while Csaba Mero, despite identical MMS, got to play the 16th and 23rd), this SOL "tie breaker" pretty much coincides with the SOS results. With one big difference: nobody (I hope) can come up with a bogus interpretation of SOL as a sound scientific measurement of playing strength.
In my personal world of tie breakers, ties would usually be left unbroken; otherwise they would be broken by a playoff, and where that is impossible, the tie breaker would be historical performance (say rating) or fate (say nigiri or lottery). As far as I know, that is how professional players break ties, and there is something to be said in favour of this. Not the least of its advantages would be that some strong European players would be freed from the burden of meaningless discussions of tie-breaking theory.
Kind regards, Tapir.
tapir: Indeed. It should be CUSS = SOL + (n*(n+1)/2 - m*(m+1)/2), with n the number of wins and m the number of losses. At least as a tie breaker it should be identical.
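The CUSS/SOL relation can be checked by brute force. This sketch assumes CUSS is the running number of wins summed over all rounds, starting from zero (no McMahon start score), and that every round ends in a plain win or loss. Under those assumptions the identity holds exactly, and for players with the same n and m the correction term is a constant, so CUSS and SOL give the same tie-break order:

```python
from itertools import product


def cuss(outcomes):
    """Cumulative sum of scores: the running number of wins, summed over rounds."""
    running = total = 0
    for won in outcomes:
        running += won  # True counts as 1, False as 0
        total += running
    return total


def sol_from_outcomes(outcomes):
    """Sum of the round numbers (counted from 1) of the lost games."""
    return sum(r for r, won in enumerate(outcomes, start=1) if not won)


def cuss_via_sol(outcomes):
    """CUSS = SOL + n(n+1)/2 - m(m+1)/2, n = wins, m = losses."""
    n = sum(outcomes)
    m = len(outcomes) - n
    return sol_from_outcomes(outcomes) + n * (n + 1) // 2 - m * (m + 1) // 2


# Exhaustive check over every win/loss sequence of 1 to 10 rounds.
for rounds in range(1, 11):
    for outcomes in product((True, False), repeat=rounds):
        assert cuss(outcomes) == cuss_via_sol(outcomes)

kim_eunkuk = (True, True, True, False, True, True, True, True, True, False)
print(cuss(kim_eunkuk), sol_from_outcomes(kim_eunkuk))  # 47 14
```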
RobertJasiek: I have often looked at SOL without having defined it (many thanks for the definition and the example!) and in many tournaments noticed the same correlation with SOS for the top players. Your definition now gives us an easy way to calculate and compare it.
Bass: Tapir, I admit point 6 was very terse on the page (that would be point 0 in this page's numbering). It read "Summing is one way to combine the overall difficulty from many games", to which I have now added "so the sum of all opponents' scores indicates how difficult it was for the player to collect MMS in the whole tournament."
Now it might be easier to combine points 6 and 7 (or here: 0 and 1) to form the next point.
tapir: These points are conceded (higher SOS indicates more difficult opponents), but the flaw is that the difficulty of the opponents is more or less determined by pairing luck/software and the timing of losses. The common use of the term "SOS-lottery" is a good hint at this. And while in the top group SOS pretty much equates to SOL (and I hope you agree that this is a rather arbitrary tie breaker), in the main field SOS should more or less coincide with using the EGF rating as a tie breaker (at least if the tournament software tries hard to produce even pairings).
RobertJasiek: You claim that SOS indicates more difficult opponents. I claim we can only say that, on average over many (an infinite number of?) tournaments (of the same type etc.), the probability that SOS indicates more difficult opponents (which needs to be defined) is greater than 0.5. It is easier to claim something different: that a player with the greater SOS has had opponents with greater Number of Wins scores. (Trivial, by definition.) It is not necessarily the case, though, that greater SOS implies greater playing strength (or something similar) of the opponents. In fact, some pairing systems, e.g. adjacency pairing, reduce the likelihood of SOS having such a meaning.
Bass: Thank you, Robert. You are free to claim whatever you want, and if we lived in a world where your claims were connected with reality, then I would have another reason to make this comment, apart from making it clear that in the comment below I agree with Tapir, not you.
RobertJasiek: For that very reason, I inserted my paragraph while moving it to the right.
Bass: Yes, I agree completely. In a Swiss-type tournament, SOL and CUSS (and whatever else you can think of that rewards winning early) will approximate SOS, because the tournament system gives tougher opposition as a reward for winning. The EGF rating would also correlate, because with a better rating you get a higher initial MMS, which is analogous to winning more games even before round 1.
And yes, I also agree it is a lottery. Of the commonly used tie-break lottery methods, it's one of the least unfair, though. At least it punishes players who use the McMahon elevator. ("Hissi" is the Finnish word for an elevator; you'll probably be able to guess which of the two repeated words means "a win" :-)
tapir: I can guess which one means win, but I don't get the elevator and the fairness point. (If you organize early wins by sandbagging, the McMahon score already punishes you, doesn't it?)
Bass: Hmm, I think you guessed wrong, then :-) The idea is to lose in the beginning (called stepping into the elevator) and then let the elevator (easy opponents) lift you up later. This technique is only mentioned jokingly, since luckily no McMahon tournament gives prizes according to the number of wins only.
tapir: Indeed. It is too late. And I am lacking imagination.
tapir: 2nd try: I still don't get the elevator thing... It is nothing for the top players to care about anyway. (You won't win a tournament after taking the losses needed to step into the elevator. And in the main field, prizes are usually awarded only for most wins.) Furthermore, tie breaking for the main field is absolutely irrelevant and something that should be discouraged. At least for me (I am unfortunately not a top player), what counts is getting games against stronger opponents through early wins, not the placement (I often don't even know it) or the final result (inevitably something like 2:3 or 3:2). People who reach a 3:2 result after two losses and easy later opponents will be reminded of the limited value of their performance by a look at their rating changes after the tournament. Whether they are tied for 13th place or assigned 15th place after tie breaking by SOS is so utterly irrelevant that I would not even start to bother (given the properties of McMahon, despite their bad performance they will quite often win the tie breaker against tied players with 4 or 5 wins, so even this punishment is very limited). There is another point, which however isn't very much related to the theory behind SOS: SOS is usually applied to the whole field. It is not limited to breaking important ties (I have never seen a tournament that limited the application of SOS). If you really mean that playoffs should be encouraged, then SOS is a bad choice.
tapir: McMahon scores represent losses in previous Swiss rounds. Whether I achieve these losses by intentionally losing or by sandbagging (starting with a number of default losses) doesn't change the logic. However, this isn't a sound strategy: the default loss certainly does not improve your overall chance to win the tournament. Common practice is again the best guide: how many players are sandbagging (= McMahon elevating), and how many are overrated (thus getting free wins in terms of points)? So it is a crime which doesn't occur, and even if it did, it would pretty much punish itself. This shouldn't be the final reason for using SOS.
Bass: Yes, of course. The final reason for using SOS is that it provides more resolution to the main scoring system, provided that the requirements listed on the main page hold. That is, if somebody wants to use scoring system X in their tournament, it stands to reason that they would want to use the same scoring system (with a more fine-grained resolution) for tie breaking too, and that is exactly what SOS does. (...provided that the requirements listed on the main page hold, naturally.)
tapir: You said above that it is basically a lottery method, but less unfair because it punishes the McMahon elevator... But if the need to punish the McMahon elevator isn't relevant, why not hold a playoff where possible, use a genuine lottery (easily applied through nigiri or coin tossing) where necessary, and leave the other ties tied? E.g., in the EGC I would have preferred to see playoffs for both the Open Champion and the European Champion instead of such odd tie-breaking orgies. In a two-week tournament there should be time for playoffs: play 9 rounds next time and reserve a day for playoffs (+ side events)!
tapir: If I use a measuring method beyond its applicability, then I can surely apply the same procedure as every time to arrive at my results, but the result is basically random. It is just like, say, a common electronic balance in a laboratory, where nobody would use the last digit as a "tie breaker": surely it is a balance, it weighs objects, and you can read it the same way, but the result of using the last digit to distinguish otherwise identical weights is still random.
What is questioned here is not the accuracy of SOS, but its inherent detection limit. While there is little doubt that large enough SOS differences indicate something (stronger opponents), it is not quite clear whether a single SOS point measures anything. However, I have never seen (OK, I am not interested in tie breakers :) anyone actually trying to establish these limits. This could be done by blank measurements (letting Gnugo1-5 and Gnugo6-10, on another setting, play a tournament against each other on identical software, or maybe just by taking random results), calculating the deviations and thereby setting a reasonable limit of detection for SOS. My informed guess is that any such detection limit would be above a single SOS point.
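The "just take random results" variant of such a blank measurement can be simulated. A rough sketch under several assumptions: a plain Swiss tournament (no McMahon bar) of 64 equally strong players over 7 rounds, coin-flip results, and a simple greedy pairing rule (pair the top unpaired player with the highest-ranked player not yet met), which is not the algorithm of any real pairing program. It estimates the standard deviation of SOS among players finishing on the same score and uses the analytical-chemistry rule of thumb of three blank standard deviations as a limit of detection; the rule and all parameters are assumptions:

```python
import random
import statistics
from collections import defaultdict


def swiss_blank_tournament(n_players=64, n_rounds=7, rng=None):
    """Simulate a Swiss tournament between equally strong players.

    Every game is a coin flip, so any SOS difference at the end is pure noise.
    Returns (wins, sos), two lists indexed by player number.
    """
    if n_players % 2:
        raise ValueError("this sketch needs an even number of players (no byes)")
    rng = rng or random.Random()
    wins = [0] * n_players
    opponents = [[] for _ in range(n_players)]
    for _ in range(n_rounds):
        # Rank by current score, in random order within each score group.
        unpaired = sorted(range(n_players), key=lambda p: (-wins[p], rng.random()))
        while unpaired:
            a = unpaired.pop(0)
            # Highest-ranked remaining player a has not met; else allow a repeat.
            b = next((q for q in unpaired if q not in opponents[a]), unpaired[0])
            unpaired.remove(b)
            opponents[a].append(b)
            opponents[b].append(a)
            wins[a if rng.random() < 0.5 else b] += 1  # coin-flip result
    sos = [sum(wins[o] for o in opponents[p]) for p in range(n_players)]
    return wins, sos


def pooled_sos_sd(n_tournaments=100, seed=1, **tournament_kwargs):
    """Pooled standard deviation of SOS within final score groups."""
    rng = random.Random(seed)
    sum_sq, dof = 0.0, 0
    for _ in range(n_tournaments):
        wins, sos = swiss_blank_tournament(rng=rng, **tournament_kwargs)
        groups = defaultdict(list)
        for w, s in zip(wins, sos):
            groups[w].append(s)
        for group in groups.values():
            if len(group) > 1:
                mean = statistics.fmean(group)
                sum_sq += sum((s - mean) ** 2 for s in group)
                dof += len(group) - 1
    if dof == 0:
        raise ValueError("no score group with more than one player")
    return (sum_sq / dof) ** 0.5


if __name__ == "__main__":
    sd = pooled_sos_sd()
    print(f"SOS standard deviation among equally scored players: {sd:.2f}")
    print(f"Rule-of-thumb detection limit (3 sd): {3 * sd:.2f} SOS points")
```

Since all simulated players are equally strong, whatever spread this reports is the noise floor that a one-point SOS difference would have to rise above.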
Bass: Yes, again I agree. Your point is more or less addressed on the main page under the heading "resolution concerns". As for finding the limit, I am not sure it is possible to find such a limit by any experimental means: it is very hard to distinguish random noise from a very weak signal, so you would probably end up measuring your test setup. You would also have to define what you are trying to measure, which is kind of hard. For example, there is a marked difference between the overall strength of a player and the strength that a player demonstrates in any given tournament.
Also, I am not sure that finding a theoretical signal limit would be very useful anyway: it's completely up to the tournament organizer to set the desired confidence levels for their tournaments. A former system for the Finnish Championship required playoffs unless there was a margin of at least two wins between the first and second places.
tapir: Indeed, we agree on a lot of issues here. Yes, any SOS limit > 1 would render SOS rather useless and confusing as a tie breaker. But blank measurements are a common way to arrive (via the standard deviation) at detection limits. By saying we try to measure "performance in this tournament only" (so that we can just claim Gnugo1 performed better than Gnugo2 on identical hardware and settings), the notion of separating randomness from performance is avoided. However, we can look at "performance in a tournament" as a random sample of overall performance = strength. Incidentally, we already have a better (though also limited) indicator of strength: tie breaking by rating (previous performance). It is just unbelievable that you still defend SOS :)
Herman: Someone's rating can easily fluctuate by 50-100 points despite no changes in playing strength. If player A has 1 rating point more than player B, is he really the stronger player? I think SOS is much better than prior rating, because it measures actual tournament performance, rather than performance in unrelated, possibly biased, other events.
tapir: Rating is just one kind of previous performance. Previous performance can be last year's results as well; that is how it is done in the professional leagues. There is much to be said for such a tie breaker. It is easy to understand, used by professionals, and encourages regular participation in tournaments. (Rating would just be a shortcut, since tournament participation in Europe is probably too irregular.) Anyway, my proposal is to allow ties where possible, hold a playoff if necessary, and only where this is impossible use a genuine lottery instead of the SOS-lottery, or previous performance of any kind instead of direct comparison. All of these are easily understood and don't need a theoretical foundation. (As is obvious by now, I don't mind arbitrary (= random) tie breaking. It just should not be basically random while being presented as measuring something. Of course, the last two digits of the EGF rating, say, are quite meaningless as well, as is a single SOS point.)
Bass: Random tie breaking is not automatically fair. If a player has, say 4 SOS points more than some of the other players, then I think we would agree that he has had a significantly tougher opposition, and it would be unfair to give the win to the other guy only because he won a coin toss.
tapir: There is no claim about fairness. It is just easy and completely arbitrary. SOS is kind of arbitrary as well, but claims to be something else. The SOS difference between the tied 1st and 4th places in the EGC was 5 points, afair. I am not sure that even these five points have any predictive value for a playoff. (The 4th actually won against the 1st :) A four-player playoff with two knockout rounds would have been the best solution, imho. The same goes for the four tied Europeans. (Maybe one can make a tournament system out of it: instead of a separate event, with losing players dropping out into the main event, a knockout playoff of the best 4 Europeans after round 8, played in the last two rounds as part of the main event, with just these pairings fixed. This does not decide clear 3rd and 4th places, but it gives a clear champion with less SOS lottery. Please improve on this.)
Bass: We (Matti and I) did improve on that, and proposed a tournament system for the EGC that not only found the European Champion without tie breakers, but also took into account the fact that we don't want to bore the strong Koreans, who certainly did not come to Europe just to play other Koreans. Point 15c at http://www.eurogofed.org/egf/agm2008.pdf does not contain the proposal itself, but it was basically a couple of rounds of McMahon followed by a double knockout, with extra care taken to make sure the pairings in the McMahon part weren't unfair. It also took into account that a place in the top 10 list would get the players some government support in some countries, so it produced a top 10 list too. The proposal got shot down at the AGM, IIRC because the strong players said they liked the SOS-lottery (and even the SOSOS-lottery) better.
tapir: Feel free to add the proposal somewhere on SL ;) No, I am interested. (Was the double knockout integrated in the main event or separate?)
isd: The proposal is still on the EGF website. At the AGM it was indeed shot down: the strong players argued that the initial SOS lottery might not select the right players for the knockout, and made two other, separate arguments: I think that it was a half measure, and that playing around with the money was better.
Bass: It is completely alright to choose overall strength as the thing you want to measure. Then you would probably be making a mistake in choosing the McMahon system, though, because an AccelRat-like system is more suited to measuring overall strength. (Disclaimer: I have never actually used AccelRat myself, but this is my impression of its design goals.) I have to defend SOS, since the MLES tie breaker is not implemented in any tournament software as far as I know. Lacking MLES (Maximum Likelihood Estimate of Strength based on the tournament results only), SOS is the least bad tie breaker, and therefore very much worth defending.
RobertJasiek: I guess you are talking about tiebreakers used for the purpose of final placement order after the last round. Furthermore, I assume that even you consider playing more rounds to be a better tiebreaker than SOS. It is your opinion that SOS is the least bad tiebreaker. It is my opinion that direct or indirect comparison is the least bad tiebreaker. So far so (un)clear. One thing you might explain nevertheless: why, in your opinion, is SOS (used for the purpose of final placement order after the last round, used in Swiss or McMahon, and for the moment not specifying particular values for other factors like round number or player number) a better tiebreaker than any of SOS-1, SOS-2 or SOS-R1? E.g., practical experience with the EGC main tournament, with its 10 rounds and many players, suggests that SOS-2 (for determining the very top places) is better for that tournament than SOS: top players tend to have had zero, one or two, but rarely more, rounds with low SOS opponents, and those rounds are not necessarily the first ones. SOS-2 removes noise from those top players affected by it, so that comparing them with the other top players not affected by so much noise becomes fairer.
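To illustrate the SOS-1/SOS-2 argument: SOS-k is usually understood as SOS with the k lowest opponent scores left out. A minimal sketch under that assumption, with two hypothetical top players who have equal SOS, one of whom met a single very weak opponent:

```python
def sos_minus(opponent_scores, k):
    """SOS-k: sum of the opponents' final scores without the k lowest ones."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return sum(sorted(opponent_scores)[k:])


# Two hypothetical top players with equal SOS; A met one very weak opponent.
player_a = [34, 40, 41, 42, 39, 40, 42, 41, 42, 42]
player_b = [38, 39, 40, 40, 40, 40, 41, 41, 42, 42]

for k in (0, 1, 2):
    print(f"SOS-{k}: A={sos_minus(player_a, k)} B={sos_minus(player_b, k)}")
# SOS-0: A=403 B=403
# SOS-1: A=369 B=365
# SOS-2: A=330 B=326
```

Plain SOS ties the two; once the outlier opponent is dropped, A's otherwise tougher field shows.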
Bass: I like SOS better than the mentioned variants for a very simple reason: it is implemented in the software that tournament organizers use. If someone were to implement "modified median SOS", I would recommend it instead. Since I am not sure whether it has been suggested elsewhere, the modified median is calculated by dropping the smallest-score-among-defeated-opponents and the biggest-score-among-other-opponents from SOS calculations. These two scores will be the most likely to contain pairing noise. So in order to cancel the noise, you have to actually win your easy game. Also, if you win your game against the difficult opponent, then that SOS will not get canceled. (in a longer tournament, you can apply the median filter to more than one pair of opponents scores, as long as the number of counted opponents stays greater than the number of ignored ones)
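The modified median SOS described above can be sketched as follows. Assumptions: each game is a plain win or loss (no jigo), "other opponents" means the opponents the player did not beat, and when a player has no wins (or no losses) only the available side is dropped; these edge-case choices are not part of the description:

```python
def modified_median_sos(games, pairs=1):
    """Modified median SOS as described above.

    `games` is a list of (opponent_score, won) tuples. The `pairs` lowest
    scores among defeated opponents and the `pairs` highest scores among the
    other opponents are ignored; the remaining scores are summed.
    """
    if pairs < 0:
        raise ValueError("pairs must be non-negative")
    beaten = sorted(score for score, won in games if won)
    others = sorted(score for score, won in games if not won)
    kept = beaten[pairs:] + others[:max(len(others) - pairs, 0)]
    ignored = len(games) - len(kept)
    if len(kept) <= ignored:
        raise ValueError("more opponents ignored than counted")
    return sum(kept)


# Hypothetical five-round record: (opponent's final score, did we win?)
record = [(30, True), (38, True), (40, False), (37, True), (41, False)]
print(sum(score for score, _ in record))  # plain SOS: 186
print(modified_median_sos(record))        # drops 30 and 41: 115
```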
tapir: Has anyone actually calculated MLES-scores from tournament results? Maybe using the default wins/losses of the McMahon system to increase the "sample".
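Nobody in the thread says which statistical model MLES would use. A minimal sketch, assuming the Bradley-Terry model (the logistic win-probability model that also underlies Elo-style ratings), fitted with the standard minorization-maximization (MM) iteration. Each player additionally gets virtual games against a fixed anchor player so that unbeaten or winless players receive finite estimates; these virtual games play a role similar to the McMahon default results suggested above, but they are an assumption of this sketch, not that idea itself:

```python
import math
from collections import defaultdict


def mles(games, virtual=1.0, iterations=1000):
    """Bradley-Terry maximum-likelihood strengths from (winner, loser) pairs.

    Each player also gets `virtual` wins and `virtual` losses against a fixed
    anchor of strength 1.0, so unbeaten or winless players stay finite.
    Returns natural-log strengths; 0.0 means "as strong as the anchor".
    """
    if virtual <= 0:
        raise ValueError("virtual must be positive to keep estimates finite")
    players = sorted({p for game in games for p in game})
    wins = defaultdict(float)
    meetings = defaultdict(float)  # unordered pair -> number of games played
    for winner, loser in games:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1
    gamma = {p: 1.0 for p in players}
    for _ in range(iterations):
        new_gamma = {}
        for p in players:
            # Denominator of the MM update, starting with the anchor games.
            denom = 2 * virtual / (gamma[p] + 1.0)
            for pair, n in meetings.items():
                if p in pair:
                    (other,) = pair - {p}
                    denom += n / (gamma[p] + gamma[other])
            new_gamma[p] = (wins[p] + virtual) / denom
        gamma = new_gamma
    return {p: math.log(g) for p, g in gamma.items()}


strengths = mles([("A", "B"), ("A", "C"), ("B", "C")])
for name, s in sorted(strengths.items(), key=lambda kv: -kv[1]):
    print(name, round(s, 3))
```

Unlike SOS, the estimate for each player depends on the results of the whole field at once, which is why it needs an iterative fit rather than a simple sum.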
Bass: Adding to my earlier comment to tapir: some years ago there was a discussion about this very matter (what do we want our tournaments to measure) on the #go.fi IRC channel. The conclusion was that if we want to measure overall strength, then after the tournament we should not reward the player with the best score from that tournament, but instead calculate the new GoR for all players and give the first prize to the player with the highest rating. This was declared unacceptable for obvious reasons, so the conclusion was that we want to measure tournament success, never mind overall strength.
RobertJasiek: Accuracy and detection limit are related aspects of statistical significance. Before one can identify a detection limit, one first needs to define what one wants to measure. SOS reflects impacts from several circumstances. I have measured one of them: "A Player Cannot Influence His Opponents' Later SOS Changes". See Quality of SOS.
Bass: to Robert: Come on now. After their game is played, then out of the opponent's MMS, SOS and SOSOS, the only one that a player can always directly influence is the SOS, which increases by exactly one point every time the player wins a game.
tapir: Though obviously, by lowering the opponents SOS by losing there will be no tie to break anymore.
Bass: Apart from winning or losing one's games, what other means of affecting any scores does a player have in a tournament? What I am trying to say is that Robert's research is again looking like he is trying to invent "proof" for his opinions he is not willing to change anyway, so his "measurements" should be taken with an uyuniful of salt.
RobertJasiek: Unlike my rules of play research and my Go term definition research, relatively few of my statements on tournament system theory (incl. tiebreakers) are "research". Some more are preliminary studies of what might become research by somebody later. Most are opinion, though (although it may be an educated rather than a purely emotional opinion). I.e., you do not need to paint all my statements as research. Everyone with a good deal of education on the topic of tournament systems and tiebreakers is roughly playing with equally valued cards. Since the overall amount of available real research is rather small, every on-topic freak still has a good chance to understand everything. Likewise, your strong opinions also have their salt in them :)
taken from Lifein19x19
At the last WAGC, Bertan Bilen (2d) lost to John Gibson (2k) by forfeit because he arrived too late. Both were far from contending for the championship, yet this had an effect at the top of the table: it retrospectively granted the Taiwanese player who had played Gibson in the 2nd round an additional SOS point, changing his placement from a shared 3-way 7th to a shared 2-way 6th. And this is just one example I remember from the last WAGC. (Caveat: of course this was the third round and affected the pairing later on, so it is not at all certain that John Gibson would have ended with only one win under different pairing; however, similar effects can and do happen in the last round.)
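The mechanism behind this example can be shown on a tiny crosstable. A sketch with hypothetical games only (not the actual WAGC data), where "T" stands in for the Taiwanese player and a player's score is simply their number of wins: flipping a later game between two other players changes T's SOS by one point, although T took no part in it:

```python
from collections import Counter


def sos_table(games):
    """SOS for every player, with a player's score taken as their number of wins.

    `games` is a list of (winner, loser) pairs.
    """
    wins = Counter(winner for winner, _ in games)
    opponents = {}
    for winner, loser in games:
        opponents.setdefault(winner, []).append(loser)
        opponents.setdefault(loser, []).append(winner)
    return {p: sum(wins[o] for o in opps) for p, opps in opponents.items()}


# Hypothetical mini crosstable; "T" stands in for the Taiwanese player.
on_board = [
    ("T", "Gibson"),      # round 2: T beats Gibson
    ("Bilen", "Gibson"),  # round 3, had Bilen played and won
    ("T", "Z"),           # some other game of T's
]
with_forfeit = [
    ("T", "Gibson"),
    ("Gibson", "Bilen"),  # round 3: Gibson wins by forfeit
    ("T", "Z"),
]

print(sos_table(on_board)["T"], sos_table(with_forfeit)["T"])  # 0 1
```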