Maximum likelihood

    Keywords: Tournament

This page needs wiki master editing.
Comment: Obviously.

Maximum Likelihood tie breaker methods use statistics to work out a tournament rank for players on equal scores to decide who has had the best performance. A simple example is to use your favourite rating program (e.g. from an online go server, or European Go Database) with everyone starting on the same rating.


[ext] Maximum Likelihood Estimation Primer

basic technique We can use GoR to calculate tournament performance rating. Several programs, for example GoREst, already exist implementing the formula to calculate change of rating (deltaR) after a tournament. All that needs to be done is to find the entry rating for which deltaR=0. Relatively trivial to program. It would be nice to see this incorporated in some pairing programs.

I think this is a good idea actually. For example, the consider [ext] results of the 2006 London Open Tournament. One player finished in 3rd as his first round opponent was unable to play in the last 2 days, costing him SOS. However discounting the first round he would have tied for first with Ondrej Silt. Using Tournament Performance GoR would have produced a different result. Something to consider for pairing program makers.

pwaldron?: From the table, the third place player still ends up losing on SOS tiebreaks even if the first and second rounds are discounted, although the gap becomes very small. The ML method has some advantages, although I would not want to be the tournament director trying to explain the statistics of the system to someone on the short end of the system.

After looking at the tournament grid, though, I have two questions:

  1. Was there some kind of seeding in the tournament? The eventual winner played only opponents who finished in the top eleven of the tournament. Naturally this helped his SOS score compared to the second- and third-place players, who got paired much lower down in the field.

It was a European McMahon, with the bar was set at 2dan I think. Although they seem to claim it was 3d (Cornel - Jantunen says otherwise)

  1. It seems that the SOS tiebreak of the third-place finisher was not adjusted for the fact that his first opponent skipped the final two rounds. Shouldn't the SOS score of Ben He (the third-place finisher) be credited with 0.5 points per missed round? This doesn't affect anything this time, but it seems unfair to penalize a player because one of his opponents didn't play all the games.

[ext] GoDraw adds the initial McMahon score for each round missed, which here was 0, I think.

xela: I've been searching for an online article describing exactly how to calculate ML ratings in the context of an Elo-type system, but I haven't found anything useful. Therefore I'm inventing something that sounds plausible to me. I hope that someone more knowledgable can either verify that this is correct, or else post something better.

I'm aware that this is much simpler than [ext] ML as explained at wikipedia, but it should be easier to calculate and still give the same answer.

The scenario: player A has played some games against a set X of players with known ratings, and has scored n points. It is desired to work out a rating for A based on those games. This might be used for tie-break purposes in a tournament, or to assign a provisional rating if A is new to this rating system.


  • If you choose some number h as a "hypothetical rating" for A, then for each player in X you can calculate the expected score in a game against someone with rating h. (The expected scores will be fractions between 0 and 1.) The sum of all those expected scores will be some number, call it e(h), which is the expected total score if a player rated h plays games against everyone in the set X.
  • The function e(h) is a monotone function: that is, if h increases then e(h) increases. Therefore there will be a unique value of h for which e(h)=n.
  • The value of h for which e(h)=n is the number most likely to be the correct rating for player A based on this data, so we call that number the maximum likelihood rating of A.
  • This h value can be found to any desired degree of accuracy by [ext] binary search.

Of course this fails if player A has won all their games (the ML rating will be "infinite") or if they have lost all their games.

xela: Thinking more about the above: based on e(h), it's theoretically possible to calculate a probability distribution function p(h) describing the probability that h is in fact the correct rating. Ideally, what we'd like to know is the h at which p(h) is maximum, as well as the second derivative. The derivative is interesting because it gives a good idea of how accurate the estimation is. Therefore textbook accounts of ML devote a lot of effort to finding out the derivatives of p(h)--a complicated process!

The method above ought to converge to the correct h, but won't tell you anything else about the function p(h), so it doesn't give a lot of information. But it's easier to implement than the more sophisticated algorithms, and will be useful for some practical purposes: for example, if you want to enter someone into an Elo-style rating system but need a way to calculate a provisional rank for them first.

Bass: xela, your method is not what is called "maximum likelihood", but I would guess it produces very similar and most likely more accurate results.

For a simple way to find a maximum likelihood estimation of player X's rank, you split the rank range into a number of "buckets", so that the resolution is good enough for you. For the sake of the exercise, let's assume one bucket for every amateur kyu/dan rank.

Then, put an initial number in every bucket. If you have lots and lots of game data, just put a "1" in every bucket and you should be ok. Otherwise, you might want to count (or estimate) all go players in the population you are measuring (say, [ext] all European players) that would fall in a given bucket, and put this number in that bucket.

Now, player X has played some games against the players in your population, so you'll want to take that into account. So now..

For each game player X has played, do the following:

  • call the opponent "player Y" and the result "R"
  • then, for each of the buckets in turn:
    • find out what is the probability of a a player in this bucket achieving result R against player Y. If you [ext] tabulate (or make a [ext] mathematical model of) the winning probabilities for each bucket vs. each bucket in advance, this step is easy.
    • multiply the number in this bucket by that probability.

And that's it. The numbers in the buckets are called likelihoods, the maximum likelihood is the biggest of these numbers, and the maximum likelihood estimate of player X's strength is the name of the bucket with the biggest number in it.

NB. there are loads of considerations to make when actually implementing this kind of algorithm on real world computers, see links on this page for some of them.

tapir: This page does not contain what is announced on the tie breaker main page. Tournament performance rating is the key here, not maximum likelihood. Or even everyone is talking about a different tie breaker here. For a genuine "Maximum likelihood estimation" my educated guess is the number of games is way too low in all tournaments. The tournament performance rating at which your GoR would not have changed instead presupposes the initial ratings of all other players (otherwise you can't calculate it). That is it is biased in favour of improving players.

Winning against an improving player (whose rating is below his current strength) gives you less "performance" as winning against a stagnating player of the same strength (whose rating is at his strength) does.

Maximum likelihood last edited by tapir on January 19, 2011 - 15:43
RecentChanges · StartingPoints · About
Edit page ·Search · Related · Page info · Latest diff
[Welcome to Sensei's Library!]
Search position
Page history
Latest page diff
Partner sites:
Go Teaching Ladder
Login / Prefs
Sensei's Library