Tom/Go musings

Sub-page of Tom

I don't think I'm up to a blog, but I have a few ideas about go that I'd like some feedback on. When I have time and inclination, I'll post them on this page. Please feel free to add corrections,criticisms, comments and questions, and anything else you feel like (within reason) Tom.

Musing 1: Information theory and go

Information theory, largely due to Shannon [ext] http://en.wikipedia.org/wiki/Information_theory, shows how to define the information rate of an information source (which is measured in bits per second). A source here is something that randomly produces symbols from a finite alphabet according to some probability distribution.

For example, we can consider a coin tossed once per second as an information source. As you might guess, the information rate of this source is 1 bit per second. According to Shannon's theory, a 1 bit per second modem could communicate the results of these coin tosses, but only just. There is no possible scheme of encoding the results of the coin tosses that will enable them to be communicated through a 0.99 bit per second modem.

For coin tosses this is intuitive I think, but Shannon studies more complicated sources. For example, a source that produces a single letter of English language every second. What is the information rate of that? Suppose that the source's alphabet contains only 27 letters: our usual 26 together with 'space'. Each symbol this source produces requires log(27) bits to store it (all logs here are base 2), so the information rate of the source is at most log(27) bits per second (about 4.75 bits per second).

However, this estimate does not exploit any of the structure of English, and actually the rate of our source is lower. If we use some of this structure, for instance the probabilities of the various pairs of letters (a grid of 27*27 probabilities), then our source can be encoded more efficiently showing that its rate is somewhat less. As we use more sophisticated descriptions of our source, we can find more efficient ways to encode it.

If I recall correctly, Shannon estimates that the information rate of this source is about half of 4.75 bits per second. To do this, he actually defines a set of better and better models of this source, starting with the two above, and calculates their information rates. He also displays text produced by these models. As the model becomes better, the text appears more and more like english.

This theory reminds me of learning go. A perfect (but not machiavellian) go player has a certain amount of choice in his moves. He is only constrained not to lose any points. At the beginning of the game, the set of points he can choose is symmetrical under rotations and reflections of the board. I don't know how many choices he has, but he almost certainly has more than one, and if he has only one, then his perfect opponent has more than one response. Towards the end of the game, we can analyse perfectly, and we know that there are frequently many different plays a perfect player could choose.

We consider a perfect go player who chooses arbitrarily when he has a choice is an information source. The degree of freedom of our english source is given by its information rate. Similarly, the degree of freedom of our perfect go player is given by her information rate measured in bits per game say. To make the concept precise, lets agree that the information rate of any information source of go moves is measured in terms of its play against such a perfect player.

The successive model's of an English source seem to me analogous to a sequence of go players of various levels of skill. The source that chooses a letter at random corresponds to a very simple go playing algorithm that chooses a legal move at random. An order-of-magnitude estimate of the information rate of this source can be had by approximating the number of moves in a game as 2 X 100 and supposing that there are 200 choices at each move. This gives an estimate of 100 log(200), or about 750 bits per game.

In order for a source to be a stronger go player, its play must be more constrained, and therefore its data rate must be lower. Actual players have a lower data rate than they need to have to attain their strength because of factors like psychological hang-ups and style. However, for any given level of strength (requiring a komi of x points from the perfect player to achieve 50%), there is a source with the highest possible data rate (i.e., the greatest possible freedom of choice). The perfect player and the random player described above are examples of sources attaining this maximum. All other sources attaining this maximum have data rates between these two.

In this way, we can get a correspondence between playing strength and data rate. A higher playing strength will correspond to lower data rate. We could plot a graph with playing strength (or rather weakness) in terms of the komi-required measure above, on the x-axis and data rate in bits per game on the y-axis. The graph would slope downwards to the right.

I think the correct komi for the random player would be a bit less than 361. Maybe 320. This estimate is based on the players taking turns, and supposes that all the random player's stones get captured repeatedly until there is nowhere left to play.

With no good reason, I'm going to conjecture that the perfect player has on average (geometrical mean, naturally) 2 possible choices at each move, giving her a data rate of 100 bits per game.

I don't have much feel for the shape of the graph. I keep changing my mind as to whether it is likely to have a convex or a concave shape.

ilan: About 18 years ago, I had some thoughts about information theory and computer game playing. I defined the entropy of a position as follows: One considers a special type of evaluation function which assigns a probability F(M) to each move M, corresponding to how good the position will be compared to the positions after other possible moves. For example, one can do this based on an evaluation function G(P) for positions P, normalised to takes non negative values, and define for a move M

F(M) = G(M)/Sum (all moves N from P) G(N)

The entropy is then defined to be

E(P) = - Sum (all moves N from P) F(N) Log F(N).

Basically, low entropy means that there is only one good move, and high entropy means that there is a choice among many equivalent moves.

When making a move, one can consider the change in entropy to be the amount of information you are giving to your opponent. For example, a change from high entropy to low entropy means that in a situation with many possible moves, you are forcing your opponent to make one move, i.e., telling him where to play (information!).

My original idea was that at the time, a good way to play against a computer was to try to play for strategical positions where there was no clear move, and let the computer self-destruct. I thought that using the entropy concept could be a way to teach computers this anti-computer strategy. The maximum entropy method is also good when playing a weaker player, that is, you try to confuse him maximally instead of dictating his moves.

Charles Yes, that is a useful concept for handicap go.

Tom/Go musings last edited by jantiff on December 4, 2004 - 17:47

RecentChanges · StartingPoints · About

Edit page ·Search · Related · Page info · Latest diff