Chrisd/ComputerGo

Sub-page of Chrisd

First attempt

The first attempt I had at go programming was about a year ago (at the time of writing this it is Jan 2006). I tried to use a neural net and evolutionary algorithm to adjust the parameters of the neural net.

Evalution function

The idea is that the neural net should yield as the output the score of a position if it would be played perfectly. I.e., it would be an evaluation function. If the computer player is to move, all possible moves are tried and the one that gives the opponent the worst (lowest score under perfect play) score would be selected. The interpretation of the output as the score that could be obtained by perfect play yields the consistency condition that the score of a position should be equal to the minimum of the scores of all the follow-up positions. Learning would occur because the difference of these two numbers would measured and the network with the best understanding over the course of the game, would be chosen as the best player.

Don't fill your own eyes

I discovered that for such an approach it is mandatory that the games played in course of the evolution of the neural net should make some sense. Games that do not make sense are, for instance, ones where a player is filling his own eyes. This happens if you don't do anything about it. Hence I wrote an algorithm to recognise groups that had eyes and because of the eyes could never be captured. It was close to Bensons algorithm altough a bit weaker. This is because I only looked at eyes that consist of one single point. Play in such eyes was forbidden. It should be noted that sometimes, because of ko, a play in your own eyes is perfect play. This, however, is very, very rare so I decided to ignore this possibility.

Failure

The net now discovered on its own that on a 9x9 board starting near the middle of the board is quite good. However, the problem was that even after running for a long time there did not seem to be much improvement in the networks decision when to pass. Hence the games often ended at ridiculous moments. The resulting game ends at the wrong time would of course also give a strange final score. Strange final scores are a bad basis for the evolutionary algorithm and hence I concluded that this attempt was not going to result in a reasonably-playing program anytime soon.

Another Attempt

When to pass?

Recently I looked again at my program and decided try to tackle the problem when to pass. Using Chinese rules is the most convenient if you are trying to start with a go playing program. Under these rules it does not cost points to play in one's own territory if there are no valuable moves left anymore. Hence one might as well continue playing until the entire board is filled with regions that are unconditionallly owned by one player in the sense of Bensons algorithm. I will refer to such regions as Benson-regions. The problem here is seki.

Seki

I define a seki as the intersections of the board that are not Benson regions and where (for both players) any other move than passing would cost points. Note that there is a page here on SL that attempts to define seki otherwise. It must also be noted that all the "definitions" on that page are more then a bit naive. If their respective authors had taken the trouble of looking at the numerous example of complicated sekis at Sensei, I do not think they would have posted their definitions.

The definition I gave above needs knowledge of perfect play to work. This is a bit of a painful point and the remainder of this section is about making this acceptable without sacrificing too much accuracy. I.e., the goal is to detect any seki that could reasonably be expected to occur in a real game within a short amount of time (say, at most ten seconds on a modern computer).

Firstly, we would not want to invoke a routine that starts looking for perfect play during the fuseki. Therefore, a position should satisfy a few criteria before we start looking for perfect play. First all Benson-regions are identified. The criteria then are

  • There is no empty intersection next to a Benson-region.
  • All strings that are not part of Benson-regions have one, two of three liberties.
  • All strings of liberties in the non-Benson-regions consist of one or two liberties.
  • If a string is adjacent to one intersection of a liberty-string, it is also adjacent to the other intersection.

I am aware that this these criterea very probably exclude some sekis, but hope that one has to come up with real hairy examples to show this. Only if these points are satisfied perfect play will be attempted. The function play_perfectly() will be called for this. To make sure that for not too big sekis this will actually reach an end before the sun goes nova we apply heuristics. The function play_perfectly() returns the score under perfect play. The following shortcuts are used to speed it up.

  • If the entire board consists of Benson-regions, return the appropriate result immediately.
  • If the non-Benson part of the board is completely surrounded by a Benson-region of one color, check whether these surrounded intersections have enough room for two eyes for the opponent. If they don't immediately assume that these intersections are owned by the player that surrounds the intersections.
  • If a move has been found that wins all the non-Benson intersections, don't look at other moves.

The last point greatly speeds up the search. That is, provided that we start by looking at good moves. The speed-up is most important if one player can convert a non-Benson region into a Benson-region of his own. Lots of time can be saved by doing this in as few moves as possible. To make sure that we start with good moves, we use some rules. For every possbile move the quantity Q is defined as

Q = 5*(number of Benson-intersections created by the move) + 4*(number of stones directly or indirectly connected to Benson-intersections in the sense of miai) + 3*(number of empty intersections adjacent to Benson-intersections or to intersections as were counted in the previous term).

By "connected in the sense of miai" I mean that these intersections can be connected to a Benson-region in (at least) two possbile ways. Moves that have the largest value for Q are tried first. If all moves have the same value for Q this is not a useful heuristic and moves are ordered using the priorities

  • Moves that capture an opponent group.
  • Moves that pull an own group out of atari.
  • Moves that are not on the border.
  • Moves that are not in the corner.

These rules were found by trying to identify the seki-positions that can be found in "Go for beginners" by Kaoru Iwamoto. Quite a lot of fiddling was necessary before an acceptable solution was found. The hardest position I considered was

[Diagram]
corner  

It is solved in about four seconds. The other test positions go much quicker. Typically they are solved within the second. The bad news is that it is still quite possible to construct seki-positions that cannot be solved within reasonable time. Because of this the function play_perfectly() counts how many times it has been called recursively. It this is more than 50000, we simply assume that the position is not a seki and that play should continue.


Evaluation function

I wrote an evaluation function. This function, when given a position returns a number between 1 and -1 for every intersection of the board. If the number is 1 the intersection will certainly be owned by Black at the end of the game (as far as the evaluation function knows, which may not be very far in case of bad evaluation functions). This number is supposed to be the expectation value of the contribution of a point to the score in area scoring in the case of perfect play. Of course, saying that it is an expectation value implies that for every intersection there is a probability that it will belong to black or to white area at the end of the game. This is because in general there will be many lines of play that attain the highest possible score. The idea is to avarage for a board position over all possible moves that would in the end lead to the highest possible score. If I had an evaluation function that was capable of attaining perfect play it would satisfy the consistency condition that the value returned for a particular intersection is the averge of the values returned for this intersection after all moves that are compatible with perfect play. The difference between these two numbers gives an idea how good or bad the evaluation function is performing.

The difference mentioned at the end of the last paragraph can be used as an objective criterion to decide when the evaluation function is in error. If the two ways of obtaining the expectation value differ by more then a suitably chosen value (at the moment I am using 0.6), then the evalution function is considered to be in error and code should be written to correct it. I think that having such an objective criterion is important because without one the only way to know what parts of a go-playing program should be improved would be based on what humans think of the play of the program. This is not so desirable because even if these humans happened to be 9p, they still would not know for certain what would be the best way to play in every situation. Hence we would imperfectly coding the imperfect knowledge of humans. This is prabably a disadvantage.

Also, because of the seki-detection algorithm described above, I will know the exact score at the end of the game and I no longer have to rely on an inaccurate evalution function but can instead use the true result to see how accurate the evaluation function was one move before the end of the game.

The obvious thing to do now is run a go-program based on the evaluation function as just described and find out when the evalution function is in error (as defined above). Because I will have to start out with something I use an evaluation function that just detects Benson-regions, considers every stone on the board alive and uses an influence function in which influence is supposed to drop exponentially as we get further away from the stone(s) excerting the influence.

The play that results is rather funny. I will show a position obtained after fifty moves.

[Diagram]
19x19 diagram  

As you can see, the program enjoys alternating lines. The reason is that my influence function is such that a stone that is put in front of another completely shields the infuence of it. This is probably not very realistic, but maybe I will address that later.


Improving the evaluation function

Now that I can let my program play itself and judge its own play, I will automatically discover when the evaluation function gives wrong results. When improving the evaluation function, I start from the end of the game. So I first look at the last inconsistency that is found in a generated game. After that inconsistency is fixed, I start looking at the other ones. I will report on the kinds of inconsistencies that I encounter and fix on a separate page as this one is getting quite long.


Chrisd/ComputerGo last edited by 213.140.6.119 on February 12, 2006 - 17:07
RecentChanges · StartingPoints · About
Edit page ·Search · Related · Page info · Latest diff
[Welcome to Sensei's Library!]
RecentChanges
StartingPoints
About
RandomPage
Search position
Page history
Latest page diff
Partner sites:
Go Teaching Ladder
Goproblems.com
Login / Prefs
Tools
Sensei's Library