Fluke

Keywords: Software

Fluke is a Monte Carlo Go playing program written in Java by Tim Foden. Development started around 16th January 2008.

I've been meaning to start writing a Go program for some years now. Having recently noticed the success of the Monte Carlo approach, I realised that it wouldn't take anywhere near as much work to get a program off the ground, and this prompted me to start.

Thanks to everyone for all the tools and services like GoGui, GTP, and CGOS. These enabled me to get an engine that can actually play others after only about 1 week of development in the evenings.

Cheers, Tim.

Version information

080309a-10k 1677 (257 games)
080309e-10k 1668 (257 games)
080309f-10k 1648 (257 games)
080309c-10k 1639 (257 games)
080309b-10k 1634 (257 games)
080309d-10k 1620 (257 games)

UCT-RAVE now. Varying the Cr constant (the one in the RAVE part of the formula). Cu is sqrt(1/5) for all tests.
1. (a) Cr = 0.25
2. (b) Cr = 0.5
3. (c) Cr = 0.75
4. (d) Cr = 1.0 -- this is the default/control.
5. (e) Cr = 1.25
6. (f) Cr = 1.5

080308b-10k 1679 elo (188 games)
080308e-10k 1672 elo (188 games)
080308a-10k 1650 elo (186 games)
080308d-10k 1647 elo (185 games)
080308c-10k 1640 elo (184 games)

UCT-RAVE now. Varying the Cu constant (the one in the UCB1 (non-rave) part of the formula). Cr is 1.0 for all tests.
1. (a) Cu = sqrt(1/3)
2. (b) Cu = sqrt(1/4)
3. (c) Cu = sqrt(1/5) -- this is the default/control.
4. (d) Cu = sqrt(1/6)
5. (e) Cu = sqrt(1/7)

080306-10k 1570 elo (269 games)
080306b-10k 1627 elo (189 games)
080306c-10k 1621 elo (200 games)

This is back to a non-RAVE version, in an attempt to determine why I've not replicated the performance of the 080225 version with newer code. Could be a bug. Could be slippage of the ratings in CGOS. Could be that I've missed something that should be the same, but which is different. More investigation required. (update) 1570 elo looks fine to me. :)
1. (b) ??
2. (c) ??

080304-10k 1380 elo (302 games)
080304b-10k 1560 elo (299 games)
080304c-10k 1620 elo (299 games)
080304d-10k 1530 elo (282 games)
080304e-10k 1430 elo (276 games)
080304f-10k 1710 elo (252 games) (best version so far, hurrah!)

... finally, a test with somewhat of an improvement!
The first 3 versions were better, but still didn't seem right to me.
1. 304 - Random playouts.
2. 304b - Atari-aware playouts.
3. 304c - Mogo based playouts.
Need to look at the source to remember what this version did.
1. 304d - ??
These two versions were pretty good. :) I still feel I'm missing something though.
1. 304e - Random playouts.
2. 304f - Mogo based playouts.

080303-10k 1499 elo (152 games)
080303b-10k 999 elo (151 games)
080303c-10k 1400 elo (152 games)
080303d-10k 1074 elo (148 games)
080301e-10k 1422 elo (103 games)

RAVE again. Still not right.

080302-10k 1489 elo (184 games)
080302b-10k 1407 elo (146 games)
080302c-10k 1154 elo (143 games)

More RAVE attempts. Going nowhere fast.

080301rv-10k 1445 elo (206 games)
080301r2-10k 1291 elo (161 games)
080301r3-10k 1473 elo (204 games)
080301r4-10k 1481 elo (197 games)

Here is a first (failed) attempt to add RAVE to the UCT search.

080228-10k 1480 elo
080228b-10k 1475 elo
080228c-10k 1500 elo
080228d-10k 1450 elo

These were all basically the same as the 080227/25 versions, except that they used a faster implementation of the 3x3 pattern matcher (still with patterns based on the Mogo paper). Uct formula: winRate + sqrt(ln(totalPlays) / (x * plays)). [n.b. This pattern matcher includes a 'fix' so that the 'any' pattern doesn't match points off the edge of the board -- this may not be the correct behavior.] (update) None of these versions were as strong as the 080225 version, which implies to me that I may have a bug in the new 3x3 pattern code.
1. 080228 - uct: x = 6, playout order: extend, pattern, capture, random.
2. 080228b - uct: x = 2, playout order: extend, capture, pattern, random.
3. 080228c - uct: x = 5, playout order: extend, capture, pattern, random.
4. 080228d - uct: x = 2, playout order: extend, pattern, capture, random.

080227b-10k 1500 elo (164 games)
080227-10k 1540 elo (164 games)

These were both exactly the same as the 080225 version, except for a change to the selectivity of the UCT formula: winRate + sqrt(ln(totalPlays) / (x * plays)).
1. 080227 - uct: x = 5.
2. 080227b - uct: x = 4.

080226-10k 1420 elo (236 games)

Removed the pattern matcher from the playouts. This allow the speed to increase to about 5,000/sec in the opening. It looks like the patterns do add quite a lot to the strength (+130 in this instance).

080225-10k 1550 elo (295 games)

Changed PASS move integer value from board area (e.g. PASS = 81 on 9x9) to always being 0. This version uses UCT search (x = 2) with a Mogo rules based playout (See paper entitled "Modifyication of UCT with Patterns in Monte-Carlo Go"). The playouts are now very slow, down from about 8,000/sec to about 800/sec in the opening on my 1.66GHz Core2 Duo laptop. This was the strongest 10k UCT searcher so far.

080223-10k 080221-10k 1410 elo (408 games)

Looks like I didn't check in the source to 02-21, but I do have 02-23. Changes here are to use normal UCT search with Atari-aware playouts.

080220-10k 1310 elo (157 games)

Added a naive AMAF implementation into the UCT search. If there are n moves that will be considered valid for AMAF at a node, then we update the rewards by adding reward / n, and the plays by adding 1 / n. 'Standard' random playouts were used for this test.

080218-10k 1200 elo (270 games)

Now uses UCT again, but still with 'standard' random playouts. The UCT move choice uses the adjusted formula introduced for 080209. I'd expect this version to be rated about 1,250 elo. (update) Rating was only 1,200... perhaps a bug in the new UTC implementation?

080216a2-10k 1440 elo

This version defines n, the number of moves to use in AMAF from the playout, as all moves up to (but not including) the 1st capture. This version only seems about 70 elo weaker than the version below, but on average uses at quite a lot fewer moves from the playouts (estimate about 3/5). Fewer moves, but guaranteed better info.

080216am-10k 1510 elo

There was a bug in my AMAF... I didn't check that the move from AMAF was valid in the root position, so this would have affected how UCB1 selected moves, as the total number of moves would have been incremented with no other increment in the stats for the valid moves. For this version I've fixed this bug. I'll have to wait and see how this will affect the elo rating. (update) It appears to have made a +120 elo difference.

080214-10k 1000 elo
080214am-10k 1390 elo

This version of Fluke has been extensively re-factored to sort out problems with my initial 'design'. :) I may well have introduced various bugs whilst doing this.
Here I have gone back to the basic UCB1 (non-tree) search. The am-10k version is using the all moves as first (AMAF) heuristic. It choose to use the first n moves from the playout, where n = pow(num-playout-moves, 0.75). This is in an effort to compare with myCtest-10k and myCtest-10-AMAF, which gained about 420 elo from adding AMAF. My version only seems to have gained 390 elo... perhaps I have a bug in my AMAF implementation? (update) I now wonder if the non AMAF version has a bug, as I did have a bug in the AMAF version (see 080216a-10k), and 390 elo is not so different from 420 elo.

080209-10k 1440 elo
080209-50k 1660 elo

These are exactly the same code as the plain 080209 version, except for the code that decides on the number of nodes to search. All versions up to this point searched (num-legal-moves x 1000) nodes. E.g. in the opening they would search 81,000 nodes. The two versions given here instead always search a fixed number of nodes... 10k == 10,000 nodes, 50k == 50,000 nodes. It is interesting to observe that the fixed 50,000 nodes version looks only slightly stronger than the 1000*n variable nodes version.

080209 1640 elo - A substantial improvement.

Changed non-solid eye classification to try to detect false eyes. Non-solid eyes that have 50% or more corners occupied by enemy stones are regarded as false, and become candidates for moves.
Changed UCB1 move selection confidence bound from sqrt(2 log(m) / n) to sqrt(log(m) / (2 n)). This makes UCT quite a lot more selective.
Changed something else I can't remember right now.

080208 1430 elo - Going backwards here!

Changes here include changes for 080207, as I lost the source for 080207 (overwrote it by mistake, doh!).
Upgraded the atari aware playout to give priority to any moves which extend or capture a group with 1 liberty. The extension will occur whether or not it will increase the number of liberties of the group.

080207 1460 elo

Source for 080207 is lost, so specific changes for here cannot be devined.

080129 1440 elo

Initial stab at a UCT driver. It uses the UCB1 driver exactly as is, but in a tree formation. A 120 elo improvement over the plain UCB1 driver version. This isn't as much of an improvement as I expected... perhaps I have a bug here, or have misunderstood something in the implementation of the agorithm?

080127 1320 elo

Added a UCB1 driver for 1st move selection (for playouts). Chooses moves based on the standard UCB1 selection criteria. e.g. Chooses move with maximum v given by: double v = m_rewards[move] / m_plays[move] + Math.sqrt(2 * Math.log(m_totalPlays) / m_plays[move]).
Added an "atari aware" playout mode. Currently this only detects moves which cause self-atari, and if so, makes the next move capture the stone with a 50% chance.

080126 1280 elo

Changed playout move driver to select moves with probability proportional to the cube of the average win rate.

--- below this line from memory only... I no longer have the source code ---

080125 1270 elo

Changed eye rule for non-solid eyes. Now only fills non-solid eyes if one of the eye-surrounding groups is in atari.
Changed playout move driver to select moves with probability proportional to the square of the average win rate. e.g. W = win-rate ^ 2;

080124 1030 elo

Changed eye rule for non-solid eyes. Now only fills non-solid eyes if none of the eye-surrounding groups has less than 3 liberties.
Added probabalistic move driver for selecting the 1st move of the next playout. Weight for each move W = 1 + 99 * win-rate.

080123 940 elo

Initial version that can actually play go at all (via GTP). :) 1000 Playouts / valid move on the board (thus on an empty 9x9 board it does 81,000 playouts). On 9x9 does about 9,000 playouts / second on my Core2 Duo 1.66 GHz based laptop (raises to about 18,000 /s near the end of the game).

Playouts are random except for:

Don't play suicide moves.
Don't fill eyes. Eyes are defined as solid (all surrounding groups have same group-id) or non-solid. Non-solid eyes are only filled if the total number of liberties L is greater than 2. L = sum(group[n].libs - 1) + 1.
Don't play in position of simple ko.

Fluke 9x9 data from CGOS.

Name			ELO	Games	Date
----------------------------------------------------------
Fluke-080309a-10k	1677	257	2008-03-11 10:52:23
Fluke-080309e-10k	1668	257	2008-03-11 10:52:23
Fluke-080309f-10k	1648	257	2008-03-11 10:52:23
Fluke-080309c-10k	1639	257	2008-03-11 10:52:23
Fluke-080309b-10k	1634	257	2008-03-11 10:52:23
Fluke-080309d-10k	1620	257	2008-03-11 10:52:23

Fluke-080308b-10k	1679	188	2008-03-09 18:18:39
Fluke-080308e-10k	1672	188	2008-03-09 18:18:39
Fluke-080308a-10k	1650	186	2008-03-09 18:18:39
Fluke-080308d-10k	1647	185	2008-03-09 18:18:39
Fluke-080308c-10k	1640	184	2008-03-09 18:18:39

Fluke-080306b-10k	1627	189	2008-03-08 13:07:05
Fluke-080306c-10k	1621	200	2008-03-09 18:18:39
Fluke-080306-10k	1570	269	2008-03-08 13:07:05

Fluke-080304-10k	1383	302	2008-03-06 19:05:37
Fluke-080304b-10k	1556	299	2008-03-06 19:05:37
Fluke-080304c-10k	1620	299	2008-03-06 19:05:37
Fluke-080304d-10k	1527	282	2008-03-06 18:46:53
Fluke-080304e-10k	1426	276	2008-03-06 19:05:37
Fluke-080304f-10k	1709	252	2008-03-06 22:28:05

Fluke-080303-10k	1499	152	2008-03-04 19:22:18
Fluke-080303b-10k	999	151	2008-03-04 19:22:18
Fluke-080303c-10k	1400	152	2008-03-04 19:22:18
Fluke-080303d-10k	1074	148	2008-03-04 19:22:18
Fluke-080301e-10k	1422	103	2008-03-04 23:59:49

Fluke-080302-10k	1489	184	2008-03-03 19:49:19
Fluke-080302b-10k	1407	146	2008-03-03 19:49:19
Fluke-080302c-10k	1154	143	2008-03-03 19:49:19

Fluke-080301rv-10k	1445	206	2008-03-02 22:03:55
Fluke-080301r2-10k	1291	161	2008-03-02 16:02:17
Fluke-080301r3-10k	1473	204	2008-03-02 22:03:55
Fluke-080301r4-10k	1481	197	2008-03-02 22:03:55

Fluke-080228-10k	1481	261	2008-03-01 12:50:27
Fluke-080228b-10k	1475	243	2008-03-01 12:50:27
Fluke-080228c-10k	1501	242	2008-03-01 12:50:27
Fluke-080228d-10k	1451	176	2008-03-01 10:25:37

Fluke-080227b-10k	1503	164	2008-02-28 23:22:29
Fluke-080227-10k	1543	164	2008-02-28 21:01:29
Fluke-080226-10k	1424	236	2008-02-27 19:26:01
Fluke-080225-10k	1554	295	2008-02-27 19:07:21

Fluke-080221-10k	1409	408	2008-02-24 07:59:40
Fluke-080220-10k	1313	157	2008-02-21 20:42:58
Fluke-080218-10k	1196	270	2008-02-20 22:25:22

Fluke-080216a2-10k	1442	197	2008-02-17 08:26:26
Fluke-080216am-10k*	1473	381	2008-03-05 07:19:33
Fluke-080216am-10k	1514	236	2008-02-17 08:26:26
Fluke-080214am-10k	1393	277	2008-02-16 16:02:46
Fluke-080214-10k	998	416	2008-02-17 08:26:26
Fluke-080209-50k	1658	161	2008-02-12 17:39:54
Fluke-080209-10k	1437	165	2008-02-12 21:02:14
Fluke-080209		1637	455	2008-02-11 18:16:58
Fluke-080208		1429	152	2008-02-08 23:44:48
Fluke-080207		1456	125	2008-02-08 00:33:23
Fluke-080129		1438	160	2008-01-30 20:27:03
Fluke-080127		1318	223	2008-01-28 21:00:37
Fluke-080126		1278	147	2008-01-27 08:26:20
Fluke-080125		1270	118	2008-01-26 10:34:04
Fluke-080124		1032	120	2008-01-25 18:14:21

Fluke last edited by OneWeirdDude on December 14, 2009 - 19:31

RecentChanges · StartingPoints · About

Edit page ·Search · Related · Page info · Latest diff