Suicide Checkers

I added a couple of afterthoughts on the match after writing the original version of this report.
Suicide checkers is a checkers variant in which the outcome of the game is reversed compared to regular checkers: if you have no pieces left, or if you can't move any more, you win the game. I wrote a suicide checkers program in early 2005; there are some posts on it in my checkers blog for February and March 2005.

I never published it, because suicide checkers is one of the games that can still be played online without cheaters, since there is no good program to cheat with. I won't publish Suicidal Cake as long as things remain this way. If you want a computer sparring partner, try HCheckers (link at the end of the article).

I played a few games against some well-known suicide checkers players like George Miller and Michael Taktikos. Suicidal Cake won every single game it ever played against them. I more or less stopped working on it after that, since there was no incentive to develop it further.
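For the curious, here is a minimal sketch of what the reversed game-over test looks like in code. This is a hypothetical illustration, not Suicidal Cake's actual code; the function and its arguments are invented for the example.

    #include <stdint.h>

    /* Hypothetical sketch of the reversed win condition, not code from
     * Suicidal Cake. own_pieces is a bitmask of the 32 playable squares
     * held by the side to move; num_legal_moves is the size of its move
     * list (captures are forced, just as in regular checkers). */
    int side_to_move_wins(uint32_t own_pieces, int num_legal_moves)
    {
        /* In regular checkers, no pieces or no moves means you LOSE;
         * in suicide checkers the very same condition means you WIN. */
        return own_pieces == 0 || num_legal_moves == 0;
    }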

The Challenge

On January 18, 2006, I received the following e-mail from Jonathan Schaeffer:

Greetings!
I have a couple of students who built a suicide checkers program as a course project. They have tinkered around with it since then and are curious to know just how good it is. Obviously it kills all humans. I understand you have a strong suicide checkers program. Would you be willing to have your program play a match with them? Nothing at stake, except to satisfy some student's curiosity.

Of course, I immediately turned on my book generator for the suicide checkers opening book and looked at Suicidal Cake's code once again. First of all, I noticed that I had managed to lose the latest version of my code. Fortunately, that version only contained the changes necessary to move from the 6-piece to the 7-piece endgame database, so I hadn't lost much and quickly redid them. I also added another extension mechanism that seems to help in suicide checkers. I knew I had a good program, but would it be good enough? The Canadian students, Mike Smith and Frano Sailer, had computed the 8-piece endgame database. We both produced short advertisements for our programs, which ran as follows:

Propaganda

Suicidal Cake 1.11 by Martin Fierz will be running on an AMD Athlon 64 3400+ with 2 GB RAM. SC has an opening book based on about 200'000 positions, and the full 7-piece endgame database (which is about 2 GB compressed). It searches about 3'000'000 positions per second. It is based on Cake Manchester 1.05, with some changes for suicide checkers (some obvious ones, like the necessary rule changes, and some less obvious ones concerning the search behavior). It has played some games against human experts and won every single one of them.

Roshi47 by Mike Smith and Frano Sailer is based on a suicide checkers program developed for a graduate course at the University of Alberta, in Jonathan Schaeffer's class (that's right, the guy who made Chinook!). Mike and Frano are graduate students in Schaeffer's group. The initial search program had an undefeated tournament record. Since then, it has been enhanced with 8-piece endgame databases, which total over 18 GB in size (compressed!). Its evaluation function has also been trained by a machine learning algorithm through self-play; the new learned evaluation defeats all previous versions. Roshi47 will be running on a machine with specs approximately equivalent to the one Suicidal Cake will be running on. The name comes from an ancient Japanese story about the 47 Samurai.

I learned during the match that Frano actually works with Michael Buro of ProbCut fame rather than with Schaeffer. I also noticed that I had mixed up my machines: Suicidal Cake was in fact running on an AMD Athlon 64 3000+ with 1.5 GB RAM. Roshi47 ran on an Opteron 250, which is very similar to my Athlon 64, except a bit faster (2.4 GHz instead of 2.0). Their machine had 8 GB (!) of RAM, but they were not using very much of it (they couldn't tell me exactly how much).

The Match

The match was played on Kurnik on March 8, 2006, and it turned out to be a more lopsided affair than we had expected: Suicidal Cake won all four games we played. I don't think that Suicidal Cake's book made a big difference - it got good positions out of the opening, but it also won one game where it judged the position equal after dropping out of book.

Mike and Frano changed the evaluation function after the first two games, but to no avail. To my surprise, their second evaluation function was a random evaluation! They told me that it was better than all the simple hand-crafted evaluation functions they had tried, and added that "that goes to show how hard it is to write a good evaluation function in suicide checkers". Looking at the games, I'm not sure... in principle it is certainly hard to come up with a good evaluation function for suicide checkers, but some simple principles do apply, like "grabbing material is good for you!" I would be very surprised if a very materialistic evaluation didn't beat a random one.

While chatting, I learned that their evaluation function is produced through automatic learning, so it can more or less improve by itself - I suspect that they will be back some day to ask for revenge! Just in case, I'm keeping my book generator running.
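To make the point concrete, here is the kind of "very materialistic" evaluation I have in mind. This is a sketch with invented weights, not the evaluation of either program; following the principle quoted above, material is scored with the same sign as in regular checkers.

    /* Sketch of a purely materialistic evaluation for suicide checkers.
     * The weights are invented for the example; per the principle
     * "grabbing material is good for you!", positive values are good
     * for red, just as in the regular game. */
    enum { MAN_VALUE = 100, KING_VALUE = 130 };

    int material_eval(int red_men, int red_kings,
                      int white_men, int white_kings)
    {
        return MAN_VALUE  * (red_men   - white_men)
             + KING_VALUE * (red_kings - white_kings);
    }

Even something this crude should, I believe, have a clear edge over a random evaluation.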
Enough said - you can replay all the games by clicking on the links below. In the end, we authors played two games as well, which you can also replay! That's something I had already wanted to do at the 2002 Las Vegas computer championship, but somehow we never found the time back then.

The Games

The games have no comments for the moment; perhaps I'll add some light notes in the future.

Afterthoughts

Mike and Frano were very confident about their evaluation function. They had used a learning method described in the paper by Schaeffer et al. linked below. In that paper, Schaeffer and his co-authors claim that this type of learning works very well: they found that automated learning produced an evaluation function for Chinook that was just as good as their hand-tuned one. So why didn't it work for Roshi47?

One of the problems is that this learning algorithm cannot learn patterns - you need to feed it the features it should try to evaluate, and it will learn the correct weight for each feature. In Chinook, the features themselves were a product of human learning about what to evaluate and what not. In Roshi47, the features were probably poorly chosen: AFAIK they were using 2x32 features describing whether a man was on a square or not, which results in evaluations like "a red man on square 21 is worth -20 points". Game two is a nice illustration of this red-man-on-21 theme. However, the red man on square 21 by itself is not the problem - the problem is that it is there combined with the white men on 25 and 30! Only this pattern allows white to constantly menace 30-26, with dire consequences for red. This is the feature they should have been evaluating, and as long as their evaluation function cannot learn about it, all the learning in the world will do no good!

As a side note, there is a table in the Schaeffer et al. paper below which lists the features that Chinook was evaluating. I was quite surprised by that table: Chinook's evaluation function looks very simple compared to Cake's, and a lot of things that seem important to me are not in the table, while other things that don't seem important to me are. Of course, this doesn't mean that Chinook's evaluation is no good - it just means that the Chinook team chose very different features to evaluate than I did. For me this is a strong indication that any learning effort that just takes given features to evaluate can't work: true learning should also be able to recognize new features to evaluate!
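To illustrate why per-square features cannot capture such patterns, here is a sketch of the kind of linear evaluation described above, together with the conjunction feature that the red-man-on-21 theme would actually require. The code is hypothetical: the 2x32 feature set is what I understood from our chat, while the square-to-bit mapping and weight arrays are invented for the example.

    #include <stdint.h>

    /* Learned weight of "red man on square s" and "white man on
     * square s", each expressed as a contribution to red's score
     * (the white-piece weights would typically be learned as negative
     * values). Squares use standard checkers notation 1..32; square s
     * is mapped to bit s-1 here. */
    int red_weight[33];
    int white_weight[33];

    int linear_eval(uint32_t red_men, uint32_t white_men)
    {
        int score = 0;
        for (int s = 1; s <= 32; s++) {
            if (red_men   & (1u << (s - 1))) score += red_weight[s];
            if (white_men & (1u << (s - 1))) score += white_weight[s];
        }
        return score;
    }

    /* The theme from game two is a conjunction of three squares: a red
     * man on 21 together with white men on 25 and 30. No choice of
     * per-square weights can express this conjunction; the learner
     * would need it as a feature in its own right, e.g.: */
    int red_man_21_vs_white_25_30(uint32_t red_men, uint32_t white_men)
    {
        return (red_men & (1u << 20)) && (white_men & (1u << 24))
                                      && (white_men & (1u << 29));
    }

However many games the learner sees, tuning the 64 per-square weights can never make linear_eval fire on this three-square pattern specifically - that is the whole point.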

Another e-mail

After the match, I received the following e-mail from Jonathan Schaeffer:

To: Martin Fierz
From: Jonathan Schaeffer
Subject: Suicide Checkers
Date: Wed, 8 Mar 2006 16:48:02 -0700
X-Mailer: Apple Mail (2.746.2)

Congratulations! Experience trumps youth :) I think both
Mike and Frano are stunned at the result.

Now, that is something to be proud of :-)

Links

-- March 18, 2006