Parallelization
In computer chess, most of the good engines are now capable of running on multiple processors. In checkers, this is not yet the case. I never attempted to make Cake multi-core capable for two reasons: First, because until recently, multi-core machines were not relevant for end-users. Second, because the Chinook team has reported rather low speedup values for parallel checkers programs (compared to chess), and claims that this is because of the game, and not because of their implementation. Nevertheless, I would love to have a parallel version of Cake.Parallelizing complex programming tasks will certainly become more important in the future, as CPUs seem to have hit a GHz speed limit, and manufacturers are going for multiple cores instead of ever higher clock speeds. This is why this project is quite relevant outside of game programming.
The Oracle
In computer checkers, enormous endgame databases have been constructed, which contain information on the game-theoretic value (win, loss or draw) of every position with less than 8 (in my case) or 10 (Chinook, Ed Gilbert/KingsRow) pieces. These databases must be accessed during the search, that means they must be accessed very quickly. Because of their huge size, the databases don't fit in the RAM of a normal computer, and only parts of the database can be kept in the RAM at any time. Loading from the harddisk takes very much time, so the compression of the database is crucial. The better you can compress it, the more you can keep in RAM, and you save time because you need less disk access.The databases are run-length-encoded, that is, you have an enumeration of all N positions in the database, and write down the values of each position like this: WWWWWWWLLLWWLDDLLLLWWWWWLL..... where W is a win, L is a loss and D is a draw. Run-length encoding stores the above data like this: W6L3W2L1D2L4W5L2.... The longer the "runs" of the same value are in the database, the better RLE works. RLE is not a particularly efficient compression algorithm, but it allows very fast decompression, which is crucial for the use of the database in the search.
I have this idea that it should be possible to write an "endgame oracle" which will return the value of a position (win, loss or draw). Basically it's just an evaluation function with only 3 return values. You could write a separate one for each sub-database (e.g. for 4 men vs 4 men a different evaluation function than for 3 men vs 3 men). Now, of course this oracle will not be right all the time, but let's say, you manage to get it to be right most of the time. You now go through the entire database, checking whether your oracle is right or not, and encode every position where the oracle is right as O instead of W/L/D. You end up with something like this: WOOOOOOOLOOOOOLOODOOOOL..... and encode it as W1O7L1O5L1O2D1O4L1....
If the oracle is good enough, you will end up with a shorter compressed version of the database (ideally, you would be getting ON for a perfect oracle and N positions in the database, and that is pretty good!). You can then use it in your search to look up the database value, and if it's W/L/D you just use it, if it's O, you call the oracle in your search. The idea in this project is to write an endgame database oracle, and to explore how well it can reproduce the values in the database. I would imagine that something like 90% might be achieved. The next step would then be to see whether it is possible to get a better compression of the database with this technique.
This project involves ideas of optimization and data compression, both of which are important topics in computer science.
The Stepping Stone
Checkers has a property that many other games don't have: It is convergent - by that I mean the following: As the game progresses, exchanges of pieces are practically unavoidable, and men are promoted to kings. If you imagine a state space composed of the number of each type of piece, then checkers starts out with 12 men vs 12 men, and as the game progresses, it moves to - for example - 11 men vs 11 men, then 10 men vs 10 men, and so on. The game can't move back to more pieces. Because captures are forced (in English checkers, if you can make a capture you also must make it), the game tends to go very quickly to much fewer pieces. This is also the reason why the endgame databases are so important in checkers - because the game is virtually guaranteed to end up in the endgame database at some point.While the endgame databases are a huge success, they also contain an enormous amount of irrelevant positions. In FEN notation the position B:W31,27,25,20:B14,7,3,2. is quite sensible, and will be necessary for a checkers engine (you can copy and paste the FEN into CheckerBoard if you can't visualize the position from the numbers alone). However positions such as B:W8,7,6,5:B28,27,26,25. and B:W29,21,13,9:B16,12,8,4. are highly unlikely to ever occur in a real game. Nevertheless, they are stored in the 4-4 men database and waste space there.
How many positions in a database are then sensible at all? To answer this question, I had Cake log all positions with 5 men vs 5 men that it encountered in its search during a 288-game-match with KingsRow. It found 24 million unique positions with 5 men vs 5 men in all these games in its search tree. Now, the 5 men vs 5 men database contains over 3 billion positions. That means, only about 1 in a 100 positions in that database showed up in Cake's search tree. What does this all mean? Instead of constructing and compressing a huge win/loss/draw database wich contains 99% irrelevant information, we could go a different way entirely. We could just use Cake with its 8-piece database to evaluate all those 24 million positions with a shallow search. Let's say, we search to about depth 13 (which is quite a lot), then we can evaluate all relevant 5 men vs 5 men positions within just a few days. Of course, this information is not perfect, but it is certainly very good. Now we store the result of this calculation in a database (perhaps compressing it somehow) and use this in the search instead of the real database.
The beauty of this approach is that it is not limited to a small number of pieces. You can do this more than once, for example for 8 vs 8 men and again for 6 vs 6 men. Because checkers is a convergent game as outlined above, and because kings tend to appear late in the game, these two "heuristic databases" will provide two nice stepping stones in between the starting position and the endgame database. This should give a huge increase in playing strength at a relatively low cost of preprocessing (compare this to the ~10 CPU-GHz-years that I invested in opening database construction, and a similar amount of time invested by Ed Gilbert in his 10-piece database).
This project is perhaps the least challenging of the three. However, if successful, it would have a direct extension into all other checkers variants. Also, there is an open challenge left here: How should we use the heuristic database in the search? Putting it into a hashtable is straightforward, but I imagine that there are more efficent ways of doing this. Here your creativity is required!
-- June 30, 2007