Quack-TD
Quack-TD is a backgammon playing algorithm based upon neural networks trained through TD(λ)-learning. The algorithm is implemented using Python 3 and TensorFlow.
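For orientation, the textbook TD(λ) weight update, in the form Tesauro used for TD-Gammon, is shown below. This README does not state how Quack-TD parameterizes it, so the symbols are generic: w are the network weights, Y_t is the network's evaluation of the position at step t, α is the learning rate and λ ∈ [0, 1] controls how far credit for a prediction error is propagated back to earlier positions. The sum over past gradients is normally maintained incrementally as an eligibility trace e_t.

```latex
w_{t+1} = w_t + \alpha \,(Y_{t+1} - Y_t) \sum_{k=1}^{t} \lambda^{t-k} \nabla_w Y_k
e_t = \sum_{k=1}^{t} \lambda^{t-k} \nabla_w Y_k = \lambda \, e_{t-1} + \nabla_w Y_t, \qquad e_0 = 0
```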
Usage
The main executable is main.py. Various command-line options and switches can be used to execute different stages and modify the behaviour of the program. All command-line options and switches are listed by running main.py with the argument --help. The three central switches are listed below:
--train: Trains the neural network for the number of episodes (full games of backgammon) set by --episodes (defaults to 1,000). Summary results of the games played during the training session are written to models/$MODEL/logs/train.log.
--eval: Evaluates the neural network using the methods specified by --eval-methods for the number of episodes set by --episodes (defaults to 1,000). Results are written to models/$MODEL/logs/eval.log.
--play: Allows the user to interactively play a game of backgammon against the algorithm.
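The switches above map naturally onto Python's argparse. The following is a minimal sketch, not the actual main.py: the option names and defaults come from this README, but the build_parser helper, the use of store_true flags, and nargs="+" with fixed choices for --eval-methods are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI described in this README;
    # the real main.py may define these options differently.
    parser = argparse.ArgumentParser(description="Quack-TD backgammon bot")
    parser.add_argument("--train", action="store_true",
                        help="train the network for --episodes episodes")
    parser.add_argument("--eval", action="store_true",
                        help="evaluate the network with --eval-methods")
    parser.add_argument("--play", action="store_true",
                        help="play interactively against the bot")
    parser.add_argument("--episodes", type=int, default=1000,
                        help="number of episodes (full games) per session")
    parser.add_argument("--eval-methods", nargs="+", default=["random"],
                        choices=["pubeval", "random"],
                        help="one or more evaluation methods")
    parser.add_argument("--eval-after-train", action="store_true",
                        help="evaluate after each training session")
    parser.add_argument("--model", default="default",
                        help="model name; stored under models/<name>")
    return parser
```

For example, build_parser().parse_args(["--eval", "--eval-methods", "random", "pubeval"]) yields eval=True, eval_methods=["random", "pubeval"] and episodes=1000. argparse converts --eval-methods to the attribute eval_methods automatically.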
Evaluation methods
Currently, the following evaluation methods are implemented:
pubeval: Evaluates against the pubeval backgammon benchmark developed by Gerald Tesauro. The source code is included in the pubeval directory and must be compiled before use; the resulting binary should be placed at pubeval/pubeval.
random: Evaluates by playing against a player that makes random moves drawn from the set of legal moves. Should be used with high episode counts to reduce variance. TODO: Currently does not work.
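A random opponent and the reason it needs many episodes can both be sketched briefly. The code below is hypothetical: it assumes the caller supplies the list of legal moves (the interface of bot.py and board.py is not described here), and mean_and_stderr is an illustrative helper. The standard error of the mean outcome shrinks roughly as 1/sqrt(n), which is why high episode counts lower the variance of the result.

```python
import math
import random
import statistics
from typing import Optional, Sequence, Tuple, TypeVar

Move = TypeVar("Move")


def random_move(legal_moves: Sequence[Move],
                rng: Optional[random.Random] = None) -> Optional[Move]:
    """Pick one legal move uniformly at random.

    Returns None when there are no legal moves, i.e. the player must pass.
    """
    if not legal_moves:
        return None
    if rng is None:
        rng = random.Random()
    return rng.choice(legal_moves)


def mean_and_stderr(outcomes: Sequence[int]) -> Tuple[float, float]:
    """Mean outcome and its standard error over a set of evaluation games.

    The standard error is the sample standard deviation divided by sqrt(n),
    so quadrupling the number of episodes roughly halves it.
    """
    if len(outcomes) < 2:
        raise ValueError("need at least two outcomes to estimate variance")
    mean = statistics.mean(outcomes)
    stderr = statistics.stdev(outcomes) / math.sqrt(len(outcomes))
    return mean, stderr
```

Passing a seeded random.Random makes a run reproducible, which helps when comparing models on the same sequence of random opponent decisions.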
Examples
The following examples describe common operations.
Train default model
python3 main.py --train
Train model named quack
python3 main.py --train --model=quack
Train default model in sessions of 10,000 episodes
python3 main.py --train --episodes=10000
Train model quack and evaluate after each training session
python3 main.py --train --eval-after-train --model=quack
Evaluate model named quack using the default evaluation method (currently random)
python3 main.py --eval --model=quack
Evaluate default model using evaluation methods random and pubeval
python3 main.py --eval --eval-methods random pubeval
Model storage format
Models are stored in the directory models. If no model is specified with the --model option, the model is stored in the models/default directory. Otherwise, the model is stored in models/$MODEL.
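The storage rule above amounts to a one-line path computation. A sketch using pathlib; the helper name model_dir and its root parameter are hypothetical:

```python
from pathlib import Path
from typing import Optional


def model_dir(model: Optional[str] = None, root: str = "models") -> Path:
    """Return the storage directory for a model.

    With no model name, the default model lives in models/default;
    otherwise the model lives in models/<model>.
    """
    return Path(root) / (model or "default")
```

The log files described below then sit under model_dir(name) / "logs".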
Files
Along with the TensorFlow checkpoint files in the directory, the following files are stored:
episodes_trained: The number of episodes of training performed with the model.
logs/eval.log: Log of all completed evaluations performed on the model. The format of this file is specified in the Log format section.
logs/train.log: Log of all completed training sessions performed on the model. If a training session is aborted before the pre-specified episode target is reached, nothing is written to this file, although episodes_trained is updated every time the model is saved to disk. The format of this file is specified in the Log format section.
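Reading and updating the episode counter might look as follows. This is a sketch under an explicit assumption: the README does not describe the contents of episodes_trained, so the code assumes it holds a single integer as plain text. Both helper names are hypothetical.

```python
from pathlib import Path


def read_episodes_trained(model_path: Path) -> int:
    """Return the episode count stored in <model_path>/episodes_trained.

    ASSUMPTION: the file holds one integer as plain text.
    A missing file is treated as an untrained model (0 episodes).
    """
    counter = model_path / "episodes_trained"
    if not counter.exists():
        return 0
    return int(counter.read_text().strip())


def write_episodes_trained(model_path: Path, episodes: int) -> None:
    """Overwrite the episode count, e.g. each time the model is saved."""
    model_path.mkdir(parents=True, exist_ok=True)
    (model_path / "episodes_trained").write_text(f"{episodes}\n")
```

Because the counter is rewritten on every save, it stays current even when an aborted session never reaches train.log.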
Log format
The evaluation and training log files (logs/eval.log and logs/train.log, respectively) are CSV-formatted files with the structure described below. Both files have semicolon-separated columns (;) and newline-separated rows (\n).
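Since the logs are plain semicolon-delimited text, Python's csv module can read them once the delimiter is set. A minimal sketch; parse_log is a hypothetical helper, and it assumes the files have no header row, as none is described here:

```python
import csv
from typing import Iterable, List


def parse_log(lines: Iterable[str]) -> List[List[str]]:
    """Split semicolon-separated log lines into rows of string fields.

    Blank lines yield empty rows from csv.reader and are skipped.
    """
    return [row for row in csv.reader(lines, delimiter=";") if row]
```

A log file can be passed in directly: with open("models/default/logs/eval.log", newline="") as f: rows = parse_log(f).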
Evaluation log (eval.log)
Columns are written in the following order:
time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the evaluation finished.
method: Short string describing the method used for evaluation.
trained_eps: Number of episodes the model had been trained for before the evaluation.
count: Number of episodes used for the evaluation.
sum: Sum of the outcomes of the games played during the evaluation. Outcomes are integers in the range -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
mean: Mean of the outcomes of the games played during the evaluation. Outcomes are integers in the range -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
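One way to turn an eval.log row into typed fields is a small dataclass that follows the column order above. The class name EvalRecord is hypothetical, the numeric types (float time, integer sum) are assumptions, and the sample row in the usage note is invented for illustration, not real output.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EvalRecord:
    time: float       # Unix timestamp when the evaluation finished
    method: str       # evaluation method, e.g. "pubeval" or "random"
    trained_eps: int  # episodes trained before the evaluation
    count: int        # episodes played during the evaluation
    sum: int          # sum of game outcomes (each in -2..2); assumed integer
    mean: float       # mean game outcome

    @classmethod
    def from_row(cls, row: Sequence[str]) -> "EvalRecord":
        """Build a record from the six string fields of one log row."""
        time, method, trained_eps, count, total, mean = row
        return cls(float(time), method, int(trained_eps),
                   int(count), int(total), float(mean))

    def is_consistent(self, tol: float = 1e-9) -> bool:
        """Check that the stored mean equals sum / count."""
        return self.count > 0 and abs(self.mean - self.sum / self.count) <= tol
```

Usage with a made-up row: EvalRecord.from_row("1522328232;random;5000;1000;412;0.412".split(";")) gives method "random", count 1000 and a mean of 0.412, and is_consistent() returns True.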
TODO: Add example of log row
Training log (train.log)
Columns are written in the following order:
time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the training session finished.
trained_eps: Number of episodes the model had been trained for after the training session.
count: Number of episodes used for training.
sum: Sum of the outcomes of the games played during training. Outcomes are integers in the range -2 to 2. A sum of 0 indicates that the algorithm scored neutrally. (TODO: Is this true?)
mean: Mean of the outcomes of the games played during training. Outcomes are integers in the range -2 to 2. A mean of 0 indicates that the algorithm scored neutrally. (TODO: Is this true?)
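Producing a train.log row from a finished session is mostly arithmetic over the outcomes. The sketch below is hypothetical (format_train_row is not from the code base) and assumes the timestamp is written as whole seconds. Note that time.time() returns seconds since the epoch, which does not depend on the local time zone.

```python
import time
from typing import Optional, Sequence


def format_train_row(trained_eps: int, outcomes: Sequence[int],
                     timestamp: Optional[float] = None) -> str:
    """Build one train.log line: time;trained_eps;count;sum;mean plus newline.

    ASSUMPTION: the timestamp is truncated to whole seconds.
    """
    if not outcomes:
        raise ValueError("a training session needs at least one episode")
    if timestamp is None:
        timestamp = time.time()
    count = len(outcomes)
    total = sum(outcomes)
    return f"{int(timestamp)};{trained_eps};{count};{total};{total / count}\n"
```

For example, a session of four games with outcomes 1, -1, 2 and 0 that brings the model to 2,000 trained episodes gives the fields count=4, sum=2 and mean=0.5.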