# Quack-TD

- Usage
- Evaluation methods
- Examples
  - Train default model
  - Train perpetually
  - Train model named `quack`
  - Train default model in sessions of 10,000 episodes
  - Train model `quack` and evaluate after each training session
  - Evaluate model named `quack` using default evaluation method (currently `random`)
  - Evaluate default model using evaluation methods `random` and `pubeval`
- Model storage format
- Log format
Quack-TD is a backgammon-playing algorithm based on neural networks trained through TD(λ)-learning. The algorithm is implemented in Python 3 and TensorFlow.
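To ground the terminology, below is a minimal, self-contained sketch of a TD(λ) weight update with eligibility traces for a linear value function. It is illustrative only: the project's actual network and update code may differ, and every name in it is an assumption.

```python
import numpy as np

def td_lambda_update(w, trace, x, v_now, v_next, alpha=0.1, lam=0.7):
    """One TD(lambda) step for a linear value function v(s) = w @ x(s).

    w      -- weight vector
    trace  -- eligibility trace (same shape as w)
    x      -- feature vector of the current state (gradient of v w.r.t. w)
    v_now  -- value estimate of the current state
    v_next -- value estimate of the next state (or the final game outcome)
    """
    delta = v_next - v_now          # TD error (undiscounted, as in TD-Gammon)
    trace = lam * trace + x         # decay old traces, add the new gradient
    w = w + alpha * delta * trace   # move weights along the traced gradient
    return w, trace

# One illustrative update over a 5-dimensional feature vector
w, trace = np.zeros(5), np.zeros(5)
x = np.array([1.0, 0.0, 0.5, 0.0, 1.0])
w, trace = td_lambda_update(w, trace, x, v_now=0.4, v_next=0.6)
```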
## Usage

The main executable is `main.py`. Various command-line options and switches can be used to execute different stages and to modify the behaviour of the program. All command-line options and switches are listed by running `main.py` with the argument `--help`. The three central switches are listed below, followed by a sketch of how such switches might be declared:
- `--train`: Trains the neural network for a set number of episodes (full games of backgammon) given by `--episodes` (defaults to 1,000). Summary results of the games played during the training session are written to `models/$MODEL/logs/train.log`.
- `--eval`: Evaluates the neural network using the methods specified by `--eval-methods` for the number of episodes given by `--episodes` (defaults to 1,000). Results are written to `models/$MODEL/logs/eval.log`.
- `--play`: Allows the user to interactively play a game of backgammon against the algorithm.
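The following `argparse` sketch is hypothetical: the flag names are taken from this README, but `main.py`'s real argument handling may differ in defaults and details.

```python
import argparse

parser = argparse.ArgumentParser(description="Quack-TD")
parser.add_argument("--train", action="store_true",
                    help="train the neural network")
parser.add_argument("--eval", action="store_true",
                    help="evaluate the neural network")
parser.add_argument("--play", action="store_true",
                    help="play interactively against the algorithm")
parser.add_argument("--episodes", type=int, default=1000,
                    help="episodes per training/evaluation session")
parser.add_argument("--model", default="default",
                    help="name of the model under models/")
parser.add_argument("--eval-methods", nargs="+", default=["random"],
                    help="evaluation methods to run")
parser.add_argument("--train-perpetually", action="store_true",
                    help="keep starting new training sessions")
parser.add_argument("--eval-after-train", action="store_true",
                    help="evaluate after each training session")
args = parser.parse_args()
```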
## Evaluation methods

Currently, the following evaluation methods are implemented:

- `pubeval`: Evaluates against the `pubeval` backgammon benchmark developed by Gerald Tesauro. The source code is included in the `pubeval` directory and needs to be compiled before use. The binary should be placed at `pubeval/pubeval`.
- `random`: Evaluates by playing against a player that makes random moves drawn from the set of legal moves. Should be used with high episode counts to lower variance; see the sketch after this list. TODO: This method doesn't currently work.
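The variance point can be made concrete: the standard error of the estimated mean outcome shrinks with the square root of the episode count. The per-game standard deviation of 1.2 below is an assumed figure, not one measured from this project.

```python
import math

def standard_error(std_per_game, n_games):
    """Standard error of the mean outcome over n_games independent games."""
    return std_per_game / math.sqrt(n_games)

for n in (100, 1_000, 10_000):
    print(f"{n:>6} games -> standard error of the mean ~ {standard_error(1.2, n):.3f}")
```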
## Examples

The following examples describe common operations.

### Train default model

```
python3 main.py --train
```

### Train perpetually

```
python3 main.py --train --train-perpetually
```

### Train model named `quack`

```
python3 main.py --train --model=quack
```

### Train default model in sessions of 10,000 episodes

```
python3 main.py --train --episodes=10000
```

### Train model `quack` and evaluate after each training session

```
python3 main.py --train --eval-after-train --model=quack
```

### Evaluate model named `quack` using default evaluation method (currently `random`)

```
python3 main.py --eval --model=quack
```

### Evaluate default model using evaluation methods `random` and `pubeval`

```
python3 main.py --eval --eval-methods random pubeval
```
## Model storage format

Models are stored in the directory `models`. If no model is specified with the `--model` option, the model is stored in the `models/default` directory. Otherwise, the model is stored in `models/$MODEL`.
### Files

Along with the TensorFlow checkpoint files in the directory, the following files are stored:

- `episodes_trained`: The number of episodes of training performed with the model.
- `logs/eval.log`: Log of all completed evaluations performed on the model. The format of this file is specified in the [Log format](#log-format) section below.
- `logs/train.log`: Log of all completed training sessions performed on the model. If a training session is aborted before the pre-specified episode target is reached, nothing is written to this file, although `episodes_trained` is updated every time the model is saved to disk. The format of this file is also specified in the [Log format](#log-format) section below.
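Putting the above together, a directory for a model named `quack` would look roughly like this (the exact checkpoint file names depend on the TensorFlow version):

```
models/
└── quack/
    ├── <TensorFlow checkpoint files>
    ├── episodes_trained
    └── logs/
        ├── eval.log
        └── train.log
```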
## Log format

The evaluation and training log files (`logs/eval.log` and `logs/train.log`, respectively) are CSV-formatted files with the structure described below. Both files use semicolon-separated columns (`;`) and newline-separated rows (`\n`).
### Evaluation log (`eval.log`)

Columns are written in the following order:

- `time`: Unix (epoch) timestamp in local time (TODO: should be UTC instead?) describing when the evaluation finished.
- `method`: Short string describing the method used for evaluation.
- `trained_eps`: Number of episodes the model had been trained for before the evaluation.
- `count`: Number of episodes used for the evaluation.
- `sum`: Sum of the outcomes of the games played during evaluation. Outcomes are integers in the range -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
- `mean`: Mean of the outcomes of the games played during evaluation. Outcomes are integers in the range -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)

TODO: Add example of log row
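Pending the TODO above, a purely hypothetical row (invented values, shown only to illustrate the column order and the separators) could look like:

```
1524234567;random;10000;1000;-42;-0.042
```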
### Training log (`train.log`)

Columns are written in the following order:

- `time`: Unix (epoch) timestamp in local time (TODO: should be UTC instead?) describing when the training session finished.
- `trained_eps`: Number of episodes the model had been trained for after the training session.
- `count`: Number of episodes used for the training session.
- `sum`: Sum of the outcomes of the games played during training. Outcomes are integers in the range -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
- `mean`: Mean of the outcomes of the games played during training. Outcomes are integers in the range -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
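Given this format, either log can be read with Python's standard `csv` module. The sketch below assumes the default model path and the eval-log column order above.

```python
import csv

EVAL_COLUMNS = ["time", "method", "trained_eps", "count", "sum", "mean"]

# Read eval.log as semicolon-separated rows and print the mean outcome
# of each completed evaluation.
with open("models/default/logs/eval.log", newline="") as f:
    for row in csv.reader(f, delimiter=";"):
        record = dict(zip(EVAL_COLUMNS, row))
        print(record["method"], record["trained_eps"], float(record["mean"]))
```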