Quack-TD
- Setup
- Usage
- Evaluation methods
- Examples
- Train default model
- Train perpetually
- Train model named quack
- Train default model in sessions of 10,000 episodes
- Train model quack and evaluate after each training session
- Evaluate model named quack using default evaluation method (currently random)
- Evaluate default model using evaluation methods random and pubeval
- Model storage format
Quack-TD is a backgammon-playing algorithm based on neural networks trained through TD(λ)-learning. The algorithm is implemented in Python 3 and TensorFlow.
Setup
Pubeval
To use Pubeval for evaluation, the Python module pubeval must first be installed. The necessary source files should be distributed alongside the main application and located in the pubeval directory. The module can be installed by entering that directory and running either python3 setup.py install or pip install . (the trailing dot is part of the command).
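Before running an evaluation that depends on Pubeval, it can be handy to check whether the module is importable. The following is a minimal sketch using only the standard library; the helper name pubeval_installed is hypothetical and not part of Quack-TD.

```python
import importlib.util


def pubeval_installed(module_name="pubeval"):
    """Return True if the named module can be found on the import path.

    Hypothetical helper for illustration; Quack-TD itself may perform
    this check differently, or not at all.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if pubeval_installed():
        print("pubeval is installed")
    else:
        print("pubeval is missing; install it from the pubeval directory")
```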
Usage
The main executable is main.py. Various command-line options and switches can be used to execute different stages and modify the behaviour of the program. All command-line options and switches are listed by running main.py with the argument --help. The central mode switches are listed below:
- --train: Trains the neural network for a number of episodes (full games of backgammon) set by --episodes (defaults to 1,000). Summary results of the games played during the training session are written to models/$MODEL/logs/train.log.
- --eval: Evaluates the neural network using the methods specified by --eval-methods for the number of episodes set by --episodes (defaults to 1,000). Results are written to models/$MODEL/logs/eval.log.
- --play: Allows the user to interactively play a game of backgammon against the algorithm.
- --list-models: Lists the models stored in the models folder.
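To make the relationship between these switches concrete, here is a minimal sketch of how they could be declared with argparse. It is not the actual contents of main.py: the function name build_parser is hypothetical, and the default model name "default" and default evaluation method "random" are assumptions drawn from the rest of this document.

```python
import argparse


def build_parser():
    """Build a parser mirroring the switches described above (a sketch only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Mode switches: each one is a plain on/off flag.
    parser.add_argument("--train", action="store_true",
                        help="train the neural network")
    parser.add_argument("--eval", action="store_true",
                        help="evaluate the neural network")
    parser.add_argument("--play", action="store_true",
                        help="play interactively against the algorithm")
    parser.add_argument("--list-models", action="store_true",
                        help="list the stored models")
    # Options that modify the selected mode.
    parser.add_argument("--episodes", type=int, default=1000,
                        help="number of episodes per session")
    parser.add_argument("--model", default="default",
                        help="name of the model to use (assumed default)")
    parser.add_argument("--eval-methods", nargs="*", default=["random"],
                        help="evaluation methods to run (assumed default)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, the arguments --eval --eval-methods random pubeval would select evaluation mode with both methods, while --episodes keeps its default of 1,000.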
Evaluation methods
Currently, the following evaluation methods are implemented:
- pubeval: Evaluates against a Python extension based on the pubeval backgammon benchmark developed by Gerald Tesauro. The source code is included in the pubeval directory and needs to be installed before use. This can be done by running python3 setup.py install or pip install . from the source directory.
- random: Evaluates by playing against a player that makes random moves drawn from the set of legal moves. Should be used with high episode counts to lower variance. TODO: Doesn't even work currently
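The random opponent described above can be sketched in a few lines. This is an illustration under stated assumptions, not Quack-TD's own code: the function name choose_random_move is hypothetical, legal moves are assumed to arrive as a sequence, and returning None when no move is available is an assumed way of signalling a passed turn.

```python
import random


def choose_random_move(legal_moves, rng=random):
    """Pick one move uniformly at random from the legal moves.

    Returns None when there are no legal moves (assumed to mean the
    turn is passed). `rng` can be a seeded random.Random instance to
    make evaluation runs reproducible.
    """
    moves = list(legal_moves)
    if not moves:
        return None
    return rng.choice(moves)


if __name__ == "__main__":
    seeded = random.Random(42)
    # The move labels below are placeholders, not a real move encoding.
    print(choose_random_move(["move-a", "move-b", "move-c"], seeded))
```

Because every game against such an opponent is noisy, a single result says little; averaging over many episodes is what makes the outcome meaningful.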
Examples
The following examples describe common operations.
Train default model
python3 main.py --train
Train perpetually
python3 main.py --train --train-perpetually
Train model named quack
python3 main.py --train --model=quack
Train default model in sessions of 10,000 episodes
python3 main.py --train --episodes=10000
Train model quack and evaluate after each training session
python3 main.py --train --eval-after-train --model=quack
Evaluate model named quack using default evaluation method (currently random)
python3 main.py --eval --model=quack
Evaluate default model using evaluation methods random and pubeval
python3 main.py --eval --eval-methods random pubeval
Model storage format
Models are stored in the directory models. If no model is specified with the --model option, the model is stored in the models/default directory. Otherwise, the model is stored in models/$MODEL.
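The naming rule above can be expressed as a small helper. This is a sketch only; the function name model_directory and its parameters are hypothetical and are not taken from the Quack-TD source.

```python
import os


def model_directory(model=None, base_dir="models"):
    """Return the storage directory for a model.

    Falls back to the name "default" when no model name is given,
    matching the rule described above.
    """
    name = model if model else "default"
    return os.path.join(base_dir, name)


if __name__ == "__main__":
    print(model_directory())         # directory of the default model
    print(model_directory("quack"))  # directory of the model named quack
```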
Files
Along with the TensorFlow checkpoint files in the directory, the following files are stored:
- episodes_trained: The number of episodes of training performed with the model.
- logs/eval.log: Log of all completed evaluations performed on the model. The format of this file is specified in the Log format section.
- logs/train.log: Log of all completed training sessions performed on the model. If a training session is aborted before the pre-specified episode target is reached, nothing will be written to this file, although episodes_trained will be updated every time the model is saved to disk. The format of this file is specified in the Log format section.
Log format
The evaluation and training log files (logs/eval.log and logs/train.log respectively) are CSV-formatted files with the structure described below. Both files have semicolon-separated columns (;) and newline-separated rows (\n).
Evaluation log (eval.log)
Columns are written in the following order:
- time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the evaluation was finished.
- method: Short string describing the method used for evaluation.
- trained_eps: Number of episodes trained with the model before evaluation.
- count: Number of episodes used for evaluation.
- sum: Sum of outcomes of the games played during evaluation. Outcomes are integers in the range of -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
- mean: Mean of outcomes of the games played during evaluation. Outcomes are integers in the range of -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
TODO: Add example of log row
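As a rough illustration of the column order above, the following sketch reads evaluation log text with Python's csv module. The function name parse_eval_log is hypothetical, the numeric types chosen for each column are assumptions, and the sample row in the usage section is made up for demonstration; it is not real output from Quack-TD.

```python
import csv
import io

# Column order as documented for eval.log.
EVAL_COLUMNS = ["time", "method", "trained_eps", "count", "sum", "mean"]


def parse_eval_log(text):
    """Parse semicolon-separated eval.log text into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(EVAL_COLUMNS, fields))
        # Assumed types: the real files may format these differently.
        row["time"] = float(row["time"])
        row["trained_eps"] = int(row["trained_eps"])
        row["count"] = int(row["count"])
        row["sum"] = float(row["sum"])
        row["mean"] = float(row["mean"])
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Made-up row, for illustration only.
    sample = "1520000000;pubeval;10000;1000;-250;-0.25\n"
    for entry in parse_eval_log(sample):
        print(entry["method"], entry["count"], entry["mean"])
```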
Training log (train.log)
Columns are written in the following order:
- time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the training session was finished.
- trained_eps: Number of episodes trained with the model after the training session.
- count: Number of episodes used for training.
- sum: Sum of outcomes of the games played during training. Outcomes are integers in the range of -2 to 2. A sum of 0 indicates that the algorithm scored neutrally. (TODO: Is this true?)
- mean: Mean of outcomes of the games played during training. Outcomes are integers in the range of -2 to 2. A mean of 0 indicates that the algorithm scored neutrally. (TODO: Is this true?)
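To show how the sum and mean columns relate to the individual game outcomes, here is a minimal sketch that builds one train.log row in the documented column order. The function name format_train_row is hypothetical, and writing the timestamp as a whole number of seconds is an assumption; the actual program may format values differently.

```python
import time


def format_train_row(trained_eps, outcomes, timestamp=None):
    """Build one semicolon-separated train.log row (a sketch only).

    `outcomes` holds one integer per game, each in the range -2 to 2.
    """
    if not outcomes:
        raise ValueError("at least one outcome is required")
    if timestamp is None:
        timestamp = int(time.time())  # assumed: whole seconds
    total = sum(outcomes)
    mean = total / len(outcomes)
    fields = [timestamp, trained_eps, len(outcomes), total, mean]
    return ";".join(str(field) for field in fields) + "\n"


if __name__ == "__main__":
    # Four made-up outcomes that cancel out, giving a sum and mean of zero.
    print(format_train_row(2000, [2, -1, 1, -2], timestamp=1520000000), end="")
```

A row whose sum is 0 has wins and losses that cancel out in points, which is what the column descriptions refer to as scoring neutrally.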