#+TITLE: Quack-TD
Quack-TD is a backgammon playing algorithm based upon neural networks trained
through TD(\lambda)-learning. The algorithm is implemented using Python 3 and
TensorFlow.
* Usage
The main executable is =main.py=. Various command-line options and switches can be used to
execute different stages and modify the behaviour of the program. All
command-line options and switches are listed by running =main.py= with the argument
=--help=. The three central switches are listed below:
- =--train=: Trains the neural network for the number of episodes (full games
  of backgammon) set by =--episodes= (defaults to 1,000). Summary results of the
  games played during the training session are written to
  =models/$MODEL/logs/train.log=.
- =--eval=: Evaluates the neural network using the methods specified by
  =--eval-methods= for the number of episodes set by =--episodes= (defaults to
  1,000). Results are written to =models/$MODEL/logs/eval.log=.
- =--play=: Allows the user to interactively play a game of backgammon against
  the algorithm.
** Evaluation methods
Currently, the following evaluation methods are implemented:
- =pubeval=: Evaluates against the =pubeval= backgammon benchmark developed by
  Gerald Tesauro. The source code is included in the =pubeval= directory and
  needs to be compiled before use. The binary should be placed at
  =pubeval/pubeval=.
- =random=: Evaluates by playing against a player that makes random moves drawn
  from the set of legal moves. Should be used with high episode counts to lower
  variance. *TODO*: This method does not currently work.
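The =random= opponent described above can be sketched as follows. This is a minimal illustration, not the code used by Quack-TD: the function name =random_move= and the representation of moves as items of a plain sequence are assumptions made for the example.

```python
import random


def random_move(legal_moves, rng=random):
    """Pick one move uniformly at random from the legal moves.

    `legal_moves` is assumed to be any non-empty iterable of moves;
    how moves are represented is left open (hypothetical here).
    `rng` can be a seeded `random.Random` to make runs repeatable.
    """
    moves = list(legal_moves)
    if not moves:
        raise ValueError("no legal moves available")
    return rng.choice(moves)


# Example with placeholder move labels and a seeded generator.
chosen = random_move(["move-a", "move-b", "move-c"], random.Random(0))
```

Because each game against such an opponent is noisy, the mean outcome only stabilises over many episodes, which is why high episode counts are recommended.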
** Examples
The following examples describe common operations.
*** Train default model
=python3 main.py --train=
*** Train perpetually
=python3 main.py --train --train-perpetually=
*** Train model named =quack=
=python3 main.py --train --model=quack=
*** Train default model in sessions of 10,000 episodes
=python3 main.py --train --episodes=10000=
*** Train model =quack= and evaluate after each training session
=python3 main.py --train --eval-after-train --model=quack=
*** Evaluate model named =quack= using default evaluation method (currently =random=)
=python3 main.py --eval --model=quack=
*** Evaluate default model using evaluation methods =random= and =pubeval=
=python3 main.py --eval --eval-methods random pubeval=
* Model storage format
Models are stored in the directory =models=. If no model is specified with the
=--model= option, the model is stored in the =models/default=
directory. Otherwise, the model is stored in =models/$MODEL=.
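The naming rule above can be expressed as a small helper. This is a sketch under assumptions: the function names =model_dir= and =log_path= are hypothetical and are not taken from the Quack-TD source.

```python
import os


def model_dir(model=None, base="models"):
    """Return the storage directory for a model.

    Falls back to `models/default` when no model name is given,
    mirroring the behaviour described for the `--model` option.
    """
    name = model if model else "default"
    return os.path.join(base, name)


def log_path(model, log_name):
    """Return the path of a log file (e.g. `eval.log`) for a model."""
    return os.path.join(model_dir(model), "logs", log_name)
```

For example, =log_path("quack", "eval.log")= yields =models/quack/logs/eval.log= on systems that use =/= as the path separator.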
** Files
Along with the TensorFlow checkpoint files in the directory, the following files
are stored:
- =episodes_trained=: The number of episodes of training performed with the
  model.
- =logs/eval.log=: Log of all completed evaluations performed on the model. The
  format of this file is specified in [[Log format]].
- =logs/train.log=: Log of all completed training sessions performed on the
  model. If a training session is aborted before the pre-specified episode
  target is reached, nothing is written to this file, although
  =episodes_trained= is still updated every time the model is saved to disk. The
  format of this file is specified in [[Log format]].
** Log format
The evaluation and training log files (=logs/eval.log= and =logs/train.log=
respectively) are CSV-formatted files with the structure described below. Both
files have semicolon-separated columns (=;=) and newline-separated rows (=\n=).
*** Evaluation log (=eval.log=)
Columns are written in the following order:
- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should be UTC
  instead?) describing when the evaluation finished.
- =method=: Short string describing the method used for evaluation.
- =trained_eps=: Number of episodes trained with the model before evaluation.
- =count=: Number of episodes used for evaluation.
- =sum=: Sum of the outcomes of the games played during evaluation. Outcomes are
  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)
- =mean=: Mean of the outcomes of the games played during evaluation. Outcomes
  are integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)

*TODO*: Add example of log row
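As a rough guide to reading the file, the sketch below parses =eval.log= rows with Python's =csv= module, using the column order listed above. It assumes that the file has no header row and that numeric columns parse with =int= or =float=; neither is confirmed here. The sample row is invented purely to exercise the parser and is not real output.

```python
import csv
import io

# Column order as documented for eval.log.
EVAL_COLUMNS = ["time", "method", "trained_eps", "count", "sum", "mean"]


def parse_eval_log(lines):
    """Parse semicolon-separated eval.log rows into dictionaries.

    `lines` is any iterable of text lines, such as an open file.
    Assumes no header row (an assumption, not confirmed).
    """
    reader = csv.DictReader(lines, fieldnames=EVAL_COLUMNS, delimiter=";")
    rows = []
    for rec in reader:
        rows.append({
            "time": float(rec["time"]),
            "method": rec["method"],
            "trained_eps": int(rec["trained_eps"]),
            "count": int(rec["count"]),
            "sum": float(rec["sum"]),
            "mean": float(rec["mean"]),
        })
    return rows


# Hypothetical row, made up for illustration only.
sample = io.StringIO("1520770315;random;1000;500;10;0.02\n")
rows = parse_eval_log(sample)
```

To read a real log, pass an open file instead, for example =parse_eval_log(open("models/default/logs/eval.log", newline=""))=.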
*** Training log (=train.log=)
Columns are written in the following order:
- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should be UTC
  instead?) describing when the training session finished.
- =trained_eps=: Number of episodes trained with the model /after/ the training
  session.
- =count=: Number of episodes used for training.
- =sum=: Sum of the outcomes of the games played during training. Outcomes are
  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)
- =mean=: Mean of the outcomes of the games played during training. Outcomes are
  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)
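The relationship between =count=, =sum= and =mean= can be sketched as below. The helper names =summarize_outcomes= and =format_train_row= are hypothetical, and the exact number formatting Quack-TD writes is not specified here; the example simply uses =str= on each value.

```python
def summarize_outcomes(outcomes):
    """Return (count, sum, mean) for a list of game outcomes.

    Each outcome must be an integer from -2 to 2, as documented.
    """
    if not outcomes:
        raise ValueError("at least one outcome is required")
    for outcome in outcomes:
        if outcome not in range(-2, 3):
            raise ValueError("outcome out of range: %r" % (outcome,))
    total = sum(outcomes)
    return len(outcomes), total, total / len(outcomes)


def format_train_row(timestamp, trained_eps, outcomes):
    """Build one semicolon-separated row in the train.log column order."""
    count, total, mean = summarize_outcomes(outcomes)
    fields = (timestamp, trained_eps, count, total, mean)
    return ";".join(str(field) for field in fields)


# Invented values, for illustration only.
row = format_train_row(1520770315, 2000, [1, 1, -1, 2])
```

A mean of 0 arises whenever wins and losses cancel out in points, for instance with the outcomes =[1, -1, 2, -2]=.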