Quack-TD

Quack-TD is a backgammon-playing algorithm based on neural networks trained through TD(λ)-learning. The algorithm is implemented in Python 3 and TensorFlow.
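For readers unfamiliar with TD(λ), the sketch below shows the core of the update rule with eligibility traces, using a plain linear value function instead of the project's actual neural network. Everything in it (function names, hyperparameters, the demo data) is illustrative and not taken from the code.

import numpy as np

# Minimal TD(lambda) update with an accumulating eligibility trace and a
# linear value function v(s) = features(s) . w. Illustrative only.
def td_lambda_episode(features, rewards, w, alpha=0.1, lam=0.7, gamma=1.0):
    z = np.zeros_like(w)                           # eligibility trace
    for t in range(len(features) - 1):
        v_t = features[t] @ w                      # value of current state
        v_next = features[t + 1] @ w               # value of next state
        delta = rewards[t] + gamma * v_next - v_t  # TD error
        z = gamma * lam * z + features[t]          # decay, then accumulate
        w = w + alpha * delta * z                  # update along the trace
    return w

# Tiny demo on random data.
rng = np.random.default_rng(0)
states = [rng.normal(size=4) for _ in range(10)]
rewards = [0.0] * 8 + [1.0]                        # reward only at the end
w = td_lambda_episode(states, rewards, np.zeros(4))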

Usage

The main executable is main.py. Various command-line options and switches can be used to execute different stages and to modify the behaviour of the program. All of them are listed by running main.py with the argument --help. The three central switches are listed below:

  • --train: Trains the neural network for a number of episodes (full games of backgammon) set by --episodes (defaults to 1,000). Summary results of the games played during the training session are written to models/$MODEL/logs/train.log.
  • --eval: Evaluates the neural network using the methods specified by --eval-methods for the number of episodes set by --episodes (defaults to 1,000). Results are written to models/$MODEL/logs/eval.log.
  • --play: Allows the user to interactively play a game of backgammon against the algorithm.

Evaluation methods

Currently, only a single evaluation method is implemented:

  • random: Evaluates by playing against a player that draws each move uniformly at random from the set of legal moves. Should be run with a high episode count to lower the variance of the result, as sketched below. TODO: Doesn't currently work.
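The variance argument is the usual 1/√n behaviour of the standard error of the mean. A back-of-the-envelope bound, assuming only that outcomes lie between -2 and 2 (so their standard deviation is at most 2):

import math

# Upper bound on the standard error of the mean outcome after a given
# number of evaluation episodes, assuming outcome std. dev. <= 2.
def max_standard_error(episodes, outcome_std=2.0):
    return outcome_std / math.sqrt(episodes)

print(max_standard_error(100))    # 0.2
print(max_standard_error(10000))  # 0.02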

Examples

The following examples describe common operations.

Train default model

python3 main.py --train

Train model named quack

python3 main.py --train --model=quack

Train default model in sessions of 10,000 episodes

python3 main.py --train --episodes=10000

Train model quack and evaluate after each training session

python3 main.py --train --eval-after-train --model=quack

Evaluate model named quack using default evaluation method (currently random)

python3 main.py --eval --model=quack

Evaluate default model using evaluation methods random and foovaluation

python3 main.py --eval --eval-methods random foovaluation

Model storage format

Models are stored in the directory models. If no model is specified with the --model option, the model is stored in the models/default directory. Otherwise, the model is stored in models/$MODEL.
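In other words, the model name maps to a storage directory roughly as follows. This is a sketch of the layout described above, not the actual resolution code in main.py:

import os

# Map a model name to its storage directory; None selects the default.
def model_dir(model=None):
    return os.path.join("models", model if model else "default")

print(model_dir())         # models/default
print(model_dir("quack"))  # models/quack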

Files

Along with the TensorFlow checkpoint files in the directory, the following files are stored:

  • logs/eval.log: results of evaluation sessions, in the format described below
  • logs/train.log: results of training sessions, in the format described below

Log format

The evaluation and training log files (logs/eval.log and logs/train.log, respectively) are CSV-formatted files with the structure described below. Both files use semicolon-separated columns (;) and newline-separated rows (\n).
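Given that format, the logs can be read with Python's standard csv module. A sketch (the variable names follow the eval.log columns described below; this reader is not part of the project):

import csv

# Read the evaluation log; each row has the six columns listed below.
# "total" stands in for the column named "sum" to avoid shadowing the
# Python builtin.
with open("models/default/logs/eval.log", newline="") as f:
    for time, method, trained_eps, count, total, mean in csv.reader(f, delimiter=";"):
        print(method, trained_eps, mean)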

Evaluation log (eval.log)

Columns are written in the following order:

  • time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the evaluation finished.
  • method: Short string describing the method used for evaluation.
  • trained_eps: Number of episodes the model had been trained for before the evaluation
  • count: Number of episodes used for the evaluation
  • sum: Sum of the outcomes of the games played during evaluation. Outcomes are integers in the range -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
  • mean: Mean of the outcomes of the games played during evaluation. Outcomes are integers in the range -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)

TODO: Add example of log row
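Until a real row is recorded here, a purely hypothetical example consistent with the column order above (note that mean = sum / count):

1523456789;random;2000;1000;-214;-0.214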

Training log (train.log)

Columns are written in the following order:

  • time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the training session finished.
  • trained_eps: Number of episodes the model had been trained for after the training session
  • count: Number of episodes used for the training session
  • sum: Sum of the outcomes of the games played during training. Outcomes are integers in the range -2 to 2. A sum of 0 indicates that the algorithm scored neutrally. (TODO: Is this true?)
  • mean: Mean of the outcomes of the games played during training. Outcomes are integers in the range -2 to 2. A mean of 0 indicates that the algorithm scored neutrally. (TODO: Is this true?)
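As with the evaluation log, a purely hypothetical example row consistent with the five columns above (train.log has no method column):

1523456789;2000;1000;-214;-0.214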