backgammon/README.org

3.5 KiB

Quack-TD

Quack-TD is a backgammon playing algorithm based upon neural networks trained through TD(λ)-learning.

Usage

The main executable is main.py. Various command-line options and switches can be used to execute different stages and modify the behaviour of the program. All command-line options and switches are listed by running main.py with the argument --help. The three central switches are listed below:

  • --train: Trains the neural network for a set amount of episodes (full games of backgammon) set by --episodes (defaults to 1,000).
  • --eval: Evaluates the nerual network using the methods specified by --eval-methods for a the amount of episodes set by --episodes (defaults to 1,000).
  • --play: Allows the user to interactively play a game of backgammon against the algorithm.

Model storage format

Models are stored in the directory models. If no model is specfied with the --model option, the model is stored in the models/default directory. Otherwise, the model is stored in models/$MODEL.

Files

Along with the Tensorflow checkpoint files in the directory, the following files are stored:

Log format

The evaluation and training log files (logs/eval.log and logs/train.log respectively) are CSV-foramtted files with structure as described below. Both files have semicolon-separated columns (;) and newline-separated rows (\n).

Evaluation log (eval.log)

Columns are written in the following order:

  • time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the evaluation was finished.
  • method: Short string describing the method used for evaluation.
  • trained_eps: Amount of episodes trained with the model before evaluation
  • count: Amount of episodes used for evaluation
  • sum: Sum of outcomes of the games played during evaluation. Outcomes are integers in the range of -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
  • mean: Mean of outcomes of the games played during evaluation. Outcomes are integers in the range of -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)

TODO: Add example of log row

Training log (train.log)

Columns are written in the following order:

  • time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the training session was finished.
  • trained_eps: Amount of episodes trained with the model after the training session
  • count: Amount of episodes used for training
  • sum: Sum of outcomes of the games played during training. Outcomes are integers in the range of -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
  • mean: Mean of outcomes of the games played during training. Outcomes are integers in the range of -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)