Quack-TD

Quack-TD is a backgammon-playing algorithm based on neural networks trained through TD(λ)-learning. The algorithm is implemented in Python 3 (developed with Python 3.6.4) and TensorFlow.
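
The learning rule behind Quack-TD is TD(λ). For readers unfamiliar with it, below is a minimal, generic sketch of a TD(λ) update with eligibility traces for a linear value function. It illustrates the rule only; it is not the project's actual network code, and all names and default parameters are illustrative.

  # Minimal, generic TD(lambda) update with accumulating eligibility traces
  # for a linear value function (NumPy). Illustrative only -- not the
  # project's actual network or training code.
  import numpy as np

  def td_lambda_update(w, e, features, next_features, reward,
                       alpha=0.1, gamma=1.0, lam=0.7):
      """Perform one TD(lambda) step; return updated weights and traces.

      w             -- weight vector of the linear value function
      e             -- eligibility trace vector, same shape as w
      features      -- feature vector of the current state
      next_features -- feature vector of the successor state
      reward        -- reward observed on the transition
      """
      v = float(np.dot(w, features))            # value of current state
      v_next = float(np.dot(w, next_features))  # value of successor state
      delta = reward + gamma * v_next - v       # TD error
      e = gamma * lam * e + features            # decay and accumulate traces
      w = w + alpha * delta * e                 # move weights along the traces
      return w, e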

Usage

The main executable is main.py. Various command-line options and switches can be used to execute different stages and modify the behaviour of the program. All command-line options and switches are listed by running main.py with the argument --help. The three central switches are listed below:

  • --train: Trains the neural network for the number of episodes (full games of backgammon) set by --episodes (defaults to 1,000). Summary results of the games played during the training session are written to models/$MODEL/logs/train.log.
  • --eval: Evaluates the neural network using the methods specified by --eval-methods for the number of episodes set by --episodes (defaults to 1,000). Results are written to models/$MODEL/logs/eval.log.
  • --play: Allows the user to interactively play a game of backgammon against the algorithm.

Evaluation methods

Currently, only a single evaluation method is implemented:

  • random: Evaluates by playing against an opponent that makes random moves drawn from the set of legal moves (see the sketch below). Should be used with a high episode count to reduce variance. TODO: This doesn't currently work.
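
The idea behind the random opponent is sketched below. This is illustrative only; the argument name legal_moves is an assumption for the example and does not reflect the project's actual API.

  # Sketch of the "random" evaluation opponent: pick uniformly among the
  # legal moves. Illustrative only; names are assumptions, not the
  # project's actual interfaces.
  import random

  def random_opponent_move(legal_moves):
      """Return a uniformly random element of legal_moves, or None if empty."""
      if not legal_moves:
          return None  # no legal move: the turn is forfeited
      return random.choice(legal_moves)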

Examples

The following examples describe common operations.

Train default model

python3 main.py --train

Train model named quack

python3 main.py --train --model=quack

Train default model in sessions of 10,000 episodes

python3 main.py --train --episodes=10000

Train model quack and evaluate after each training session

python3 main.py --train --eval-after-train --model=quack

Evaluate model named quack using default evaluation method (currently random)

python3 main.py --eval --model=quack

Evaluate default model using evaluation methods random and foovaluation

python3 main.py --eval --eval-methods random foovaluation

Model storage format

Models are stored in the models directory. If no model is specified with the --model option, the model is stored in the models/default directory. Otherwise, the model is stored in models/$MODEL.

Files

Along with the TensorFlow checkpoint files in the directory, the following files are stored:

  • logs/eval.log: Log of evaluation sessions (format described below)
  • logs/train.log: Log of training sessions (format described below)

Log format

The evaluation and training log files (logs/eval.log and logs/train.log respectively) are CSV-formatted files with the structure described below. Both files have semicolon-separated columns (;) and newline-separated rows (\n).

Evaluation log (eval.log)

Columns are written in the following order:

  • time: Unix timestamp (epoch time) in local time (TODO: should this be UTC instead?) recording when the evaluation finished.
  • method: Short string describing the method used for evaluation.
  • trained_eps: Number of episodes the model had been trained for before the evaluation.
  • count: Number of episodes used for the evaluation.
  • sum: Sum of the outcomes of the games played during evaluation. Outcomes are integers in the range -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
  • mean: Mean of the outcomes of the games played during evaluation. Outcomes are integers in the range -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)

TODO: Add example of log row
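
Until a real row from the project is added, here is a hypothetical row with made-up values (Unix timestamp, method random, 10,000 trained episodes, 1,000 evaluation episodes, sum 57, mean 0.057):

  1520640000;random;10000;1000;57;0.057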

Training log (train.log)

Columns are written in the following order:

  • time: Unix timestamp (epoch time) in local time (TODO: should this be UTC instead?) recording when the training session finished.
  • trained_eps: Number of episodes the model had been trained for after the training session.
  • count: Number of episodes used for the training session.
  • sum: Sum of the outcomes of the games played during training. Outcomes are integers in the range -2 to 2. A sum of 0 indicates that the algorithm scored neutrally. (TODO: Is this true?)
  • mean: Mean of the outcomes of the games played during training. Outcomes are integers in the range -2 to 2. A mean of 0 indicates that the algorithm scored neutrally. (TODO: Is this true?)
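
The logs can be read back with Python's csv module using a semicolon delimiter, e.g. for plotting. The sketch below is illustrative only: the column order follows the descriptions above, and the path is just an example.

  # Read eval.log into a list of dicts keyed by the column names described
  # above. Illustrative sketch; column order and file path are assumptions.
  import csv

  EVAL_COLUMNS = ["time", "method", "trained_eps", "count", "sum", "mean"]

  def read_eval_log(path="models/default/logs/eval.log"):
      rows = []
      with open(path, newline="") as f:
          for record in csv.reader(f, delimiter=";"):
              row = dict(zip(EVAL_COLUMNS, record))
              row["trained_eps"] = int(row["trained_eps"])
              row["count"] = int(row["count"])
              row["sum"] = int(row["sum"])
              row["mean"] = float(row["mean"])
              rows.append(row)   # "time" and "method" are kept as strings
      return rows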