Implementation of backgammon in Python 3.6.4

Quack-TD

Quack-TD is a backgammon-playing algorithm based upon neural networks trained through TD(λ)-learning. The algorithm is implemented using Python 3 and TensorFlow.

Setup

Pubeval

To use Pubeval for evaluation, the Python module pubeval must first be installed. The necessary source files are distributed alongside the main application in the pubeval directory. The installation can be done by entering that directory and running either python3 setup.py install or pip install . from within it.

Usage

The main executable is main.py. Various command-line options and switches can be used to execute different stages and modify the behaviour of the program. All command-line options and switches are listed by running main.py with the argument --help. The central mode-switches are listed below:

  • --train: Trains the neural network for a number of episodes (full games of backgammon) set by --episodes (default: 1,000). Summary results of the games played during the training session are written to models/$MODEL/logs/train.log.
  • --eval: Evaluates the neural network using the methods specified by --eval-methods for the number of episodes set by --episodes (default: 1,000). Results are written to models/$MODEL/logs/eval.log.
  • --play: Allows the user to interactively play a game of backgammon against the algorithm.
  • --list-models: Lists the models stored in the models directory.

Evaluation methods

Currently, the following evaluation methods are implemented:

  • pubeval: Evaluates against a Python extension based on the pubeval backgammon benchmark developed by Gerald Tesauro. The source code is included in the pubeval directory and needs to be installed before use. This can be done by running python3 setup.py install or pip install . from the source directory.
  • random: Evaluates by playing against a player that draws its moves at random from the set of legal moves. Should be used with high episode counts to lower variance. TODO: this method does not currently work.
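The random opponent described above can be sketched in a few lines. Note that the move representation and the name random_player below are hypothetical stand-ins for illustration, not the project's actual API:

```python
import random

def random_player(legal_moves):
    """Pick a move uniformly at random from the legal moves.

    `legal_moves` is a hypothetical stand-in for however the program
    enumerates legal moves for the current board and dice roll.
    """
    return random.choice(list(legal_moves))

# Usage with a made-up list of candidate moves:
moves = [((24, 20), (13, 11)), ((24, 22), (13, 9))]
chosen = random_player(moves)
assert chosen in moves
```

Because each move is drawn independently and uniformly, single-game outcomes are noisy, which is why high episode counts are needed to get a stable mean.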

Examples

The following examples describe common operations.

Train default model

python3 main.py --train

Train perpetually

python3 main.py --train --train-perpetually

Train model named quack

python3 main.py --train --model=quack

Train default model in sessions of 10,000 episodes

python3 main.py --train --episodes=10000

Train model quack and evaluate after each training session

python3 main.py --train --eval-after-train --model=quack

Evaluate model named quack using default evaluation method (currently random)

python3 main.py --eval --model=quack

Evaluate default model using evaluation methods random and pubeval

python3 main.py --eval --eval-methods random pubeval

Model storage format

Models are stored in the directory models. If no model is specified with the --model option, the model is stored in the models/default directory. Otherwise, the model is stored in models/$MODEL.
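The storage rule above amounts to a simple path lookup. A minimal sketch (the function name model_dir is hypothetical, not taken from the codebase):

```python
import os

def model_dir(model=None):
    # --model falls back to "default" when not specified,
    # mirroring the storage rule described above.
    return os.path.join("models", model or "default")

print(model_dir())         # models/default
print(model_dir("quack"))  # models/quack
```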

Files

Along with the TensorFlow checkpoint files, each model directory contains the log files described below.

Log format

The evaluation and training log files (logs/eval.log and logs/train.log respectively) are CSV-formatted files with the structure described below. Both files have semicolon-separated columns (;) and newline-separated rows (\n).
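Given the semicolon/newline structure, a log file can be read with Python's csv module. The field names follow the eval.log column listing below; the sample row is made up for illustration, not taken from a real run:

```python
import csv
import io

# Hypothetical contents of an eval.log row (made-up values).
sample = "1526824000;pubeval;1000;250;-30;-0.12\n"

fields = ["time", "method", "trained_eps", "count", "sum", "mean"]
reader = csv.reader(io.StringIO(sample), delimiter=";")
rows = [dict(zip(fields, row)) for row in reader]
print(rows[0]["method"])  # pubeval
```

For train.log the same approach applies, minus the method column.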

Evaluation log (eval.log)

Columns are written in the following order:

  • time: Unix timestamp (Epoch time) in local time (TODO: should be UTC instead?) describing when the evaluation finished.
  • method: Short string describing the method used for evaluation.
  • trained_eps: Number of episodes the model had been trained for before evaluation.
  • count: Number of episodes used for evaluation.
  • sum: Sum of the outcomes of the games played during evaluation. Outcomes are integers in the range of -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
  • mean: Mean of the outcomes of the games played during evaluation. Outcomes are integers in the range of -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)

TODO: Add example of log row

Training log (train.log)

Columns are written in the following order:

  • time: Unix timestamp (Epoch time) in local time (TODO: should be UTC instead?) describing when the training session finished.
  • trained_eps: Number of episodes the model had been trained for after the training session.
  • count: Number of episodes used for training.
  • sum: Sum of the outcomes of the games played during training. Outcomes are integers in the range of -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
  • mean: Mean of the outcomes of the games played during training. Outcomes are integers in the range of -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
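The sum and mean columns are related by mean = sum / count, so either can be recovered from the other. A quick sanity check on made-up outcomes:

```python
# Outcomes of individual games lie in the range -2 to 2
# (made-up sample data, not from a real training run).
outcomes = [1, -1, 2, 0, -2, 1, 1, -1]

total = sum(outcomes)
mean = total / len(outcomes)
print(total, mean)  # 1 0.125
```

A mean near 0 over many games suggests the two players are roughly evenly matched, which matches the "scored neutrally" reading above.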