backgammon

Implementation of backgammon in Python 3.6.4


Christoffer Müller Madsen ea83f43085 add README for project		2018-03-11 12:59:57 +01:00
bin	add script to save models as tar archives	2018-03-10 01:07:56 +01:00
__init__.py	initial commit	2018-02-05 22:31:34 +01:00
.gitignore	add README for project	2018-03-11 12:59:57 +01:00
board.py	Added player type and possibility of playing against network as player	2018-03-09 14:19:31 +01:00
bot.py	save and restore number of trained episodes	2018-03-10 00:22:20 +01:00
cup.py	Added player type and possibility of playing against network as player	2018-03-09 14:19:31 +01:00
dice.py	initial commit	2018-02-05 22:31:34 +01:00
game.py	save and restore number of trained episodes	2018-03-10 00:22:20 +01:00
human.py	Bot reimplemented with new representation.	2018-02-22 14:01:28 +01:00
main.py	training and evaluation stats are now logged by default to model/logs/	2018-03-10 00:39:55 +01:00
network_test.py	Potentially functioning network	2018-03-04 17:35:36 +01:00
network.py	save and restore number of trained episodes	2018-03-10 00:22:20 +01:00
player.py	typos	2018-03-09 21:02:41 +01:00
plot.py	small fixes	2018-03-08 17:51:32 +01:00
README.org	add README for project	2018-03-11 12:59:57 +01:00
requirements.txt	add requirements.txt	2018-03-06 12:04:40 +01:00
restore_bot.py	Might be able to learn now (?)	2018-03-06 16:23:08 +01:00
test.py	more tests	2018-03-06 12:29:03 +01:00

README.org

Quack-TD
- Usage
- Model storage format
  - Files
  - Log format
    - Evaluation log (eval.log)
    - Training log (train.log)

Quack-TD

Quack-TD is a backgammon playing algorithm based upon neural networks trained through TD(λ)-learning.

Usage

The main executable is main.py. Various command-line options and switches can be used to execute different stages and modify the behaviour of the program. All command-line options and switches are listed by running main.py with the argument --help. The three central switches are listed below:

--train: Trains the neural network for the number of episodes (full games of backgammon) given by --episodes (defaults to 1,000).
--eval: Evaluates the neural network using the methods specified by --eval-methods for the number of episodes given by --episodes (defaults to 1,000).
--play: Allows the user to interactively play a game of backgammon against the algorithm.
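
The switches above can be combined into a command line programmatically. The following is a minimal sketch, not part of the project: build_command is a hypothetical helper, and it assumes that --episodes and --model each take their value as a separate following argument.

```python
import sys
from typing import List, Optional


def build_command(stage: str,
                  episodes: Optional[int] = None,
                  model: Optional[str] = None) -> List[str]:
    """Build an argv list that runs main.py with one of the central switches.

    Hypothetical helper; assumes --episodes and --model accept a value
    passed as the next argument.
    """
    if stage not in ("train", "eval", "play"):
        raise ValueError("unknown stage: {!r}".format(stage))
    cmd = [sys.executable, "main.py", "--" + stage]
    if episodes is not None:
        cmd += ["--episodes", str(episodes)]
    if model is not None:
        cmd += ["--model", model]
    return cmd
```

The resulting list can be handed to subprocess.run, for example subprocess.run(build_command("train", episodes=500), check=True). Running main.py with --help remains the authoritative reference for the accepted options.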

Model storage format

Models are stored in the directory models. If no model is specified with the --model option, the model is stored in the models/default directory. Otherwise, the model is stored in models/$MODEL.
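
The directory rule can be expressed in a few lines. This is an illustrative sketch only: model_dir and MODELS_ROOT are hypothetical names, and the project's own code may resolve the path differently.

```python
from pathlib import Path
from typing import Optional

# Root directory for all stored models, relative to the working directory.
MODELS_ROOT = Path("models")


def model_dir(model: Optional[str] = None) -> Path:
    """Return models/default when no model name is given, else models/<name>."""
    return MODELS_ROOT / (model if model else "default")
```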

Files

Along with the Tensorflow checkpoint files in the directory, the following files are stored:

model.episodes: The number of episodes of training performed with the model
logs/eval.log: Log of all completed evaluations performed on the model. The format of this file is specified in the Log format section.
logs/train.log: Log of all completed training sessions performed on the model. If a training session is aborted before the pre-specified episode target is reached, nothing is written to this file, although model.episodes is updated every time the model is saved to disk. The format of this file is specified in the Log format section.
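
As an illustration of how model.episodes might be consumed, the sketch below reads the episode count from a model directory. It rests on two assumptions that the text above does not confirm: that the file holds a single decimal integer as plain text, and that a missing file can be treated as zero trained episodes. read_trained_episodes is a hypothetical name.

```python
from pathlib import Path


def read_trained_episodes(model_path: Path) -> int:
    """Read the trained-episode count stored alongside a model.

    Assumes model.episodes contains one plain-text integer; a missing
    file is treated as 0 episodes (an assumption, not documented behaviour).
    """
    episodes_file = Path(model_path) / "model.episodes"
    if not episodes_file.exists():
        return 0
    return int(episodes_file.read_text().strip())
```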

Log format

The evaluation and training log files (logs/eval.log and logs/train.log respectively) are CSV-formatted files with the structure described below. Both files have semicolon-separated columns (;) and newline-separated rows (\n).
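
Because the separator is a semicolon rather than a comma, a reader has to set the delimiter explicitly. A minimal sketch using Python's standard csv module follows; read_log_rows is a hypothetical helper, and it assumes the files contain no header row.

```python
import csv
from typing import Iterable, List


def read_log_rows(lines: Iterable[str]) -> List[List[str]]:
    """Split semicolon-separated log lines into lists of string fields.

    Blank lines are skipped. No header row is assumed.
    """
    reader = csv.reader(lines, delimiter=";")
    return [row for row in reader if row]
```

Any iterable of lines works, including an open file: with open(path, newline="") as f: rows = read_log_rows(f).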

Evaluation log (eval.log)

Columns are written in the following order:

time: Unix time (Epoch time) timestamp in local time (TODO: should be UTC instead?) describing when the evaluation was finished.
method: Short string describing the method used for evaluation.
trained_eps: Number of episodes the model had been trained for before the evaluation.
count: Number of episodes used for the evaluation.
sum: Sum of outcomes of the games played during evaluation. Outcomes are integers in the range of -2 to 2. A sum of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)
mean: Mean of outcomes of the games played during evaluation. Outcomes are integers in the range of -2 to 2. A mean of 0 indicates that the evaluated algorithm scored neutrally. (TODO: Is this true?)

TODO: Add example of log row
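
The column order above maps onto a small parser. The sketch below is illustrative only: parse_eval_row and EVAL_COLUMNS are hypothetical names, the numeric types are assumptions (time, sum and mean are read as floats to tolerate either integer or decimal notation), and the row in the usage note is invented for demonstration, not taken from a real log.

```python
from typing import Dict, Union

# Column order as documented for eval.log.
EVAL_COLUMNS = ("time", "method", "trained_eps", "count", "sum", "mean")


def parse_eval_row(line: str) -> Dict[str, Union[float, int, str]]:
    """Parse one semicolon-separated eval.log row into a dict keyed by column."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != len(EVAL_COLUMNS):
        raise ValueError(
            "expected {} fields, got {}".format(len(EVAL_COLUMNS), len(fields))
        )
    raw = dict(zip(EVAL_COLUMNS, fields))
    return {
        "time": float(raw["time"]),          # Unix timestamp
        "method": raw["method"],             # evaluation method name
        "trained_eps": int(raw["trained_eps"]),
        "count": int(raw["count"]),
        "sum": float(raw["sum"]),            # assumed numeric, int or decimal
        "mean": float(raw["mean"]),
    }
```

For a made-up row such as "1520769597;random;5000;100;12;0.12" (the method name random is hypothetical), the parsed mean should equal sum divided by count if the mean is the plain arithmetic mean of the outcomes, which is how the column description reads.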

Training log (train.log)