#+TITLE: Quack-TD

Quack-TD is a backgammon playing algorithm based upon neural networks trained
through TD(\lambda)-learning. The algorithm is implemented using Python 3 and
Tensorflow.

* Usage

The main executable is =main.py=. Various command-line options and switches can be used to
execute different stages and modify the behaviour of the program. All
command-line options and switches are listed by running =main.py= with the argument
=--help=. The three central switches are listed below:

- =--train=: Trains the neural network for a number of episodes (full games
  of backgammon) set by =--episodes= (defaults to 1,000). Summary results of the
  games played during the training session are written to
  =models/$MODEL/logs/train.log=.

- =--eval=: Evaluates the neural network using the methods specified by
  =--eval-methods= for the number of episodes set by =--episodes= (defaults to
  1,000). Results are written to =models/$MODEL/logs/eval.log=.

- =--play=: Allows the user to interactively play a game of backgammon against
  the algorithm.
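
As a rough illustration of how these switches fit together, the following is a minimal =argparse= sketch. It is not the parser used in =main.py=; the function name =build_parser=, the help texts, and the exact defaults for =--model= and =--eval-methods= are assumptions inferred from this README.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the command line described in this README.
    # main.py may define its options differently.
    parser = argparse.ArgumentParser(description="Quack-TD (illustrative CLI sketch)")
    parser.add_argument("--train", action="store_true", help="train the network")
    parser.add_argument("--eval", action="store_true", help="evaluate the network")
    parser.add_argument("--play", action="store_true", help="play against the network")
    parser.add_argument("--episodes", type=int, default=1000,
                        help="number of episodes per session")
    # Assumed defaults: model "default", evaluation method "random".
    parser.add_argument("--model", default="default", help="model name under models/")
    parser.add_argument("--eval-methods", nargs="*", default=["random"],
                        help="evaluation methods to run")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--eval", "--eval-methods", "random", "pubeval"])
print(args.eval, args.episodes, args.eval_methods)
```

Run =main.py --help= for the authoritative list of options.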

** Evaluation methods

Currently, the following evaluation methods are implemented:

- =pubeval=: Evaluates against the =pubeval= backgammon benchmark developed by
  Gerald Tesauro. The source code is included in the =pubeval= directory and
  needs to be compiled before use. The binary should be placed at
  =pubeval/pubeval=.
- =random=: Evaluates by playing against a player that makes random moves drawn
  from the set of legal moves. Should be used with high episode counts to lower
  variance. *TODO*: This method does not currently work.
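
To give a feel for why high episode counts help, the sketch below bounds the standard error of the mean outcome. It assumes that games are independent and that every outcome lies between -2 and 2, so the standard deviation of a single outcome is at most 2. The function name =worst_case_standard_error= is made up for this example.

```python
import math


def worst_case_standard_error(episodes, max_abs_outcome=2):
    # A value bounded in [-2, 2] has a standard deviation of at most 2,
    # so the standard error of the mean is at most 2 / sqrt(episodes).
    return max_abs_outcome / math.sqrt(episodes)


for n in (100, 1000, 10000):
    print(n, round(worst_case_standard_error(n), 3))
```

Under these assumptions, going from 100 to 10,000 episodes shrinks the bound by a factor of 10.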

** Examples

The following examples describe common operations.

*** Train default model

=python3 main.py --train=

*** Train perpetually

=python3 main.py --train --train-perpetually=

*** Train model named =quack=

=python3 main.py --train --model=quack=

*** Train default model in sessions of 10,000 episodes

=python3 main.py --train --episodes=10000=

*** Train model =quack= and evaluate after each training session

=python3 main.py --train --eval-after-train --model=quack=

*** Evaluate model named =quack= using default evaluation method (currently =random=)

=python3 main.py --eval --model=quack=

*** Evaluate default model using evaluation methods =random= and =pubeval=

=python3 main.py --eval --eval-methods random pubeval=

* Model storage format

Models are stored in the directory =models=. If no model is specified with the
=--model= option, the model is stored in the =models/default=
directory. Otherwise, the model is stored in =models/$MODEL=.
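
The directory rule above can be sketched as follows. The helper name =model_dir= and its parameters are hypothetical and only restate the rule; the actual code may resolve paths differently.

```python
from pathlib import Path


def model_dir(model=None, base="models"):
    # Fall back to the "default" model when no name is given,
    # mirroring the behaviour described for the --model option.
    return Path(base) / (model or "default")


print(model_dir().as_posix())
print((model_dir("quack") / "logs" / "eval.log").as_posix())
```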

** Files

Along with the Tensorflow checkpoint files in the directory, the following files
are stored:

- =episodes_trained=: The number of episodes of training performed with the
  model.
- =logs/eval.log=: Log of all completed evaluations performed on the model. The
  format of this file is specified in [[Log format]].
- =logs/train.log=: Log of all completed training sessions performed on the
  model. If a training session is aborted before the pre-specified episode
  target is reached, nothing will be written to this file, although
  =episodes_trained= will be updated every time the model is saved to disk. The
  format of this file is specified in [[Log format]].

** Log format

The evaluation and training log files (=logs/eval.log= and =logs/train.log=
respectively) are CSV-formatted files with the structure described below. Both
files have semicolon-separated columns (=;=) and newline-separated rows (=\n=).

*** Evaluation log (=eval.log=)

Columns are written in the following order:

- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should be UTC
  instead?) describing when the evaluation was finished.
- =method=: Short string describing the method used for evaluation.
- =trained_eps=: Number of episodes trained with the model before evaluation.
- =count=: Number of episodes used for evaluation.
- =sum=: Sum of outcomes of the games played during evaluation. Outcomes are
  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)
- =mean=: Mean of outcomes of the games played during evaluation. Outcomes are
  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)
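
A minimal sketch of reading this log with Python's =csv= module is shown below. It assumes the file has no header row and that the numeric columns parse as shown; the names =EVAL_COLUMNS= and =read_eval_log= are made up for this example, and the sample row contains invented values, not real results.

```python
import csv
import io

# Column order as listed above.
EVAL_COLUMNS = ["time", "method", "trained_eps", "count", "sum", "mean"]


def read_eval_log(handle):
    rows = []
    for fields in csv.reader(handle, delimiter=";"):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(EVAL_COLUMNS, fields))
        # Assumed types: the README does not state how numbers are written.
        row["time"] = float(row["time"])
        row["trained_eps"] = int(row["trained_eps"])
        row["count"] = int(row["count"])
        row["sum"] = float(row["sum"])
        row["mean"] = float(row["mean"])
        rows.append(row)
    return rows


# Invented sample row, for illustration only.
sample = io.StringIO("1520800000;pubeval;5000;1000;-120;-0.12\n")
for row in read_eval_log(sample):
    print(row["method"], row["count"], row["mean"])
```

For a real model, the same function could be applied to an open handle on =models/$MODEL/logs/eval.log=.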

*TODO*: Add example of log row

*** Training log (=train.log=)

Columns are written in the following order:

- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should be UTC
  instead?) describing when the training session was finished.
- =trained_eps=: Number of episodes trained with the model /after/ the training
  session.
- =count=: Number of episodes used for training.
- =sum=: Sum of outcomes of the games played during training. Outcomes are
  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)
- =mean=: Mean of outcomes of the games played during training. Outcomes are
  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)
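
The columns are related to one another: =mean= should equal =sum= divided by =count=, and because =trained_eps= is recorded /after/ the session, the episode total before the session is =trained_eps= minus =count=. The sketch below checks both for a single row. The function name =summarize_train_row= is hypothetical, the number formats are assumed, and the sample row contains invented values.

```python
import csv
import math

# Column order as listed above.
TRAIN_COLUMNS = ["time", "trained_eps", "count", "sum", "mean"]


def summarize_train_row(line):
    fields = next(csv.reader([line.strip()], delimiter=";"))
    row = dict(zip(TRAIN_COLUMNS, fields))
    count = int(row["count"])
    total = float(row["sum"])
    mean = float(row["mean"])
    return {
        # trained_eps is the total after the session, so subtract the session size.
        "eps_before": int(row["trained_eps"]) - count,
        # Allow a small tolerance in case the mean was rounded when written.
        "mean_consistent": math.isclose(mean, total / count, abs_tol=1e-6),
    }


# Invented sample row, for illustration only.
print(summarize_train_row("1520800000;6000;1000;35;0.035\n"))
```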