update README

commit 81461d917e (parent ea83f43085) · README.org

#+TITLE: Quack-TD

Quack-TD is a backgammon-playing algorithm based on neural networks trained
through TD(\lambda)-learning. The algorithm is implemented in Python 3 and
Tensorflow.
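
As background, the core of TD(\lambda) is a value update driven by a
temporal-difference error and eligibility traces. The following is a minimal
tabular sketch of that update. It is illustrative only: Quack-TD trains a
neural network rather than a table, and every name below is invented for the
example.

#+BEGIN_SRC python
# Minimal tabular TD(lambda) update with accumulating eligibility traces.
# Illustrative sketch only; none of these names come from the project.

def td_lambda_update(values, traces, state, next_state, reward,
                     alpha=0.1, gamma=1.0, lam=0.7):
    # TD error: bootstrapped target minus the current estimate.
    delta = (reward + gamma * values.get(next_state, 0.0)
             - values.get(state, 0.0))
    # Mark the visited state as eligible for credit.
    traces[state] = traces.get(state, 0.0) + 1.0
    # Credit every eligible state, then decay its trace.
    for s in list(traces):
        values[s] = values.get(s, 0.0) + alpha * delta * traces[s]
        traces[s] *= gamma * lam
#+END_SRC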

* Usage

The main executable is =main.py=. Various command-line options and switches can
be used to execute different stages and modify the behaviour of the program.
All command-line options and switches are listed by running =main.py= with the
argument =--help=. The three central switches are listed below:

- =--train=: Trains the neural network for a number of episodes (full games of
  backgammon) set by =--episodes= (defaults to 1,000). Summary results of the
  games played during the training session are written to
  =models/$MODEL/logs/train.log=.

- =--eval=: Evaluates the neural network using the methods specified by
  =--eval-methods= for the number of episodes set by =--episodes= (defaults to
  1,000). Results are written to =models/$MODEL/logs/eval.log=.

- =--play=: Allows the user to interactively play a game of backgammon against
  the algorithm.

** Evaluation methods

Currently, only a single evaluation method is implemented:

- =random=: Evaluates the network by playing against a player that makes random
  moves drawn from the set of legal moves. This method should be used with high
  episode counts to lower variance. *TODO*: this method does not currently
  work. A sketch of such a player is given below.
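
A minimal sketch of what such a random player could look like, assuming a
hypothetical =legal_moves(board, roll)= helper (invented for the example; not
necessarily the project's API):

#+BEGIN_SRC python
import random

def random_player(board, roll, legal_moves):
    """Return a uniformly random legal move, or None if there is none.

    `legal_moves` is a stand-in for whatever function produces the set
    of legal moves for a given board state and dice roll.
    """
    moves = list(legal_moves(board, roll))
    return random.choice(moves) if moves else None
#+END_SRC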

** Examples

The following examples describe common operations.

*** Train default model

=python3 main.py --train=

*** Train model named =quack=

=python3 main.py --train --model=quack=

*** Train default model in sessions of 10,000 episodes

=python3 main.py --train --episodes=10000=

*** Train model =quack= and evaluate after each training session

=python3 main.py --train --eval-after-train --model=quack=

*** Evaluate model named =quack= using the default evaluation method (currently =random=)

=python3 main.py --eval --model=quack=

*** Evaluate default model using evaluation methods =random= and =foovaluation=

=python3 main.py --eval --eval-methods random foovaluation=

* Model storage format

Models are stored in the directory =models=. If no model is specified with the
=--model= option, the model is stored in the =models/default=
directory. Otherwise, the model is stored in =models/$MODEL=.
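
For orientation, the resulting layout for a model named =quack= would
presumably look like the following (assumed for illustration; the individual
files are described below, and the exact checkpoint file names depend on
Tensorflow):

#+BEGIN_EXAMPLE
models/
└── quack/
    ├── model.episodes
    ├── logs/
    │   ├── eval.log
    │   └── train.log
    └── ...   (Tensorflow checkpoint files)
#+END_EXAMPLE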

** Files

Along with the Tensorflow checkpoint files in the directory, the following
files are stored:

...

=model.episodes= will be updated every time the model is saved to disk. The
format of this file is specified in [[Log format]].

** Log format

The evaluation and training log files (=logs/eval.log= and =logs/train.log=,
respectively) are CSV-formatted files with the structure described below. Both
files use semicolon-separated columns (=;=) and newline-separated rows (=\n=).
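
Since the format is plain CSV with a =;= delimiter, the logs can be read with
Python's standard =csv= module. A minimal sketch (the =read_log= helper is
ours, not part of the project):

#+BEGIN_SRC python
import csv

def read_log(path):
    """Read a Quack-TD log file into a list of rows (lists of strings)."""
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter=";"))

# e.g. rows = read_log("models/default/logs/eval.log")
#+END_SRC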

*** Evaluation log (=eval.log=)

Columns are written in the following order:

- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should this
  be UTC instead?) describing when the evaluation finished.
- =method=: Short string describing the method used for evaluation.
- =trained_eps=: Number of episodes the model was trained for before the
  evaluation.
- =count=: Number of episodes used for the evaluation.
- =sum=: Sum of the outcomes of the games played during evaluation. Outcomes
  are integers in the range -2 to 2. A sum of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)
- =mean=: Mean of the outcomes of the games played during evaluation. Outcomes
  are integers in the range -2 to 2. A mean of 0 indicates that the evaluated
  algorithm scored neutrally. (*TODO*: Is this true?)

*TODO*: Add example of log row. A hypothetical row is sketched below.
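
Until a real example is added, here is a purely hypothetical row in the column
order above, with invented values:

#+BEGIN_EXAMPLE
1522399200;random;10000;500;24;0.048
#+END_EXAMPLE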

*** Training log (=train.log=)

Columns are written in the following order:

- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should this
  be UTC instead?) describing when the training session finished.
- =trained_eps=: Number of episodes the model was trained for /after/ the
  training session.
- =count=: Number of episodes used for training.
- =sum=: Sum of the outcomes of the games played during training. Outcomes are
  integers in the range -2 to 2. A sum of 0 indicates that the algorithm
  scored neutrally. (*TODO*: Is this true?)
- =mean=: Mean of the outcomes of the games played during training. Outcomes
  are integers in the range -2 to 2. A mean of 0 indicates that the algorithm
  scored neutrally. (*TODO*: Is this true?)
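
Likewise, a purely hypothetical =train.log= row with invented values (note
that there is no =method= column here):

#+BEGIN_EXAMPLE
1522402800;11000;1000;52;0.052
#+END_EXAMPLE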