update README

2018-03-11 13:11:27 +01:00 · 2018-03-11 13:11:27 +01:00 · 81461d917e
commit 81461d917e
parent ea83f43085
1 changed file with 55 additions and 17 deletions
--- a/README.org
+++ b/README.org
@@ -1,9 +1,10 @@
-* Quack-TD
+#+TITLE: Quack-TD

 Quack-TD is a backgammon playing algorithm based upon neural networks trained
-through TD(\lambda)-learning.
+through TD(\lambda)-learning. The algorithm is implemented using Python 3 and
+Tensorflow.

-** Usage
+* Usage

 The main executable is =main.py=. Various command-line options and switches can be used to
 execute different stages and modify the behaviour of the program. All
@@ -11,22 +12,59 @@ command-line options and switches are listed by running =main.py= with the argum
 =--help=. The three central switches are listed below:

 - =--train=: Trains the neural network for a set amount of episodes (full games
-  of backgammon) set by =--episodes= (defaults to 1,000).
+  of backgammon) set by =--episodes= (defaults to 1,000). Summary results of the
+  games played during the training session are written to =models/$MODEL/logs/train.log=.

 - =--eval=: Evaluates the neural network using the methods specified by
 =--eval-methods= for the number of episodes set by =--episodes= (defaults to
-  1,000).
+  1,000). Results are written to =models/$MODEL/logs/eval.log=.

 - =--play=: Allows the user to interactively play a game of backgammon against
  the algorithm.

-** Model storage format
+** Evaluation methods
+
+Currently, only a single evaluation method is implemented:
+
+- =random=: Evaluates by playing against a player that makes random moves drawn
+  from the set of legal moves. Should be used with high episode counts to lower
+  variance. *TODO*: This method does not currently work.
+
+** Examples
+
+The following examples describe common operations.
+
+*** Train default model
+
+=python3 main.py --train=
+
+*** Train model named =quack=
+
+=python3 main.py --train --model=quack=
+
+*** Train default model in sessions of 10,000 episodes
+
+=python3 main.py --train --episodes=10000=
+
+*** Train model =quack= and evaluate after each training session
+
+=python3 main.py --train --eval-after-train --model=quack=
+
+*** Evaluate model named =quack= using default evaluation method (currently =random=)
+
+=python3 main.py --eval --model=quack=
+
+*** Evaluate default model using evaluation methods =random= and =foovaluation=
+
+=python3 main.py --eval --eval-methods random foovaluation=
+
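The switches used in the examples above could be declared roughly as follows. This is a minimal sketch, not the actual contents of =main.py=: the function name =build_parser=, the default model name =default= and the default evaluation method list are assumptions inferred from the README text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the switches named in the README."""
    parser = argparse.ArgumentParser(prog="main.py")
    # The three central switches; each one is a plain on/off flag.
    parser.add_argument("--train", action="store_true", help="train the network")
    parser.add_argument("--eval", action="store_true", help="evaluate the network")
    parser.add_argument("--play", action="store_true", help="play against the algorithm")
    parser.add_argument("--eval-after-train", action="store_true",
                        help="evaluate after each training session")
    # Options with values; the defaults follow the README (1,000 episodes,
    # models/default directory). The "random" default is an assumption.
    parser.add_argument("--episodes", type=int, default=1000)
    parser.add_argument("--model", default="default")
    parser.add_argument("--eval-methods", nargs="*", default=["random"])
    return parser


# Parse the last example from the README without touching sys.argv.
args = build_parser().parse_args(["--eval", "--eval-methods", "random", "foovaluation"])
print(args.eval, args.model, args.episodes, args.eval_methods)
```

Both =--model=quack= and =--model quack= are accepted by =argparse=, so the examples work in either spelling under this sketch.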
+* Model storage format

 Models are stored in the directory =models=. If no model is specified with the
 =--model= option, the model is stored in the =models/default=
 directory. Otherwise, the model is stored in =models/$MODEL=.

-*** Files
+** Files

 Along with the Tensorflow checkpoint files in the directory, the following files
 are stored:
@@ -41,42 +79,42 @@ are stored:
 =model.episodes= will be updated every time the model is saved to disk. The
  format of this file is specified in [[Log format]].

-*** Log format
+** Log format

 The evaluation and training log files (=logs/eval.log= and =logs/train.log=
 respectively) are CSV-formatted files with structure as described below. Both
 files have semicolon-separated columns (=;=) and newline-separated rows (=\n=).

-**** Evaluation log (=eval.log=)
+*** Evaluation log (=eval.log=)

 Columns are written in the following order:

-- =time=: Unix time (Epoch time) timestamp in local time (TODO: should be UTC
+- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should be UTC
  instead?) describing when the evaluation was finished.
 - =method=: Short string describing the method used for evaluation.
 - =trained_eps=: Amount of episodes trained with the model before evaluation
 - =count=: Amount of episodes used for evaluation
 - =sum=: Sum of outcomes of the games played during evaluation. Outcomes are
  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
-  algorithm scored neutrally. (TODO: Is this true?)
+  algorithm scored neutrally. (*TODO*: Is this true?)
 - =mean=: Mean of outcomes of the games played during evaluation. Outcomes are
  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
-  algorithm scored neutrally. (TODO: Is this true?)
+  algorithm scored neutrally. (*TODO*: Is this true?)

-TODO: Add example of log row
+*TODO*: Add example of log row
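
Reading the evaluation log back could look like the sketch below, which relies only on the column order and the =;= separator described above. The names =EvalRow= and =parse_eval_log= are hypothetical, the column types are assumptions (the README does not state whether =time= is an integer), and the sample row is invented for illustration rather than taken from a real log.

```python
import csv
import io
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    """One row of eval.log, in the column order given in the README."""
    time: float        # Unix timestamp; float is an assumption
    method: str        # evaluation method, e.g. "random"
    trained_eps: int   # episodes trained before the evaluation
    count: int         # episodes used for the evaluation
    sum: int           # sum of game outcomes (each in -2..2)
    mean: float        # mean of game outcomes


def parse_eval_log(text: str) -> List[EvalRow]:
    """Parse semicolon-separated, newline-terminated rows; skip blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = []
    for fields in reader:
        if not fields:
            continue
        time, method, trained_eps, count, total, mean = fields
        rows.append(EvalRow(float(time), method, int(trained_eps),
                            int(count), int(total), float(mean)))
    return rows


# Invented sample row, not real program output.
sample = "1520770287;random;5000;1000;-120;-0.12\n"
for row in parse_eval_log(sample):
    print(row.method, row.trained_eps, row.mean)
```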

-**** Training log (=train.log=)
+*** Training log (=train.log=)

 Columns are written in the following order:

-- =time=: Unix time (Epoch time) timestamp in local time (TODO: should be UTC
+- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should be UTC
  instead?) describing when the training session was finished.
 - =trained_eps=: Amount of episodes trained with the model /after/ the training
  session
 - =count=: Amount of episodes used for training
 - =sum=: Sum of outcomes of the games played during training. Outcomes are
  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
-  algorithm scored neutrally. (TODO: Is this true?)
+  algorithm scored neutrally. (*TODO*: Is this true?)
 - =mean=: Mean of outcomes of the games played during training. Outcomes are
  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
-  algorithm scored neutrally. (TODO: Is this true?)
+  algorithm scored neutrally. (*TODO*: Is this true?)
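
To make the relationship between the training log columns concrete, the sketch below builds one =train.log= row from a list of game outcomes. It is an illustration under assumptions, not the project's code: the function name =format_train_row= and its parameters are hypothetical, and the exact number formatting used by the real program is not specified in the README.

```python
import time
from typing import Optional, Sequence


def format_train_row(outcomes: Sequence[int],
                     trained_eps_before: int,
                     now: Optional[int] = None) -> str:
    """Build one semicolon-separated train.log row (hypothetical helper).

    outcomes           -- per-game results, each an integer from -2 to 2
    trained_eps_before -- episodes the model had trained before this session
    now                -- timestamp override, mainly useful for testing
    """
    if not outcomes:
        raise ValueError("a training session needs at least one episode")
    count = len(outcomes)
    total = sum(outcomes)
    mean = total / count
    timestamp = int(time.time()) if now is None else now
    # trained_eps is the episode total *after* the session, per the README.
    columns = (timestamp, trained_eps_before + count, count, total, mean)
    return ";".join(str(value) for value in columns) + "\n"


# Four invented outcomes that cancel out, giving a sum and mean of zero.
print(format_train_row([2, -1, 1, -2], trained_eps_before=100, now=0), end="")
```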