diff --git a/README.org b/README.org
index 2a7d506..dd45173 100644
--- a/README.org
+++ b/README.org
@@ -1,9 +1,10 @@
-* Quack-TD
+#+TITLE: Quack-TD
 
 Quack-TD is a backgammon playing algorithm based upon neural networks trained
-through TD(\lambda)-learning.
+through TD(\lambda)-learning. The algorithm is implemented using Python 3 and
+Tensorflow.
 
-** Usage
+* Usage
 
 The main executable is =main.py=. Various command-line options and switches can
 be used to execute different stages and modify the behaviour of the program. All
@@ -11,22 +12,59 @@ command-line options and switches are listed by running =main.py= with the argum
 =--help=. The three central switches are listed below:
 
 - =--train=: Trains the neural network for a set number of episodes (full games
-  of backgammon) set by =--episodes= (defaults to 1,000).
+  of backgammon) set by =--episodes= (defaults to 1,000). Summary results of the
+  games played during the training session are written to
+  =models/$MODEL/logs/eval.log=.
 - =--eval=: Evaluates the neural network using the methods specified by
   =--eval-methods= for the number of episodes set by =--episodes= (defaults to
-  1,000).
+  1,000). Results are written to =models/$MODEL/logs/eval.log=.
 - =--play=: Allows the user to interactively play a game of backgammon against
   the algorithm.
 
-** Model storage format
+** Evaluation methods
+
+Currently, only a single evaluation method is implemented:
+
+- =random=: Evaluates by playing against a player that makes random moves drawn
+  from the set of legal moves. Should be used with high episode counts to lower
+  the variance. *TODO*: This method does not currently work.
+
+** Examples
+
+The following examples describe common operations.
+
+*** Train the default model
+
+=python3 main.py --train=
+
+*** Train a model named =quack=
+
+=python3 main.py --train --model=quack=
+
+*** Train the default model in sessions of 10,000 episodes
+
+=python3 main.py --train --episodes=10000=
+
+*** Train the model =quack= and evaluate it after each training session
+
+=python3 main.py --train --eval-after-train --model=quack=
+
+*** Evaluate the model named =quack= using the default evaluation method (currently =random=)
+
+=python3 main.py --eval --model=quack=
+
+*** Evaluate the default model using the evaluation methods =random= and =foovaluation=
+
+=python3 main.py --eval --eval-methods random foovaluation=
+
+* Model storage format
 
 Models are stored in the directory =models=. If no model is specified with the
 =--model= option, the model is stored in the =models/default= directory.
 Otherwise, the model is stored in =models/$MODEL=.
 
-*** Files
+** Files
 
 Along with the Tensorflow checkpoint files in the directory, the following files
 are stored:
@@ -41,42 +79,42 @@ are stored:
 =model.episodes= will be updated every time the model is saved to disk. The
 format of this file is specified in [[Log format]].
 
-*** Log format
+** Log format
 
 The evaluation and training log files (=logs/eval.log= and =logs/train.log=
 respectively) are CSV-formatted files with the structure described below. Both
 files have semicolon-separated columns (=;=) and newline-separated rows (=\n=).
 
-**** Evaluation log (=eval.log=)
+*** Evaluation log (=eval.log=)
 
 Columns are written in the following order:
 
-- =time=: Unix time (Epoch time) timestamp in local time (TODO: should be UTC
+- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should be UTC
   instead?) describing when the evaluation was finished.
 - =method=: Short string describing the method used for evaluation.
 - =trained_eps=: Number of episodes the model had been trained for before evaluation
 - =count=: Number of episodes used for evaluation
 - =sum=: Sum of the outcomes of the games played during evaluation. Outcomes are
   integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
-  algorithm scored neutrally. (TODO: Is this true?)
+  algorithm scored neutrally. (*TODO*: Is this true?)
 - =mean=: Mean of the outcomes of the games played during evaluation. Outcomes
   are integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
-  algorithm scored neutrally. (TODO: Is this true?)
+  algorithm scored neutrally. (*TODO*: Is this true?)
 
-TODO: Add example of log row
+*TODO*: Add an example of a log row
 
-**** Training log (=train.log=)
+*** Training log (=train.log=)
 
 Columns are written in the following order:
 
-- =time=: Unix time (Epoch time) timestamp in local time (TODO: should be UTC
+- =time=: Unix time (Epoch time) timestamp in local time (*TODO*: should be UTC
   instead?) describing when the training session was finished.
 - =trained_eps=: Number of episodes the model has been trained for /after/ the
   training session
 - =count=: Number of episodes used for training
 - =sum=: Sum of the outcomes of the games played during training. Outcomes are
   integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
-  algorithm scored neutrally. (TODO: Is this true?)
+  algorithm scored neutrally. (*TODO*: Is this true?)
 - =mean=: Mean of the outcomes of the games played during training. Outcomes are
   integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
-  algorithm scored neutrally. (TODO: Is this true?)
+  algorithm scored neutrally. (*TODO*: Is this true?)
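
The log layout the diff describes for =eval.log= (semicolon-separated columns in the order =time;method;trained_eps;count;sum;mean=) can be sketched as a small parser. This is a hypothetical helper written from the format description above, not part of =main.py=, and the sample row is made up (the README's own example row is still a TODO).

```python
import csv
import io

# Column order for eval.log as described in the README:
# time;method;trained_eps;count;sum;mean
EVAL_COLUMNS = ["time", "method", "trained_eps", "count", "sum", "mean"]

def parse_eval_log(text):
    """Parse semicolon-separated eval.log rows into dicts.

    Hypothetical helper sketched from the README's format description;
    it is not part of main.py.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    rows = []
    for fields in reader:
        row = dict(zip(EVAL_COLUMNS, fields))
        # time, trained_eps, count, and sum are integers per the README;
        # mean is a float.
        for key in ("time", "trained_eps", "count", "sum"):
            row[key] = int(row[key])
        row["mean"] = float(row["mean"])
        rows.append(row)
    return rows

# Made-up sample row: 500 evaluation games whose outcomes sum to 37,
# giving a mean outcome of 37 / 500 = 0.074.
sample = "1522000000;random;1000;500;37;0.074\n"
print(parse_eval_log(sample)[0]["mean"])  # prints 0.074
```

A parser like this could be the basis for plotting training progress from the per-session summary rows, since each row is self-contained.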