diff --git a/.gitignore b/.gitignore
index 0fbef80..8ce1b00 100644
--- a/.gitignore
+++ b/.gitignore
@@ -165,3 +165,6 @@ venv.bak/
 # End of https://www.gitignore.io/api/emacs,python
+
+README.*
+!README.org
\ No newline at end of file
diff --git a/README.org b/README.org
new file mode 100644
index 0000000..2a7d506
--- /dev/null
+++ b/README.org
@@ -0,0 +1,82 @@
+* Quack-TD
+
+Quack-TD is a backgammon-playing algorithm based upon neural networks trained
+through TD(\lambda)-learning.
+
+** Usage
+
+The main executable is =main.py=. Various command-line options and switches can be used to
+execute different stages and modify the behaviour of the program. All
+command-line options and switches are listed by running =main.py= with the argument
+=--help=. The three central switches are listed below:
+
+- =--train=: Trains the neural network for the number of episodes (full games
+  of backgammon) set by =--episodes= (defaults to 1,000).
+
+- =--eval=: Evaluates the neural network using the methods specified by
+  =--eval-methods= for the number of episodes set by =--episodes= (defaults to
+  1,000).
+
+- =--play=: Allows the user to interactively play a game of backgammon against
+  the algorithm.
+
+** Model storage format
+
+Models are stored in the directory =models=. If no model is specified with the
+=--model= option, the model is stored in the =models/default=
+directory. Otherwise, the model is stored in =models/$MODEL=.
+
+*** Files
+
+Along with the TensorFlow checkpoint files in the directory, the following files
+are stored:
+
+- =model.episodes=: The number of episodes of training performed with the
+  model.
+- =logs/eval.log=: Log of all completed evaluations performed on the model. The
+  format of this file is specified in [[Log format]].
+- =logs/train.log=: Log of all completed training sessions performed on the model.
+  If a training session is aborted before the pre-specified episode
+  target is reached, nothing will be written to this file, although
+  =model.episodes= will be updated every time the model is saved to disk. The
+  format of this file is specified in [[Log format]].
+
+*** Log format
+
+The evaluation and training log files (=logs/eval.log= and =logs/train.log=
+respectively) are CSV-formatted files with the structure described below. Both
+files have semicolon-separated columns (=;=) and newline-separated rows (=\n=).
+
+**** Evaluation log (=eval.log=)
+
+Columns are written in the following order:
+
+- =time=: Unix time (Epoch time) timestamp in local time (TODO: should be UTC
+  instead?) describing when the evaluation was finished.
+- =method=: Short string describing the method used for evaluation.
+- =trained_eps=: Number of episodes trained with the model before evaluation
+- =count=: Number of episodes used for evaluation
+- =sum=: Sum of outcomes of the games played during evaluation. Outcomes are
+  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
+  algorithm scored neutrally. (TODO: Is this true?)
+- =mean=: Mean of outcomes of the games played during evaluation. Outcomes are
+  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
+  algorithm scored neutrally. (TODO: Is this true?)
+
+TODO: Add example of log row
+
+**** Training log (=train.log=)
+
+Columns are written in the following order:
+
+- =time=: Unix time (Epoch time) timestamp in local time (TODO: should be UTC
+  instead?) describing when the training session was finished.
+- =trained_eps=: Number of episodes trained with the model /after/ the training
+  session
+- =count=: Number of episodes used for training
+- =sum=: Sum of outcomes of the games played during training. Outcomes are
+  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
+  algorithm scored neutrally. (TODO: Is this true?)
+- =mean=: Mean of outcomes of the games played during training. Outcomes are
+  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
+  algorithm scored neutrally. (TODO: Is this true?)
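
Note on the log format described in this README: the column lists could be read back with a small parser along the following lines. This is only a sketch and not code from the repository. It assumes that the logs have no header row, that =time=, =sum= and =mean= can be parsed as floats, and that =trained_eps= and =count= are integers. The helper name =parse_log= and the sample rows are hypothetical and do not come from a real run.

```python
import csv
import io

# Column order as listed in the README (assumption: no header row is written).
EVAL_COLUMNS = ["time", "method", "trained_eps", "count", "sum", "mean"]
TRAIN_COLUMNS = ["time", "trained_eps", "count", "sum", "mean"]

# Assumed column types; the README does not state them explicitly.
CONVERTERS = {
    "time": float,
    "method": str,
    "trained_eps": int,
    "count": int,
    "sum": float,
    "mean": float,
}


def parse_log(text, columns):
    """Parse semicolon-separated, newline-terminated log rows into dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(
                f"expected {len(columns)} columns, got {len(fields)}: {fields}"
            )
        rows.append(
            {name: CONVERTERS[name](value) for name, value in zip(columns, fields)}
        )
    return rows


# Hypothetical sample rows, made up for illustration only.
eval_rows = parse_log("1525183491;random;1000;500;-250;-0.5\n", EVAL_COLUMNS)
train_rows = parse_log("1525183491;2000;1000;120;0.12\n", TRAIN_COLUMNS)
```

If =mean= is indeed =sum= divided by =count=, as the column descriptions suggest, each parsed row can be sanity-checked by comparing =row["sum"] / row["count"]= against =row["mean"]=.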