add README for project
parent 49966cd207
commit ea83f43085
3
.gitignore
vendored
@@ -165,3 +165,6 @@ venv.bak/

# End of https://www.gitignore.io/api/emacs,python

README.*
!README.org
82
README.org
Normal file
@@ -0,0 +1,82 @@
* Quack-TD

Quack-TD is a backgammon-playing algorithm based on neural networks trained
through TD(\lambda)-learning.

** Usage

The main executable is =main.py=. Command-line options and switches select which
stage to run and modify the behaviour of the program. Running =main.py= with the
argument =--help= lists all of them. The three central switches are:

- =--train=: Trains the neural network for the number of episodes (full games
  of backgammon) set by =--episodes= (defaults to 1,000).

- =--eval=: Evaluates the neural network using the methods specified by
  =--eval-methods= for the number of episodes set by =--episodes= (defaults to
  1,000).

- =--play=: Allows the user to interactively play a game of backgammon against
  the algorithm.
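
As a rough illustration of how these switches fit together, the sketch below
declares them with Python's standard =argparse= module. Only the flag names and
the 1,000-episode default come from the text above. The help strings, the
assumption that =--eval-methods= takes a space-separated list, and the
=build_parser= helper are hypothetical and may not match =main.py=.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only flag names and the episode default are from the README."""
    parser = argparse.ArgumentParser(prog="main.py")
    # The three central switches are plain on/off flags.
    parser.add_argument("--train", action="store_true", help="train the network")
    parser.add_argument("--eval", action="store_true", help="evaluate the network")
    parser.add_argument("--play", action="store_true", help="play interactively")
    # Shared by --train and --eval; the README gives 1,000 as the default.
    parser.add_argument("--episodes", type=int, default=1000)
    # Assumption: evaluation methods are passed as a space-separated list.
    parser.add_argument("--eval-methods", nargs="*", default=[])
    # Assumption: "default" mirrors the models/default directory.
    parser.add_argument("--model", default="default")
    return parser


# Equivalent to running: main.py --train --episodes 500
args = build_parser().parse_args(["--train", "--episodes", "500"])
```

With this layout, omitting =--episodes= leaves =args.episodes= at 1000 for both
training and evaluation.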

** Model storage format

Models are stored in the directory =models=. If no model is specified with the
=--model= option, the model is stored in the =models/default=
directory. Otherwise, the model is stored in =models/$MODEL=, where =$MODEL= is
the value passed to =--model=.
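
The directory rule above can be sketched in a few lines of Python. The function
name =model_dir= and its parameters are hypothetical. Only the =models= root and
the =default= fallback are taken from the description.

```python
from pathlib import Path
from typing import Optional


def model_dir(model: Optional[str] = None, root: str = "models") -> Path:
    """Return the storage directory for a model name (hypothetical helper)."""
    # Fall back to "default" when no --model value was given.
    return Path(root) / (model if model else "default")


default_path = model_dir()          # models/default
named_path = model_dir("my-model")  # models/my-model
```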

*** Files

Along with the TensorFlow checkpoint files in the directory, the following files
are stored:

- =model.episodes=: The number of episodes of training performed with the
  model.
- =logs/eval.log=: Log of all completed evaluations performed on the model. The
  format of this file is specified in [[Log format]].
- =logs/train.log=: Log of all completed training sessions performed on the
  model. If a training session is aborted before the pre-specified episode
  target is reached, nothing is written to this file, although
  =model.episodes= is updated every time the model is saved to disk. The
  format of this file is specified in [[Log format]].

*** Log format

The evaluation and training log files (=logs/eval.log= and =logs/train.log=
respectively) are CSV-formatted files structured as described below. Both
files have semicolon-separated columns (=;=) and newline-separated rows (=\n=).

**** Evaluation log (=eval.log=)

Columns are written in the following order:

- =time=: Unix time (Epoch time) timestamp in local time (TODO: should be UTC
  instead?) describing when the evaluation finished.
- =method=: Short string describing the method used for evaluation.
- =trained_eps=: Number of episodes the model had been trained for before the
  evaluation.
- =count=: Number of episodes used for evaluation.
- =sum=: Sum of outcomes of the games played during evaluation. Outcomes are
  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
  algorithm scored neutrally. (TODO: Is this true?)
- =mean=: Mean of outcomes of the games played during evaluation. Outcomes are
  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
  algorithm scored neutrally. (TODO: Is this true?)

TODO: Add example of log row
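
Until a real row is added, the following sketch shows how a file with the
column order above could be read using Python's =csv= module. The sample row is
made up for illustration and is not actual program output. The method name
=random=, the =EvalRow= class and the field name =outcome_sum= are all
assumptions.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvalRow:
    time: float         # Unix timestamp of when the evaluation finished
    method: str         # evaluation method
    trained_eps: int    # episodes trained before the evaluation
    count: int          # episodes used for the evaluation
    outcome_sum: float  # the "sum" column
    mean: float         # the "mean" column


def parse_eval_log(text: str) -> List[EvalRow]:
    """Parse semicolon-separated eval.log content into typed rows."""
    rows = []
    for rec in csv.reader(io.StringIO(text), delimiter=";"):
        if not rec:  # skip empty lines, e.g. a trailing newline
            continue
        time, method, trained_eps, count, total, mean = rec
        rows.append(EvalRow(float(time), method, int(trained_eps),
                            int(count), float(total), float(mean)))
    return rows


# Invented sample row, for illustration only.
sample = "1522000000;random;1000;100;12;0.12\n"
rows = parse_eval_log(sample)
```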

**** Training log (=train.log=)

Columns are written in the following order:

- =time=: Unix time (Epoch time) timestamp in local time (TODO: should be UTC
  instead?) describing when the training session finished.
- =trained_eps=: Number of episodes the model has been trained for /after/ the
  training session.
- =count=: Number of episodes used for training.
- =sum=: Sum of outcomes of the games played during training. Outcomes are
  integers in the range of -2 to 2. A sum of 0 indicates that the evaluated
  algorithm scored neutrally. (TODO: Is this true?)
- =mean=: Mean of outcomes of the games played during training. Outcomes are
  integers in the range of -2 to 2. A mean of 0 indicates that the evaluated
  algorithm scored neutrally. (TODO: Is this true?)
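
A single training-log row could be decoded as sketched below. The sketch
assumes the =time= column holds standard epoch seconds, which count from
1970-01-01 00:00 UTC, so converting with an explicit UTC time zone gives the
same result on any machine. The helper name =parse_train_row= and the sample
row are hypothetical and not real program output.

```python
from datetime import datetime, timezone


def parse_train_row(line: str) -> dict:
    """Split one train.log row into its five columns (hypothetical helper)."""
    time, trained_eps, count, total, mean = line.rstrip("\n").split(";")
    return {
        # Assumption: "time" is standard epoch seconds.
        "finished_utc": datetime.fromtimestamp(float(time), tz=timezone.utc),
        "trained_eps": int(trained_eps),  # total episodes after the session
        "count": int(count),              # episodes in this session
        "sum": float(total),
        "mean": float(mean),
    }


# Invented sample row, for illustration only.
row = parse_train_row("0;2000;1000;-40;-0.04\n")
```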