README extended to include new pubeval

Christoffer Müller Madsen 2018-03-14 11:12:11 +01:00
parent d52e4a597c
commit 6709a4bb1c
Signed by: christoffer
GPG Key ID: 337BA5A95E686EFD


@@ -4,12 +4,20 @@ Quack-TD is a backgammon playing algorithm based upon neural networks trained
 through TD(\lambda)-learning. The algorithm is implemented using Python 3 and
 Tensorflow.
+* Setup
+** Pubeval
+To use Pubeval for evaluation the Python module =pubeval= must first be
+installed. The necessary source files should be distributed alongside the main
+application and located in the =pubeval= directory. The installation can be done
+by entering the directory and running =python3 setup.py install= or =pip install
+.=.
 * Usage
 The main executable is =main.py=. Various command-line options and switches can be used to
 execute different stages and modify the behaviour of the program. All
 command-line options and switches are listed by running =main.py= with the argument
-=--help=. The three central switches are listed below:
+=--help=. The central mode-switches are listed below:
 - =--train=: Trains the neural network for a set amount of episodes (full games
 of backgammon) set by =--episodes= (defaults to 1,000). Summary results of the
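The Setup text added in this hunk says the =pubeval= module must be installed before it can be used for evaluation. A minimal sketch of how that could be checked from Python before an evaluation run is given here. The helper name =module_is_installed= is hypothetical and not part of the repository; the only library call used is =importlib.util.find_spec= from the standard library, and the module name =pubeval= is taken from the README text.

```python
import importlib.util


def module_is_installed(name: str) -> bool:
    """Return True if a top-level module called `name` can be located.

    Hypothetical helper for illustration; it is not part of Quack-TD.
    """
    # find_spec returns None when no importable top-level module is found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_is_installed("pubeval"):
        print("pubeval is installed")
    else:
        # The install command is the one named in the README.
        print("pubeval is missing; run 'pip install .' in the pubeval directory")
```

A check like this only confirms that a module with that name can be found; it does not verify that the module is the extension built from the =pubeval= directory.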
@@ -22,14 +30,17 @@ command-line options and switches are listed by running =main.py= with the argum
 - =--play=: Allows the user to interactively play a game of backgammon against
 the algorithm.
+- =--list-models=: Lists the models stored on in the =models= folder.
 ** Evaluation methods
 Currently, the following evaluation methods are implemented:
-- =pubeval=: Evaluates against the =pubeval= backgammon benchmark developed by
-Gerald Tesauro. The source code is included in the =pubeval= directory and
-needs to be compiled before use. The binary should be placed at
-=pubeval/pubeval=.
+- =pubeval=: Evaluates against a Python extension based on the =pubeval=
+backgammon benchmark developed by Gerald Tesauro. The source code is included
+in the =pubeval= directory and needs to be installed before use. This can be
+done by running =python3 setup.py install= or =pip install .= from the source
+directory.
 - =random=: Evaluates by playing against a player that makes random moves drawn
 from the set of legal moves. Should be used with high episode counts to lower
 variance. *TODO*: Doesn't even work currently
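The mode switches described in this diff can be pictured with a small =argparse= sketch. It is an illustration under stated assumptions, not the parser used in =main.py=: the function name =build_parser= and the help strings are hypothetical, and only the flag names =--train=, =--episodes= (default 1,000), =--play= and =--list-models= come from the README text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser mirroring the switches named in the README."""
    parser = argparse.ArgumentParser(
        description="Sketch of the Quack-TD mode switches (assumed layout)"
    )
    # Mode switches: plain boolean flags.
    parser.add_argument("--train", action="store_true",
                        help="train the neural network")
    parser.add_argument("--play", action="store_true",
                        help="play interactively against the algorithm")
    parser.add_argument("--list-models", action="store_true",
                        help="list the models stored in the models folder")
    # Number of full games of backgammon; the README states a default of 1,000.
    parser.add_argument("--episodes", type=int, default=1000,
                        help="number of episodes to run")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, =--list-models= is exposed as the attribute =list_models= on the parsed namespace, since =argparse= converts dashes in option names to underscores.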