README extended to include new pubeval

Christoffer Müller Madsen 2018-03-14 11:12:11 +01:00
parent d52e4a597c
commit 6709a4bb1c
Signed by: christoffer
GPG Key ID: 337BA5A95E686EFD


@@ -4,12 +4,20 @@ Quack-TD is a backgammon playing algorithm based upon neural networks trained
through TD(\lambda)-learning. The algorithm is implemented using Python 3 and
Tensorflow.
+* Setup
+** Pubeval
+To use Pubeval for evaluation the Python module =pubeval= must first be
+installed. The necessary source files should be distributed alongside the main
+application and located in the =pubeval= directory. The installation can be done
+by entering the directory and running =python3 setup.py install= or =pip install
+.=.
* Usage
The main executable is =main.py=. Various command-line options and switches can be used to
execute different stages and modify the behaviour of the program. All
command-line options and switches are listed by running =main.py= with the argument
-=--help=. The three central switches are listed below:
+=--help=. The central mode-switches are listed below:
- =--train=: Trains the neural network for a set amount of episodes (full games
of backgammon) set by =--episodes= (defaults to 1,000). Summary results of the
@@ -22,14 +30,17 @@ command-line options and switches are listed by running =main.py= with the argum
- =--play=: Allows the user to interactively play a game of backgammon against
the algorithm.
- =--list-models=: Lists the models stored in the =models= folder.
** Evaluation methods
Currently, the following evaluation methods are implemented:
-- =pubeval=: Evaluates against the =pubeval= backgammon benchmark developed by
-Gerald Tesauro. The source code is included in the =pubeval= directory and
-needs to be compiled before use. The binary should be placed at
-=pubeval/pubeval=.
+- =pubeval=: Evaluates against a Python extension based on the =pubeval=
+backgammon benchmark developed by Gerald Tesauro. The source code is included
+in the =pubeval= directory and needs to be installed before use. This can be
+done by running =python3 setup.py install= or =pip install .= from the source
+directory.
- =random=: Evaluates by playing against a player that makes random moves drawn
from the set of legal moves. Should be used with high episode counts to lower
variance. *TODO*: Doesn't even work currently
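
Since the =pubeval= evaluation method only works once the =pubeval= Python module has been installed, it can be useful to check for the module before starting an evaluation run. The following is a minimal sketch of such a check and is not part of Quack-TD itself: the helper names =module_available= and =pubeval_install_hint= are hypothetical, and the only thing assumed about the extension is that it is importable under the name =pubeval=, as described in the README text above.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no installed module matches the name.
    return importlib.util.find_spec(name) is not None


def pubeval_install_hint(module_name: str = "pubeval") -> str:
    """Describe whether the module is installed and how to install it.

    Hypothetical helper: the install commands are the ones quoted in the
    README (`python3 setup.py install` or `pip install .`).
    """
    if module_available(module_name):
        return f"{module_name} is installed"
    return (
        f"{module_name} is not installed; run `pip install .` or "
        f"`python3 setup.py install` from the {module_name} directory"
    )


if __name__ == "__main__":
    print(pubeval_install_hint())
```

Running the script from the environment used for =main.py= prints either a confirmation or the install commands, so a missing extension is reported before any games are played.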