To do a benchmark for `pubeval`, run `python3 main.py --bench-eval-scores --eval-methods pubeval` Logs will be placed in directory `bench` Use `plot_bench(data_path)` in `plot.py` for plotting