GIGA 15 Reflections - SanchoGGP/ggp-base GitHub Wiki

These are personal reflections on the GIGA-15 papers, particularly with respect to their relevance to Sancho. Items that are possibly worthy of a closer look in the 2015-16 time-frame are marked :microscope:.

On the Cross-Domain Reusability of Neural Modules for General Video Game Playing

All about the transfer of chunks of neural network from one game to another. It'll be a long time (if ever) before this is relevant to us.

:microscope: Width-based planning for General Video-Game Playing

Introduces "Iterated Width Search", IW(i).
This is a form of breadth-first search which is aggressively pruned. For IW(i), if a new state doesn't contain a combination of i propositions that have never been set before, the state is pruned.
IW(1) is linear in the number of propositions, IW(2) is quadratic, etc..
Produces better results the MCTS over short timescales (40ms - 1s).
Mentions an MCTS variant which I haven't met before and might be interesting - OLMCTS.
:microscope: Looks worthy of some further exploration.

Game Description Language for Real-time Games

Introduces rtGDL.
Very general in its expressiveness. Allows things like a chess clock or a free-running game.
Would be a very different model and much harder to deal with.
Can be combined with GDL-II.
Seems unlikely to catch on any time soon (considering that GDL-II hasn't become the norm).

:microscope: Discounting and Pruning for Nested Playouts in General Game Playing

Nesting looks complicated (both conceptually and in computational expense).
:microscope: Vaguely related thoughts triggered by this paper...
- Perhaps we should take another look at discounting based on rollout depth.
- Consider using a moving average for node utility (s.t. samples from the more recent and fully formed tree and more heavily weighted than old samples which were created when the tree had fewer nodes and therefore more randomness in the playouts).
:microscope: The paper is directly applied to GGP. It might well be promising if I can fully get my head round it (although some back-of-the-envelope calculations by Steve indicate that it might just be too expensive).

Creating Action Heuristics for General Game Playing Agents

Produce general heuristics for an action (where the heuristic value for each action depends on the parent state).
Insight is that action heuristics are more useful than state heuristics for GGP. (Which is certainly true, although perhaps not always easy/possible to generate useful action heuristics.)
Paper includes a suggestion for a general mechanism to produce action heuristics.
Reported results suggest that it doesn't work very well and, when a duff heuristic is used to guide playouts, it's particularly harmful.
No automated method proposed to detect when a trial heuristic is good/bad.

Space-Consistent Game Equivalence Detection in General Game Playing

This paper fleshes out thoughts that we've previously had on detecting identical games from the propnets (or other representation). However, we've never needed it. Sancho's current mechanism works just fine - at least for now.

Defines a much more general model than Sancho.
Converts the problem to the well-understood problems of graph isomorphism + SAT, to leverage optimised solvers for these problems.
Detects symmetry.
Good starting point if this ever becomes an issue.

The GRL System: Learning Board Game Rules With Piece-Move Iterations

Leans game rules by watching games (no GDL)
Not relevant (having read the abstract only).

:microscope: MCTS Playouts Parallelization with a MPPA Architecture

Mostly irrelevant when not on the MPPA architecture in question (which has extremely limited memory, buts lots of CPUs)
The paper repeated the claim that lead parallelization (as used by Sancho) is worst. Either tree or root parallelization is better. Seems hard to believe and may depend on exactly what is meant by leaf parallelization. (If it's simply using multiple threads to rollout the same state - i.e. an increased sample size - then this is easily believable. However, we're dubious that root parallelization is really better than leaf parallelization as implemented by Sancho with virtual losses.)
Surprise result that per-rollout throw-away threads are better than keeping pooled workers alive + waiting for work. Worth some quick exploration, but I wonder if this is related to...
- Using C++
- Using Linux
- The MPPA architecture

Investigating MCTS Modifications in General Video Game Playing

Key invention in this paper is "Reversal Penalty" which penalises actions which return the player (in a video game) to a recently visited position.
Good in some video games. Bad in others. Net harmful.

A Neural Network Approach to Model Learning for Stochastic Discrete Environments

Not relevant to GGP (from reading the abstract).