CS7545_Sp24_Lecture_22: Game Theory - mltheory/CS7545 GitHub Wiki
Problem Pretext
An application of online learning that we have not previously discussed is solving min-max problems, and in particular solving zero-sum games. This is a little surprising, because the connection between the two is not an obvious one.
Game theory generally involves trying to understand the behavior of multiple players that interact. The players receive rewards or losses (real numbers) during the game as a function of the actions that all of them take. By analogy, optimization looks at how one party can choose an action (a vector or a number) to minimize some function, and this single-agent view is the standard one we usually take. Game theory takes a multi-agent view in comparison, where the gain to each player can depend on the other players in the game.
The simplest version of this is the two-player zero-sum game. Every zero-sum game can be represented by a matrix $M$ where:
- a row player chooses an action from the rows of this matrix, $i \in \{1, \dots, n\}$
- a column player chooses an action from the columns, $j \in \{1, \dots, m\}$
- $M_{ij}$ is the gain to the column player
- $M_{ij}$ is also called the loss to the row player (so that $-M_{ij}$ is the gain to the row player)
This applies to lots of games: every zero-sum game with a finite action space is defined by a matrix, and every matrix can be thought of as representing a zero-sum game. With zero-sum games, there is just one matrix because the gain of one player is the loss to the other player. If one player gains some amount, the other player loses exactly that amount.
A simple example of a common zero-sum game is Rock, Paper, Scissors.
If the row player plays rock and the column player plays scissors, rock beats scissors, so the column player loses and gets $-1$. The row player's loss is $-1$ and their gain is $1$; the column player's gain is $-1$ (and loss is $1$).
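To make the sign convention concrete, here is a minimal Python sketch of the Rock, Paper, Scissors matrix under the convention above ($M_{ij}$ is the column player's gain and the row player's loss). The action ordering and the names `ACTIONS`, `M`, `column_gain`, and `row_gain` are assumptions made for illustration; they do not come from the lecture.

```python
# Assumed ordering of actions for both players (illustrative choice).
ACTIONS = ["rock", "paper", "scissors"]

# M[i][j] = gain to the column player (= loss to the row player)
# when the row player plays ACTIONS[i] and the column player plays ACTIONS[j].
M = [
    [0, 1, -1],   # row plays rock: ties rock, loses to paper, beats scissors
    [-1, 0, 1],   # row plays paper: beats rock, ties paper, loses to scissors
    [1, -1, 0],   # row plays scissors: loses to rock, beats paper, ties scissors
]

def column_gain(row_action, col_action):
    """Gain to the column player for one pair of pure actions."""
    i = ACTIONS.index(row_action)
    j = ACTIONS.index(col_action)
    return M[i][j]

def row_gain(row_action, col_action):
    """Zero-sum: the row player's gain is the negative of the column player's."""
    return -column_gain(row_action, col_action)

print(column_gain("rock", "scissors"))  # -1
print(row_gain("rock", "scissors"))     # 1
```

Because both players share the same action set here, the matrix is antisymmetric ($M_{ij} = -M_{ji}$), which is a property of this particular game and not of zero-sum games in general.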
If you have no information about your opponent, the optimal strategy is to randomize your actions uniformly, meaning putting equal weight on rock, paper, and scissors.
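The claim about uniform play can be checked numerically. The sketch below, written from the row player's side, computes the expected loss of a randomized strategy against each pure action of the column player. The function name `loss_per_column` and the rock-heavy strategy `biased` are hypothetical, chosen only for illustration; the matrix uses the same assumed ordering (rock, paper, scissors).

```python
from fractions import Fraction

# M[i][j] = loss to the row player, actions ordered rock, paper, scissors (assumed).
M = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]

def loss_per_column(p, M):
    """Expected row-player loss against each pure column action j."""
    n, m = len(M), len(M[0])
    return [sum(p[i] * M[i][j] for i in range(n)) for j in range(m)]

uniform = [Fraction(1, 3)] * 3
# Hypothetical strategy that plays rock half of the time.
biased = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

uniform_losses = loss_per_column(uniform, M)
biased_losses = loss_per_column(biased, M)

# Uniform play has expected loss 0 whatever the opponent does.
print(all(loss == 0 for loss in uniform_losses))  # True
# The rock-heavy strategy can be exploited by an opponent who plays paper.
print(max(biased_losses))                         # 1/4
```

Any deviation from uniform gives the opponent some action with strictly positive expected gain, which is why uniform randomization is the safe choice in this game.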
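A "mixed strategy" for either player is a probability distribution over that player's actions, for example a vector $p \in \Delta_n$ for the row player and $q \in \Delta_m$ for the column player, where $\Delta_k$ denotes the probability simplex over $k$ actions.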
We can write this quantity (nested optimization) in matrix notation.
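As a sketch of what the matrix form looks like, assume the row player's mixed strategy is a vector $p \in \Delta_n$ and the column player's is $q \in \Delta_m$ (the symbols $p$, $q$, and $\Delta_k$ for the probability simplex over $k$ actions are notation assumed here). The expected loss to the row player is then a bilinear form, and the nested optimization in which the row player minimizes against a best-responding column player becomes:

$$
\mathbb{E}_{i \sim p,\, j \sim q}\left[ M_{ij} \right]
  = \sum_{i=1}^{n} \sum_{j=1}^{m} p_i \, M_{ij} \, q_j
  = p^\top M q,
\qquad
\min_{p \in \Delta_n} \max_{q \in \Delta_m} p^\top M q .
$$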
Now we will try to demonstrate the following:
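Assuming the claim in question is the minimax equality that the end of this section attributes to von Neumann, and using the assumed notation $p \in \Delta_n$ and $q \in \Delta_m$ for the two players' mixed strategies, it can be written as:

$$
\min_{p \in \Delta_n} \max_{q \in \Delta_m} p^\top M q
  \;=\;
\max_{q \in \Delta_m} \min_{p \in \Delta_n} p^\top M q .
$$

On the left-hand side the row player fixes $p$ first and the column player responds; on the right-hand side the order is reversed. The inequality $\geq$ between the two sides always holds, since moving second can only help, so the substance of the statement is the reverse inequality.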
Two optimizations are happening here. First, we optimize the inner quantity and then the outer quantity. (Because, by optimizing the inner piece, we've sort of "committed" to some
Logically, this strategy is naturally worse than
However, with the
This result is also known as Von Neumann's Minimax Theorem. A formal proof of this can be found here: