CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Jacob Abernethy

Notes for Lecture 22: Game Theory

April 11, 2024

Scribes: Alexandra Stewart

Problem Pretext

An application of online learning that we have not previously discussed is for solving min-max problems and in particular solving zero-sum games. It's a little bit surprising because you would not think about there being a connection naturally.

Game Theory

Game theory generally involves trying to understand the behavior of multiple players that interact. Through a function of the actions that they take, the players receive rewards or losses (real numbers) during the game. It's a way to understand their behavior when multiple parties get loss or gain as a function of the actions of all of the parties in question. (In this analogy optimization looks at how one party can choose an action (vector) or number to minimize some function. And often, the single agent view is the standard view that we take. Game theory is more of a multi-agent view in comparison, where the gain to the players can depend on other players in the game.

Zero-sum games

The simplest version of this is the two player zero-sum game. Every zero-sum game can be represented by a matrix where:

a row player chooses actions from the rows of this matrix $i \in n$
a column player who chooses some option of action $j \in 1... m$
$M_{ij}$ is the gain to the column player
$M_{ij}$ is also called the loss to the to the row player (so that $-M_{ij}$ is the gain to the to the row player)

This applies to lots of games; every finite action space zero-sum game is defined by a matrix and every matrix can be thought of as as representing a zero-sum game. With zero-sum games, there is just one matrix because the gain of one player is the loss to another player. If one player gains $M_{ij}$ then the other player loses $M_{ij}$.

Example: Rock, Paper, Scissors

A simple example of a common zero-sum game is Rock, Paper, Scissors.

If a row player plays rock, and the column player plays scissors, rock beats scissors, so the column player lost and gets a -1. The row player's loss is -1 and his gain is 1. The column player's gain is -1 (and loss is 1). $M_{ij}$ is the gain to the column player. You can think of Rock, Paper, Scissors through the lens of the matrix by writing the payoffs.

$$ M=\begin{bmatrix} & R & P & S \\ R & 0 &+1 & -1 \\ P & -1 & 0 & +1 \\ S & +1 & -1 & 0\\ \end{bmatrix} $$

If you have no information about your opponent, the optimal strategy is to randomize your actions uniformly, meaning putting equal weight on rock paper and scissors.

Consider that a "mixed strategy" for either player is the distribution: $$\vec{p} \in \Delta_n (\Delta_m)$$ Is there a notion of optimality for both players when it comes to choosing their next strategy? We observe that if row player plays $\vec{p} \in \Delta_n$ and the column player plays $\vec{q} \in \Delta_m$, the expected payoff to column player is:

$$\mathbb{E}_{i \in \Delta_p , j in \Delta_q} M_{ij} = \sum_{i=1}^n \sum_{i=1}^m p_i q_i M_{ij} = p^T M q$$

We can write this quantity (nested optimization) in matrix notation.

Now we will try to demonstrate the the following:

$$\min_{p \in \Delta_n} (\max_{q \in \Delta_m } p^TMq) \geq \max_{q \in \Delta_m } (\min_{p \in \Delta_n} p^TMq)$$

Two optimizations are happening here. First, we optimize the inner quantity and then the outer quantity. (Because, by optimizing the inner piece, we've sort of "committed" to some $p$ and subsequently optimize the $q$ as a function of this outcome. Thus we have that the P player is playing first in the following expression:

$$\max_{q \in \Delta_m } (\min_{p \in \Delta_n} p^TMq)$$

Logically, this strategy is naturally worse than $\min_{p \in \Delta_n} (\max_{q \in \Delta_m } p^TMq)$, as it allows for your opponent to make to maximize your loss for any strategy you chose. This conclusion is also known as weak duality, which can be used to formally demonstrate this conclusion.

However, with the $M$ in Zero-Sum games, it can be demonstrated that:

$$\min_p (\max_q p^TMq) = \max_q (\min_p p^TMq)$$

This result is also known as Von Neumann's Minimax Theorem. A formal proof of this can be found here:

CS7545_Sp24_Lecture_22: Game Theory - mltheory/CS7545 GitHub Wiki

CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Jacob Abernethy

Notes for Lecture 22: Game Theory

April 11, 2024

Scribes: Alexandra Stewart

Game Theory

Zero-sum games

Example: Rock, Paper, Scissors

⚠️ GitHub.com Fallback ⚠️

CS7545_Sp24_Lecture_22: Game Theory - mltheory/CS7545 GitHub Wiki

CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Jacob Abernethy

Notes for Lecture 22: Game Theory

April 11, 2024

Scribes: Alexandra Stewart

Game Theory

Zero-sum games

Example: Rock, Paper, Scissors

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️