CS7545_Sp24_Lecture_22 - mltheory/CS7545 GitHub Wiki

CS 7545: Machine Learning Theory -- Spring 2024

Notes for Lecture 22: Game Theory Applications to Online Learning

April 11, 2024

Scribes: Alexandra Stewart, Jackson Dike

Game Theory

Game theory studies the behavior of multiple interacting players. Through a function of the actions they take, the players receive rewards or losses (real numbers) during the game. In other words, it is a framework for understanding behavior when several parties gain or lose as a function of the actions of all parties involved. (By analogy, optimization studies how a single party can choose an action, say a vector or a number, to minimize some function. Game theory is the multi-agent counterpart, where each player's payoff can depend on the actions of the other players.)

Zero sum games

The simplest version of this is the two player zero-sum game. Every zero-sum game can be represented by a matrix where:

  • a row player chooses an action $i \in \{1, \dots, n\}$ from the rows of the matrix
  • a column player chooses an action $j \in \{1, \dots, m\}$ from the columns
  • $M_{ij}$ is the gain to the column player
  • $M_{ij}$ is also the loss to the row player (so that $-M_{ij}$ is the gain to the row player)

This formulation is quite general: every finite-action zero-sum game is defined by a matrix, and every matrix can be thought of as representing a zero-sum game. A zero-sum game needs only one matrix because one player's gain is the other player's loss: if one player gains $M_{ij}$, the other player loses $M_{ij}$.
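A minimal sketch of this representation in code (pure Python; the matrix and names below are illustrative, not from the lecture): a finite two-player zero-sum game is just a matrix, where entry $M_{ij}$ is the column player's gain and its negation is the row player's gain.

```python
# M[i][j] is the gain to the column player (equivalently, the loss to the
# row player) when the row player picks action i and the column player
# picks action j. This particular 2x2 matrix is an arbitrary example.
M = [
    [0, 1],
    [-1, 0],
]

def payoffs(M, i, j):
    """Return (row player's gain, column player's gain) for actions (i, j)."""
    column_gain = M[i][j]
    row_gain = -M[i][j]
    return row_gain, column_gain

# The defining property of a zero-sum game: the two gains always sum to 0.
for i in range(2):
    for j in range(2):
        r, c = payoffs(M, i, j)
        assert r + c == 0
```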

Example: Rock, Paper, Scissors

A simple example of a common zero-sum game is Rock, Paper, Scissors.

Suppose the row player plays rock and the column player plays scissors. Rock beats scissors, so the column player loses: the column player's gain is $-1$ (a loss of $1$), while the row player's loss is $-1$, i.e., a gain of $1$. Recall that $M_{ij}$ is the gain to the column player. Writing out all the payoffs gives the matrix view of Rock, Paper, Scissors:

$$\begin{equation} M=\begin{bmatrix} & R & P & S \\ R & 0 &+1 & -1 \\ P & -1 & 0 & +1 \\ S & +1 & -1 & 0\\ \end{bmatrix} \end{equation}$$
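A quick sanity check on this matrix (a sketch, not from the lecture): Rock, Paper, Scissors is a *symmetric* zero-sum game, so swapping the two players' actions negates the payoff, i.e. $M_{ij} = -M_{ji}$.

```python
# The Rock, Paper, Scissors payoff matrix from above: entry [i][j] is the
# gain to the column player. Rows and columns are ordered (R, P, S).
RPS = [
    [0, 1, -1],   # row plays Rock     vs column R, P, S
    [-1, 0, 1],   # row plays Paper    vs column R, P, S
    [1, -1, 0],   # row plays Scissors vs column R, P, S
]

# Symmetry of the game shows up as antisymmetry of the matrix: M = -M^T.
for i in range(3):
    for j in range(3):
        assert RPS[i][j] == -RPS[j][i]
```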

If you have no information about your opponent, the optimal strategy is to randomize uniformly, placing equal weight on rock, paper, and scissors.

A "mixed strategy" for the row (resp. column) player is a distribution $\vec{p} \in \Delta_n$ (resp. $\vec{q} \in \Delta_m$). Is there a notion of optimality for both players when it comes to choosing their strategies? Observe that if the row player plays $\vec{p} \in \Delta_n$ and the column player plays $\vec{q}\in\Delta_m$, the expected payoff to the column player is

$$ \begin{equation} \mathbb{E}_{i \sim \vec{p},\, j \sim \vec{q}} \left[ M_{ij} \right] = \sum_{i=1}^n \sum_{j=1}^m p_i q_j M_{ij} = p^T M q \end{equation} $$
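A small numerical check of this formula (pure Python sketch): computing $\sum_i \sum_j p_i q_j M_{ij}$ directly for Rock, Paper, Scissors with both players uniform gives an expected payoff of $0$, i.e., uniform vs. uniform play is fair.

```python
# Expected gain to the column player under mixed strategies p and q,
# computed as the double sum over pure action pairs.
M = [
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0],
]
p = [1/3, 1/3, 1/3]  # row player's mixed strategy (uniform)
q = [1/3, 1/3, 1/3]  # column player's mixed strategy (uniform)

expected_gain = sum(
    p[i] * q[j] * M[i][j]
    for i in range(len(p))
    for j in range(len(q))
)

assert abs(expected_gain) < 1e-12  # uniform vs uniform is fair in RPS
```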

We can write nested optimizations of this quantity in matrix notation as follows:

$$ \begin{equation} \min_{p \in \Delta_n} (\max_{q \in \Delta_m } p^TMq) \geq \max_{q \in \Delta_m } (\min_{p \in \Delta_n} p^TMq) \nonumber \end{equation} $$

There are two optimizations happening: first the inner quantity is optimized, then the outer one. In the left-hand expression, the $p$ player "commits" to some $p$, and then $q$ is optimized as a function of that choice. Thus, on the left-hand side, the $p$ player is effectively playing first.
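The inequality above can be checked numerically. Here is an illustrative sketch (not from the lecture) for an arbitrary $2 \times 2$ game: since $p^T M q$ is linear in each argument, the inner optimum is always attained at a pure strategy (a row or column of $M$), and the outer optimization over the one-dimensional simplex can be approximated by a fine grid over $p = (t, 1-t)$.

```python
# An arbitrary 2x2 payoff matrix (gain to the column player, who maximizes;
# the row player minimizes).
M = [[2, -1],
     [-1, 1]]

grid = [k / 1000 for k in range(1001)]  # grid over the 1-D simplex

# min over p of (max over pure columns j of (p^T M)_j)
min_max = min(
    max(t * M[0][j] + (1 - t) * M[1][j] for j in range(2))
    for t in grid
)

# max over q of (min over pure rows i of (M q)_i)
max_min = max(
    min(s * M[i][0] + (1 - s) * M[i][1] for i in range(2))
    for s in grid
)

# The claimed inequality: min-max >= max-min (up to grid tolerance).
assert min_max >= max_min - 1e-9
```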

Now consider the reverse order, in which the $q$ player commits first:

$$ \begin{equation} \max_q \min_p p^TMq \end{equation} $$

The minimax theorem, discussed in the next section, states that this inequality is in fact an equality:

$$ \begin{equation} \min_{p \in \Delta_n} (\max_{q \in \Delta_m } p^TMq) = \max_{q \in \Delta_m } (\min_{p \in \Delta_n} p^TMq) \nonumber \end{equation} $$

Would you rather be the first player or the second? Suppose the $p$ player goes first and then the $q$ player: the $q$ player does not see the action the $p$ player samples, but it does see the $p$ player's distribution.

Clearly, you would rather be the second player: you have more information and can respond to your opponent. Looking at the expressions above, the min player therefore prefers to be in the inner optimization.
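A sketch of why the second mover's extra information helps (pure Python; the distribution $p$ below is illustrative): once the first player's distribution is revealed, the responder can do at least as well with a *pure* best response as with any mixed strategy, because the expected payoff $p^T M q$ is linear in $q$.

```python
import random

# Rock, Paper, Scissors payoff matrix (gain to the column player).
M = [
    [0, 1, -1],
    [-1, 0, 1],
    [1, -1, 0],
]
p = [0.5, 0.3, 0.2]  # the first (row) player's announced distribution

# Expected gain to the column player for each pure response j: (p^T M)_j.
column_values = [sum(p[i] * M[i][j] for i in range(3)) for j in range(3)]
pure_best = max(column_values)  # best pure response value

# No mixed response can beat the best pure response: a mixture's value is
# a convex combination of the pure values.
random.seed(0)
for _ in range(100):
    w = [random.random() for _ in range(3)]
    q = [x / sum(w) for x in w]  # a random mixed response
    mixed_value = sum(column_values[j] * q[j] for j in range(3))
    assert mixed_value <= pure_best + 1e-12
```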

Minimax Theorem and its Relation to Online Learning
