CS7545_Sp24_Lecture_22 - mltheory/CS7545 GitHub Wiki

CS 7545: Machine Learning Theory -- Spring 2024

Notes for Lecture 22: Game Theory Applications to Online Learning

April 11, 2024

Scribes: Alexandra Stewart, Jackson Dike

Game Theory

Game theory involves trying to understand the behavior of multiple players that interact. The players receive rewards or losses (real numbers) during the game as a function of the actions they all take. It's a way to understand behavior when multiple parties gain or lose depending on the actions of every party in question. By analogy, optimization looks at how one party can choose an action (a vector or a number) to minimize some function; game theory is the multi-agent view in comparison, where the payoff to each player can depend on the other players in the game.

Zero sum games

The simplest version of this is the two-player zero-sum game. Every such game can be represented by a matrix $M \in \mathbb{R}^{n \times m}$ where:

  • a row player chooses an action from the rows of the matrix, $i \in \{1, \dots, n\}$
  • a column player chooses an action from the columns, $j \in \{1, \dots, m\}$
  • $M_{ij}$ is the gain to the column player
  • $M_{ij}$ is also called the loss to the row player (so that $-M_{ij}$ is the gain to the row player)

This applies to lots of games; every finite-action zero-sum game is defined by a matrix, and every matrix can be thought of as representing a zero-sum game. With zero-sum games there is just one matrix because the gain of one player is the loss of the other: if one player gains $M_{ij}$, then the other player loses $M_{ij}$.

Example: Rock, Paper, Scissors

A simple example of a common zero-sum game is Rock, Paper, Scissors.

If the row player plays rock and the column player plays scissors, rock beats scissors, so the column player loses and gets a $-1$: the column player's gain is $-1$ (and its loss is $1$), while the row player's loss is $-1$ (and its gain is $1$). Recall that $M_{ij}$ is the gain to the column player. You can describe all of Rock, Paper, Scissors through the lens of the matrix by writing out the payoffs this way:

$$\begin{equation} M=\begin{bmatrix} & R & P & S \\ R & 0 &+1 & -1 \\ P & -1 & 0 & +1 \\ S & +1 & -1 & 0\\ \end{bmatrix} \end{equation}$$

If you have no information about your opponent, the optimal strategy is to randomize your actions uniformly, putting equal weight on rock, paper, and scissors.

Consider that a "mixed strategy" for either player is a distribution $\vec{p} \in \Delta_n (\text{ on } \Delta_m)$. Is there a notion of optimality for both players when it comes to choosing their next strategy? We observe that if row player plays $\vec{p} \in \Delta_n$ and the column player plays $\vec{q}\in\Delta_m$ , the expected payoff to column player is

$$ \begin{equation} \mathbb{E}_{i \sim p,\, j \sim q}\left[ M_{ij} \right] = \sum_{i=1}^n \sum_{j=1}^m p_i q_j M_{ij} = p^T M q \nonumber \end{equation} $$
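
As a quick sanity check of this formula (a minimal sketch assuming NumPy; the particular $p$ and $q$ are ours for illustration): with the uniform row strategy, the column player's expected payoff in Rock, Paper, Scissors is $0$ regardless of $q$.

```python
import numpy as np

# Rock-Paper-Scissors payoff matrix: M[i, j] is the gain to the column player
M = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 1, -1,  0]])

p = np.array([1/3, 1/3, 1/3])   # row player's mixed strategy (uniform)
q = np.array([1/2, 1/2, 0.0])   # column player's mixed strategy

# Expected payoff to the column player: p^T M q
print(p @ M @ q)                # 0.0 -- uniform play neutralizes any q
```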

Nesting these optimizations, one for each player, we can write the following in matrix notation:

$$ \begin{equation} \min_{p \in \Delta_n} (\max_{q \in \Delta_m } p^TMq) \geq \max_{q \in \Delta_m } (\min_{p \in \Delta_n} p^TMq) \nonumber \end{equation} $$

There are two optimizations happening: first we solve the inner one, then the outer one. In optimizing the inner piece we have "committed" to some $p$, and the $q$ is subsequently optimized as a function of that choice. So on the left-hand side the P player is effectively playing first.

Now consider the other order,

$$ \begin{equation} \max_q \min_p p^TMq \end{equation} $$

in which the column player commits first and the row player responds. Von Neumann's minimax theorem states that, remarkably, the two sides are equal:

$$ \begin{equation} \min_p (\max_q p^TMq) = \max_q (\min_p p^TMq) \end{equation} $$

Would you rather be the first player or the second player? The P player goes first, and then the Q player goes; the Q player doesn't see what the P player samples, but it does see the P player's distribution.

It's clear that you would rather be the second player, since you have more information and can respond to your opponent. Looking at the inequality above, this means the min player prefers to be in the inner optimization.
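
As an aside, the inner optimization makes either side computable: once the outer player commits to a distribution, the best response is attained at a pure action, which turns $\min_{p} \max_{q} p^TMq$ into a linear program. Below is a minimal sketch under our own naming (solve_row_player), assuming SciPy is available; this is a standard reduction, not something defined in the lecture itself.

```python
import numpy as np
from scipy.optimize import linprog

def solve_row_player(M):
    """Compute min_p max_q p^T M q as a linear program.

    For a fixed p, the inner max over the simplex is attained at a pure
    column, so we minimize v subject to (M^T p)_j <= v for every column j,
    with p on the probability simplex. Variables: x = (p_1, ..., p_n, v).
    """
    n, m = M.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                   # objective: minimize v
    A_ub = np.hstack([M.T, -np.ones((m, 1))])     # (M^T p)_j - v <= 0
    b_ub = np.zeros(m)
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)  # sum_i p_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]     # p >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

M = np.array([[0., 1., -1.], [-1., 0., 1.], [1., -1., 0.]])
p_star, value = solve_row_player(M)
print(p_star, value)   # ~(1/3, 1/3, 1/3), game value 0 for Rock-Paper-Scissors
```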

Minimax Theorem and its Relation to Online Learning

Claim

$$ \min_{p\in \Delta_n}\max_{q \in \Delta_m} p^TMq \leq \max_{q \in \Delta_m} \min_{p\in \Delta_n}p^TMq + \epsilon $$

We will show this for any $\epsilon > 0$; the corollary without the $\epsilon$ then follows, which is obvious: if the inequality holds for every small $\epsilon$, it also holds for $\epsilon = 0$. Combined with the easy direction ($\geq$) shown above, this gives the minimax theorem.

Claim: There exists an algorithm which, given the past loss vectors $l_1, \dots, l_{t-1}$, selects a point $w_t$ inside the simplex $\Delta_n$ such that

$$\frac{1}{T}\sum_{t=1}^T w_t^T l_{t}-\min_{w \in \Delta_n}\frac{1}{T}\sum_{t=1}^T w^T l_t = o(1)$$

That is, the average regret against the best fixed point in hindsight vanishes as $T$ grows.

Fact: The exponential weights algorithm achieves this with average regret $\epsilon_T = O\big(\sqrt{\log(n)/T}\big)$; equivalently, to guarantee average regret at most $\epsilon$ it suffices to set $T = \frac{\log(n)}{\epsilon^2}$ (set the bound equal to $\epsilon$ and solve for $T$).
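
As a reminder of what this fact refers to, here is a minimal sketch of exponential weights with the standard tuning $\eta = \sqrt{\log(n)/T}$ (NumPy assumed; the random loss sequence and names are ours for illustration).

```python
import numpy as np

def exponential_weights_loss(losses, eta):
    """Play exponential weights against a (T, n) sequence of loss vectors.

    Returns the algorithm's total loss and the best fixed action's loss.
    """
    T, n = losses.shape
    cum = np.zeros(n)             # cumulative loss of each action so far
    alg_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * cum)
        w /= w.sum()              # w_t depends only on l_1, ..., l_{t-1}
        alg_loss += w @ losses[t]
        cum += losses[t]
    return alg_loss, cum.min()

T, n = 10_000, 8
losses = np.random.default_rng(0).random((T, n))
eta = np.sqrt(np.log(n) / T)      # standard tuning for the regret bound
alg, best = exponential_weights_loss(losses, eta)
print((alg - best) / T)           # average regret, order sqrt(log(n)/T)
```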

Imagine the following protocol: the row and column players play for $T$ rounds, each maintaining a weight vector, initialized to the all-ones vectors $w_1 = \mathbf{1}_n$ and $v_1 = \mathbf{1}_m$. On each round $t = 1, \dots, T$ the players play the normalized weights

$$p_{t}=\frac{w_t}{\sum_{i=1}^n w_t[i]}, \qquad q_{t}=\frac{v_t}{\sum_{j=1}^m v_t[j]}$$

The loss vector for the row player is $l_{t} = Mq_t$, and the gain vector for the column player is $g_t = M^T p_t$.

What's going on here? Both players pick their mixed strategies, and then each looks at its opponent's mixed strategy to figure out the loss vector for its own actions on that round. For example, if the column player announces it will play rock half the time, paper half the time, and scissors never, i.e. $q_t = (\frac{1}{2}, \frac{1}{2}, 0)$, then the row player can multiply out $Mq_t$ to obtain its loss vector.

Both players use the exponential weights algorithm to update their probability distributions over actions. The indices $i$ and $j$ range over the actions: the matrix has rows indexed by $i$ and columns indexed by $j$, so one player maintains a distribution over the rows and the other a distribution over the columns of the matrix.
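
Putting the protocol together, here is a minimal sketch of the full self-play loop (NumPy assumed; the function name self_play is ours), in which both players run exponential weights and we keep the averaged strategies.

```python
import numpy as np

def self_play(M, T):
    """Row and column players run exponential weights against each other.

    Returns the averaged strategies (p_bar, q_bar); by the argument below
    they form an approximate minimax pair, with gap O(sqrt(log(n)/T)).
    """
    n, m = M.shape
    eta = np.sqrt(np.log(max(n, m)) / T)    # standard learning-rate tuning
    Lp, Lq = np.zeros(n), np.zeros(m)       # cumulative losses per action
    p_sum, q_sum = np.zeros(n), np.zeros(m)
    for t in range(T):
        p = np.exp(-eta * Lp); p /= p.sum()   # p_t = normalized weights w_t
        q = np.exp(-eta * Lq); q /= q.sum()   # q_t = normalized weights v_t
        p_sum += p
        q_sum += q
        Lp += M @ q          # row player's loss vector:    l_t = M q_t
        Lq -= M.T @ p        # column player's gain vector: g_t = M^T p_t
    return p_sum / T, q_sum / T

M = np.array([[0., 1., -1.], [-1., 0., 1.], [1., -1., 0.]])
p_bar, q_bar = self_play(M, T=10_000)
print(p_bar, q_bar)   # both close to uniform, the minimax pair for RPS
```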

Consider $\frac{1}{T}\sum_{t=1}^T p_t^T M q_t$: this is the average expected loss to the row player over the course of the game. On each round the row player plays $p_t$ and the column player plays $q_t$, so the round-$t$ expected loss is $p_t^T M q_t = p_t^T l_t$, where $l_t = M q_t$.

Applying the no-regret guarantee to the row player, whose loss vectors are $l_t = M q_t$, and writing $\bar{q} = \frac{1}{T}\sum_{t=1}^T q_t$:

$$\frac{1}{T}\sum_{t=1}^T p_t^T M q_t \leq \min_{p\in \Delta_n} \frac{1}{T}\sum_{t=1}^T p^T M q_t + \epsilon_T = \min_{p\in \Delta_n} p^T M \bar{q} + \epsilon_T \leq \max_{q\in \Delta_m}\min_{p\in \Delta_n} p^T M q + \epsilon_T$$

Symmetrically, applying the no-regret guarantee to the column player, whose gain vectors are $g_t = M^T p_t$, and writing $\bar{p} = \frac{1}{T}\sum_{t=1}^T p_t$:

$$\frac{1}{T}\sum_{t=1}^T p_t^T M q_t \geq \max_{q\in \Delta_m} \bar{p}^T M q - \epsilon_T \geq \min_{p\in \Delta_n}\max_{q\in \Delta_m} p^T M q - \epsilon_T$$

Chaining the two displays gives $\min_p \max_q p^T M q - \epsilon_T \leq \max_q \min_p p^T M q + \epsilon_T$. The only quantity that depends on $T$ here is $\epsilon_T$, which is $o(1)$, so letting $T$ grow proves the claim.
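
To make the conclusion measurable, the gap of the averaged pair $(\bar{p}, \bar{q})$ can be computed directly; the argument above bounds it by $2\epsilon_T$. A minimal sketch (the name duality_gap is ours, NumPy assumed):

```python
import numpy as np

def duality_gap(M, p_bar, q_bar):
    """Exploitability of the strategy pair (p_bar, q_bar).

    Equals max_q p_bar^T M q - min_p p^T M q_bar; best responses to a fixed
    mixed strategy can be taken pure, so the max/min reduce to coordinates.
    It is always >= 0 and equals 0 exactly at a minimax solution.
    """
    return np.max(M.T @ p_bar) - np.min(M @ q_bar)

M = np.array([[0., 1., -1.], [-1., 0., 1.], [1., -1., 0.]])
uniform = np.ones(3) / 3
print(duality_gap(M, uniform, uniform))   # 0.0: uniform play is optimal for RPS
```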

The idea is that if you give me some time to learn from your actions, I can do as well as if I had known them in advance and had to commit to a fixed strategy.
