CS7545_Sp23_Lecture_22: Online Learning for Equilibrium

CS 7545: Machine Learning Theory -- Spring 2023

Instructor: Guanghui Wang

Notes for Lecture 22

April 6, 2023

Scribes: Akash Harapanahalli, Hongjian Lan

Miscellaneous

Popular Topics

  1. Theory for deep learning (21 votes)
  2. Reinforcement Learning (18 votes)
  3. Game Theory / Minimax (16 votes)

We're going to cover Theory for deep learning and Game Theory / Minimax, because setting up Reinforcement Learning would take too long to leave time to discuss anything else.

Project Discussion

The point of the project is to explore a new topic beyond what we covered in class, and to do more than just read a single paper.

Don't:

  1. Pick a topic from your research or from another class.
  2. Regurgitate the info from a paper.
  3. Fixate on the page count. Page numbers are not a requirement---you should know when your paper is good enough.

Do:

  1. Synthesize information from multiple papers
  2. Connect what you're reading to what we learned in class

Think of the project as a workshop paper in terms of content and quality.

Solving Zero Sum Games

A Nash Equilibrium for a zero-sum game with payoff matrix $M\in\mathbb{R}^{n\times m}$ is a pair $(\hat{p}\in\Delta_n,\hat{q}\in\Delta_m)$ such that, for all $p\in\Delta_n$ and all $q\in\Delta_m$,

$$p^\top M \hat{q} \geq \hat{p}^\top M \hat{q} \geq \hat{p}^\top M q.$$

Small aside: non-zero-sum games can sometimes describe a situation more richly. It turns out an equilibrium-existence result analogous to von Neumann's minimax theorem also holds for non-zero-sum games (shown by Nash). We may go into non-zero-sum games in a future lecture (teaser: you need two matrices to describe the rewards of the two players separately).

For a particular player, playing the Nash equilibrium is not always optimal when the other player deviates from it. For example, in rock/paper/scissors, if one player chooses rock and paper each with probability $0.5$ (and never plays scissors), the best strategy for the other player is to choose paper with probability $1$. More precisely: "If you want the best in the worst case, play the Nash equilibrium."
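To sanity-check this claim, here is a minimal sketch in Python. The sign convention (entry $M_{ij}$ is the row player's loss, so the row player minimizes $p^\top M q$ and the column player maximizes it) is an assumption made for illustration, matching the loss vectors used later in these notes.

```python
import numpy as np

# Rock/paper/scissors loss matrix for the row player (rows/cols: rock, paper, scissors).
# M[i, j] is the row player's loss, so the column player's expected payoff for pure
# action j against a mixed strategy p is (M^T p)_j.  (Sign convention is an assumption.)
M = np.array([[ 0.,  1., -1.],
              [-1.,  0.,  1.],
              [ 1., -1.,  0.]])

p = np.array([0.5, 0.5, 0.0])    # row player: rock and paper each with probability 0.5

payoffs = M.T @ p                # expected payoff of each pure response for the column player
print(payoffs)                   # [-0.5  0.5  0. ]
print(np.argmax(payoffs))        # 1 -> "paper" is the unique best response
```

Playing the Nash equilibrium (uniform over the three actions) would only guarantee the column player an expected payoff of $0$ here, while responding with paper earns $0.5$.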

Another thing to note is that the Nash equilibrium pair may not be unique, but the value $\hat{p}^\top M \hat{q}$ at a Nash equilibrium is unique.

$\epsilon$-equilibrium

$\hat{p},\hat{q}$ are an $\epsilon$-equilibrium of $M$ if

$$ \begin{align*} &\hat{p}^\top M\hat{q}\leq \min_{p\in \Delta_n}p^\top M\hat{q}+\epsilon,\\ &\hat{p}^\top M\hat{q}\geq \max_{q\in \Delta_m}\hat{p}^\top Mq-\epsilon. \end{align*} $$
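Because a linear function attains its minimum (or maximum) over the simplex at a vertex, the smallest $\epsilon$ for which a candidate pair qualifies can be read off from the vectors $M\hat{q}$ and $M^\top\hat{p}$. A minimal sketch, with a function name of our choosing:

```python
import numpy as np

def equilibrium_gap(M, p_hat, q_hat):
    """Smallest epsilon for which (p_hat, q_hat) is an epsilon-equilibrium of M."""
    value = p_hat @ M @ q_hat
    row_gap = value - np.min(M @ q_hat)      # how much the row player could still decrease its loss
    col_gap = np.max(M.T @ p_hat) - value    # how much the column player could still increase its payoff
    return max(row_gap, col_gap)
```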

In the last lecture, we showed that online learning techniques can find an $\epsilon$-equilibrium with $\epsilon=O\big(\sqrt{\frac{\log (mn)}{T}}\big)$ (from the EWA regret bound), or equivalently with $T=O\big(\frac{\log (mn)}{\epsilon^2}\big)$ rounds.

Online Learning to Compute $\epsilon$-equilibrium

For $t=1,\dots,T$:

  1. Row player chooses $p_t\in \Delta_n$
  2. Column player chooses $q_t\in \Delta_m$
  3. Row player observes $l_t= Mq_t$
  4. Column player observes $h_t= -M^\top p_t$
  5. Two players update their information

End For

Note that in this scheme, steps 1 and 2 can happen simultaneously, and steps 3 and 4 can happen simultaneously.
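A minimal sketch of this scheme with both players running exponential weights (EWA), as in the bound quoted above; the step size and the function name are illustrative choices, not part of the lecture.

```python
import numpy as np

def ewa_self_play(M, T, eta=None):
    """Both players run EWA against each other; returns the time-averaged strategies."""
    n, m = M.shape
    if eta is None:
        eta = np.sqrt(np.log(max(n, m)) / T)   # illustrative step size
    L_row, L_col = np.zeros(n), np.zeros(m)    # cumulative loss vectors of the two players
    p_sum, q_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        p = np.exp(-eta * L_row); p /= p.sum() # row player's EWA distribution p_t
        q = np.exp(-eta * L_col); q /= q.sum() # column player's EWA distribution q_t
        p_sum += p; q_sum += q
        L_row += M @ q                         # row player observes    l_t =  M q_t
        L_col += -M.T @ p                      # column player observes h_t = -M^T p_t
    return p_sum / T, q_sum / T
```

The time-averaged strategies returned here are the candidate $\epsilon$-equilibrium, as in the bound from the last lecture quoted above.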

An alternative scheme:

For $t=1,\dots,T$:

  1. Row player chooses $p_t\in \Delta_n$
  2. Column player observes $h_t= -M^\top p_t$
  3. Column player sets $q_t=\arg\min_{q\in \Delta_m}h_t^\top q=\arg\max_{q\in \Delta_m}p_t^\top Mq$
  4. Row player receives $l_t= Mq_t$
  5. Row player updates its information

End For

The $q_t$ in this scheme is called the best response. The best response can always be taken to be a deterministic (pure) action, i.e., a vertex of $\Delta_m$.
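The same sketch adapted to this alternative scheme: the row player still runs EWA, while the column player plays the pure best response of step 3 (again an illustrative sketch with names of our choosing).

```python
import numpy as np

def ewa_vs_best_response(M, T, eta=None):
    """Row player runs EWA; column player best-responds with a pure action each round."""
    n, m = M.shape
    if eta is None:
        eta = np.sqrt(np.log(n) / T)           # illustrative step size
    L_row = np.zeros(n)                        # row player's cumulative loss vector
    p_sum, q_sum = np.zeros(n), np.zeros(m)
    for _ in range(T):
        p = np.exp(-eta * L_row); p /= p.sum() # row player's EWA distribution p_t
        j = np.argmax(M.T @ p)                 # best response: argmax_q p_t^T M q (a pure action)
        q = np.zeros(m); q[j] = 1.0
        p_sum += p; q_sum += q
        L_row += M @ q                         # row player receives l_t = M q_t
    return p_sum / T, q_sum / T
```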

Proof of convergence

First, by the row player's regret guarantee (where $\epsilon_T^p$ denotes the row player's average regret after $T$ rounds),

$$ \begin{align*} \frac{1}{T}\sum_{t=1}^{T} p_t^\top Mq_t &\leq \min_{p\in \Delta_n} \frac{1}{T}\sum_{t=1}^{T} p^\top Mq_t+\epsilon_T^p = \min_{p\in \Delta_n} p^\top M\left(\frac{1}{T}\sum_{t=1}^{T} q_t\right)+\epsilon_T^p \leq \max_q\min_p p^\top Mq+\epsilon_T^p. \end{align*} $$

On the other hand,

$$ \begin{align*} \frac{1}{T}\sum_{t=1}^{T} p_t^\top Mq_t&=\frac{1}{T}\sum_{t=1}^{T} \left(\max_q p_t^\top Mq\right) &&\text{(since each $q_t$ is a best response)}\\ & \geq \max_q\left(\frac{1}{T}\sum_{t=1}^{T} p_t\right)^\top Mq &&\text{(by Jensen's inequality)} \\ & \geq \min_p\max_q p^\top Mq. \end{align*} $$

Recall von Neumann's Minimax Theorem:

$$\min_p\max_qp^\top Mq=\max_q\min_pp^\top Mq.$$

Thus the average payoff converges to the value of the game, and combining the two chains shows that $\bar{p}_T= \frac{1}{T} \sum_{t=1}^{T} p_t$ and $\bar{q}_T=\frac{1}{T}\sum_{t=1}^{T} q_t$ form an $\epsilon_T^p$-equilibrium of $M$.
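As a usage example, the sketches above can be combined on the rock/paper/scissors matrix $M$ from earlier (this reuses the hypothetical helpers `ewa_vs_best_response` and `equilibrium_gap` defined in the previous sketches):

```python
p_bar, q_bar = ewa_vs_best_response(M, T=10_000)
print(p_bar, q_bar)                        # both should be close to (1/3, 1/3, 1/3),
                                           # the unique Nash equilibrium of rock/paper/scissors
print(equilibrium_gap(M, p_bar, q_bar))    # small; shrinks roughly like sqrt(log(n) / T)
```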

Boosting

Assume you have a "bad" hypothesis class $\mathcal{H}$. Maybe we can combine multiple $h$'s in $\mathcal{H}$ to get a good hypothesis.

Let $(x_1,y_1),\dots,(x_n,y_n)$ be a sample, $y_i\in \{-1,1\}$.

Assume $\forall h\in \mathcal{H}$, $\Pr_{i\in [n]}(h(x_i)=y_i)\leq 0.55$, i.e., every individual hypothesis is only slightly better than random guessing on the uniform distribution over the sample.

Maybe $\exists h_1,\alpha_1,h_2,\alpha_2,\dots,h_k,\alpha_k$ with $\alpha_j\in \mathbb{R}_+$, such that $$\Pr_{i\in[n]}(h_{\bar{\alpha}}(x_i)=y_i)\approx 0.99, \text{ or even } 1, \text{ on the training sample, where}$$ $$h_{\bar{\alpha}}(x)=\mathrm{sign}\Big(\sum_j \alpha_jh_j(x)\Big).$$
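The combined hypothesis $h_{\bar{\alpha}}$ is just a weighted majority vote over the weak hypotheses. A minimal sketch with placeholder weak learners (the thresholds and weights below are made up for illustration):

```python
def weighted_vote(hs, alphas, x):
    """h_alpha(x) = sign(sum_j alpha_j * h_j(x)), where each h_j(x) is in {-1, +1}."""
    s = sum(a * h(x) for h, a in zip(hs, alphas))
    return 1 if s >= 0 else -1

# Placeholder weak hypotheses on scalar inputs (illustrative only).
hs = [lambda x: 1 if x > 0 else -1,
      lambda x: 1 if x > 1 else -1,
      lambda x: 1 if x > -1 else -1]
alphas = [0.5, 0.3, 0.2]                    # nonnegative weights alpha_j
print(weighted_vote(hs, alphas, 0.5))       # weighted majority of the weak predictions -> +1
```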

Weak Learning Assumption

For some $\gamma >0$: $\forall p\in \Delta_n,\ \exists h\in \mathcal{H}$ such that $$\Pr_{i\sim p}(h(x_i)=y_i)\geq\frac{1}{2}+\gamma.$$
