CS7545_Sp23_Lecture_22: Online Learning for Equilibrium - mltheory/CS7545 GitHub Wiki
- Theory for deep learning (21 votes)
- Reinforcement Learning (18 votes)
- Game Theory / Minimax (16 votes)
We're going to do Theory for Deep Learning and Game Theory/Minimax, since setting up Reinforcement Learning would take too long to leave time for anything else.
The point of the paper is to explore a new topic beyond class, and do more than just read a paper.
Don't:
- Pick a topic from your research or from another class.
- Regurgitate the information from a single paper.
- Fixate on the page count. It is not a hard requirement; you should know when your paper is good enough.
Do:
- Synthesize information from multiple papers
- Connect what you're reading to what we learned in class
Think of the project as a workshop paper in terms of content and quality.
A Nash Equilibrium for a zero-sum game with payoff matrix $M \in \mathbb{R}^{n \times m}$ is a pair of mixed strategies $(p^*, q^*) \in \Delta_n \times \Delta_m$ from which neither player can gain by deviating unilaterally:
$$p^{*\top} M q \;\le\; p^{*\top} M q^* \;\le\; p^\top M q^* \qquad \text{for all } p \in \Delta_n,\ q \in \Delta_m,$$
where the row player seeks to minimize $p^\top M q$ and the column player seeks to maximize it.
Small aside: non-zero-sum games can sometimes be richer for describing a situation. It turns out that equilibria exist beyond the setting of von Neumann's minimax theorem: Nash showed that every finite non-zero-sum game has an equilibrium. We may go into non-zero-sum games in a future lesson (teaser: you need two matrices to describe the reward for each player separately).
For a particular player, the Nash equilibrium is not always optimal when the other player does not play the Nash equilibrium. For example, in rock/paper/scissors, if one player chooses rock and paper each with probability $1/2$ (and never scissors), the opponent's best response is to always play paper, winning half the rounds in expectation, whereas sticking to the uniform Nash strategy would only earn an expected payoff of $0$.
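The rock/paper/scissors example is easy to check numerically. Below is a minimal sketch (variable names are my own), using the convention that $M_{ij}$ is the row player's loss when the row plays $i$ and the column plays $j$:

```python
import numpy as np

# Rock/paper/scissors loss matrix for the row player:
# 0 = rock, 1 = paper, 2 = scissors; entry +1 = row loses, -1 = row wins, 0 = tie.
M = np.array([[ 0,  1, -1],
              [-1,  0,  1],
              [ 1, -1,  0]], dtype=float)

nash = np.ones(3) / 3          # uniform mixed strategy (the Nash equilibrium)
q = np.array([0.5, 0.5, 0.0])  # opponent mixes rock/paper with prob 1/2 each

loss_at_nash = nash @ M @ q          # expected loss if we stick to Nash
best_response_loss = (M @ q).min()   # expected loss of the best pure response

print(loss_at_nash)        # 0.0  -- Nash still guarantees the value of the game
print(best_response_loss)  # -0.5 -- always playing paper wins half the time
```

The best pure response here is paper (row index 1): it beats rock half the time and ties paper the other half, for an expected loss of $-1/2$.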
Another thing to note is that the Nash equilibrium pair may not be unique, but the value of the game $p^{*\top} M q^*$ is the same for every equilibrium.
In the last lecture, we showed that online learning techniques can find an approximate Nash equilibrium of a zero-sum game.
For $t = 1, \ldots, T$:
1. Row player chooses $p_t\in \Delta_n$
2. Column player chooses $q_t\in \Delta_m$
3. Row player observes $l_t= Mq_t$
4. Column player observes $h_t= -M^\top p_t$
5. Two players update their information

End For
Note that in this scheme, steps 1 and 2 can happen simultaneously, as can steps 3 and 4.
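The simultaneous scheme can be sketched in a few lines, e.g. with both players running multiplicative weights (Hedge); the random game matrix and step size below are illustrative assumptions, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T, eta = 4, 5, 2000, 0.05          # illustrative sizes and step size
M = rng.uniform(-1, 1, size=(n, m))      # a random zero-sum game (assumption)

w_p, w_q = np.ones(n), np.ones(m)        # Hedge weights for each player
p_avg, q_avg = np.zeros(n), np.zeros(m)  # running averages of the iterates

for t in range(T):
    p = w_p / w_p.sum()                  # step 1: row player chooses p_t
    q = w_q / w_q.sum()                  # step 2: column player chooses q_t
    l = M @ q                            # step 3: row player observes l_t = M q_t
    h = -M.T @ p                         # step 4: column player observes h_t = -M^T p_t
    w_p *= np.exp(-eta * l)              # step 5: both players take a Hedge update
    w_q *= np.exp(-eta * h)
    p_avg += p / T
    q_avg += q / T

# Exploitability of the averaged pair: max_q pbar^T M q - min_p p^T M qbar >= 0,
# and it shrinks at the rate of the two players' average regrets.
gap = (p_avg @ M).max() - (M @ q_avg).min()
```

The `gap` quantity measures how far the averaged pair $(\bar p, \bar q)$ is from an exact equilibrium; with these parameters it comes out small and it shrinks further as $T$ grows.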
An alternative scheme:
For $t = 1, \ldots, T$:
1. Row player chooses $p_t\in \Delta_n$
2. Column player observes $h_t= -M^\top p_t$
3. Column player sets $q_t=\mathrm{arg}\min_{q\in \Delta_m}h_t^\top q=\mathrm{arg}\max_{q\in \Delta_m}p_t^\top Mq$
4. Row player receives $l_t= Mq_t$
5. Row player updates its information

End For
The key difference is that the column player simply best-responds to $p_t$ in every round, so only the row player needs to run a no-regret learning algorithm.
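A sketch of this alternative scheme, assuming (as an illustration) that the row player runs multiplicative weights while the column player best-responds. Note that the $\arg\min$ of a linear function over the simplex is always attained at a vertex, so the best response is a pure strategy:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, T, eta = 4, 5, 2000, 0.05          # illustrative sizes and step size
M = rng.uniform(-1, 1, size=(n, m))      # a random zero-sum game (assumption)

w = np.ones(n)                           # Hedge weights for the row player only
p_avg, q_avg = np.zeros(n), np.zeros(m)

for t in range(T):
    p = w / w.sum()                      # row player chooses p_t
    h = -M.T @ p                         # column player observes h_t = -M^T p_t
    j = int(np.argmin(h))                # best response: a pure strategy e_j
    l = M[:, j]                          # row player receives l_t = M q_t, q_t = e_j
    w *= np.exp(-eta * l)                # only the row player updates
    p_avg += p / T
    q_avg[j] += 1 / T

# Here the gap is controlled by the row player's average regret alone,
# since the best-responding column player has nonpositive regret.
gap = (p_avg @ M).max() - (M @ q_avg).min()
```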
Proof of convergence
Firstly, let $\bar p = \frac{1}{T}\sum_{t=1}^T p_t$ and $\bar q = \frac{1}{T}\sum_{t=1}^T q_t$. Since the row player runs a no-regret algorithm with regret $R_T$,
$$\frac{1}{T}\sum_{t=1}^T p_t^\top M q_t \;\le\; \min_{p\in\Delta_n} \frac{1}{T}\sum_{t=1}^T p^\top M q_t + \frac{R_T}{T} \;=\; \min_{p\in\Delta_n} p^\top M \bar q + \frac{R_T}{T} \;\le\; \max_{q\in\Delta_m}\min_{p\in\Delta_n} p^\top M q + \frac{R_T}{T}.$$
On the other hand, since the column player best-responds in every round, $p_t^\top M q_t = \max_{q\in\Delta_m} p_t^\top M q$, and hence
$$\frac{1}{T}\sum_{t=1}^T p_t^\top M q_t \;=\; \frac{1}{T}\sum_{t=1}^T \max_{q\in\Delta_m} p_t^\top M q \;\ge\; \max_{q\in\Delta_m} \bar p^\top M q \;\ge\; \min_{p\in\Delta_n}\max_{q\in\Delta_m} p^\top M q.$$
Recall von Neumann's Minimax Theorem:
$$\min_{p\in\Delta_n}\max_{q\in\Delta_m} p^\top M q \;=\; \max_{q\in\Delta_m}\min_{p\in\Delta_n} p^\top M q \;=\; V^*.$$
Combining the two displays gives $\max_{q} \bar p^\top M q \le V^* + \frac{R_T}{T}$ and $\min_{p} p^\top M \bar q \ge V^* - \frac{R_T}{T}$. Thus the convergence is proved, and we claim that $(\bar p, \bar q)$ is an $\frac{R_T}{T}$-approximate Nash equilibrium, which converges to an exact equilibrium as $\frac{R_T}{T}\to 0$.
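To make the rate concrete: assuming the row player runs Hedge with losses scaled to $[0,1]$, its regret satisfies the standard bound $R_T \le \sqrt{2T\ln n}$, which yields

```latex
% Average regret of Hedge, and the resulting round complexity:
\[
  \frac{R_T}{T} \;\le\; \sqrt{\frac{2\ln n}{T}}
  \qquad\Longrightarrow\qquad
  T \;\ge\; \frac{2\ln n}{\epsilon^2}
  \ \text{ rounds suffice for an } \epsilon\text{-approximate equilibrium.}
\]
```

Note the only dependence on the number of row actions is logarithmic, which is what makes this approach attractive for large games.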
Assume you have a "bad" hypothesis class
Let
Assume
Maybe
For