CS7545_Sp24_Lecture_22 - mltheory/CS7545 GitHub Wiki
Game theory tries to understand the behavior of multiple players that interact. The players receive rewards or losses (real numbers) during the game as a function of the actions they take, so each party's gain or loss depends on the actions of all of the parties in question. By analogy, optimization looks at how one party can choose an action (a vector or a number) to minimize some function. Game theory is more of a multi-agent view in comparison, where the gain to each player can depend on the other players in the game.
The simplest version of this is the two-player zero-sum game. Every zero-sum game can be represented by a matrix $M$ where:
- a row player chooses an action $i \in \{1, \dots, n\}$ from the rows of this matrix
- a column player chooses an action $j \in \{1, \dots, m\}$ from the columns of this matrix
- $M_{ij}$ is the gain to the column player
- $M_{ij}$ is also called the loss to the row player (so that $-M_{ij}$ is the gain to the row player)
This applies to lots of games: every zero-sum game with finite action spaces is defined by a matrix, and every matrix can be thought of as representing a zero-sum game. With zero-sum games there is just one matrix because the gain of one player is the loss of the other: if one player gains some amount, the other player loses that same amount.
A simple example of a common zero-sum game is Rock, Paper, Scissors.
If a row player plays rock, and the column player plays scissors, rock beats scissors, so the column player lost and gets a -1. The row player's loss is -1 and his gain is 1. The column player's gain is -1 (and loss is 1).
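As a minimal sketch, the Rock, Paper, Scissors matrix can be written out under the convention above, where $M_{ij}$ is the column player's gain and the row player's loss. The names `ACTIONS`, `M`, `rock`, `paper`, and `scissors` are illustrative and not from the lecture.

```python
# Rock, Paper, Scissors as a zero-sum matrix game.
# Convention: M[i][j] is the column player's gain (the row player's loss)
# when the row player plays action i and the column player plays action j.
ACTIONS = ["rock", "paper", "scissors"]
rock, paper, scissors = 0, 1, 2

M = [
    [0, 1, -1],   # row plays rock: loses to paper, beats scissors
    [-1, 0, 1],   # row plays paper: beats rock, loses to scissors
    [1, -1, 0],   # row plays scissors: loses to rock, beats paper
]

# Row plays rock, column plays scissors: the column player's gain is -1,
# so the row player's loss is -1 and the row player's gain is 1.
print(M[rock][scissors])   # -1
print(-M[rock][scissors])  # 1
```

The matrix is antisymmetric ($M_{ij} = -M_{ji}$) because the game looks the same from either player's seat.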
If you have no information about your opponent, the optimal strategy is to randomize your actions uniformly, meaning you put equal weight on rock, paper, and scissors.
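One way to see why uniform play is safe is to check the row player's expected loss against each of the column player's pure actions. The sketch below assumes the Rock, Paper, Scissors matrix with $M_{ij}$ as the row player's loss. The helper name `loss_against_each_column` is hypothetical.

```python
from fractions import Fraction

# Rock, Paper, Scissors; M[i][j] is the row player's loss (column player's gain).
M = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]

def loss_against_each_column(M, p):
    """Row player's expected loss against each pure column action, i.e. p^T M."""
    return [sum(p[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

uniform = [Fraction(1, 3)] * 3
uniform_losses = loss_against_each_column(M, uniform)
print(uniform_losses)  # every entry is 0
```

Under uniform play no column action does better than any other against the row player, so the opponent has nothing to exploit.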
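Consider that a "mixed strategy" for either player is a distribution over that player's actions: $p \in \Delta_n$ for the row player and $q \in \Delta_m$ for the column player. If the row player commits to $p$ first and the column player then responds with $q$, the row player's expected loss under optimal play is a nested optimization, which we can write in matrix notation:

$$\min_{p \in \Delta_n} \max_{q \in \Delta_m} p^\top M q$$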
There are two optimizations happening: first we optimize the inner quantity, and then the outer quantity. (While optimizing the inner piece, we've sort of "committed" to some choice of the outer variable.)
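The inner optimization can be sketched numerically. Assume the row player has committed to a mixed strategy `p` and the column player then picks the best response, with the expected loss written as $p^\top M q$. The helper names `expected_loss` and `inner_max` and the example strategies are illustrative assumptions.

```python
from fractions import Fraction

# Rock, Paper, Scissors; M[i][j] is the row player's loss (column player's gain).
M = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]

def expected_loss(M, p, q):
    """Row player's expected loss p^T M q for mixed strategies p and q."""
    return sum(p[i] * M[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))

def inner_max(M, p):
    """max over q of p^T M q once the row player has committed to p.

    The objective is linear in q, so the maximum is attained at a pure
    column action and it suffices to scan the columns.
    """
    return max(sum(p[i] * M[i][j] for i in range(len(p))) for j in range(len(M[0])))

pure_rock = [1, 0, 0]
mostly_rock = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(inner_max(M, pure_rock))    # 1: the column player answers with paper
print(inner_max(M, mostly_rock))  # 1/4: paper is still the best response
```

The outer minimization then searches over `p` for the commitment with the smallest `inner_max`.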
Now, would you rather be the first player or the second player? The $p$ player goes first and then the $q$ player goes. The $q$ player doesn't see the action that the $p$ player samples, but does see that player's distribution.
It's clear that you would rather be the second player in that you'd rather have more information and respond to your opponent. So if you look at this, and you're the min player, the min player prefers to be in the inner optimization.
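The preference for moving second can be stated as an inequality. As a sketch, assume $p$ is the row (min) player's mixed strategy, $q$ is the column (max) player's mixed strategy, and $p^\top M q$ is the row player's expected loss (this notation is an assumption here). Then for any fixed $p'$ and $q'$:

```latex
\min_{p} \, p^\top M q' \;\le\; p'^\top M q' \;\le\; \max_{q} \, p'^\top M q
\quad\Longrightarrow\quad
\max_{q'} \min_{p} \, p^\top M q' \;\le\; \min_{p'} \max_{q} \, p'^\top M q .
```

The left-hand side is the min player's loss when it moves second (inner optimization), and the right-hand side is its loss when it moves first, so moving second can never hurt the min player.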
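For any matrix $M \in \mathbb{R}^{n \times m}$, the order of play turns out not to matter (von Neumann's minimax theorem):

$$\min_{p} \max_{q} p^\top M q = \max_{q} \min_{p} p^\top M q,$$

where $p$ ranges over the row player's mixed strategies and $q$ over the column player's.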
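Claim: There exists an algorithm which determines a distribution $p_t$ on each round $t$, using only the loss vectors $\ell_1, \dots, \ell_{t-1}$ seen so far, whose average regret against the best fixed action vanishes as the number of rounds grows.
Fact: with exponential weights, our algorithm is to set $p_t(i)$ proportional to $\exp\left(-\eta \sum_{s=1}^{t-1} \ell_s(i)\right)$, where $\ell_s$ is the loss vector on round $s$ and $\eta > 0$ is the learning rate.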
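A minimal sketch of the exponential weights update follows. It assumes the standard form in which the weight on action $i$ is proportional to $\exp(-\eta \cdot \text{cumulative loss of } i)$. The function name `exponential_weights` and the parameter name `eta` are illustrative assumptions, not the lecture's notation.

```python
import math

def exponential_weights(cumulative_losses, eta):
    """Return the distribution with p[i] proportional to exp(-eta * cumulative_losses[i])."""
    # Subtract the minimum loss before exponentiating for numerical stability.
    # This rescales every weight by the same factor, so the distribution is unchanged.
    base = min(cumulative_losses)
    weights = [math.exp(-eta * (loss - base)) for loss in cumulative_losses]
    total = sum(weights)
    return [w / total for w in weights]

p_start = exponential_weights([0.0, 0.0, 0.0], eta=0.5)  # no losses yet: uniform
p_later = exponential_weights([3.0, 1.0, 2.0], eta=0.5)  # smallest loss gets the most weight
print(p_start)
print(p_later)
```

Actions that have accumulated less loss receive exponentially more probability, and `eta` controls how aggressively the distribution concentrates.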
Imagine a protocol: the row player and the column player play the game repeatedly for $T$ rounds.
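What's going on here? Both players pick their mixed strategies. Then each of them looks at the opponent's mixed strategy and figures out the loss vector for their own actions on this round. For example, suppose my opponent tells me that they're going to play rock half the time, paper the other half of the time, and scissors 0% of the time; I can then work out the expected loss of each of my actions against that mix. If the column player plays the mixed strategy $q$, the row player's loss vector is $Mq$: its $i$-th entry $\sum_j M_{ij} q_j$ is the expected loss of row action $i$.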
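The rock/paper example can be worked through directly. The sketch below assumes the Rock, Paper, Scissors matrix with $M_{ij}$ as the row player's loss and computes the row player's loss vector against a column player who plays rock and paper half the time each. The variable names are illustrative.

```python
from fractions import Fraction

# Rock, Paper, Scissors; M[i][j] is the row player's loss (column player's gain).
M = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]

# Column player's announced mix over (rock, paper, scissors).
q = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]

# Entry i is the expected loss of row action i against q, i.e. (M q)[i].
row_loss_vector = [sum(M[i][j] * q[j] for j in range(3)) for i in range(3)]
print(row_loss_vector)  # rock: 1/2, paper: -1/2, scissors: 0
```

Paper has the smallest expected loss here (it beats rock and ties paper), so an exponential weights update would shift the row player's probability toward paper.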
Both players use the exponential weights algorithm to update their probability distributions on the actions. The indices
Consider
The idea is that if you give me some time to learn from your actions, I can do as well as if I had known them in advance and had to commit to a single fixed strategy.
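The repeated-play protocol can be sketched end to end. Everything specific in this example is an assumption made for illustration: the 2x2 matrix `G` (whose minimax value is 1/5), the horizon `T`, and the learning-rate tuning `eta`. Rock, Paper, Scissors is not used here because both players would start at the uniform equilibrium and never move.

```python
import math

# Hypothetical 2x2 zero-sum game; G[i][j] is the row player's loss
# (the column player's gain). Its minimax value is 1/5.
G = [[2, -1], [-1, 1]]

def exp_weights(cum_losses, eta):
    """Distribution with weight on action k proportional to exp(-eta * cum_losses[k])."""
    base = min(cum_losses)  # shift for numerical stability; distribution unchanged
    w = [math.exp(-eta * (c - base)) for c in cum_losses]
    total = sum(w)
    return [x / total for x in w]

def self_play(G, T, eta):
    """Both players run exponential weights for T rounds; return their average strategies."""
    n, m = len(G), len(G[0])
    row_cum, col_cum = [0.0] * n, [0.0] * m
    p_avg, q_avg = [0.0] * n, [0.0] * m
    for _ in range(T):
        p = exp_weights(row_cum, eta)
        q = exp_weights(col_cum, eta)
        # Each player sees the opponent's mixed strategy and computes its own loss vector.
        row_loss = [sum(G[i][j] * q[j] for j in range(m)) for i in range(n)]   # G q
        col_loss = [-sum(p[i] * G[i][j] for i in range(n)) for j in range(m)]  # -(G^T p)
        row_cum = [c + l for c, l in zip(row_cum, row_loss)]
        col_cum = [c + l for c, l in zip(col_cum, col_loss)]
        p_avg = [a + x / T for a, x in zip(p_avg, p)]
        q_avg = [a + x / T for a, x in zip(q_avg, q)]
    return p_avg, q_avg

T = 10_000
eta = math.sqrt(8 * math.log(2) / T) / 3  # assumed tuning: 2 actions, losses in a range of width 3
p_avg, q_avg = self_play(G, T, eta)

# Worst case for the row player if it commits to p_avg, and for the column player with q_avg.
upper = max(sum(p_avg[i] * G[i][j] for i in range(2)) for j in range(2))
lower = min(sum(G[i][j] * q_avg[j] for j in range(2)) for i in range(2))
print(lower, upper)  # the two values bracket the game value 0.2
```

The gap `upper - lower` is at most the sum of the two players' average regrets, so as both regrets vanish the averaged strategies approach a minimax pair.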