CS7545_Sp24_Lecture_19 - mltheory/CS7545 GitHub Wiki

CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Guanghui Wang

Notes for Lecture 19: Online Learning

March 28, 2024

Scribes: Milind Nakul, Xiaotian Liu

Recap: Prediction with Expert Advice

Framework: [ $N$ -experts]

For $t = 1, \ldots, T$, do:

  1. Each expert $i \in \{1, \ldots, N\}$ predicts $z_t^i \in [0,1]$

  2. Learner predicts $\hat{y_t} \in [0,1]$

  3. Adversary reveals $y_t \in [0,1]$

  4. Learner suffers loss $l\left(\hat{y_t}, y_t\right)$

End-for

Full Information: since $y_t$ is revealed, the loss $l\left(z^i_t, y_t\right)$ can be computed for every expert $i$, i.e., every expert can be evaluated at every round.

Goal: $$\text{Minimize Regret} = R_T = L_T - \min_{i \in [N]}L_T^i \quad \text{where} \quad L_T = \sum_{t=1}^T l\left(\hat{y_t}, y_t\right) \quad \text{and} \quad L_T^i = \sum_{t=1}^T l\left(z_t^i, y_t\right).$$

Exponential Weights Algorithm

Idea: Maintain the cumulative loss of every expert and downweight experts with larger loss.

Algorithm:

For $t \in \{1, \ldots, T\}$ do:

  1. Each expert predicts $z_t^i \in [0,1]$.

  2. Learner will pick $$\hat{y_t} = \sum_{i=1}^N \dfrac{w_t^i}{W_t}z_t^i, \quad \text{where} \quad W_t = \sum_{i=1}^N w_t^i.$$

This can also be interpreted as selecting the action according to the policy $\boldsymbol{p}_t = (p^1_t,\ldots,p^N_t)$ where $p^i_t = \dfrac{w_t^i}{W_t}$.

  3. Submit our decision $\hat{y_t}$ and suffer a loss $l(\hat{y_t}, y_t)$.

  4. Update weights: $$\forall i \in [N], \quad w_{t+1}^i = w_t^i \exp(-\eta \cdot l(z_t^i, y_t))= w_1^i \cdot \exp \left( -\eta\sum_{j=1}^t l\left( z_j^i,y_j \right) \right).$$

With $w_1^i = 1$, we have $$w_{t+1}^i = \exp \left( -\eta\sum_{j=1}^t l\left( z_j^i,y_j \right) \right).$$
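For concreteness, the following is a minimal Python sketch of Exponential Weights. The squared loss and the array-based interface are illustrative assumptions, not fixed by the notes.

```python
import numpy as np

def exponential_weights(Z, y, eta):
    """Exponential Weights on expert predictions Z (T x N) and outcomes y (T,).

    Uses the squared loss l(p, y) = (p - y)^2 as an illustrative choice.
    Returns the learner's predictions hat{y}_t for t = 1, ..., T.
    """
    T, N = Z.shape
    w = np.ones(N)                       # w_1^i = 1 for every expert
    preds = np.empty(T)
    for t in range(T):
        preds[t] = (w / w.sum()) @ Z[t]  # hat{y}_t = sum_i (w_t^i / W_t) z_t^i
        losses = (Z[t] - y[t]) ** 2      # l(z_t^i, y_t) for each expert
        w *= np.exp(-eta * losses)       # multiplicative weight update
    return preds
```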

Regret Rate:

$R_T^{EW} = O(\sqrt{T \log{N}})$

Hedge

Framework: [ $N$ -experts]

For $t = 1, \ldots, T$, do:

  1. Learner picks an expert $i_t$ to follow by sampling from the distribution $\boldsymbol{p}_t\in\triangle^N$, i.e., $i_t\sim \boldsymbol{p}_t$

  2. Learner suffers loss $l^{i_t}_t$ and observes the full loss vector $\boldsymbol{l}_t = (l^1_t,\ldots,l^N_t)$

End-for

$$ R_T = \sum_{t=1}^T l_t^{i_t} - \min_{i\in [N]}\sum_{t=1}^T l^i_t $$

Note. The second term above is the cumulative loss of the best expert.

$$ \mathbb{E} [R_T] = \sum_{t=1}^T \boldsymbol p_t^T \boldsymbol l_t - \min_{i\in [N]}\sum_{t=1}^T l^i_t $$

Algorithm: Hedge

$$p^i_t = \frac{w^i_t}{W_t}$$

$$w_{t}^i = \exp \left( -\eta\sum_{j=1}^{t-1} l^{i}_j \right)$$

$$i_t\sim\boldsymbol p_t$$

Regret:

$\mathbb{E} [R_T] = O(\sqrt{T \log{N}})$
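A minimal Python sketch of Hedge under these assumptions (losses in $[0,1]$, an illustrative matrix interface); shifting the cumulative losses by their minimum is a standard numerical-stability trick that leaves $\boldsymbol p_t$ unchanged:

```python
import numpy as np

def hedge(loss_matrix, eta, seed=0):
    """Hedge with full information: loss_matrix is T x N with entries l_t^i in [0, 1].

    Returns the learner's total loss sum_t l_t^{i_t}.
    """
    rng = np.random.default_rng(seed)
    T, N = loss_matrix.shape
    L = np.zeros(N)                       # cumulative losses sum_{j<t} l_j^i
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * (L - L.min()))  # shift by L.min() for stability; p_t unchanged
        p = w / w.sum()                   # p_t^i = w_t^i / W_t
        i_t = rng.choice(N, p=p)          # sample expert i_t ~ p_t
        total += loss_matrix[t, i_t]      # suffer l_t^{i_t}
        L += loss_matrix[t]               # full information: the whole vector is observed
    return total
```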

Bandits

For $t = 1, \ldots, T$, do:

  1. Learner picks an expert $i_t$ to follow by sampling from the distribution $\boldsymbol{p}_t\in\triangle^N$, i.e., $i_t\sim \boldsymbol{p}_t$

  2. Adversary picks $\boldsymbol l_t = (l^1_t,\ldots,l^N_t)$

  3. Learner observes $l^{i_t}_t$

End-for

Note. Hedge requires full information, i.e., the whole vector $\boldsymbol l_t$ must be observed. The procedure above, however, only reveals $l^{i_t}_t$, the loss of the selected action. Hence, to apply Hedge, we need to construct an estimated loss vector $\tilde{\boldsymbol{l_t}}$ from the observed loss $l^{i_t}_t$.

Estimation Method: Importance Weighting (IW)

$\tilde{\boldsymbol{l_t}} = (\tilde{l}^1_t,\ldots,\tilde{l}^N_t)$, where $\tilde{l}^i_t = \frac{l^{i_t}_t}{p^i_t}$ if $i=i_t$ and $\tilde{l}^i_t = 0$ if $i\neq i_t$

Property: $\tilde{\boldsymbol{l_t}}$ is an unbiased estimator of $\boldsymbol{l_t}$ (with respect to the randomness of $i_t \sim \boldsymbol p_t$)

Proof:

$$\mathbb E [\tilde{l}^i_t] = \mathbb P(i=i_t) \mathbb E [\tilde{l}^i_t|i=i_t] + \mathbb P(i \neq i_t) \mathbb E [\tilde{l}^i_t|i\neq i_t]$$

$$=\mathbb P(i=i_t) \mathbb E [\tilde{l}^i_t|i=i_t]$$

$$=p^i_t \frac{l^i_t}{p^i_t}$$

$$ = l^i_t$$
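A quick Monte Carlo sanity check of this unbiasedness property, using made-up values for $\boldsymbol p_t$ and $\boldsymbol l_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])    # hypothetical p_t over N = 3 experts
l = np.array([0.9, 0.1, 0.4])    # hypothetical true loss vector l_t

M = 200_000                      # number of simulated rounds
est = np.zeros(3)
for _ in range(M):
    i_t = rng.choice(3, p=p)     # play i_t ~ p_t; only l[i_t] is observed
    est[i_t] += l[i_t] / p[i_t]  # tilde{l}_t^i = l_t^{i_t}/p_t^i if i = i_t, else 0
print(est / M)                   # averages to approximately [0.9, 0.1, 0.4] = l
```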

Algorithm for bandits: EXP3 (Hedge + IW)

$$p^i_t = \frac{w^i_t}{W_t}$$

$$w_{t}^i = \exp \left( -\eta \tilde{L}^i_{t-1} \right)$$

$$\tilde{L}^i_{t-1} = \sum_{j=1}^{t-1} \tilde{l}^{i}_j$$

$$i_t\sim\boldsymbol p_t$$
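Putting the two pieces together, a minimal Python sketch of EXP3; the loss matrix stands in for the adversary and is an illustrative assumption:

```python
import numpy as np

def exp3(loss_matrix, eta, seed=0):
    """EXP3: Hedge run on importance-weighted loss estimates (bandit feedback).

    loss_matrix (T x N) stands in for the adversary; only l_t^{i_t} is ever used.
    """
    rng = np.random.default_rng(seed)
    T, N = loss_matrix.shape
    L_tilde = np.zeros(N)                             # cumulative estimates tilde{L}_{t-1}^i
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * (L_tilde - L_tilde.min()))  # weights (shifted for stability)
        p = w / w.sum()
        i_t = rng.choice(N, p=p)                      # i_t ~ p_t
        obs = loss_matrix[t, i_t]                     # bandit feedback: only this entry is seen
        total += obs
        L_tilde[i_t] += obs / p[i_t]                  # IW update; tilde{l}_t^i = 0 for i != i_t
    return total
```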

Regret: $\mathbb{E}[R^{EXP3}_T] = O(\sqrt{NT \log{N}})$

Proof.

Define $\Phi(t) = -\frac{1}{\eta} \log(W_t)$,

$$\Phi(t+1)-\Phi(t) = -\frac{1}{\eta} \log(\frac{W_{t+1}}{W_{t}})$$

$$=-\frac{1}{\eta}\log\left(\frac{\sum_{i\in [N]} w^i_{t+1}}{W_t}\right) = -\frac{1}{\eta}\log\left(\frac{\sum_{i\in [N]} w^i_{t}\exp(-\eta \tilde{l}^i_t)}{W_t}\right)$$

$$=-\frac{1}{\eta}\log\left(\sum_{i\in [N]} \frac{w^i_{t}}{W_t}\exp(-\eta \tilde{l}^i_t)\right) = -\frac{1}{\eta}\log\left(\sum_{i\in [N]} p^i_t \exp(-\eta \tilde{l}^i_t)\right)$$

$$=-\frac{1}{\eta}\log(\mathbb E_{i\sim \boldsymbol p_t}[\exp(-\eta \tilde{l}^i_t)])$$

$$\geq -\frac{1}{\eta}\log\left(\mathbb E_{i\sim \boldsymbol p_t}\left[1-\eta \tilde{l}^i_t + \frac{\eta^2 (\tilde{l}^i_t)^2}{2}\right]\right) \quad \text{(using } e^{-x} \leq 1 - x + \tfrac{x^2}{2} \text{ for } x \geq 0\text{)}$$

$$\geq \mathbb E_{i\sim \boldsymbol p_t}\left[\tilde{l}^i_t - \frac{\eta}{2}(\tilde{l}^i_t)^2\right] \quad \text{(using } \log(1+x) \leq x\text{)}$$

$$=\boldsymbol p_t^T \tilde{\boldsymbol{l_t}} - \frac{\eta}{2}\sum_{i\in [N]}p^i_t (\tilde{l}^i_t)^2$$

Taking expectations and using the unbiasedness of $\tilde{\boldsymbol{l_t}}$, we get $$\mathbb{E} [\Phi(t+1)-\Phi(t)] \geq \boldsymbol p_t^T \boldsymbol{l_t} - \frac{\eta}{2}\sum_{i\in [N]}p^i_t \mathbb{E}[(\tilde{l}^i_t)^2]$$

We have $$\mathbb{E}[(\tilde{l}^i_t)^2] = \mathbb{E}[(\tilde{l}^i_t)^2|i=i_t]\mathbb P(i=i_t) + \mathbb{E}[(\tilde{l}^i_t)^2|i\neq i_t]\mathbb P(i\neq i_t)$$

$$=\left(\frac{l^i_t}{p^i_t}\right)^2 p^i_t = \frac{(l^i_t)^2}{p^i_t}$$

Combining the above two relations gives

$$\mathbb{E} [\Phi(t+1)-\Phi(t)] \geq \boldsymbol p_t^T \boldsymbol{l_t} - \frac{\eta}{2}\sum_{i\in [N]}p^i_t \frac{(l^i_t)^2}{p^i_t} $$

$$ = \boldsymbol p_t^T \boldsymbol{l_t} - \frac{\eta}{2}\sum_{i\in [N]}(l^i_t)^2$$

$$\geq \boldsymbol p_t^T \boldsymbol{l_t} - \frac{\eta}{2}N $$ (because each loss $l^i_t$ is bounded by $1$)

Therefore, we get $$\sum_{t=1}^T \mathbb E [\Phi(t+1)-\Phi(t)] \geq \sum_{t=1}^T \left(\boldsymbol p_t^T \boldsymbol l_t - \frac{\eta}{2}N\right) = \sum_{t=1}^T \boldsymbol p_t^T \boldsymbol l_t - \frac{\eta N T}{2} $$
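To finish the argument along the standard lines: the left-hand side telescopes, and since $w^i_1 = 1$ for all $i$ we have $\Phi(1) = -\frac{1}{\eta}\log N$, so

$$\sum_{t=1}^T \mathbb E [\Phi(t+1)-\Phi(t)] = \mathbb E[\Phi(T+1)] - \Phi(1) = \mathbb E[\Phi(T+1)] + \frac{\log N}{\eta}.$$

For any fixed expert $i^{\ast}$, $\Phi(T+1) = -\frac{1}{\eta}\log\left(\sum_{i\in [N]} \exp(-\eta \tilde{L}^i_T)\right) \leq \tilde{L}^{i^{\ast}}_T$, and $\mathbb E[\tilde{L}^{i^{\ast}}_T] = L^{i^{\ast}}_T$ by unbiasedness. Combining with the lower bound above and taking $i^{\ast}$ to be the best expert,

$$\mathbb E\left[\sum_{t=1}^T \boldsymbol p_t^T \boldsymbol l_t\right] - \min_{i\in [N]} L^i_T \leq \frac{\log N}{\eta} + \frac{\eta N T}{2}.$$

Choosing $\eta = \sqrt{\frac{2\log N}{NT}}$ gives $\mathbb E[R_T] \leq \sqrt{2NT\log N} = O(\sqrt{NT\log N})$. $\blacksquare$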
