Our goal is to design an algorithm that achieves regret sublinear in $T$. For a general loss function $\ell$, we cannot hope to guarantee this. Therefore, we make the reasonable assumption that $\ell(\hat y, y ) : [0,1] \times [0,1] \rightarrow [0,1]$ is convex with respect to $\hat y$ (examples: $\ell(\hat y , y) = |\hat y - y|$, $(\hat y - y)^2, \dots$).
Remark: In class we only required $\ell$ to be a convex real-valued function. However, the assumption $\ell(\hat y_t, y_t) \in [0, 1]$ is critical for the analysis of the EWA algorithm.
Follow-The-Leader (FTL)
We first look at the Follow-The-Leader algorithm:
Let $i_{t}^{\star} := \arg \min_{i \in [N]} L_{t-1}^{i}$ denote the expert with the smallest cumulative loss through day $t-1$. On day $t$, the FTL algorithm follows $i^{\star}_t$, i.e., we play $\hat y_t = z_t^{i^{\star}_t}$.
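As a concrete sketch (in Python, with a hypothetical array convention for the experts' predictions $z_t^i$ and outcomes $y_t$), FTL can be written as:

```python
import numpy as np

def ftl_predictions(Z, Y, loss):
    """Follow-The-Leader: on each day, play the prediction of the expert
    with the smallest cumulative loss so far (ties broken by lowest index).

    Z: (T, N) array, Z[t, i] = z_t^i, expert i's prediction on day t+1.
    Y: (T,) array of true outcomes y_t.
    loss: elementwise loss function, e.g. lambda zhat, y: np.abs(zhat - y).
    """
    T, N = Z.shape
    L = np.zeros(N)                # L_{t-1}^i: cumulative loss of each expert
    preds = np.empty(T)
    for t in range(T):
        i_star = int(np.argmin(L))  # the current leader i_t^*
        preds[t] = Z[t, i_star]     # play hat y_t = z_t^{i_star}
        L += loss(Z[t], Y[t])       # update cumulative expert losses
    return preds
```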
The performance of FTL is poor: there are instances on which it suffers linear regret.
Consider an example with $N = 2$.
For day $1$, we have $l(z_1^1, y_1) = 0.51$ and $l(z_1^2, y_1) = 0.49$.
Starting from the second day, we have $l(z_t^1, y_t) = 0 , l(z_t^2, y_t) = 1$ when $t$ is even, and $l(z_t^1, y_t) = 1 , l(z_t^2, y_t) = 0$ when $t$ is odd.
It is easy to verify that $L_T^{FTL} \approx T$ while $L_T^1 \approx \frac{T}{2}$, so the regret is roughly $\frac{T}{2}$: after day $1$ the leader alternates between the two experts, and FTL always follows the expert that is about to incur loss $1$.
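This calculation can be checked numerically. The sketch below (an illustration, feeding FTL the loss table of the example directly rather than predictions) simulates the instance above:

```python
import numpy as np

def adversarial_losses(T):
    """Loss table for the two-expert example: loss[t, i] is the loss of
    expert i+1 on day t+1 (1-indexed days as in the text)."""
    loss = np.zeros((T, 2))
    loss[0] = [0.51, 0.49]                # day 1
    for t in range(1, T):
        day = t + 1
        # even day: expert 1 loses 0, expert 2 loses 1; odd day: reversed
        loss[t] = [0.0, 1.0] if day % 2 == 0 else [1.0, 0.0]
    return loss

def ftl_total_loss(loss):
    """Run FTL on a (T, N) loss table; the leader is the expert with the
    smallest cumulative loss so far (ties broken by lowest index)."""
    L = np.zeros(loss.shape[1])
    total = 0.0
    for row in loss:
        total += row[int(np.argmin(L))]   # follow the current leader
        L += row
    return total, L

loss = adversarial_losses(1000)
ftl, experts = ftl_total_loss(loss)
# FTL loses about T, the best expert loses about T/2, so regret is about T/2.
```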
Follow-The-Leader is unstable since it is a greedy algorithm and is sensitive to "leader changes". We need a smoother algorithm that is not as sensitive.
Exponential Weights Algorithm (EWA)
EWA is less sensitive to changes in leader and provides sub-linear regret. We first introduce the algorithm:
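As a sketch of the standard exponential-weights scheme (assuming the common weighting $w_t^i \propto e^{-\eta L_{t-1}^i}$; the precise presentation in the notes may differ):

```python
import numpy as np

def ewa_predictions(Z, Y, loss, eta):
    """Exponential Weights: play the weighted average of expert predictions,
    with weight w_t^i proportional to exp(-eta * L_{t-1}^i).

    Z: (T, N) expert predictions, Y: (T,) outcomes,
    loss: elementwise loss taking values in [0, 1], eta > 0 a learning rate.
    """
    T, N = Z.shape
    L = np.zeros(N)                 # cumulative expert losses L_{t-1}^i
    preds = np.empty(T)
    for t in range(T):
        w = np.exp(-eta * L)        # unnormalized weights w_t^i
        w /= w.sum()                # normalize by W_t
        preds[t] = w @ Z[t]         # hat y_t = sum_i (w_t^i / W_t) z_t^i
        L += loss(Z[t], Y[t])
    return preds
```

Note that the weighted-average prediction is where convexity of $\ell$ in $\hat y$ enters: by Jensen's inequality, $\ell(\hat y_t, y_t) \leq \sum_i \frac{w_t^i}{W_t} \ell(z_t^i, y_t)$.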
Proof of Claim 4: It suffices to show that $\Phi(t+1) - \Phi(t) \geq (1 - e^{-\eta}) \cdot \ell(\hat y_t, y_t)$. According to the definition of $\Phi(t)$, we have
Remark: In class we used a different way to prove Claim 4, introducing the following claim without proof: given a random variable $X \in [0, 1]$ and $s \in \mathbb{R}$, we have
In fact, this inequality is implicitly proved in the lecture notes, but only for the special random variable $X_t$ defined by $X_t = \ell(z_t^i, y_t)$ with probability $\frac{w_{t}^i}{W_t}$. Generalizing the argument gives a formal proof of the claim above: for a random variable $X \in [0, 1]$ and $s \in \mathbb{R}$, the convexity of $x \mapsto e^{sx}$ gives