CS7545_Sp24_Lecture_21 - mltheory/CS7545 GitHub Wiki

CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Guanghui Wang

Notes for Lecture 21

April 4, 2024

Scribes: Qihang Hu

Proof Intuition

  1. $x^{k}_{t}$ eveolve class to each other
  2. Averaging the sgd $\nabla f_{i_{t}^{k}}(x_{t}^{k})$ can yield a reduction in the vatiation by a factor of k.\

ccc1e84d6fbfcf031a2a5489d6fa26d

Goal

$Ef(\hat{x_T})-f^*\leq \epsilon(>0) = O(\frac{1}{kT})$

Lemma 1

With assumptions (a), (b), (c), (d) and $\eta_t \leq \frac{1}{4L}$
$$E || \bar{x}_{t+1} - x^* || \leq (1 - \mu \eta_t) E || \bar{x}_t - x^* ||^2 + \eta_t^2 E || g_t - \bar{g}_t ||^2 - \frac{1}{2} \eta_t E \left(f(\bar{x}_t) - f^*\right) + 2\eta_t \frac{L}{K} \sum_{k=1}^K E || \bar{x}_t - x_t^k ||^2$$

Proof

$${|| \bar{x_{t+1}} - x^* || }^2 = { || \bar{x_t} - \eta_t g_t - x^* || }^2$$

$$= {|| \bar{x_t} - \eta_t g_t - x^* + \eta_t \bar{g_t} - \eta_t\bar{g_t} ||}^2$$

$$= {|| \bar{x_t} - x^* - \eta_t \bar{g_t} ||}^2 + \eta_t^2 ||g_t - \bar{g_t}||^2 + 2 \eta_t \left( \bar{x_t} - x^* - \eta_t \bar{g_t}, \bar{g_t} - g_t \right)$$

L-smoothness

$$f(y) \geq f(x) + <y-x, \nabla f(x) > + \frac{1}{2L} ||\nabla f(x) - \nabla f(y)||^2$$

$$f(y) - f(x) = f(z) - f(x) - (f(z) - f(y))$$

$$\leq <z-x, \nabla f(x)> - (< z-y, \nabla f(y) > + \frac{L}{2} ||z-y||^2)$$

$$= <y-x + z-y, \nabla f(x)> - (< z-y, \nabla f(y) > + \frac{L}{2} ||z-y||^2)$$

$$= <y-x, \nabla f(x)> + <z-y, \nabla f(x) - \nabla f(y)> - \frac{L}{2} ||z-y||_2^2$$

$$z = y- \frac{1}{L}(\nabla f(y) - \nabla f(x))$$

Continued proof of lemma 1

$$||\bar{x_t} - x^* -\eta_t \bar{g_t}||^2$$

$$= ||\bar{x_t} - x^* ||^2 +\eta_t^2 ||\bar{g_t}||^2 - 2\eta_t <\bar{x_t} - x^* ,\bar{g_t}> $$

$$= ||\bar{x_t} - x^* ||^2 +\eta_t^2 ||\bar{g_t}||^2 - 2\eta_t \frac{1}{K} \sum_{k=1}^{K} <\bar{x_t} - x^*, \nabla f(x_t^k)>$$

$$\leq ||\bar{x_t} - x^* ||^2 + \eta_t^2 \frac{1}{K} \sum_{k=1}^{K} ||\nabla f(x_t^k)||^2 - 2\eta_t \frac{1}{k} \sum_{k=1}^{K} <\bar{x_t} - x_t^k + x_k^t - x^* , \nabla f(x_t^k)>$$

$$||\sum_{i=1}^{K} a_i||^2 \leq ||\sum_{i=1}^{K} ||a_i||||^2 \text{Triangular Ineq}$$

$$\leq k\sum_{i=1}^{K} ||a_i||^2 \text{Cauchy Shwartz}$$

Continued proof of lemma 1

$$=||\bar{x_t} - x^* ||^2 + \eta_t^2 \frac{1}{K} \sum_{k=1}^{K} ||\nabla f(x_t^k) - \nabla f(x^* ) ||^2 - 2\eta_t \frac{1}{K} \sum_{k=1}^{K}<x_t^k -x^* , \nabla f(x_t^k)> -2\eta_t \frac{1}{K} \sum_{k=1}^{K} <\bar{x_t} - x_k^t, \nabla f(x_t^k)>$$

By $\mu$-strong cvx

$$2< a, b > \leq \gamma ||a||^2 + \frac{1}{\gamma} ||b||^2 \Leftrightarrow ||\sqrt{ \gamma} a - \frac{1}{\sqrt{\gamma}} b ||^2$$

$$-<x_t^k - x^* , \nabla f(x_t^k)> \leq -(f(x_t^k) - f^* ) - \frac{\mu}{2} ||x_t^k - x^* ||^2$$

$$-2 <\bar{x_t} - x_k^t, \nabla f(x_t^k)> \leq 2L ||\bar{x_t} - x_k^t||^2 + \frac{1}{2L} ||\nabla f(x_t^k)||^2$$

$$= \leq 2L ||\bar{x_t} - x_k^t||^2 + \frac{1}{2L} ||\nabla f(x_t^k) - \nabla f(x^* )||^2$$

$$= \leq 2L ||\bar{x_t} - x_k^t||^2 + f(x_t^k) - f^* $$

$$||\bar{x_t} - x^* - \eta_t \bar{g_t}||^2 \leq ||\bar{x_t} - x^* ||^2 + 2\eta_t \frac{L}{K} \sum_{k=1}^{K} ||\bar{x_t} - x_t^k ||^2 + 2\eta_t \frac{1}{K} \sum_{k=1}^{K} ((\eta_t L - \frac{1}{2})(f(x_t^k)-f^* ) - \frac{\mu}{2} ||x_t^k - x^* ||^2$$

$$\eta_t L - \frac{1}{2} \leq -\frac{1}{4}$$

$$\leq -\frac{1}{2} (f(\frac{1}{K} \sum_{k=1}^{K} x_t^k) - f^* ) - \eta_t \mu ||\frac{1}{K} \sum_{k=1}{K} x_t^k - x^* ||^2$$

$$||\bar{x_t} - x^* -\eta_t \bar{g_t}||^2 \leq (1-\mu \eta_t) ||\bar{x_t} - x^* ||^2 - \frac{1}{2} \eta_t (f(\bar{x_t}) - f^* ) + 2\eta_t \frac{L}{K} \sum_{k=1}{K} ||\bar{x_t} - x_t^k||^2$$

Lemma 2

$$E||\eta_t - \bar{g_t}|| ^2 = E||\frac{1}{K} \sum_{k=1}{K} (\nabla f_{i_t^k}(x_t^k)) - \nabla f(x_t^k)||^2$$

$$= \frac{1}{k^2} \sum_{k=1}^{K} E ||\nabla f_{i_t^k}(x_t^k) - \nabla f(x_t^k)||^2 \leq \frac{6^2}{k}$$

$$\text{for }x_1, ..., x_k \text{, }Var(\sum_{k=1}^{K} x_k) = \sum Var(x_k)$$

⚠️ **GitHub.com Fallback** ⚠️