CS 7545: Machine Learning Theory -- Spring 2024
Instructor: Guanghui Wang
-
$x^{k}_{t}$ eveolve class to each other
- Averaging the sgd $\nabla f_{i_{t}^{k}}(x_{t}^{k})$ can yield a reduction in the vatiation by a factor of k.\

$Ef(\hat{x_T})-f^*\leq \epsilon(>0) = O(\frac{1}{kT})$
With assumptions (a), (b), (c), (d) and $\eta_t \leq \frac{1}{4L}$
$$E || \bar{x}_{t+1} - x^* || \leq (1 - \mu \eta_t) E || \bar{x}_t - x^* ||^2 + \eta_t^2 E || g_t - \bar{g}_t ||^2 - \frac{1}{2} \eta_t E \left(f(\bar{x}_t) - f^*\right) + 2\eta_t \frac{L}{K} \sum_{k=1}^K E || \bar{x}_t - x_t^k ||^2$$
$${|| \bar{x_{t+1}} - x^* || }^2 = { || \bar{x_t} - \eta_t g_t - x^* || }^2$$
$$= {|| \bar{x_t} - \eta_t g_t - x^* + \eta_t \bar{g_t} - \eta_t\bar{g_t} ||}^2$$
$$= {|| \bar{x_t} - x^* - \eta_t \bar{g_t} ||}^2 + \eta_t^2 ||g_t - \bar{g_t}||^2 + 2 \eta_t \left( \bar{x_t} - x^* - \eta_t \bar{g_t}, \bar{g_t} - g_t \right)$$
$$f(y) \geq f(x) + <y-x, \nabla f(x) > + \frac{1}{2L} ||\nabla f(x) - \nabla f(y)||^2$$
$$f(y) - f(x) = f(z) - f(x) - (f(z) - f(y))$$
$$\leq <z-x, \nabla f(x)> - (< z-y, \nabla f(y) > + \frac{L}{2} ||z-y||^2)$$
$$= <y-x + z-y, \nabla f(x)> - (< z-y, \nabla f(y) > + \frac{L}{2} ||z-y||^2)$$
$$= <y-x, \nabla f(x)> + <z-y, \nabla f(x) - \nabla f(y)> - \frac{L}{2} ||z-y||_2^2$$
$$z = y- \frac{1}{L}(\nabla f(y) - \nabla f(x))$$
Continued proof of lemma 1
$$||\bar{x_t} - x^* -\eta_t \bar{g_t}||^2$$
$$= ||\bar{x_t} - x^* ||^2 +\eta_t^2 ||\bar{g_t}||^2 - 2\eta_t <\bar{x_t} - x^* ,\bar{g_t}> $$
$$= ||\bar{x_t} - x^* ||^2 +\eta_t^2 ||\bar{g_t}||^2 - 2\eta_t \frac{1}{K} \sum_{k=1}^{K} <\bar{x_t} - x^*, \nabla f(x_t^k)>$$
$$\leq ||\bar{x_t} - x^* ||^2 + \eta_t^2 \frac{1}{K} \sum_{k=1}^{K} ||\nabla f(x_t^k)||^2 - 2\eta_t \frac{1}{k} \sum_{k=1}^{K} <\bar{x_t} - x_t^k + x_k^t - x^* , \nabla f(x_t^k)>$$
$$||\sum_{i=1}^{K} a_i||^2 \leq ||\sum_{i=1}^{K} ||a_i||||^2 \text{Triangular Ineq}$$
$$\leq k\sum_{i=1}^{K} ||a_i||^2 \text{Cauchy Shwartz}$$
Continued proof of lemma 1
$$=||\bar{x_t} - x^* ||^2 + \eta_t^2 \frac{1}{K} \sum_{k=1}^{K} ||\nabla f(x_t^k) - \nabla f(x^* ) ||^2 - 2\eta_t \frac{1}{K} \sum_{k=1}^{K}<x_t^k -x^* , \nabla f(x_t^k)> -2\eta_t \frac{1}{K} \sum_{k=1}^{K} <\bar{x_t} - x_k^t, \nabla f(x_t^k)>$$
$$2< a, b > \leq \gamma ||a||^2 + \frac{1}{\gamma} ||b||^2 \Leftrightarrow ||\sqrt{ \gamma} a - \frac{1}{\sqrt{\gamma}} b ||^2$$
$$-<x_t^k - x^* , \nabla f(x_t^k)> \leq -(f(x_t^k) - f^* ) - \frac{\mu}{2} ||x_t^k - x^* ||^2$$
$$-2 <\bar{x_t} - x_k^t, \nabla f(x_t^k)> \leq 2L ||\bar{x_t} - x_k^t||^2 + \frac{1}{2L} ||\nabla f(x_t^k)||^2$$
$$= \leq 2L ||\bar{x_t} - x_k^t||^2 + \frac{1}{2L} ||\nabla f(x_t^k) - \nabla f(x^* )||^2$$
$$= \leq 2L ||\bar{x_t} - x_k^t||^2 + f(x_t^k) - f^* $$
$$||\bar{x_t} - x^* - \eta_t \bar{g_t}||^2 \leq ||\bar{x_t} - x^* ||^2 + 2\eta_t \frac{L}{K} \sum_{k=1}^{K} ||\bar{x_t} - x_t^k ||^2 + 2\eta_t \frac{1}{K} \sum_{k=1}^{K} ((\eta_t L - \frac{1}{2})(f(x_t^k)-f^* ) - \frac{\mu}{2} ||x_t^k - x^* ||^2$$
$$\eta_t L - \frac{1}{2} \leq -\frac{1}{4}$$
$$\leq -\frac{1}{2} (f(\frac{1}{K} \sum_{k=1}^{K} x_t^k) - f^* ) - \eta_t \mu ||\frac{1}{K} \sum_{k=1}{K} x_t^k - x^* ||^2$$
$$||\bar{x_t} - x^* -\eta_t \bar{g_t}||^2 \leq (1-\mu \eta_t) ||\bar{x_t} - x^* ||^2 - \frac{1}{2} \eta_t (f(\bar{x_t}) - f^* ) + 2\eta_t \frac{L}{K} \sum_{k=1}{K} ||\bar{x_t} - x_t^k||^2$$
$$E||\eta_t - \bar{g_t}|| ^2 = E||\frac{1}{K} \sum_{k=1}{K} (\nabla f_{i_t^k}(x_t^k)) - \nabla f(x_t^k)||^2$$
$$= \frac{1}{k^2} \sum_{k=1}^{K} E ||\nabla f_{i_t^k}(x_t^k) - \nabla f(x_t^k)||^2 \leq \frac{6^2}{k}$$
$$\text{for }x_1, ..., x_k \text{, }Var(\sum_{k=1}^{K} x_k) = \sum Var(x_k)$$