CS7545_Sp24_Lecture_03

CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Jacob Abernethy

Notes for Lecture 03 - Convex Analysis

January 16, 2024

Scribes: Elias Izmirlian, Carlos Sanchez Vargas

Announcements

Exams have been moved one week later:

  • Exam 1: 1/30 -> 2/6
  • Exam 2: 2/27 -> 3/5
  • Exam 3: 4/2 -> 4/9

Homeworks are due 4-7 days before each exam; bonus points are available for submitting homeworks early.

Exam content will largely overlap with homework content ("in the same zone as homework problems").

Last Time

Dual Norms
Given a norm $‖⋅‖$ on $\mathbb{R}^n$, the dual norm $‖⋅‖_∗$ is defined as $$‖ \vec{y} ‖_∗ = \sup\limits_{x \in \mathbb{R}^n, ‖x‖ \leq 1} \langle \vec{x},\vec{y} \rangle$$ Equivalently, $$‖ \vec{y} ‖_∗ = \sup\limits_{x \in \mathbb{R}^n, x \neq 0} \frac{\langle \vec{x},\vec{y} \rangle}{‖x‖}$$ The $p$-norm is dual to the $q$-norm iff $\frac{1}{p} + \frac{1}{q} = 1$.
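
As a quick numerical sanity check (a sketch, not from the lecture): for a randomly chosen $y$ and the arbitrary choice $p = 3$, the supremum of $\langle x, y \rangle$ over the unit $p$-ball should equal $‖y‖_q$ with $q = p/(p-1)$.

```python
import numpy as np

# Sketch: numerically check that the dual of the p-norm is the q-norm with
# 1/p + 1/q = 1.  The vector y and the choice p = 3 are arbitrary examples.
rng = np.random.default_rng(0)
y = rng.normal(size=5)
p = 3.0
q = p / (p - 1)                                    # conjugate exponent

# Random search over the unit p-sphere lower-bounds sup <x, y>.
xs = rng.normal(size=(100_000, 5))
xs /= np.linalg.norm(xs, ord=p, axis=1, keepdims=True)
lower = (xs @ y).max()

# The maximizer x_i proportional to sign(y_i)|y_i|^{q-1} attains <x, y> = ||y||_q.
x_star = np.sign(y) * np.abs(y) ** (q - 1) / np.linalg.norm(y, ord=q) ** (q - 1)
print(np.linalg.norm(x_star, ord=p))                # approximately 1
print(lower, x_star @ y, np.linalg.norm(y, ord=q))  # lower <= last two, which agree
```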

Young's Inequality
Let $p, q \in (1, \infty), \frac{1}{p} + \frac{1}{q} = 1$. Then, $$\forall a, b \in \mathbb{R}_{\geq 0}, ab \leq \frac{1}{p}a^p + \frac{1}{q}b^q$$
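
A tiny numerical spot check of Young's Inequality (the sampled $a$, $b$, and exponents are arbitrary choices, not from the lecture):

```python
import numpy as np

# Spot-check Young's inequality  a*b <= a^p/p + b^q/q  on random nonnegative
# a, b and random conjugate exponents (all sampled values are arbitrary).
rng = np.random.default_rng(1)
for _ in range(5):
    p = 1.0 + 4.0 * rng.random()       # p in (1, 5)
    q = p / (p - 1)                    # 1/p + 1/q = 1
    a, b = rng.random(2) * 10
    assert a * b <= a**p / p + b**q / q + 1e-12
print("Young's inequality held on all sampled cases")
```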

Hölder's Inequality

Let $p, q \in (1, \infty)$ be dual exponents, i.e. $\frac{1}{p} + \frac{1}{q} = 1$. Then, $$\forall x,y \in \mathbb{R}^n : \langle x, y \rangle \leq ‖x‖_p ‖y‖_q$$ Proof (for $x, y \neq 0$; otherwise the inequality is immediate): $$\frac{\langle x, y \rangle}{‖x‖_p ‖y‖_q} = \left\langle \frac{x}{‖x‖_p}, \frac{y}{‖y‖_q} \right\rangle \leq \sum\limits_{i = 1}^{n} \frac{|x_i|}{‖x‖_p} \frac{|y_i|}{‖y‖_q} \leq \sum\limits_{i = 1}^{n} \left(\frac{1}{p} \frac{|x_i|^p}{‖x‖_p^p} + \frac{1}{q} \frac{|y_i|^q}{‖y‖_q^q}\right)$$ where the last step applies Young's Inequality term by term. The right-hand side equals $$\frac{1}{p} \frac{‖x‖_p^p}{‖x‖_p^p} + \frac{1}{q} \frac{‖y‖_q^q}{‖y‖_q^q} = \frac{1}{p} + \frac{1}{q} = 1$$ Hence $$\frac{\langle x, y \rangle}{‖x‖_p ‖y‖_q} \leq 1 \;\Rightarrow\; \langle x, y \rangle \leq ‖x‖_p ‖y‖_q$$

Restated:
Theorem (Hölder):
Let $‖\cdot‖, ‖\cdot‖_{*}$ be a "dual pair", then:

$$\forall x, y \in \mathbb{R}^n : \langle x, y \rangle \leq ‖x‖ ‖y‖_*$$
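
The following sketch spot-checks Hölder's Inequality on random data; the dimension and the exponent $p = 1.5$ are arbitrary choices, not from the lecture.

```python
import numpy as np

# Spot-check Hölder's inequality  <x, y> <= ||x||_p * ||y||_q  on random data.
rng = np.random.default_rng(2)
p = 1.5
q = p / (p - 1)                        # q = 3, the conjugate exponent
x, y = rng.normal(size=(2, 8))
lhs = x @ y
rhs = np.linalg.norm(x, ord=p) * np.linalg.norm(y, ord=q)
print(lhs <= rhs)                      # True
```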

Convex Analysis

Definition: A set $U \subseteq \mathbb{R}^n$ is convex if $\forall x, y \in U : \frac{x + y}{2} \in U$
Alternate definition: $\forall x, y \in U, \forall \alpha \in [0, 1]: \alpha x + (1 - \alpha)y \in U$
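
For intuition, a small numerical illustration (my own example, not from class): the unit Euclidean ball is convex, so any convex combination of two points in the ball stays in the ball.

```python
import numpy as np

# The unit 2-ball {x : ||x||_2 <= 1} is convex: any convex combination of two
# points in the ball stays in the ball (random spot check, not a proof).
rng = np.random.default_rng(3)
x, y = rng.normal(size=(2, 4))
x /= max(1.0, np.linalg.norm(x))       # force both points into the ball
y /= max(1.0, np.linalg.norm(y))
for alpha in rng.random(5):
    z = alpha * x + (1 - alpha) * y
    assert np.linalg.norm(z) <= 1 + 1e-12
print("all sampled convex combinations stayed in the ball")
```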

Calculus Definitions

Given a differentiable function $f$, the gradient is $\nabla f(x) = \left[\frac{\partial f}{\partial x_i}\right]_{i = 1, \ldots, n}$

The Hessian of $f$ is the matrix of second partial derivatives $\nabla^2 f(x) = \left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]_{i, j = 1, \ldots, n}$, which is symmetric.
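
A minimal sketch of these definitions, assuming the example function $f(x) = ‖x‖_2^2$ (my own choice, not from the lecture), whose gradient is $2x$ and whose Hessian is $2I$; central differences recover both.

```python
import numpy as np

# Central-difference check of the gradient and Hessian of f(x) = ||x||_2^2,
# whose closed forms are grad f(x) = 2x and Hess f(x) = 2I (my own example).
def f(x):
    return float(x @ x)

def num_grad(f, x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hess(f, x, h=1e-3):
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

x0 = np.array([1.0, -2.0, 0.5])
print(np.allclose(num_grad(f, x0), 2 * x0, atol=1e-6))        # gradient matches 2x
print(np.allclose(num_hess(f, x0), 2 * np.eye(3), atol=1e-6)) # Hessian matches 2I, symmetric
```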

Lipschitz

Definition: A function $f$ is $c$-Lipschitz (w.r.t. some norm $‖\cdot‖$) if: $$\forall x,y \in \text{dom}(f) : |f(x) - f(y)| \leq c‖x - y‖$$ Fact: If $f$ is differentiable, then:
$f$ is $c$-Lipschitz $\Longleftrightarrow ‖\nabla f(x)‖_* \leq c$ for all $x \in \text{dom}(f)$

Proof Sketch ($\Longleftarrow$):
By the Mean Value Theorem, there exists $z$ on the segment between $x$ and $y$ (a convex combination) such that $f(x) - f(y) = \nabla f(z)^T(x - y)$; then by Hölder, $\nabla f(z)^T(x - y) \leq ‖\nabla f(z)‖_* ‖x - y‖ \leq c ‖x - y‖$
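
To illustrate the fact above (with an example of my own choosing): $f(x) = \langle a, x \rangle$ has gradient $a$, so it should be $‖a‖_\infty$-Lipschitz w.r.t. the 1-norm, since the $\infty$-norm is the dual of the 1-norm.

```python
import numpy as np

# f(x) = <a, x> has gradient a, so by the fact above it should be c-Lipschitz
# w.r.t. the 1-norm with c = ||a||_inf (the dual norm).  Random spot check.
rng = np.random.default_rng(4)
a = rng.normal(size=6)
c = np.linalg.norm(a, ord=np.inf)
for _ in range(5):
    x, y = rng.normal(size=(2, 6))
    assert abs(a @ x - a @ y) <= c * np.linalg.norm(x - y, ord=1) + 1e-12
print("Lipschitz bound held on all sampled pairs")
```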

Jensen's Inequality
For any random variable $X$ taking values in the domain of a convex function $f$: $$f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]$$ This inequality is called Jensen's Inequality ("most likely the most important inequality in learning theory").
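
A quick Monte Carlo illustration, assuming the convex function $f(x) = x^2$ and $X \sim \text{Uniform}[0,1]$ (both my own choices): $f(\mathbb{E}[X]) = 1/4$ while $\mathbb{E}[f(X)] = 1/3$.

```python
import numpy as np

# Monte Carlo illustration of Jensen's inequality with f(x) = x^2 (convex)
# and X ~ Uniform[0, 1]: f(E[X]) = 1/4 while E[f(X)] = 1/3.
rng = np.random.default_rng(5)
X = rng.random(1_000_000)
print(np.mean(X) ** 2)    # f(E[X]), approximately 0.25
print(np.mean(X ** 2))    # E[f(X)], approximately 0.333, which is larger
```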

Definitions of Convexity
Definition 1: A function $f$ is convex if the domain of $f$ is convex and $\forall x, y \in \text{dom}(f): f(\frac{x + y}{2}) \leq \frac{f(x) + f(y)}{2}$

Definition 2: If $f$ is differentiable, then $f$ is convex $\Leftrightarrow \forall x, y \in \text{dom}(f): f(y) \geq f(x) + \nabla f(x)^T(y - x)$

Definition 3: If $f$ is twice differentiable, then $f$ is convex $\Leftrightarrow \nabla^2 f(x) \succeq 0, \forall x \in \text{dom}(f)$
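
As a concrete sketch (an example of my own choosing): the softplus function $f(x) = \log(1 + e^x)$ has second derivative $\sigma(x)(1 - \sigma(x)) \geq 0$, so it is convex by Definition 3; the snippet below also spot-checks Definition 2's first-order condition on random pairs.

```python
import numpy as np

# Softplus f(x) = log(1 + e^x) is convex: f''(x) = sigma(x)(1 - sigma(x)) >= 0,
# and the first-order condition f(y) >= f(x) + f'(x)(y - x) holds (spot check).
rng = np.random.default_rng(6)
f = lambda x: np.log1p(np.exp(x))
fprime = lambda x: 1.0 / (1.0 + np.exp(-x))        # f'(x) = sigma(x)
for _ in range(5):
    x, y = rng.normal(size=2) * 3
    assert f(y) >= f(x) + fprime(x) * (y - x) - 1e-12
print("first-order convexity condition held on all sampled pairs")
```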

Smoothness and Strong Convexity

Let $f$ be any differentiable function. Let $‖ \cdot ‖$ be any norm. Let $\mu > 0$. $f$ is $\mu$-smooth if: $$\forall x, y \in \text{dom}(f): f(y) \leq f(x) + \nabla f(x)^T(y - x) + \frac{\mu}{2}‖y - x‖^2$$ $f$ is $\mu$-strongly convex if: $$\forall x,y \in \text{dom}(f): f(y) \geq f(x) + \nabla f(x)^T (y - x) + \frac{\mu}{2}‖y - x‖^2$$ If $‖ \cdot ‖$ is the 2-norm, then:
$f$ is $\mu$-strongly convex $\Leftrightarrow \nabla^2 f(x) \succeq \mu I$ for all $x \in \text{dom}(f)$ (note that $\nabla^2 f(x) \succeq \mu I$ is equivalent to $\nabla^2 f(x) - \mu I \succeq 0$),

and similarly, $f$ is $\mu$-smooth $\Leftrightarrow \nabla^2 f(x) \preceq \mu I$ for all $x \in \text{dom}(f)$.
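
A small numerical sketch (the quadratic below is my own example, not from the lecture): for $f(x) = \frac{1}{2}x^T A x$ with $A$ symmetric positive definite, the Hessian is $A$, so $f$ should be $\lambda_{\min}(A)$-strongly convex and $\lambda_{\max}(A)$-smooth w.r.t. the 2-norm.

```python
import numpy as np

# For f(x) = 0.5 x^T A x with A symmetric positive definite, the Hessian is A,
# so f should be lambda_min(A)-strongly convex and lambda_max(A)-smooth (2-norm).
rng = np.random.default_rng(7)
B = rng.normal(size=(4, 4))
A = B @ B.T + 0.1 * np.eye(4)                      # random symmetric PD matrix
evals = np.linalg.eigvalsh(A)
lam_min, lam_max = evals[0], evals[-1]
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
for _ in range(5):
    x, y = rng.normal(size=(2, 4))
    gap = f(y) - f(x) - grad(x) @ (y - x)          # first-order "gap" at (x, y)
    half_sq = 0.5 * np.linalg.norm(y - x) ** 2
    assert lam_min * half_sq - 1e-9 <= gap <= lam_max * half_sq + 1e-9
print("strong-convexity and smoothness bounds held on all sampled pairs")
```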

Homework

  1. Let $p$ lie in the probability simplex $\Delta_N$ and have all entries greater than 0. That is, $$\sum_{i=1}^N p_i = 1 \text{ and } p_i > 0 \text{ for } i=1,\ldots,N.$$
  • (Not Required, done in class) Prove using Hölder's Inequality that $$\sum_{i=1}^N \frac{1}{{p_i}^{q-1}} \geq N^q \text{ for any } q > 1$$
  • Prove that $$\sum_{i=1}^N \left(p_i + \frac 1 {p_i} \right)^2 \geq N^3 + 2N + 1/N$$
  2. Convexity:
  • Show that the following function is convex on the domain $p \in \Delta_N$: $$f(p) = \sum_{i=1}^N p_i \log p_i.$$
  • Show that if a differentiable function $g$ is convex then it satisfies the "gradient monotonicity property":

$$ (\nabla g(x) - \nabla g(y))^T(x - y) \geq 0 \text{ for all } x,y \in \text{dom}(g)$$
