Given a function $f : \mathbb{R}^n \to \mathbb{R}$, the gradient of $f$ is the vector of first partial derivatives: $\bigtriangledown f(x) = \left[\frac{\partial f}{\partial x_i}\right]_{i = 1 ... n}$.
The Hessian of $f$ is the matrix of second derivatives $\bigtriangledown^2 f(x) = \left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]_{i = 1 ... n, j = 1 ... n}$, which is symmetric whenever the second partial derivatives are continuous.
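As a rough illustration of these two definitions, the sketch below estimates the gradient and the Hessian by central finite differences. The test function `f`, the helper names `gradient` and `hessian`, and the step sizes are all assumptions chosen for this example, not part of the notes.

```python
def f(x):
    # Hypothetical test function: f(x) = x0^2 + 3*x0*x1 + 2*x1^2
    return x[0] ** 2 + 3.0 * x[0] * x[1] + 2.0 * x[1] ** 2


def gradient(func, x, h=1e-5):
    """Central-difference estimate of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2.0 * h))
    return grad


def hessian(func, x, h=1e-4):
    """Central-difference estimate of the Hessian of func at x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate func after moving coordinate i by di and j by dj.
                p = list(x)
                p[i] += di
                p[j] += dj
                return func(p)

            hess[i][j] = (
                shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)
            ) / (4.0 * h * h)
    return hess


x0 = [1.0, -2.0]
g = gradient(f, x0)  # analytic gradient: [2*x0 + 3*x1, 3*x0 + 4*x1] = [-4, -5]
H = hessian(f, x0)   # analytic Hessian: [[2, 3], [3, 4]]
```

For this quadratic the estimates agree with the analytic values up to rounding error, and the off-diagonal entries of `H` match each other, as the symmetry of the Hessian predicts.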
Lipschitz
A function $f$ is $c$-Lipschitz (w.r.t. some norm $‖\cdot‖$) if: $$\boxed{\forall x,y \in \text{dom}(f) : |f(x) - f(y)| \leq c‖x - y‖}$$
Fact: If $f$ is differentiable, then: $f$ is $c$-Lipschitz $\Longleftrightarrow ‖\bigtriangledown f(x)‖_* \leq c$ for all $x \in \text{dom}(f)$, where $‖\cdot‖_*$ is the dual norm of $‖\cdot‖$.
Proof Sketch ($\Longleftarrow$):
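Using the Mean Value Theorem, take $z$ on the line segment between $x$ and $y$ (a convex combination of the two) so that $f(x) - f(y) = \bigtriangledown f(z)^T(x - y)$. By the definition of the dual norm (Hölder's inequality), $\bigtriangledown f(z)^T(x - y) \leq ‖\bigtriangledown f(z)‖_* ‖x - y‖ \leq c‖x - y‖$.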
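A small numerical sanity check of the fact above, under assumptions of our own choosing: take $f$ to be log-sum-exp and the norm to be the $\infty$-norm, whose dual is the 1-norm. The gradient of log-sum-exp is the softmax vector, which has 1-norm equal to 1, so $f$ should be 1-Lipschitz w.r.t. the $\infty$-norm. The function names, sampling range, and trial count are illustrative only.

```python
import math
import random


def logsumexp(x):
    """Numerically stable log(sum(exp(x_i)))."""
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))


def softmax(x):
    """Gradient of logsumexp at x."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def worst_ratio(trials=1000, n=5, seed=0):
    """Largest observed |f(x) - f(y)| / ||x - y||_inf over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        y = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        dist = max(abs(a - b) for a, b in zip(x, y))
        if dist > 0.0:
            worst = max(worst, abs(logsumexp(x) - logsumexp(y)) / dist)
    return worst


# Dual (1-)norm of the gradient at an arbitrary point; it equals 1 everywhere.
grad_norm = sum(abs(v) for v in softmax([0.5, -1.0, 2.0]))
ratio = worst_ratio()
```

The observed ratio never exceeds the bound $c = 1$ given by the dual norm of the gradient. A random search like this can only fail to find a counterexample; it does not prove the bound.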
For every random variable $X$ taking values in the domain of a convex function $f$: $$\boxed{f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)]}$$
This inequality is called Jensen's Inequality ("most likely the most important inequality in learning theory").
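Jensen's Inequality can be checked numerically on an empirical distribution, where expectations become sample averages. The sketch below assumes $f = \exp$ and standard normal samples; both choices, and the sample size, are illustrative.

```python
import math
import random

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]


def mean(values):
    return sum(values) / len(values)


f = math.exp  # a convex function on the real line

lhs = f(mean(samples))                # f(E[X]) under the empirical distribution
rhs = mean([f(x) for x in samples])   # E[f(X)] under the empirical distribution
gap = rhs - lhs                       # Jensen's Inequality says this is nonnegative
```

Because the empirical distribution is itself a probability distribution, the inequality holds for the sample averages exactly, not just in the limit of many samples.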
Definitions of Convexity
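Definition 1: A function $f$ is convex if the domain of $f$ is convex and $\forall x, y \in \text{dom}(f): f\left(\frac{x + y}{2}\right) \leq \frac{f(x) + f(y)}{2}$ (for continuous $f$, this midpoint condition is equivalent to the usual requirement $f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y)$ for all $\lambda \in [0, 1]$).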
Definition 2: If $f$ is differentiable, then $f$ is convex $\Leftrightarrow \forall x, y \in \text{dom}(f): f(y) \geq f(x) + \bigtriangledown f(x)^T(y - x)$
Definition 3: If $f$ is twice differentiable, then $f$ is convex $\Leftrightarrow \bigtriangledown ^2 f(x) \succeq 0, \forall x \in \text{dom}(f)$
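The three definitions can be checked side by side on a simple example. The sketch below assumes the quadratic $f(x) = x_1^2 + x_1 x_2 + x_2^2$, whose Hessian is constant; the function, the sampling range, and the tolerance are hypothetical choices made for illustration.

```python
import random


def f(x):
    # Hypothetical convex quadratic: f(x) = x0^2 + x0*x1 + x1^2
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2


def grad(x):
    return [2.0 * x[0] + x[1], x[0] + 2.0 * x[1]]


HESSIAN = [[2.0, 1.0], [1.0, 2.0]]  # constant because f is quadratic


def check_definitions(trials=1000, seed=0, tol=1e-9):
    rng = random.Random(seed)
    midpoint_ok = True
    first_order_ok = True
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
        y = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]

        # Definition 1: the value at the midpoint is at most the average value.
        mid = [(a + b) / 2.0 for a, b in zip(x, y)]
        midpoint_ok = midpoint_ok and f(mid) <= (f(x) + f(y)) / 2.0 + tol

        # Definition 2: f lies above its tangent plane at x.
        linear = f(x) + sum(g * (b - a) for g, a, b in zip(grad(x), x, y))
        first_order_ok = first_order_ok and f(y) >= linear - tol

    # Definition 3: a symmetric 2x2 matrix is PSD iff its trace and
    # determinant are both nonnegative.
    a, b, d = HESSIAN[0][0], HESSIAN[0][1], HESSIAN[1][1]
    hessian_psd = (a + d >= 0.0) and (a * d - b * b >= 0.0)
    return midpoint_ok, first_order_ok, hessian_psd


midpoint_ok, first_order_ok, hessian_psd = check_definitions()
```

All three checks pass for this function. The first two are random spot checks over sampled pairs, while the Hessian test is exact here because the Hessian does not depend on $x$.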
Smoothness and Strong Convexity
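Let $f$ be any differentiable function. Let $‖ \cdot ‖$ be any norm. Let $\mu > 0.$ In the standard definitions, $f$ is $\mu$-strongly convex if $\forall x, y \in \text{dom}(f): f(y) \geq f(x) + \bigtriangledown f(x)^T(y - x) + \frac{\mu}{2}‖y - x‖^2$, and $f$ is $\mu$-smooth if $\forall x, y \in \text{dom}(f): f(y) \leq f(x) + \bigtriangledown f(x)^T(y - x) + \frac{\mu}{2}‖y - x‖^2$.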
If $‖ \cdot ‖$ is the 2-norm and $f$ is twice differentiable, then: $f$ is $\mu$-strongly convex $\Leftrightarrow \bigtriangledown^2 f(x) \succeq \mu I$ for all $x \in \text{dom}(f)$ (note that this condition is equivalent to $\bigtriangledown^2 f(x) - \mu I \succeq 0$), and similarly, $f$ is $\mu$-smooth $\Leftrightarrow \bigtriangledown^2 f(x) \preceq \mu I$ for all $x \in \text{dom}(f)$.
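For a quadratic $f(x) = \frac{1}{2} x^T A x$ the Hessian is the constant matrix $A$, so under the 2-norm the best strong-convexity and smoothness constants are the smallest and largest eigenvalues of $A$. The sketch below assumes a hypothetical $2 \times 2$ matrix `A` and the usual quadratic-bound definitions (lower bound $\frac{\mu}{2}‖y - x‖^2$ for strong convexity, the same form as an upper bound for smoothness); the names `mu_strong` and `mu_smooth` are ours.

```python
import math
import random

A = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical symmetric matrix; f(x) = 0.5 * x^T A x


def f(x):
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))


def grad(x):
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]


def eigenvalues_2x2(m):
    """Closed-form (smallest, largest) eigenvalues of a symmetric 2x2 matrix."""
    half_trace = (m[0][0] + m[1][1]) / 2.0
    radius = math.hypot((m[0][0] - m[1][1]) / 2.0, m[0][1])
    return half_trace - radius, half_trace + radius


# The Hessian of f is A, so mu_strong * I <= A <= mu_smooth * I.
mu_strong, mu_smooth = eigenvalues_2x2(A)


def bounds_hold(trials=1000, seed=0, tol=1e-9):
    """Spot-check the quadratic lower and upper bounds on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
        y = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
        d = [b - a for a, b in zip(x, y)]
        sq_dist = sum(v * v for v in d)
        linear = f(x) + sum(g * v for g, v in zip(grad(x), d))
        lower = linear + 0.5 * mu_strong * sq_dist
        upper = linear + 0.5 * mu_smooth * sq_dist
        if not (lower - tol <= f(y) <= upper + tol):
            return False
    return True


ok = bounds_hold()
```

With this `A` the eigenvalues are 1 and 3, so the example function is 1-strongly convex and 3-smooth with respect to the 2-norm.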
Homework
Let $p$ lie in the probability simplex $\Delta_N$ and have all entries greater than 0. That is,
$$\sum_{i=1}^N p_i = 1 \text{ and } p_i > 0 \text{ for } i=1,\ldots,N.$$
(Not Required, done in class) Prove using Hölder's Inequality that $$\sum_{i=1}^N \frac{1}{{p_i}^{q-1}} \geq N^q \text{ for any } q > 1$$
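Before attempting the proof, it can help to test the claimed inequality numerically. The sketch below samples random points of the simplex with strictly positive entries and random exponents $q > 1$; the sampling scheme, the value $N = 6$, and the helper names are assumptions made for this check, and passing it is evidence rather than a proof.

```python
import random


def lhs(p, q):
    """Left-hand side: sum over i of 1 / p_i^(q - 1)."""
    return sum(1.0 / (p_i ** (q - 1.0)) for p_i in p)


def random_simplex_point(n, rng):
    """Random probability vector with strictly positive entries."""
    raw = [rng.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(raw)
    return [v / total for v in raw]


def inequality_holds(trials=1000, n=6, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        p = random_simplex_point(n, rng)
        q = rng.uniform(1.01, 4.0)
        # Small relative slack guards against floating-point rounding.
        if lhs(p, q) < (n ** q) * (1.0 - 1e-12):
            return False
    return True


holds = inequality_holds()

# The uniform distribution attains equality: N * N^(q - 1) = N^q.
uniform = [1.0 / 6.0] * 6
uniform_value = lhs(uniform, 2.0)  # compare with 6^2 = 36
```

The uniform distribution gives equality, which suggests that the bound is tight and that the proof should not lose anything at $p_i = 1/N$.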