CS7545_Sp24_Lecture_04: Convex Analysis

CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Jacob Abernethy

Notes for Lecture 04

January 18, 2024

Scribes: Siddharth M. Sundaram, Ricardo J. Velasquez

Announcements

  • Tuesday ($\text{23}^{\text{rd}}$ January 2024) may be virtual as Prof. Jake may be out of town.

Homework Problem (discussed in class)

Problem:

Use Hölder's Inequality to show:

$$ N^r \le \sum_{i=1}^{N} \frac{1}{p_i^{r - 1}} $$

where $r > 1$ and $p \in \Delta_N$ with $p_i > 0$ for $i = 1,\dots, N$, and where

$$\Delta_N := \{ v \in \mathbb{R}^{N}: \sum_{i=1}^{N} v_i = 1, v_i \ge 0 \text{ for } i = 1, \dots, N \}$$

Recall Hölder's Inequality: for $x, y \in \mathbb{R}^N$, $x^Ty = \langle x, y\rangle \le \lVert x \rVert_a \lVert y \rVert_b$, where $a, b \ge 1$ satisfy $\frac{1}{a} + \frac{1}{b} = 1$.
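As a quick sanity check (not part of the lecture), the short Python sketch below draws random vectors and verifies Hölder's Inequality numerically for a few conjugate pairs $(a, b)$; the use of NumPy here is my own choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def holder_gap(x, y, a):
    """Return ||x||_a * ||y||_b - <x, y> for the conjugate exponent b = a / (a - 1)."""
    b = a / (a - 1.0)
    lhs = x @ y
    rhs = np.linalg.norm(x, ord=a) * np.linalg.norm(y, ord=b)
    return rhs - lhs

for a in [1.5, 2.0, 3.0]:
    x, y = rng.normal(size=10), rng.normal(size=10)
    assert holder_gap(x, y, a) >= 0.0   # Hölder: <x, y> <= ||x||_a ||y||_b
```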

Solution:

Consider $x, y \in \mathbb{R}^N$ with $x_i = \frac{1}{p_i^{(r-1)/r}}$ and $y_i = p_i^{(r-1)/r}$ for $i=1,\dots,N$. Note that the entries of $x$ and $y$ are reciprocals of one another, so the inner product is $\langle x, y\rangle = N$. Let $q$ be the conjugate exponent of $r$, i.e. $\frac{1}{r} + \frac{1}{q} = 1$. Applying Hölder's Inequality with $x$, $y$, $\lVert \cdot \rVert_r$, and $\lVert \cdot \rVert_q$:

$$\langle x, y \rangle \le \lVert x \rVert_r \lVert y \rVert_q $$

$$\iff N \le \left(\sum_{i=1}^{N}\left(\frac{1}{p_i^{(r-1)/r}}\right)^{r}\right)^{1/r} \left(\sum_{i=1}^{N}\left(p_i^{(r-1)/r}\right)^{q}\right)^{1/q}$$

$$\iff N \le \left(\sum_{i=1}^{N}\frac{1}{p_i^{r-1}}\right)^{1/r} \left(\sum_{i=1}^{N}\left(p_i^{1/q}\right)^{q}\right)^{1/q}$$

$$\iff N \le \left(\sum_{i=1}^{N}\frac{1}{p_i^{r-1}}\right)^{1/r} \left(\sum_{i=1}^{N}p_i\right)^{1/q}$$

$$\iff N \le \left(\sum_{i=1}^{N}\frac{1}{p_i^{r-1}}\right)^{1/r} \Rightarrow \boxed{N^r \le \sum_{i=1}^{N}\frac{1}{p_i^{r-1}}} $$

In the above, we use the fact that $\frac{r - 1}{r} = 1 - \frac{1}{r} = \frac{1}{q}$ and that $\sum\limits_{i=1}^{N}p_i = 1$; the boxed result follows by raising both sides to the $r$-th power.
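For readers who want to verify the boxed inequality numerically, here is a minimal Python sketch (my own addition, assuming NumPy) that checks it on a random point of the simplex:

```python
import numpy as np

rng = np.random.default_rng(0)
N, r = 8, 2.5

p = rng.random(N)
p /= p.sum()                      # p lies in the simplex Delta_N with p_i > 0

lhs = N ** r
rhs = np.sum(1.0 / p ** (r - 1))  # sum_i 1 / p_i^{r-1}
assert lhs <= rhs
```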

Convex Analysis (continued)

Fenchel Conjugates

Definition 4.1 (Concave Function)

A function $f$ is concave if and only if $-f$ is convex.

Definition 4.2 (Fenchel Conjugate)

Let $f$ be a convex function on a subset of $\mathbb{R}^n$. The Fenchel conjugate of $f$ is defined as $$f^*(\theta) = \sup_{x\in \text{dom}(f)}(\langle x, \theta \rangle - f(x))$$

Example:

Let $M \in \mathbb{R}^{n \times n}$ be a symmetric positive definite matrix. Define $f(x) = \frac{1}{2} x^TMx$. Then the Fenchel conjugate of $f$ is $f^*(\theta) = \frac{1}{2} \theta^TM^{-1}\theta$.

Proof:

By definition, we have $$f^*(\theta) = \sup_{x\in \text{dom}(f)}\left(x^T\theta-\frac{1}{2} x^TMx\right)$$ Since $x^T\theta-\frac{1}{2} x^TMx$ is concave and differentiable, we can obtain the supremum by taking the gradient and setting it to zero: $$\nabla_x \left(x^T\theta-\frac{1}{2} x^TMx\right) = 0$$ $$\iff \nabla_x (x^T\theta)-\nabla_x \left(\frac{1}{2} x^TMx\right) = 0$$ $$\iff \theta-Mx = 0$$ $$\iff x = M^{-1}\theta$$ Having obtained the value at which the objective achieves its maximum, we can plug it back in to compute the Fenchel conjugate: $$f^*(\theta) = (M^{-1}\theta)^T\theta-\frac{1}{2} (M^{-1}\theta)^TM(M^{-1}\theta) = \theta^TM^{-1}\theta - \frac{1}{2}\theta^TM^{-1}\theta = \frac{1}{2} \theta^TM^{-1}\theta$$
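To make the example concrete, the sketch below (my own addition; it assumes NumPy and SciPy are available) compares the closed form $\frac{1}{2}\theta^T M^{-1}\theta$ against a direct numerical maximization of $x^T\theta - \frac{1}{2}x^TMx$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 4

A = rng.normal(size=(n, n))
M = A @ A.T + n * np.eye(n)        # symmetric positive definite
theta = rng.normal(size=n)

# f*(theta) = sup_x  x^T theta - (1/2) x^T M x; maximize by minimizing the negation
obj = lambda x: -(x @ theta - 0.5 * x @ M @ x)
numeric = -minimize(obj, np.zeros(n)).fun

closed_form = 0.5 * theta @ np.linalg.solve(M, theta)   # (1/2) theta^T M^{-1} theta
assert np.isclose(numeric, closed_form, rtol=1e-4)
```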

Remarks:

  • To see that the function $x^T\theta-\frac{1}{2} x^TMx$ is concave, it suffices to note that the Hessian of its negation is $\nabla_x^2(-x^T\theta+\frac{1}{2} x^TMx) = \nabla_x^2(-x^T\theta)+\nabla_x^2(\frac{1}{2} x^TMx)=M$. Since we assumed in the proof above that $M$ is symmetric positive definite, $-x^T\theta+\frac{1}{2} x^TMx$ is convex, which implies that $x^T\theta-\frac{1}{2} x^TMx$ is concave.
  • Recall that the inverse of a symmetric matrix is symmetric. Thus, $(M^{-1})^{T} = M^{-1}$.
  • Every symmetric positive definite matrix $M$ has an eigendecomposition, i.e. there exists an orthogonal matrix $U$ such that $M = U^T\Sigma U$, where $\Sigma$ is a diagonal matrix containing the eigenvalues of $M$. It follows that $M^{-1} = U^T\Sigma^{-1}U$, where $\Sigma^{-1}$ contains the multiplicative inverses of the eigenvalues of $M$.
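The last two remarks are easy to check numerically; here is a minimal sketch (my own addition, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = A @ A.T + 4 * np.eye(4)             # symmetric positive definite

evals, V = np.linalg.eigh(M)            # M = V diag(evals) V^T; in the notes' notation U = V^T
M_inv = V @ np.diag(1.0 / evals) @ V.T  # invert by inverting the eigenvalues

assert np.allclose(M_inv, np.linalg.inv(M))
assert np.allclose(M_inv, M_inv.T)      # inverse of a symmetric matrix is symmetric
```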

Proposition:

For any fixed $\theta$, $f(x)$ is convex if and only if $f(x) + \langle x,\theta \rangle$ is convex.

Facts about Fenchel Conjugates:

  • If $f$ is a closed convex function, then $(f^*)^* = f$
  • $\text{dom}(f^*) = \{ \nabla f(x) : x \in \text{dom}(f)\}$
  • If $f$ is smooth and strictly convex, then the map $x \mapsto \nabla f(x)$ is injective. In this case, $\nabla f^*$ is the inverse map of $\nabla f$. In other words, for all $x \in \text{dom}(f)$, $\nabla f^*(\nabla f(x)) = x$ and for all $\theta \in \text{dom}(f^*)$, $\nabla f(\nabla f^*(\theta))=\theta$.
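The last fact can be illustrated on the quadratic example above, where $\nabla f(x) = Mx$ and $\nabla f^*(\theta) = M^{-1}\theta$; the following sketch (my own addition, assuming NumPy) checks that the two gradient maps invert one another:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = A @ A.T + 4 * np.eye(4)                    # symmetric positive definite

grad_f = lambda x: M @ x                       # gradient of f(x) = (1/2) x^T M x
grad_f_star = lambda t: np.linalg.solve(M, t)  # gradient of f*(theta) = (1/2) theta^T M^{-1} theta

x = rng.normal(size=4)
theta = rng.normal(size=4)
assert np.allclose(grad_f_star(grad_f(x)), x)          # grad f* inverts grad f
assert np.allclose(grad_f(grad_f_star(theta)), theta)  # and vice versa
```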

Bregman Divergence

Definition 4.3 (Bregman Divergence)

Let $f$ be a differentiable convex function. The Bregman divergence of $f$ is defined as $$D_f: \text{dom}(f) \times \text{dom}(f) \rightarrow \mathbb{R}$$ $$D_f(x,y) = f(x) - f(y) - \nabla f(y)^T(x-y)$$

Intuition:

Observe that we can rewrite the Bregman divergence as $f(x) - (f(y) + \nabla f(y)^T(x-y))$. In other words, the Bregman divergence is the difference at $x$ between the value of $f$ and its first-order (linear) approximation taken at $y$. Equivalently, $D_f$ measures how "bad" the linear approximation of $f$ taken at $y$ is when evaluated at $x$.

Facts about the Bregman Divergence:

  • $D_f$ is always non-negative as a consequence of the convexity of $f$.
  • $D_f$ is not necessarily symmetric.

Examples:

  • Let $f(x) = \frac{1}{2} \lVert x \rVert_2^2$. Then, $D_f (x,y) = \frac{1}{2} \lVert x-y \rVert_2^2$
    • $D_f$ is symmetric in this case.
  • Let $f(p) = \sum\limits_{i=1}^n p_i \log(p_i)$ (negative entropy). Then, for $p, q \in \Delta_n$, $D_f(p,q) = \sum\limits_{i=1}^n p_i \log(\frac{p_i}{q_i})$, which is the KL divergence.
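Both examples can be checked numerically; the sketch below (my own addition, assuming NumPy) also illustrates the non-negativity and asymmetry facts above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

def bregman(f, grad_f, x, y):
    """D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - grad_f(y) @ (x - y)

# Example 1: f(x) = (1/2)||x||_2^2  gives  D_f(x, y) = (1/2)||x - y||_2^2
f1 = lambda x: 0.5 * x @ x
g1 = lambda x: x
x, y = rng.normal(size=n), rng.normal(size=n)
assert np.isclose(bregman(f1, g1, x, y), 0.5 * np.sum((x - y) ** 2))

# Example 2: f(p) = sum_i p_i log p_i  gives  D_f(p, q) = sum_i p_i log(p_i / q_i) on the simplex
f2 = lambda p: np.sum(p * np.log(p))
g2 = lambda p: np.log(p) + 1.0
p, q = rng.random(n), rng.random(n)
p, q = p / p.sum(), q / q.sum()
kl = np.sum(p * np.log(p / q))
assert np.isclose(bregman(f2, g2, p, q), kl)
assert kl >= 0.0                                        # D_f is non-negative
assert not np.isclose(kl, np.sum(q * np.log(q / p)))    # and, for these p and q, not symmetric
```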

Homework

(a) Let $f, g$ be convex functions such that $f^* = g$. Write the convex conjugate of $f_{\alpha}(\cdot) = \alpha f(\cdot)$ (where $\alpha \in \mathbb{R}^+$) in terms of $\alpha$ and $g$.

(b) Let $f(x) := \sqrt{1 + x^2}$. What is its Fenchel conjugate, $f^*(\theta)$?
