where $r > 1$, $p \in \Delta_N$, and $p_i > 0$ for $i = 1, \dots, N$, with
$$\Delta_N := \{ v \in \mathbb{R}^{N}: \sum_{i=1}^{N} v_i = 1, v_i \ge 0 \text{ for } i = 1, \dots, N \}$$
Recall Hölder's inequality: $x^Ty = \langle x, y\rangle \le \lVert x \rVert_a \lVert y \rVert_b$, where $a, b \ge 1$ and $\frac{1}{a} + \frac{1}{b} = 1$.
Solution:
Consider $x, y \in \mathbb{R}^N$ with $x_i = \frac{1}{p_i^{(r-1)/r}}$ and $y_i = p_i^{(r-1)/r}$ for $i = 1, \dots, N$. Each entry of $x$ is the reciprocal of the corresponding entry of $y$, so every term of the inner product equals $1$ and $\langle x, y\rangle = N$. Define $q$ such that $\frac{1}{r} + \frac{1}{q} = 1$. Applying Hölder's inequality with $x$, $y$, $\lVert \cdot \rVert_r$, and $\lVert \cdot \rVert_q$:
$$\langle x, y \rangle \le \lVert x \rVert_r \lVert y \rVert_q$$
$$\iff N \le \left(\sum_{i=1}^{N}\left(\frac{1}{p_i^{(r-1)/r}}\right)^{r}\right)^{1/r} \left(\sum_{i=1}^{N}\left(p_i^{(r-1)/r}\right)^{q}\right)^{1/q}$$
$$\iff N \le \left(\sum_{i=1}^{N}\frac{1}{p_i^{r-1}}\right)^{1/r} \left(\sum_{i=1}^{N}\left(p_i^{1/q}\right)^{q}\right)^{1/q}$$
$$\iff N \le \left(\sum_{i=1}^{N}\frac{1}{p_i^{r-1}}\right)^{1/r} \left(\sum_{i=1}^{N}p_i\right)^{1/q}$$
$$\iff N \le \left(\sum_{i=1}^{N}\frac{1}{p_i^{r-1}}\right)^{1/r} \Rightarrow \boxed{N^r \le \sum_{i=1}^{N}\frac{1}{p_i^{r-1}}} $$
In the above, we use the fact that $\frac{r - 1}{r} = 1 - \frac{1}{r} = \frac{1}{q}$, and that $\sum\limits_{i=1}^{N} p_i = 1$.
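As a quick sanity check (not part of the original derivation), the bound $N^r \le \sum_{i=1}^{N} p_i^{-(r-1)}$ can be verified numerically for random points in the interior of the simplex; the snippet below is a minimal sketch using NumPy, with sizes and seeds chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

for r in [1.5, 2.0, 3.0]:
    # Draw a random p in the interior of the simplex Delta_N.
    p = rng.random(N) + 1e-3
    p /= p.sum()

    lhs = N ** r
    rhs = np.sum(1.0 / p ** (r - 1))
    assert lhs <= rhs + 1e-9, (lhs, rhs)
    print(f"r={r}: N^r = {lhs:.3f} <= sum 1/p_i^(r-1) = {rhs:.3f}")
```

Equality holds when $p$ is the uniform distribution, since then $\sum_i p_i^{-(r-1)} = N \cdot N^{r-1} = N^r$.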
Let $f$ be a convex function on a subset of $\mathbb{R}^n$. The Fenchel conjugate of $f$ is defined as
$$f^*(\theta) = \sup_{x\in \text{dom}(f)}(\langle x, \theta \rangle - f(x))$$
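To make the definition concrete, the supremum can be approximated on a grid for a simple one-dimensional function; the sketch below is my own illustration (the helper name `conjugate_on_grid` is not from the notes) and uses $f(x) = \frac{1}{2}x^2$, whose conjugate is $f^*(\theta) = \frac{1}{2}\theta^2$.

```python
import numpy as np

def conjugate_on_grid(f, theta, xs):
    """Approximate f*(theta) = sup_x (x*theta - f(x)) over the grid xs."""
    return np.max(xs * theta - f(xs))

xs = np.linspace(-10, 10, 100001)
f = lambda x: 0.5 * x ** 2

for theta in [-2.0, 0.0, 1.5]:
    approx = conjugate_on_grid(f, theta, xs)
    exact = 0.5 * theta ** 2
    print(f"theta={theta}: grid sup = {approx:.4f}, closed form = {exact:.4f}")
```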
Example:
Let $M \in \mathbb{R}^{n \times n}$ be a symmetric positive definite matrix. Define $f(x) = \frac{1}{2} x^TMx$. Then the Fenchel conjugate of $f$ is $f^*(\theta) = \frac{1}{2} \theta^TM^{-1}\theta$.
Proof:
By definition we have
$$f^*(\theta) = \sup_{x\in \text{dom}(f)}(x^T\theta-\frac{1}{2} x^TMx)$$
Since $x^T\theta-\frac{1}{2} x^TMx$ is concave (see the remarks below), we can obtain the supremum by setting its gradient to zero,
$$\nabla_x (x^T\theta-\frac{1}{2} x^TMx) = 0$$$$\iff \nabla_x (x^T\theta)-\nabla_x (\frac{1}{2} x^TMx) = 0$$$$\iff \theta-Mx = 0$$$$\iff x = M^{-1}\theta$$
Having obtained the point at which the function achieves its maximum, we plug it back into the definition to compute the Fenchel conjugate,
$$f^*(\theta) = (M^{-1}\theta)^T\theta-\frac{1}{2} (M^{-1}\theta)^TM(M^{-1}\theta) = \frac{1}{2} \theta^TM^{-1}\theta$$
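The closed form $f^*(\theta) = \frac{1}{2}\theta^T M^{-1}\theta$ can also be checked numerically by maximizing $x^T\theta - \frac{1}{2}x^T M x$ directly; the sketch below does this with `scipy.optimize.minimize` on a small random symmetric positive definite $M$ (the construction of $M$ and the dimension are assumptions made for the illustration).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 4

# Random symmetric positive definite M (A A^T plus a multiple of I).
A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)
theta = rng.standard_normal(n)

# f*(theta) = sup_x (x^T theta - 0.5 x^T M x); maximize by minimizing the negation.
neg_objective = lambda x: -(x @ theta - 0.5 * x @ M @ x)
res = minimize(neg_objective, x0=np.zeros(n))

numeric = -res.fun
closed_form = 0.5 * theta @ np.linalg.solve(M, theta)
print(numeric, closed_form)   # the two values should agree closely
```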
Remarks:
To see that the function $x^T\theta-\frac{1}{2} x^TMx$ is concave, it suffices to note that the Hessian of its negation is $\nabla_x^2(-x^T\theta+\frac{1}{2} x^TMx) = \nabla_x^2(-x^T\theta)+\nabla_x^2(\frac{1}{2} x^TMx)=M$. Since we assumed in the example above that $M$ is symmetric positive definite, $-x^T\theta+\frac{1}{2} x^TMx$ is convex, which implies that $x^T\theta-\frac{1}{2} x^TMx$ is concave.
Recall that the inverse of a symmetric matrix is symmetric. Thus, $(M^{-1})^{T} = M^{-1}$.
Every positive definite matrix $M$ has an eigendecomposition, i.e. there exists some orthogonal matrix $U$ such that $M = U^T\Sigma U$, where $\Sigma$ is a diagonal matrix containing the eigenvalues of $M$. Therefore, it follows that $M^{-1} = U^T\Sigma^{-1}U$, where $\Sigma^{-1}$ consists of the corresponding multiplicative inverses of the eigenvalues of $M$.
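The last remark can be confirmed numerically. Note that `numpy.linalg.eigh` returns an orthogonal matrix whose columns are eigenvectors, i.e. $M = V\Sigma V^T$, so taking $U = V^T$ matches the convention $M = U^T\Sigma U$ used above. A minimal sketch, with the random $M$ being my own choice:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)        # symmetric positive definite

# eigh returns eigenvalues w and an orthogonal V with M = V diag(w) V^T;
# taking U = V^T matches the convention M = U^T Sigma U used above.
w, V = np.linalg.eigh(M)
U = V.T
Sigma_inv = np.diag(1.0 / w)

M_inv = U.T @ Sigma_inv @ U
print(np.allclose(M_inv, np.linalg.inv(M)))   # expected: True
```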
Proposition:
$f(x)$ is convex if and only if $f(x) + \langle x,\theta \rangle$ is convex.
$\text{dom}(f^*) = \{ \nabla f(x) : x \in \text{dom}(f)\}$
If $f$ is smooth and strictly convex, then the map $x \mapsto \nabla f(x)$ is injective. In this case, $\nabla f^*$ is the inverse map of $\nabla f$. In other words, for all $x \in \text{dom}(f)$, $\nabla f^*(\nabla f(x)) = x$ and for all $\theta \in \text{dom}(f^*)$, $\nabla f(\nabla f^*(\theta))=\theta$.
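For the quadratic example above, $\nabla f(x) = Mx$ and $\nabla f^*(\theta) = M^{-1}\theta$, so the inverse-map property can be checked directly; a short sketch under the same assumed random $M$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

A = rng.standard_normal((n, n))
M = A @ A.T + n * np.eye(n)        # symmetric positive definite

grad_f = lambda x: M @ x                        # gradient of 0.5 x^T M x
grad_f_star = lambda t: np.linalg.solve(M, t)   # gradient of 0.5 t^T M^{-1} t

x = rng.standard_normal(n)
theta = rng.standard_normal(n)
print(np.allclose(grad_f_star(grad_f(x)), x))          # expected: True
print(np.allclose(grad_f(grad_f_star(theta)), theta))  # expected: True
```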
Let $f$ be a differentiable convex function. The Bregman divergence of $f$ is defined as
$$D_f: \text{dom}(f) \times \text{dom}(f) \rightarrow \mathbb{R}$$$$D_f(x,y) = f(x) - f(y) - \nabla f(y)^T(x-y)$$
Intuition:
Observe that we can rewrite the Bregman divergence as $f(x) - (f(y) + \nabla f(y)^T(x-y))$. In other words, the Bregman divergence is the difference between the function value at $x$ and the value at $x$ of the first-order (linear) approximation of $f$ taken at $y$. Equivalently, $D_f(x, y)$ measures how "bad" the linear approximation of $f$ taken at $y$ is when evaluated at $x$.
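Under this interpretation, a Bregman divergence is straightforward to compute from $f$ and its gradient. The helper below is my own sketch (not from the notes), illustrated with the one-dimensional choice $f = \exp$:

```python
import numpy as np

def bregman(f, grad_f, x, y):
    """D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - np.dot(grad_f(y), x - y)

# 1-D illustration with f = exp: D_f(x, y) is the gap between f(x) and
# the tangent line of f taken at y, evaluated at x.
f, grad_f = np.exp, np.exp
print(bregman(f, grad_f, 1.0, 0.5))   # nonnegative, since exp is convex
```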
Facts about the Bregman Divergence:
$D_f$ is always non-negative as a consequence of the convexity of $f$.
$D_f$ is not necessarily symmetric.
Examples:
Let $f(x) = \frac{1}{2} \lVert x \rVert_2^2$. Then, $D_f (x,y) = \frac{1}{2} \lVert x-y \rVert_2^2$.
Let $f(x) = \sum_{i=1}^{n} x_i \log x_i$; $f$ in this case is the negation of the entropy function used in information theory. Then, for $x, y$ in the probability simplex, $D_f(x,y) = \sum_{i=1}^{n} x_i \log \frac{x_i}{y_i}$, which is the relative entropy or Kullback-Leibler divergence.
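Both examples, and the asymmetry noted above, can be checked numerically; the sketch below compares the definition of $D_f$ with the stated closed forms, restricting to the probability simplex in the entropy case (the random test points are my own choice).

```python
import numpy as np

def bregman(f, grad_f, x, y):
    return f(x) - f(y) - grad_f(y) @ (x - y)

rng = np.random.default_rng(4)

# Example 1: f(x) = 0.5 ||x||_2^2  =>  D_f(x, y) = 0.5 ||x - y||_2^2.
f1 = lambda v: 0.5 * v @ v
g1 = lambda v: v
x, y = rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose(bregman(f1, g1, x, y), 0.5 * np.sum((x - y) ** 2)))

# Example 2: f(x) = sum_i x_i log x_i (negative entropy) on the simplex
# =>  D_f(x, y) = sum_i x_i log(x_i / y_i), the KL divergence.
f2 = lambda v: np.sum(v * np.log(v))
g2 = lambda v: np.log(v) + 1.0
p, q = rng.random(3) + 0.1, rng.random(3) + 0.1
p, q = p / p.sum(), q / q.sum()
kl = np.sum(p * np.log(p / q))
print(np.isclose(bregman(f2, g2, p, q), kl))

# D_f need not be symmetric: the KL divergence is a standard example.
print(np.isclose(bregman(f2, g2, p, q), bregman(f2, g2, q, p)))  # typically False
```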
Homework
(a) Let $f, g$ be convex functions such that $f^* = g$. Write the convex conjugate of $f_{\alpha}(\cdot) = \alpha f(\cdot)$ (where $\alpha \in \mathbb{R}^+$) in terms of $\alpha$ and $g$.
(b) Let $f(x) := \sqrt{1 + x^2}$. What is its Fenchel conjugate, $f^*(\theta)$?