CS7545_Sp23_Lecture_02: Norms and Inequalities

CS 7545: Machine Learning Theory -- Spring 2023

Instructor: Jacob Abernethy

Notes for Lecture 02

January 12, 2023

Scribes: David Gordon, Cameron Bennett

Notation

  • Prof. Abernethy will try to use explicit vector notation (for example $\overline{x}$, $\vec{x}$, or $\hat{x}$), but often a plain lowercase Latin letter (especially $x$, $y$, or $z$) will also denote a vector.
  • We'll be using subscript $x_i$ to denote the $i$-th coordinate of a vector.
  • An uppercase letter such as $M$ or $A$ will likely denote a matrix.
  • Scalars will typically be denoted by lowercase Greek letters, such as $\alpha$ or $\beta$.
  • The notations $\vec{x}^{\top}\vec{y}$ and $\langle \vec{x},\vec{y} \rangle$ will likely be used interchangeably.

Matrices

Definition 2.1 (Symmetric Matrix)
A matrix $M$ is symmetric if it satisfies either of the following equivalent conditions:

  1. $M_{ij}=M_{ji} \ \forall i,j$
  2. $M^{\top} = M$

Definition 2.2 (Positive Semi-Definite)
A symmetric matrix $M \in \mathbb{R}^{n\times n}$ is described as positive semi-definite (or PSD) iff $\forall x \in \mathbb{R}^{n}$, $x^{\top}Mx \geq 0$. Sometimes, this is also denoted $M \succeq 0$

Note: $\vec{x}^{\top}M\vec{x}$ can also be computed as $\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}x_{i}x_{j}M_{ij}$
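
As a quick sanity check of this identity (a minimal sketch assuming numpy; the matrix and vector below are arbitrary examples, not from lecture), the double sum agrees with the matrix-product form of $\vec{x}^{\top}M\vec{x}$, and the result is nonnegative when $M$ is PSD:

```python
import numpy as np

# A hypothetical PSD matrix: A^T A is always symmetric positive semi-definite.
A = np.random.randn(4, 3)
M = A.T @ A                      # 3x3, symmetric PSD

x = np.random.randn(3)

quad_matrix = x @ M @ x                        # x^T M x via matrix products
quad_sums = sum(x[i] * x[j] * M[i, j]          # the equivalent double sum
                for i in range(3) for j in range(3))

print(np.isclose(quad_matrix, quad_sums))  # True: the two expressions agree
print(quad_matrix >= -1e-12)               # True: nonnegative (up to roundoff)
```

Building $M$ as $A^{\top}A$ is a standard way to produce a PSD matrix, since $x^{\top}A^{\top}Ax = ||Ax||_ {2}^{2} \geq 0$.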

Definition 2.3 (Positive Definite)
A symmetric matrix $M \in \mathbb{R}^{n\times n}$ is further described as positive definite (or PD) if it is positive semi-definite and $\vec{x}^{\top}M\vec{x} = 0$ implies $\vec{x} = \vec{0}$.

Note: Positive semi-definite matrices have all eigenvalues nonnegative, while positive definite matrices have all eigenvalues strictly positive. Additionally, covariance matrices are always positive semi-definite.
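
For example (a small sketch assuming numpy; the data is arbitrary), the eigenvalues of a sample covariance matrix come out nonnegative, consistent with the note above:

```python
import numpy as np

# Arbitrary data: 100 samples of a 5-dimensional random vector.
X = np.random.randn(100, 5)

cov = np.cov(X, rowvar=False)          # 5x5 sample covariance matrix
eigvals = np.linalg.eigvalsh(cov)      # eigenvalues of a symmetric matrix

# Covariance matrices are PSD, so all eigenvalues are >= 0 (up to roundoff).
print(np.all(eigvals >= -1e-10))       # True
```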

Norms

Definition 2.4 (Norm)
A function $||\cdot||: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is a norm iff $\forall x,y \in \mathbb{R}^{n}$ and $\forall \alpha \in \mathbb{R}$ the following hold:

  1. $||\vec{x}|| = 0$ implies $\vec{x} = \vec{0}$
  2. $||\alpha\vec{x}|| = |\alpha| \cdot ||\vec{x}||$
  3. $||\vec{x}+\vec{y}|| \leq ||\vec{x}|| + ||\vec{y}||$

Note: The third inequality is often referred to as the triangle inequality. Additionally, these three properties imply that a valid norm is always nonnegative, by the following argument:

$$\begin{align*} 0 = ||\vec{0}|| = ||\vec{x} + (-\vec{x})|| &\leq ||\vec{x}|| + ||-\vec{x}|| && \text{By property 3, and } ||\vec{0}|| = ||0\cdot\vec{x}|| = 0 \text{ by property 2} \\ &= ||\vec{x}|| + |-1| \cdot ||\vec{x}|| = 2||\vec{x}|| && \text{By property 2} \\ \text{so } \quad 0 &\leq ||\vec{x}|| \end{align*}$$

Examples:

  1. 2-Norm: $||x||_ {2} = \sqrt{\sum_i |x_i|^2}$
  2. 1-Norm: $||x||_ {1} = \sum_i |x_i|$
  3. $\infty$-Norm: $||x||_ {\infty} = \max_i |x_i|$
  4. p-Norm: $||x||_ {p} = (\sum_i |x_i|^p)^{1/p}$, for $p \in (1,\infty)$
  5. Matrix Norm: $||x||_ {M} = \sqrt{x^{\top}M x}$, where $M$ is a positive definite matrix
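
Below is a short numerical sketch of these definitions (assuming numpy; the vectors, the exponent $p = 3$, and the matrix $M$ are arbitrary placeholders), including a spot-check of the triangle inequality for the 2-norm:

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])
p = 3

norm2 = np.sqrt(np.sum(np.abs(x) ** 2))      # 2-norm
norm1 = np.sum(np.abs(x))                    # 1-norm
norm_inf = np.max(np.abs(x))                 # infinity-norm
norm_p = np.sum(np.abs(x) ** p) ** (1 / p)   # p-norm with p = 3

# Matrix norm with a positive definite M (A^T A + I is positive definite).
A = np.random.randn(3, 3)
M = A.T @ A + np.eye(3)
norm_M = np.sqrt(x @ M @ x)

print(norm1, norm2, norm_inf, norm_p, norm_M)

# Triangle inequality for the 2-norm: ||x + y|| <= ||x|| + ||y||.
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))  # True
```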

Definition 2.5 (Dual Norm)
Given a norm $||\cdot||$, the dual norm $||\cdot||_ {*}$ is defined as $||y||_ {*} = \sup\limits_ {x: ||x|| \leq 1} \langle \vec{x},\vec{y} \rangle$.

Note: In some cases (such as this definition over $\mathbb{R}^{n}$, where the unit ball is compact and the supremum is attained) $\max$ and $\sup$ are equivalent. This gives an alternate but equivalent definition for these cases: $||y||_ {*} = \max\limits_ {x: ||x|| \leq 1} \langle \vec{x},\vec{y} \rangle$.
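
To illustrate the definition (a hedged sketch assuming numpy; the vector $y$ is arbitrary), take $||\cdot|| = ||\cdot||_ {\infty}$: random points in the $\infty$-norm unit ball never beat $x = \mathrm{sign}(y)$, which attains $\langle x, y \rangle = ||y||_ {1}$, matching the duality example listed below:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(5)

# Points in the infinity-norm unit ball: every coordinate lies in [-1, 1].
samples = rng.uniform(-1.0, 1.0, size=(100_000, 5))
estimate = np.max(samples @ y)       # crude estimate of the supremum

exact = np.abs(y).sum()              # ||y||_1, attained at x = sign(y)
print(np.isclose(np.sign(y) @ y, exact))   # True: the maximizer achieves ||y||_1
print(estimate <= exact + 1e-12)           # True: no sampled point exceeds it
```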

Examples:

  1. The dual norm of a dual norm is the original norm.
  2. $||\cdot||_ {1}$ is dual to $||\cdot||_ {\infty}$.
  3. $||\cdot||_ {p}$ is dual to $||\cdot||_ {q}$ when $\frac{1}{p} + \frac{1}{q} = 1$.
  4. If $M$ is a positive definite matrix, then $||\cdot||_ {M}$ is dual to $||\cdot||_ {M^{-1}}$.
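
The last example can be spot-checked numerically (a minimal sketch assuming numpy, with an arbitrary positive definite $M$): the point $x^* = M^{-1}y / ||y||_ {M^{-1}}$ has $M$-norm $1$ and attains $\langle x^*, y \rangle = ||y||_ {M^{-1}}$, while random points on the $M$-norm unit sphere never exceed that value:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary positive definite matrix M (A^T A + I is positive definite).
A = rng.standard_normal((4, 4))
M = A.T @ A + np.eye(4)
M_inv = np.linalg.inv(M)

y = rng.standard_normal(4)
dual_val = np.sqrt(y @ M_inv @ y)        # ||y||_{M^{-1}}

x_star = M_inv @ y / dual_val            # candidate maximizer
print(np.isclose(np.sqrt(x_star @ M @ x_star), 1.0))   # ||x*||_M = 1
print(np.isclose(x_star @ y, dual_val))                 # attains ||y||_{M^{-1}}

# Random points normalized to M-norm 1 never do better.
xs = rng.standard_normal((10_000, 4))
xs = xs / np.sqrt(np.einsum('ij,jk,ik->i', xs, M, xs))[:, None]
print(np.all(xs @ y <= dual_val + 1e-10))                # True
```

Random sampling only bounds the supremum from below; the closed-form maximizer is what certifies that the value $||y||_ {M^{-1}}$ is actually attained.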

Exercise: Prove that $M$ is positive definite if and only if $M^{-1}$ is positive definite.
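
A quick numerical sanity check of the claim (a sketch assuming numpy; the matrix $M$ below is an arbitrary positive definite example): both $M$ and $M^{-1}$ come out with strictly positive eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)

# Arbitrary positive definite matrix: A^T A + I.
A = rng.standard_normal((4, 4))
M = A.T @ A + np.eye(4)

# Both M and M^{-1} should have strictly positive eigenvalues.
print(np.all(np.linalg.eigvalsh(M) > 0))                  # True
print(np.all(np.linalg.eigvalsh(np.linalg.inv(M)) > 0))   # True
```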

Inequalities

Theorem 2.1 (Cauchy-Schwarz Inequality)
Cauchy-Schwarz states that for any $x,y \in \mathbb{R}^{n}$, $\langle \vec{x},\vec{y} \rangle \leq ||x||_ {2}||y||_ {2}$

Theorem 2.2 (Young's Inequality)
Young's Inequality states that if $p,q \in (1,\infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$ and $a,b \in \mathbb{R}_{\geq 0}$, then $ab \leq \frac{1}{p}a^{p} + \frac{1}{q}b^{q}$

Proof:

Assume $a, b > 0$ (if either is $0$, the inequality holds trivially). Then

$$\begin{align*} \log(ab) &= \log(a) + \log(b)\\ &= \frac{1}{p}\log(a^{p}) + \frac{1}{q}\log(b^{q})\\ &\leq \log\left(\frac{1}{p}a^{p} + \frac{1}{q}b^{q}\right) && \text{By the concavity of } \log(\cdot) \end{align*}$$

Exponentiating both sides (valid since $\exp(\cdot)$ is increasing) gives $ab \leq \frac{1}{p}a^{p} + \frac{1}{q}b^{q}$.
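
A quick numerical sanity check of the statement (a sketch assuming numpy; the values of $p$, $a$, and $b$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

p = 1.7
q = p / (p - 1)                       # conjugate exponent: 1/p + 1/q = 1
a, b = rng.uniform(0, 10, size=2)     # arbitrary nonnegative scalars

# Young's inequality: ab <= a^p / p + b^q / q.
print(a * b <= a**p / p + b**q / q)   # True
```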

Theorem 2.3 (Hölder's Inequality for p-Norms)
Hölder's Inequality for p-Norms states that if $p,q \in (1,\infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$, then $\langle \vec{x},\vec{y}\rangle \leq ||x||_ {p}||y||_ {q}$

Proof:

Assume $||x||_ {p} \neq 0$ and $||y||_ {q} \neq 0$ (otherwise $\vec{x} = \vec{0}$ or $\vec{y} = \vec{0}$ and both sides are $0$). Then

$$\begin{align*} \frac{\langle \vec{x},\vec{y} \rangle }{||x||_{p} ||y||_{q}} &\leq \sum_{i}\left(\frac{|x_{i}|}{||x||_{p}}\right)\left(\frac{|y_{i}|}{||y||_{q}}\right) && \text{By definition of } \langle x,y \rangle \text{ and the fact that } x_{i} y_{i} \leq |x_{i}| \, |y_{i}| \\ &\leq \sum_{i}\left(\frac{1}{p}\left(\frac{|x_{i}|}{||x||_{p}}\right)^{p} + \frac{1}{q}\left(\frac{|y_{i}|}{||y||_{q}}\right)^{q}\right) && \text{By Young's inequality} \\ &= \frac{1}{p}\frac{\sum_{i}|x_{i}|^{p}}{||x||^{p}_{p}} + \frac{1}{q}\frac{\sum_{i} |y_{i}|^{q}}{||y||^{q}_{q}} && \text{Pulling constants out of the sums} \\ &= \frac{1}{p} + \frac{1}{q} = 1 && \text{Since } \sum_{i}|x_{i}|^{p} = ||x||_{p}^{p} \text{ and } \sum_{i} |y_{i}|^{q} = ||y||_{q}^{q} \end{align*}$$

Multiplying both sides by $||x||_ {p} ||y||_ {q}$ gives $\langle \vec{x},\vec{y} \rangle \leq ||x||_ {p}||y||_ {q}$.
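
The inequality is easy to spot-check numerically (a minimal sketch assuming numpy; $p = 3$ is an arbitrary choice, and $p = q = 2$ recovers the Cauchy-Schwarz inequality):

```python
import numpy as np

rng = np.random.default_rng(3)

x = rng.standard_normal(10)
y = rng.standard_normal(10)

p = 3.0
q = p / (p - 1)                                  # 1/p + 1/q = 1

lhs = x @ y                                      # <x, y>
rhs = np.linalg.norm(x, ord=p) * np.linalg.norm(y, ord=q)
print(lhs <= rhs)                                # True (Hölder)

# p = q = 2 gives the Cauchy-Schwarz inequality.
print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))  # True
```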

Theorem 2.4 (Hölder's Generalized Inequality)
Hölder's Generalized Inequality states that $\langle \vec{x},\vec{y}\rangle \leq ||x|| \, ||y||_ {*}$, where $||\cdot||$ and $||\cdot||_ {*}$ are two norms dual to each other.

Note: As $\frac{1}{2} + \frac{1}{2} = 1$, $||\cdot||_ {2}$ is self-dual. Because of this, Hölder's Generalized Inequality implies Hölder's Inequality for p-Norms, which in turn implies this statement of the Cauchy-Schwarz Inequality.
