CS7545_Sp23_Lecture_02: Norms and Inequalities
CS 7545: Machine Learning Theory -- Spring 2023
Instructor: Jacob Abernethy
Notes for Lecture 02
January 12, 2023
Scribes: David Gordon, Cameron Bennett
Notation
Prof. Abernethy will try to use notation that explicitly marks vectors (for example $\overline{x}$, $\vec{x}$, or $\hat{x}$), but often a plain lowercase Latin letter (especially $x$, $y$, or $z$) will also represent a vector.
We'll be using subscript $x_i$ to denote the $i$-th coordinate of a vector.
An uppercase letter such as $M$ or $A$ will likely denote a matrix.
Scalars will typically be denoted by lowercase Greek letters, such as $\alpha$ or $\beta$.
The notations $\vec{x}^{\top}\vec{y}$ and $\langle \vec{x},\vec{y} \rangle$ will likely be used interchangeably.
Matrices
Definition 2.1 (Symmetric Matrix)
A symmetric matrix satisfies either of the following equivalent properties:
$M_{ij}=M_{ji} \quad \forall i,j$
$M^{\top} = M$
Definition 2.2 (Positive Semi-Definite)
A symmetric matrix $M \in \mathbb{R}^{n\times n}$ is described as positive semi-definite (or PSD) iff $\forall x \in \mathbb{R}^{n}$, $x^{\top}Mx \geq 0$. Sometimes, this is also denoted $M \succeq 0$.
Note: $\vec{x}^{\top}M\vec{x}$ can also be computed as $\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}x_{i}x_{j}M_{ij}$
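As a quick sanity check, the quadratic form and the double sum agree numerically; here is a minimal numpy sketch (the test matrix below is an arbitrary PSD example of ours, not anything from lecture):

```python
import numpy as np

# A^T A is always symmetric PSD, so use it as a test matrix.
A = np.random.randn(3, 3)
M = A.T @ A
x = np.random.randn(3)

# Quadratic form x^T M x ...
quad = x @ M @ x
# ... equals the double sum over coordinate pairs (i, j).
double_sum = sum(x[i] * x[j] * M[i, j] for i in range(3) for j in range(3))

assert np.isclose(quad, double_sum)
assert quad >= 0          # PSD: the quadratic form is nonnegative
```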
Definition 2.3 (Positive Definite)
A symmetric matrix $M \in \mathbb{R}^{n\times n}$ is further described as positive definite (or PD) if it is positive semi-definite and $\vec{x}^{\top}M\vec{x} = 0$ implies $\vec{x} = \vec{0}$.
Note: Positive semi-definite matrices have all eigenvalues nonnegative, while positive definite matrices have all eigenvalues strictly positive. Additionally, covariance matrices are always positive semi-definite.
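The eigenvalue characterization gives a practical test; a small sketch under our own setup (the covariance example is constructed so that a linear dependence forces a zero eigenvalue):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    # eigvalsh handles symmetric matrices; PSD iff all eigenvalues >= 0.
    return np.all(np.linalg.eigvalsh(M) >= -tol)

def is_pd(M, tol=1e-10):
    # PD iff all eigenvalues are strictly positive.
    return np.all(np.linalg.eigvalsh(M) > tol)

# A covariance matrix is always PSD, but fails to be PD when
# features are linearly dependent.
X = np.random.randn(100, 4)
X[:, 3] = X[:, 0] + X[:, 1]          # introduce a linear dependence
C = np.cov(X, rowvar=False)

print(is_psd(C), is_pd(C))           # expected: True False
```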
Norms
Definition 2.4 (Norm)
A function $||\cdot||: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is a norm iff $\forall \vec{x},\vec{y} \in \mathbb{R}^{n}$ and $\forall \alpha \in \mathbb{R}$ the following hold:
$||\vec{x}|| = 0$ implies $\vec{x} = \vec{0}$
$||\alpha\vec{x}|| = |\alpha|\,||\vec{x}||$
$||\vec{x}+\vec{y}|| \leq ||\vec{x}|| + ||\vec{y}||$
Note: The third property is often referred to as the triangle inequality. Additionally, these three statements imply that a valid norm is always nonnegative, by the following argument: $0 = ||\vec{x}-\vec{x}|| \leq ||\vec{x}|| + ||-\vec{x}|| = 2||\vec{x}||$, using the triangle inequality and then homogeneity with $\alpha = -1$, so $||\vec{x}|| \geq 0$.
Examples:
p-Norm: $||x||_ {p} = (\sum_i |x_i|^p)^{1/p}$, for $p \in [1,\infty)$; taking $p \to \infty$ gives $||x||_ {\infty} = \max_i |x_i|$
Matrix Norm: $||x||_ {M} = \sqrt{x^{\top}Mx}$, where $M$ is a positive definite matrix (the square root is needed for homogeneity)
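A short sketch of both examples, checking the triangle inequality numerically on random inputs (the helper names and the tiny float slack are ours):

```python
import numpy as np

def p_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def matrix_norm(x, M):
    # ||x||_M = sqrt(x^T M x); the square root makes homogeneity hold.
    return np.sqrt(x @ M @ x)

x, y = np.random.randn(5), np.random.randn(5)
A = np.random.randn(5, 5)
M = A.T @ A + np.eye(5)              # positive definite by construction

for p in (1, 1.5, 2, 3):
    assert p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + 1e-9
assert matrix_norm(x + y, M) <= matrix_norm(x, M) + matrix_norm(y, M) + 1e-9
```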
Definition 2.5 (Dual Norm)
Given a norm $||\cdot||$, the dual norm $||\cdot||_ {*}$ is defined as $||y||_ {*} = \sup\limits_ {x: ||x|| \leq 1} \langle \vec{x},\vec{y} \rangle$.
Note: In finite-dimensional settings such as $\mathbb{R}^{n}$, the supremum above is attained, so $\max$ and $\sup$ coincide. This gives an alternate but equivalent definition for these cases: $||y||_ {*} = \max\limits_ {x: ||x|| \leq 1} \langle \vec{x},\vec{y} \rangle$.
Examples:
The dual norm of a dual norm is the original norm.
$||\cdot||_ {1}$ is dual to $||\cdot||_ {\infty}$
$||\cdot||_ {p}$ is dual to $||\cdot||_ {q}$ when $\frac{1}{p} + \frac{1}{q} = 1$.
If $M$ is a positive definite matrix then $||\cdot||_ {M}$ is dual to $||\cdot||_ {M^{-1}}$
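One way to see the $p$–$q$ duality concretely is to approximate the supremum by sampling the $p$-norm unit sphere and compare against the closed-form $q$-norm; a rough sketch, with the exponents and sample count chosen by us:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3.0
q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1
y = rng.standard_normal(4)

# Sample directions and rescale each onto the p-norm unit sphere.
X = rng.standard_normal((200_000, 4))
X /= np.sum(np.abs(X) ** p, axis=1, keepdims=True) ** (1.0 / p)

estimate = np.max(X @ y)                       # sampled sup of <x, y>
exact = np.sum(np.abs(y) ** q) ** (1.0 / q)    # ||y||_q
print(estimate, exact)   # the estimate approaches ||y||_q from below
```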
Exercise: Prove that $M$ is positive definite if and only if $M^{-1}$ is positive definite.
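Not a proof, but a numeric sanity check of the exercise (the hint it illustrates: the eigenvalues of $M^{-1}$ are the reciprocals of those of $M$):

```python
import numpy as np

A = np.random.randn(4, 4)
M = A.T @ A + np.eye(4)              # positive definite by construction

eig_M = np.sort(np.linalg.eigvalsh(M))
eig_Minv = np.sort(np.linalg.eigvalsh(np.linalg.inv(M)))

# The spectra are reciprocal, so one is strictly positive iff the other is.
assert np.allclose(np.sort(1.0 / eig_M), eig_Minv)
```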
Inequalities
Theorem 2.1 (Cauchy-Schwarz Inequality)
Cauchy-Schwarz states that for any $x,y \in \mathbb{R}^{n}$, $|\langle \vec{x},\vec{y} \rangle| \leq ||x||_ {2}||y||_ {2}$.
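A quick numeric illustration, including the equality case for parallel vectors (the scaling factor is arbitrary):

```python
import numpy as np

x, y = np.random.randn(6), np.random.randn(6)

assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-9

# Equality holds exactly when the vectors are parallel, e.g. y = 3x.
assert np.isclose(abs(x @ (3 * x)), np.linalg.norm(x) * np.linalg.norm(3 * x))
```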
Theorem 2.2 (Young's Inequality)
Young's Inequality states that if $p,q \in (1,\infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$ and $a,b \in \mathbb{R}_{\geq 0}$, then
$ab \leq \frac{1}{p}a^{p} + \frac{1}{q}b^{q}$
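The inequality cannot be strict in general: equality is attained when $a^{p} = b^{q}$. A quick check of both facts (the values below are chosen by us):

```python
import numpy as np

p = 3.0
q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1

a, b = np.random.rand(2) * 5
assert a * b <= a**p / p + b**q / q + 1e-12

# Equality case: pick b so that b^q = a^p, then both sides equal a^p.
a = 2.0
b = a ** (p / q)
assert np.isclose(a * b, a**p / p + b**q / q)
```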
Theorem 2.3 (Hölder's Inequality for p-Norms)
Hölder's Inequality for p-Norms states that if $p,q \in (1,\infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$, then $\langle \vec{x},\vec{y}\rangle \leq ||x||_ {p}||y||_ {q}$.
Proof: Assume $\vec{x},\vec{y} \neq \vec{0}$ (otherwise both sides are zero), so we may divide by the norms.
$$\begin{align*}
\frac{\langle \vec{x},\vec{y} \rangle }{||x||_{p} ||y||_{q}}
&\leq \sum_{i}\left(\frac{|x_{i}|}{||x||_{p}}\right)\left(\frac{|y_{i}|}{||y||_{q}}\right)
&& \text{By definition of } \langle x,y \rangle \text{ and the fact that } x_i y_i \leq |x_i||y_i| \\
&\leq \sum_{i}\left(\frac{1}{p}\left(\frac{|x_{i}|}{||x||_{p}}\right)^{p} + \frac{1}{q}\left(\frac{|y_{i}|}{||y||_{q}}\right)^{q}\right)
&& \text{By Young's inequality, applied to each term} \\
&= \frac{1}{p}\frac{\sum_i|x_{i}|^{p}}{||x||^{p}_{p}} + \frac{1}{q}\frac{\sum_i |y_{i}|^{q}}{||y||^{q}_{q}}
&& \text{Pulling constants out of the sums}\\
&= \frac{1}{p} + \frac{1}{q} = 1
&& \sum_i|x_i|^p = ||x||_p^p \text{ and } \sum_i |y_i|^q = ||y||_q^q\\
\langle \vec{x},\vec{y} \rangle &\leq ||x||_p||y||_q
&& \text{Multiplying both sides by } ||x||_p||y||_q
\end{align*}$$
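Here is a sketch that mirrors the proof's normalization step numerically (the exponents and float slack are ours):

```python
import numpy as np

def p_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

p = 4.0
q = p / (p - 1)                      # conjugate exponent: 1/p + 1/q = 1
x, y = np.random.randn(8), np.random.randn(8)

# The proof bounds the normalized inner product by 1 ...
u = np.abs(x) / p_norm(x, p)
v = np.abs(y) / p_norm(y, q)
assert np.sum(u * v) <= 1 + 1e-9

# ... which rescales to Hölder's inequality itself.
assert x @ y <= p_norm(x, p) * p_norm(y, q) + 1e-9
```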
Theorem 2.4 (Hölder's Generalized Inequality)
Hölder's Generalized Inequality states that $\langle \vec{x},\vec{y}\rangle \leq ||x||\,||y||_ {*}$, where $||\cdot||$ and $||\cdot||_ {*}$ are two norms dual to each other.
Note: As $\frac{1}{2} + \frac{1}{2} = 1$, $||\cdot||_ {2}$ is self-dual. Because of this, Hölder's Generalized Inequality implies Hölder's Inequality for p-Norms, which in turn implies this statement of the Cauchy-Schwarz Inequality.
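As one more illustration, a hedged numeric check of the generalized inequality for the dual pair $||\cdot||_ {M}$ and $||\cdot||_ {M^{-1}}$ from the examples above (the test matrix is ours):

```python
import numpy as np

A = np.random.randn(5, 5)
M = A.T @ A + np.eye(5)              # positive definite by construction
Minv = np.linalg.inv(M)

x, y = np.random.randn(5), np.random.randn(5)
lhs = x @ y
rhs = np.sqrt(x @ M @ x) * np.sqrt(y @ Minv @ y)
assert lhs <= rhs + 1e-9             # generalized Hölder for this dual pair
```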