CS7545_Sp24_Lecture_02 - mltheory/CS7545 GitHub Wiki

CS 7545: Machine Learning Theory -- Spring 2024

Instructor: Jacob Abernethy

Notes for Lecture 02

January 11, 2024

Scribes: Marcus Ma, Yifan Zhao

Class Logistics

The one-stop shop for class information is this class Wiki. It is meant to be a public resource that everyone can edit and collaborate on (including non-scribes, if you think the scribes missed anything!). A few logistical points:

  • A homework problem accompanies every lecture and is given before, during, or right after class; the problem appears in the scribe notes.
  • Homework assignments are due four days before the exams.
  • Grade cutoffs may be adjusted at the end of the semester, but only downward (i.e., getting a 90% will guarantee an A).
  • There will be no final exam.

Linear Algebra Notation

Here are some common definitions and notations for linear algebra concepts used in the class.

Variables

  • Vectors are denoted with $\vec{x}$, $\hat{x}$, or just $x$.
  • Matrices are usually upper-case letters, e.g., $A, M \in \mathbb{R}^{m \times n}$.
  • Scalars are often Greek letters, e.g., $\alpha, \beta, \mu$.
  • Inner products can be represented as: $$\vec{x} \cdot \vec{y} = \vec{x}^{T}\vec{y} = \langle \vec{x}, \vec{y} \rangle \coloneqq \sum_{i=1}^n x_iy_i$$
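
To make the notation concrete, here is a minimal NumPy sketch (the example vectors are made up for illustration) confirming that the three ways of writing the inner product agree:

```python
import numpy as np

# Hypothetical example vectors for illustration.
x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 0.5, -1.0])

# All three notations denote the same quantity: sum_i x_i * y_i.
dot_product  = np.dot(x, y)                          # x . y
matrix_form  = x @ y                                 # x^T y (for 1-D arrays, @ is the dot product)
explicit_sum = sum(xi * yi for xi, yi in zip(x, y))  # the defining sum

assert np.isclose(dot_product, matrix_form)
assert np.isclose(dot_product, explicit_sum)
```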

Matrices

  • Symmetric matrix: $M \in \mathbb{R}^{n \times n}$ is symmetric if $M_{ij}=M_{ji}$ for all $i, j$, or equivalently $M = M^T$.
  • Positive Semi-Definite matrix: A matrix $M \in \mathbb{R}^{n \times n}$ is Positive Semi-Definite (PSD) if $\forall \vec{x} \in \mathbb{R}^n$ we have $\vec{x}^T M \vec{x} \geq 0$. Often written as $M \geq 0$. We will assume in this class that all PSD matrices are symmetric unless otherwise noted.
  • Positive Definite matrix (a subset of PSD matrices): $M$ is Positive Definite (PD) if $M$ is PSD and $\vec{x}^T M \vec{x} = 0 \iff \vec{x} = 0$. Denoted $M > 0$.

Fact:

  • $M \geq 0$ iff all eigenvalues of $M$ are $\geq 0$.
  • $M > 0$ iff all eigenvalues of $M$ are $> 0$.
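
The fact is not proven here, but it is easy to sanity-check numerically. Below is a small sketch; the construction $M = A^T A$ is just one convenient way to produce a symmetric PSD matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Any matrix of the form A^T A is symmetric PSD.
A = rng.standard_normal((4, 4))
M = A.T @ A

# Definition check: x^T M x >= 0 for randomly sampled x (up to floating-point error).
for _ in range(1000):
    x = rng.standard_normal(4)
    assert x @ M @ x >= -1e-10

# Eigenvalue check: all eigenvalues of M are >= 0.
eigenvalues = np.linalg.eigvalsh(M)   # eigvalsh assumes a symmetric matrix
assert np.all(eigenvalues >= -1e-10)
```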

Norms and Inequalities

A function $||\cdot||$: $\mathbb{R}^n \rightarrow \mathbb{R}$ is a norm iff $\forall \vec{x}, \vec{y} \in \mathbb{R}^n, \forall \alpha \in \mathbb{R}$ we have:

  1. $||\vec{x}|| = 0 \iff \vec{x} = 0$
  2. $||\alpha \cdot \vec{x}|| = |\alpha| \cdot ||\vec{x}||$
  3. Triangle Inequality: $||\vec{x} + \vec{y}|| \leq ||\vec{x}|| + ||\vec{y}||$

Theorem: All norms of non-zero inputs are positive.
Proof:

$$0 = ||\vec{0}|| = ||\vec{x} + (-\vec{x})||$$

$$||\vec{x} + (-\vec{x})|| \le ||\vec{x}|| + ||-\vec{x}|| = ||\vec{x}|| + ||\vec{x}||$$

where the inequality is the Triangle Inequality and the last equality uses property 2 with $\alpha = -1$. Therefore

$$0 \le 2||\vec{x}||$$

$$0 \le ||\vec{x}||$$

And since property 1 establishes that the norm of $\vec{x}$ is $0$ only when $\vec{x}=0$, every non-zero $\vec{x}$ satisfies $||\vec{x}|| > 0$.
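
The axioms (and the non-negativity just derived) are easy to spot-check numerically. Here is a small sketch for the 2-norm on random vectors; it is only a sanity check, not a proof:

```python
import numpy as np

rng = np.random.default_rng(1)

def norm(v):
    return np.linalg.norm(v, 2)   # try ord=1 or ord=np.inf as well

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    alpha = rng.standard_normal()
    assert np.isclose(norm(alpha * x), abs(alpha) * norm(x))   # property 2
    assert norm(x + y) <= norm(x) + norm(y) + 1e-12            # property 3 (triangle inequality)
    assert norm(x) >= 0                                        # non-negativity (the theorem above)

assert norm(np.zeros(5)) == 0.0                                # property 1, one direction
```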

Common Norms

  • 2-norm: $$||\vec{x}||_2 \coloneqq \sqrt{\sum^n_{i=1} x_i^2}$$
  • 1-norm: $$||\vec{x}||_1 \coloneqq \sum_{i=1}^n |x_i|$$
  • ∞-norm: $$||\vec{x}||_{\infty} \coloneqq \max_{i=1,\dots,n} |x_i|$$
  • p-norm: $$||\vec{x}||_{p} \coloneqq \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, p \in (1, \infty)$$
  • Let $M$ be a PSD matrix; then the M-norm is: $$||\vec{x}||_M \coloneqq \sqrt{\vec{x}^TM\vec{x}}$$ (Strictly speaking, this is a norm only when $M$ is positive definite; for a merely PSD $M$ it can be zero on non-zero vectors.)
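
All of these can be computed with NumPy. A small sketch, with the vector and matrix made up for illustration:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l2   = np.linalg.norm(x, 2)        # sqrt(9 + 16) = 5
l1   = np.linalg.norm(x, 1)        # 3 + 4 = 7
linf = np.linalg.norm(x, np.inf)   # max |x_i| = 4
l3   = np.linalg.norm(x, 3)        # general p-norm with p = 3

# M-norm for a (here positive definite) diagonal M.
M = np.diag([1.0, 2.0, 3.0])
m_norm = np.sqrt(x @ M @ x)        # sqrt(1*9 + 2*16 + 3*0) = sqrt(41)

print(l2, l1, linf, l3, m_norm)
```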

Given any norm $||\cdot||$ on $\mathbb{R}^n$, the dual norm $||\cdot||^*$ is defined as:

$$||\vec{y}||^* \coloneqq \sup_{\vec{x} \in \mathbb{R}^n, ||\vec{x}|| \leq 1} \langle \vec{x}, \vec{y} \rangle = \sup_{\vec{x} \in \mathbb{R}^n, \vec{x} \neq 0} \frac{\langle \vec{x}, \vec{y} \rangle}{||\vec{x}||}$$

Examples:

  • $||\cdot||_1$ is dual to $||\cdot||_{\infty}$.
  • $||\cdot||_2$ is dual to itself.
  • $||\cdot||_p$ is dual to $||\cdot||_q$ iff $\frac{1}{p} + \frac{1}{q} = 1$, with $p, q \in (1,\infty)$.
  • Let $M$ be a positive definite matrix in $\mathbb{R}^{n \times n}$ (so that $M^{-1}$ exists). Then $||\cdot||_M$ is dual to $||\cdot||_{M^{-1}}$.
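
A numerical illustration of these pairings, using the bound $\langle x, y \rangle \leq ||x|| \cdot ||y||^*$ that follows from the dual-norm definition (a sketch over random vectors, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(6)

# Hoelder-type bound <x, y> <= ||x|| * ||y||_* for each dual pair above.
for _ in range(1000):
    x = rng.standard_normal(6)
    assert x @ y <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf) + 1e-12  # (1, inf)
    assert x @ y <= np.linalg.norm(x, 2) * np.linalg.norm(y, 2)      + 1e-12  # (2, 2)
    assert x @ y <= np.linalg.norm(x, 3) * np.linalg.norm(y, 1.5)    + 1e-12  # (3, 3/2)

# Tightness for the 1-norm: the sup in the dual-norm definition is attained
# at a signed standard basis vector, which recovers ||y||_inf.
i = np.argmax(np.abs(y))
x_star = np.zeros(6)
x_star[i] = np.sign(y[i])            # ||x_star||_1 = 1
assert np.isclose(x_star @ y, np.linalg.norm(y, np.inf))
```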

Exercises:

  1. (Reflexivity) Can you prove that if norm A is dual to norm B, then norm B is dual to norm A?
  2. (Self-Duality) Can you prove that the 2-norm is the only norm dual to itself?
  3. (Inversion) Can you prove that the dual of the $M$-norm is the $M^{-1}$-norm?

Cauchy-Schwarz Inequality

Let $x$ and $y$ be vectors in $\mathbb{R}^n$. Then:

$$\boxed{x^Ty \leq ||x||_2 \cdot ||y||_2}$$

Exercise: The Cauchy-Schwarz Inequality can be proven directly from the fact that the 2-norm is dual to itself. Can you think of the proof?
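
Without giving away the proof, here is a quick numerical spot-check, including the equality case when $y$ is a non-negative multiple of $x$:

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(1000):
    x, y = rng.standard_normal(8), rng.standard_normal(8)
    assert x @ y <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12

# Equality holds when y is a non-negative scalar multiple of x.
x = rng.standard_normal(8)
assert np.isclose(x @ (2 * x), np.linalg.norm(x) * np.linalg.norm(2 * x))
```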

Aside: A large part of theoretical machine learning involves establishing inequalities and upper bounds for various problems (e.g., bounding the loss being minimized). The success of many machine learning models rests on provable lower/upper bounds for such optimization objectives.

Young's Inequality

Let $p, q \in (1, \infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$. Then for any $a, b \in \mathbb{R}_{\geq 0}$,

$$ \boxed{ab \leq \frac{1}{p}a^p + \frac{1}{q}b^q}$$

Proof:

If $a = 0$ or $b = 0$ the inequality holds trivially, so assume $a, b > 0$. Then

$$\log(ab) = \log a + \log b = \frac{1}{p} \log (a^p) + \frac{1}{q} \log (b^q)$$

Aside: The log function is concave, meaning that if you pick any two points on its curve, the log of any weighted average of their $x$-coordinates is at least the same weighted average of their log values. The mathematical way of saying this is that for any $\alpha, \beta > 0$ and $r \in [0, 1]$:

$$r \log (\alpha) + (1-r) \log (\beta) \leq \log (r\alpha + (1-r)\beta)$$

Applying this with $r = \frac{1}{p}$ (so $1 - r = \frac{1}{q}$), $\alpha = a^p$, and $\beta = b^q$, we get:

$$\frac{1}{p} \log (a^p) + \frac{1}{q} \log (b^q) \leq \log \left( \frac{1}{p}a^p + \frac{1}{q} b^q \right)$$

Chaining the two displays gives $\log(ab) \leq \log \left( \frac{1}{p}a^p + \frac{1}{q} b^q \right)$, and because $\log$ is monotonically increasing, Young's Inequality follows.
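
A quick numerical sanity check of the inequality over random $a$, $b$, and conjugate pairs $(p, q)$ (a sketch, not a substitute for the proof above):

```python
import numpy as np

rng = np.random.default_rng(4)
for _ in range(1000):
    a, b = rng.uniform(0.0, 5.0, size=2)
    p = rng.uniform(1.1, 4.0)
    q = p / (p - 1)                              # conjugate exponent: 1/p + 1/q = 1
    assert a * b <= a**p / p + b**q / q + 1e-9
```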

Homework

Prove the following norm inequalities. Assume $\mathbf{x} \in \mathbb{R}^N$.

Part a)

$$\| \mathbf{x} \|_2 \leq \| \mathbf{x} \|_1 \leq \sqrt N \| \mathbf{x} \|_2$$

Part b)

$$ \| \mathbf{x} \|_\infty \leq \| \mathbf{x} \|_2 \leq \sqrt N \| \mathbf{x} \|_{\infty} $$

Part c)

$$\| \mathbf{x} \|_\infty \leq \| \mathbf{x} \|_1 \leq N \| \mathbf{x} \|_\infty$$

Part d)

$$ \| \mathbf{x} \|_p \leq N^{1/p} \| \mathbf{x} \|_\infty \text{ for } p > 1 $$

Part e)

$$ \lim_{p\to+\infty} \| \mathbf{x} \|_p = \| \mathbf{x} \|_\infty $$
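
These bounds can be sanity-checked numerically before attempting the proofs; here is a small sketch over random vectors (with $N = 10$ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10
for _ in range(1000):
    x = rng.standard_normal(N)
    l1, l2, linf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))
    assert l2 <= l1 <= np.sqrt(N) * l2 + 1e-12              # part a)
    assert linf <= l2 <= np.sqrt(N) * linf + 1e-12          # part b)
    assert linf <= l1 <= N * linf + 1e-12                   # part c)
    p = rng.uniform(1.1, 8.0)
    assert np.linalg.norm(x, p) <= N**(1/p) * linf + 1e-12  # part d)

# Part e): the p-norm approaches the infinity-norm as p grows.
x = rng.standard_normal(N)
print([float(np.linalg.norm(x, p)) for p in (2, 8, 32, 128)], float(np.linalg.norm(x, np.inf)))
```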
