# CS7545 Sp24 Lecture 02
The one-stop shop for class information is this class wiki. The wiki is meant to be a public resource that everyone can edit and collaborate on (including non-scribes, if you think the scribes missed anything!). For every lecture, a homework problem is given before, during, or right after class; the problem will appear in the scribe notes. Homework assignments are due four days before each exam. Grade cutoffs may be adjusted at the end of the semester, but only downward (i.e., getting a 90% will guarantee an A). There will be no final exam.
## Linear Algebra Review

Here are some common definitions and notations for linear algebra concepts used in the class.
- Vectors are denoted with $\vec{x}$, $\hat{x}$, or just $x$.
- Matrices are usually upper-case letters, e.g., $A, M \in \mathbb{R}^{m \times n}$.
- Scalars are often Greek letters, e.g., $\alpha, \beta, \mu > 0$.
- Inner products can be represented as:
$$\vec{x} \cdot \vec{y} = \vec{x}^{T}\vec{y} = \langle \vec{x}, \vec{y} \rangle \coloneqq \sum_{i=1}^n x_i y_i$$
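As a quick sanity check, the three notations above all compute the same scalar. Here is a minimal NumPy sketch (my own illustration with made-up vectors, not from the lecture):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 0.5, -1.0])

# Three equivalent ways to compute the inner product <x, y>
dot      = np.dot(x, y)                           # x . y
matmul   = x.T @ y                                # x^T y (for 1-D arrays, .T is a no-op)
explicit = sum(xi * yi for xi, yi in zip(x, y))   # sum_i x_i y_i

assert np.isclose(dot, matmul) and np.isclose(dot, explicit)
print(dot)  # 0.0 for this example
```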
- Symmetric matrix: $M \in \mathbb{R}^{n \times n}$ satisfies $M_{ij} = M_{ji}$ for all $i, j$, or equivalently $M = M^T$.
- Positive semi-definite matrix: $M \in \mathbb{R}^{n \times n}$ is positive semi-definite (PSD) if $\vec{x}^T M \vec{x} \geq 0$ for all $\vec{x} \in \mathbb{R}^n$, often written as $M \geq 0$. We will assume in this class that all PSD matrices are symmetric unless otherwise noted.
- Positive definite matrix (a subset of PSD matrices): $M$ is positive definite (PD) if $M$ is PSD and $\vec{x}^T M \vec{x} = 0 \iff \vec{x} = 0$. Denoted $M > 0$.
Fact:

- $M \geq 0$ iff all eigenvalues of $M$ are $\geq 0$.
- $M > 0$ iff all eigenvalues of $M$ are $> 0$.
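This fact is easy to spot-check numerically. Below is a minimal NumPy sketch (my own addition, not from the lecture); `A @ A.T` is just a convenient way to manufacture a symmetric PSD matrix, and `eigvalsh` computes the eigenvalues of a symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = A @ A.T  # A A^T is always symmetric PSD

# Eigenvalue test: M >= 0 iff all eigenvalues are >= 0
eigs = np.linalg.eigvalsh(M)   # eigenvalues of a symmetric matrix
print(eigs.min() >= -1e-10)    # True (up to floating-point error)

# Quadratic-form test: x^T M x >= 0 for random x
xs = rng.standard_normal((1000, 4))
quad = np.einsum('bi,ij,bj->b', xs, M, xs)  # batch of x^T M x values
print((quad >= -1e-10).all())  # True
```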
## Norms

A function $||\cdot||: \mathbb{R}^n \to \mathbb{R}$ is a norm if it satisfies the following properties:

- $||\vec{x}|| = 0 \iff \vec{x} = 0$
- (Absolute homogeneity) $||\alpha \cdot \vec{x}|| = |\alpha| \cdot ||\vec{x}||$ for all $\alpha \in \mathbb{R}$
- (Triangle inequality) $||\vec{x} + \vec{y}|| \leq ||\vec{x}|| + ||\vec{y}||$
Theorem: All norms of non-zero inputs are positive.

Proof: For any $\vec{x} \in \mathbb{R}^n$, absolute homogeneity and the triangle inequality give
$$0 = ||\vec{0}|| = ||\vec{x} + (-\vec{x})|| \leq ||\vec{x}|| + ||-\vec{x}|| = 2||\vec{x}||,$$
so $||\vec{x}|| \geq 0$ for every $\vec{x}$. And since we established that the norm of $\vec{x}$ is zero if and only if $\vec{x} = 0$, every non-zero input must have a strictly positive norm. $\blacksquare$
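As a sanity check, here is a short sketch (my own addition) that tests the three norm properties, plus the non-negativity just proved, on random inputs using the Euclidean norm:

```python
import numpy as np

rng = np.random.default_rng(1)
norm = lambda v: np.linalg.norm(v)  # Euclidean norm as a running example

for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    a = rng.standard_normal()
    assert norm(np.zeros(5)) == 0                     # ||0|| = 0
    assert np.isclose(norm(a * x), abs(a) * norm(x))  # absolute homogeneity
    assert norm(x + y) <= norm(x) + norm(y) + 1e-12   # triangle inequality
    assert norm(x) >= 0                               # non-negativity (the theorem)
```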
Some common norms:

- 2-norm:
$$||\vec{x}||_2 \coloneqq \sqrt{\sum^n_{i=1} x_i^2}$$
- 1-norm:
$$||\vec{x}||_1 \coloneqq \sum_{i=1}^n |x_i|$$
- ∞-norm:
$$||\vec{x}||_{\infty} \coloneqq \max_{i=1, \ldots, n} |x_i|$$
- p-norm:
$$||\vec{x}||_{p} \coloneqq \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \quad p \in (1, \infty)$$
- M-norm (for a PSD matrix $M$):
$$||\vec{x}||_M \coloneqq \sqrt{\vec{x}^T M \vec{x}}$$
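For concreteness, here is how these norms can be computed in NumPy (a sketch with a made-up vector; `np.linalg.norm` covers the 1-, 2-, p-, and ∞-norms, while the M-norm is written out directly):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(x, 2))       # 2-norm: sqrt(9 + 16) = 5.0
print(np.linalg.norm(x, 1))       # 1-norm: |3| + |-4| = 7.0
print(np.linalg.norm(x, np.inf))  # inf-norm: max |x_i| = 4.0
print(np.linalg.norm(x, 3))       # p-norm with p = 3

M = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])   # a PSD (in fact PD) matrix
print(np.sqrt(x @ M @ x))         # M-norm: sqrt(2*9 + 16) = sqrt(34)
```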
## Dual Norms

Given any norm $||\cdot||$ on $\mathbb{R}^n$, its dual norm $||\cdot||_*$ is defined as
$$||\vec{y}||_* \coloneqq \sup_{\vec{x} : ||\vec{x}|| \leq 1} \langle \vec{x}, \vec{y} \rangle.$$
Examples:

- $||\cdot||_1$ is dual to $||\cdot||_{\infty}$.
- $||\cdot||_2$ is dual to itself.
- $||\cdot||_p$ is dual to $||\cdot||_q$ iff $\frac{1}{p} + \frac{1}{q} = 1$, with $p, q \in (1, \infty)$.
- Let $M \in \mathbb{R}^{n \times n}$ be a positive definite matrix (so that $M^{-1}$ exists). Then $||\cdot||_M$ is dual to $||\cdot||_{M^{-1}}$.
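The first example can be checked numerically. The 1-norm unit ball is a polytope whose extreme points are the signed standard basis vectors, so the sup in the dual-norm definition is attained at one of them; the sketch below (my own illustration) confirms that this recovers the ∞-norm:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.standard_normal(6)

# Dual of the 1-norm at y: sup over ||x||_1 <= 1 of <x, y>.
# The sup over a polytope is attained at an extreme point, here +/- e_i.
extreme_points = np.vstack([np.eye(6), -np.eye(6)])
dual_value = (extreme_points @ y).max()

print(np.isclose(dual_value, np.linalg.norm(y, np.inf)))  # True
```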
Exercises:

- (Symmetry) Can you prove that if norm A is dual to norm B, then norm B is dual to norm A?
- (Self-Duality) Can you prove that the 2-norm is the only norm dual to itself?
- (Inversion) Can you prove that the dual norm of the $M$-norm is the $M^{-1}$-norm?
Let $||\cdot||$ be a norm on $\mathbb{R}^n$ with dual norm $||\cdot||_*$. Then for all $\vec{x}, \vec{y} \in \mathbb{R}^n$,
$$\langle \vec{x}, \vec{y} \rangle \leq ||\vec{x}|| \cdot ||\vec{y}||_*,$$
which follows directly from the definition of the dual norm and generalizes the Cauchy-Schwarz inequality.

Aside: A large part of theoretical machine learning involves establishing inequalities and upper bounds for various problems (e.g., bounding the loss we are minimizing). The success of many machine learning models is derived from provable lower/upper bounds on such optimization objectives.
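A quick spot-check of the inequality above for the dual pair $||\cdot||_1$ and $||\cdot||_{\infty}$, on made-up random data (my own sketch, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)
for _ in range(1000):
    x, y = rng.standard_normal(8), rng.standard_normal(8)
    # <x, y> <= ||x||_1 * ||y||_inf  (the dual pair p = 1, q = inf)
    assert x @ y <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf) + 1e-12
```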
Let $x, y > 0$. Then
$$\frac{x + y}{2} \geq \sqrt{xy}$$
(the two-variable AM-GM inequality).

Proof: The key ingredient is the concavity of the log function.

Aside: The log function is concave, meaning that if you pick any two points along the curve of the log function and then find their average log value, the log value of their midpoint (in terms of x-coordinates) will always be at least this average. The mathematical way of saying this is:
$$\log\left(\frac{x + y}{2}\right) \geq \frac{\log x + \log y}{2}.$$

By the properties of logarithms, we also have
$$\frac{\log x + \log y}{2} = \frac{\log(xy)}{2} = \log \sqrt{xy}.$$

Combining the previous two equations, we get:
$$\log\left(\frac{x + y}{2}\right) \geq \log \sqrt{xy},$$
and since $\log$ is monotonically increasing, this implies $\frac{x + y}{2} \geq \sqrt{xy}$. $\blacksquare$
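A brief numerical spot-check of both the concavity step and the final inequality, on random positive inputs (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
x, y = rng.uniform(0.1, 10.0, size=(2, 1000))

# Concavity of log: log((x + y)/2) >= (log x + log y)/2
assert (np.log((x + y) / 2) >= (np.log(x) + np.log(y)) / 2).all()

# AM-GM: (x + y)/2 >= sqrt(x * y)
assert ((x + y) / 2 >= np.sqrt(x * y)).all()
```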
## Homework

Prove the following norm inequalities. Assume $\vec{x} \in \mathbb{R}^n$ throughout.
- Part a)
- Part b)
- Part c)
- Part d)
- Part e)