# CS 7545: Machine Learning Theory -- Spring 2024

## Exam Details:

No calculator needed. Cheat sheet allowed with "normal" size paper front and back.

## Terms:

Recall $X$ is the input space, $D$ is a distribution on $X \times \{-1, 1\}$, and $\mathbb{H}$ is the hypothesis class.

### No-free-Lunch Theorem Review

Ensures that for every $\mathbb{H}$ (and every learning algorithm), there is a distribution on which the learner's error is at least the lower bound stated in the previous lecture.

### Generalization error: $$L(h) = \Pr_{(x, y) \sim D}[h(x) \neq y]$$

$$S = \{(x_i, y_i)\}_{i=1}^m \sim D^m$$ Remember that $S$ consists of i.i.d. draws from $D$.

Empirical error: $$\hat{L}_s(h)= \frac{1}{m} \sum _{i=1}^m \mathbb{1} (h(x_i)\neq y_i)$$
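Concretely, the empirical error is just the misclassification rate on the sample. A minimal sketch with synthetic data and a hypothetical threshold hypothesis (the threshold values are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical sample S = {(x_i, y_i)}: labels come from a threshold at 0.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = np.where(x > 0, 1, -1)

# An illustrative hypothesis h with a slightly wrong threshold (0.2).
h = lambda x: np.where(x > 0.2, 1, -1)

# Empirical error: average of the 0-1 loss indicators over the sample.
L_hat = np.mean(h(x) != y)
print(L_hat)
```

Since $h$ misclassifies exactly the points with $x \in (0, 0.2]$, the empirical error here should come out near $0.1$ for a sample from Uniform$[-1,1]$.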

ERM: $$\hat{h} = \operatorname{argmin}_{h \in \mathbb{H}} \hat{L}_s(h)$$

$$h^* = \operatorname{argmin}_{h \in \mathbb{H}} L(h)$$
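For a finite hypothesis class, ERM can be carried out by brute force. A sketch over a hypothetical class of threshold classifiers $h_t(x) = \operatorname{sign}(x - t)$ on a grid (the grid and data are illustrative assumptions):

```python
import numpy as np

# Synthetic sample whose labels come from a threshold at 0.1.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=200)
y = np.where(x > 0.1, 1, -1)

# Finite hypothesis class: threshold classifiers on a grid of 21 thresholds.
thresholds = np.linspace(-1, 1, 21)

def emp_error(t):
    # Empirical error of h_t(x) = sign(x - t) on the sample.
    pred = np.where(x > t, 1, -1)
    return np.mean(pred != y)

# ERM: pick the hypothesis in the class minimizing empirical error.
t_hat = min(thresholds, key=emp_error)
print(t_hat, emp_error(t_hat))
```

Because the true threshold $0.1$ lies on the grid, the ERM solution should recover it (up to ties in a small gap around it) with empirical error near zero.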

For any fixed comparator $h$: $$L(\hat{h}) - L(h) = (L(\hat{h}) - L(h^*)) + (L(h^*) - L(h))$$

Estimation error: $(L(\hat{h}) - L(h^*))$

Approximation error: $(L(h^*) - L(h))$

Prop: $$L(\hat{h}) - L(h^*) \leq 2 \sup_{h \in \mathbb{H}} \left| L(h) - \hat{L}_s(h) \right|$$
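The proposition follows from the standard three-term decomposition; a sketch:

```latex
% Proof sketch: add and subtract the empirical errors of \hat{h} and h^*.
\begin{align*}
L(\hat{h}) - L(h^*)
  &= \underbrace{\big(L(\hat{h}) - \hat{L}_s(\hat{h})\big)}_{\le\, \sup_h |L(h) - \hat{L}_s(h)|}
   + \underbrace{\big(\hat{L}_s(\hat{h}) - \hat{L}_s(h^*)\big)}_{\le\, 0 \text{ since ERM minimizes } \hat{L}_s}
   + \underbrace{\big(\hat{L}_s(h^*) - L(h^*)\big)}_{\le\, \sup_h |L(h) - \hat{L}_s(h)|} \\
  &\le 2 \sup_{h \in \mathbb{H}} \big| L(h) - \hat{L}_s(h) \big|.
\end{align*}
```

The middle term is nonpositive precisely because $\hat{h}$ is the empirical minimizer, and each remaining term is bounded by the uniform deviation.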

We now want to bound this supremum, which depends on the sample size $m$. Generally, as $m$ gets larger, the empirical errors concentrate around the true errors, so the worst-case deviation over $\mathbb{H}$ shrinks.
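This shrinkage can be seen empirically. A sketch estimating $\sup_{h} |L(h) - \hat{L}_s(h)|$ for the hypothetical finite class of threshold classifiers above, where the true error has a closed form (all data and class choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
thresholds = np.linspace(-1, 1, 21)  # finite class of threshold hypotheses

def sup_deviation(m):
    # Draw a sample of size m and compute max over the class of
    # |true error - empirical error|, using the exact true error.
    x = rng.uniform(-1, 1, size=m)
    y = np.where(x > 0, 1, -1)  # noiseless labels: threshold at 0
    devs = []
    for t in thresholds:
        pred = np.where(x > t, 1, -1)
        emp = np.mean(pred != y)
        true = abs(t) / 2  # P(x between 0 and t) for x ~ Uniform[-1, 1]
        devs.append(abs(true - emp))
    return max(devs)

# Average the supremum deviation over repeated samples at two sample sizes.
small = np.mean([sup_deviation(50) for _ in range(20)])
large = np.mean([sup_deviation(5000) for _ in range(20)])
print(small, large)  # the deviation shrinks as m grows
```

Consistent with the discussion above, the average supremum deviation at $m = 5000$ should be much smaller than at $m = 50$.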