Interestingly, this gives a theoretical basis for regularization: instead of minimizing the empirical loss alone, we also minimize an additional term that keeps the Rademacher bound small.
Consider a fixed $m, \delta$. We vary the function class $\mathcal{F}$.
Fix the values $C, m$. How should we minimize this upper bound on $L(f)$?
We use $\ell_2$-regularization and minimize the objective $l(w, S) = \widehat{L}_S(w) + \lambda \| w\|_2^2$ with $\lambda > 0$. When $\widehat{L}_S$ is the empirical hinge loss, this is the SVM.
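As a rough illustration of the regularized objective above, here is a minimal sketch in plain Python. It assumes a linear predictor $x \mapsto \langle w, x \rangle$ and the hinge loss as the empirical loss; the names `hinge_loss`, `regularized_objective`, `sample`, and `lam` are illustrative and do not come from the notes.

```python
def hinge_loss(w, x, y):
    """Hinge loss of the linear predictor w on one example (x, y), with y in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - y * score)


def regularized_objective(w, sample, lam):
    """Empirical hinge loss on the sample plus lam * ||w||_2^2 (assumed form of l(w, S))."""
    empirical = sum(hinge_loss(w, x, y) for x, y in sample) / len(sample)
    penalty = lam * sum(wi * wi for wi in w)
    return empirical + penalty


# Toy sample: two points in R^2 with labels +1 and -1.
sample = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
print(regularized_objective([2.0, -2.0], sample, lam=0.5))
print(regularized_objective([0.0, 0.0], sample, lam=0.5))
```

A larger $\lambda$ pushes the minimizer toward a smaller $\|w\|_2$, trading empirical loss for a smaller complexity term.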
VC-Dimension
For a hypothesis class $\mathcal{H}$ and a sample $S$ of $m$ points, the restriction of $\mathcal{H}$ to $S$ is $\mathcal{H}|_S = \{(h(x_1), \ldots, h(x_m)): h \in \mathcal{H}\}$, so $|\mathcal{H}|_S|$ is the total number of ways to classify $S$. Note that while $|\mathcal{H}|$ may be $\infty$, $|\mathcal{H}|_S| \leq 2^m.$
We define the growth function of $\mathcal{H}$ as the max number of ways to classify $m$ points:
$$\Pi_{\mathcal{H}}(m) = \max_{S \subseteq X, |S| = m} \left|\mathcal{H}|_S\right|$$
$\mathcal{H} = \{\text{intervals } [a, b] \text{ on the real line}\}$.
$\exists$ a set of size $1$ which is shattered? Yes! (An interval either includes the point or excludes it.)
$\exists$ a set of size $2$ which is shattered? Yes! (An interval can include both points, exclude both points, or include one point and exclude the other, in 2 ways.)
No set of size $3$ is shattered. For points $x_1 < x_2 < x_3$ on the number line, the labeling $+1, -1, +1$ cannot occur: a single interval containing $x_1$ and $x_3$ must also contain $x_2$, and you cannot use multiple intervals. Thus, $VC(\mathcal{H}) = 2$.
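The interval argument can be checked by brute force. The sketch below assumes closed intervals $[a, b]$ and distinct sample points; `interval_labelings` and `intervals_shatter` are hypothetical helper names. It enumerates the restriction of the class to a sample and compares its size with $2^m$.

```python
def interval_labelings(points):
    """All labelings of `points` (distinct reals) realized by closed intervals [a, b]."""
    # Endpoints at sample points suffice; one extra value below the minimum
    # gives an interval that contains no sample point at all.
    candidates = sorted(points) + [min(points) - 1.0]
    labelings = set()
    for a in candidates:
        for b in candidates:
            if a <= b:
                labelings.add(tuple(1 if a <= x <= b else -1 for x in points))
    return labelings


def intervals_shatter(points):
    """True if intervals realize all 2^m labelings of the m given points."""
    return len(interval_labelings(points)) == 2 ** len(points)


print(intervals_shatter([0.0]))
print(intervals_shatter([0.0, 1.0]))
print(intervals_shatter([0.0, 1.0, 2.0]))
print(len(interval_labelings([0.0, 1.0, 2.0])))
```

On three points only $7$ of the $8$ labelings appear; the missing one is $(+1, -1, +1)$.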
$\mathcal{H} = \{\text{axis-aligned rectangles in } \mathbb{R}^2 \}$, i.e. sets of the form $[a, b] \times [c, d]$ for some $a, b, c, d \in \mathbb{R}$.
$\exists$ a set of $3$ shattered points? Yes! For example, the points $(0,0), (1,2), (2,1)$ can be classified all $8$ ways. (The points must not be collinear, but that alone is not enough: no point may lie in the bounding box of the other two.)
$\exists$ a set of $4$ shattered points? Yes! (Choose the vertices of a rhombus whose sides are not parallel to the axes, e.g. the diamond $(\pm 1, 0), (0, \pm 1)$.)
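The convex hull of a set of points is the smallest convex set containing all of them. We prove that no set of $5$ points can be shattered.
Claim: among any $5$ points, some point cannot be made $-1$ while the others are $+1$. If a point lies inside the convex hull of the others, this is immediate: a rectangle is convex, so it contains the convex hull of the points it contains. If all $5$ points are on the convex hull, choose a leftmost, a rightmost, a topmost, and a bottommost point (at most $4$ points); any rectangle containing them contains their bounding box, and hence also the remaining point.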
Thus, $VC(\mathcal{H}) = 4$.
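The rectangle case can also be sanity-checked numerically. This is a sketch under stated assumptions: rectangles are closed sets $[a, b] \times [c, d]$, and `rectangle_labelings` and `rectangles_shatter` are hypothetical names. Testing every $5$-point subset of a small grid is only a spot check of the claim, not a proof.

```python
from itertools import combinations, product


def rectangle_labelings(points):
    """All labelings of `points` realized by closed rectangles [a, b] x [c, d]."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    # A rectangle far away from every point labels everything -1.
    labelings = {tuple(-1 for _ in points)}
    # Shrinking a rectangle to the bounding box of the points it contains does
    # not change the labeling, so sides at sample coordinates suffice.
    for a, b in product(xs, repeat=2):
        for c, d in product(ys, repeat=2):
            if a <= b and c <= d:
                labelings.add(tuple(
                    1 if a <= x <= b and c <= y <= d else -1
                    for x, y in points
                ))
    return labelings


def rectangles_shatter(points):
    """True if axis-aligned rectangles realize all 2^m labelings of the points."""
    return len(rectangle_labelings(points)) == 2 ** len(points)


diamond = [(1, 0), (0, 1), (-1, 0), (0, -1)]
print(rectangles_shatter(diamond))             # four diamond vertices
print(rectangles_shatter(diamond + [(0, 0)]))  # adding the center breaks shattering

# Spot check: no 5-point subset of a 3 x 3 grid is shattered.
grid = [(x, y) for x in range(3) for y in range(3)]
print(any(rectangles_shatter(list(s)) for s in combinations(grid, 5)))
```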
What does $VC(\mathcal{H}) = \infty$ mean? It means that for every $m$, there is some set of $m$ points that the class shatters.
Example:
$\mathcal{H} = \{ \text{all convex polygons in } \mathbb{R}^2\}$.
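Put all points on a circle. For any labeling, create the convex polygon whose vertices are exactly the points labeled $+1$ (if fewer than three points are labeled $+1$, use a thin polygon around them). It contains every $+1$ point and no $-1$ point, because each point on the circle lies outside the convex hull of the others. Any number of points arranged this way can be shattered.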
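A small numerical check of the circle construction is sketched below. It assumes the sample is $m$ equally spaced points on the unit circle and, as a simplification, allows degenerate polygons with fewer than three vertices; `circle_points`, `in_convex_polygon`, and `polygons_shatter_circle` are hypothetical names.

```python
import math
from itertools import product


def circle_points(m):
    """m equally spaced points on the unit circle, in counterclockwise order."""
    return [(math.cos(2 * math.pi * k / m), math.sin(2 * math.pi * k / m))
            for k in range(m)]


def in_convex_polygon(p, vertices, tol=1e-12):
    """Membership of p in the convex polygon with counterclockwise `vertices`.

    Degenerate cases are an assumption of this sketch: no vertices means the
    empty set, one vertex means that single point, and two vertices test the
    line through them, which agrees with the segment for points on the circle.
    """
    if len(vertices) == 0:
        return False
    if len(vertices) == 1:
        return math.dist(p, vertices[0]) <= tol
    for (ax, ay), (bx, by) in zip(vertices, vertices[1:] + vertices[:1]):
        # A negative cross product means p is strictly to the right of edge a -> b.
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        if cross < -tol:
            return False
    return True


def polygons_shatter_circle(m):
    """Check that every labeling of the m circle points is realized."""
    pts = circle_points(m)
    for labels in product([-1, 1], repeat=m):
        vertices = [p for p, y in zip(pts, labels) if y == 1]
        predicted = tuple(1 if in_convex_polygon(p, vertices) else -1 for p in pts)
        if predicted != labels:
            return False
    return True


print(polygons_shatter_circle(7))
```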
$\mathcal{H} = \{ x \mapsto \text{sgn}(\sin(\alpha x)): \alpha \in \mathbb{R}\}$.
$VC(\mathcal{H}) = \infty$. For any $m$ there are $m$ points (for example $x_i = 10^{-i}$, $i = 1, \ldots, m$) such that every labeling is realized by adjusting the frequency $\alpha$.
This is interesting! We only have $1$ parameter $\alpha$ in the class, yet the VC dimension of the class is $\infty.$
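One standard way to make the frequency argument concrete is sketched below. It is an assumption here rather than the construction used in the notes: take $x_i = 10^{-i}$ and, for desired labels $y_i \in \{-1, +1\}$, set $\alpha = \pi \left(1 + \sum_{i=1}^m \frac{1 - y_i}{2} 10^i \right)$. The sketch also assumes the convention $\text{sgn}(0) = -1$ and uses floating-point arithmetic, so it is only meaningful for small $m$; `sine_label`, `alpha_for`, and `sine_shatters` are hypothetical names.

```python
import math
from itertools import product


def sine_label(alpha, x):
    """Label assigned by x -> sgn(sin(alpha * x)), treating sgn(0) as -1 (assumption)."""
    return 1 if math.sin(alpha * x) > 0 else -1


def alpha_for(labels):
    """Frequency realizing `labels` on x_i = 10^(-i), i = 1..m (assumed construction)."""
    # (1 - y) // 2 is 0 for y = +1 and 1 for y = -1.
    total = sum(((1 - y) // 2) * 10 ** i for i, y in enumerate(labels, start=1))
    return math.pi * (1 + total)


def sine_shatters(m):
    """Check that all 2^m labelings of x_1, ..., x_m are realized by some alpha."""
    xs = [10.0 ** (-i) for i in range(1, m + 1)]
    return all(
        tuple(sine_label(alpha_for(labels), x) for x in xs) == labels
        for labels in product([-1, 1], repeat=m)
    )


print(sine_shatters(4))
```

The idea is that multiplying $\alpha$ by $x_j = 10^{-j}$ turns the terms with $i > j$ into multiples of $2\pi$, leaves $\pi \frac{1 - y_j}{2}$ from the $j$-th term, and adds a remainder strictly between $0$ and $\pi$, so the sign of the sine is exactly $y_j$.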
Homework
Graded. What is the VC-dimension of a union of $k$ intervals on the real line?
Graded. What is the VC-dimension of axis-aligned hyperrectangles in $\mathbb{R}^n$? An axis-aligned hyperrectangle $A$ in $\mathbb{R}^n$ is defined by $A=[x_1,y_1]\times \cdots \times [x_n, y_n]$ for $x_1,y_1,\dots,x_n,y_n\in\mathbb{R}$.