CS7545_Sp23_Lecture_13: VC Dimension - mltheory/CS7545 GitHub Wiki

CS 7545: Machine Learning Theory -- Spring 2023

Instructor: Tyler Labonte

Notes for Lecture 13

February 21, 2023

Scribes: Nazal Mohamed, Paul Horton

VC Dimension:

Recap (for more details, refer to Lecture 11:

Given a set:

$$S = \left\{(x_i, y_i)\right\}_{i=1}^m \sim D^m$$ $$x_i \in X$$ $$y_i \in \{-1,1\}$$ $$\mathcal H \subseteq \left\{h:X \rightarrow \left\{-1,1 \right\}\right\} $$$$

Definition: The restriction of $\mathcal H$ to $S$ denoted by $\mathcal H|_s$ is

$\mathcal H|_s = { (h(x_1),...,h(x_m)):h \in \mathcal H}$

Definition: The growth function $\Pi_{\mathcal H}(m) = \max_{S \subseteq X, |S| = m} |\mathcal H|_s|$

The growth function is bounded by $\Pi_{\mathcal H}(m) \leq 2^m$ for a binary function.

Definition: A set of $S$ of size $m$ is shattered by $\mathcal H$ if $|\mathcal H|_s| = 2^m$

$$\iff \Pi_{\mathcal H}(m) = 2^m$$

Definition: The VC-dimension

$$VC_{dim}(\mathcal H) = \max \left\{m: \Pi_{\mathcal H}(m) = 2^m \right\}$$

This is the largest set shattered by $\mathcal H$.

Note: If the $VC_{dim}(\mathcal H) = d$, Then:

  1. There exists a shattered set of size d.
  2. There does not exist a shattered set of size d+1.

Remark: For any finite length hypothesis set $\mathcal H < \infty$:

$$VC_{dim}(\mathcal H) \leq \log_2(|\mathcal H|) $$

image

Examples (Cont.):

  1. Closed intervals in $\mathbb{R}$: $VC-dim(\mathcal{H}) = 2$

$$\mathcal{H} = \left \lbrace h : h(x) = \begin{cases} 1 & \text{if } x \in [a,b] \\ -1 & \text{otherwise} \end{cases}: a, b \in \mathbb R \right \rbrace$$

For detailed discussion, refer to Lecture 11

  1. Axis-aligned rectangles in $\mathbb{R}^2$: $VC-dim(\mathcal{H}) = 4$.

$$\mathcal H = \left \lbrace h = \begin{cases} \text{+}1 & \text{if } x\in [a, b] \text{ and } y\in [c, d] \\ \text{−}1 & \text{otherwise} \end{cases}: a, b, c, d \in \mathbb R \right \rbrace $$

In Lecture 11, we saw that there exists a set of 4 points that can be shattered. Now, we need to show that no set of 5 points can be shattered by axis-aligned rectangles in $\mathbb R^2$.

Let $\lbrace (x_i, y_i) \rbrace_{i=1}^5$ be any set of 5 points in $\mathbb R^2$.

Let $m_x = \mathrm{min} \lbrace x_i \rbrace_{i=1}^5$, $M_x = \mathrm{max} \lbrace x_i \rbrace_{i=1}^5$.

Similarly choose $m_y$ and $M_y$ for the $y_i$ values. If we construct a rectangle $[m_x, M_x] \times [m_y, M_y]$, two situations could arise: the $5^{th}$ point is on the rectangle or inside the rectangle.

In either case, the points cannot be separated if the $5^{th}$ point has a $-$ label and the remaining points have $+$ label.

image

  1. $\mathcal H = \lbrace h : \text{Halfspace in }\mathbb R^2 \rbrace$: $VC-dim(\mathcal H) = 3$.

Halfspaces in $\mathbb R^2$ are lines. The separating line should be such that points on one side of the line should be labeled $+$ while points on the other side should be labeled as $-$. For a set of three points which are not collinear, irrespective of the labeling of the points, this can be easily achieved by the line separating the two points with the identical label from the other point. No set of four points can be shattered by a line. If you arrange the points on a circle and label them as shown in the figure below, it can be seen that no line can separate them.

image

  1. $\mathcal H = \lbrace h : \text{Halfspace in } \mathbb R^n \rbrace$: $VC-dim(\mathcal H) = n+1$.

We can find a set of $n+1$ points in $\mathbb R^n$ and construct a hyperplane that separates them irrespective of their label.

Let $x_0, x_1, \ldots, x_{n}$ be a set of points in $\mathbb R^n$ such that $x_0$ is the origin and $x_i$ has 1 in the $i^{th}$ coordinate.

Let $y_0, \cdots, y_{n}$ be an arbitrary set of labels for these points (i.e, $y_i \in \lbrace -1,1 \rbrace$).

Let $a^Tx + b$ be the hyperplane that separates these points with $a^Tx_i + b \ge 0$ if $y_i = 1$ and $a^Tx_i + b < 0$ if $y_i = -1$.

Plugging in the chosen coordinates $x_1, \cdots, x_{n+1}$, we get that $a_i + b \ge 0$ if $y_i = 1$ and $a_i + b < 0$ if $y_i = -1$.

In the case of $x_0$, $b \ge 0$ if $y_0 = 1$ and $b < 0$ if $y_0 = -1$.

The most obvious choice of such $a$ and $b$ are $a = (y_0, y_1, \cdots, y_n)^T$ and $b = \frac{y_0}{2}$ satisfies these requirements. Thus, we get a hyperplane that separates the chosen $n+1$ points irrespective of their labels in $\mathbb R^n$.

In the next step, we use Radon's theorem to show that no set of $n+2$ points can be shattered in $\mathbb R^n$.

Theorem (Radon's): Any set $X$ of $n+2$ points in $R^n$ can be partitioned into two subsets $X_1$ and $X_2$ such that $Conv(X_1) \cap Conv(X_2) \neq \emptyset$.

Theorem Proof: Refer Theorem 3.4 in Foundations of Machine Learning

Coming back to our problem, consider any set $X$ of $n+2$ points in $\mathbb R^n$.

By Radon's theorem, they can be partitioned into two sets $X_1$ and $X_2$ such that their convex hulls intersect.

Note that if two sets of points can be separated by a hyperplane, then their convex hulls can also be separated by the same hyperplane (For more details, see Hyperplane Separation Theorem).

Therefore, $X_1$ and $X_2$ cannot be separated by a hyperplane. An illustration in $\mathbb R^2$ is given in the figure below.

image

  1. $\mathcal H = \lbrace \text{all convex polygons in } \mathbb R^2 \rbrace$ has $VC-dim(\mathcal H) = \infty$.

Depending on the number of points and labels, a convenient polygon can be chosen to shatter any number of points (see fig for illustration).

image

image

  1. $\mathcal H = \lbrace h(x) = \sin \alpha x: \alpha \in \mathbb R \rbrace$ has $VC-dim(\mathcal H) = \infty$.

Depending on the distribution of the points and labels, an $\alpha$ can be chosen to shatter any number of points.

⚠️ **GitHub.com Fallback** ⚠️