CS7545_Sp24_Practice_Exam_1 - mltheory/CS7545 GitHub Wiki
CS 7545: Machine Learning Theory -- Spring 2024
Practice Problems for Exam 1
Norm Inequality in Unit Ball:
For $x \in \mathbb{R}^N$ and $p > 1$, show that if $\|x\|_p = 1$, then $\|x\|_\infty \leq 1$. Is the converse true? Justify your answer.
Solution:
If $\|x\|_p = 1$, then by definition:
$$\left(\sum_{i=1}^N |x_i|^p\right)^{1/p} = 1$$
This implies that for each $i$, $|x_i|^p \leq 1$, and hence $|x_i| \leq 1$. Therefore, $\|x\|_\infty \leq 1$. The converse is not true in general: consider the all-ones vector, which has $\infty$-norm equal to $1$ but $p$-norm equal to $N^{1/p} > 1$ whenever $N > 1$.
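The claim can be sanity-checked numerically; the following sketch (plain Python, with an arbitrarily chosen test vector and $p = 3$) normalizes a vector to unit $p$-norm and confirms both the claim and the counterexample to the converse.

```python
# Numeric sanity check (a sketch, not part of the exam solution): if
# ||x||_p = 1 then ||x||_inf <= 1, while the all-ones vector shows the
# converse fails for N > 1. Test vector and p are arbitrary choices.

def p_norm(x, p):
    """Compute the l_p norm of a vector given as a list of floats."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def inf_norm(x):
    """Compute the l_infinity norm (maximum absolute entry)."""
    return max(abs(xi) for xi in x)

# Normalize an arbitrary vector so that ||x||_p = 1, then check ||x||_inf <= 1.
p = 3
raw = [0.5, -2.0, 1.5, 0.25]
x = [xi / p_norm(raw, p) for xi in raw]
assert abs(p_norm(x, p) - 1.0) < 1e-12
assert inf_norm(x) <= 1.0

# Counterexample to the converse: the all-ones vector in R^4.
ones = [1.0] * 4
assert inf_norm(ones) == 1.0
assert abs(p_norm(ones, p) - 4 ** (1.0 / p)) < 1e-12  # N^{1/p}, not 1
```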
Equality of Vector Norms:
Provide an example of a vector $x \in \mathbb{R}^N$ such that $\|x\|_1 = \|x\|_2 = \|x\|_\infty$. Explain why this equality holds for your example.
Solution:
The all-zeros vector satisfies this equality trivially, since all three norms are $0$. More generally, any vector with at most one nonzero entry works: if $x = c\, e_i$ for a standard basis vector $e_i$, then $\|x\|_1 = \|x\|_2 = \|x\|_\infty = |c|$.
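A quick numeric sketch (the nonzero example vector is an arbitrary choice) confirming that vectors with at most one nonzero entry have equal $\ell_1$, $\ell_2$, and $\ell_\infty$ norms:

```python
# Sketch: verify that any vector with at most one nonzero entry has equal
# l_1, l_2, and l_infinity norms (the all-zeros vector is the special case 0).

def norms(x):
    """Return (l_1, l_2, l_infinity) norms of a list of floats."""
    l1 = sum(abs(xi) for xi in x)
    l2 = sum(xi * xi for xi in x) ** 0.5
    linf = max(abs(xi) for xi in x)
    return l1, l2, linf

for x in ([0.0, 0.0, 0.0], [0.0, -3.5, 0.0]):
    l1, l2, linf = norms(x)
    assert abs(l1 - l2) < 1e-12 and abs(l2 - linf) < 1e-12
```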
Cauchy-Schwarz Inequality in Probability Distributions:
Let $p \in \Delta_N$ be a probability distribution. Use the Cauchy-Schwarz inequality to prove that:
$$\|p\|_2 \geq \frac{1}{\sqrt{N}}.$$
Solution:
By Cauchy-Schwarz, $1 = \sum_{i=1}^N p_i \cdot 1 \leq \|p\|_2 \, \|\mathbf{1}\|_2 = \|p\|_2 \sqrt{N}$. Dividing both sides by $\sqrt{N}$ gives the result, with equality exactly when $p$ is the uniform distribution.
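Assuming the inequality to be shown is the standard Cauchy-Schwarz consequence $\|p\|_2 \geq 1/\sqrt{N}$ (equivalently $\sum_i p_i^2 \geq 1/N$), a quick numeric sketch over random distributions:

```python
import random

# Sketch: numeric check of ||p||_2 >= 1/sqrt(N) for probability distributions
# (the stated bound is an assumption about the truncated statement above).
# Random distributions are drawn by normalizing positive weights.

random.seed(1)
N = 8
for _ in range(100):
    w = [random.random() for _ in range(N)]
    total = sum(w)
    p = [wi / total for wi in w]          # p lies in the simplex Delta_N
    l2 = sum(pi * pi for pi in p) ** 0.5
    assert l2 >= 1.0 / N ** 0.5 - 1e-12

# Equality holds for the uniform distribution.
uniform = [1.0 / N] * N
l2_uniform = sum(pi * pi for pi in uniform) ** 0.5
assert abs(l2_uniform - 1.0 / N ** 0.5) < 1e-12
```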
Strong Convexity:
Show that if a differentiable function $f$ is convex and $\lambda > 0$, then the function $g(x) := f(x) + \frac \lambda 2 \|x\|^2$ is $\lambda$-strongly convex.
Solution:
Since $f$ is differentiable and convex, we have that for any $x, y \in \text{dom}(f)$,
$$f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle. \qquad (*)$$
Now,
$$g(y) = f(y) + \frac{\lambda}{2}\|y\|^2 \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{\lambda}{2}\|y\|^2 = g(x) + \langle \nabla g(x), y - x \rangle + \frac{\lambda}{2}\|y - x\|^2.$$
The inequality follows by simply using (*), and the final equality comes from rearranging terms: since $\nabla g(x) = \nabla f(x) + \lambda x$, one checks that $\frac{\lambda}{2}\|y\|^2 = \frac{\lambda}{2}\|x\|^2 + \lambda \langle x, y - x \rangle + \frac{\lambda}{2}\|y - x\|^2$. We are now done, since the above inequality is the definition of $\lambda$-strong convexity of $g$.
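As a sanity check, the strong-convexity inequality can be verified numerically in one dimension; the choice $f(x) = e^x$ and $\lambda = 0.7$ below is an illustrative assumption, not part of the problem:

```python
import math

# Sketch: numerically check the lambda-strong-convexity inequality
#   g(y) >= g(x) + g'(x)(y - x) + (lam/2)(y - x)^2
# for the 1-D choice f(x) = e^x (convex) and g(x) = f(x) + (lam/2) x^2.
# The function choice and lam = 0.7 are illustrative assumptions.

lam = 0.7
f = math.exp                          # convex, differentiable
g = lambda x: f(x) + 0.5 * lam * x * x
g_prime = lambda x: f(x) + lam * x    # g'(x) = f'(x) + lam*x, and f' = f here

grid = [i / 10.0 for i in range(-30, 31)]
for x in grid:
    for y in grid:
        lhs = g(y)
        rhs = g(x) + g_prime(x) * (y - x) + 0.5 * lam * (y - x) ** 2
        assert lhs >= rhs - 1e-9
```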
Fenchel Conjugate of Exponential Function:
Given a convex function $f(x) = e^x$, determine its Fenchel conjugate $f^*(y)$.
Solution:
The Fenchel conjugate of $f$ is given by $f^*(y) = \sup_{x \in \text{dom}(f)} \{yx - f(x)\}$. In this case, $f^*(y) = \sup_{x} \{yx - e^x\}$. For $y > 0$, setting the derivative of the inner function to zero gives the maximizing $x = \ln y$ and subsequently $f^*(y) = y \ln y - y$. (For $y = 0$ the supremum is $0$, and for $y < 0$ it is $+\infty$.)
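A quick numeric check of the conjugate (grid search over $x$; the grid range and step count are arbitrary choices):

```python
import math

# Sketch: numerically verify f*(y) = y ln y - y for f(x) = e^x by maximizing
# y*x - e^x over a fine grid of x values and comparing with the closed form.
# Grid bounds and resolution are arbitrary choices.

def conjugate_numeric(y, lo=-10.0, hi=10.0, steps=20000):
    """Grid-search approximation of sup_x { y*x - e^x }."""
    best = -float("inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        best = max(best, y * x - math.exp(x))
    return best

for y in (0.5, 1.0, 2.0, 5.0):
    closed_form = y * math.log(y) - y
    assert abs(conjugate_numeric(y) - closed_form) < 1e-3
```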
Modified Random Walk Probability:
In a classic random walk, where on each time period you take one step forward or backward with equal probability, it is a well-known fact that after $n$ time periods you typically end up about $\sqrt{n}$ steps away from where you started. Consider a modified random walk where on each time period $i$, you take a step forward or backward with equal probability, and the step size is $i^{-1/2}$. Let $X_i$ be the random variable representing the step taken at time $i$, with $X_i = i^{-1/2}$ or $X_i = -i^{-1/2}$ with equal probability. Show that after $n$ time periods, the probability that the random walk ends up more than $10\sqrt{\log n}$ away from the start is less than 1 in a million.
Solution:
We denote the sum of steps by $S_n = \sum_i X_i$. The expected position after $n$ steps is $E[S_n] = 0$ since the walk is symmetric. We want to bound the probability:
$$P(|S_n| > 10\sqrt{\log n}).$$
By Hoeffding's inequality, for independent bounded random variables $X_i$ with $a_i \leq X_i \leq b_i$, we have:
$$P(|S_n - E[S_n]| > t) \leq 2\exp\left(\frac{-2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$
For our random walk, $a_i = -i^{-1/2}$ and $b_i = i^{-1/2}$, hence $(b_i - a_i)^2 = 4i^{-1}$. The sum $\sum_{i=1}^n (b_i - a_i)^2$ then becomes $4\sum_{i=1}^n i^{-1}$, which is approximately $4\log n$ for large $n$. Setting $t = 10\sqrt{\log n}$, we get:
$$P(|S_n| > 10\sqrt{\log n}) \leq 2\exp\left(\frac{-2 \cdot 100 \log n}{4 \log n}\right) = 2\exp(-50).$$
Since $2\exp(-50)$ is much less than $1/1,000,000$, we conclude that the probability of the random walk being more than $10\sqrt{\log n}$ steps away from the start after $n$ periods is less than 1 in a million.
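A Monte Carlo sketch of the modified walk (the trial count and choice of $n$ are arbitrary) is consistent with the bound: the threshold $10\sqrt{\log n}$ sits roughly ten standard deviations out, and no simulated walk comes close to it.

```python
import math
import random

# Sketch: Monte Carlo check of the modified random walk. With steps of size
# i^{-1/2}, Var(S_n) = sum_{i<=n} 1/i ~ log n, so the threshold 10*sqrt(log n)
# is about 10 standard deviations out and is essentially never exceeded.
# n and the trial count are arbitrary illustrative choices.

random.seed(0)
n = 2000
threshold = 10.0 * math.sqrt(math.log(n))

def walk(n):
    """Simulate S_n = sum_i X_i with X_i = +/- i^{-1/2} with equal probability."""
    s = 0.0
    for i in range(1, n + 1):
        s += random.choice((1.0, -1.0)) * i ** -0.5
    return s

trials = 200
exceed = sum(abs(walk(n)) > threshold for _ in range(trials))
assert exceed == 0  # consistent with the Hoeffding bound 2*exp(-50)

# The bound itself is comfortably below one in a million:
assert 2.0 * math.exp(-50) < 1e-6
```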
Classification with Abstention:
Let $\mathcal{X}$ be an input space and $\mathcal{D}$ be a distribution on $\mathcal{X}\times \{-1,1\}$. Recall that $\eta(x)=\Pr[Y=1|X=x]$.
In some situations, it makes sense for a classifier to abstain (i.e., predict "I don't know") on instances $x\in\mathcal{X}$ for which it is unsure. Suppose we evaluate the classifier as follows:
If the classifier makes a prediction (either $-1$ or $1$), it incurs a cost of $0$ if it is correct and a cost of $1$ if it is incorrect.
If the classifier chooses to abstain, it incurs a cost of $\theta\in [0,\frac{1}{2}]$.
Which classifier $h:\mathcal{X}\to\{-1,1,\text{ abstain}\}$ has the lowest expected error for this problem with respect to $\mathcal{D}$? Write $h(x)$ as a function of $\eta$ and $\theta$.
Solution:
For a given $x$, the expected cost incurred by a classifier is $\theta$ if it abstains, $\eta(x)$ if the classifier predicts $-1$, and $1-\eta(x)$ if the classifier predicts $1$. The best classifier should choose the option which incurs the minimum of these three costs for each $x$. Hence, it should abstain when $\theta=\min(\theta, \eta(x), 1-\eta(x))$. Therefore,
$$h(x) = \begin{cases} 1 & \text{if } \eta(x) \geq 1 - \theta, \\ -1 & \text{if } \eta(x) \leq \theta, \\ \text{abstain} & \text{otherwise.} \end{cases}$$
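The minimum-cost rule above can be sketched directly in code; `h` below takes $\eta(x)$ and $\theta$ as numbers and returns the cheapest of the three options (the specific test values are arbitrary):

```python
# Sketch: implement the minimum-expected-cost rule described above. For each x,
# compare the three costs eta(x) (predict -1), 1 - eta(x) (predict +1), and
# theta (abstain), and pick the cheapest option.

def h(eta_x, theta):
    """Return -1, 1, or 'abstain', minimizing the expected cost at a point."""
    costs = {-1: eta_x, 1: 1.0 - eta_x, "abstain": theta}
    return min(costs, key=costs.get)

theta = 0.3
assert h(0.1, theta) == -1         # eta small: predicting -1 costs only 0.1
assert h(0.9, theta) == 1          # eta large: predicting +1 costs only 0.1
assert h(0.5, theta) == "abstain"  # uncertain region: abstaining costs 0.3
```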