Orders of Growth, Least Squares Approximation

Although the question doesn't ask for this level of depth, I thought it would be interesting to do a little numerical analysis to see how well the result we obtained in Exercise 1.14 -- namely, that the number of operations required to calculate (cc n 5) should scale as O(n^5) -- held up in practice. I took 100 sample points and, perhaps predictably, the matrix obtained was very poorly conditioned -- so poorly, in fact, that any "results" obtained were likely of a highly dubious nature.

Nevertheless, I thought I would briefly touch upon the subject of orthogonal vector spaces here, and on how these can be used to obtain the least squares approximation for what would otherwise be an inconsistent system of linear equations.

Suppose we have an n-dimensional linear system:

\[ Ax = b \]

This system only has a solution if b happens to fall within the range of the operator A.

This raises the question of whether there is anything further we can do if b is not within the range of A.

Suppose we rewrite the linear system as follows:

\[ Ax - b = 0 \]

An exercise in optimization immediately suggests itself: even if b is not within the range of A, so that this equation cannot be solved "exactly", it should still be possible to find a value (or values) of x which minimizes the norm |Ax - b| -- that is, which brings the left-hand side as close to 0 as possible.

Geometrical Intuition

If we cycle through all possible vectors x, then the set of all products Ax is simply Range(A). So the minimum of |Ax - b| can be thought of as the "distance" from b to Range(A). Therefore, the value of x which minimizes |Ax - b| will be the solution to the linear system:

\[ Ax = P_{\mathrm{Ran}(A)}\, b \]

where P_Ran(A) is the "orthogonal projection" operator that projects its argument onto the linear space defined by Range(A).

If we know an orthogonal basis v_1, ..., v_r of Ran(A), we can construct P_Ran(A) b as follows:

\[ P_{\mathrm{Ran}(A)}\, b = \sum_{k} \frac{\langle b, v_k \rangle}{\langle v_k, v_k \rangle}\, v_k \]

If we only have a (possibly non-orthogonal) basis, we must first orthogonalize it using a technique such as Gram-Schmidt.
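As a concrete illustration, here is a minimal Octave sketch of this projection, using a small placeholder matrix and vector (not data from the exercise). It uses the economy-size QR factorization to produce an orthonormal basis for Ran(A), playing the role that Gram-Schmidt would play by hand:

```matlab
% Project b onto Ran(A) using an orthonormal basis of the column space.
% A and b are illustrative placeholders.
A = [1 1; 1 2; 1 3];
b = [1; 2; 2];

[Q, R] = qr(A, 0);    % economy QR: columns of Q are an orthonormal basis of Ran(A)
p = Q * (Q' * b);     % p = sum_k <b, q_k> q_k, since each ||q_k|| = 1
```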

This establishes the existence of a least squares solution for the linear system, but there is a simpler technique that can be used to actually compute the result numerically.

Normal Equation

First we note that Ax = P_Ran(A) b if and only if b - Ax is orthogonal to Ran(A).

Suppose that a_1, ..., a_n are the columns of the matrix A. Then the condition

\[ b - Ax \perp \mathrm{Ran}(A) \]

can be rewritten as:

\[ b - Ax \perp a_k \]

where k ranges from 1 to n (inclusive).

Rewriting this in the language of inner products, we have:

\[ \langle b - Ax,\, a_k \rangle = 0 \]

Bearing in mind that the inner product is defined as:

\[ \langle x, y \rangle = y^* x \]

That is, we can perform "traditional" matrix multiplication between the (1,n) vector y* and the (n,1) vector x, to obtain the (1,1) "scalar" inner product.

Returning to our equation, we have:

\[ \langle b - Ax,\, a_k \rangle = a_k^* (b - Ax) \]

\[ a_k^* (b - Ax) = 0 \]

and moreover, this must hold true for all k as k ranges from 1 to n. Suppose we build an n-dimensional "column" vector as follows:

\[ \begin{pmatrix} a_1^* (b - Ax) \\ a_2^* (b - Ax) \\ \vdots \\ a_n^* (b - Ax) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \]

Joining these n rows together, we can now express this succinctly as:

\[ A^* (b - Ax) = 0 \]

where the symbol "0" indicates an n-dimensional column vector.

Simplifying further:

\[ A^* A x = A^* b \]

which is the so-called "normal equation".

Solution of this equation for x yields the least squares approximation for the linear system Ax = b.

In order for the solution x to be unique, the matrix A*A must be invertible -- which is the case exactly when the columns of A are linearly independent.
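In Octave (or Matlab), solving the normal equation is a one-line affair. A minimal sketch, assuming the matrix A and column vector b are already defined:

```matlab
% Least squares solution via the normal equation: solve (A'*A) x = A'*b.
x = (A' * A) \ (A' * b);

% In practice x = A \ b is preferred: for a rectangular A, backslash solves
% the least squares problem directly (via QR), without forming A'*A, whose
% condition number is the square of that of A.
```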

Numerical Example

As stated in the problem set, I collected samples of size 100 for (cc n k), with the number of coin denominations k ranging from 1 to 5. The goal was to see how well the empirical invocation count correlated with the expected growth of O(n^k), that is, polynomial of degree k in n. It turns out that for the larger values of k (e.g., k = 4 and k = 5), the resulting matrices are very poorly conditioned, leading to results that are inconclusive at best.
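One way to see the trouble concretely is to compute the condition number of the normal-equation matrix A*A for the degree-5 fit. A sketch in Octave, using the sample size of 100 described above:

```matlab
% Condition number of A'*A for a degree-5 polynomial fit over n = 1..100.
n  = (1:100)';
A5 = [ones(100,1), n, n.^2, n.^3, n.^4, n.^5];
cond(A5' * A5)    % enormous, so results from the normal equation are dubious
```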

Let's look instead at the behavior of (cc k 2) -- that is, amount k with two kinds of coins -- for a sample of size 20.

The invocation count for (cc k 2), as k ranges from 1 to 20 inclusive, is as follows:

5 7 9 11 13 19 23 27 31 35 43 49 55 61 67 77 85 93 101 109

Since we are trying to fit a quadratic curve, the matrix A that we set up will look like the following:

\[ A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ \vdots & \vdots & \vdots \\ 1 & 20 & 400 \end{pmatrix} \]

That is, each row consists of three columns: 1, k and k^2, as k ranges from 1 to 20.

The column vector b will simply be the empirical invocation counts that we measured, and we're going to solve for x:

\[ b = \begin{pmatrix} 5 \\ 7 \\ 9 \\ \vdots \\ 109 \end{pmatrix}, \qquad x = \begin{pmatrix} x_0 \\ x_1 \\ x_2 \end{pmatrix} \]

Clearly, A is not invertible in the conventional sense, since it is not even square. We proceed by applying the normal equation and solving for x in Matlab:

\[ x = (A^* A)^{-1} A^* b \]

\[ x = \begin{pmatrix} 2.76595 \\ 1.42729 \\ 0.19733 \end{pmatrix} \]
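For reference, a short Octave/Matlab sketch that reproduces this computation (the variable names are my own):

```matlab
k = (1:20)';
counts = [5 7 9 11 13 19 23 27 31 35 43 49 55 61 67 77 85 93 101 109]';

A = [ones(20,1), k, k.^2];      % columns: 1, k, k^2
x = (A' * A) \ (A' * counts)    % approx [2.76595; 1.42729; 0.19733]
```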

That is to say, the quadratic best-fit curve we are seeking is given by:

\[ y(k) = 0.19733\, k^2 + 1.42729\, k + 2.76595 \]

The scatter plot and corresponding quadratic curve, rendered in Octave, look like this:

(figure: scatter plot of the invocation counts for k = 1 to 20, with the quadratic best-fit curve overlaid)
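A sketch of the Octave commands that would render such a figure, continuing from the variables defined in the snippet above:

```matlab
% Scatter plot of measured counts with the fitted quadratic overlaid.
plot(k, counts, 'o');
hold on;
kk = linspace(1, 20, 200);
plot(kk, x(1) + x(2)*kk + x(3)*kk.^2, '-');
xlabel('k'); ylabel('invocation count');
hold off;
```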

We can see that the O(k^2) quadratic curve fits the scatter plot quite well.
