Tensors and Dimensions
AI Self-Hosting, Spring 2024
(Work in progress)
In working with neural networks and Python frameworks that are meant to operate on large amounts of training data in batches, we have been encountering vectors, matrices, and tensors of higher rank.
This is a short monograph defining some common terms for working with these objects; it is necessarily brief, covering just enough for most programming in machine learning and artificial intelligence research.
For more details and high-quality examples, we recommend consulting a linear algebra textbook such as Gilbert Strang's *Introduction to Linear Algebra*, 6th edition.
Vectors and Linear Combinations
A vector represents an ordered collection of elements, where only one coordinate (index) is needed to refer to any element. For example, a vector of $n$ real numbers can be written as
$$ \overrightarrow{v} \in \mathbb{R}^n $$
where we use the traditional notation $\overrightarrow{v}$ to denote that a vector can be thought of as having a magnitude $|\overrightarrow{v}|$ as well as a direction.
We can also write the vector in terms of its indexed components as
$$ \overrightarrow{v} = \left[ \begin{matrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{matrix} \right] $$
where $v_i \in \mathbb{R}$. The fact that we can refer to the elements of $v$ with one coordinate $i$ means they lie along one axis.
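As a concrete illustration (a minimal sketch using NumPy, which is only one of the Python frameworks we might use), a vector in $\mathbb{R}^n$ corresponds to a rank-1 array: a single index is enough to reach any component.

```python
import numpy as np

# A vector in R^4 as a rank-1 (one-axis) array: one index selects a component.
v = np.array([3.0, -1.0, 2.5, 0.0])

print(v.ndim)             # 1   -> one axis, so one coordinate is needed
print(v.shape)            # (4,)
print(v[2])               # 2.5 -> the component v_3 (0-based index 2)
print(np.linalg.norm(v))  # the magnitude |v|
```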
A vector can be represented as a linear combination of other vectors
$$ \hat{v} = a_1 \hat{b_1} + a_2 \hat{b_2} + \ldots + a_n \hat{b_n} $$
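As a minimal sketch (the coefficients $a_i$ and vectors $\hat{b_i}$ below are made-up values for illustration), a linear combination is straightforward to compute with NumPy:

```python
import numpy as np

# Coefficients a_1, a_2 and vectors b_1, b_2 (illustrative values only).
a = np.array([2.0, -1.0])
B = np.array([[1.0, 0.0, 1.0],    # b_1
              [0.0, 1.0, 1.0]])   # b_2

# v = a_1 * b_1 + a_2 * b_2, computed in one step as a vector-matrix product.
v = a @ B
print(v)   # [ 2. -1.  1.]
```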
Linear Dependence and Independence
Two vectors $\hat{b_1}$ and $\hat{b_2}$ are linearly independent if one cannot be expressed as a scaled version of the other.
That is, there does not exist a $c_1$ such that $\hat{b_1} = c_1 \hat{b_2}$,
or a $c_2$ such that $\hat{b_2} = c_2 \hat{b_1}$.
An example of a pair of linearly independent vectors is
$$ \hat{b_1} = \left[ \begin{matrix} 1 \\ 2 \end{matrix} \right] $$
$$ \hat{b_2} = \left[ \begin{matrix} 2 \\ 1 \end{matrix} \right] $$
Scaling a vector does not change the ratio between its components: here, one vector's second component is always twice its first, while the other's first component is always twice its second, so neither can ever be a scaled copy of the other.
If there are such scaling constants, we say that the vectors are linearly dependent.
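One way to check this numerically (a sketch assuming NumPy is available) is to stack the vectors into a matrix and compute its rank: full rank means the vectors are linearly independent.

```python
import numpy as np

b1 = np.array([1.0, 2.0])
b2 = np.array([2.0, 1.0])

# Stack the vectors as rows; rank 2 means neither is a scaled copy of the other.
print(np.linalg.matrix_rank(np.vstack([b1, b2])))   # 2 -> linearly independent

# A dependent pair for contrast: b3 is just 3 * b1.
b3 = 3.0 * b1
print(np.linalg.matrix_rank(np.vstack([b1, b3])))   # 1 -> linearly dependent
```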
Vector Spaces and Bases
So far we have not said anything about the constants $a_i$ in linear combinations. We will consider only real numbers for now, although vectors can be defined over any field with $+$ and $\cdot$ operations, and the resulting vectors add and scale in the same way that our real vectors here do.
Since there are infinitely many real numbers, given a finite set of basis vectors
$$ \left\{ \hat{b_1}, \hat{b_2}, \cdots, \hat{b_n} \right\} $$
we can define an infinite number of vectors by choosing all combinations of coefficients $c_i \in \mathbb{R}$:
$$ \left\{ \hat{v} : \hat{v} = c_1 \hat{b_1} + \cdots + c_n \hat{b_n} \right\} $$
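As an illustrative sketch (the basis vectors are made up and the coefficients are drawn at random purely for demonstration), every choice of real coefficients $c_i$ yields another vector in this set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two basis vectors in R^3 (illustrative values).
B = np.array([[1.0, 2.0, 0.0],    # b_1
              [2.0, 1.0, 0.0]])   # b_2

# Each random draw of (c_1, c_2) produces a new vector c_1*b_1 + c_2*b_2 in the set.
for _ in range(3):
    c = rng.normal(size=2)
    print(c @ B)
```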
On this vector space over the reals, we can define an operation that takes two vectors to a scalar,
$$ \left< \cdot , \cdot \right> : V \times V \rightarrow \mathbb{R} $$
where $V$ is the set of vectors defined above.
If this operation satisfies four defining properties (additivity, homogeneity, symmetry, and positive-definiteness), it is called an inner product. For our purposes, you can think of the inner product as measuring the overlap between two vectors.
If a vector is normalized by its magnitude (so that it has a magnitude of 1), then its overlap with itself will also be 1:
$$ \left< \frac{\hat{v_1}}{|\hat{v_1}|} , \frac{\hat{v_1}}{|\hat{v_1}|} \right> = 1 $$ where an overlap of $1$ between two unit vectors means they point in the same direction.
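A quick numerical check of this property (a sketch that uses NumPy's dot product as the inner product):

```python
import numpy as np

v1 = np.array([3.0, 4.0])
v1_hat = v1 / np.linalg.norm(v1)   # normalize: divide by the magnitude |v1| = 5

# The overlap of a unit vector with itself is 1 (up to floating-point rounding).
print(np.dot(v1_hat, v1_hat))      # 1.0
```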
This infinite vector set is called a vector space; together with the inner product, it is sometimes called an inner product space.
The standard inner product on our vectors over $\mathbb{R}^n$, called the dot product, is calculated by summing the products of corresponding components. Writing two vectors in terms of the standard basis $\hat{b_i}$,
$$ \hat{v_1} = \sum_{i=1}^{n} a_i \hat{b_i} $$
$$ \hat{v_2} = \sum_{i=1}^{n} c_i \hat{b_i} $$
$$ \left< \hat{v_1} , \hat{v_2} \right> = \sum_{i=1}^{n} a_i \cdot c_i $$
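A short sketch of this formula (illustrative values, with the $\hat{b_i}$ taken to be the standard basis so that a vector's components are just its coefficients):

```python
import numpy as np

# Coefficients of v1 and v2 in the standard basis (illustrative values).
a = np.array([1.0, 2.0, 3.0])
c = np.array([4.0, -5.0, 6.0])

# The sum of component-wise products ...
manual = np.sum(a * c)
# ... matches NumPy's built-in dot product.
print(manual, np.dot(a, c))   # 12.0 12.0
```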
Orthogonality and Bases
We will not p