Mathematical Background - WilfullMurder/DataStructures-Java GitHub Wiki


Introduction:

This section is a brief review of some of the mathematical notations and tools used throughout the analysis, including logarithms, big-Oh notation, and probability theory. It is not intended to be an introduction to mathematics for computing. If you feel you are lost or missing the background (or need a refresher), we encourage you to check out the Resources & References section to read (and do exercises from) the appropriate sections of the (free) textbooks on mathematics for computing.

Exponentials & Logarithms

The expression a^b denotes the number a raised to the power of b. If b is a positive integer, then this is equivalent to a multiplied by itself b-1 times.

When b is a negative integer, a^b = 1/a^(-b). When b = 0, a^b = 1. When b is not an integer, as we will see shortly, we can still define exponentiation in terms of the exponential function e^x, which is itself defined in terms of the exponential series (check out calculus).

In this study the expression log_a(c) denotes the base-a logarithm of c; that is, the unique value b that satisfies a^b = c. Most of the logarithms used throughout the study are base 2 (binary logarithms), so the base is omitted: log(c) is a contraction of log_2(c).

A handy (if informal) way to think of logarithms is to think of logac as the number of times we have to divide c by a before the result is less than or equal to 1. As an example, when we do a binary search, each comparison reduces the number of possible answers by a factor of 2. This is repeated until there is at most one possible answer. So, the number of comparisons done by a binary search with an initial n+1 possible answers is ⌈log2(n+1)⌉.
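This comparison count can be checked directly. The sketch below (class and method names are ours, not from the repository) repeatedly halves a pool of n+1 candidate answers, taking the larger half each time as a worst case, and confirms the step count matches ⌈log_2(n+1)⌉:

```java
public class BinarySearchComparisons {
    // Number of halving steps needed to narrow n + 1 candidate
    // answers down to at most one (worst case: keep the larger half).
    public static int comparisons(int n) {
        int candidates = n + 1;
        int steps = 0;
        while (candidates > 1) {
            candidates = (candidates + 1) / 2;
            steps++;
        }
        return steps;
    }

    // ceil(log2(x)) for x >= 1, computed with integer arithmetic.
    public static int ceilLog2(int x) {
        int log = 0;
        int pow = 1;
        while (pow < x) {
            pow *= 2;
            log++;
        }
        return log;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 1000; n++) {
            assert comparisons(n) == ceilLog2(n + 1);
        }
        System.out.println(comparisons(1023)); // 1024 candidates -> 10 steps
    }
}
```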

Another logarithm that pops up regularly throughout the study is the natural logarithm. For this we use the common notation ln(k) to denote log_e(k), where e (Euler's number) is given by

e = lim_{n→∞} (1 + 1/n)^n ≈ 2.71828.

The natural logarithm crops up frequently because it is the value of a particularly common integral:

∫_1^k (1/x) dx = ln(k).

Two of the most common manipulations we do with logarithms are removing them from an exponent:

a^(log_a(b)) = b

and changing the base of a logarithm:

log_a(b) = log(b)/log(a).

As an example, we can use these two manipulations to compare the natural and binary logarithms:

ln(k) = log(k)/log(e) = log(k)/(ln(e)/ln(2)) = ln(2)·log(k) ≈ 0.693147·log(k).

Factorials

The factorial function is used in a couple of places in this study. For a non-negative integer, n, the notation n! (n factorial) is defined to mean n!=1*2*3*...*n. Factorials show up because n! counts the number of distinct permutations, i.e., orderings, of n distinct elements. For the special case n=0, 0! is defined as 1.
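A direct implementation of the definition is a straightforward loop. This sketch uses BigInteger because a long overflows past 20!:

```java
import java.math.BigInteger;

public class Factorial {
    // n! = 1 * 2 * ... * n, with 0! = 1 by definition.
    public static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorial(0)); // 1 (the special case)
        System.out.println(factorial(5)); // 120
        System.out.println(factorial(20)); // 2432902008176640000
    }
}
```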

The quantity n! can be approximated using Stirling's Approximation:

n! = √(2πn) · (n/e)^n · e^(α(n))

where

1/(12n+1) < α(n) < 1/(12n).

Stirling's Approximation also approximates ln(n!):

ln(n!) = n·ln(n) − n + (1/2)·ln(2πn) + α(n)

(Stirling's Approximation is most easily proven by approximating ln(n!) = ln(1)+ln(2)+...+ln(n) by the integral ∫_1^n ln(x) dx = n·ln(n) − n + 1.)
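We can spot-check numerically how tight the approximation is. In this sketch, the ratio n!/(√(2πn)(n/e)^n) equals e^(α(n)), which should sit just above 1 and shrink as n grows:

```java
public class Stirling {
    // Stirling's approximation: n! ~ sqrt(2*pi*n) * (n/e)^n.
    public static double stirling(int n) {
        return Math.sqrt(2 * Math.PI * n) * Math.pow(n / Math.E, n);
    }

    // Exact n! as a double (fine for the small n sampled here).
    public static double factorial(int n) {
        double f = 1.0;
        for (int i = 2; i <= n; i++) {
            f *= i;
        }
        return f;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 20}) {
            // ratio = e^(alpha(n)), bounded between e^(1/(12n+1)) and e^(1/(12n))
            double ratio = factorial(n) / stirling(n);
            System.out.println(n + ": " + ratio);
        }
    }
}
```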

In relation to the factorial function are the binomial coefficients. For a non-negative integer n and an integer k∈{0,...,n}, the notation (n choose k) denotes:

(n choose k) = n! / (k!·(n−k)!).

The binomial coefficient (n choose k) counts the number of subsets of an n element set that have size k, i.e., the number of ways of choosing k distinct elements from the set {1,...,n}.
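A small implementation (a sketch): computing the coefficient multiplicatively via (n choose k) = (n choose k−1)·(n−k+1)/k avoids the huge intermediate factorials of the defining formula, and each division is exact.

```java
public class Binomial {
    // C(n, k) = n! / (k! * (n-k)!), computed incrementally as
    // C(n, i) = C(n, i-1) * (n - i + 1) / i, which stays integral.
    public static long choose(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - i + 1) / i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(choose(4, 2));  // 6 subsets of size 2 from {1,2,3,4}
        System.out.println(choose(10, 3)); // 120
    }
}
```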

Asymptotic Notation

When analyzing data structures in this study, we want to talk about the running times of various operations. The exact running times will, of course, vary from computer to computer and even from run to run on an individual computer. When we talk about the running time of an operation we are referring to the number of computer instructions performed during the operation. Even for simple code, this quantity can be difficult to compute exactly. Therefore, instead of analyzing running times exactly, we will use the so-called big-Oh notation: for a function f(n), O(f(n)) denotes a set of functions,

O(f(n)) = { g(n) : there exist c > 0 and n_0 such that g(n) ≤ c·f(n) for all n ≥ n_0 }.

Thinking graphically, this set consists of the functions g(n) where c·f(n) starts to dominate g(n) when n is sufficiently large.

We generally use asymptotic notation to simplify functions. As an example, in place of 5n*log(n)+8n-200 we can write O(n*log(n)), which we can prove as follows:

5n*log(n)+8n-200 ≤ 5n*log(n)+8n

≤ 5n*log(n)+8n*log(n) for n ≥ 2 (so that log(n) ≥ 1)

≤ 13n*log(n).

This demonstrates that the function f(n) = 5n*log(n)+8n-200 is in the set O(n*log(n)) using the constants c = 13 and n_0 = 2.
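These constants can be spot-checked numerically (a sketch; sampling powers of two rather than proving the bound):

```java
public class BigOhCheck {
    // The function being bounded: f(n) = 5n*log2(n) + 8n - 200.
    public static double f(double n) {
        return 5 * n * log2(n) + 8 * n - 200;
    }

    public static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }

    public static void main(String[] args) {
        // Check f(n) <= 13 * n * log2(n) for sampled n >= n_0 = 2.
        for (double n = 2; n <= 1_000_000; n *= 2) {
            assert f(n) <= 13 * n * log2(n);
        }
        System.out.println("bound holds on all sampled n");
    }
}
```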

A number of useful shortcuts can be applied when using asymptotic notation:

  1. O(n^c1) ⊂ O(n^c2), for any c1 < c2.
  2. For any constants a, b, c > 0, O(a) ⊂ O(log(n)) ⊂ O(n^b) ⊂ O(c^n).
Throughout the study we will abuse this notation by writing things like f1(n) = O(f(n)) when what we actually mean is f1(n) ∈ O(f(n)). We will also say that a running time 'is' O(f(n)) rather than 'is a member of' O(f(n)), both to avoid awkward language and to make it easier to use asymptotic notation within strings of equations. For example, when we write the statement T(n) = 2log(n)+O(1) we actually mean T(n) ≤ 2log(n)+[some member of O(1)].

The expression O(1) in itself raises another issue. Since there is no variable in this expression, it may not be clear which variable is getting arbitrarily large. Without context, there is no way to tell. In the example above, since the only variable in the rest of the equation is n, we can assume that this should be read as T(n) = 2logn+O(f(n)), where f(n)=1.

Big-Oh notation has been around for a while, being used by number theorist Paul Bachmann as early as 1894, and is useful for describing the running times of computer algorithms. If we contemplate the following code:

```java
for (int i = 0; i < n; i++) {
    a[i] = i;
}
```

One execution of this loop involves
  • a. 1 assignment (int i = 0)
  • b. n+1 comparisons (i < n)
  • c. n increments (i++)
  • d. n array offset calculations (a[i])
  • e. n indirect assignments (a[i]=i)
So, we could write the running time as T(n) = a + b(n+1) + cn + dn + en, where a, b, c, d, and e are constants that depend on the machine running the code and represent the time to perform assignments, comparisons, increments, array offset calculations, and indirect assignments. However, if this expression represents the running time of two lines of code, then clearly this kind of analysis won't translate very well with more complex code or algorithms. Using big-Oh notation, the running time can be simplified to T(n) = O(n).

So, despite being more compact, it gives us nearly as much information. As the running time of the above example depends on the constants a, b, c, d, and e, in general, it won't be possible to compare two running times to know which is faster without knowing the values of these constants. Even if we make the effort to determine these constants (through timing tests, for example), then our conclusion will only be valid for the machine running the tests.

Big-Oh notation allows us to reason at a much higher level, in turn making it possible to analyse more complicated functions. For example, if two algorithms have the same big-Oh running time, then we will not know which is faster and there might not be an obvious winner. One could be faster on one machine, and the other could be faster on another machine. Yet, if the two algorithms have demonstrably different big-Oh running times, then we can be certain that the one with the smaller running time will be faster for large enough values of n.

We can see a clear comparison of Big-Oh running times with the graph illustrated in Fig.1.

[Fig.1: comparison of big-Oh running times]

As an example of how big-Oh notation allows us to compare two different functions, we can compare the rate of growth of f1(n) = 15n against f2(n) = 2n*log(n). It may be that f1(n) is the running time of a complicated linear-time algorithm while f2(n) is the running time of a considerably simpler algorithm based on the divide-and-conquer paradigm. The graph above illustrates that, although f1(n) is greater than f2(n) for small values of n, the opposite is true for large values of n. Eventually f1(n) wins out, by an increasingly wide margin. Analysis using big-Oh notation told us this would happen, because O(n) ⊂ O(n*log(n)).
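The crossover point can be located directly: 2n·log(n) exceeds 15n exactly when log(n) > 7.5, i.e. when n > 2^7.5 ≈ 181. A small sketch that searches for it:

```java
public class Crossover {
    // Running time of the (hypothetical) complicated linear algorithm.
    public static double f1(double n) {
        return 15 * n;
    }

    // Running time of the simpler divide-and-conquer algorithm.
    public static double f2(double n) {
        return 2 * n * (Math.log(n) / Math.log(2));
    }

    public static void main(String[] args) {
        // Find the smallest integer n where f2 overtakes f1.
        int n = 2;
        while (f2(n) <= f1(n)) {
            n++;
        }
        System.out.println(n); // first n with 2*n*log2(n) > 15*n
    }
}
```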

In a few cases, we use asymptotic notation on functions with more than one variable. There is no real standard for this, but it is fairly common in computer science, so for our purposes the following definition is sufficient:

O(f(n_1,...,n_k)) = { g(n_1,...,n_k) : there exist c, z > 0 such that g(n_1,...,n_k) ≤ c·f(n_1,...,n_k) whenever f(n_1,...,n_k) ≥ z }.

This definition captures the situation we actually care about: when the arguments n_1,...,n_k make g take on large values. It also agrees with the univariate definition of O(f(n)) when f(n) is an increasing function of n. Please note that other texts might treat multivariate functions and asymptotic notation differently.

Randomisation & Probability:

Some of the data structures presented in this study are randomised; they make random choices that are independent of the data being stored in them or the operations being performed on them. Due to this, performing the same operations more than once using these structures could result in different running times. When analysing these data structures we are interested in their average or expected running times.

Customarily, the running time of an operation on a randomised data structure is a random variable, and we want to study its expected value. For a discrete random variable X taking on values in some countable universe U, the expected value of X, denoted by E[X], is given by the formula

E[X] = Σ_{x∈U} x·Pr{X = x}.

Here Pr{ℰ} denotes the probability that the event ℰ occurs. In all of the examples in this study, these probabilities are only with respect to the random choices made by the randomised data structure; there is no assumption that the data stored in the structure, nor the sequence of operations performed on the data structure is random.

One of the most important properties of expected values is linearity of expectation. For any two random variables X and Y, E[X+Y] = E[X] + E[Y]. More generally, for any random variables X_1,...,X_k,

E[X_1 + ... + X_k] = E[X_1] + ... + E[X_k].

Linearity of expectation allows us to break down complicated random variables (like the left hand sides of the above equations) into sums of simpler random variables (the right hand sides).

A lovely trick that we repeatedly use is defining indicator random variables. These binary variables are useful when we want to count something. For example, suppose we toss a fair coin k times and we want to know the expected number of times the coin turns up heads. Intuition tells us the answer is k/2, but if we try to prove it using the definition of expected value, we get

E[X] = Σ_{i=0}^{k} i·Pr{X = i} = Σ_{i=0}^{k} i·(k choose i)/2^k = k·Σ_{i=0}^{k−1} (k−1 choose i)/2^k = k/2.

This requires that we know enough to calculate Pr{X = i} = (k choose i)/2^k, and that we know the binomial identities i·(k choose i) = k·(k−1 choose i−1) and Σ_{i=0}^{k} (k choose i) = 2^k.

Using indicator variables and linearity of expectation simplifies things. For each i∈{1,...,k}, define the indicator random variable

I_i = 1 if the ith coin toss is heads, and I_i = 0 otherwise.

Then

E[I_i] = (1/2)·1 + (1/2)·0 = 1/2.

Now, X = Σ_{i=1}^{k} I_i, so

E[X] = E[Σ_{i=1}^{k} I_i] = Σ_{i=1}^{k} E[I_i] = k/2.

This is a bit of a roundabout way, but doesn't require that we know any magical identities or compute any non-trivial probabilities. Better than that, it agrees with the intuition that we expect half of the coin flips to turn up heads because each individual coin turns up heads with a probability of 1/2.
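The indicator-variable argument can be mirrored in code (a sketch; the class and method names are ours): each toss contributes its indicator to a running total, and the average over many independent trials should approach k/2.

```java
import java.util.Random;

public class CoinFlips {
    // Estimate E[X], where X counts heads in k fair coin tosses,
    // by averaging over many independent trials.
    public static double estimateExpectedHeads(int k, int trials, long seed) {
        Random rng = new Random(seed);
        long totalHeads = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < k; i++) {
                if (rng.nextBoolean()) {
                    totalHeads++; // the indicator I_i is 1 on heads
                }
            }
        }
        return (double) totalHeads / trials;
    }

    public static void main(String[] args) {
        int k = 10;
        double estimate = estimateExpectedHeads(k, 100_000, 42L);
        System.out.println(estimate); // should be close to k/2 = 5.0
    }
}
```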