BigODetail - Eishaaya/GMRDataStructuresTest GitHub Wiki

It's math time!
Mathematically, O(g(n)) is a valid upper bound on f(n) when c * g(n) >= f(n) for all n > N0, for some choice of positive constants c and N0; for example (a small numeric check of these examples is sketched just after the list):

  • f(n) = n + 5000, g(n) = n:
    • If c > 1, then c * g(n) will eventually meet and surpass f(n). For example, if c == 2 and N0 == 5000, then at n == 5000 we get 5000 + 5000 == 2 * 5000, and c * g(n) stays ahead from there on
  • f(n) = 1000 * n, g(n) = n:
    • If c >= 1000, then c * g(n) will always be >= f(n), no matter the value of N0
  • f(n) = n² + 10n, g(n) = n:
    • No matter what N0 and c happen to be, f(n) will eventually exceed c * g(n) and stay ahead, so this is an invalid upper bound.
    • Let's try again, with g(n) = n²:
      • If c > 1, this works out. For example, with c == 2 and N0 == 10, the two functions meet at n == 10, and c * g(n) exceeds f(n) from that point onward. This lets us abstract out the 10n term, as if it were a constant!
  • f(n) = (0.3n)³, g(n) = n²:
    • For convenience's sake, let's say c == 1, although it really doesn't matter either way. In this case c * g(n) is ahead of f(n) for a while; however, after a certain point (around n ≈ 37, where 0.027n³ overtakes n²), f(n) exceeds it and stays ahead. In this case the exponent of 3 cannot be abstracted out, despite 3 itself being constant.
    • Again, let's give it another shot, with g(n) = n³:
      • Starting off, let's try to abstract out that 0.3:
        • Using the exponent rules, we can express (0.3n)³ as 0.3³ * n³, or just 0.027 * n³. That means we can abstract out the 0.027, so our function scales with n³!
      • From here, it should be fairly simple: whenever c >= 0.027, c * g(n) >= f(n) regardless of what N0 may be
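
To spot-check the examples above, here's a minimal Python sketch (purely illustrative; the helper name `holds_past` is made up for this page) that tests whether a chosen c and N0 witness c * g(n) >= f(n) over a sample of n values:

```python
# Spot-check an upper-bound witness: does c * g(n) >= f(n) hold for every
# sampled n > N0? (A numeric check over a window, not a proof.)
def holds_past(f, g, c, n0, samples=10_000):
    return all(c * g(n) >= f(n) for n in range(n0 + 1, n0 + 1 + samples))

# f(n) = n + 5000 vs g(n) = n, with c = 2, N0 = 5000
print(holds_past(lambda n: n + 5000, lambda n: n, c=2, n0=5000))          # True

# f(n) = n² + 10n vs g(n) = n², with c = 2, N0 = 10
print(holds_past(lambda n: n**2 + 10 * n, lambda n: n**2, c=2, n0=10))    # True

# f(n) = (0.3n)³ vs g(n) = n²: no c and N0 can work; for instance c = 10
# fails as soon as n passes ~370
print(holds_past(lambda n: (0.3 * n)**3, lambda n: n**2, c=10, n0=1))     # False
```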

For all of the problems above, we found a tight upper bound, which is naturally the most useful. However, if you remember, all c * g(n) had to do was be greater than or equal to f(n), meaning for any of these cases I could have just said O(∞) while still being correct! Beyond all that, the stuff we abstracted out can be pretty important: in the second case, with f(n) = 1000 * n, we could be missing a massive constant factor, and in the first case, f(n) = n + 5000, we could be performing terribly with smaller quantities of data.
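
As a quick, throwaway illustration of why that abstracted-out stuff matters (not tied to any particular algorithm), both cost functions below are O(n), yet at small n the hidden + 5000 dominates:

```python
# Both cost functions are O(n), but the abstracted-out +5000 dominates at small n.
def plain(n):
    return n

def with_setup(n):
    return n + 5000   # e.g. a fixed setup cost that big O hides

for n in (10, 1_000, 100_000):
    print(n, plain(n), with_setup(n))
# At n = 10 the second is over 500x the first, despite the "same" big O;
# by n = 100_000 the gap has become irrelevant.
```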

Furthermore, in the previous examples we were just bounding fairly simple and straightforward functions; algorithms, however, aren't perfectly expressible as a single mathematical function. Take bubble sort: in its best case, being handed fully sorted data, it runs through the data once, making that case O(n). In its worst case, however, it performs at O(n²). Note that n itself has no bearing on which case you land in; the best and worst cases are simply two different functions of n. Oftentimes, big O is used to refer to the upper bound of the worst case, but that's not a hard rule. Take quicksort: it's generally referred to as O(n log(n)), even though its worst case is O(n²). The key difference is in the average case: quicksort generally scales in a manner that can be bounded by O(n log(n)), whereas bubble sort's average case cannot be bounded by O(n). This sort of thing isn't a fault with big O, but rather with human communication skills.
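
Here's a minimal bubble sort sketch (an illustrative implementation, not code from this repo) showing where the O(n) best case comes from: the early-exit flag means already-sorted input needs only a single pass, while reversed input forces roughly n passes of roughly n comparisons:

```python
def bubble_sort(items):
    """Sort items in place; returns the number of passes, just for illustration."""
    n = len(items)
    passes = 0
    for i in range(n - 1):
        passes += 1
        swapped = False
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:       # no swaps: the data is already sorted
            break             # best case hits this on the first pass -> O(n)
    return passes             # worst case: ~n passes of ~n comparisons -> O(n²)

print(bubble_sort(list(range(1000))))         # 1   (already sorted)
print(bubble_sort(list(range(1000, 0, -1))))  # 999 (fully reversed)
```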

So, other than possibly important terms and constants being abstracted out of these formulas, what else is there to consider? As a notation for measuring algorithms... big O completely disregards implementation. Individual operations across languages and hardware may not have equivalent speeds, which makes big O no substitute for an actual benchmark.
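
For instance, here's a throwaway timeit comparison (numbers will vary by machine and Python version): both snippets sum n integers, so both are O(n), but only an actual measurement tells you which implementation is faster:

```python
import timeit

n = 100_000

# Two O(n) ways to sum the first n integers; big O can't tell them apart.
loop_time = timeit.timeit(
    "total = 0\nfor i in range(n):\n    total += i",
    globals={"n": n}, number=100)
builtin_time = timeit.timeit("sum(range(n))", globals={"n": n}, number=100)

print(f"explicit loop: {loop_time:.3f}s, built-in sum: {builtin_time:.3f}s")
```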

One more thing to note: there are actually five notations:

  • Big Oh: O
    • The upper bound for a given case of an algorithm, basically: >=. It's often used as a blanket term, but it generally refers to the tightest upper bound of either the average or worst case.
  • Big Omega: Ω
    • The lower bound for a given case of an algorithm, basically: <=. This is generally less useful than big O, since knowing how badly something can scale is usually more useful than knowing how well it can.
  • Big Theta: Θ
    • The tight bound for a given case of an algorithm, basically: ==. Oftentimes, what people call big O is really big theta. A big theta bound only exists when O and Ω can be expressed as the same function (all three bounds are written out formally just below).
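
Written out formally (one standard textbook phrasing, using the same c and N0 convention as above), the three bounds are:

$$
\begin{aligned}
f(n) \in O(g(n)) &\iff \exists\, c > 0,\ \exists\, N_0 : f(n) \le c \cdot g(n) \text{ for all } n > N_0 \\
f(n) \in \Omega(g(n)) &\iff \exists\, c > 0,\ \exists\, N_0 : f(n) \ge c \cdot g(n) \text{ for all } n > N_0 \\
f(n) \in \Theta(g(n)) &\iff f(n) \in O(g(n)) \text{ and } f(n) \in \Omega(g(n))
\end{aligned}
$$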

Quick thing about explicitly loose bounds: the rule is a bit different for these. Instead of needing just some c (with a matching N0) to fulfill the requirement for g(n), every possible c must have an N0 past which the requirement holds; the two loose bounds are written out formally after the bullets below.

  • Little Oh: o
    • The loose upper bound for a given case of an algorithm, basically >. There is a large overlap between this and O, but it's useful when a tight upper bound is unknown, or just for simplicity's sake.
    • For example, if f(n) = n, then g(n) cannot be n, since when c <= 1 there is no point where c * n > n, much less an N0 beyond which it always holds. g(n) could be something like n * log₂(n), n², etc.
  • Little Omega: ω
    • The loose lower bound for a given case of an algorithm, basically <. This has the same relationship to Ω as o does to O.
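
Written the same way, the two loose bounds just flip the quantifier on c from "some c works" to "every c works":

$$
\begin{aligned}
f(n) \in o(g(n)) &\iff \forall\, c > 0,\ \exists\, N_0 : f(n) < c \cdot g(n) \text{ for all } n > N_0 \\
f(n) \in \omega(g(n)) &\iff \forall\, c > 0,\ \exists\, N_0 : f(n) > c \cdot g(n) \text{ for all } n > N_0
\end{aligned}
$$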


<--Back
