Radix Sort - Eishaaya/GMRDataStructuresTest GitHub Wiki

Digit Based Sorting

Radix sort is another O(n) algorithm that is conceptually very similar to counting sort; in fact, it even runs counting sort as a sub-procedure. Now, if you remember, counting sort tends to eat an extremely large amount of memory, which is obviously far from ideal. Radix sort solves this by running a counting sort for every digit, which means we only ever have to allocate 10 buckets* no matter how wide our dataset ranges. The concept is fairly simple: run a counting sort on every digit in increasing significance. Since counting sort is stable, the order established by less significant digits is preserved, while still being superseded by more impactful ones.

  • Technically it's completely possible to sort with a radix of any size (the radix being how many buckets we have). In this example, the radix is 10, while a counting sort uses a radix of k, where k is the range of the input data.

    (Image: an IBM card sorter in 1950, with a radix of 13)

Implementation Guide

Simple Radix Sort

Parameter: uint[] or List<uint>

Returns: void

For every digit, beginning with the least significant, count how many times each digit value appears in the dataset. This tells us how many slots in our output array need to be allocated for each value. After this, offset each bucket by the previous bucket's running total (in ascending order), so that each bucket stores the final index its value will occupy. We then loop backwards through our input set (since we want to preserve order) and, based on the bucket corresponding to each input's current digit, place the input into the output array, decrementing the bucket so we avoid overlap. Afterwards, just copy the output array back to the input array and proceed to the next most significant digit.
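The steps above can be sketched as follows. This is a minimal illustration in Python rather than the wiki's C#; the function name and sample values are made up for demonstration.

```python
def radix_sort(data):
    """Stably sort a list of non-negative integers in place, base 10."""
    if not data:
        return
    output = [0] * len(data)
    place = 1
    largest = max(data)
    while place <= largest:
        # Count how many inputs have each value in the current digit.
        buckets = [0] * 10
        for value in data:
            buckets[(value // place) % 10] += 1
        # Prefix-sum the buckets so each stores the final (last) index
        # its digit value will occupy in the output array.
        for i in range(1, 10):
            buckets[i] += buckets[i - 1]
        # Walk backwards to keep the sort stable, decrementing each
        # bucket before use so equal digits don't overlap.
        for value in reversed(data):
            digit = (value // place) % 10
            buckets[digit] -= 1
            output[buckets[digit]] = value
        data[:] = output  # copy back, then move to the next digit
        place *= 10

nums = [170, 45, 75, 90, 802, 24, 2, 66]
radix_sort(nums)
print(nums)  # [2, 24, 45, 66, 75, 90, 170, 802]
```

Note that the copy-back step (`data[:] = output`) is what lets each pass feed the next; the ten buckets are reallocated per pass, which is the whole memory win over counting sort.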


Visualization


In this example, we start out with 6 values ranging from 3 to 884. In a counting sort, we would need a radix of 885 (in an unoptimized counting sort) or 882 (in an optimized one). Since this is a radix sort with a constant radix of ten, however, our radix is just, well...10.


So, we begin by counting the values of the least significant digit


Then offsetting by the previous count to avoid different buckets pointing at the same indices.


Now we just have to iterate back through our inputs and place them into the output array, using the positions stored in our buckets (after decrementing, of course) as indices!
We need to iterate backwards because the buckets store the last index a given value will occupy. If we don't, we won't maintain sort stability, and therefore will lose the ordering of less significant digits in later passes.


Continuing, we just need to use our previous output as input and sort by the next digit


And finishing with one more iteration and a fully sorted array
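The three passes pictured above can be traced with a single-digit counting-sort pass. The six sample values here are hypothetical stand-ins for the ones in the figures, chosen from the same 3-884 range.

```python
def counting_pass(data, place):
    """Return data stably sorted by the digit at 10**place."""
    buckets = [0] * 10
    for value in data:                      # count each digit value
        buckets[(value // 10 ** place) % 10] += 1
    for i in range(1, 10):                  # offset by the previous count
        buckets[i] += buckets[i - 1]
    output = [0] * len(data)
    for value in reversed(data):            # backwards, for stability
        digit = (value // 10 ** place) % 10
        buckets[digit] -= 1
        output[buckets[digit]] = value
    return output

data = [884, 3, 412, 27, 530, 96]
for place in range(3):                      # ones, tens, hundreds
    data = counting_pass(data, place)
    print(data)
```

Each printed line corresponds to one figure above: after the ones pass the array is ordered by last digit only, and only the final (hundreds) pass produces the fully sorted array.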


Improvements

  • Negatives, and Range Compression

    Using the exact same method as in counting sort, we can shrink our range and adjust for negatives in radix sort. While negatives are useful for obvious reasons, you might be wondering why decreasing the range matters when it doesn't affect how much storage our sort uses. Well, for every order of magnitude (relative to our radix) that we shave off, we drop a full pass. For example: if we have numbers ranging from 5300 to 5700 and we ran a sort on all four digits, that's an extra pass, roughly n extra operations, compared to treating our dataset as if it ranged from 0 to 400!
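A sketch of that idea: shift every value down by the minimum before sorting, so negatives become valid non-negative keys and shared high digits disappear, then shift back afterwards. The function name and sample data are illustrative, and this reuses the same base-10 LSD passes described earlier.

```python
def radix_sort_shifted(data):
    """Sort integers (including negatives) by offsetting to a 0-based range."""
    if not data:
        return
    low = min(data)
    shifted = [v - low for v in data]     # e.g. 5300..5700 becomes 0..400
    output = [0] * len(shifted)
    place = 1
    largest = max(shifted)                # 400 needs 3 passes, not 4
    while place <= largest:
        buckets = [0] * 10
        for v in shifted:
            buckets[(v // place) % 10] += 1
        for i in range(1, 10):
            buckets[i] += buckets[i - 1]
        for v in reversed(shifted):
            d = (v // place) % 10
            buckets[d] -= 1
            output[buckets[d]] = v
        shifted[:] = output
        place *= 10
    data[:] = [v + low for v in shifted]  # undo the shift

vals = [-5, 42, 0, -73, 19]
radix_sort_shifted(vals)
print(vals)  # [-73, -5, 0, 19, 42]
```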

Sorting more than integers (again)

Keyed Radix Sort

Parameter: T[] or List<T> in which T has some sort of accessible integer key

Returns: void

Just like counting sort, radix sort can be written to work with any type of value expressible as an integer (so, technically, anything). It's the same process, but we use the integers as keys instead of values. Our buckets can also be converted into pigeonholes (lists of values) instead of just storing quantities.
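A minimal sketch of the keyed, pigeonhole variant in Python; the record shape and key function are made up for illustration. Because each pass appends whole records to its pigeonhole in a forward scan, stability comes for free and no backwards loop or prefix sum is needed.

```python
def radix_sort_by_key(items, key):
    """Stably sort items by a non-negative integer key, base 10."""
    if not items:
        return items
    place = 1
    largest = max(key(item) for item in items)
    while place <= largest:
        holes = [[] for _ in range(10)]   # ten pigeonholes, one per digit
        for item in items:                # forward scan preserves order
            holes[(key(item) // place) % 10].append(item)
        items = [item for hole in holes for item in hole]
        place *= 10
    return items

people = [("Ada", 31), ("Bob", 7), ("Cy", 120), ("Dee", 31)]
by_age = radix_sort_by_key(people, key=lambda p: p[1])
print(by_age)  # Bob, Ada, Dee, Cy (Ada stays ahead of Dee: stable)
```

The trade-off versus the counting version is memory: pigeonholes hold the records themselves rather than just counts, so each pass briefly holds a second copy of the dataset.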


Assignments

ordered in ascending difficulty


  • Create a base 10 radix sort
  • Create a variable base radix sort
  • Create a postman's sort (a base 26 radix sort for strings, that uses every character as a digit)
  • Create a base 2 bitwise radix sort
  • Create an MSD (most significant digit) radix sort
  • Create a Burstsort


Closing Notes


n = number of inputs, k = range of inputs, r = radix

Radix sort is by far an improvement over counting sort, but even so, both of them are categorized as O(n) algorithms. As stated last time, if we're more precise, counting sort runs in O(n + k), and if we analyze radix sort, we can see it runs in O(n·log_r(k) + r·log_r(k)), i.e. O((n + r)·log_r(k)). That might seem worse at first glance, but it's actually much better the vast majority of the time.
Notice that when r = k, log_r(k) = 1, so the radix sort expression collapses to exactly counting sort's cost of O(n + k).

For example, using these formulas on a dataset of 100 items with a range of 10000, our counting sort would take ~10100 operations, while that same dataset would take ~440 operations in a radix sort with a radix of 10!

Let's compare radix to a more...let's say usable sort than counting sort, say quicksort: quicksort has a time complexity of O(nlog(n)), which is worse, on paper, than O(n)! However, with our more nuanced formula for radix's operational cost, the better algorithm depends vastly on the use case. Continuing with our earlier example, quicksort would take ~664 operations if we follow its time complexity alone. That's worse than ~440, obviously, but in a case with a larger range, or a case where the cost of converting our data into integer keys is relatively heavy, quicksort (and other comparison sorts, for that matter) wins out.

Why Big O can't always be trusted

Now, these numbers should always be taken with more than a grain of salt, because big O is meant to show how an algorithm's performance scales, that is, how it grows with input size. It does this as a mathematical upper bound, meaning our big O function is essentially >= the function describing our algorithm's actual performance for a given case. Additionally, as a bounding function, loads of pretty important information is abstracted away by definition, since it just isn't impactful enough as n approaches infinity:
For example, since r and k do not necessarily change as n approaches infinity, their values are simply abstracted away, leaving just n behind.

If this doesn't fully make sense, or if you find it interesting, click here for a far more detailed explanation!

