Why we need efficiency
If you have a bit of programming experience, you should be able to see that the operations supported by the most commonly used data structures are not difficult to implement correctly. We could store the data in an array or list and implement each operation by iterating over every element of that array or list (brute force).
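As a minimal sketch of what brute force looks like, a lookup over a plain Java array simply walks every element until it finds a match (the class and method names here are illustrative, not part of this repository):

```java
// Brute-force lookup: every query may inspect all n items.
public class BruteForceBag {
    private final int[] items;

    public BruteForceBag(int[] items) {
        this.items = items;
    }

    // Scans the whole array in the worst case.
    public boolean contains(int target) {
        for (int item : items) {
            if (item == target) {
                return true;
            }
        }
        return false;
    }
}
```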
This is an easy implementation, but it is not very efficient. However, considering how fast computers have become, perhaps the obvious solution is good enough. So why do we need efficiency?
Imagine an application with a moderately sized data set of one million items (10⁶). In most applications it is reasonable to assume that each item will be looked up at least once, which in the brute-force example means at least one million (10⁶) searches. If each of these 10⁶ searches needs to inspect all 10⁶ items, we end up with 10⁶ × 10⁶ = 10¹² (one thousand billion) inspections.
Computer speeds are at most a few gigahertz (billions of cycles per second), and each operation typically takes a few cycles, so a single-core 1 GHz desktop computer cannot do more than one billion (10⁹) operations per second. Therefore, our application will take at least 10¹²/10⁹ = 1000 seconds, or around 16 minutes and 40 seconds. That might not sound like a very long time, but would you wait sixteen minutes for a web page to load?
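The back-of-envelope arithmetic can be restated in a few lines of Java; the constants below simply encode the assumptions above (10¹² inspections, 10⁹ operations per second), and the class name is made up for illustration:

```java
public class BruteForceEstimate {
    public static void main(String[] args) {
        long inspections = 1_000_000L * 1_000_000L; // 10^6 searches x 10^6 items = 10^12 inspections
        long opsPerSecond = 1_000_000_000L;         // optimistic: 10^9 operations per second
        long seconds = inspections / opsPerSecond;  // 1000 seconds, about 16 minutes 40 seconds
        System.out.println(seconds + " seconds");
    }
}
```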
The problem gets worse when we consider a company like Google, which indexes 30 to 50 billion web pages. By the same reasoning, inspecting every indexed page at 10⁹ inspections per second means any kind of query would take 30 to 50 seconds. Anyone who has used Google knows this is not the case; web searches complete in far less than 30-50 seconds, and they answer much more complicated queries than simply asking whether a particular page is in the list of indexed pages. At the time of writing, Google processes over 99,000 searches every single second, so a brute-force approach would require at least 99,000 × 30 = 2,970,000 very fast servers just to keep up with demand.
These (quite basic) examples show that the most obvious implementations of data structures do not scale well, especially when the number of items, n, in the data structure and the number of operations, m, performed on the data structure are both large. In these cases, the total time is roughly n × m.
The solution is to organise the data within the data structure so carefully that not every operation requires every item to be inspected. As impossible as it sounds, we will study data structures where a search requires looking at only two items on average, independent of the number of items stored in the data structure. On our one-billion-operations-per-second computer, such a search takes only 0.000000002 seconds in a data structure containing a billion items (or a quintillion, for that matter).
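As a preview of what that feels like in practice, Java's standard `java.util.HashSet` already offers expected constant-time lookups regardless of how many items it holds. This sketch uses the standard library rather than this repository's own implementations:

```java
import java.util.HashSet;
import java.util.Set;

public class ConstantTimeLookupDemo {
    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            set.add(i);
        }
        // Each contains() call examines only a handful of items (one hash bucket),
        // not all one million, so the cost does not grow with the size of the set.
        System.out.println(set.contains(123_456)); // true
        System.out.println(set.contains(-1));      // false
    }
}
```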
We will also study implementations of data structures that keep items in sorted order, where the number of items inspected during an operation grows very slowly as a function of the number of items in the data structure. For example, we can maintain a sorted set of one billion items while inspecting at most 60 items during any operation. On the same one-billion-operations-per-second computer, that is only 0.00000006 seconds per operation.
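Java's `java.util.TreeSet` (a red-black tree) is one such sorted structure: each operation follows a single root-to-leaf path, so even very large sets need only tens of comparisons per call. Again, this is a sketch using the standard library, not this repository's code:

```java
import java.util.TreeSet;

public class SortedSetDemo {
    public static void main(String[] args) {
        TreeSet<Integer> sorted = new TreeSet<>();
        for (int i = 0; i < 1_000_000; i++) {
            sorted.add(i * 2); // store only even numbers, kept in sorted order
        }
        // Each call below walks one path down a balanced tree:
        // roughly log2(n) comparisons, about 20 for a million items.
        System.out.println(sorted.contains(500_000)); // true
        System.out.println(sorted.ceiling(999));      // 1000: smallest element >= 999
        System.out.println(sorted.first());           // 0
        System.out.println(sorted.last());            // 1999998
    }
}
```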