Algorithms - djdupaix/software_engineer GitHub Wiki
The Art of Al-Khwārizmī and Algorithmic Thinking
The term "algorithm" describes a self-contained, step-by-step set of operations to be performed, and derives from the Latinized name of al-Khwārizmī, a 9th-century Persian mathematician and scholar in the House of Wisdom in Baghdad. In the 12th century, Latin translations of his work on the Indian numerals introduced the decimal positional number system to the Western world. Al-Khwārizmī's "The Compendious Book on Calculation by Completion and Balancing" presented the first systematic solution of linear and quadratic equations in Arabic.
Algorithmic thinking is a way of getting to a solution through the clear definition of the steps needed. Rather than producing a single answer to a single problem, algorithms are instructions or rules that, if followed precisely (whether by a person or a computer), lead to answers to both the original problem and similar ones. The power of algorithmic thinking is that it allows solutions to be automated.
Big O notation is used to describe the performance or complexity of an algorithm: it gives an upper bound on how the execution time required or the space used (e.g. in memory or on disk) grows with the input size. In practice, Big O is most often quoted for the worst-case scenario.
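For example, searching a sorted list illustrates the gap between O(n) and O(log n): a linear scan may touch every element in the worst case, while binary search halves the remaining range at every step. A minimal sketch (the function names here are just illustrative):

```python
# Linear search: O(n) worst case -- may examine every element.
def linear_search(a, target):
    for i, x in enumerate(a):
        if x == target:
            return i
    return -1

# Binary search on a sorted list: O(log n) worst case --
# each comparison discards half of the remaining range.
def binary_search(a, target):
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

On a list of a million elements, the linear scan may make a million comparisons where binary search makes about twenty, which is exactly the difference Big O captures.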
Peak Finding
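A classic formulation of this topic (as taught in MIT's introductory algorithms course) is one-dimensional peak finding: given an array, find an index whose element is at least as large as its neighbors. A divide-and-conquer sketch, assuming a non-empty list:

```python
def find_peak(a):
    """Return an index i such that a[i] >= its neighbors.

    Looks at the middle element and moves toward a larger
    neighbor; a peak must exist on that side, so the running
    time is O(log n) rather than the O(n) of a full scan.
    """
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < a[mid + 1]:
            lo = mid + 1   # a peak exists to the right
        else:
            hi = mid       # a[mid] >= a[mid+1]; peak at mid or left
    return lo

print(find_peak([10, 20, 15, 2, 23, 90, 67]))  # 5 (a[5] = 90 is a peak)
```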
Models of Computation
Python Cost Model
Python is a high-level programming language, with many powerful primitives. Analyzing the running time of a Python program requires an understanding of the cost of the various Python primitives.
For example, in Python, you can write:

```python
L = L1 + L2
```

where `L`, `L1`, and `L2` are lists; the given statement computes `L` as the concatenation of the two input lists `L1` and `L2`. The running time of this statement depends on the lengths of `L1` and `L2` (it is more or less proportional to the sum of those two lengths).
The goal is to review various Python primitive operations and to determine bounds and/or estimates on their running times. This will involve both a review of the relevant Python implementation code and some experimentation (analysis of actual running times and interpolating a nice curve through the resulting data points).
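An experiment of this kind can be sketched with the standard `timeit` module; the sizes below are arbitrary choices, and the measured cost of `L1 + L2` should grow roughly in proportion to n:

```python
import timeit

def measure_concat(sizes, repeats=100):
    """Time L1 + L2 for two lists of each length n and return
    (n, seconds-per-concatenation) pairs."""
    results = []
    for n in sizes:
        L1 = list(range(n))
        L2 = list(range(n))
        t = timeit.timeit(lambda: L1 + L2, number=repeats) / repeats
        results.append((n, t))
    return results

for n, t in measure_concat([1000, 10000, 100000]):
    print(f"n = {n:6d}: {t * 1e6:9.2f} microseconds per concatenation")
```

Fitting a curve through points like these is how the coefficients in the tables below were estimated.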
Python Running Time Experiments and Discussion
The running times for various-sized inputs were measured, and then a least-squares fit was used to find the coefficient for the high-order term in the running time. (Which term is high-order was determined by some experimentation; it could have been automated...)
The least-squares fit was designed to minimize the sum of squares of relative error, using scipy.optimize.leastsq.
(Earlier versions of this program also found coefficients for lower-order terms, but the interpolation routines tended to be poor at distinguishing n from n lg n. Working with relative error was also judged more interesting than working with absolute error. In the end, fitting only the high-order term and studying only the relative error seemed simplest.)
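For the single-coefficient model time ≈ c · f(n), the relative-error least-squares fit even has a closed form, so the `scipy.optimize.leastsq` call used in the original experiments can be sketched in pure Python (this is a simplification, not the original code):

```python
def fit_coefficient(ns, times, f):
    """Fit c in the model time ≈ c * f(n) by minimizing
    sum_i ((c * f(n_i) - t_i) / t_i) ** 2.
    Setting the derivative with respect to c to zero gives
    c = sum(r_i) / sum(r_i**2) where r_i = f(n_i) / t_i."""
    r = [f(n) / t for n, t in zip(ns, times)]
    return sum(r) / sum(x * x for x in r)

# Synthetic check: data generated exactly as 3 * n**2 recovers c ≈ 3.
ns = [1000, 2000, 4000, 8000]
times = [3.0 * n ** 2 for n in ns]
print(fit_coefficient(ns, times, lambda n: n ** 2))  # ≈ 3.0
```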
The following tables analyze Python operation costs; the data were gathered using timing.py by Ronald Rivest:
Cost of Python Integer Operations
x, y, and z are n-bit numbers, w is a 2n-bit number, and s is an n-digit string
| Description | Operation | Time | Range of n | Error |
|---|---|---|---|---|
| Convert string to integer | int(s) | 84 * (n/1000)^2 microseconds | n <= 8000 | 6% rms error |
| Convert integer to string | str(x) | 75 * (n/1000)^2 microseconds | n <= 8000 | 3% rms error |
| Convert integer to hex | "%x"%x | 2.7 * (n/1000) microseconds | n <= 64000 | 19% rms error |
| Addition (or Subtraction) | x+y | 0.75 * (n/1000) microseconds | n <= 512000 | 8% rms error |
| Multiplication | x*y | 13.73 * (n/1000)^1.585 microseconds | n <= 64000 | 10% rms error |
| Division (or Remainder) | w/x | 47 * (n/1000)^2 microseconds | n <= 32000 | 6% rms error |
| Modular Exponentiation | pow(x,y,z) | 60000 * (n/1000)^3 microseconds | n <= 4000 | 8% rms error |
| n-th power of two | 2**n | 0.06 microseconds | n <= 512000 | 10% rms error |
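The 1.585 exponent in the multiplication row is not arbitrary: CPython multiplies large integers with Karatsuba's algorithm, which replaces one n-bit multiplication with three n/2-bit ones, so the cost grows as n^(log₂ 3):

```python
import math

# Karatsuba: T(n) = 3 * T(n/2) + O(n)  =>  T(n) = Theta(n ** log2(3)).
exponent = math.log2(3)
print(round(exponent, 3))  # 1.585
```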
Cost of Python String Operations
s and t are length-n strings, and u is a string of length n/2
| Description | Operation | Time | Range of n | Error |
|---|---|---|---|---|
| Extract a byte from a string | s[i] | 0.13 microseconds | n <= 512000 | 29% rms error |
| Concatenate | s+t | 1 * (n/1000) microseconds | n <= 256000 | 19% rms error |
| Extract string of length n/2 | s[0:n/2] | 0.3 * (n/1000) microseconds | n <= 256000 | 28% rms error |
| Translate a string | s.translate(s,T) | 3.2 * (n/1000) microseconds | n <= 256000 | 11% rms error |
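The `s.translate(s,T)` spelling in the last row dates from Python 2's `string` module; in Python 3 the method takes a translation table built once with `str.maketrans` and then applies it in a single C-level pass over the string, which is what makes it linear with a small constant:

```python
# Python 3 spelling of the translate operation from the table.
table = str.maketrans("abc", "xyz")   # build the table once
print("aabbcc".translate(table))      # xxyyzz
```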
Cost of Python List Operations
L and M are length-n lists, P has length n/2
| Description | Operation | Time | Range of n | Error |
|---|---|---|---|---|
| Create an empty list | list() | 0.40 microseconds | (n=1) | 0.5% rms error |
| Access | L[i] | 0.10 microseconds | n <= 640000 | 3% rms error |
| Append | L.append(0) | 0.24 microseconds | n <= 640000 | 3% rms error |
| Pop | L.pop() | 0.25 microseconds | n <= 64000 | 0.5% rms error |
| Concatenate | L+M | 22 * (n/1000) microseconds | n <= 64000 | 3% rms error |
| Slice extraction | L[0:n/2] | 5.4 * (n/1000) microseconds | n <= 64000 | 4% rms error |
| Copy | L[:] | 11.5 * (n/1000) microseconds | n <= 64000 | 10% rms error |
| Slice assignment | L[0:n/2] = P | 11 * (n/1000) microseconds | n <= 64000 | 4% rms error |
| Delete first | del L[0] | 1.7 * (n/1000) microseconds | n <= 64000 | 4% rms error |
| Reverse | L.reverse() | 1.3 * (n/1000) microseconds | n <= 64000 | 10% rms error |
| Sort | L.sort() | 0.0039 * n lg(n) microseconds | n <= 64000 | 12% rms error |
When an append outgrows a list's allocated space, there is additional cost: the list is copied into a new array with extra room, about 1/8 of the list size, added at the end. Whenever that extra space is used up, the list is re-allocated again into an array about 1.125 times the length of the previous version. Most appends therefore run in constant time, with the occasional append paying the copying cost.
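This over-allocation can be observed directly with `sys.getsizeof`: the reported size only jumps at the occasional append that triggers a reallocation, while most appends are absorbed by previously reserved space (exact sizes vary by CPython version and platform):

```python
import sys

L = []
reallocations = 0
size = sys.getsizeof(L)
for _ in range(1000):
    L.append(None)
    new_size = sys.getsizeof(L)
    if new_size != size:       # the underlying array was re-allocated
        reallocations += 1
        size = new_size
print(reallocations)           # far fewer than 1000
```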
Cost of Python Dictionary Operations
D is a dictionary with n items
| Description | Operation | Time | Range of n | Error |
|---|---|---|---|---|
| Create an empty dictionary | dict() | 0.36 microseconds | (n=1) | 0% rms error |
| Access | D[i] | 0.12 microseconds | n <= 64000 | 3% rms error |
| Copy | D.copy() | 57 * (n/1000) microseconds | n <= 64000 | 27% rms error |
| List items | D.items() | 0.0096 * n lg(n) microseconds | n <= 64000 | 14% rms error |
What should the right high-order term be for copy and list items? Both seem like they should be linear, yet the data for each looks somewhat super-linear. Copy is modeled here as linear and list items as n lg(n), but these formulas need further work and exploration.
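Note that the `D.items()` row reflects Python 2, where `items()` built a full list of key–value pairs. In Python 3, `items()` instead returns a lightweight view in constant time; only iterating the view costs O(n):

```python
d = {i: i * i for i in range(5)}
view = d.items()    # O(1): no list of pairs is materialized
d[5] = 25           # the view reflects later updates to the dict
print(len(view))    # 6
```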
Document distance

---

Algorithms Subtopics:
- Sorting and Trees
- Hashing
- Numerics
- Graphs
- Shortest Paths
- Dynamic Programming

---
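The document distance topic above is a classic case study from MIT's introductory algorithms course: treat each document as a vector of word frequencies and measure the angle between the vectors. A minimal sketch (whitespace tokenization and lowercasing are simplifying assumptions):

```python
import math
from collections import Counter

def document_distance(doc1, doc2):
    """Angle in radians between the word-frequency vectors of two
    texts: 0 for identical word counts, pi/2 for no shared words."""
    u = Counter(doc1.lower().split())
    v = Counter(doc2.lower().split())
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return math.acos(min(1.0, dot / norm))

print(document_distance("the quick brown fox", "the quick brown fox"))  # 0.0
```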
Algorithm References
MIT - Introduction to Algorithms