Algorithm Sedgewick - jwyx/ForFun GitHub Wiki
Notation
+: advantage
-: disadvantage / list symbol
Fundamentals
Algorithm
a finite, deterministic, and effective problem-solving method suitable for implementation as a computer program
Online algorithm
即考虑输入,适用于输入对象成比例的空间
[Q] the greatest common divisor => Euclid Algorithm, 辗转相除法
Array
An array is a fixed number of data items that are stored contiguously and that are accessible by an index
+ sequential ordering (by the position in the array)
+ random access
- the size of the array is fixed and must be known before it is used
array <-> vector
two-dimensional array <-> matrix
[Q] prints all the prime numbers less than 1000 => the "sieve of Eratosthenes"
Linked List
+ can grow and shrink in size during its lifetime; its maximum size need not be known in advance.
+ provide flexibility in allowing the items to be rearranged efficiently
+ sequential ordering (a "node" that also contains a "link" to the next node)
- can not quick access to any arbitrary item in the list
operations:
insert
delete
find the Kth item: travel through k links
single linked list
doubly linked list
circular list
最后一个节点,采用如下链接约定:
它是不指向任何节点的空链接NULL
它引用不包含任何项的哑节点Dummy node
它返回来引用第一个节点,使该链表成为循环表
插入和删除正是链表存在的理由
数组要求移动受影响项之后的所有数组内容
但是不适合查找第k个项的运算,也不是适合查找已知项之前的项
[Q] Josephus problem:
We imagine that N people have decided to commit mass suicide by arranging themselves in a circle
and killing the Mth person around the circle, closing ranks as each person drops out of the circle.
The problem is to find out which person is the last to die (through perhaps that person would have
a change of heart at the end), or to find the order in which the people are executed.
可以使用parallel arrays实现linked list: key[], next[] 和 free[] (garbage collection)
Stack: LIFO
array / linked list
top
push(): insert it at the beginning
pop(): remove it from the beginning
[Q] arithmetic expression:
the ideal mechanism for saving intermediate results in such a calculation
the order of calculation require that the operands appear before the operator so that they can be on the stack
when the operator is encountered. [reverse Polish notation/ postfix]
- infix: the customary way of writing arithmetic expressions.
- postfix: parentheses are not required
[Q] convert a legal fully parenthesized infix expression into a postfix expression
Queue: FIFO
array / linked list
head
tail
enqueue()
dequeue()
deque: double-ended queue
Abstract Data Type
describe algorithm and data structure in terms of operations performed, rather than in terms of details of implementations
separate the concept of what the data structure should do from any particular implementation
one-dimensional:
- linear list
array, linked list
operations: insert, delete and access on a basic underlying structure of sequentially ordered items
- stack
- queue
two-dimensional:
- tree: a non-empty collection of vertices and edges that satisfies certain requirements.
vertex/node
edge: a connection between two vertices
path: a list of distinct vertices in which successive vertices are connected by edges
root:
parent:
children:
sibling:
leaf / terminal node/ external node: node with no children
nonterminal node / internal node: node with at least one child
subtree: any node is the root of a subtree consisting of it and the nodes below it
forest: a set of trees
levels: the level of a node is the numbers of nodes on the path from node to the root (not including itself)
height: the maximum level among all nodes in the subtree rooted at this node
path length: the sum of the levels of all the nodes in the tree (or the sum of the lengths of
the paths from each node to the root)
internal path length
external path length
ordered tree: one in which the order of the children at every node is specified
multiway tree: each node must have a specific number of children appearing in a specific order
binary tree: the simplest type of multiway tree; an ordered tree consisting of two types of
nodes: external nodes with no children and internal nodes with exactly two children;
to structure the internal nodes; the external nodes serve only as placeholders. A binary
tree could be "empty", consisting of no internal nodes and one external node.
left child
right child
full binary tree: one in which internal nodes completely fill every level, except the last
complete binary tree: a full binary tree where the internal nodes on the bottom level all
appear to the left of the external nodes on that level
Property:
- There is exactly one path connecting any two nodes in a tree.
least common ancestor: a node that is on the path from both nodes to the root
- A tree with N nodes has N - 1 edges.
- A binary tree with N internal nodes has N + 1 external nodes. ??
- The external path length of any binary tree with N internal nodes is 2N greater than the internal path length. ??
- The height of a full binary tree with N internal nodes is about lgN.
1. Go down the tree:
binary tree:
node { node *left; node *right; }
handle the children of each node without preallocating a specific number for any node
each node contains two links
one to its sibling on the right
the other to its leftmost child
2. Go up the tree:
the array a contains the information associated with each record and the array dad contains the parent links.
the parent of a[i] is in a a[dad[i]]; the root is set to point to itself.
Traversing trees:
preorder: visit the root, visit the left subtree, then visit the right subtree; stack
inorder: visit the left subtree, visit the root, then visit the right subtree; stack
postorder: visit the left subtree, visit the right subtree, then visit the root; stack
level-order: all the nodes on each level appear together, in order; queue
To make the definitions consistent, think of a forest as a tree with an imaginary root
Recursion
A recursive program is one that calls itself and there must be a termination condition when the program can stop
Recurrence relation
[Q] factorial function
[Q] fibonacci numbers: golden ratio; exponential-time, linear time;
Divide-and-Conquer
In general, divide-and-conquer algorithm involve doing some work to split the input into two pieces,
or to merge the results of processing two independent "solved" portions of the input, or to help
things along after half of the input has been processed. That is, there may be code before, after,
or in between the two recursive calls.
Bottom-up vs. Top-down
Recursion => Non-recursion:
traverse(struct node *t) {
if (t != z) {
visit(t);
traverse(t->l);
traverse(t->r);
}
}
End-recursion removal:
The second recursive call can be easily removed because there is no code following it;
whenever the second call is to be executed, traverse is to be called (with the argument t->r);
then, when that call is complete, the current invocation of traverse is also complete.
But this same sequence of events can be implemented with a goto rather than a recursive call:
traverse(struct node *t) {
l: if (t == z) goto x;
visit(t);
traverse(t->l);
t = t->r;
goto l;
x: ;
}
In general, most compilers produce code that goes through the same sequence of actions for any
procedure call: "push the values of local variables and the address of the next instruction on a
stack, set the values of parameters to the procedure and goto the beginning of the procedure." Then,
when a procedure completes, it must "pop the return address and values of local variables from the
stack, reset the variables, and goto the return address."
traverse(struct node *t) {
l: if (t == z) goto s;
visit(t);
push(t); t = t->l; goto l;
r: t = t->r; goto l;
s: if (stackempty()) goto x;
t = pop(); goto r;
x: ;
}
...?? P 80
Analysis of Algorithms
average case
worst case
...??
Implementation of Algorithms
Select an Algorithm
One cannot decide what algorithm to use for a problem without analyzing the needs of the problem
1. first implement the simplest algorithm to solve a given problem
2.
... ??
解决问题的模式
1. 确定完整和特定的问题表述,包括确定该问题本质上的基本抽象运算
2. 对简单算法仔细开发一个简洁的实现
3. 通过逐级细化进行开发以便得到经过改进的实现,通过试验分析,数学分析,或者联合这两种分析方法来验证改进构思的实效
4. 找出数据结构的高级抽象表示,或者运算中具有改进版本的有效高级设计能力的算法
5. 在可能情况下,争取获得最坏情况下的性能保证,但尽可能对于数据保证优良性能
算法分析原理
为了有效使用一个算法,我们都需要理解其性能特征。
理解算法性能的步骤
开展试验分析 (Empirical testing);运行两个算法,看看哪个花的时间长。
挑战
开发一个正确而且完整的实现
判断输入数据以及其他对所进行试验有直接影响的因子的性质
实际数据 (actual data):真实地衡量程序在使用中的开销
随机数据 (random data):确保测试的对象是算法,而不是数据
不合法数据 (perverse data):确保程序可以处理任何提交给它们的输入
如何排除不同实现产生的影响
通过对同一个问题的其他算法进行少量修改派生改进算法;也就是,我们尽量确定本质的抽象运算,而且在算法使用这些运算的基础上开始比较它们。
如何选择算法
不要忽略性能特征,也不要过于注意性能特征。
当本身程序速度已经很快,或调用次数不多,或输入规模不大,或改进程度不大,改进算法或许得不偿失。
数学分析 (Approximate analysis)
做算法数学分析的原因
比较同一任务的不同算法
预测在新环境中的性能
设置算法参数值
数学分析的好处
提供信息更多
付出代价不多
步骤
确定算法所基于的抽象运算,以便分离分析和实现
假定输入是随机的,考虑平均性能;假设输入是构造的,考虑最坏性能
lglgN 看成 常数
Harmonic number
Fibonacci number
Factorial function
使用O记号的目的
当忽略公式中的小项时,限制错误的发生
当忽略对总分析结果有微小贡献的程序部分时,限制错误的发生
允许根据总运行时间的上限将算法分类
recurrence relation
C(N) = C(N-1) + N => C(N) = N(N+1)/2
C(N) = C(N/2) + 1 => C(N) =~ lgN
C(N) = C(N/2) + N => C(N) =~ 2N
C(N) = 2*C(N/2) + N => C(N) =~ NlgN
C(N) = 2*C(N/2) + 1 => C(N) =~ 2N