Algorithm Sedgewick - jwyx/ForFun GitHub Wiki

Notation

+: advantage
-: disadvantage / list symbol

Fundamentals

Algorithm
    a finite, deterministic, and effective problem-solving method suitable for implementation as a computer program

Online algorithm
    即考虑输入,适用于输入对象成比例的空间

[Q] the greatest common divisor => Euclid Algorithm, 辗转相除法

Array
    An array is a fixed number of data items that are stored contiguously and that are accessible by an index
    + sequential ordering (by the position in the array)
    + random access
    - the size of the array is fixed and must be known before it is used

    array                 <-> vector
    two-dimensional array <-> matrix

    [Q] prints all the prime numbers less than 1000 => the "sieve of Eratosthenes"

Linked List
    + can grow and shrink in size during its lifetime; its maximum size need not be known in advance.
    + provide flexibility in allowing the items to be rearranged efficiently
    + sequential ordering (a "node" that also contains a "link" to the next node)
    - can not quick access to any arbitrary item in the list

    operations:
        insert
        delete
        find the Kth item: travel through k links
    
    single linked list
    doubly linked list        
    circular list

    最后一个节点,采用如下链接约定:
        它是不指向任何节点的空链接NULL
        它引用不包含任何项的哑节点Dummy node
        它返回来引用第一个节点,使该链表成为循环表
    插入和删除正是链表存在的理由
        数组要求移动受影响项之后的所有数组内容
        但是不适合查找第k个项的运算,也不是适合查找已知项之前的项

    [Q] Josephus problem: 
    We imagine that N people have decided to commit mass suicide by arranging themselves in a circle
    and killing the Mth person around the circle, closing ranks as each person drops out of the circle.
    The problem is to find out which person is the last to die (through perhaps that person would have
    a change of heart at the end), or to find the order in which the people are executed.

    可以使用parallel arrays实现linked list: key[], next[] 和 free[] (garbage collection)

Stack: LIFO
    array / linked list

    top
    push(): insert it at the beginning
    pop():  remove it from the beginning
    
    [Q] arithmetic expression:
    the ideal mechanism for saving intermediate results in such a calculation
    the order of calculation require that the operands appear before the operator so that they can be on the stack
    when the operator is encountered. [reverse Polish notation/ postfix]
    - infix:   the customary way of writing arithmetic expressions.
    - postfix: parentheses are not required

    [Q] convert a legal fully parenthesized infix expression into a postfix expression
    
Queue: FIFO
    array / linked list

    head
    tail
    enqueue()
    dequeue()
    
    deque: double-ended queue

Abstract Data Type
    describe algorithm and data structure in terms of operations performed, rather than in terms of details of implementations
    separate the concept of what the data structure should do from any particular implementation

    one-dimensional:
    - linear list
        array, linked list
        operations: insert, delete and access on a basic underlying structure of sequentially ordered items
    - stack
    - queue

    two-dimensional:
    - tree: a non-empty collection of vertices and edges that satisfies certain requirements.
          vertex/node
          edge: a connection between two vertices
          path: a list of distinct vertices in which successive vertices are connected by edges
          root: 
          parent:
          children:
          sibling:
          leaf / terminal node/ external node: node with no children
          nonterminal node / internal node: node with at least one child
          subtree: any node is the root of a subtree consisting of it and the nodes below it
          forest: a set of trees
          levels: the level of a node is the numbers of nodes on the path from node to the root (not including itself)
          height: the maximum level among all nodes in the subtree rooted at this node
          path length: the sum of the levels of all the nodes in the tree (or the sum of the lengths of 
              the paths from each node to the root)
          internal path length
          external path length
          ordered tree: one in which the order of the children at every node is specified
          multiway tree: each node must have a specific number of children appearing in a specific order
          binary tree: the simplest type of multiway tree; an ordered tree consisting of two types of 
              nodes: external nodes with no children and internal nodes with exactly two children;
              to structure the internal nodes; the external nodes serve only as placeholders. A binary
              tree could be "empty", consisting of no internal nodes and one external node.
          left child
          right child
          full binary tree: one in which internal nodes completely fill every level, except the last
          complete binary tree: a full binary tree where the internal nodes on the bottom level all 
              appear to the left of the external nodes on that level

          Property:
          - There is exactly one path connecting any two nodes in a tree.
            least common ancestor: a node that is on the path from both nodes to the root
          - A tree with N nodes has N - 1 edges.
          - A binary tree with N internal nodes has N + 1 external nodes. ??
          - The external path length of any binary tree with N internal nodes is 2N greater than the internal path length. ??
          - The height of a full binary tree with N internal nodes is about lgN.
          
          1. Go down the tree:
             binary tree:
                 node { node *left; node *right; }
             handle the children of each node without preallocating a specific number for any node
                 each node contains two links
                 one to its sibling on the right
                 the other to its leftmost child
          2. Go up the tree:
             the array a contains the information associated with each record and the array dad contains the parent links.
             the parent of a[i] is in a a[dad[i]]; the root is set to point to itself.
          
          Traversing trees:
              preorder: visit the root, visit the left subtree, then visit the right subtree; stack
              inorder: visit the left subtree, visit the root, then visit the right subtree; stack
              postorder: visit the left subtree, visit the right subtree, then visit the root; stack
              level-order: all the nodes on each level appear together, in order; queue

          To make the definitions consistent, think of a forest as a tree with an imaginary root

Recursion
    A recursive program is one that calls itself and there must be a termination condition when the program can stop
    Recurrence relation
        [Q] factorial function
        [Q] fibonacci numbers: golden ratio; exponential-time, linear time;
    Divide-and-Conquer
        In general, divide-and-conquer algorithm involve doing some work to split the input into two pieces,
        or to merge the results of processing two independent "solved" portions of the input, or to help 
        things along after half of the input has been processed. That is, there may be code before, after, 
        or in between the two recursive calls.
    Bottom-up vs. Top-down

    Recursion => Non-recursion:
        traverse(struct node *t) {
          if (t != z) {
            visit(t);
            traverse(t->l);
            traverse(t->r);
          }
        }

        End-recursion removal:
            The second recursive call can be easily removed because there is no code following it;
            whenever the second call is to be executed, traverse is to be called (with the argument t->r);
            then, when that call is complete, the current invocation of traverse is also complete.
            But this same sequence of events can be implemented with a goto rather than a recursive call:
        traverse(struct node *t) {
          l: if (t == z) goto x;
             visit(t);
             traverse(t->l);
             t = t->r;
             goto l;
          x: ;
        }

        In general, most compilers produce code that goes through the same sequence of actions for any 
        procedure call: "push the values of local variables and the address of the next instruction on a
        stack, set the values of parameters to the procedure and goto the beginning of the procedure." Then,
        when a procedure completes, it must "pop the return address and values of local variables from the
        stack, reset the variables, and goto the return address."

        traverse(struct node *t) {
          l: if (t == z) goto s;
             visit(t);
             push(t); t = t->l; goto l;
          r: t = t->r; goto l;
          s: if (stackempty()) goto x;
             t = pop(); goto r;
          x: ;
        }

        ...?? P 80

Analysis of Algorithms
    average case
    worst case
    ...??
    
Implementation of Algorithms
    Select an Algorithm
        One cannot decide what algorithm to use for a problem without analyzing the needs of the problem
        1. first implement the simplest algorithm to solve a given problem
        2. 
        ... ??

解决问题的模式
    1. 确定完整和特定的问题表述,包括确定该问题本质上的基本抽象运算
    2. 对简单算法仔细开发一个简洁的实现
    3. 通过逐级细化进行开发以便得到经过改进的实现,通过试验分析,数学分析,或者联合这两种分析方法来验证改进构思的实效
    4. 找出数据结构的高级抽象表示,或者运算中具有改进版本的有效高级设计能力的算法
    5. 在可能情况下,争取获得最坏情况下的性能保证,但尽可能对于数据保证优良性能

算法分析原理

为了有效使用一个算法,我们都需要理解其性能特征。

理解算法性能的步骤
    开展试验分析 (Empirical testing);运行两个算法,看看哪个花的时间长。
        挑战
            开发一个正确而且完整的实现
            判断输入数据以及其他对所进行试验有直接影响的因子的性质
                实际数据 (actual data):真实地衡量程序在使用中的开销
                随机数据 (random data):确保测试的对象是算法,而不是数据
                不合法数据 (perverse data):确保程序可以处理任何提交给它们的输入
        如何排除不同实现产生的影响
            通过对同一个问题的其他算法进行少量修改派生改进算法;也就是,我们尽量确定本质的抽象运算,而且在算法使用这些运算的基础上开始比较它们。
        如何选择算法
            不要忽略性能特征,也不要过于注意性能特征。
            当本身程序速度已经很快,或调用次数不多,或输入规模不大,或改进程度不大,改进算法或许得不偿失。
    数学分析 (Approximate analysis)
        做算法数学分析的原因
            比较同一任务的不同算法
            预测在新环境中的性能
            设置算法参数值
        数学分析的好处
            提供信息更多
            付出代价不多
        步骤
            确定算法所基于的抽象运算,以便分离分析和实现
            假定输入是随机的,考虑平均性能;假设输入是构造的,考虑最坏性能
        lglgN 看成 常数
        Harmonic number
        Fibonacci number
        Factorial function

使用O记号的目的
    当忽略公式中的小项时,限制错误的发生
    当忽略对总分析结果有微小贡献的程序部分时,限制错误的发生
    允许根据总运行时间的上限将算法分类
recurrence relation
    C(N) = C(N-1) + N => C(N) = N(N+1)/2
    C(N) = C(N/2) + 1 => C(N) =~ lgN
    C(N) = C(N/2) + N => C(N) =~ 2N
    C(N) = 2*C(N/2) + N => C(N) =~ NlgN
    C(N) = 2*C(N/2) + 1 => C(N) =~ 2N