Example: Optimal BST - rFronteddu/general_wiki GitHub Wiki

Optimal BST

Description

We are given:

  • A sequence of n sorted keys $k_i$ in K = $[k_1,k_2,\ldots,k_n]$ with associated probabilities $[p_1,p_2,\ldots,p_n]$ that a search is in K.
  • n+1 dummy keys $[d_0,d_1,\ldots,d_n]$ with associated probabilities $q_i$ that searches are not in K. Where:
  • $d_0$ represents searches smaller than $k_1$
  • $d_n$ represents searches larger than $k_n$
  • $d_i$ (for $(1 \leq i \leq n-1)$ represents searches between $k_i$ and $k_{i+1}$

The sum of all probabilities must equal 1:

$$\sum_{i=1}^{n} p_i + \sum_{i=0}^{n} q_i = 1$$

Objective The goal is to construct a BST that minimizes the expected cost of searching for a key or dummy, where the cost is defined by the depth of the node in the tree.

$$E[cost]=\sum_{i=1}^{n} {(depth_{T}(k_i) + 1)*p_i} + \sum_{i=0}^{n} {(depth_{T}(d_i) + 1)*q_i} = \sum_{i=1}^{n} {depth_{T}(k_i)*p_i} + \sum_{i=0}^{n} {depth_{T}(d_i)*q_i} + \sum_{i=1}^{n} {p_i} + \sum_{i=0}^{n} {q_i}$$

From the previous formula, we have that the last part is 1 and we can rewrite the formula as:

$$E[cost]= \sum_{i=1}^{n} {(depth_{T}(k_i)*p_i)} + \sum_{i=0}^{n} {(depth_{T}(d_i)*q_i)} + 1$$

The 1 accounts for the additive constant contributed by all nodes.

The challenge is to find a tree configuration that minimizes the expected cost.

Dynamic Programming Approach We use dynamic programming to solve the problem efficiently by breaking it down into smaller subproblems:

  1. For any subtree T with keys in the range $[k_i,\ldots,k_j]$. The root of the subtree must be one of these keys, and we aim to select the root that minimizes the combined cost of the left and right subtrees.
  2. Define e(i, j) as the expected cost of the optimal subtree for keys in the range $[k_i,\ldots,k_j]$, and w(i,j) as the total probability of searching within this range.

Key Recurrence Relations

  • If i > j, the tree contains only a dummy key, and thus e(i,j) = $q_{i-1}$.
  • Otherwise, the expected cost for subtree $T(k_i,\ldots,k_j)$ is computed by minimizing the cost over all possible root choices r:

$$e(i, j) = \min_{i \leq r \leq j} \left( e(i, r-1) + e(r+1, j) + w(i, j) \right)$$

Where:

  • r is the root of the subtree
  • e(i, r-1) is the cost of the left subtree
  • e(r+1, j) is the cost of the right subtree
  • w(i,j) is the weight (total probability) of searching within the range
  • $$w(i,j)=\sum_{l=i}^{j} p_l + \sum_{l=i-1}^{j} q_l = w(i, r-1) + p_r + q_r + w(r+1, j)$$

Algorithm

// p[1..n]
// q[0..n]
OptimalBST(p, q, n)
    let e[1..n+1,0..n], root[1..n,1..n], and w[1..n+1,0..n] be tables
    for i = 1 to n + 1
        w[i,i-1] = q[i-1]
        e[i,i-1] = q[i-1]
    for l = 1 to n
        for i = 1 to n - l + 1
            j = i + l - 1
            e[i,j] = max
            w[i,j] = w[i,j-1] + q[j] + p[j]
            for k = i to j
                t = e[i,k-1] + w[i, j] + e[k+1,j]
                if t < e[i,j]
                    e[i,j] = t
                    root[i,j] = k
    return e, root

To print the result:

buildTree(root, keys, start, end)
    if start > end
        return null
    index = root[start, end]
    Node n = new Node(keys[index]);
    n.left = buildTree(root, keys, start, index - 1)
    n.right = buildTree(root, keys, index + 1, end)
    return n

We can then inorder print as follows:

printT(root)
    if(root == null) return
    printT(root.left)
    print(root.val)
    printT(root.right)