Example: Optimal BST - rFronteddu/general_wiki GitHub Wiki
Optimal BST
Description
We are given:
- A sequence of
n
sorted keys $k_i
$ in K = $[k_1,k_2,\ldots,k_n]$ with associated probabilities $[p_1,p_2,\ldots,p_n]$ that a search is in K. n+1
dummy keys $[d_0,d_1,\ldots,d_n]$ with associated probabilities $q_i$ that searches are not in K. Where:- $d_0$ represents searches smaller than $k_1$
- $d_n$ represents searches larger than $k_n$
- $d_i$ (for $(1 \leq i \leq n-1)$ represents searches between $k_i$ and $k_{i+1}$
The sum of all probabilities must equal 1:
$$\sum_{i=1}^{n} p_i + \sum_{i=0}^{n} q_i = 1$$
Objective The goal is to construct a BST that minimizes the expected cost of searching for a key or dummy, where the cost is defined by the depth of the node in the tree.
$$E[cost]=\sum_{i=1}^{n} {(depth_{T}(k_i) + 1)*p_i} + \sum_{i=0}^{n} {(depth_{T}(d_i) + 1)*q_i} = \sum_{i=1}^{n} {depth_{T}(k_i)*p_i} + \sum_{i=0}^{n} {depth_{T}(d_i)*q_i} + \sum_{i=1}^{n} {p_i} + \sum_{i=0}^{n} {q_i}$$
From the previous formula, we have that the last part is 1 and we can rewrite the formula as:
$$E[cost]= \sum_{i=1}^{n} {(depth_{T}(k_i)*p_i)} + \sum_{i=0}^{n} {(depth_{T}(d_i)*q_i)} + 1$$
The 1 accounts for the additive constant contributed by all nodes.
The challenge is to find a tree configuration that minimizes the expected cost.
Dynamic Programming Approach We use dynamic programming to solve the problem efficiently by breaking it down into smaller subproblems:
- For any subtree T with keys in the range $[k_i,\ldots,k_j]$. The root of the subtree must be one of these keys, and we aim to select the root that minimizes the combined cost of the left and right subtrees.
- Define e(i, j) as the expected cost of the optimal subtree for keys in the range $[k_i,\ldots,k_j]$, and w(i,j) as the total probability of searching within this range.
Key Recurrence Relations
- If i > j, the tree contains only a dummy key, and thus e(i,j) = $q_{i-1}$.
- Otherwise, the expected cost for subtree $T(k_i,\ldots,k_j)$ is computed by minimizing the cost over all possible root choices r:
$$e(i, j) = \min_{i \leq r \leq j} \left( e(i, r-1) + e(r+1, j) + w(i, j) \right)$$
Where:
- r is the root of the subtree
- e(i, r-1) is the cost of the left subtree
- e(r+1, j) is the cost of the right subtree
- w(i,j) is the weight (total probability) of searching within the range
- $$w(i,j)=\sum_{l=i}^{j} p_l + \sum_{l=i-1}^{j} q_l = w(i, r-1) + p_r + q_r + w(r+1, j)$$
Algorithm
// p[1..n]
// q[0..n]
OptimalBST(p, q, n)
let e[1..n+1,0..n], root[1..n,1..n], and w[1..n+1,0..n] be tables
for i = 1 to n + 1
w[i,i-1] = q[i-1]
e[i,i-1] = q[i-1]
for l = 1 to n
for i = 1 to n - l + 1
j = i + l - 1
e[i,j] = max
w[i,j] = w[i,j-1] + q[j] + p[j]
for k = i to j
t = e[i,k-1] + w[i, j] + e[k+1,j]
if t < e[i,j]
e[i,j] = t
root[i,j] = k
return e, root
To print the result:
buildTree(root, keys, start, end)
if start > end
return null
index = root[start, end]
Node n = new Node(keys[index]);
n.left = buildTree(root, keys, start, index - 1)
n.right = buildTree(root, keys, index + 1, end)
return n
We can then inorder print as follows:
printT(root)
if(root == null) return
printT(root.left)
print(root.val)
printT(root.right)