Expected Cost Of a BST - rFronteddu/general_wiki GitHub Wiki

The expected cost of searching in a Binary Search Tree (BST) is determined by how deep the searched node (whether a key or a dummy) is in the tree. The deeper the node, the more comparisons are required to find it, increasing the cost of the search.

To compute the expected cost, we take into account two things:

Keys $k_i$: The keys are the actual values we're searching for in the BST. For each key $k_i$, we have:

depth(T($k_i$)): the depth of key $k_i$ in the tree, representing how many nodes are checked before finding $k_i$.
$p_i$: the probability that a search is for key $k_i$.

The total expected cost for searching the key is the sum of the depths of each key $k_i$ multiplied by the probability of searching for that key $p_i$:

$$\sum_{i=1}^{n} {(depth_{T}(k_i) + 1)*p_i}$$

Dummies $d_i$: Dummies represent searches that do not find any key. For each dummy $d_i$, we have:

depth(T($k_i$)): the depth of dummy $d_i$ in the tree.
$q_i$: the probability that a search is for a value not present in the key set.

The total expected cost for searching the dummies is the sum of the depths of each dummy $d_i$ multiplied by the probability of searching for that dummy $q_i$:

$$\sum_{i=0}^{n} {(depth_{T}(d_i) + 1)*q_i}$$

Full Formula The total expected cost is the sum of the expected costs for both the keys and the dummies, with an additional constant 1 to account for the base cost of accessing the tree (we need to at least check the root).

$$E[cost]=\sum_{i=1}^{n} {(depth_{T}(k_i) + 1)*p_i} + \sum_{i=0}^{n} {(depth_{T}(d_i) + 1)*q_i} = \sum_{i=1}^{n} {depth_{T}(k_i)*p_i} + \sum_{i=0}^{n} {depth_{T}(d_i)*q_i} + \sum_{i=1}^{n} {p_i} + \sum_{i=0}^{n} {q_i}$$