Example: Longest Common Subsequence - rFronteddu/general_wiki GitHub Wiki
-
Given a sequence X={$
x_1
$,...,$x_m
$} we say that Z={$z_1
$,...,$z_k
$} is a subsequence of X if there is a strictly increasing sequence {$i_1
$,...,$i_k
$} of indices of X such that $X_{i_j}
$ = $Z_j
$ for each j=1...k. -
Given two sequences X and Y={$
y_1
$,...,$y_n
$}, Z is a Common Subsequence if it is a subsequence of both. -
In the Longest Common Subsequence problem we are given X and Y and are tasked to find the Longest Common Subsequence.
-
To formulate the problem, we observe that if Z is the LCS of X and Y then:
- If $
x_m
$ == $y_n
$ then $z_k
$ == $x_m
$ == $y_n
$ and $Z_{k-1}
$ is the LCS of $X_{m-1}
$ and $Y_{n-1}
$.
- If $
The proof is divided in two parts.
-
If $
z_k
$ was different than $x_m
$ then we could append $x_m
$ to Z obtaining a LCS longer than Z, but that is a contradiction because Z is the LCS. -
If $
Z_{k-1}
$ was not the LCS of $X_{m}
$ and $Y_{n-1}
$, then we could find a common subsequence longer than k-1, but the length of the LCS Z was k so that is not possible.- If $
x_{m}
$ != $y_{n}
$ and $x_{m}
$ != $z_{k}
$, then Z is LCS of $X_{m-1}
$ and $Y_{n}
$ - Symmetrically, If $
x_{m}
$ != $y_{n}
$ and $y_{n}
$ != $z_{k}
$, then Z is LCS of $X_{m}
$ and $Y_{n-1}
$
- If $
To prove this, we can observe that if Z doesn't contain $x_{m}
$ but is a LCS of X and Y, then Z must be a LCS of $X_{m-1}
$ and $Y_{n}
$ since there can't be a longer sequence without $x_{m}
$.
-
From these three points we see that we must examine one of two problems:
- If of $
x_{m}
$ != $y_{n}
$ we must find the LCS of $X_{m-1}
$ and $Y_{n-1}
$, appending $x_{m}
$ to that yields the LCS of X and Y - If $
x_{m}
$ != $y_{n}
$ we must pick the longest LCS in $X_{m}
$ $Y_{n-1}
$ and $X_{m-1}
$ $Y_{n}
$
- If of $
-
This formulation shows that we can get the final solution by using solutions of smaller problems.
-
Let c[i,j] be the length of the LCS of $
X_{i}
$ and $Y_{j}
$:- if i == 0 or j == 0 then one of the two sequences is empty thus the len of the LCS is 0
- if i,j>0 and $
x_{i}
$ == $y_{j}
$ then c[i, j] = c[i-1,j-1] + 1 - if i,j>0 and $
x_{i}
$ != $y_{j}
$ then c[i, j] = Max(c[i-1,j], c[i,j-1]
Template:
// X = 1..m
// Y = 1..n
LCS(X,Y)
m = X.len
n = Y.len
c[0..m, 0..n]
b[1..m, 1..n] // we don't really need this
for i = 1 to m
for j = 1 to n
if X[i] == Y[j]
c[i,j] = 1 + c[i-1, j-1]
// b[i, j] "up-left"
else if c[i-1, j] >= c[i, j-1]
c[i, j] = c[i-1,j]
// b[i, j] "up"
else
c[i, j] = c[i,j-1]
// b[i, j] "left"
PrintLCS_1(b, X, i, j)
if b[i,j] == up-left
PrintLCS_1(b, X, i-1, j-1)
print X[i]
else if b[i,j] == "up"
PrintLCS_1(b, X, i-1, j)
else
PrintLCS_1(b, X, i, j-1)
PrintLCS_2(c, X, i, j)
if i == 0 || j == 0 return
if c[i, j] >= c[i-1,j] && c[i, j] >= c[i,j-1]
// up left
PrintLCS_2(c, X, i-1, j-1)
print X[i]
else if c[i-1,j] >= c[i,j-1]
// up
PrintLCS_2(c, X, i-1, j)
else
// left
PrintLCS_2(c, X, i, j-1)