Longest Common Subsequence (LCS) Algorithm
The Longest Common Subsequence (LCS) problem is a classic problem in dynamic programming. Given two sequences, the goal is to find the length of the longest subsequence that appears in both sequences in the same relative order, but not necessarily consecutively.
Problem Definition
Given two sequences X and Y, find the longest subsequence that appears in both sequences. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements.
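To make the definition of a subsequence concrete, here is a minimal sketch of a membership check. The helper name is_subsequence is an illustrative assumption, not something defined elsewhere in this article.

```python
def is_subsequence(candidate, sequence):
    """Return True if `candidate` can be derived from `sequence` by deletions only."""
    it = iter(sequence)
    # `ch in it` consumes the iterator up to and including the first match,
    # so later characters can only match at later positions (order is preserved).
    return all(ch in it for ch in candidate)


print(is_subsequence("ACE", "ABCDE"))  # True: delete B and D
print(is_subsequence("AEC", "ABCDE"))  # False: C does not occur after E
```

Because the check scans the longer sequence once, it runs in time linear in the length of that sequence.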
Example
Let’s consider two sequences:
X = "ABCBDAB"
Y = "BDCAB"
A longest common subsequence of X and Y is "BCAB", with a length of 4. The LCS is not always unique: "BDAB" is another common subsequence of the same length.
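For inputs this small, the example can be checked by brute force. The sketch below enumerates every subsequence of each string, which takes exponential time, so it is only a sanity check and not a practical method. The function names are illustrative assumptions.

```python
from itertools import combinations


def common_subsequences_of_length(X, Y, k):
    """Return the set of length-k strings that are subsequences of both X and Y."""
    # combinations() keeps elements in their original order, so each tuple
    # is a subsequence of the input string.
    subs_x = {"".join(c) for c in combinations(X, k)}
    subs_y = {"".join(c) for c in combinations(Y, k)}
    return subs_x & subs_y


def brute_force_lcs(X, Y):
    """Try lengths from longest to shortest and return (length, all LCS strings)."""
    for k in range(min(len(X), len(Y)), -1, -1):
        common = common_subsequences_of_length(X, Y, k)
        if common:  # k == 0 always succeeds with the empty string
            return k, sorted(common)


print(brute_force_lcs("ABCBDAB", "BDCAB"))  # (4, ['BCAB', 'BDAB'])
```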
Dynamic Programming Approach
To solve the LCS problem, we use dynamic programming. The idea is to break down the problem into smaller subproblems and build the solution bottom-up.
Steps
- Define a 2D table dp where dp[i][j] represents the length of the LCS between the first i characters of X and the first j characters of Y.
- Recurrence relation:
  - If the characters match (X[i-1] == Y[j-1]), then: dp[i][j] = dp[i-1][j-1] + 1
  - If the characters do not match: dp[i][j] = max(dp[i-1][j], dp[i][j-1])
- Base case:
  - If either i or j is 0, then dp[i][j] = 0, because an empty sequence has an LCS length of 0 with any sequence.
- Fill the table following the recurrence relation. The value of dp[m][n] gives the length of the LCS, where m and n are the lengths of the sequences X and Y, respectively.
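The recurrence and base case above can also be written almost literally as a memoized recursion. This is a top-down sketch for illustration only; the name lcs_length_top_down is an assumption, and for long inputs the recursion depth (up to m + n) can exceed Python's default recursion limit.

```python
from functools import lru_cache


def lcs_length_top_down(X, Y):
    """Length of the LCS of X and Y, computed by memoized recursion."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Base case: an empty prefix has an LCS of length 0 with anything.
        if i == 0 or j == 0:
            return 0
        # Matching characters extend the LCS of the two shorter prefixes.
        if X[i - 1] == Y[j - 1]:
            return solve(i - 1, j - 1) + 1
        # Otherwise drop the last character of one prefix or the other.
        return max(solve(i - 1, j), solve(i, j - 1))

    return solve(len(X), len(Y))


print(lcs_length_top_down("ABCBDAB", "BDCAB"))  # 4
```

Here solve(i, j) plays the role of dp[i][j]; the cache ensures each (i, j) pair is computed only once.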
Algorithm in Python
def lcs(X, Y):
    m = len(X)
    n = len(Y)

    # Create a 2D dp array to store lengths of LCS.
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Build dp array from bottom up
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # The value of dp[m][n] is the length of LCS
    return dp[m][n]

# Example usage:
X = "ABCBDAB"
Y = "BDCAB"
print(f"The length of LCS is {lcs(X, Y)}")
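The function above returns only the length. One common way to recover an actual subsequence is to fill the same table and then walk back from dp[m][n] to the top-left corner. The sketch below is one possible version; the name lcs_string and the tie-breaking rule (prefer moving up when both neighbours are equal) are assumptions, and a different rule may return a different, equally long subsequence.

```python
def lcs_string(X, Y):
    """Return one longest common subsequence of X and Y."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Fill the table exactly as in the length-only version.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from dp[m][n], collecting matched characters in reverse order.
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            chars.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1  # the value came from the row above (ties go up)
        else:
            j -= 1  # the value came from the column to the left
    return "".join(reversed(chars))


print(lcs_string("ABCBDAB", "BDCAB"))  # BCAB
```

The walk back takes at most m + n steps, so it does not change the overall O(m * n) running time.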
Time Complexity
The time complexity of the dynamic programming approach is O(m * n), where m and n are the lengths of the two input sequences X and Y, respectively. This is because we fill an (m + 1) x (n + 1) table and do a constant amount of work for each cell.
Space Complexity
The space complexity is O(m * n), as we use a 2D table to store the LCS lengths for all pairs of prefixes of X and Y.
Optimizing Space Complexity
We can reduce the space complexity to O(n) by storing only the previous and current rows of the DP table instead of the entire table. This works because each row depends only on the row directly above it.
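A minimal sketch of this two-row idea follows. The function name lcs_length_two_rows is an assumption, and so is the extra step of swapping the inputs so the shorter sequence indexes the columns, which brings the row size down to min(m, n) + 1.

```python
def lcs_length_two_rows(X, Y):
    """Length of the LCS of X and Y using only two rows of the DP table."""
    # Assumption: put the shorter sequence along the columns to keep rows small.
    if len(Y) > len(X):
        X, Y = Y, X
    n = len(Y)

    prev = [0] * (n + 1)  # row i - 1 of the full table
    for x in X:
        curr = [0] * (n + 1)  # row i of the full table
        for j in range(1, n + 1):
            if x == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]


print(lcs_length_two_rows("ABCBDAB", "BDCAB"))  # 4
```

Since the full table is discarded, this version yields only the length; walking back through the table to recover the subsequence itself is not possible with two rows.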
Conclusion
The LCS problem is a well-known dynamic programming problem that can be solved efficiently using the table-based approach. Understanding how to break the problem into smaller subproblems and fill the table step by step is the key to mastering this algorithm.