Leetcode 1092. Shortest Common Supersequence

#String #Dynamic Programming

Table of Contents

Problem Informations

Problem Description

Given two strings str1 and str2, return the shortest string that has both str1 and str2 as subsequences. If there are multiple valid strings, return any of them.

A string s is a subsequence of string t if deleting some number of characters from t (possibly 0) results in the string s.

Example 1:

Input: str1 = "abac", str2 = "cab"
Output: "cabac"
Explanation: 
str1 = "abac" is a subsequence of "cabac" because we can delete the first "c".
str2 = "cab" is a subsequence of "cabac" because we can delete the last "ac".
The answer provided is the shortest such string that satisfies these properties.

Example 2:

Input: str1 = "aaaaaaaa", str2 = "aaaaaaaa"
Output: "aaaaaaaa"

Constraints:

  • $1 \leq \text{str1.length}, \text{str2.length} \leq 1000$
  • str1 and str2 consist of lowercase English letters.

Intuition

The problem at hand is to find the shortest string that contains two given strings as subsequences. Recognizing that subsequences do not require consecutive characters but rather a preservation of relative order, this problem bears similarity to finding a common ancestor in terms of sequence alignment. It shares conceptual grounding with the concept of a shortest common supersequence (SCS), which is closely associated with the longest common subsequence (LCS). By determining the LCS, we can strategically interweave the differing characters from both strings around this sequence to achieve the desired SCS length.

Approach

The approach to solving this problem involves dynamic programming to build a table where each entry at index $(i, j)$ represents the length of the shortest common supersequence for the substrings of str1 and str2 up to indices $i$ and $j$, respectively.

  1. Initialization: Create a 2D list dp of dimensions $(\text{len1} + 1) \times (\text{len2} + 1)$, where len1 and len2 are the lengths of str1 and str2. Initialize the base cases where one string is empty:

    • When one string is empty, the SCS is simply the length of the other string. Therefore, dp[i][0] = i and dp[0][j] = j.
  2. DP Table Construction:

    • Iterate over each character index of str1 and str2.
    • If characters at the current positions in both strings match, dp[i][j] = dp[i-1][j-1] + 1 because matching characters can contribute directly to the SCS.
    • If characters do not match, dp[i][j] = \min(dp[i-1][j], dp[i][j-1]) + 1 because the optimal SCS would include the character requiring less deletions either from str1 or str2.
  3. Reconstruction of the Shortest Common Supersequence:

    • Starting from dp[len1][len2], backtrack through dp to construct the SCS.
    • If characters match, include this character and move diagonally.
    • If dp[i][j-1] leads to the current cell, include the character from str2.
    • If dp[i-1][j] leads to the current cell, include the character from str1.
    • Continue until one of the strings is fully traversed, at which point append the remaining characters of the other string.

By following this procedure, the shortest common supersequence is constructed efficiently, capturing both input strings as subsequences.

Code

C++

class Solution {
public:
    string shortestCommonSupersequence(string str1, string str2) {
        int len1 = str1.size();
        int len2 = str2.size();
        vector<vector<int>> dp(len1 + 1, vector<int>(len2 + 1));
        
        // Initialize the base cases: the length of supersequence
        // if one of the strings is empty.
        for (int i = 0; i <= len1; i++) dp[i][0] = i;
        for (int j = 0; j <= len2; j++) dp[0][j] = j;
        
        // Build the dp table, where dp[i][j] represents
        // the length of the shortest common supersequence for
        // substrings str1[0...i-1] and str2[0...j-1].
        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (str1[i - 1] == str2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = min(dp[i - 1][j], dp[i][j - 1]) + 1;
                }
            }
        }

        // Reconstruct the shortest common supersequence using the dp table.
        int curIndex1 = len1;
        int curIndex2 = len2;
        string result = "";

        while (curIndex1 != 0 || curIndex2 != 0) {
            if (curIndex2 == 0) {
                result = str1[--curIndex1] + result;
            } else if (curIndex1 == 0) {
                result = str2[--curIndex2] + result;
            } else if (dp[curIndex1 - 1][curIndex2 - 1] == dp[curIndex1][curIndex2] - 1 && str1[curIndex1 - 1] == str2[curIndex2 - 1]) {
                result = str1[--curIndex1] + result;
                --curIndex2;
            } else if (dp[curIndex1][curIndex2 - 1] == dp[curIndex1][curIndex2] - 1) {
                result = str2[--curIndex2] + result;
            } else {
                result = str1[--curIndex1] + result;
            }
        }

        return result;
    }
};

Python

class Solution:
    def shortestCommonSupersequence(self, str1, str2):
        len1 = len(str1)
        len2 = len(str2)
        dp = [[0] * (len2 + 1) for _ in range(len1 + 1)]
        
        # Initialize the base cases: the length of supersequence
        # if one of the strings is empty.
        for i in range(len1 + 1):
            dp[i][0] = i
        for j in range(len2 + 1):
            dp[0][j] = j
        
        # Build the dp table, where dp[i][j] represents
        # the length of the shortest common supersequence for
        # substrings str1[0...i-1] and str2[0...j-1].
        for i in range(1, len1 + 1):
            for j in range(1, len2 + 1):
                if str1[i - 1] == str2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = min(dp[i - 1][j], dp[i][j - 1]) + 1

        # Reconstruct the shortest common supersequence using the dp table.
        curIndex1 = len1
        curIndex2 = len2
        result = ""

        while curIndex1 != 0 or curIndex2 != 0:
            if curIndex2 == 0:
                result = str1[curIndex1 - 1] + result
                curIndex1 -= 1
            elif curIndex1 == 0:
                result = str2[curIndex2 - 1] + result
                curIndex2 -= 1
            elif dp[curIndex1 - 1][curIndex2 - 1] == dp[curIndex1][curIndex2] - 1 and str1[curIndex1 - 1] == str2[curIndex2 - 1]:
                result = str1[curIndex1 - 1] + result
                curIndex1 -= 1
                curIndex2 -= 1
            elif dp[curIndex1][curIndex2 - 1] == dp[curIndex1][curIndex2] - 1:
                result = str2[curIndex2 - 1] + result
                curIndex2 -= 1
            else:
                result = str1[curIndex1 - 1] + result
                curIndex1 -= 1

        return result

Complexity

  • Time complexity: The time complexity is $O(\text{len1} \times \text{len2})$, where $\text{len1}$ and $\text{len2}$ are the lengths of str1 and str2, respectively. The solution constructs a dynamic programming table (dp) which has dimensions $(\text{len1} + 1) \times (\text{len2} + 1)$. Both the building of the table and the reconstruction process of the shortest common supersequence take $O(\text{len1} \times \text{len2})$ time.

  • Space complexity: The space complexity is $O(\text{len1} \times \text{len2})$. This space is used to maintain the dp table to store the lengths of shortest common supersequences for all possible substrings of str1 and str2.

Disclaimer: All reference materials on this website are sourced from the internet and are intended for learning purposes only. If you believe any content infringes upon your rights, please contact me at csnote.cc@gmail.com, and I will remove the relevant content promptly.


Feedback Welcome: If you notice any errors or areas for improvement in the articles, I warmly welcome your feedback and corrections. Your input will help this blog provide better learning resources. This is an ongoing process of learning and improvement, and your suggestions are valuable to me. You can reach me at csnote.cc@gmail.com.