Problem Informations

Problem Index: 1055
Problem Link: Shortest Way to Form String
Topics: Two Pointers, String, Binary Search, Greedy

Problem Description

A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., “ace” is a subsequence of “abcde” while “aec” is not).

Given two strings source and target, return the minimum number of subsequences of source such that their concatenation equals target. If the task is impossible, return -1.

Example 1:

Input: source = “abc”, target = “abcbc”
Output: 2
Explanation: The target “abcbc” can be formed by “abc” and “bc”, which are subsequences of source “abc”.

Example 2:

Input: source = “abc”, target = “acdbc”
Output: -1
Explanation: The target string cannot be constructed from the subsequences of source string due to the character “d” in target string.

Example 3:

Input: source = “xyz”, target = “xzyxz”
Output: 3
Explanation: The target string can be constructed as follows “xz” + “y” + “xz”.

Constraints:

$1 \leq \text{source.length}, \text{target.length} \leq 1000$
source and target consist of lowercase English letters.

Intuition

The problem involves forming the target string by concatenating subsequences of the source string. A subsequence can be formed by deleting characters, while preserving the order of the remaining characters. The crux of the problem is to determine how many such subsequences are required at a minimum. If any character in the target does not exist in the source, it is impossible to form the target string, and the solution should return -1. Otherwise, the task is to navigate through both strings and count how many complete traversals or partial traversals of the source are needed to match the target.

Approach

The solution leverages a greedy and two-pointer approach to solve the problem efficiently.

Set Initialization: First, store each unique character from the source string into a set. This helps in quickly determining if a character in the target is absent from the source.
Immediate Impossibility Check: Before proceeding, verify if every character in the target is present in the source. If the set does not contain any character from the target, return -1 as it’s impossible to form such a character with any subsequences of source.
Two-Pointer Technique: Utilize two pointers: sourceIdx, which traverses the source, and targetIdx, which traverses the target.
- Begin by initializing subsequencesCount to 1, signifying that at least one subsequence will be needed.
- Loop through until the targetIdx reaches the length of target.
Traversal Logic:
- Increment the sourceIdx to progress through the source string.
- If sourceIdx surpasses the length of the source, increment the subsequencesCount, indicating the need for a new subsequence, and reset sourceIdx to 0.
Matching Characters: Use another loop within the main loop to locate the next character in the source that matches the current character of the target. If a match is found, advance the targetIdx.
Subsequence Count: Continue advancing through the source and resetting as needed until all characters of target can be formed. The value of subsequencesCount at the end of this traversal will represent the minimal number of subsequences of the source required to form the target.

Overall, this approach efficiently combines scanning and matching tasks under the control of two pointers, while accounting for full resets whenever a traversal of the source is exhausted, ensuring optimal subsequences enumeration.

Code

C++

class Solution {
public:
    int shortestWay(string source, string target) {
        // Create a set to hold unique characters from the `source` string.
        set<char> sourceSet;

        // Insert each character from `source` into the set for quick lookup.
        for (char c : source) {
            sourceSet.insert(c);
        }

        // Check if every character in `target` exists in `source`. If not, return -1.
        for (char c : target) {
            if (sourceSet.count(c) == 0) {
                return -1;
            }
        }

        // Initialize the number of subsequences needed.
        int subsequencesCount = 1;
        int sourceIdx = -1;  // Index in the `source` string.
        int targetIdx = 0;   // Index in the `target` string.

        // Traverse the `target` string.
        while (targetIdx < target.size()) {
            sourceIdx++;
            
            // If `sourceIdx` exceeds the length of `source`, we need an additional subsequence.
            if (sourceIdx == source.size()) {
                subsequencesCount++;
                sourceIdx = 0;
            }

            // Move through `source` until we find a matching character for `target[targetIdx]`.
            while (source[sourceIdx] != target[targetIdx]) {
                sourceIdx++;
                if (sourceIdx == source.size()) {
                    subsequencesCount++;
                    sourceIdx = 0;
                }
            }
            
            // Move to the next character in `target`.
            targetIdx++;
        }

        // Return the total number of subsequences needed.
        return subsequencesCount;
    }
};

Python

class Solution:
    def shortestWay(self, source: str, target: str) -> int:
        # Create a set to hold unique characters from the `source` string.
        sourceSet = set()

        # Insert each character from `source` into the set for quick lookup.
        for c in source:
            sourceSet.add(c)

        # Check if every character in `target` exists in `source`. If not, return -1.
        for c in target:
            if c not in sourceSet:
                return -1

        # Initialize the number of subsequences needed.
        subsequencesCount = 1
        sourceIdx = -1  # Index in the `source` string.
        targetIdx = 0   # Index in the `target` string.

        # Traverse the `target` string.
        while targetIdx < len(target):
            sourceIdx += 1
            
            # If `sourceIdx` exceeds the length of `source`, we need an additional subsequence.
            if sourceIdx == len(source):
                subsequencesCount += 1
                sourceIdx = 0

            # Move through `source` until we find a matching character for `target[targetIdx]`.
            while source[sourceIdx] != target[targetIdx]:
                sourceIdx += 1
                if sourceIdx == len(source):
                    subsequencesCount += 1
                    sourceIdx = 0
            
            # Move to the next character in `target`.
            targetIdx += 1

        # Return the total number of subsequences needed.
        return subsequencesCount

Complexity

Time complexity: The time complexity is $O(n \times m)$, where $n$ is the length of the source string and $m$ is the length of the target string. This is because for each character in the target, the algorithm may need to traverse the entire source string to find a matching character.
Space complexity: The space complexity is $O(n)$, where $n$ is the length of the source string. This is due to the space used by the set sourceSet to store unique characters from the source string. The set can contain at most all characters of the source.

CS Note
中

Leetcode 1055. Shortest Way to Form String