Problem Informations

Problem Index: 2901
Problem Link: Longest Unequal Adjacent Groups Subsequence II
Topics: Array, String, Dynamic Programming

Problem Description

You are given a string array words, and an array groups, both arrays having length $n$.

The hamming distance between two strings of equal length is the number of positions at which the corresponding characters are different.

You need to select the longest subsequence from an array of indices $[0, 1, …, n - 1]$, such that for the subsequence denoted as $[i_0, i_1, …, i_{k-1}]$ having length $k$, the following holds:

For adjacent indices in the subsequence, their corresponding groups are unequal, i.e., $groups[i_j] \neq groups[i_{j+1}]$, for each $j$ where $0 < j + 1 < k$.
$words[i_j]$ and $words[i_{j+1}]$ are equal in length, and the hamming distance between them is $1$, where $0 < j + 1 < k$, for all indices in the subsequence.

Return a string array containing the words corresponding to the indices (in order) in the selected subsequence. If there are multiple answers, return any of them.

Note: strings in words may be unequal in length.

Example 1:

Input: words = ["bab","dab","cab"], groups = [1,2,2]

Output: ["bab","cab"]

Explanation: A subsequence that can be selected is [0,2].

$groups[0] \neq groups[2]$
$words[0].length == words[2].length$, and the hamming distance between them is 1.

So, a valid answer is [words[0],words[2]] = ["bab","cab"].

Another subsequence that can be selected is [0,1].

$groups[0] \neq groups[1]$
$words[0].length == words[1].length$, and the hamming distance between them is $1$.

So, another valid answer is [words[0],words[1]] = ["bab","dab"].

It can be shown that the length of the longest subsequence of indices that satisfies the conditions is $2$.

Example 2:

Input: words = ["a","b","c","d"], groups = [1,2,3,4]

Output: ["a","b","c","d"]

Explanation: We can select the subsequence [0,1,2,3].

It satisfies both conditions.

Hence, the answer is [words[0],words[1],words[2],words[3]] = ["a","b","c","d"].

It has the longest length among all subsequences of indices that satisfy the conditions.

Hence, it is the only answer.

Constraints:

$1 \leq n == words.length == groups.length \leq 1000$
$1 \leq words[i].length \leq 10$
$1 \leq groups[i] \leq n$
words consists of distinct strings.
$words[i]$ consists of lowercase English letters.

Intuition

The problem requires finding a subsequence of indices from a word array such that each consecutive pair of words in the subsequence satisfy two conditions: each belongs to different groups, and the Hamming distance between the words is exactly 1. This problem can be approached by considering each word as a node in a graph, where an edge exists between nodes if they satisfy the given conditions. The task then becomes finding the longest path in this graph.

Approach

The implemented approach uses a dynamic programming technique to systematically identify valid subsequences that satisfy the given conditions. Here is a detailed breakdown of the approach:

Helper Function - hammingDistanceCheck:
A helper function hammingDistanceCheck is defined to determine whether two words have a Hamming distance of exactly 1. This function checks if the words are of equal length and iterates over each character pair to count differences. If exactly one character differs, it returns true; otherwise, it returns false.
Initialization:
- A DP array dp is initialized, where each element dp[i] represents the longest subsequence ending at index i. Initially, each element is set to 1, since each word alone can be considered a trivial subsequence.
- An array ans is utilized to keep track of the words that make up the subsequences corresponding to each index. Each index is initialized with the word at that position.
Dynamic Programming Transition:
- Iterate through the array of words using two nested loops. For each i from 1 to len-1 and for each j from i-1 to 0:
  - Check if the group indices are different (groups[i] != groups[j]).
  - Verify that adding the current word at index i to the subsequence ending at j increases the length of the subsequence (dp[j] + 1 > dp[i]).
  - Ensure the words at i and j have a Hamming distance of exactly 1 using the hammingDistanceCheck function.
- If all conditions are satisfied, update the subsequence length and the actual subsequence:
  - Set dp[i] = dp[j] + 1.
  - Update ans[i] to be the subsequence starting from ans[j] and add words[i] to it.
Result Extraction:
- After processing all pairs, determine the index of the longest subsequence by finding the maximum value in the dp array.
- Return the subsequence of words stored in ans at the index of the maximum value found in dp.

This approach efficiently builds possible subsequences using dynamic programming and ensures the constraints related to group inequality and Hamming distance are respected in every step. The result is the longest subsequence of words that meet the problem’s conditions.

Code

C++

class Solution {
private:
    // Helper function to check the Hamming distance between two words
    bool hammingDistanceCheck(const string& a, const string& b) {
        // If the words are of different lengths, the Hamming distance can't be 1
        if (a.size() != b.size()) return false;

        int distance = 0; // Initialize Hamming distance
        for (size_t i = 0; i < a.size(); ++i) {
            // If characters at current position are different, increment the distance
            if (a[i] != b[i]) {
                if (++distance == 2) return false; // More than one difference makes distance greater than 1
            }
        }
        return (distance == 1); // Return true if exactly one difference
    }

public:
    // Main function to find the longest subsequence of words
    vector<string> getWordsInLongestSubsequence(vector<string>& words, vector<int>& groups) {
        int len = words.size(); // Length of the words list

        vector<int> dp(len, 1); // Dynamic Programming array to store longest subsequence length ending at each index
        vector<vector<string>> ans(len); // Store potential results (subsequences of words)

        // Initialize each subsequence with the corresponding word
        for (int i = 0; i < len; i++) {
            ans[i].emplace_back(words[i]);
        }

        // Iterate over each pair of words to find valid subsequences
        for (int i = 1; i < len; i++) {
            for (int j = i - 1; j >= 0; j--) {
                // Check conditions for forming a valid subsequence
                if (groups[i] != groups[j] && dp[j] + 1 > dp[i] && hammingDistanceCheck(words[i], words[j])) {
                    dp[i] = dp[j] + 1; // Update the max length of subsequence ending at index i
                    ans[i] = ans[j]; // Copy valid subsequence
                    ans[i].emplace_back(words[i]); // Append the current word
                }
            }
        }

        // Find the index of the longest subsequence
        int maxIndex = max_element(dp.begin(), dp.end()) - dp.begin();
        return ans[maxIndex]; // Return the longest valid subsequence
    }
};

Python

class Solution:
    # Helper function to check the Hamming distance between two words
    def hammingDistanceCheck(self, a: str, b: str) -> bool:
        # If the words are of different lengths, the Hamming distance can't be 1
        if len(a) != len(b):
            return False

        distance = 0  # Initialize Hamming distance
        for i in range(len(a)):
            # If characters at current position are different, increment the distance
            if a[i] != b[i]:
                if distance == 1:
                    return False  # More than one difference makes distance greater than 1
                distance += 1
        return distance == 1  # Return true if exactly one difference

    # Main function to find the longest subsequence of words
    def getWordsInLongestSubsequence(self, words: List[str], groups: List[int]) -> List[str]:
        len_words = len(words)  # Length of the words list

        dp = [1] * len_words  # Dynamic Programming array to store longest subsequence length ending at each index
        ans = [[] for _ in range(len_words)]  # Store potential results (subsequences of words)

        # Initialize each subsequence with the corresponding word
        for i in range(len_words):
            ans[i].append(words[i])

        # Iterate over each pair of words to find valid subsequences
        for i in range(1, len_words):
            for j in range(i - 1, -1, -1):
                # Check conditions for forming a valid subsequence
                if groups[i] != groups[j] and dp[j] + 1 > dp[i] and self.hammingDistanceCheck(words[i], words[j]):
                    dp[i] = dp[j] + 1  # Update the max length of subsequence ending at index i
                    ans[i] = ans[j][:]  # Copy valid subsequence
                    ans[i].append(words[i])  # Append the current word

        # Find the index of the longest subsequence
        maxIndex = dp.index(max(dp))
        return ans[maxIndex]  # Return the longest valid subsequence

Complexity

Time complexity: $O(n^2 \cdot m)$
The algorithm uses a nested loop structure to determine the longest subsequence of words that satisfy given conditions. Here is the step-by-step breakdown:
1. Initialization: Initializing the dp array and ans vector takes $O(n)$ time.
2. Nested Loops: The main computation involves two nested loops iterating over the indices of the words. The outer loop runs for $n$ iterations, and for each iteration, the inner loop can iterate up to $i$ times, which in the worst-case scenario is $n$. This gives us an approximate time complexity of $O(n^2)$ for these nested loops.
3. Hamming Distance Check: For each pair of words checked in the inner loop, the hammingDistanceCheck function is called. This function itself iterates over the length of the words which is at most $m = 10$ (from the constraints $1 \leq words[i].length \leq 10$). Therefore, the hamming distance calculation contributes a factor of $O(m)$.
Combining the above steps, the time complexity is $O(n^2 \cdot m)$.
Space complexity: $O(n \cdot m)$
1. DP Array: The dp array stores an integer for each word to record the longest subsequence ending at that index. This requires $O(n)$ space.
2. Subsequence Storage: The ans vector stores subsequences of words (vectors of strings). In the worst case, each word is its own subsequence, making the space complexity $O(n \cdot m)$, where $m$ is the maximum length of any word in words.
In total, the space complexity is $O(n \cdot m)$ due to the need to store the potential longest subsequence for each index.

CS Note
中

Leetcode 2901. Longest Unequal Adjacent Groups Subsequence II