Problem Informations

Problem Index: 1100
Problem Link: Find K-Length Substrings With No Repeated Characters
Topics: Hash Table, String, Sliding Window

Problem Description

Given a string $s$ and an integer $k$, return the number of substrings in $s$ of length $k$ with no repeated characters.

Example 1:

Input: $s = “havefunonleetcode”$, $k = 5$
Output: 6
Explanation: There are 6 substrings they are: ‘havef’, ‘avefu’, ‘vefun’, ’efuno’, ’etcod’, ’tcode’.

Example 2:

Input: $s = “home”$, $k = 5$
Output: 0
Explanation: Notice $k$ can be larger than the length of $s$. In this case, it is not possible to find any substring.

Constraints:

$1 \leq s.\text{length} \leq 10^4$
$s$ consists of lowercase English letters.
$1 \leq k \leq 10^4$

Intuition

The task is to determine the number of substrings of length $k$ in a given string $s$ that contain no repeated characters. This requires efficient handling of substrings to ensure uniqueness of characters. Using a sliding window approach allows us to efficiently manage and update the window as we progress through the string. The sliding window technique leverages the properties of continuous segments in an array or string, making it optimal for this situation where overlap and progression are key.

Approach

The algorithm employs a sliding window paradigm using two pointers to define the start (left) and the end (right) of the current window, which represents a substring of $s$. Here are the steps involved:

Initialization:
- Use a boolean vector is_char_used of size 26 (the total number of lowercase English letters) initialized to false to track the usage of each character within the current substring.
- Initialize left, right, and result_count to zero. left and right pointers manage the sliding window’s boundaries, while result_count tracks the number of valid substrings found.
Window Expansion:
- Traverse the string using the right pointer while it is less than the length of the string. On each iteration, check if the character at s[right] has already been used in the current window.
Character Repetition Handling:
- If the character at s[right] is already marked as used, increment the left pointer to shrink the window from the left until the character is unique. During this process, mark the characters at s[left] as not used in is_char_used.
Character Usage Marking:
- Mark the character at s[right] as used by setting the corresponding index in is_char_used to true.
Window Size Validation:
- Check if the window size $(\text{right} - \text{left} + 1)$ is at least $k$. If so, it indicates a valid substring, and result_count is incremented.
Result Return:
- Once the loop ends, the result_count holding the number of valid substrings found is returned.

This process ensures that each character is only ever considered once within any potential $k$-length substring, satisfying the requirement of non-repetition efficiently through the use of a sliding window and tracking mechanism.

Code

C++

class Solution {
public:
    int numKLenSubstrNoRepeats(string s, int k) {
        // A vector to track used characters, initialized to 26 false values
        vector<bool> is_char_used(26, false);
        
        int left = 0;  // The start index of the sliding window
        int right = 0; // The end index of the sliding window
        int result_count = 0; // To count the number of valid substrings
        int string_length = s.size(); // The length of the input string
        
        // Iterate over each character using the right index as the window expands
        for (; right < string_length; right++) {
            // If the current character is already used in the current window
            if (is_char_used[s[right] - 'a']) {
                // Move the left index to the right until the character at right is not duplicated
                do { 
                    is_char_used[s[left] - 'a'] = false; 
                } while (s[left++] != s[right]);
            }
            
            // Mark the current character as used
            is_char_used[s[right] - 'a'] = true;
            
            // If the window size is at least k, increment the result count
            if (right - left + 1 >= k) {
                result_count++;
            }
        }
        
        return result_count; // Return the number of valid substrings found
    }
};

Python

class Solution:
    def numKLenSubstrNoRepeats(self, s: str, k: int) -> int:
        # A list to track used characters, initialized to 26 False values
        is_char_used = [False] * 26
        
        left = 0  # The start index of the sliding window
        right = 0 # The end index of the sliding window
        result_count = 0 # To count the number of valid substrings
        string_length = len(s) # The length of the input string
        
        # Iterate over each character using the right index as the window expands
        while right < string_length:
            # If the current character is already used in the current window
            if is_char_used[ord(s[right]) - ord('a')]:
                # Move the left index to the right until the character at right is not duplicated
                while s[left] != s[right]:
                    is_char_used[ord(s[left]) - ord('a')] = False
                    left += 1
                left += 1
            
            # Mark the current character as used
            is_char_used[ord(s[right]) - ord('a')] = True
            
            # If the window size is at least k, increment the result count
            if right - left + 1 >= k:
                result_count += 1
            
            right += 1
        
        return result_count # Return the number of valid substrings found

Complexity

Time complexity: $O(n)$
The algorithm utilizes a sliding window approach across the string. The right pointer iterates over each character in the string exactly once, which gives a complexity of $O(n)$ where $n$ is the length of the string $s$. Although there is an inner loop that moves the left pointer, each character is handled at most twice (once when added to the window and once when removed), ensuring the cumulative operations inside this loop remain $O(n)$.
Space complexity: $O(1)$
The algorithm uses a fixed-size vector is_char_used of length 26 to track the usage of each lowercase English letter. This space requirement does not scale with the input size, making the space complexity $O(1)$. The integers for left, right, result_count, and string_length also contribute a constant amount of space.

CS Note
中

Leetcode 1100. Find K-Length Substrings With No Repeated Characters