Longest Substring With at Least K Repeating Characters - codepath/compsci_guides GitHub Wiki

Unit 7 Session 2 (Click for link to problem statements)

Problem Highlights

  • 💡 Difficulty: Hard
  • Time to complete: 30 mins
  • 🛠️ Topics: Divide and Conquer, Recursion, String Manipulation

1: U-nderstand

Understand what the interviewer is asking for by using test cases and questions about the problem.

  • Established a set (2-3) of test cases to verify their own solution later.
  • Established a set (1-2) of edge cases to verify their solution handles complexities.
  • Have fully understood the problem and have no clarifying questions.
  • Have you verified any Time/Space Constraints for this problem?
  • What should be returned if no substring meets the criteria?
    • Return 0.
  • What should be returned if s is empty?
    • Return 0 since there are no substrings.
  • Are the characters in the string case-sensitive?
    • Yes, treat uppercase and lowercase letters as distinct characters.
HAPPY CASE
Input: s = "tatooine", k = 2
Output: 2
Explanation: The longest substring is "oo" as 'o' is repeated 2 times.

Input: s = "chewbacca", k = 2
Output: 4
Explanation: The longest substring is "acca" as both 'a' and 'c' are repeated 2 times.

EDGE CASE
Input: s = "abcde", k = 2
Output: 0
Explanation: No character repeats 2 times.

Input: s = ", k = 3
Output: 0
Explanation: The string is empty, so return 0.

2: M-atch

Match what this problem looks like to known categories of problems, e.g., Linked List or Dynamic Programming, and strategies or patterns in those categories.

For Divide and Conquer problems, we want to consider the following approaches:

  • Divide and Conquer: The problem can be solved by recursively dividing the string based on characters that do not meet the frequency criteria, then finding the maximum valid substring in each part.

3: P-lan

Plan the solution with appropriate visualizations and pseudocode.

General Idea: Use a divide-and-conquer approach by recursively splitting the string into smaller substrings whenever a character does not meet the frequency requirement. The base case is when the substring length is less than k.

Divide and Conquer Implementation

Pseudocode:

1) Define a helper function `divide_and_conquer(s, k)`:
    a) Base Case: If the length of `s` is less than `k`, return 0.
    b) Create a frequency map to count the occurrences of each character in `s`.
    c) For each character in `s`, if its frequency is less than `k`, split the string by this character and recursively apply `divide_and_conquer` to each part.
    d) If every character in `s` meets the frequency requirement, return the length of `s`.

2) The main function `longest_substring(s, k)` will call the `divide_and_conquer` function and return the result.

⚠️ Common Mistakes

  • Forgetting to handle the base case where the string length is less than k.
  • Incorrectly managing the split and merge operations, leading to missed valid substrings.

4: I-mplement

Implement the code to solve the algorithm.

def longest_substring(s, k):
    def divide_and_conquer(s, k):
        # Base case: if the length of the string is less than k, no valid substring exists
        if len(s) < k:
            return 0
        
        # Create a frequency map
        freq_map = {}
        for char in s:
            freq_map[char] = freq_map.get(char, 0) + 1
        
        # Split string by characters that appear less than k times
        for char in freq_map:
            if freq_map[char] < k:
                # Split by the character and recursively apply on each part
                return max(divide_and_conquer(sub_str, k) for sub_str in s.split(char))
        
        # If every character appears at least k times, the whole string is valid
        return len(s)
    
    return divide_and_conquer(s, k)

5: R-eview

Review the code by running specific example(s) and recording values (watchlist) of your code's variables along the way.

  • Trace through your code with the input "tatooine", k = 2:
    • The divide-and-conquer approach should correctly identify "oo" as the longest valid substring.

6: E-valuate

Evaluate the performance of your algorithm and state any strong/weak or future potential work.

Assume N represents the length of the string s.

  • Time Complexity: O(N * log N) in the worst case, as the string is recursively divided.
  • Space Complexity: O(N) due to the recursive call stack and frequency map storage.