Longest Substring With at Least K Repeating Characters - codepath/compsci_guides GitHub Wiki
Unit 7 Session 2 (Click for link to problem statements)
Problem Highlights
- 💡 Difficulty: Hard
- ⏰ Time to complete: 30 mins
- 🛠️ Topics: Divide and Conquer, Recursion, String Manipulation
1: U-nderstand
Understand what the interviewer is asking for by using test cases and questions about the problem.
- Established a set (2-3) of test cases to verify their own solution later.
- Established a set (1-2) of edge cases to verify their solution handles complexities.
- Have fully understood the problem and have no clarifying questions.
- Have you verified any Time/Space Constraints for this problem?
- What should be returned if no substring meets the criteria?
- Return
0
.
- Return
- What should be returned if
s
is empty?- Return
0
since there are no substrings.
- Return
- Are the characters in the string case-sensitive?
- Yes, treat uppercase and lowercase letters as distinct characters.
HAPPY CASE
Input: s = "tatooine", k = 2
Output: 2
Explanation: The longest substring is "oo" as 'o' is repeated 2 times.
Input: s = "chewbacca", k = 2
Output: 4
Explanation: The longest substring is "acca" as both 'a' and 'c' are repeated 2 times.
EDGE CASE
Input: s = "abcde", k = 2
Output: 0
Explanation: No character repeats 2 times.
Input: s = ", k = 3
Output: 0
Explanation: The string is empty, so return 0.
2: M-atch
Match what this problem looks like to known categories of problems, e.g., Linked List or Dynamic Programming, and strategies or patterns in those categories.
For Divide and Conquer problems, we want to consider the following approaches:
- Divide and Conquer: The problem can be solved by recursively dividing the string based on characters that do not meet the frequency criteria, then finding the maximum valid substring in each part.
3: P-lan
Plan the solution with appropriate visualizations and pseudocode.
General Idea: Use a divide-and-conquer approach by recursively splitting the string into smaller substrings whenever a character does not meet the frequency requirement. The base case is when the substring length is less than k
.
Divide and Conquer Implementation
Pseudocode:
1) Define a helper function `divide_and_conquer(s, k)`:
a) Base Case: If the length of `s` is less than `k`, return 0.
b) Create a frequency map to count the occurrences of each character in `s`.
c) For each character in `s`, if its frequency is less than `k`, split the string by this character and recursively apply `divide_and_conquer` to each part.
d) If every character in `s` meets the frequency requirement, return the length of `s`.
2) The main function `longest_substring(s, k)` will call the `divide_and_conquer` function and return the result.
⚠️ Common Mistakes
- Forgetting to handle the base case where the string length is less than
k
. - Incorrectly managing the split and merge operations, leading to missed valid substrings.
4: I-mplement
Implement the code to solve the algorithm.
def longest_substring(s, k):
def divide_and_conquer(s, k):
# Base case: if the length of the string is less than k, no valid substring exists
if len(s) < k:
return 0
# Create a frequency map
freq_map = {}
for char in s:
freq_map[char] = freq_map.get(char, 0) + 1
# Split string by characters that appear less than k times
for char in freq_map:
if freq_map[char] < k:
# Split by the character and recursively apply on each part
return max(divide_and_conquer(sub_str, k) for sub_str in s.split(char))
# If every character appears at least k times, the whole string is valid
return len(s)
return divide_and_conquer(s, k)
5: R-eview
Review the code by running specific example(s) and recording values (watchlist) of your code's variables along the way.
- Trace through your code with the input
"tatooine", k = 2
:- The divide-and-conquer approach should correctly identify "oo" as the longest valid substring.
6: E-valuate
Evaluate the performance of your algorithm and state any strong/weak or future potential work.
Assume N
represents the length of the string s
.
- Time Complexity:
O(N * log N)
in the worst case, as the string is recursively divided. - Space Complexity:
O(N)
due to the recursive call stack and frequency map storage.