Find Most Frequent Keywords - codepath/compsci_guides GitHub Wiki

Unit 4 Session 2 Advanced (Click for link to problem statements)

U-nderstand

Understand what the interviewer is asking for by using test cases and questions about the problem.

  • Q: What is the goal of the problem?
    • A: The goal is to identify the most frequently used keywords across all scenes in a dictionary.
  • Q: What are the inputs?
    • A: The input is a dictionary where the keys are scene names and the values are lists of keywords used in each scene.
  • Q: What are the outputs?
    • A: The output is a list of keywords that appear most frequently across all scenes. If there is a tie, return all the keywords with the highest frequency.
  • Q: How should ties be handled?
    • A: If multiple keywords have the same highest frequency, return all of them.
  • Q: Are there any assumptions about the input?
    • A: The input dictionary is well-formed, with string keys and values being lists of strings.

P-lan

Plan the solution with appropriate visualizations and pseudocode.

General Idea: Use a dictionary to count the frequency of each keyword across all scenes. Then, identify the maximum frequency and return all keywords that have this frequency.

1) Initialize an empty dictionary `keyword_freq` to store the frequency of each keyword.
2) Iterate over each list of keywords in the `scenes` dictionary:
   a) For each keyword, increment its count in `keyword_freq`.
3) Identify the maximum frequency in `keyword_freq`.
4) Create a list `most_frequent_keywords` containing all keywords with the maximum frequency.
5) Return `most_frequent_keywords`.

**⚠️ Common Mistakes**

- Forgetting to handle ties correctly, leading to only one keyword being returned instead of all keywords with the highest frequency.
- Not correctly updating the frequency of keywords, which could lead to inaccurate results.
- Assuming the input dictionary is always non-empty or well-formed without considering edge cases.

I-mplement

def find_most_frequent_keywords(scenes):
    # Create a dictionary to store keyword frequencies
    keyword_freq = {}

    # Iterate over each scene and update keyword frequencies
    for keywords in scenes.values():
        for keyword in keywords:
            if keyword in keyword_freq:
                keyword_freq[keyword] += 1
            else:
                keyword_freq[keyword] = 1

    # Find the maximum frequency
    max_freq = max(keyword_freq.values())

    # Find all keywords with the maximum frequency
    most_frequent_keywords = [keyword for keyword, freq in keyword_freq.items() if freq == max_freq]

    return most_frequent_keywords
Example Usage:

scenes = {
    "Scene 1": ["action", "hero", "battle"],
    "Scene 2": ["hero", "action", "quest"],
    "Scene 3": ["battle", "strategy", "hero"],
    "Scene 4": ["action", "strategy"]
}
print(find_most_frequent_keywords(scenes))  
# Output: ['action', 'hero']

scenes = {
    "Scene A": ["love", "drama"],
    "Scene B": ["drama", "love"],
    "Scene C": ["comedy", "love"],
    "Scene D": ["comedy", "drama"]
}
print(find_most_frequent_keywords(scenes))  
# Output: ['love', 'drama']