Lingual Frequencies - codepath/compsci_guides GitHub Wiki

U-nderstand

Understand what the interviewer is asking for by using test cases and questions about the problem.

  • Q: What is the desired outcome?
    • A: Return the most frequent word in `text` that does not appear in `illegibles`, ignoring case and punctuation.
  • Q: What input is provided?
    • A: A string `text` and a list of illegible words, `illegibles`.

P-lan

Plan the solution with appropriate visualizations and pseudocode.

General Idea: Clean the text, remove illegible words, and find the most frequent remaining word.

1) Convert the `text` to lowercase.
2) Remove punctuation and split the text into words.
3) Remove any words that are in the `illegibles` list.
4) Use `Counter` to count the frequency of the remaining words.
5) Return the word with the highest frequency.

⚠️ Common Mistakes

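  • Leaving punctuation attached to words, so that `ball,` and `ball` are counted as different words.
  • Filtering against `illegibles` before lowercasing the text, so that capitalized occurrences of an illegible word are not removed.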
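The punctuation pitfall can be seen in a small sketch. The sample sentence below is made up for illustration and is not part of the original problem statement.

```python
from collections import Counter

# Illustrative sample text (an assumption, not from the problem statement)
text = "Ball, ball! The BALL was hit."

# Naive approach: lowercase and split on whitespace only.
# Punctuation stays attached, so "ball," and "ball!" are separate keys.
naive_counts = Counter(text.lower().split())

# Cleaned approach: replace every character that is neither alphanumeric
# nor whitespace with a space before splitting.
cleaned = "".join(
    ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
)
clean_counts = Counter(cleaned.split())

print(naive_counts["ball"])  # 1
print(clean_counts["ball"])  # 3
```

Replacing punctuation with a space, rather than deleting it, also keeps words such as `hit,ball` from being fused into a single token.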

I-mplement

from collections import Counter

def find_most_frequent_word(text, illegibles):
    # Convert the text to lowercase
    text = text.lower()
    
    # Create a set of illegible words for quick lookup
    illegible_set = set(illegibles)
    
    # Remove punctuation by replacing each punctuation character with a space
    cleaned_text = ""
    for char in text:
        if char.isalnum() or char.isspace():
            cleaned_text += char
        else:
            cleaned_text += " "
    
    # Split the cleaned text into words
    words = cleaned_text.split()
    
    # Remove illegible words
    words = [word for word in words if word not in illegible_set]
    
    # Use Counter to count the frequency of each word
    word_counts = Counter(words)
    
    # Find the word with the maximum frequency (empty string if no words remain)
    most_frequent_word = word_counts.most_common(1)[0][0] if word_counts else ""
    
    return most_frequent_word
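
As a quick sanity check, the steps can be exercised on a small input. The sketch below is a condensed restatement of the same approach so that it runs on its own; the helper name `most_frequent_legible_word` and the sample sentences are hypothetical, and it assumes the entries in `illegibles` are already lowercase.

```python
from collections import Counter

def most_frequent_legible_word(text, illegibles):
    # Condensed restatement of the steps above (hypothetical helper name).
    illegible_set = set(illegibles)  # assumed to contain lowercase words
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower()
    )
    counts = Counter(w for w in cleaned.split() if w not in illegible_set)
    # Fall back to an empty string when every word was filtered out
    return counts.most_common(1)[0][0] if counts else ""

# Illustrative inputs (assumptions, not from the problem statement)
sample = "Bob hit a ball, the hit BALL flew far after it was hit."
print(most_frequent_legible_word(sample, ["hit"]))       # ball
print(most_frequent_legible_word("Hit! hit?", ["hit"]))  # prints an empty line
```

When several words tie for the highest count, `Counter.most_common` orders them by first occurrence (Python 3.7+), so the tied word that appears earliest in the text is returned.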