Lingual Frequencies - codepath/compsci_guides GitHub Wiki
U-nderstand
Understand what the interviewer is asking for by using test cases and questions about the problem.
- Q
  - What is the desired outcome?
    - To find the most frequent word in `text` that does not appear in the list of illegible words.
  - What input is provided?
    - A string `text` and a list of illegible words `illegibles`.
P-lan
Plan the solution with appropriate visualizations and pseudocode.
General Idea: Clean the text, remove illegible words, and find the most frequent remaining word.
1) Convert the `text` to lowercase.
2) Replace punctuation with spaces, then split the text into words.
3) Remove any words that are in the `illegibles` list.
4) Use `Counter` to count the frequency of the remaining words.
5) Return the word with the highest frequency, or an empty string if no words remain.
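The plan can also be sketched compactly with a regular expression in place of a character-by-character cleaning loop. This is a swapped-in technique, not the approach used in the implementation below, and it assumes a "word" is a run of ASCII letters and digits: unlike `str.isalnum`, the pattern `[a-z0-9]+` treats accented or other non-ASCII letters as separators. The function name `most_frequent_word_regex` is hypothetical.

```python
import re
from collections import Counter

def most_frequent_word_regex(text, illegibles):
    # Steps 1 and 2: lowercase, then extract runs of ASCII letters/digits
    # (punctuation and whitespace both act as separators)
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Step 3: drop illegible words (lowercased so matching is case-insensitive)
    illegible_set = {word.lower() for word in illegibles}
    words = [word for word in words if word not in illegible_set]
    # Step 4: count the remaining words
    counts = Counter(words)
    # Step 5: return the most frequent word, or "" if nothing is left
    return counts.most_common(1)[0][0] if counts else ""

print(most_frequent_word_regex("Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]))
```

With "hit" excluded, "ball" is the only word that appears twice, so the call prints `ball`.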
⚠️ Common Mistakes
- Not handling punctuation properly (for example, counting `ball,` and `ball` as different words) or not correctly filtering out illegible words.
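Both mistakes are easy to reproduce. The snippet below uses a made-up sample string and a hypothetical `illegibles` list whose entries are not lowercase, to show how splitting on whitespace alone and comparing against un-normalized illegible words each produce wrong results.

```python
from collections import Counter

text = "Ball, ball. BALL! bat"
illegibles = ["BAT"]  # hypothetical input whose entries are not lowercase

# Mistake 1: splitting on whitespace alone leaves punctuation attached,
# so "ball,", "ball." and "ball!" are counted as three different words
naive_counts = Counter(text.lower().split())
print(naive_counts)

# Fix: replace non-alphanumeric characters with spaces before splitting
cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
fixed_counts = Counter(cleaned.split())
print(fixed_counts)

# Mistake 2: comparing lowercased words against un-normalized illegible
# words lets "bat" slip through the filter
still_there = [w for w in cleaned.split() if w not in illegibles]

# Fix: normalize the illegible words the same way as the text
illegible_set = {w.lower() for w in illegibles}
filtered = [w for w in cleaned.split() if w not in illegible_set]
print(still_there, filtered)
```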
I-mplement
```python
from collections import Counter

def find_most_frequent_word(text, illegibles):
    # Convert the text to lowercase so matching is case-insensitive
    text = text.lower()

    # Create a set of (lowercased) illegible words for O(1) lookup
    illegible_set = {word.lower() for word in illegibles}

    # Replace punctuation with spaces; keep letters, digits, and whitespace
    cleaned_chars = []
    for char in text:
        if char.isalnum() or char.isspace():
            cleaned_chars.append(char)
        else:
            cleaned_chars.append(" ")
    cleaned_text = "".join(cleaned_chars)

    # Split the cleaned text into words
    words = cleaned_text.split()

    # Remove illegible words
    words = [word for word in words if word not in illegible_set]

    # Use Counter to count the frequency of each word
    word_counts = Counter(words)

    # Return the most frequent word, or an empty string if no words remain
    most_frequent_word = word_counts.most_common(1)[0][0] if word_counts else ""
    return most_frequent_word
```
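If an interviewer asks you to avoid `Counter`, the same steps work with a plain dictionary. The sketch below is a hypothetical variant (`find_most_frequent_word_dict` is an illustrative name, not part of the guide). It keeps the first word encountered when counts tie, which matches the ordering `Counter.most_common` documents for equal counts.

```python
def find_most_frequent_word_dict(text, illegibles):
    # Hypothetical variant: same steps as the Counter version, using a plain dict
    illegible_set = {word.lower() for word in illegibles}
    # Lowercase and replace punctuation with spaces
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())

    # Count every word that is not illegible
    counts = {}
    for word in cleaned.split():
        if word not in illegible_set:
            counts[word] = counts.get(word, 0) + 1

    # Track the best word so far; ">" (not ">=") keeps the first word
    # encountered when two words tie
    best_word, best_count = "", 0
    for word, count in counts.items():
        if count > best_count:
            best_word, best_count = word, count
    return best_word

print(find_most_frequent_word_dict("Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"]))
```

This sketch runs in time linear in the length of `text` plus the number of illegible words, and returns an empty string when every word is illegible.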