# Module 2.3: Sequence Probability & Chain Rule
Language models assign probabilities to entire sequences of words. The chain rule decomposes the joint probability into a product of conditional probabilities.
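For a sentence $w_1, w_2, \ldots, w_n$, the decomposition is exact:

```math
P(w_1, w_2, \ldots, w_n) = P(w_1)\,P(w_2 \mid w_1)\cdots P(w_n \mid w_1, \ldots, w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})
```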
## Key Concepts
- Chain Rule
- N-gram Approximation (formalized below)
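Conditioning on the full history quickly becomes impossible to estimate from data, so an n-gram model truncates it to the previous $n-1$ words. The bigram case ($n = 2$), used in the code below, keeps a single word of context:

```math
P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1})
```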
## 1. Computing Sequence Probability with a Bigram Model
The following code builds a bigram model with Laplace smoothing and computes the probability of a test sentence via the chain rule.
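Here Laplace (add-one) smoothing adds one to every bigram count and the vocabulary size $V$ to each denominator, so no conditional probability is ever zero:

```math
P_{\text{Laplace}}(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}, w_i) + 1}{\mathrm{count}(w_{i-1}) + V}
```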
```python
import math
from collections import Counter, defaultdict
# 1. Toy corpus
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat saw the dog"
]
# 2. Count unigrams & bigrams
unigrams = Counter()
bigrams = Counter()
for sent in corpus:
    tokens = sent.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
V = len(unigrams) # vocabulary size
# 3. Build Laplace-smoothed bigram probabilities
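#    add-one estimate: P(curr | prev) = (count(prev, curr) + 1) / (count(prev) + V)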
laplace = defaultdict(float)
for prev in unigrams:
    for curr in unigrams:
        count_bg = bigrams.get((prev, curr), 0)
        laplace[(prev, curr)] = (count_bg + 1) / (unigrams[prev] + V)
# 4. Function to compute sequence probability
def sequence_probability(sentence, bi_probs, unigrams, V):
    tokens = sentence.split()
    # P(w1) approximated by its unigram MLE
    p1 = unigrams[tokens[0]] / sum(unigrams.values())
    # accumulate in log space to avoid floating-point underflow
    log_prob = math.log(p1)
    # chain rule over the remaining bigrams
    for prev, curr in zip(tokens, tokens[1:]):
        p = bi_probs.get((prev, curr), 1 / V)  # fallback for pairs outside the table
        log_prob += math.log(p)
    return math.exp(log_prob), log_prob  # (probability, log-probability)
# 5. Demo
test_sentence = "the cat sat on the log"
prob, logp = sequence_probability(test_sentence, laplace, unigrams, V)
print(f"Sentence: '{test_sentence}'")
print(f"Probability = {prob:.6e}")
print(f"Log-Probability = {logp:.4f}")
```

Output:
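```
Sentence: 'the cat sat on the log'
Probability = 1.944778e-04
Log-Probability = -8.5452
```

As a sanity check, the six Laplace-smoothed factors can be multiplied by hand ($V = 8$, 17 training tokens):

```math
P(\text{the cat sat on the log}) \approx \frac{6}{17} \cdot \frac{3}{14} \cdot \frac{2}{10} \cdot \frac{3}{10} \cdot \frac{3}{10} \cdot \frac{2}{14} \approx 1.94 \times 10^{-4}
```

Note that `sequence_probability` assumes every word of the test sentence appears in the training vocabulary: an unseen first word gives `p1 = 0`, and `math.log(0)` raises a `ValueError`.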
Continue to Module 2.4: Language Model Evaluation – Cross-Entropy & Perplexity