
Module 2.3: Sequence Probability & Chain Rule

Language models assign probabilities to entire sequences of words. The chain rule decomposes the joint probability of a sequence into a product of conditional probabilities, one per word.


Key Concepts

  • Chain Rule

    $$P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})$$

  • N-gram Approximation

    $$P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})$$
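
For example, applying the chain rule to the three-word sentence "the cat sat" and then the bigram (n = 2) approximation:

$$P(\text{the cat sat}) = P(\text{the}) \cdot P(\text{cat} \mid \text{the}) \cdot P(\text{sat} \mid \text{the, cat}) \approx P(\text{the}) \cdot P(\text{cat} \mid \text{the}) \cdot P(\text{sat} \mid \text{cat})$$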

1. Computing Sequence Probability with a Bigram Model

The following code builds a bigram model with Laplace smoothing and computes the probability of a test sentence via the chain rule.

```python
import math
from collections import Counter, defaultdict

# 1. Toy corpus
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat saw the dog"
]

# 2. Count unigrams & bigrams
unigrams = Counter()
bigrams  = Counter()
for sent in corpus:
    tokens = sent.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

V = len(unigrams)  # vocabulary size

# 3. Build Laplace-smoothed bigram probabilities for every vocabulary pair
laplace = {}
for prev in unigrams:
    for curr in unigrams:
        count_bg = bigrams[(prev, curr)]  # Counter returns 0 for unseen pairs
        laplace[(prev, curr)] = (count_bg + 1) / (unigrams[prev] + V)

# 4. Compute a sequence probability via the chain rule
def sequence_probability(sentence, bi_probs, unigrams, V):
    tokens = sentence.split()
    # P(w1) approximated by its unigram MLE
    # (assumes the first word occurs in the training corpus)
    p1 = unigrams[tokens[0]] / sum(unigrams.values())
    log_prob = math.log(p1)
    # chain rule: accumulate log P(w_i | w_{i-1}) for each bigram;
    # summing logs avoids numerical underflow on long sequences
    for prev, curr in zip(tokens, tokens[1:]):
        # crude 1/V fallback for pairs involving out-of-vocabulary words
        p = bi_probs.get((prev, curr), 1 / V)
        log_prob += math.log(p)
    return math.exp(log_prob), log_prob  # (probability, log-probability)

# 5. Demo
test_sentence = "the cat sat on the log"
prob, logp = sequence_probability(test_sentence, laplace, unigrams, V)
print(f"Sentence: '{test_sentence}'")
print(f"Probability = {prob:.6e}")
print(f"Log-Probability = {logp:.4f}")

```

Output (computed from the toy corpus above):
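```
Sentence: 'the cat sat on the log'
Probability = 1.944778e-04
Log-Probability = -8.5452
```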
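
As a quick sanity check, the same helper can score alternative word orders. In this minimal sketch (the scrambled sentence is an illustrative input, not part of the original corpus), a fluent ordering should receive a noticeably higher log-probability than a scrambled one:

```python
# Score a fluent sentence against a scrambled version of the same words;
# the bigram model should prefer the fluent word order
for s in ["the dog sat on the mat", "mat the on sat dog the"]:
    p, lp = sequence_probability(s, laplace, unigrams, V)
    print(f"{s!r}: log-probability = {lp:.4f}")
```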

Continue to Module 2.4: Language Model Evaluation – Cross-Entropy & Perplexity