Auto-Complete: Perplexity

Perplexity

In the previous post we implemented the N-Gram Language Model for the auto-complete system that we began here.

In this section, you will generate the perplexity score to evaluate your model on the test set.

  • You will also use back-off when needed.
  • Perplexity is used as an evaluation metric of your language model.
  • To calculate the the perplexity score of the test set on an n-gram model, use:

\[ PP(W) =\sqrt[N]{ \prod_{t=n+1}^N \frac{1}{P(w_t | w_{t-n} \cdots w_{t-1})} } \tag{4} \]

  • where N is the length of the sentence.
  • n is the number of words in the n-gram (e.g. 2 for a bigram).
  • In math, the numbering starts at one and not zero.

In code, array indexing starts at zero, so the code will use ranges for t according to this formula:

\[ PP(W) =\sqrt[N]{ \prod_{t=n}^{N-1} \frac{1}{P(w_t | w_{t-n} \cdots w_{t-1})} } \tag{4.1} \]

The higher the probabilities are, the lower the perplexity will be.

  • The more the n-grams tell us about the sentence, the lower the perplexity score will be.

Imports

# python
import math

# pypi
from expects import expect, be_true

import attr

# this project
from neurotic.nlp.autocomplete import NGrams,NGramProbability

Middle

The Probability Function

This was already defined in the previous post, but the function following it assumes its existence so I'm temporarily re-defining it here.

def estimate_probability(word: str,
                         previous_n_gram: tuple, 
                         n_gram_counts: dict,
                         n_plus1_gram_counts: dict,
                         vocabulary_size: int,
                         k: float=1.0) -> float:
    """
    Estimate the probabilities of a next word using the n-gram counts with k-smoothing

    Args:
       word: next word
       previous_n_gram: A sequence of words of length n
       n_gram_counts: Dictionary of counts of n-grams
       n_plus1_gram_counts: Dictionary of counts of (n+1)-grams
       vocabulary_size: number of words in the vocabulary
       k: positive constant, smoothing parameter

    Returns:
       A probability
    """
    previous_n_gram = tuple(previous_n_gram)
    previous_n_gram_count = n_gram_counts.get(previous_n_gram, 0)

    n_plus1_gram = previous_n_gram + (word,)  
    n_plus1_gram_count = n_plus1_gram_counts.get(n_plus1_gram, 0)       
    return (n_plus1_gram_count + k)/(previous_n_gram_count + k * vocabulary_size)

Calculating the Perplexity

# UNQ_C10 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: calculate_perplexity
def calculate_perplexity(sentence: list,
                         n_gram_counts: dict,
                         n_plus1_gram_counts: dict,
                         vocabulary_size: int,
                         k: float=1.0):
    """
    Calculate perplexity for a list of sentences

    Args:
       sentence: List of strings
       n_gram_counts: Dictionary of counts of (n+1)-grams
       n_plus1_gram_counts: Dictionary of counts of (n+1)-grams
       vocabulary_size: number of unique words in the vocabulary
       k: Positive smoothing constant

    Returns:
       Perplexity score
    """
    # length of previous words
    n = len(list(n_gram_counts.keys())[0]) 

    # prepend <s> and append <e>
    sentence = ["<s>"] * n + sentence + ["<e>"]

    # Cast the sentence from a list to a tuple
    sentence = tuple(sentence)

    # length of sentence (after adding <s> and <e> tokens)
    N = len(sentence)

    # The variable p will hold the product
    # that is calculated inside the n-root
    # Update this in the code below
    product_pi = 1.0

    ### START CODE HERE (Replace instances of 'None' with your code) ###

    # Index t ranges from n to N - 1, inclusive on both ends
    for t in range(n, N): # complete this line

        # get the n-gram preceding the word at position t
        n_gram = sentence[t - n: t]

        # get the word at position t
        word = sentence[t]

        # Estimate the probability of the word given the n-gram
        # using the n-gram counts, n-plus1-gram counts,
        # vocabulary size, and smoothing constant
        probability = estimate_probability(
            word=word, previous_n_gram=n_gram,
            vocabulary_size=vocabulary_size,
            n_gram_counts=n_gram_counts,
            n_plus1_gram_counts=n_plus1_gram_counts, k=k)

        # Update the product of the probabilities
        # This 'product_pi' is a cumulative product 
        # of the (1/P) factors that are calculated in the loop
        product_pi *= 1/probability

    # Take the Nth root of the product
    perplexity = product_pi**(1/N)

    ### END CODE HERE ### 
    return perplexity

Test It

sentences = [['i', 'like', 'a', 'cat'],
                 ['this', 'dog', 'is', 'like', 'a', 'cat']]
unique_words = list(set(sentences[0] + sentences[1]))

unigram_counts = NGrams(sentences, 1).counts
bigram_counts = NGrams(sentences, 2).counts


perplexity_train1 = calculate_perplexity(sentences[0],
                                         unigram_counts, bigram_counts,
                                         len(unique_words), k=1.0)
expected = 2.8040
print(f"Perplexity for first train sample: {perplexity_train1:.4f}")

expect(math.isclose(perplexity_train1, expected, abs_tol=1e-4)).to(be_true)
test_sentence = ['i', 'like', 'a', 'dog']
perplexity_test = calculate_perplexity(test_sentence,
                                       unigram_counts, bigram_counts,
                                       len(unique_words), k=1.0)
print(f"Perplexity for test sample: {perplexity_test:.4f}")
expected = 3.9654
expect(math.isclose(perplexity_test, expected, abs_tol=1e-4)).to(be_true)
Perplexity for first train sample: 2.8040
Perplexity for test sample: 3.9654

Note: If your sentence is really long, there will be underflow when multiplying many fractions.

  • To handle longer sentences, modify your implementation to take the sum of the log of the probabilities.

Using the Class-Based Version

@attr.s(auto_attribs=True)
class Perplexity:
    """Calculate perplexity

    Args:
     data: the tokenized training input
     n: the size of the n-grams
     augment_vocabulary: whether to augment the vocabulary for toy examples
    """
    data: list
    n: int
    augment_vocabulary: bool=False
    _probabilifier: NGramProbability=None

    @property
    def probabilifier(self) -> NGramProbability:
        """Probability Calculator"""
        if self._probabilifier is None:
            self._probabilifier = NGramProbability(
                self.data, self.n,
                augment_vocabulary=self.augment_vocabulary)
        return self._probabilifier

    def perplexity(self, sentence: list) -> float:
        """Calculates the perplexity for the sentence"""
        sentence = tuple(["<s>"] * self.n + sentence + ["<e>"])
        N = len(sentence)

        n_grams = (sentence[position - self.n: position]
                   for position in range(self.n, N))
        words = (sentence[position]
                 for position in range(self.n, N))
        words_n_grams = zip(words, n_grams)
        probabilities = (self.probabilifier.probability(word, n_gram)
                         for word, n_gram in words_n_grams)
        product = math.prod((1/probability for probability in probabilities))
        return product**(1/N)
sentences = [['i', 'like', 'a', 'cat'],
                 ['this', 'dog', 'is', 'like', 'a', 'cat']]

model = Perplexity(sentences, 1, augment_vocabulary=False)

actual = model.perplexity(sentences[0])

expected = 2.8040
print(f"Perplexity for first train sample: {actual:.4f}")

expect(math.isclose(actual, expected, abs_tol=1e-4)).to(be_true)
test_sentence = ['i', 'like', 'a', 'dog']
model
perplexity_test = model.perplexity(test_sentence)

print(f"Perplexity for test sample: {perplexity_test:.4f}")
expected = 3.9654
expect(math.isclose(perplexity_test, expected, abs_tol=1e-4)).to(be_true)

End

In the next part we'll build our completed auto-complete system.