Deep N-Grams: Generating Sentences
Generating New Sentences
Now we'll use the language model to generate new sentences. To do that we need to make draws from a Gumbel distribution.
The Gumbel Probability Density Function (PDF) is defined as: \[ f(z) = {1\over{\beta}}e^{-\left(z + e^{-z}\right)} \]
Where: \[ z = {(x - \mu)\over{\beta}} \]
The maximum value is what we choose as the prediction in the last step of the Recurrent Neural Network (RNN)
we are using for text generation. The maximum of a growing number of samples from an exponential distribution asymptotically follows a Gumbel distribution, and adding independent Gumbel noise to the log-probabilities before taking the argmax (the Gumbel-max trick) is equivalent to drawing from the categorical distribution those log-probabilities define. For that reason, the Gumbel distribution is used to sample from a categorical distribution.
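To make the Gumbel-max trick concrete, here is a small numerical check that is not part of the original notebook (the probabilities are made up for illustration): adding independent Gumbel(0, 1) noise to the log-probabilities and taking the argmax draws indices with approximately the original categorical probabilities.

import numpy

rng = numpy.random.default_rng(0)
probabilities = numpy.array([0.1, 0.2, 0.3, 0.4])
log_probabilities = numpy.log(probabilities)

# Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1)
draws = 100_000
u = rng.uniform(low=1e-6, high=1.0 - 1e-6, size=(draws, len(probabilities)))
gumbel_noise = -numpy.log(-numpy.log(u))

# the argmax over the noisy log-probabilities gives categorical samples
samples = numpy.argmax(log_probabilities + gumbel_noise, axis=-1)
empirical = numpy.bincount(samples, minlength=len(probabilities)) / draws
print(empirical)  # should be close to [0.1, 0.2, 0.3, 0.4]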
Imports
# python
from pathlib import Path
# from pypi
import numpy
# this project
from neurotic.nlp.deep_rnn import GRUModel
Set Up
gru = GRUModel()
model = gru.model

# load the weights of the GRU that was pre-trained on Shakespeare
ours = Path("~/models/gru-shakespeare-model/model.pkl.gz").expanduser()
model.init_from_file(ours)
Middle
The Gumbel Sample
def gumbel_sample(log_probabilities: numpy.ndarray,
                  temperature: float=1.0) -> int:
    """Gumbel sampling from a categorical distribution

    Args:
     log_probabilities: model predictions for a given input
     temperature: scale for the Gumbel noise (0 reduces to a plain argmax)

    Returns:
     index of the sampled category
    """
    # uniform noise, kept away from 0 and 1 so the double log below stays finite
    u = numpy.random.uniform(low=1e-6, high=1.0 - 1e-6,
                             size=log_probabilities.shape)
    # transform the uniform draws into Gumbel(0, 1) noise
    g = -numpy.log(-numpy.log(u))
    # the argmax over the noisy log-probabilities is a categorical sample
    return numpy.argmax(log_probabilities + g * temperature, axis=-1)
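As a quick sanity check (not from the original post; the toy probabilities are made up), you can call gumbel_sample on a small vector of log-probabilities. Because the temperature scales the noise in this implementation, a temperature of 0 reduces to a plain argmax, 1 gives an exact categorical draw, and larger values make the draws more uniform.

toy_log_probabilities = numpy.log(numpy.array([0.05, 0.15, 0.8]))

# temperature=0 removes the noise, so this is a plain (greedy) argmax
print(gumbel_sample(toy_log_probabilities, temperature=0.0))

# temperature=1 is an exact draw from the categorical distribution
print(gumbel_sample(toy_log_probabilities))

# larger temperatures let the Gumbel noise dominate, flattening the distribution
print(gumbel_sample(toy_log_probabilities, temperature=2.0))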
A Predictor
END_OF_SENTENCE = 1
def predict(number_of_characters: int, prefix: str,
            break_on: int=END_OF_SENTENCE) -> str:
    """Predicts characters following the prefix

    Args:
     number_of_characters: how many characters to predict
     prefix: string to prompt the predictions
     break_on: identifier for the character to prematurely stop on

    Returns:
     prefix followed by predicted characters
    """
    inputs = [ord(character) for character in prefix]
    result = list(prefix)
    maximum_length = len(prefix) + number_of_characters
    for _ in range(number_of_characters):
        # pad the inputs out to the fixed length the model will see
        current_inputs = numpy.array(inputs + [0] * (maximum_length - len(inputs)))
        output = model(current_inputs[None, :])  # Add batch dim.
        # sample the next character from the model's output at the next position
        next_character = gumbel_sample(output[0, len(inputs)])
        inputs += [int(next_character)]
        if inputs[-1] == break_on:
            break  # EOS
        result.append(chr(int(next_character)))
    return "".join(result)
Some Predictions
print(predict(32, ""))
you would not live at essenomed
Yes, but I don't know anyone who would. Note that we are using a random sample, so repeatedly making predictions won't necessarily get you the same result.
print(predict(32, ""))
print(predict(32, ""))
print(predict(32, ""))
[exeunt]
katharine yes, you are like the
le beau where's some of my prett
print(predict(64, "falstaff"))
falstaff yea, marry, lady, she hath bianced three months.
bianced?
print(predict(64, "beast"))
beastly, and god forbid, sir! our revenue's cannon,
start = "finger"
for word in range(5):
    start = predict(10, start)
    print(start)
finger, iago, an
finger, iago, and ask.
finger, iago, and ask.
finger, iago, and ask.
finger, iago, and ask.
So, once you feed it enough text the output becomes effectively deterministic: here the model hits the end-of-sentence token immediately, so the text stops changing.
SPACE = ord(" ")
start = "iago"
output = start
for word in range(10):
    tokens = predict(32, start).split()
    start = tokens[1] if len(tokens) > 1 else tokens[0]
    output = f"{output} {start}"
print(output)
iago your husband if there never for you need no never
In the generated text above, you can see that the model produces text that mostly makes sense, capturing dependencies between words even when prompted with only a single word. A simple n-gram model would not have been able to capture all of that within one sentence.
On Statistical Methods
A statistical n-gram model will not give you results that are as good: it cannot encode information seen earlier in the data set, so its perplexity is higher (and the higher the perplexity, the worse the model). Statistical n-gram models also take up a great deal of space and memory, which makes them inefficient and slow. Deep neural networks, by contrast, achieve better (lower) perplexity. Note, though, that learning about n-gram language models is still important and leads to a better understanding of deep neural networks.
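For reference, perplexity is just the exponential of the average negative log-likelihood the model assigns to held-out tokens. A minimal sketch (the per-token log-probabilities here are hypothetical numbers, not actual model output) looks like this:

# hypothetical per-token log-probabilities a model assigned to a held-out sequence
token_log_probabilities = numpy.array([-1.2, -0.3, -2.0, -0.7])

# perplexity = exp(mean negative log-likelihood); lower is better
perplexity = numpy.exp(-token_log_probabilities.mean())
print(perplexity)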