Deep N-Grams: Creating the Model
Defining the GRU Model
We're going to build a GRU model using trax. We'll do this by passing in "layers" to the Serial class:
- Serial: Class that applies layers serially (by function composition).
  - You can pass in the layers as arguments to Serial, separated by commas.
  - For example: Serial(Embedding(...), Mean(...), Dense(...), LogSoftmax(...))
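To make "applies layers serially (by function composition)" concrete, here is a minimal pure-Python sketch. It is not how trax implements Serial; the serial helper and the two toy "layers" (double and add_one) are hypothetical stand-ins that only illustrate the left-to-right composition idea.

```python
from functools import reduce
from typing import Callable, List


def serial(*layers: Callable) -> Callable:
    """Compose callables left to right: serial(f, g)(x) == g(f(x))."""
    def apply(x):
        # feed the output of each layer into the next one
        return reduce(lambda value, layer: layer(value), layers, x)
    return apply


# Hypothetical stand-ins for layers: plain functions on lists of numbers.
def double(xs: List[float]) -> List[float]:
    return [2 * x for x in xs]


def add_one(xs: List[float]) -> List[float]:
    return [x + 1 for x in xs]


pipeline = serial(double, add_one)
result = pipeline([1, 2, 3])  # double first, then add one
print(result)
```

The order of the arguments is the order of application, which is why the layers in the model below are listed from input to output.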
These are the layers that we'll be using:
- ShiftRight: A layer that adds padding to shift the input. (Note that this is one of the Trax layers whose arguments have been renamed.)
  - ShiftRight(n_positions=1, mode='train') shifts the tensor to the right n_positions times.
  - Here in the exercise you only need to specify the mode and not worry about n_positions.
- Embedding: Initializes the embedding layer, which maps tokens/IDs to vectors.
  - Embedding(vocab_size, d_feature): in this case the embedding table is the size of the vocabulary by the dimension of the model.
  - vocab_size is the number of unique words in the given vocabulary.
  - d_feature is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
- GRU: The Trax GRU layer.
  - GRU(n_units) builds a traditional GRU of n_units with dense internal transformations.
  - An academic paper looking at the GRU: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- Dense: A dense (fully-connected) layer.
  - Dense(n_units): The parameter n_units is the number of units chosen for this dense layer.
- LogSoftmax: Log of the output probabilities.
  - Here, you don't need to set any parameters for LogSoftmax().
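As a rough illustration of what these layers compute, here is a pure-Python sketch of the shift, the embedding lookup, a single GRU update, and the log-softmax. Everything in it is an assumption for illustration: the function names are made up, biases are left out, the GRU step follows the formulation in the paper cited above, and trax's own implementations (which work on batched arrays) may differ in their details.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def shift_right(tokens: List[int], n_positions: int = 1) -> List[int]:
    """Pad the front with zeros and drop the tail so the length is unchanged."""
    if n_positions <= 0:
        return list(tokens)
    return ([0] * n_positions + list(tokens))[:len(tokens)]


def embed(tokens: List[int], table: Matrix) -> Matrix:
    """Look up one row of the embedding table per token ID."""
    return [table[token] for token in tokens]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Vector, h: Vector, W_z: Matrix, U_z: Matrix,
             W_r: Matrix, U_r: Matrix, W_h: Matrix, U_h: Matrix) -> Vector:
    """One GRU update (biases omitted): gates, candidate, then interpolation."""
    # update gate: how much of the candidate state to let in
    z = [sigmoid(a + b) for a, b in zip(matvec(W_z, x), matvec(U_z, h))]
    # reset gate: how much of the old state feeds the candidate
    r = [sigmoid(a + b) for a, b in zip(matvec(W_r, x), matvec(U_r, h))]
    gated = [r_i * h_i for r_i, h_i in zip(r, h)]
    candidate = [math.tanh(a + b)
                 for a, b in zip(matvec(W_h, x), matvec(U_h, gated))]
    # new state: blend of the old state and the candidate
    return [(1 - z_i) * h_i + z_i * c_i
            for z_i, h_i, c_i in zip(z, h, candidate)]


def log_softmax(logits: Vector) -> Vector:
    """Subtract the log of the summed exponentials (shifted for stability)."""
    peak = max(logits)
    log_total = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_total for v in logits]


shifted = shift_right([5, 6, 7])                   # the first token becomes padding
table = [[float(i), float(-i)] for i in range(8)]  # toy table: 8 tokens, 2 features
vectors = embed(shifted, table)
zeros = [[0.0, 0.0], [0.0, 0.0]]                   # all-zero toy weights
hidden = gru_step(vectors[1], [1.0, -1.0],
                  zeros, zeros, zeros, zeros, zeros, zeros)
log_probs = log_softmax([1.0, 2.0, 3.0])
```

The shift is what makes this a language model: at each position the network sees only the tokens before it and is asked to predict the token that was shifted away.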
Imports
# pypi
from trax import layers
Middle
The GRU Model
def GRULM(vocab_size: int = 256, d_model: int = 512, n_layers: int = 2,
          mode: str = "train") -> layers.Serial:
    """Returns a GRU language model.

    Args:
        vocab_size (int, optional): Size of the vocabulary. Defaults to 256.
        d_model (int, optional): Depth of embedding (n_units in the GRU cell). Defaults to 512.
        n_layers (int, optional): Number of GRU layers. Defaults to 2.
        mode (str, optional): 'train', 'eval' or 'predict', predict mode is for fast inference. Defaults to "train".

    Returns:
        trax.layers.combinators.Serial: A GRU language model as a layer that maps from a tensor of tokens to activations over a vocab set.
    """
    model = layers.Serial(
        # the ``n_shifts`` argument seems to have been renamed ``n_positions``,
        # so pass it positionally to remain backwards compatible
        layers.ShiftRight(1, mode=mode),
        layers.Embedding(vocab_size, d_model),
        # one GRU layer per requested layer, all with the same width
        *[layers.GRU(d_model) for _ in range(n_layers)],
        layers.Dense(vocab_size),
        layers.LogSoftmax()
    )
    return model
Will It Build?
model = GRULM()
print(model)
Serial[ Serial[ ShiftRight(1) ] Embedding_256_512 GRU_512 GRU_512 Dense_256 LogSoftmax ]
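The printed layer names carry their sizes, so it can help to trace what shape each layer should hand to the next. This is only a bookkeeping sketch under my assumption about how these layers treat shapes (the helper grulm_output_shape is hypothetical and does not call trax): a batch of token IDs goes in, and one row of vocabulary-sized log-probabilities comes out per position.

```python
from typing import Tuple


def grulm_output_shape(batch_size: int, sequence_length: int,
                       vocab_size: int = 256, d_model: int = 512,
                       n_layers: int = 2) -> Tuple[int, ...]:
    """Trace the expected array shape through each layer of the stack."""
    # input: integer token IDs
    shape = (batch_size, sequence_length)
    # ShiftRight pads the front and truncates the end: shape unchanged
    # Embedding adds a feature axis of size d_model
    shape = shape + (d_model,)
    # each GRU keeps d_model features per position
    for _ in range(n_layers):
        shape = shape[:-1] + (d_model,)
    # Dense projects the features onto the vocabulary
    shape = shape[:-1] + (vocab_size,)
    # LogSoftmax normalizes along the last axis: shape unchanged
    return shape


output_shape = grulm_output_shape(batch_size=32, sequence_length=64)
print(output_shape)
```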
Saving it for Later
It seems a little goofy to do this, but since I might forget some of the values, might as well.
Imports
# from pypi
from trax import layers
import attr
Model Builder
@attr.s(auto_attribs=True)
class GRUModel:
    """Builds the layers for the GRU model

    Args:
     shift_positions: amount of padding to add to the front of input
     vocabulary_size: the size of our learned vocabulary
     model_dimensions: the GRU and Embeddings dimensions
     gru_layers: how many GRU layers to create
     mode: train, eval, or predict
    """
    shift_positions: int=1
    vocabulary_size: int=256
    model_dimensions: int=512
    gru_layers: int=2
    mode: str="train"
    _model: layers.Serial=None
- The Model
    @property
    def model(self) -> layers.Serial:
        """The GRU Model"""
        if self._model is None:
            self._model = layers.Serial(
                layers.ShiftRight(self.shift_positions, mode=self.mode),
                layers.Embedding(self.vocabulary_size, self.model_dimensions),
                *[layers.GRU(self.model_dimensions)
                  for gru_layer in range(self.gru_layers)],
                layers.Dense(self.vocabulary_size),
                layers.LogSoftmax()
            )
        return self._model
Check It Out
from neurotic.nlp.deep_rnn import GRUModel
gru = GRUModel()
print(gru.model)
Serial[ Serial[ ShiftRight(1) ] Embedding_256_512 GRU_512 GRU_512 Dense_256 LogSoftmax ]