Deep N-Grams: Creating the Model

Defining the GRU Model

We're going to build a GRU model using trax. We'll do this by passing in "layers" to the Serial class:

  • Serial: Class that applies layers serially (by function composition).
    • You can pass in the layers as arguments to Serial, separated by commas.
    • For example: Serial(Embeddings(...), Mean(...), Dense(...), LogSoftmax(...))
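To make the "function composition" idea concrete, here's a toy sketch in plain Python (not trax) of what Serial-style composition means: each layer's output is fed to the next layer in order.

```python
# A toy sketch of Serial-style composition (plain Python, not trax):
# apply each layer function in order, feeding each output to the next.
def serial(*layer_fns):
    def composed(x):
        for fn in layer_fns:
            x = fn(x)
        return x
    return composed

double = lambda x: 2 * x
increment = lambda x: x + 1

model = serial(double, increment)
print(model(3))  # 7
```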

These are the layers that we'll be using:

  • ShiftRight: A layer that adds padding to shift the input. (Note that this is one of the Trax layers whose arguments have been renamed.)
    • ShiftRight(n_positions=1, mode='train'): a layer that shifts the tensor to the right n_positions times.
    • Here in the exercise you only need to specify the mode and not worry about n_positions.
  • Embedding: Initializes the embedding layer which maps tokens/IDs to vectors
    • Embedding(vocab_size, d_feature). In this case it is the size of the vocabulary by the dimension of the model.
    • vocab_size is the number of unique words in the given vocabulary.
    • d_feature is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
  • GRU: The Trax GRU layer.
  • Dense: A dense (fully-connected) layer.
    • Dense(n_units): The parameter n_units is the number of units chosen for this dense layer.
  • LogSoftmax: Log of the output probabilities.
    • Here, you don't need to set any parameters for LogSoftmax().
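Since ShiftRight is the least obvious of these layers, here is a hypothetical pure-NumPy sketch of what it does in 'train' mode (an illustration of the idea, not trax's actual implementation): the sequence axis is padded on the left with zeros and the last n_positions tokens are dropped, so each position only sees earlier tokens.

```python
import numpy as np

def shift_right(tokens, n_positions=1):
    # Sketch of ShiftRight's behavior: left-pad the sequence axis with
    # zeros, then truncate to the original length, shifting every token
    # n_positions places to the right.
    padded = np.pad(tokens, ((0, 0), (n_positions, 0)))
    return padded[:, :tokens.shape[1]]

batch = np.array([[3, 7, 1, 4]])
print(shift_right(batch))  # [[0 3 7 1]]
```

This is what makes the language model causal: at training time the network predicts token t from tokens 0 through t - 1.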

Imports

# pypi
from trax import layers

Middle

The GRU Model

def GRULM(vocab_size: int=256, d_model: int=512, n_layers: int=2, mode:str='train') -> layers.Serial:
    """Returns a GRU language model.

    Args:
       vocab_size (int, optional): Size of the vocabulary. Defaults to 256.
       d_model (int, optional): Depth of embedding (n_units in the GRU cell). Defaults to 512.
       n_layers (int, optional): Number of GRU layers. Defaults to 2.
       mode (str, optional): 'train', 'eval' or 'predict', predict mode is for fast inference. Defaults to "train".

    Returns:
       trax.layers.combinators.Serial: A GRU language model as a layer that maps from a tensor of tokens to activations over a vocab set.
    """
    model = layers.Serial(
        # the ``n_shifts`` argument has been renamed to ``n_positions``,
        # so pass it positionally to stay backwards-compatible
        layers.ShiftRight(1, mode=mode),
        layers.Embedding(vocab_size, d_model),
        *[layers.GRU(d_model) for _ in range(n_layers)],
        layers.Dense(vocab_size),
        layers.LogSoftmax()
    )
    return model

Will It Build?

model = GRULM()
print(model)
Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
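The sizes in the printed layers can be sanity-checked with some back-of-the-envelope arithmetic. This is an estimate assuming a standard GRU parameterization (three gates, each with input and recurrent weights plus a bias); trax's exact count may differ slightly.

```python
# Rough parameter-count estimate for the printed model
# (an approximation, not pulled from trax itself).
vocab_size, d_model, n_layers = 256, 512, 2

embedding = vocab_size * d_model                     # Embedding_256_512 table
gru = 3 * ((d_model + d_model) * d_model + d_model)  # 3 gates per GRU cell
dense = d_model * vocab_size + vocab_size            # Dense_256 weights + bias

total = embedding + n_layers * gru + dense
print(f"{total:,}")  # 3,411,200
```

Most of the roughly 3.4 million parameters live in the two GRU layers, which is typical for small-vocabulary (here, character-level) models.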

Saving it for Later

It seems a little goofy to do this, but since I might forget some of the values, might as well.

Imports

# from pypi
from trax import layers

import attr

Model Builder

@attr.s(auto_attribs=True)
class GRUModel:
    """Builds the layers for the GRU model

    Args:
     shift_positions: amount of padding to add to the front of input
     vocabulary_size: the size of our learned vocabulary
     model_dimensions: the GRU and Embeddings dimensions
     gru_layers: how many GRU layers to create
     mode: train, eval, or predict
    """
    shift_positions: int=1
    vocabulary_size: int=256
    model_dimensions: int=512
    gru_layers: int=2
    mode: str="train"
    _model: layers.Serial=None
    @property
    def model(self) -> layers.Serial:
        """The GRU Model"""
        if self._model is None:
            self._model = layers.Serial(
                layers.ShiftRight(self.shift_positions, mode=self.mode),
                layers.Embedding(self.vocabulary_size, self.model_dimensions),
                *[layers.GRU(self.model_dimensions)
                  for gru_layer in range(self.gru_layers)],
                layers.Dense(self.vocabulary_size),
                layers.LogSoftmax()
            )
        return self._model
    

Check It Out

from neurotic.nlp.deep_rnn import GRUModel

gru = GRUModel()
print(gru.model)
Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]