Deep N-Grams: Creating the Model

Cloistered Monkey

2021-01-05 16:48

Defining the GRU Model

We're going to build a GRU model using trax. We'll do this by passing in "layers" to the Serial class:

Serial: Class that applies layers serially (by function composition).
- You can pass in the layers as arguments to Serial, separated by commas.
- For example: Serial(Embedding(...), Mean(...), Dense(...), LogSoftmax(...))
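
To make "function composition" concrete, here is a minimal pure-Python sketch of the idea. It doesn't use trax at all: `serial`, `double`, and `add_one` are hypothetical stand-ins, and the real `Serial` also handles weights, state, and multiple inputs, which this ignores.

```python
from functools import reduce
from typing import Callable


def serial(*layers: Callable) -> Callable:
    """Compose callables left to right, the way Serial chains layers."""
    def apply(x):
        # feed the output of each layer into the next one
        return reduce(lambda value, layer: layer(value), layers, x)
    return apply


# hypothetical stand-ins for real layers
def double(x):
    return 2 * x


def add_one(x):
    return x + 1


pipeline = serial(double, add_one)
print(pipeline(3))  # add_one(double(3)) → 7
```

The order of the arguments is the order the data flows through, which is why the layers in the model below are listed from input to output.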

These are the layers that we'll be using:

ShiftRight: A layer that adds padding to shift the input. (Note that this is one of the Trax layers whose arguments have been renamed.)
- ShiftRight(n_positions=1, mode='train'): layer to shift the tensor to the right n_positions times
- Here in the exercise you only need to specify the mode and not worry about n_positions
Embedding: Initializes the embedding layer which maps tokens/IDs to vectors
- Embedding(vocab_size, d_feature). In this case it is the size of the vocabulary by the dimension of the model.
- vocab_size is the number of unique words in the given vocabulary.
- d_feature is the number of elements in the word embedding (some choices for a word embedding size range from 150 to 300, for example).
GRU: The Trax GRU layer.
- GRU(n_units): Builds a traditional GRU of n_units cells with dense internal transformations.
- An academic paper looking at the GRU: Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Dense: A dense (fully-connected) layer.
- Dense(n_units): The parameter n_units is the number of units chosen for this dense layer.
LogSoftmax: Log of the output probabilities.
- Here, you don't need to set any parameters for LogSoftmax().
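
Two of these layers are simple enough to sketch in plain Python, which might help show what they do to the data. This is only an illustration on one-dimensional lists, not the trax implementation: `shift_right` and `log_softmax` are hypothetical helpers, and the zero padding value is an assumption.

```python
import math


def shift_right(tokens: list, n_positions: int = 1, pad: int = 0) -> list:
    """Prepend padding and drop the tail so the length stays the same."""
    if n_positions <= 0:
        return list(tokens)
    return ([pad] * n_positions + list(tokens))[:len(tokens)]


def log_softmax(scores: list) -> list:
    """Log of the softmax probabilities, computed in a numerically stable way."""
    peak = max(scores)
    # log-sum-exp with the maximum subtracted to avoid overflow
    log_total = peak + math.log(sum(math.exp(score - peak) for score in scores))
    return [score - log_total for score in scores]


shifted = shift_right([5, 6, 7, 8])
print(shifted)  # [0, 5, 6, 7]

log_probabilities = log_softmax([1.0, 2.0, 3.0])
# exponentiating the outputs should give probabilities that sum to (about) 1
total_probability = sum(math.exp(value) for value in log_probabilities)
print(total_probability)
```

The shift matters for a language model because the prediction at each position should only see the tokens that came before it.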

Imports

# pypi
from trax import layers

Middle

The GRU Model

def GRULM(vocab_size: int=256, d_model: int=512, n_layers: int=2, mode:str='train') -> layers.Serial:
    """Returns a GRU language model.

    Args:
       vocab_size (int, optional): Size of the vocabulary. Defaults to 256.
       d_model (int, optional): Depth of embedding (n_units in the GRU cell). Defaults to 512.
       n_layers (int, optional): Number of GRU layers. Defaults to 2.
       mode (str, optional): 'train', 'eval' or 'predict', predict mode is for fast inference. Defaults to "train".

    Returns:
       trax.layers.combinators.Serial: A GRU language model as a layer that maps from a tensor of tokens to activations over a vocab set.
    """
    model = layers.Serial(
        # the ``n_shifts`` argument seems to have been renamed ``n_positions``,
        # so pass it positionally to remain backwards compatible
        layers.ShiftRight(1, mode=mode),
        layers.Embedding(vocab_size, d_model),
        *[layers.GRU(d_model) for _ in range(n_layers)],
        layers.Dense(vocab_size),
        layers.LogSoftmax()
    )
    return model

Will It Build?

model = GRULM()
print(model)

Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
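
The sizes in the printout also give a rough sense of how big the model is. The sketch below is a back-of-the-envelope count, assuming the textbook GRU formulation (update gate, reset gate, and candidate state, each with input weights, recurrent weights, and one bias vector), a bias on the dense layer, and no bias on the embedding. The helper names are hypothetical, and the count trax itself reports could differ if its internals don't match these assumptions.

```python
def embedding_parameters(vocab_size: int, d_feature: int) -> int:
    """One d_feature-sized vector per token (assumes no bias)."""
    return vocab_size * d_feature


def gru_parameters(d_input: int, n_units: int) -> int:
    """Update gate, reset gate, and candidate state (assumes one bias each)."""
    per_transformation = d_input * n_units + n_units * n_units + n_units
    return 3 * per_transformation


def dense_parameters(d_input: int, n_units: int) -> int:
    """Weight matrix plus a bias vector."""
    return d_input * n_units + n_units


vocab_size, d_model, n_layers = 256, 512, 2  # the GRULM defaults

total = (embedding_parameters(vocab_size, d_model)
         + n_layers * gru_parameters(d_model, d_model)
         + dense_parameters(d_model, vocab_size))
print(f"{total:,}")  # 3,411,200
```

Under these assumptions nearly all of the parameters sit in the two GRU layers.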

Saving it for Later

It seems a little goofy to do this, but since I might forget some of the values, I might as well put the model into a class that remembers the defaults.

Imports

# from pypi
from trax import layers

import attr

Model Builder

@attr.s(auto_attribs=True)
class GRUModel:
    """Builds the layers for the GRU model

    Args:
     shift_positions: amount of padding to add to the front of input
     vocabulary_size: the size of our learned vocabulary
     model_dimensions: the GRU and Embeddings dimensions
     gru_layers: how many GRU layers to create
     mode: train, eval, or predict
    """
    shift_positions: int=1
    vocabulary_size: int=256
    model_dimensions: int=512
    gru_layers: int=2
    mode: str="train"
    _model: layers.Serial=None

The Model

@property
def model(self) -> layers.Serial:
    """The GRU Model"""
    if self._model is None:
        self._model = layers.Serial(
            layers.ShiftRight(self.shift_positions, mode=self.mode),
            layers.Embedding(self.vocabulary_size, self.model_dimensions),
            *[layers.GRU(self.model_dimensions)
              for gru_layer in range(self.gru_layers)],
            layers.Dense(self.vocabulary_size),
            layers.LogSoftmax()
        )
    return self._model
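
The property above is a method of the GRUModel class, and it only builds the model the first time it's asked for. Here's a minimal sketch of that build-once pattern in a single piece. To keep it runnable without extra installs it swaps in the standard library's dataclasses for attrs and a list of layer names for the real trax layers; `LazyBuilder` and `build_count` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LazyBuilder:
    """Hypothetical stand-in for GRUModel that builds a list of layer names."""
    gru_layers: int = 2
    # counts how many times the model actually gets built
    build_count: int = field(default=0, init=False)
    _model: Optional[list] = field(default=None, init=False, repr=False)

    @property
    def model(self) -> list:
        """The layer names, built on first access and cached afterwards."""
        if self._model is None:
            self.build_count += 1
            self._model = (["ShiftRight", "Embedding"]
                           + ["GRU"] * self.gru_layers
                           + ["Dense", "LogSoftmax"])
        return self._model


builder = LazyBuilder(gru_layers=3)
first = builder.model
second = builder.model
print(first)
print(first is second, builder.build_count)  # True 1
```

Since the second access returns the cached object, changing `gru_layers` after the first access wouldn't rebuild anything, which is worth remembering when re-using a builder.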

Check It Out

from neurotic.nlp.deep_rnn import GRUModel

gru = GRUModel()
print(gru.model)

Serial[
  Serial[
    ShiftRight(1)
  ]
  Embedding_256_512
  GRU_512
  GRU_512
  Dense_256
  LogSoftmax
]
