Introducing the CBOW Model

The Continuous Bag-of-Words (CBOW) Model

In the previous post we prepared our data; now we'll look at how the CBOW model is constructed.

Imports

# from pypi
from expects import (
    be_true,
    equal,
    expect,
)
import numpy

Activation Functions

Let's start by implementing the activation functions, ReLU and softmax.

ReLU

ReLU is used to calculate the values of the hidden layer, as given by the following formulas:

\begin{align} \mathbf{z_1} &= \mathbf{W_1}\mathbf{x} + \mathbf{b_1} \tag{1} \\ \mathbf{h} &= \mathrm{ReLU}(\mathbf{z_1}) \tag{2} \\ \end{align}
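
Before fixing a value by hand, here's a minimal sketch of where a \(\mathbf{z_1}\) like this comes from, following equation (1). The sizes V (vocabulary) and N (hidden layer) and the random weights are hypothetical, chosen only to show the shapes involved.

# A sketch of equation (1) with hypothetical sizes:
# V is the vocabulary size, N the hidden layer size
V, N = 5, 3
W_1 = numpy.random.rand(N, V)      # N x V weight matrix
b_1 = numpy.random.rand(N, 1)      # N x 1 bias column vector
x = numpy.zeros((V, 1))            # V x 1 input column vector
x[2] = 1
z_1_sketch = numpy.dot(W_1, x) + b_1
print(z_1_sketch.shape)
(3, 1)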

Let's fix a value for \(\mathbf{z_1}\) as a working example.

numpy.random.seed(10)

# Define a 5x1 column vector using numpy
z_1 = 10 * numpy.random.rand(5, 1) - 5

# Print the vector
print(z_1)
[[ 2.71320643]
 [-4.79248051]
 [ 1.33648235]
 [ 2.48803883]
 [-0.01492988]]

Notice that numpy's random.rand function returns a numpy array filled with values drawn from a uniform distribution over [0, 1). Because numpy operations are vectorized, multiplying by 10 and subtracting 5 happens elementwise, shifting the values into the range [-5, 5).

To get the ReLU of this vector, you want all the negative values to become zeros.

First create a copy of this vector.

h = z_1.copy()

Now determine which of its values are negative.

print(h < 0)
[[False]
 [ True]
 [False]
 [False]
 [ True]]

You can now simply set all of the values which are negative to 0.

h[h < 0] = 0

And that's it: you have the ReLU of \(\mathbf{z_1}\).

print(h)
[[2.71320643]
 [0.        ]
 [1.33648235]
 [2.48803883]
 [0.        ]]
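
As an aside, numpy's maximum function gives the same result in a single step, without the copy-and-mask dance. This is an equivalent alternative, not the approach the rest of this post uses.

print(numpy.maximum(z_1, 0))
[[2.71320643]
 [0.        ]
 [1.33648235]
 [2.48803883]
 [0.        ]]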

Now implement ReLU as a function.

def relu(z: numpy.ndarray) -> numpy.ndarray:
    """Get the ReLU for the input array

    Args:
     z: an array of numbers

    Returns:
     ReLU of z
    """
    result = z.copy()
    result[result < 0] = 0
    return result

And check that it's working.

z = numpy.array([[-1.25459881],
                 [ 4.50714306],
                 [ 2.31993942],
                 [ 0.98658484],
                 [-3.4398136 ]])

# Apply ReLU to it
actual = relu(z)
expected = numpy.array([[0.        ],
                        [4.50714306],
                        [2.31993942],
                        [0.98658484],
                        [0.        ]])

print(actual)

expect(numpy.allclose(actual, expected)).to(be_true)
[[0.        ]
 [4.50714306]
 [2.31993942]
 [0.98658484]
 [0.        ]]
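
Since the comparison and the masking are elementwise, relu works on arrays of any shape, not just column vectors. Here's a quick check on a made-up 2x2 matrix:

z_matrix = numpy.array([[-1.0,  2.0],
                        [ 3.0, -4.0]])
print(relu(z_matrix))
[[0. 2.]
 [3. 0.]]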

Softmax

The second activation function that you need is softmax. This function is used to calculate the values of the output layer of the neural network, using the following formulas:

\begin{align} \mathbf{z_2} &= \mathbf{W_2}\mathbf{h} + \mathbf{b_2} \tag{3} \\ \mathbf{\hat y} &= \mathrm{softmax}(\mathbf{z_2}) \tag{4} \\ \end{align}

When you calculate the softmax of a vector \(\mathbf{z}\), the i-th component of the resulting vector is given by:

\[ \textrm{softmax}(\textbf{z})_i = \frac{e^{z_i} }{\sum\limits_{j=1}^{V} e^{z_j} } \tag{5} \]

Let's work through an example.

z = numpy.array([9, 8, 11, 10, 8.5])
print(z)
[ 9.   8.  11.  10.   8.5]

You'll need to calculate the exponentials of each element, both for the numerator and for the denominator.

e_z = numpy.exp(z)

print(e_z)
[ 8103.08392758  2980.95798704 59874.1417152  22026.46579481
  4914.7688403 ]

The denominator is equal to the sum of these exponentials.

sum_e_z = numpy.sum(e_z)
print(f"{sum_e_z:,.2f}")
97,899.42

And the value of the first element of \(\textrm{softmax}(\textbf{z})\) is given by:

print(f"{e_z[0]/sum_e_z:0.4f}")
0.0828

This is for one element. You can use numpy's vectorized operations to calculate the values of all the elements of the \(\textrm{softmax}(\textbf{z})\) vector in one go.
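
Using the e_z and sum_e_z values computed above, the whole vector comes out in a single expression:

print(e_z / sum_e_z)
[0.08276948 0.03044919 0.61158833 0.22499077 0.05020223]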

Implement the softmax function.

def softmax(z: numpy.ndarray) -> numpy.ndarray:
    """Calculate Softmax for the input

    Args:
     z: array of values

    Returns:
     array of probabilities
    """
    e_z = numpy.exp(z)
    sum_e_z = numpy.sum(e_z)
    return e_z / sum_e_z

Now check that it works.

actual = softmax([9, 8, 11, 10, 8.5])
print(actual)
expected = numpy.array([0.08276948,
                        0.03044919,
                        0.61158833,
                        0.22499077,
                        0.05020223])

expect(numpy.allclose(actual, expected)).to(be_true)
[0.08276948 0.03044919 0.61158833 0.22499077 0.05020223]

Notice that the sum of all these values is equal to 1.

expect(numpy.sum(softmax([9, 8, 11, 10, 8.5]))).to(equal(1))
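
One caveat about this implementation: numpy.exp overflows for large inputs (z values around 710 and up already exceed the range of 64-bit floats). A common remedy, sketched here as a supplement rather than part of the original recipe, is to subtract the maximum before exponentiating; the shift cancels out in the division, so the result is mathematically unchanged.

def softmax_stable(z: numpy.ndarray) -> numpy.ndarray:
    """Calculate Softmax, shifting by the max for numerical stability

    Args:
     z: array of values

    Returns:
     array of probabilities
    """
    e_z = numpy.exp(z - numpy.max(z))
    return e_z / numpy.sum(e_z)

# the shift cancels out, so the output is unchanged
expect(numpy.allclose(softmax_stable(numpy.array([9, 8, 11, 10, 8.5])),
                      expected)).to(be_true)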

Dimensions: 1-D arrays vs 2-D column vectors

Before moving on to implement forward propagation, backpropagation, and gradient descent in the next lecture notebook, let's have a look at the dimensions of the vectors you've been handling until now.

Create a vector of length V filled with zeros.

First, define V. Remember, this was the size of the vocabulary in the previous lecture notebook.

V = 5

Now define the vector of length V filled with zeros.

x_array = numpy.zeros(V)
print(x_array)
[0. 0. 0. 0. 0.]

This is a 1-dimensional array, as revealed by the .shape property of the array.

print(x_array.shape)
(5,)
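
One consequence worth seeing up front: transposing a 1-dimensional array is a no-op, so it can't behave as a row or a column vector in matrix arithmetic.

print(x_array.T.shape)
(5,)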

To perform matrix multiplication in the next steps, you actually need your column vectors to be represented as a matrix with one column. In numpy, this matrix is represented as a 2-dimensional array.

The easiest way to convert a 1-D vector to a 2-D column matrix is to set its .shape property to the number of rows and one column, as shown in the next cell.

# Copy vector
x_column_vector = x_array.copy()

# Reshape copy of vector
x_column_vector.shape = (V, 1)  # alternatively ... = (x_array.shape[0], 1)

# Print vector
print(x_column_vector)
[[0.]
 [0.]
 [0.]
 [0.]
 [0.]]

The shape of the resulting "vector" is:

print(x_column_vector.shape)
(5, 1)
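
Setting .shape mutates the array in place. Two equivalent, non-mutating alternatives you'll often see in numpy code (standard numpy, nothing specific to this notebook) are reshape and indexing with numpy.newaxis:

print(x_array.reshape(-1, 1).shape)
print(x_array[:, numpy.newaxis].shape)
(5, 1)
(5, 1)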

End

Now that we have the basics of the model in place, we can move on to training it.