Word Embeddings: Build a Model

Introduction

This is an introduction to a series of posts that look at how to create word embeddings using a Continuous Bag of Words (CBOW) model.

The Continuous Bag Of Words Model (CBOW)

Let's take a look at the following sentence: 'I am happy because I am learning'.

  • In continuous bag of words (CBOW) modeling, we try to predict the center word given a few context words (the words around the center word).
  • For example, if you choose a context half-size of, say, C = 2, then you would try to predict the word happy given the context that includes the 2 words before and the 2 words after the center word:
    • C words before: [I, am]
    • C words after: [because, I]
  • In other words (see the sketch after this example):
context = [I, am, because, I]
target = happy
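
To make the windowing concrete, here is a minimal Python sketch of how (context, center) pairs could be extracted with a context half-size of C = 2. The tokenization is a plain whitespace split and the function name get_windows is illustrative, not from any particular library.

```python
def get_windows(words, C=2):
    """Yield (context, center) pairs using C words on each side of the center."""
    for i in range(C, len(words) - C):
        center = words[i]
        context = words[i - C:i] + words[i + 1:i + C + 1]
        yield context, center

sentence = "I am happy because I am learning".split()
for context, center in get_windows(sentence, C=2):
    print(context, "->", center)
# ['I', 'am', 'because', 'I'] -> happy
# ['am', 'happy', 'I', 'am'] -> because
# ['happy', 'because', 'am', 'learning'] -> I
```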

The model will have three layers. The input layer (\(\bar x\)) is the average of the one-hot vectors of the context words, there is one hidden layer, and the output layer (\(\hat y\)) is a softmax layer.
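As a quick illustration of how \(\bar x\) could be built, here is a small sketch that averages the one-hot vectors of the context words. The word2Ind mapping and function names are assumptions made for the example, not part of a library.

```python
import numpy as np

# Assumed word-to-index mapping for the toy vocabulary of the example sentence.
word2Ind = {'I': 0, 'am': 1, 'happy': 2, 'because': 3, 'learning': 4}
V = len(word2Ind)  # vocabulary size

def word_to_one_hot(word, word2Ind, V):
    one_hot = np.zeros(V)
    one_hot[word2Ind[word]] = 1
    return one_hot

def context_to_input(context, word2Ind, V):
    # Average the one-hot vectors of the context words to get x-bar.
    one_hots = [word_to_one_hot(w, word2Ind, V) for w in context]
    return np.mean(one_hots, axis=0)

x_bar = context_to_input(['I', 'am', 'because', 'I'], word2Ind, V)
print(x_bar)  # [0.5  0.25 0.   0.25 0.  ]
```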

The architecture you will be implementing is as follows:

\begin{align} h &= W_1 \ \bar x + b_1 \tag{1} \\ a &= \mathrm{ReLU}(h) \tag{2} \\ z &= W_2 \ a + b_2 \tag{3} \\ \hat y &= \mathrm{softmax}(z) \tag{4} \\ \end{align}
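The following is a minimal NumPy sketch of the forward pass in equations (1)-(4). The vocabulary size V, embedding size N, and randomly initialized parameters are assumptions made purely to show the shapes; training is left to later posts in the series.

```python
import numpy as np

def relu(h):
    return np.maximum(0, h)

def softmax(z):
    e_z = np.exp(z - np.max(z))  # shift for numerical stability
    return e_z / e_z.sum(axis=0)

V, N = 5, 3  # assumed vocabulary size and embedding (hidden-layer) size
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((N, V)), rng.standard_normal((N, 1))  # (N, V), (N, 1)
W2, b2 = rng.standard_normal((V, N)), rng.standard_normal((V, 1))  # (V, N), (V, 1)

x_bar = np.array([[0.5], [0.25], [0.0], [0.25], [0.0]])  # (V, 1) averaged one-hots

h = W1 @ x_bar + b1    # equation (1)
a = relu(h)            # equation (2)
z = W2 @ a + b2        # equation (3)
y_hat = softmax(z)     # equation (4): probabilities over the vocabulary
print(y_hat.sum())     # 1.0
```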

The Parts

This is just an introductory post; the following posts in the series are where things will actually be implemented.