Hidden State Activation
Table of Contents
Hidden State Activation
This is the hidden state activation function for a vanilla RNN.
\[ h^{\langle t\rangle}=g(W_{h}[h^{\langle t-1\rangle},x^{\langle t\rangle}] + b_h) \]
Which is another way of writing this:
\[ h^{\langle t\rangle}=g(W_{hh}h^{\langle t-1\rangle} \oplus W_{hx}x^{\langle t\rangle} + b_h) \]
Where
- \(W_{h}\) in the first formula is denotes the horizontal concatenation of \(W_{hh}\) and \(W_{hx}\) from the second formula.
- \(W_{h}\) in the first formula is then multiplied by \([h^{\langle t-1\rangle},x^{\langle t\rangle}]\), another concatenation of parameters from the second formula but this time in a different direction, i.e vertical.
Let us see what this means computationally.
Imports
# from pypi
import numpy
Middle
Joining
Weights: Horizontal Concatenation
A join along the vertical boundary is called a horizontal concatenation or horizontal stack.
Visually, it looks like this:- \(W_h = \left [ W_{hh} \ | \ W_{hx} \right ]\).
We'll look at two different ways to achieve this using numpy.
Note: The values used to populate the arrays, below, have been chosen to aid in visual illustration only. They are NOT what you'd expect to use building a model, which would typically be random variables instead.
First create some dummy data. The numpy.full function creates an array of a given shape that all has the same values. Our first array is almost like numpy.ones
except it uses the dtype of the number you pass in so it will be integers, not floats.
w_hh = numpy.full((3, 2), 1)
w_hx = numpy.full((3, 3), 9)
We could use some random initializations, but it would make it harder to see the joins.
print("-- Data --\n")
print("w_hh :")
print(w_hh)
print("w_hh shape :", w_hh.shape, "\n")
print("w_hx :")
print(w_hx)
print("w_hx shape :", w_hx.shape, "\n")
-- Data -- w_hh : [[1 1] [1 1] [1 1]] w_hh shape : (3, 2) w_hx : [[9 9 9] [9 9 9] [9 9 9]] w_hx shape : (3, 3)
- Option 1: concatenate - horizontal
First we'll use numpy.concatenate.
ROWS, COLUMNS = 0, 1 w_h1 = numpy.concatenate((w_hh, w_hx), axis=COLUMNS) print("option 1 : concatenate\n") print("w_h :") print(w_h1) print("w_h shape :", w_h1.shape, "\n")
option 1 : concatenate w_h : [[1 1 9 9 9] [1 1 9 9 9] [1 1 9 9 9]] w_h shape : (3, 5)
- Option 2: hstack
Now we'll try numpy.hstack.
w_h2 = numpy.hstack((w_hh, w_hx)) print("option 2 : hstack\n") print("w_h :") print(w_h2) print("w_h shape :", w_h2.shape)
option 2 : hstack w_h : [[1 1 9 9 9] [1 1 9 9 9] [1 1 9 9 9]] w_h shape : (3, 5)
As you can see,
hstack
gives you the same thing asconcatenate
along columns,concatenate
also allows you to concatenate along rows and is more general thanhstack
. Althoughhstack
might be more intuitive.
Hidden State & Inputs: Vertical Concatenation
Joining along a horizontal boundary is called a vertical concatenation or vertical stack. Visually it looks like this:
\[ [h^{\langle t-1\rangle},x^{\langle t\rangle}] = \left[ \frac{h^{\langle t-1\rangle}}{x^{\langle t\rangle}} \right] \]
We'll look at two different ways to achieve this using numpy.
First create some more dummy data.
h_t_prev = numpy.full((2, 1), 1)
x_t = numpy.full((3, 1), 9)
print("-- Data --\n")
print("h_t_prev :")
print(h_t_prev)
print("h_t_prev shape :", h_t_prev.shape, "\n")
print("x_t :")
print(x_t)
print("x_t shape :", x_t.shape, "\n")
-- Data -- h_t_prev : [[1] [1]] h_t_prev shape : (2, 1) x_t : [[9] [9] [9]] x_t shape : (3, 1)
Option 1: concatenate - Rows
ax_1 = numpy.concatenate(
(h_t_prev, x_t), axis=ROWS
)
print("option 1 : concatenate\n")
print("ax_1 :")
print(ax_1)
print("ax_1 shape :", ax_1.shape, "\n")
option 1 : concatenate ax_1 : [[1] [1] [9] [9] [9]] ax_1 shape : (5, 1)
Option 2: vstack
vstack is much like hstack
except instead of inserting columns it appends rows, more of what the word stack would seem to suggest.
ax_2 = numpy.vstack((h_t_prev, x_t))
print("option 2 : vstack\n")
print("ax_2 :")
print(ax_2)
print("ax_2 shape :", ax_2.shape)
option 2 : vstack ax_2 : [[1] [1] [9] [9] [9]] ax_2 shape : (5, 1)
Verify Formulas
Now that we know how to do the concatenations, horizontal and vertical, let's verify that the two formulas produce the same result.
- Formula 1: \(h^{\langle t\rangle}=g(W_{h}[h^{\langle t-1\rangle},x^{\langle t\rangle}] + b_h)\)
- Formula 2: \(h^{\langle t\rangle}=g(W_{hh}h^{\langle t-1\rangle} \oplus W_{hx}x^{\langle t\rangle} + b_h)\)
We want to assure ourselves that Formula 1 \(\Leftrightarrow\) Formula 2.
We will initially ignore the bias term \(b_h\) and the activation function g( ) because the transformation will be identical for each formula. So what we really want to compare is the result of the following parameters inside each formula:
\[ W_{h}[h^{\langle t-1\rangle},x^{\langle t\rangle}] \quad \Leftrightarrow \quad W_{hh}h^{\langle t-1\rangle} \oplus W_{hx}x^{\langle t\rangle} \]
We'll see how to do this using matrix multiplication combined with the data and techniques (stacking/concatenating) from above.
The Data
w_hh = numpy.full((3, 2), 1)
w_hx = numpy.full((3, 3), 9)
h_t_prev = numpy.full((2, 1), 1)
x_t = numpy.full((3, 1), 9)
Formula 1
stack_1 = numpy.hstack((w_hh, w_hx))
stack_2 = numpy.vstack((h_t_prev, x_t))
print("\nFormula 1")
print("Term1:\n",stack_1)
print("Term2:\n",stack_2)
formula_1 = numpy.matmul(stack_1,
stack_2)
print("Output:")
print(formula_1)
Formula 1 Term1: [[1 1 9 9 9] [1 1 9 9 9] [1 1 9 9 9]] Term2: [[1] [1] [9] [9] [9]] Output: [[245] [245] [245]]
Formula 2
term_1 = numpy.matmul(w_hh, h_t_prev)
term_2 = numpy.matmul(w_hx, x_t)
print("\nFormula 2")
print("Term1:\n", term_1)
print("Term2:\n", term_2)
formula_2 = term_1 + term_2
print("\nOutput:")
print(formula_2, "\n")
Formula 2 Term1: [[2] [2] [2]] Term2: [[243] [243] [243]] Output: [[245] [245] [245]]
Verification
np.allclose checks that each entry in one array is within a certain tolerance of the corresponding entry in another. For this example we're using integers, so you could probably use all(a == b)
but otherwise, when you have floats, it's better to use allclose
since floats won't always be exact.
print("-- Verify --")
print("Results are the same :", numpy.allclose(formula_1, formula_2))
print(f"Also the same: {all(formula_1==formula_2)}")
-- Verify -- Results are the same : True Also the same: True
Now we'll add a sigmoid activation function and bias term as a final check so we can see how this would work in action.
def sigmoid(x: numpy.ndarray) -> numpy.ndarray:
"""Calculates the sigmoid of x
Args:
x: numpy array or list or float
"""
return 1 / (1 + numpy.exp(-x))
bias = numpy.random.standard_normal((formula_1.shape[0], 1))
print("Formula 1 Output:\n", sigmoid(formula_1 + bias))
print("Formula 2 Output:\n", sigmoid(formula_2 + bias))
assert numpy.allclose(sigmoid(formula_1 + bias), sigmoid(formula_2 + bias))
Formula 1 Output: [[1.] [1.] [1.]] Formula 2 Output: [[1.] [1.] [1.]]