Gradient Descent (Again)
Some Math
One weight update for gradient descent is calculated as:
\[ \Delta w_i = \eta \delta x_i \]
And the error term \(\delta\) is calculated as:
\begin{align} \delta &= (y - \hat{y}) f'(h)\\ &= (y - \hat{y})f'\left(\sum w_i x_i\right) \end{align}
If we are using the sigmoid as our activation function \(f\):
\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
Then its derivative \(f'(x)\) is:
\[ \sigma(x) (1 - \sigma(x)) \]
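That compact form is worth a quick check; differentiating \(\sigma\) directly (an extra derivation step, not in the original):
\begin{align} \sigma'(x) &= \frac{e^{-x}}{\left(1 + e^{-x}\right)^2}\\ &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}\\ &= \sigma(x)\left(1 - \sigma(x)\right) \end{align}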
An Implementation
Imports
import numpy
The Sigmoid
def sigmoid(x: numpy.ndarray) -> numpy.ndarray:
"""
Our activation function
Args:
x: the input array
Returns:
the sigmoid of x
"""
return 1/(1 + numpy.exp(-x))
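As a quick sanity check (an extra step, not part of the original walkthrough), the sigmoid should map 0 to exactly 0.5 and squash large magnitudes toward 0 or 1.
# sigmoid(0) is 0.5, and large inputs saturate near 0 or 1
assert sigmoid(numpy.array(0.0)) == 0.5
assert numpy.allclose(sigmoid(numpy.array([-10.0, 10.0])), [0, 1], atol=1e-4)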
The Sigmoid Derivative
def sigmoid_prime(x: numpy.ndarray) -> numpy.ndarray:
"""
The derivative of the sigmoid
Args:
x: the input
Returns:
the sigmoid derivative of x
"""
return sigmoid(x) * (1 - sigmoid(x))
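As a check on the derivative (another added step), we can compare the analytic form to a centered finite-difference approximation:
# compare the analytic derivative to a numerical estimate at a few points
points = numpy.array([-2.0, 0.0, 0.8, 3.0])
step = 1e-5
numerical = (sigmoid(points + step) - sigmoid(points - step)) / (2 * step)
assert numpy.allclose(sigmoid_prime(points), numerical)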
Setup The Network
learning_rate = 0.5
x = numpy.array([1, 2, 3, 4])
y = numpy.array(0.5)
# Initial weights
w = numpy.array([0.5, -0.5, 0.3, 0.1])
The Network
This will calculate a single gradient descent step.
The Forward Pass
hidden_layer = x.dot(w)
y_hat = sigmoid(hidden_layer)
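Just to make the wiring concrete (an added check, using the x and w from the setup), the weighted sum works out by hand to 0.5 - 1.0 + 0.9 + 0.4 = 0.8, which we can assert:
# the hand-computed weighted sum for the x and w defined above
assert numpy.isclose(hidden_layer, 0.8)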
Backwards Propagation
error = y - y_hat
error_term = error * sigmoid_prime(hidden_layer)
delta_w = learning_rate * error_term * x
print('Neural Network output:')
print(y_hat)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(delta_w)
Neural Network output:
0.6899744811276125
Amount of Error:
-0.1899744811276125
Change in Weights:
[-0.02031869 -0.04063738 -0.06095608 -0.08127477]
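The walkthrough stops at computing the change in weights, but as a quick follow-up sketch (assuming the same x, y, and w from the setup) we can apply the update and take another forward pass; the error magnitude should shrink after one gradient descent step.
# apply the weight update and re-run the forward pass
updated_w = w + delta_w
updated_y_hat = sigmoid(x.dot(updated_w))
updated_error = y - updated_y_hat

print('Output after one update:')
print(updated_y_hat)
print('Error after one update:')
print(updated_error)

# one step should move the output closer to the target
assert abs(updated_error) < abs(error)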