Gradient Descent (Again)

Some Math

The weight update for a single gradient descent step is calculated as:

\[ \Delta w_i = \eta \delta x_i \]

And the error term \(\delta\) is calculated as:

\begin{align} \delta &= (y - \hat{y}) f'(h)\\ &= (y - \hat{y})f'\left(\sum w_i x_i\right) \end{align}
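
Where does this come from? A short chain-rule sketch, assuming the usual squared-error loss \(E = \frac{1}{2}(y - \hat{y})^2\) (an assumption; the post doesn't name its loss, but this is the one that produces the update rule above):

\begin{align} \frac{\partial E}{\partial w_i} &= -(y - \hat{y}) f'(h) x_i\\ \Delta w_i &= -\eta \frac{\partial E}{\partial w_i} = \eta (y - \hat{y}) f'(h) x_i = \eta \delta x_i \end{align}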

If we use the sigmoid as our activation function \(f\):

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]

Then its derivative \(f'(x)\) is:

\[ \sigma'(x) = \sigma(x) (1 - \sigma(x)) \]
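
This identity is a quick consequence of the chain rule; a one-line verification:

\begin{align} \sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)(1 - \sigma(x)) \end{align}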

An Implementation

Imports

import numpy

The Sigmoid

def sigmoid(x: numpy.ndarray) -> numpy.ndarray:
    """
    Our activation function

    Args:
     x: the input array

    Returns:
     the sigmoid of x
    """
    return 1/(1 + numpy.exp(-x))
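
A quick sanity check (these assertions are illustrative, not from the original post): \(\sigma(0)\) is exactly 0.5, and extreme inputs saturate toward 0 and 1.

output = sigmoid(numpy.array([-10.0, 0.0, 10.0]))
assert output[1] == 0.5   # sigma(0) = 1/(1 + 1)
assert output[0] < 0.001  # large negative input -> near 0
assert output[2] > 0.999  # large positive input -> near 1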

The Sigmoid Derivative

def sigmoid_prime(x: numpy.ndarray) -> numpy.ndarray:
    """
    The derivative of the sigmoid

    Args:
     x: the input

    Returns:
     the sigmoid derivative of x
    """
    return sigmoid(x) * (1 - sigmoid(x))
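
As a sanity check on the closed form, we can compare sigmoid_prime against a centered finite-difference approximation (the step size and test points here are arbitrary choices):

epsilon = 1e-6
points = numpy.linspace(-5, 5, 11)
numerical = (sigmoid(points + epsilon) - sigmoid(points - epsilon))/(2 * epsilon)
assert numpy.allclose(numerical, sigmoid_prime(points))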

Setting Up the Network

learning_rate = 0.5

# One input sample and its target output
x = numpy.array([1, 2, 3, 4])
y = numpy.array(0.5)

# Initial weights
w = numpy.array([0.5, -0.5, 0.3, 0.1])

The Network

This will calculate a single gradient descent step.

The Forward Pass

# h: the weighted sum of the inputs
hidden_layer = x.dot(w)
# y-hat: the network's prediction
y_hat = sigmoid(hidden_layer)

Backward Propagation

# How far off the prediction is: (y - y-hat)
error = y - y_hat

# The error term delta: (y - y-hat) * f'(h)
error_term = error * sigmoid_prime(hidden_layer)

# The weight update: eta * delta * x_i
delta_w = learning_rate * error_term * x

print('Neural Network output:')
print(y_hat)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(delta_w)
Neural Network output:
0.6899744811276125
Amount of Error:
-0.1899744811276125
Change in Weights:
[-0.02031869 -0.04063738 -0.06095608 -0.08127477]
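
A single step only nudges the weights a little; in practice the step is repeated until the error is acceptably small. Here's a minimal sketch of that loop, reusing the functions and data defined above (the iteration count of 100 is an arbitrary choice, not from the original):

w = numpy.array([0.5, -0.5, 0.3, 0.1])  # reset to the initial weights
for step in range(100):
    h = x.dot(w)
    y_hat = sigmoid(h)
    error_term = (y - y_hat) * sigmoid_prime(h)
    w = w + learning_rate * error_term * x  # apply the update each step

print('Output after 100 steps:')
print(sigmoid(x.dot(w)))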