Maximum Likelihood

What is this about?

We want a way to train our neural network based on the data we have - how do we do this? One way is to use maximum likelihood, where we choose the weights that make the labels we actually observed in our data as probable as possible.

Yeah, okay, but how do you do this?

First, remember that our probability for any point being 0 or 1 is based on the sigmoid.

\[ \hat{y} = \sigma(Wx+b) \]

Where \(\hat{y}\) is the probability that the point has the label 1.
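Just to make the pieces concrete, here's a small sketch of that calculation with numpy. The weights W, bias b, and point x are made-up values for illustration, not anything taken from a real data set.

import numpy

def sigmoid(z):
    """squashes z into the range (0, 1)"""
    return 1 / (1 + numpy.exp(-z))

# made-up weights, bias, and a single two-dimensional point
W = numpy.array([0.5, -1.0])
b = 0.2
x = numpy.array([1.0, 0.5])

# y_hat is the model's probability that x has the label 1
y_hat = sigmoid(W.dot(x) + b)
print(y_hat)  # about 0.55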

So for every point in our data set we can take the probability that the model assigns to that point's actual label (\(\hat{y}\) if the label is 1, \(1 - \hat{y}\) if it is 0) and multiply these together to get the probability of the labels we observed. If we were to find a model that maximized this probability, we would have a model that separated our categories - this is the Maximum Likelihood model.

To be more specific, we calculate this probability for all of our training-set points and multiply them to get the total probability (multiplication is an AND operation over independent events - \(p(a \land b \land c) = p(a) \times p(b) \times p(c)\)), then we adjust our model to maximize this probability.
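As a sketch, here's that product for four made-up points (the same labels and predictions used in the cross-entropy example at the end of these notes): for a point labeled 1 the model's probability for the actual label is \(\hat{y}\), and for a point labeled 0 it is \(1 - \hat{y}\).

import numpy

# made-up labels and predicted probabilities that each label is 1
y = numpy.array([1, 0, 1, 1])
y_hat = numpy.array([0.4, 0.6, 0.1, 0.5])

# the probability the model assigns to each point's actual label
per_point = numpy.where(y == 1, y_hat, 1 - y_hat)

# the total probability (likelihood) of the observed labels
likelihood = per_point.prod()
print(likelihood)  # 0.4 * 0.4 * 0.1 * 0.5 = 0.008

Maximizing this product is the goal - but notice how small it already is with only four points, which brings us to the next problem.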

The Problem With Products

Each of our probabilities is less than 1, so the more of them you have, the smaller their product becomes. What we want instead is addition - which is where the logarithm comes in, since the log of a product is the sum of the logs.

\[ \log\left(p(a) \times p(b) \times p(c)\right) = \log p(a) + \log p(b) + \log p(c) \]
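A quick numerical check of this identity, re-using the per-point probabilities from the sketch above:

import numpy

probabilities = numpy.array([0.4, 0.4, 0.1, 0.5])

log_of_product = numpy.log(probabilities.prod())
sum_of_logs = numpy.log(probabilities).sum()

print(log_of_product, sum_of_logs)  # both are about -4.8283
assert abs(log_of_product - sum_of_logs) < 1e-10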

Cross Entropy

Our logarithms give us values that we want to maximize, but another way to look at it is "we want to minimize the error". We can do this by taking the negatives of the logarithms as the error and trying to minimize their sum.

\[ \textit{cross entropy} = -\log p(a) - \log p(b) - \log p(c) \]

More generally:

\[ \textit{Cross Entropy} = -\sum_{i=1}^m \left[ y_i \ln(p_i) + (1 - y_i)\ln(1-p_i) \right] \]

Where \(y\) is a vector of 1's and 0's (the actual labels) and \(p_i\) is the predicted probability that point \(i\) has the label 1. When \(y_i\) is 0 the left term is 0, and when \(y_i\) is 1 the right term is 0, so \(y_i\) works as a sort of conditional that chooses which term to use.

Okay, but how do we implement this?

Write a function that takes as input two lists Y, P, and returns the float corresponding to their cross-entropy.

import numpy

def cross_entropy(Y, P):
    """calculates the cross entropy for labels and predicted probabilities

    Args:
     Y: list of actual labels (1s and 0s)
     P: list of predicted probabilities that the matching label in Y is 1
    Returns:
     float: the cross entropy of the two lists
    """
    Y = numpy.array(Y)
    P = numpy.array(P)
    # when a label is 1 the log(P) term counts, when it's 0 the log(1 - P) term does
    return -(Y * numpy.log(P) + (1 - Y) * numpy.log(1 - P)).sum()
Y = [1, 0, 1, 1]
P = [0.4, 0.6, 0.1, 0.5]
expected = 4.8283137373
entropy = cross_entropy(Y, P)
print(entropy)
assert abs(entropy - expected) < 0.1**5
4.828313737302301
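As a sanity check (assuming you have scikit-learn installed - it isn't needed for anything else in these notes), sklearn's log_loss function should give the same number when its averaging is turned off with normalize=False.

from sklearn.metrics import log_loss

# normalize=False returns the summed loss instead of the mean
print(log_loss(Y, P, normalize=False))  # should also be about 4.8283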