Trax Hello World

Beginning

This 'Hello World' takes data created by a simple linear model and trains a neural network to model it. The actual model will take this form:

\[ y = mx + b \]

Where m is the slope and b is the y-intercept. In statistics this is sometimes written using betas:

\[ y = \beta_0 + \beta_1 x \]

And in machine learning it's sometimes written with weights:

\[ y = w_0 + w_1 x \]

Or with a bias and a weight:

\[ y = bias + w_0 x \]

Imports

Trax Imports

Since this is about trax I'll separate out the imports to make it more obvious.

from trax import layers
from trax.supervised import training

import trax

Not-Trax

The rest of this is just to support the data creation, plotting, etc.

# python
from collections import namedtuple
from functools import partial
from pathlib import Path
from tempfile import TemporaryFile

import random
import shutil
import sys

# pypi
from holoviews import opts
from sklearn.metrics import r2_score

import holoviews
import hvplot.pandas
import numpy
import pandas
import statsmodels.api as statsmodels

# my stuff
from graeae import EmbedHoloviews

Set Up

The Random Generator

This is the newer (to me) way to generate random numbers with numpy.

See: numpy.random.default_rng

random_generator = numpy.random.default_rng(seed=2021)

The Plotting

Just some helpers for later on when I do some plotting.

slug = "trax-hello-world"
Embed = partial(EmbedHoloviews, folder_path=f"files/posts/trax/{slug}")

Plot = namedtuple("Plot", ["width", "height", "fontscale", "tan", "blue", "red"])
PLOT = Plot(
    width=900,
    height=750,
    fontscale=2,
    tan="#ddb377",
    blue="#4687b7",
    red="#ce7b6d",
 )

The Linear Regression

The Data

Trax is pretty invested in using generators, but I also want to be able to plot it, so rather than just generate the data on the fly I'll create a numpy array and then generate the data from it.

Sample

See:

def sample(start: float, stop: float, shape: tuple, uniform: bool=True) -> numpy.ndarray:
    """Create a random sample

    Args:
     start: lowest allowed value
     stop: highest allowed value
     shape: shape for the final array (just an int for single values)
     uniform: use the uniform distribution instead of the standard normal
    """
    if uniform:
        return (stop - start) * random_generator.random(shape) + start
    return (stop - start) * random_generator.standard_normal(shape) + start

The Samples

He're I'll make the linear-ish data with some noise added to it.

SAMPLES = 200
X_RANGE = 5
x_values = sample(-X_RANGE, X_RANGE, SAMPLES)
SLOPE = sample(-5, 5, 1)
INTERCEPT = sample(-5, 5, 1)
noise = sample(-2, 2, SAMPLES, uniform=False)
y_values = SLOPE * x_values + INTERCEPT + noise

Plotting the Data

data_frame = pandas.DataFrame.from_dict(dict(X=x_values, Y=y_values))
first, last = x_values.min(), x_values.max()
line_frame = pandas.DataFrame.from_dict(
    dict(X=[first, last],
         Y=[slope * first + intercept,
            slope * last + intercept]))
line_plot = line_frame.hvplot(x="X", y="Y", color=PLOT.blue)
data_plot = data_frame.hvplot.scatter(x="X", y="Y", title="Sample Data", color=PLOT.tan)
plot = (data_plot * line_plot).opts(
    height=PLOT.height,
    width=PLOT.width,
    fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="data_sample")()
print(output)

Figure Missing

Data Generator

This will generate the data for the trax batch generator.

def linear_generator(x: numpy.ndarray, y: numpy.ndarray) -> tuple:
    """Generator of linear data

    Args:
     x: vector of input data
     y: vector of output data

    Yields:
     (x, y): single instance of x and single instance of y
    """
    total = len(x)
    assert x.shape == y.shape
    index = 0
    while True:
        yield (numpy.array([x[index]]), numpy.array([y[index]]))
        index = index % total
    return
generator = linear_generator(x_values, y_values)
print(next(generator))
(array([2.56947828]), array([10.52443023]))

The Data Pipeline

data_pipeline = trax.data.Serial(trax.data.Batch(50), trax.data.AddLossWeights(),)
data_stream = data_pipeline(generator)

The Model

model = layers.Serial(layers.Dense(1))

Train the Model

We're going to train the model using Stochastic Gradient Descent with L2 Loss as a metric.

Set It Up

The online documentation doesn't cover the TrainTask and EvalTask, for some reason.

train_task = training.TrainTask(
    labeled_data=data_stream,
    loss_layer=layers.L2Loss(),
    optimizer=trax.optimizers.SGD(0.01),
    n_steps_per_checkpoint=10,
)

eval_task = training.EvalTask(
    labeled_data=data_stream, metrics=[layers.L2Loss()],
    n_eval_batches=10,
)

Run the Training

I use the TemporaryFile because I can't figure out how to prevent the training loop printing to standard out and making this file way too long.

TRAIN_STEPS = 200
path = Path("~/models/linear_model").expanduser()
if path.exists():
    shutil.rmtree(path)

training_loop = training.Loop(
    model, train_task, eval_tasks=[eval_task], output_dir=path
)

real_stdout = sys.stdout
with TemporaryFile("w") as temp_file:
    sys.stdout = temp_file
    training_loop.run(TRAIN_STEPS)
    sys.stdout = real_stdout

Plotting the Loss

frame = pandas.DataFrame(training_loop.history.get("eval", "metrics/L2Loss"),
                         columns=["Batch", "L2 Loss"])

minimum = frame.loc[frame["L2 Loss"].idxmin()]
vline = holoviews.VLine(minimum.Batch).opts(opts.VLine(color=PLOT.red))
hline = holoviews.HLine(minimum["L2 Loss"]).opts(opts.HLine(color=PLOT.red))
line = frame.hvplot(x="Batch", y="L2 Loss").opts(opts.Curve(color=PLOT.blue))

plot = (line * hline * vline).opts(
    width=PLOT.width, height=PLOT.height,
    title="Evaluation Batch L2 Loss",
                                   )
output = Embed(plot=plot, file_name="evaluation_l2_loss")()
print(output)

Figure Missing

It looks like it fits pretty quickly.

Statsmodels Model

As a comparison, I'll fit a statsmodels Ordinary Least Squares. See: statsmodels.regression.linear_model.OLS

x_stats = statsmodels.add_constant(x_values)
ols_model = statsmodels.OLS(y_values, x_stats)
regression = ols_model.fit()

regression_predictions = regression.predict(x_stats)

Plotting the Model

When we make a prediction, the x-values have to be a matrix, not a vector. So in this case we want one column with all the rows in it which you can get using reshape.

See: numpy.reshape

One way to do this would be te pass in the length of the vector as the number of rows.

x_values.reshape(len(x_values), 1)

But the convention seems to be to use -1 instead of the number of rows.

x_values.reshape(-1, 1)

Which is cleaner, if not as obvious in meaning.

ALL_ROWS, ONE_COLUMN = -1, 1
TWO_DIMENSIONS = (ALL_ROWS, ONE_COLUMN)

predictions = model(x_values.reshape(TWO_DIMENSIONS))
data_frame["Predicted"] = predictions[:, 0]
data_frame["OLS"] = regression_predictions
actual = data_frame.hvplot.scatter(x="X", y="Y", color=PLOT.tan, label="Data")
predicted = data_frame.hvplot.scatter(x="X", y="Predicted", color=PLOT.red, label="Predicted")
line_plot = line_frame.hvplot(x="X", y="Y", color=PLOT.blue, label="Actual")
ols_plot = data_frame.hvplot(x="X", y="OLS", label="OLS")
plot = (actual * predicted * line_plot * ols_plot).opts(
    height=PLOT.height,
    width=PLOT.width,
    fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="predictions")()
print(output)

Figure Missing

Looking at \(R^2\)

SKLearn has an r2_score function to calculate \(R^2\) for us.

print(f"OLS R2: {r2_score(y_values, regression_predictions): 0.3f}")
print(f"Trax R2: {r2_score(y_values, predictions): 0.3f}")
OLS R2:  0.926
Trax R2:  0.724

An \(R^2\) of 1 means our model is a strong fit and an \(R^2\) of 0 means it doesn't fit at all. It looks like the Neural Network linear model didn't do so great compared to Ordinary Least Squares, although looking at the line I would have guessed that it did even worse.

Parameters

Let's look at the found parameters.

print("|Model| Slope |y-intercept |")
print("|-+-+-|")
print(f"|Actual| {SLOPE[0]:0.2f}| {INTERCEPT[0]:0.2f}|")
intercept, slope = regression.params
print(f"|OLS| {slope: 0.2f}|{intercept: 0.2f}|")

slope, intercept = model.weights[0]
print(f"| Trax|{float(slope): 0.2f} | {float(intercept):0.2f}|")
Model Slope y-intercept
Actual 4.97 -2.45
OLS 4.91 -4.09
Trax 3.66 1.12

The OLS got the slope pretty close , but not so much the y-intercept, while trax was further off for both.

End