Trax Hello World
Beginning
This 'Hello World' takes data created by a simple linear model and trains a neural network to model it. The actual model will take this form:
\[ y = mx + b \]
Where m is the slope and b is the y-intercept. In statistics this is sometimes written using betas:
\[ y = \beta_0 + \beta_1 x \]
And in machine learning it's sometimes written with weights:
\[ y = w_0 + w_1 x \]
Or with a bias and a weight:
\[ y = \text{bias} + w_0 x \]
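To make the notation concrete, here's a minimal sketch that evaluates the line with plain numpy. The parameter values are made up for illustration and aren't the ones used later in this post.

```python
import numpy

# made-up parameters, just for illustration
bias, weight = 1.5, -2.0

def line(x: numpy.ndarray, w_0: float, w_1: float) -> numpy.ndarray:
    """Evaluate y = w_0 + w_1 * x"""
    return w_0 + w_1 * x

x = numpy.array([0.0, 1.0, 2.0])
y = line(x, w_0=bias, w_1=weight)
print(y)
```

Whichever names you use (m and b, betas, or weights), it's the same calculation.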
Imports
Trax Imports
Since this is about trax I'll separate out the imports to make it more obvious.
from trax import layers
from trax.supervised import training
import trax
Not-Trax
The rest of this is just to support the data creation, plotting, etc.
# python
from collections import namedtuple
from functools import partial
from pathlib import Path
from tempfile import TemporaryFile
import random
import shutil
import sys
# pypi
from holoviews import opts
from sklearn.metrics import r2_score
import holoviews
import hvplot.pandas
import numpy
import pandas
import statsmodels.api as statsmodels
# my stuff
from graeae import EmbedHoloviews
Set Up
The Random Generator
This is the newer (to me) way to generate random numbers with numpy.
random_generator = numpy.random.default_rng(seed=2021)
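As a quick check of what the seed buys you, this sketch builds two generators from the same seed and confirms that they produce the same values. The names first and second are just for this example.

```python
import numpy

first = numpy.random.default_rng(seed=2021)
second = numpy.random.default_rng(seed=2021)

# two generators built from the same seed produce the same stream
a = first.random(3)
b = second.random(3)
print(numpy.array_equal(a, b))

# values from random() fall in the half-open interval [0, 1)
print(a.min() >= 0 and a.max() < 1)
```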
The Plotting
Just some helpers for later on when I do some plotting.
slug = "trax-hello-world"
Embed = partial(EmbedHoloviews, folder_path=f"files/posts/trax/{slug}")
Plot = namedtuple("Plot", ["width", "height", "fontscale", "tan", "blue", "red"])
PLOT = Plot(
width=900,
height=750,
fontscale=2,
tan="#ddb377",
blue="#4687b7",
red="#ce7b6d",
)
The Linear Regression
The Data
Trax is pretty invested in using generators, but I also want to be able to plot it, so rather than just generate the data on the fly I'll create a numpy array and then generate the data from it.
Sample
See:
def sample(start: float, stop: float, shape: tuple, uniform: bool=True) -> numpy.ndarray:
    """Create a random sample

    Args:
     start: lowest allowed value (uniform) or the mean (normal)
     stop: upper bound, exclusive (uniform); stop - start is the standard deviation (normal)
     shape: shape for the final array (just an int for single values)
     uniform: use the uniform distribution instead of the normal

    Returns:
     array of random values with the given shape
    """
    if uniform:
        return (stop - start) * random_generator.random(shape) + start
    return (stop - start) * random_generator.standard_normal(shape) + start
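The two branches behave quite differently, so here's a small sketch that applies the same arithmetic to a large sample and looks at the results. The seed and the sample count are arbitrary choices for this example. In the uniform case start and stop really are bounds, but in the normal case the values are centered on start with a standard deviation of stop - start.

```python
import numpy

rng = numpy.random.default_rng(seed=0)  # arbitrary seed for the example
start, stop, count = -2, 2, 100_000

# the same arithmetic as the two branches of sample()
uniform = (stop - start) * rng.random(count) + start
normal = (stop - start) * rng.standard_normal(count) + start

print(f"uniform range: {uniform.min():0.2f} to {uniform.max():0.2f}")
print(f"normal mean: {normal.mean():0.2f}, standard deviation: {normal.std():0.2f}")
```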
The Samples
Here I'll make the linear-ish data with some noise added to it.
SAMPLES = 200
X_RANGE = 5
x_values = sample(-X_RANGE, X_RANGE, SAMPLES)
SLOPE = sample(-5, 5, 1)
INTERCEPT = sample(-5, 5, 1)
noise = sample(-2, 2, SAMPLES, uniform=False)
y_values = SLOPE * x_values + INTERCEPT + noise
Plotting the Data
data_frame = pandas.DataFrame.from_dict(dict(X=x_values, Y=y_values))
first, last = x_values.min(), x_values.max()
# SLOPE and INTERCEPT are one-element arrays, so pull out the scalars
line_frame = pandas.DataFrame.from_dict(
    dict(X=[first, last],
         Y=[SLOPE[0] * first + INTERCEPT[0],
            SLOPE[0] * last + INTERCEPT[0]]))
line_plot = line_frame.hvplot(x="X", y="Y", color=PLOT.blue)
data_plot = data_frame.hvplot.scatter(x="X", y="Y", title="Sample Data", color=PLOT.tan)
plot = (data_plot * line_plot).opts(
height=PLOT.height,
width=PLOT.width,
fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="data_sample")()
print(output)
Data Generator
This will generate the data for the trax batch generator.
def linear_generator(x: numpy.ndarray, y: numpy.ndarray) -> tuple:
    """Generator of linear data

    Args:
     x: vector of input data
     y: vector of output data

    Yields:
     (x, y): single instance of x and single instance of y
    """
    total = len(x)
    assert x.shape == y.shape
    index = 0
    while True:
        yield (numpy.array([x[index]]), numpy.array([y[index]]))
        # move to the next instance, wrapping around at the end of the data
        # (without the + 1 the same instance would be yielded forever)
        index = (index + 1) % total
generator = linear_generator(x_values, y_values)
print(next(generator))
(array([2.56947828]), array([10.52443023]))
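Trax keeps pulling from the generator for as long as it trains, so the generator has to cycle through the data forever. Here's a minimal sketch of that wrap-around behavior using itertools.cycle as a stand-in for the index arithmetic. The function name cycling_generator and the three-point data set are made up for this example.

```python
import itertools
import numpy

x = numpy.array([0.0, 1.0, 2.0])
y = 2 * x + 1  # made-up line

def cycling_generator(x, y):
    """Yield (x, y) pairs forever, restarting at the end of the data"""
    for x_i, y_i in itertools.cycle(zip(x, y)):
        yield numpy.array([x_i]), numpy.array([y_i])

# take five instances from three data points to see the wrap-around
first_five = list(itertools.islice(cycling_generator(x, y), 5))
for x_i, y_i in first_five:
    print(x_i, y_i)
```

The fourth and fifth instances repeat the first and second, which is the behavior the batch generator needs.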
The Data Pipeline
data_pipeline = trax.data.Serial(trax.data.Batch(50), trax.data.AddLossWeights(),)
data_stream = data_pipeline(generator)
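As far as I understand it, Batch(50) groups fifty single instances into one array and AddLossWeights appends an array of ones so that every instance counts equally in the loss. The following is a plain-numpy sketch of that idea under those assumptions, not trax's implementation. The function batch_with_weights and the toy stream are made up for this example.

```python
import itertools
import numpy

def batch_with_weights(stream, batch_size: int):
    """Group (x, y) instances into (inputs, targets, weights) batches"""
    while True:
        instances = list(itertools.islice(stream, batch_size))
        if len(instances) < batch_size:
            # the stream ran out before filling a batch
            return
        inputs = numpy.stack([x for x, _ in instances])
        targets = numpy.stack([y for _, y in instances])
        yield inputs, targets, numpy.ones_like(targets)

# a toy finite stream of single instances
toy_stream = ((numpy.array([float(i)]), numpy.array([2.0 * i])) for i in range(100))
inputs, targets, weights = next(batch_with_weights(toy_stream, batch_size=50))
print(inputs.shape, targets.shape, weights.shape)
```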
The Model
model = layers.Serial(layers.Dense(1))
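A Dense layer with one unit multiplies the inputs by a weight matrix and adds a bias, which for a single input column is just the line from the beginning of the post. Here's a numpy sketch of the forward pass. The weight and bias values are made up (trax initializes its own randomly), and the function dense is only for this example.

```python
import numpy

weights = numpy.array([[3.0]])  # shape (inputs, units) = (1, 1), made up
bias = numpy.array([0.5])       # shape (units,) = (1,), made up

def dense(x: numpy.ndarray) -> numpy.ndarray:
    """Forward pass of a single dense layer: x @ W + b"""
    return x @ weights + bias

batch = numpy.array([[0.0], [1.0], [2.0]])  # three rows, one column
outputs = dense(batch)
print(outputs)
```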
Train the Model
We're going to train the model using Stochastic Gradient Descent with L2 Loss as both the training loss and the evaluation metric.
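To show roughly what the optimizer will be doing, here's a hand-rolled sketch of mini-batch gradient descent on a line with numpy. It assumes the loss is the mean of the squared errors (I haven't checked how trax scales its L2Loss) and uses made-up, noise-free data, so it isn't the trax implementation.

```python
import numpy

rng = numpy.random.default_rng(seed=2021)
x = rng.uniform(-5, 5, size=(200, 1))
y = 2.0 * x - 1.0  # made-up, noise-free line

weight, bias = 0.0, 0.0
learning_rate = 0.01
batch_size = 50

for step in range(2000):
    # pick a random mini-batch of rows
    rows = rng.integers(0, len(x), size=batch_size)
    x_batch, y_batch = x[rows], y[rows]
    error = (weight * x_batch + bias) - y_batch
    # gradients of mean(error**2) with respect to the weight and the bias
    weight -= learning_rate * 2 * numpy.mean(error * x_batch)
    bias -= learning_rate * 2 * numpy.mean(error)

print(f"slope: {weight:0.2f}, intercept: {bias:0.2f}")
```

Since there's no noise in this toy data, the updates should settle on the slope and intercept that generated it.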
Set It Up
The online documentation doesn't cover the TrainTask and EvalTask, for some reason.
train_task = training.TrainTask(
labeled_data=data_stream,
loss_layer=layers.L2Loss(),
optimizer=trax.optimizers.SGD(0.01),
n_steps_per_checkpoint=10,
)
eval_task = training.EvalTask(
labeled_data=data_stream, metrics=[layers.L2Loss()],
n_eval_batches=10,
)
Run the Training
I use the TemporaryFile because I can't figure out how to prevent the training loop from printing to standard out and making this file way too long.
TRAIN_STEPS = 200
path = Path("~/models/linear_model").expanduser()
if path.exists():
    shutil.rmtree(path)
training_loop = training.Loop(
model, train_task, eval_tasks=[eval_task], output_dir=path
)
real_stdout = sys.stdout
with TemporaryFile("w") as temp_file:
    sys.stdout = temp_file
    try:
        training_loop.run(TRAIN_STEPS)
    finally:
        # put standard out back even if the training raises an error
        sys.stdout = real_stdout
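An alternative to swapping sys.stdout by hand is contextlib.redirect_stdout from the standard library, which restores the original stream when the block exits, even if it raises an error. This sketch uses a stand-in function, noisy, in place of the training loop, and it assumes that the output you want to hide goes through Python's sys.stdout.

```python
import contextlib
import io

def noisy() -> int:
    """Stand-in for a call that prints a lot"""
    for step in range(3):
        print(f"step {step}")
    return 3

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    steps = noisy()

# the printed lines ended up in the buffer instead of on the screen
captured = buffer.getvalue()
print(f"ran {steps} steps, swallowed {len(captured.splitlines())} lines")
```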
Plotting the Loss
frame = pandas.DataFrame(training_loop.history.get("eval", "metrics/L2Loss"),
columns=["Batch", "L2 Loss"])
minimum = frame.loc[frame["L2 Loss"].idxmin()]
vline = holoviews.VLine(minimum.Batch).opts(opts.VLine(color=PLOT.red))
hline = holoviews.HLine(minimum["L2 Loss"]).opts(opts.HLine(color=PLOT.red))
line = frame.hvplot(x="Batch", y="L2 Loss").opts(opts.Curve(color=PLOT.blue))
plot = (line * hline * vline).opts(
width=PLOT.width, height=PLOT.height,
title="Evaluation Batch L2 Loss",
)
output = Embed(plot=plot, file_name="evaluation_l2_loss")()
print(output)
It looks like it fits pretty quickly.
Statsmodels Model
As a comparison, I'll fit a statsmodels Ordinary Least Squares model. See: statsmodels.regression.linear_model.OLS
x_stats = statsmodels.add_constant(x_values)
ols_model = statsmodels.OLS(y_values, x_stats)
regression = ols_model.fit()
regression_predictions = regression.predict(x_stats)
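The add_constant call puts a column of ones in front of the x-values so that the first fitted parameter is the intercept. Here's a rough numpy equivalent of the fit, using numpy.linalg.lstsq on made-up, noise-free data rather than statsmodels itself.

```python
import numpy

x = numpy.array([0.0, 1.0, 2.0, 3.0])
y = 4.0 * x - 2.0  # made-up line

# column of ones first, then the x-values, mirroring add_constant
design = numpy.column_stack([numpy.ones_like(x), x])

# least-squares solution; the parameters come back as (intercept, slope)
(intercept, slope), *_ = numpy.linalg.lstsq(design, y, rcond=None)
print(f"intercept: {intercept:0.2f}, slope: {slope:0.2f}")
```

The order of the parameters matters when unpacking them later: the constant comes first.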
Plotting the Model
When we make a prediction, the x-values have to be a matrix, not a vector. So in this case we want one column with all the rows in it which you can get using reshape.
See: numpy.reshape
One way to do this would be to pass in the length of the vector as the number of rows.
x_values.reshape(len(x_values), 1)
But the convention seems to be to use -1 instead of the number of rows.
x_values.reshape(-1, 1)
Which is cleaner, if not as obvious in meaning.
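A quick check that the two reshape calls give the same result, using a made-up five-element vector:

```python
import numpy

vector = numpy.arange(5.0)

# spell out the number of rows, or let numpy infer it with -1
by_length = vector.reshape(len(vector), 1)
by_inference = vector.reshape(-1, 1)

print(vector.shape, by_length.shape, by_inference.shape)
print(numpy.array_equal(by_length, by_inference))
```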
ALL_ROWS, ONE_COLUMN = -1, 1
TWO_DIMENSIONS = (ALL_ROWS, ONE_COLUMN)
predictions = model(x_values.reshape(TWO_DIMENSIONS))
data_frame["Predicted"] = predictions[:, 0]
data_frame["OLS"] = regression_predictions
actual = data_frame.hvplot.scatter(x="X", y="Y", color=PLOT.tan, label="Data")
predicted = data_frame.hvplot.scatter(x="X", y="Predicted", color=PLOT.red, label="Predicted")
line_plot = line_frame.hvplot(x="X", y="Y", color=PLOT.blue, label="Actual")
ols_plot = data_frame.hvplot(x="X", y="OLS", label="OLS")
plot = (actual * predicted * line_plot * ols_plot).opts(
height=PLOT.height,
width=PLOT.width,
fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="predictions")()
print(output)
Looking at \(R^2\)
SKLearn has an r2_score function to calculate \(R^2\) for us.
print(f"OLS R2: {r2_score(y_values, regression_predictions): 0.3f}")
print(f"Trax R2: {r2_score(y_values, predictions): 0.3f}")
OLS R2: 0.926
Trax R2: 0.724
An \(R^2\) of 1 means the model fits the data perfectly, while an \(R^2\) of 0 means it does no better than always predicting the mean of y. It looks like the Neural Network linear model didn't do so great compared to Ordinary Least Squares, although looking at the line I would have guessed that it did even worse.
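To see where those reference points come from, here's a sketch that calculates \(R^2\) by hand as one minus the residual sum of squares over the total sum of squares. The function r_squared and the data are made up for this example; it's the textbook formula, not sklearn's code.

```python
import numpy

def r_squared(actual: numpy.ndarray, predicted: numpy.ndarray) -> float:
    """1 - (residual sum of squares / total sum of squares)"""
    residual = numpy.sum((actual - predicted) ** 2)
    total = numpy.sum((actual - actual.mean()) ** 2)
    return 1 - residual / total

actual = numpy.array([1.0, 2.0, 3.0, 4.0])

# predicting the data exactly, predicting the mean, and predicting it backwards
perfect = r_squared(actual, actual)
mean_only = r_squared(actual, numpy.full_like(actual, actual.mean()))
worse = r_squared(actual, actual[::-1])
print(perfect, mean_only, worse)
```

The last case shows that \(R^2\) can go negative when a model does worse than just predicting the mean.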
Parameters
Let's look at the found parameters.
print("|Model| Slope |y-intercept |")
print("|-+-+-|")
print(f"|Actual| {SLOPE[0]:0.2f}| {INTERCEPT[0]:0.2f}|")
intercept, slope = regression.params
print(f"|OLS| {slope: 0.2f}|{intercept: 0.2f}|")
slope, intercept = model.weights[0]
print(f"| Trax|{float(slope): 0.2f} | {float(intercept):0.2f}|")
| Model | Slope | y-intercept |
|---|---|---|
| Actual | 4.97 | -2.45 |
| OLS | 4.91 | -4.09 |
| Trax | 3.66 | 1.12 |
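OLS got the slope pretty close, and its y-intercept is closer than it looks: the noise came from sample(-2, 2, SAMPLES, uniform=False), which is 4 times a standard normal minus 2, so it has a mean of -2 and pushes the intercept of the noisy data to about -4.45. Trax was further off for both. Its line passes almost exactly through the one instance printed from the generator earlier (\(3.66 \times 2.569 + 1.12 \approx 10.52\)), which is what you'd expect from a network that only ever saw that single instance rather than the whole data set.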
End
- The trax code was taken from an example notebook in their GitHub repository.