# Trax Hello World

## Beginning

This 'Hello World' takes data created by a simple linear model and trains a neural network to model it. The actual model will take this form:

$y = mx + b$

Where m is the slope and b is the y-intercept. In statistics this is sometimes written using betas:

$y = \beta_0 + \beta_1 x$

And in machine learning it's sometimes written with weights:

$y = w_0 + w_1 x$

Or with a bias and a weight:

$y = bias + w_0 x$

### Imports

#### Trax Imports

Since this is about trax I'll separate out the imports to make it more obvious.

from trax import layers
from trax.supervised import training

import trax


#### Not-Trax

The rest of this is just to support the data creation, plotting, etc.

# python
from collections import namedtuple
from functools import partial
from pathlib import Path
from tempfile import TemporaryFile

import random
import shutil
import sys

# pypi
from holoviews import opts
from sklearn.metrics import r2_score

import holoviews
import hvplot.pandas
import numpy
import pandas
import statsmodels.api as statsmodels

# my stuff
from graeae import EmbedHoloviews


### Set Up

#### The Random Generator

This is the newer (to me) way to generate random numbers with numpy.

random_generator = numpy.random.default_rng(seed=2021)


#### The Plotting

Just some helpers for later on when I do some plotting.

slug = "trax-hello-world"
Embed = partial(EmbedHoloviews, folder_path=f"files/posts/trax/{slug}")

Plot = namedtuple("Plot", ["width", "height", "fontscale", "tan", "blue", "red"])
PLOT = Plot(
width=900,
height=750,
fontscale=2,
tan="#ddb377",
blue="#4687b7",
red="#ce7b6d",
)


## The Linear Regression

### The Data

Trax is pretty invested in using generators, but I also want to be able to plot it, so rather than just generate the data on the fly I'll create a numpy array and then generate the data from it.

#### Sample

See:

def sample(start: float, stop: float, shape: tuple, uniform: bool=True) -> numpy.ndarray:
"""Create a random sample

Args:
start: lowest allowed value
stop: highest allowed value
shape: shape for the final array (just an int for single values)
uniform: use the uniform distribution instead of the standard normal
"""
if uniform:
return (stop - start) * random_generator.random(shape) + start
return (stop - start) * random_generator.standard_normal(shape) + start


#### The Samples

He're I'll make the linear-ish data with some noise added to it.

SAMPLES = 200
X_RANGE = 5
x_values = sample(-X_RANGE, X_RANGE, SAMPLES)
SLOPE = sample(-5, 5, 1)
INTERCEPT = sample(-5, 5, 1)
noise = sample(-2, 2, SAMPLES, uniform=False)
y_values = SLOPE * x_values + INTERCEPT + noise


#### Plotting the Data

data_frame = pandas.DataFrame.from_dict(dict(X=x_values, Y=y_values))
first, last = x_values.min(), x_values.max()
line_frame = pandas.DataFrame.from_dict(
dict(X=[first, last],
Y=[slope * first + intercept,
slope * last + intercept]))
line_plot = line_frame.hvplot(x="X", y="Y", color=PLOT.blue)
data_plot = data_frame.hvplot.scatter(x="X", y="Y", title="Sample Data", color=PLOT.tan)
plot = (data_plot * line_plot).opts(
height=PLOT.height,
width=PLOT.width,
fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="data_sample")()

print(output)


#### Data Generator

This will generate the data for the trax batch generator.

def linear_generator(x: numpy.ndarray, y: numpy.ndarray) -> tuple:
"""Generator of linear data

Args:
x: vector of input data
y: vector of output data

Yields:
(x, y): single instance of x and single instance of y
"""
total = len(x)
assert x.shape == y.shape
index = 0
while True:
yield (numpy.array([x[index]]), numpy.array([y[index]]))
index = index % total
return

generator = linear_generator(x_values, y_values)
print(next(generator))

(array([2.56947828]), array([10.52443023]))


#### The Data Pipeline

data_pipeline = trax.data.Serial(trax.data.Batch(50), trax.data.AddLossWeights(),)
data_stream = data_pipeline(generator)


### The Model

model = layers.Serial(layers.Dense(1))


### Train the Model

We're going to train the model using Stochastic Gradient Descent with L2 Loss as a metric.

#### Set It Up

The online documentation doesn't cover the TrainTask and EvalTask, for some reason.

train_task = training.TrainTask(
labeled_data=data_stream,
loss_layer=layers.L2Loss(),
optimizer=trax.optimizers.SGD(0.01),
n_steps_per_checkpoint=10,
)

labeled_data=data_stream, metrics=[layers.L2Loss()],
n_eval_batches=10,
)


#### Run the Training

I use the TemporaryFile because I can't figure out how to prevent the training loop printing to standard out and making this file way too long.

TRAIN_STEPS = 200
path = Path("~/models/linear_model").expanduser()
if path.exists():
shutil.rmtree(path)

training_loop = training.Loop(
)

real_stdout = sys.stdout
with TemporaryFile("w") as temp_file:
sys.stdout = temp_file
training_loop.run(TRAIN_STEPS)
sys.stdout = real_stdout


### Plotting the Loss

frame = pandas.DataFrame(training_loop.history.get("eval", "metrics/L2Loss"),
columns=["Batch", "L2 Loss"])

minimum = frame.loc[frame["L2 Loss"].idxmin()]
vline = holoviews.VLine(minimum.Batch).opts(opts.VLine(color=PLOT.red))
hline = holoviews.HLine(minimum["L2 Loss"]).opts(opts.HLine(color=PLOT.red))
line = frame.hvplot(x="Batch", y="L2 Loss").opts(opts.Curve(color=PLOT.blue))

plot = (line * hline * vline).opts(
width=PLOT.width, height=PLOT.height,
title="Evaluation Batch L2 Loss",
)
output = Embed(plot=plot, file_name="evaluation_l2_loss")()

print(output)


It looks like it fits pretty quickly.

### Statsmodels Model

As a comparison, I'll fit a statsmodels Ordinary Least Squares. See: statsmodels.regression.linear_model.OLS

x_stats = statsmodels.add_constant(x_values)
ols_model = statsmodels.OLS(y_values, x_stats)
regression = ols_model.fit()

regression_predictions = regression.predict(x_stats)


### Plotting the Model

When we make a prediction, the x-values have to be a matrix, not a vector. So in this case we want one column with all the rows in it which you can get using reshape.

See: numpy.reshape

One way to do this would be te pass in the length of the vector as the number of rows.

x_values.reshape(len(x_values), 1)


But the convention seems to be to use -1 instead of the number of rows.

x_values.reshape(-1, 1)


Which is cleaner, if not as obvious in meaning.

ALL_ROWS, ONE_COLUMN = -1, 1
TWO_DIMENSIONS = (ALL_ROWS, ONE_COLUMN)

predictions = model(x_values.reshape(TWO_DIMENSIONS))
data_frame["Predicted"] = predictions[:, 0]
data_frame["OLS"] = regression_predictions

actual = data_frame.hvplot.scatter(x="X", y="Y", color=PLOT.tan, label="Data")
predicted = data_frame.hvplot.scatter(x="X", y="Predicted", color=PLOT.red, label="Predicted")
line_plot = line_frame.hvplot(x="X", y="Y", color=PLOT.blue, label="Actual")
ols_plot = data_frame.hvplot(x="X", y="OLS", label="OLS")
plot = (actual * predicted * line_plot * ols_plot).opts(
height=PLOT.height,
width=PLOT.width,
fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="predictions")()

print(output)


### Looking at $R^2$

SKLearn has an r2_score function to calculate $R^2$ for us.

print(f"OLS R2: {r2_score(y_values, regression_predictions): 0.3f}")
print(f"Trax R2: {r2_score(y_values, predictions): 0.3f}")

OLS R2:  0.926
Trax R2:  0.724


An $R^2$ of 1 means our model is a strong fit and an $R^2$ of 0 means it doesn't fit at all. It looks like the Neural Network linear model didn't do so great compared to Ordinary Least Squares, although looking at the line I would have guessed that it did even worse.

### Parameters

Let's look at the found parameters.

print("|Model| Slope |y-intercept |")
print("|-+-+-|")
print(f"|Actual| {SLOPE:0.2f}| {INTERCEPT:0.2f}|")
intercept, slope = regression.params
print(f"|OLS| {slope: 0.2f}|{intercept: 0.2f}|")

slope, intercept = model.weights
print(f"| Trax|{float(slope): 0.2f} | {float(intercept):0.2f}|")

Model Slope y-intercept
Actual 4.97 -2.45
OLS 4.91 -4.09
Trax 3.66 1.12

The OLS got the slope pretty close , but not so much the y-intercept, while trax was further off for both.