# Trax Hello World

## Table of Contents

## Beginning

This 'Hello World' takes data created by a simple linear model and trains a neural network to model it. The actual model will take this form:

\[ y = mx + b \]

Where *m* is the slope and *b* is the y-intercept. In statistics this is sometimes written using betas:

\[ y = \beta_0 + \beta_1 x \]

And in machine learning it's sometimes written with weights:

\[ y = w_0 + w_1 x \]

Or with a bias and a weight:

\[ y = bias + w_0 x \]

### Imports

#### Trax Imports

Since this is about trax I'll separate out the imports to make it more obvious.

```
from trax import layers
from trax.supervised import training
import trax
```

#### Not-Trax

The rest of this is just to support the data creation, plotting, etc.

```
# python
from collections import namedtuple
from functools import partial
from pathlib import Path
from tempfile import TemporaryFile
import random
import shutil
import sys
# pypi
from holoviews import opts
from sklearn.metrics import r2_score
import holoviews
import hvplot.pandas
import numpy
import pandas
import statsmodels.api as statsmodels
# my stuff
from graeae import EmbedHoloviews
```

### Set Up

#### The Random Generator

This is the newer (to me) way to generate random numbers with numpy.

```
random_generator = numpy.random.default_rng(seed=2021)
```

#### The Plotting

Just some helpers for later on when I do some plotting.

```
slug = "trax-hello-world"
Embed = partial(EmbedHoloviews, folder_path=f"files/posts/trax/{slug}")
Plot = namedtuple("Plot", ["width", "height", "fontscale", "tan", "blue", "red"])
PLOT = Plot(
width=900,
height=750,
fontscale=2,
tan="#ddb377",
blue="#4687b7",
red="#ce7b6d",
)
```

## The Linear Regression

### The Data

Trax is pretty invested in using generators, but I also want to be able to plot it, so rather than just generate the data on the fly I'll create a numpy array and then generate the data from it.

#### Sample

See:

```
def sample(start: float, stop: float, shape: tuple, uniform: bool=True) -> numpy.ndarray:
"""Create a random sample
Args:
start: lowest allowed value
stop: highest allowed value
shape: shape for the final array (just an int for single values)
uniform: use the uniform distribution instead of the standard normal
"""
if uniform:
return (stop - start) * random_generator.random(shape) + start
return (stop - start) * random_generator.standard_normal(shape) + start
```

#### The Samples

He're I'll make the linear-ish data with some noise added to it.

```
SAMPLES = 200
X_RANGE = 5
x_values = sample(-X_RANGE, X_RANGE, SAMPLES)
SLOPE = sample(-5, 5, 1)
INTERCEPT = sample(-5, 5, 1)
noise = sample(-2, 2, SAMPLES, uniform=False)
y_values = SLOPE * x_values + INTERCEPT + noise
```

#### Plotting the Data

```
data_frame = pandas.DataFrame.from_dict(dict(X=x_values, Y=y_values))
first, last = x_values.min(), x_values.max()
line_frame = pandas.DataFrame.from_dict(
dict(X=[first, last],
Y=[slope * first + intercept,
slope * last + intercept]))
line_plot = line_frame.hvplot(x="X", y="Y", color=PLOT.blue)
data_plot = data_frame.hvplot.scatter(x="X", y="Y", title="Sample Data", color=PLOT.tan)
plot = (data_plot * line_plot).opts(
height=PLOT.height,
width=PLOT.width,
fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="data_sample")()
```

```
print(output)
```

#### Data Generator

This will generate the data for the trax batch generator.

```
def linear_generator(x: numpy.ndarray, y: numpy.ndarray) -> tuple:
"""Generator of linear data
Args:
x: vector of input data
y: vector of output data
Yields:
(x, y): single instance of x and single instance of y
"""
total = len(x)
assert x.shape == y.shape
index = 0
while True:
yield (numpy.array([x[index]]), numpy.array([y[index]]))
index = index % total
return
```

```
generator = linear_generator(x_values, y_values)
print(next(generator))
```

(array([2.56947828]), array([10.52443023]))

#### The Data Pipeline

```
data_pipeline = trax.data.Serial(trax.data.Batch(50), trax.data.AddLossWeights(),)
data_stream = data_pipeline(generator)
```

### The Model

```
model = layers.Serial(layers.Dense(1))
```

### Train the Model

We're going to train the model using Stochastic Gradient Descent with L2 Loss as a metric.

#### Set It Up

The online documentation doesn't cover the `TrainTask`

and `EvalTask`

, for some reason.

```
train_task = training.TrainTask(
labeled_data=data_stream,
loss_layer=layers.L2Loss(),
optimizer=trax.optimizers.SGD(0.01),
n_steps_per_checkpoint=10,
)
eval_task = training.EvalTask(
labeled_data=data_stream, metrics=[layers.L2Loss()],
n_eval_batches=10,
)
```

#### Run the Training

I use the `TemporaryFile`

because I can't figure out how to prevent the training loop printing to standard out and making this file way too long.

```
TRAIN_STEPS = 200
path = Path("~/models/linear_model").expanduser()
if path.exists():
shutil.rmtree(path)
training_loop = training.Loop(
model, train_task, eval_tasks=[eval_task], output_dir=path
)
real_stdout = sys.stdout
with TemporaryFile("w") as temp_file:
sys.stdout = temp_file
training_loop.run(TRAIN_STEPS)
sys.stdout = real_stdout
```

### Plotting the Loss

```
frame = pandas.DataFrame(training_loop.history.get("eval", "metrics/L2Loss"),
columns=["Batch", "L2 Loss"])
minimum = frame.loc[frame["L2 Loss"].idxmin()]
vline = holoviews.VLine(minimum.Batch).opts(opts.VLine(color=PLOT.red))
hline = holoviews.HLine(minimum["L2 Loss"]).opts(opts.HLine(color=PLOT.red))
line = frame.hvplot(x="Batch", y="L2 Loss").opts(opts.Curve(color=PLOT.blue))
plot = (line * hline * vline).opts(
width=PLOT.width, height=PLOT.height,
title="Evaluation Batch L2 Loss",
)
output = Embed(plot=plot, file_name="evaluation_l2_loss")()
```

```
print(output)
```

It looks like it fits pretty quickly.

### Statsmodels Model

As a comparison, I'll fit a statsmodels Ordinary Least Squares. See: statsmodels.regression.linear_model.OLS

```
x_stats = statsmodels.add_constant(x_values)
ols_model = statsmodels.OLS(y_values, x_stats)
regression = ols_model.fit()
regression_predictions = regression.predict(x_stats)
```

### Plotting the Model

When we make a prediction, the x-values have to be a matrix, not a vector. So in this case we want one column with all the rows in it which you can get using reshape.

See: numpy.reshape

One way to do this would be te pass in the length of the vector as the number of rows.

x_values.reshape(len(x_values), 1)

But the convention seems to be to use `-1`

instead of the number of rows.

x_values.reshape(-1, 1)

Which is cleaner, if not as obvious in meaning.

```
ALL_ROWS, ONE_COLUMN = -1, 1
TWO_DIMENSIONS = (ALL_ROWS, ONE_COLUMN)
predictions = model(x_values.reshape(TWO_DIMENSIONS))
data_frame["Predicted"] = predictions[:, 0]
data_frame["OLS"] = regression_predictions
```

```
actual = data_frame.hvplot.scatter(x="X", y="Y", color=PLOT.tan, label="Data")
predicted = data_frame.hvplot.scatter(x="X", y="Predicted", color=PLOT.red, label="Predicted")
line_plot = line_frame.hvplot(x="X", y="Y", color=PLOT.blue, label="Actual")
ols_plot = data_frame.hvplot(x="X", y="OLS", label="OLS")
plot = (actual * predicted * line_plot * ols_plot).opts(
height=PLOT.height,
width=PLOT.width,
fontscale=PLOT.fontscale
)
output = Embed(plot=plot, file_name="predictions")()
```

```
print(output)
```

### Looking at \(R^2\)

SKLearn has an `r2_score`

function to calculate \(R^2\) for us.

```
print(f"OLS R2: {r2_score(y_values, regression_predictions): 0.3f}")
print(f"Trax R2: {r2_score(y_values, predictions): 0.3f}")
```

OLS R2: 0.926 Trax R2: 0.724

An \(R^2\) of 1 means our model is a strong fit and an \(R^2\) of 0 means it doesn't fit at all. It looks like the Neural Network linear model didn't do so great compared to Ordinary Least Squares, although looking at the line I would have guessed that it did even worse.

### Parameters

Let's look at the found parameters.

```
print("|Model| Slope |y-intercept |")
print("|-+-+-|")
print(f"|Actual| {SLOPE[0]:0.2f}| {INTERCEPT[0]:0.2f}|")
intercept, slope = regression.params
print(f"|OLS| {slope: 0.2f}|{intercept: 0.2f}|")
slope, intercept = model.weights[0]
print(f"| Trax|{float(slope): 0.2f} | {float(intercept):0.2f}|")
```

Model | Slope | y-intercept |
---|---|---|

Actual | 4.97 | -2.45 |

OLS | 4.91 | -4.09 |

Trax | 3.66 | 1.12 |

The OLS got the slope pretty close , but not so much the y-intercept, while trax was further off for both.

## End

- The trax code was taken from an example notebook in their Github Repository.