FastAI: Picking the Best Model

In the Beginning

In this notebook we'll go over the fastai course lesson 3 notebook "Which image models are best?". We'll use the benchmarking data from timm (PyTorch Image Models), a collection of computer vision models, to compare how different models performed, using time per image and accuracy as our metrics.

Imports and Setup

# from python
from functools import partial
from pathlib import Path

# from pypi
from tabulate import tabulate

import altair
import pandas

# monkey
from graeae.visualization.altair_helpers import output_path, save_chart
TABLE = partial(tabulate, tablefmt="orgtbl", headers=["Column", "Value"] )

SLUG = "fastai-picking-the-best-model"
OUTPUT_PATH = output_path(SLUG)
save_it = partial(save_chart, output_path=OUTPUT_PATH)

The Validation Data

We'll be using data that's part of the git repository for timm. Once you clone the repository, the first file that we want is results/results-imagenet.csv. It holds the results of validating the models with the ImageNet validation set.

RESULTS = Path("~/projects/third-party/"
DATA = RESULTS/"results-imagenet.csv"
validation = pandas.read_csv(DATA)

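| Column        | Value                  |
|---------------+------------------------|
| model         | beit_large_patch16_512 |
| top1          | 88.602                 |
| top1_err      | 11.398                 |
| top5          | 98.656                 |
| top5_err      | 1.344                  |
| param_count   | 305.67                 |
| img_size      | 512                    |
| crop_pct      | 1.0                    |
| interpolation | bicubic                |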

This table shows the first row of the results-imagenet CSV. Each row represents a computer vision model and some information about how it performed during validation. The documentation says that top1 and top5 are "top-1/top-5 differences from clean validation." Which means… what? Looking at the file, top1 and top5 appear to be measures of accuracy. Looking in the module, the function accuracy has a docstring that says: "Computes the accuracy over the k top predictions for the specified values of k". The top1 and top5 values are tracked with AverageMeter objects, which keep a running average of the accuracies.
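To make the "running average" idea concrete, here is a minimal sketch of a meter that accumulates a batch-weighted average. The RunningAverage class and its update method are hypothetical names written for this post, not timm's AverageMeter, and the batch values are made up.

```python
class RunningAverage:
    """Keeps a weighted running average (hypothetical stand-in, not timm's class)."""

    def __init__(self) -> None:
        self.total = 0.0   # sum of value * weight seen so far
        self.count = 0     # sum of the weights (samples) seen so far

    def update(self, value: float, n: int = 1) -> None:
        """Add a batch-level value that was computed over n samples."""
        self.total += value * n
        self.count += n

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


meter = RunningAverage()
meter.update(80.0, n=10)   # a batch of 10 images at 80% accuracy
meter.update(90.0, n=30)   # a batch of 30 images at 90% accuracy
average = meter.average    # weighted by batch size, not a plain mean of 80 and 90
```

Weighting by the batch size matters when the last batch is smaller than the others; a plain mean of the batch accuracies would over-count it.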

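This seems straightforward enough, but if you look at that first row the top1 is smaller than the top5 and has a larger error. That's what you'd expect: top1 only counts a prediction as correct when the model's single highest-scoring class is the true label, while top5 counts it as correct when the true label is anywhere among the five highest-scoring classes, so top5 can never be lower than top1. The error columns are just the complements (88.602 + 11.398 = 100 and 98.656 + 1.344 = 100).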
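Here is a small sketch of top-k accuracy in plain Python, to show why the top-k value can only grow as k grows. The top_k_accuracy function and the scores are made up for illustration; this is not timm's accuracy function.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores for this sample
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return 100 * hits / len(labels)


# three samples, four classes (toy values, not timm output)
scores = [
    [0.1, 0.6, 0.2, 0.1],   # best guess is class 1
    [0.5, 0.1, 0.3, 0.1],   # best guess is class 0, runner-up is class 2
    [0.3, 0.1, 0.1, 0.5],   # best guess is class 3, runner-up is class 0
]
labels = [1, 2, 0]

top1 = top_k_accuracy(scores, labels, k=1)   # only the first sample is a hit
top2 = top_k_accuracy(scores, labels, k=2)   # the runner-up guesses rescue the other two
top1_err = 100 - top1                        # the error is the complement
```

Every top-1 hit is also a top-2 (or top-5) hit, so the larger k always gives an accuracy at least as high.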

Guessing by the name, the model in our row is an instance of "BEIT: BERT Pre-Training of Image Transformers", found in timm's module.

print(validation.shape)

(668, 9)

The model column is the string you use when creating a model and also refers to a function in one of the pytorch-image-models/timm/models modules. If you want to see how the model in our example row is defined, look in the timm/models/ module for a function named "beit_large_patch16_512". You should find something like this.

def beit_large_patch16_512(pretrained=False, **kwargs):
    model_kwargs = dict(
        img_size=512, patch_size=16, embed_dim=1024, depth=24, num_heads=16, mlp_ratio=4, qkv_bias=True,
        use_abs_pos_emb=False, use_rel_pos_bias=True, init_values=1e-5, **kwargs)
    model = _create_beit('beit_large_patch16_512', pretrained=pretrained, **model_kwargs)
    return model

So we can now see that besides being a BEIT model the name tells us that it used an image size of 512 and a patch size of 16. Further up the file is this configuration:

'beit_large_patch16_512': _cfg(
        input_size=(3, 512, 512), crop_pct=1.0,

Which tells you where the pretrained weights came from.

The Benchmark Data

We're going to merge our "validation" data with two "benchmark" files (also in the "results" folder), doing some cryptic filtering and data wrangling along the way. It's not obvious what everything is doing, so let's use it first and maybe figure out most of it later. The main things to note are that we're adding a family column made by taking the first token from the model name (e.g. the model beit_large_patch16_512 gets the family beit), we're adding a secs column by inverting the samples-per-second column, and we're filtering the models down to a subset that is useful to look at.
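As a rough illustration of the family step, here is a sketch that applies the same pattern as FAMILY_REGEX using Python's re module instead of pandas. The family_of helper is a hypothetical name, and the model names other than the two that appear in this post are just examples.

```python
import re

# same pattern as FAMILY_REGEX in this post
FAMILY_REGEX = r'^([a-z]+?(?:v2)?)(?:\d|_|$)'


def family_of(model: str) -> str:
    """Return the leading run of letters (plus an optional 'v2') of a model name."""
    match = re.match(FAMILY_REGEX, model)
    return match.group(1) if match else ""


names = [
    "beit_large_patch16_512",
    "levit_128s",
    "resnet50d",
    "efficientnetv2_rw_s",
    "vgg16",
]
families = {name: family_of(name) for name in names}
```

The lazy `[a-z]+?` stops at the first digit, underscore, or end of string, and the optional `(?:v2)?` is what keeps efficientnetv2 from being cut down to efficientnetv.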

BENCHMARK_FILE = ("benchmark-{infer_or_train}"
SAMPLE_RATE = "{infer_or_train}_samples_per_sec"
FAMILY_REGEX = r'^([a-z]+?(?:v2)?)(?:\d|_|$)'
FAMILY_FILTER = r'^re[sg]netd?|beit|convnext|levit|efficient|vit|vgg'

def get_data(infer_or_train: str,
             validation: pandas.DataFrame=validation) -> pandas.DataFrame:
    """Load a benchmark dataframe

    Args:
     infer_or_train: part of filename with label (infer or train)
     validation: DataFrame created from validation results file

    Returns:
     benchmark data merged with validation
    """
    frame = pandas.read_csv(
        RESULTS/BENCHMARK_FILE.format(infer_or_train=infer_or_train)).merge(
        validation, on='model')
    frame['secs'] = 1. / frame[SAMPLE_RATE.format(infer_or_train=infer_or_train)]
    frame['family'] = frame.model.str.extract(FAMILY_REGEX)
    frame = frame[~frame.model.str.endswith('gn')]
    IN_FILTERER = frame.model.str.contains('in22'), "family"
    frame.loc[IN_FILTERER] = frame.loc[IN_FILTERER] + '_in22'

    RESNET_FILTERER = frame.model.str.contains('resnet.*d'),'family'
    frame.loc[RESNET_FILTERER] = frame.loc[RESNET_FILTERER] + 'd'
    return frame[frame.family.str.contains(FAMILY_FILTER)]
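The two family adjustments inside get_data are easy to miss, so here is a sketch of what they do to a single model name, using plain Python instead of pandas. The adjust_family helper is a hypothetical name, and the sample model names are only examples.

```python
import re


def adjust_family(model: str, family: str) -> str:
    """Mimic the suffix rules in get_data for one (model, family) pair."""
    if "in22" in model:
        # models whose name contains in22 get their own family
        family += "_in22"
    if re.search("resnet.*d", model):
        # any name with a 'd' somewhere after 'resnet' is marked with a trailing d
        family += "d"
    return family


adjusted = {
    "resnet50": adjust_family("resnet50", "resnet"),
    "resnet50d": adjust_family("resnet50d", "resnet"),
    "convnext_base_in22k": adjust_family("convnext_base_in22k", "convnext"),
    "beit_large_patch16_512": adjust_family("beit_large_patch16_512", "beit"),
}
```

Note that the pattern resnet.*d is loose: it matches a "d" anywhere after "resnet", not only a trailing one.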

Build The Base Chart

The build_chart function is going to help us build the basic chart to compare the merged validation and benchmark values for the models.

SELECTION = altair.selection_multi(fields=["family"], bind="legend")
COLUMNS = ["secs", "top1", "family", "model"]

def build_chart(frame: pandas.DataFrame, infer_or_train: str,
                add_selection: bool=True) -> altair.Chart:
    """Build the basic chart for our benchmarks

    Note:
     the ``add_selection`` method can only be called once on a chart so to
     add more layers don't add it here, add it later to the end

    Args:
     frame: benchmark frame to plot
     infer_or_train: which image size column (infer | train)
     add_selection: whether to add the selection at the end
    """
    # altair includes all the data even if it's not used in the plot
    # reducing the dataframe to just the data you need
    # makes the file smaller
    SIZE = f"{infer_or_train}_img_size"
    frame = frame[COLUMNS + [SIZE]]
    chart = altair.Chart(frame).mark_circle().encode(
        x=altair.X("secs", scale=altair.Scale(type="log"),
                   axis=altair.Axis(title="Seconds Per Image (log)")),
                   axis=altair.Axis(title="Imagenet Accuracy")),
                             type="pow", exponent=2)),
        tooltip=[altair.Tooltip("family", title="Architecture Family"),
                 altair.Tooltip("model", title="Model"),
                 altair.Tooltip(SIZE, format=",", title="Image Size"),
                 altair.Tooltip("top1", title="Accuracy"),
                 altair.Tooltip("secs", title="Time (sec)", format=".2e")
    if add_selection:
        chart = chart.encode(opacity=altair.condition(
    return chart

Plot All the Architectures

Our first chart for the benchmarking data will plot all the models left in the data-frame after our filtering and merging to show us how they compare for accuracy and average time to process a sample.

def plot_it(frame: pandas.DataFrame,
            title: str,
            filename: str,
            infer_or_train: str,
            width: int=PLOT_WIDTH,
            height: int=PLOT_HEIGHT) -> None:
    """Make an altair plot of the frame

    Args:
     frame: benchmark frame to plot
     title: title to give the plot
     filename: name of file to save the chart to
     infer_or_train: which image size column (infer or train)
     width: width of plot in pixels
     height: height of plot in pixels
    """
    chart = build_chart(frame, infer_or_train).properties(
        title=title, width=width, height=height)

    save_it(chart, filename)

Plot Some of the Architectures

To make it easier to understand, the author of the fastai lesson chose a subset of the families to plot.

  • beit
  • convnext
  • efficientnetv2
  • levit
  • regnetx
  • resnetd
  • vgg

Note: The fastai notebook points out that because of the different sample sizes used to train the models it isn't a simple case of picking the "best" performing model (given a speed vs accuracy trade off). The pytorch-image-models repository has information to help research what went into the training.

FAMILIES = 'levit|resnetd?|regnetx|vgg|convnext.*|efficientnetv2|beit'

def subset_regression(frame: pandas.DataFrame,
                      title: str,
                      filename: str,
                      infer_or_train: str,
                      width: int=PLOT_WIDTH,
                      height: int=PLOT_HEIGHT) -> None:
    """Plot subset of model-families

    Args:
     frame: frame with benchmark data
     title: title to give the plot
     filename: name to save the file
     infer_or_train: which image size column
     width: width of plot in pixels
     height: height of plot in pixels
    """
    subset = frame[frame.family.str.fullmatch(FAMILIES)]

    base = build_chart(subset, infer_or_train, add_selection=False)

    line = base.transform_regression(
        "secs", "top1",

    chart = base.encode(

    chart = altair.layer(chart, line).properties(

    save_it(chart, filename)
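The FAMILIES pattern only makes sense if it is matched against the whole family name (otherwise the "d?" and ".*" parts would do nothing), so here is a sketch under that assumption using re.fullmatch. The candidate family names are examples chosen to show what gets kept and what gets dropped.

```python
import re

# same pattern as FAMILIES in this post
FAMILIES = 'levit|resnetd?|regnetx|vgg|convnext.*|efficientnetv2|beit'

candidates = [
    "beit", "beit_in22", "convnext", "convnext_in22", "resnet", "resnetd",
    "regnetx", "regnety", "vit", "levit", "efficientnet", "efficientnetv2",
    "vgg",
]

# whole-name matching: the pattern has to cover the entire family name
kept = [family for family in candidates if re.fullmatch(FAMILIES, family)]

# substring matching for comparison: this lets extra families through
loose = [family for family in candidates if re.search(FAMILIES, family)]
```

With whole-name matching, beit_in22 is dropped while convnext_in22 is kept (because of the ".*"), and both resnet and resnetd pass because the "d" is optional.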


Inference

The first benchmarking data we're going to add is the inference data. Unfortunately I haven't been able to find out exactly what this means. Was this a test of categorizing a test set? It only adds the average sample time to what we're going to plot, which perhaps isn't as interesting as the accuracy anyway.

inference = get_data('infer')
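| Column                | Value                 |
|-----------------------+-----------------------|
| model                 | levit_128s            |
| infer_samples_per_sec | 21485.8               |
| infer_step_time       | 47.648                |
| infer_batch_size      | 1024                  |
| infer_img_size        | 224                   |
| param_count_x         | 7.78                  |
| top1                  | 76.514                |
| top1_err              | 23.486                |
| top5                  | 92.87                 |
| top5_err              | 7.13                  |
| param_count_y         | 7.78                  |
| img_size              | 224                   |
| crop_pct              | 0.9                   |
| interpolation         | bicubic               |
| secs                  | 4.654236751715086e-05 |
| family                | levit                 |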

Let's look at a row of what was added to our original validation data.

added = inference[list(set(inference.columns) - set(validation.columns))].iloc[0]
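| Column                | Value                 |
|-----------------------+-----------------------|
| secs                  | 4.654236751715086e-05 |
| family                | levit                 |
| param_count_y         | 7.78                  |
| infer_batch_size      | 1024                  |
| param_count_x         | 7.78                  |
| infer_samples_per_sec | 21485.8               |
| infer_step_time       | 47.648                |
| infer_img_size        | 224                   |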

If you look back at get_data you'll see that we added the secs column, which is defined as \(\frac{1}{\textit{samples per second}}\). So it's the average number of seconds per sample, I think.
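As a sanity check on that reading, here is the arithmetic for the levit_128s row shown above. The cross-check assumes that infer_step_time is the time for one batch in milliseconds, which is a guess on my part and not something stated in the data.

```python
samples_per_sec = 21485.8   # infer_samples_per_sec for levit_128s
step_time_ms = 47.648       # infer_step_time (milliseconds per batch is an assumption)
batch_size = 1024           # infer_batch_size

# the definition used for the secs column
secs = 1.0 / samples_per_sec

# cross-check: time for one batch divided by the images in the batch
secs_from_step = (step_time_ms / 1000) / batch_size

# how far apart the two estimates are, relative to secs
relative_gap = abs(secs - secs_from_step) / secs
```

Both values come out at roughly 4.65e-05 seconds per image and differ by well under a tenth of a percent, which fits the idea that secs is an average time per sample.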

Let's see how evenly distributed the families are.

counts = inference.family.value_counts().reset_index().rename(
    columns = {"index": "Family", "family": "Count"})

chart = altair.Chart(counts).mark_bar().encode(
    x="Count", y=altair.Y("Family", sort="-x"), tooltip=["Count"],
).properties(
    title="Inference Family Counts"
)

save_it(chart, "inference-family-counts")

Figure Missing

There doesn't seem to be an even representation of model families. Let's look at the accuracy vs the speed for the models.

plot_it(inference, title="Inference", 

Figure Missing

While we still don't have an explanation of exactly what we're looking at, in the broadest sense it's a plot of the time it takes for a model to process an image (in seconds on a logarithmic scale) versus the accuracy when categorizing the Imagenet dataset.

  • The color matches the family in the legend.
  • The size is proportional to the number of seconds it took.
  • Clicking on a family in the legend will highlight it and suppress the other families.
  • Hovering over a circle gives the exact information for that point.

I believe that the accuracy is the best performance for a model, so even though a family might have multiple points in the plot, each model will only have one point to represent its best accuracy and the time it took.

A Subset

To make it easier to see what's going on, the author(s) of the fastai lesson pared down the dataset to a subset of families and then added regression lines to compare them.
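To give a feel for what a per-family regression line is, here is a sketch of a least-squares fit of accuracy against the logarithm of the time, done separately for each family. Fitting against the log of the time is an assumption on my part (it matches the logarithmic x-axis), the fit_log_line helper is a hypothetical name, and the numbers are made up rather than taken from the timm results.

```python
import math


def fit_log_line(secs, top1):
    """Least-squares fit of top1 = intercept + slope * ln(secs)."""
    xs = [math.log(s) for s in secs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(top1) / len(top1)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, top1))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# made-up (secs, top1) points for two imaginary families
points = {
    "family_a": ([1e-4, 1e-3, 1e-2], [70.0, 80.0, 90.0]),
    "family_b": ([1e-4, 1e-3, 1e-2], [75.0, 80.0, 85.0]),
}

# one line per family, like grouping the regression by the family column
lines = {family: fit_log_line(secs, top1)
         for family, (secs, top1) in points.items()}
```

A steeper slope means a family buys more accuracy for each multiplicative increase in time per image, which is the comparison the regression lines are meant to make visible.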

                  title="Inference Subset",

Figure Missing


Training

training = get_data("train")
plot_it(training, title="Training", 

Figure Missing

                  title="Training Subset",

Figure Missing

Parameters Vs Time

The fastai notebook plots the model parameters vs time (speed), saying that parameters are sometimes used as a proxy for speed and memory use (to make it machine independent, presumably), but then says that it isn't always a good proxy. Once more they give us a tool and then tell us it isn't necessarily what to use.

plotter = inference[["param_count_x", "secs", "infer_img_size", "family", "model", "top1"]]
chart = altair.Chart(plotter).mark_circle().encode(
    x=altair.X("param_count_x", scale=altair.Scale(type="log"),
               axis=altair.Axis(title="Parameters (log)")),
    y=altair.Y("secs", scale=altair.Scale(type="log", zero=False),
               axis=altair.Axis(title="Seconds Per Image (log)")),
    tooltip=[altair.Tooltip("family", title="Architecture Family"),
             altair.Tooltip("model", title="Model"),
             altair.Tooltip("infer_img_size", format=",", title="Image Size"),
             altair.Tooltip("top1", title="Accuracy"),
             altair.Tooltip("secs", title="Time (sec)", format=".2e")],
).properties(
    title="Parameters Vs Time",
)

save_it(chart, "inference-parameters-vs-time")

Figure Missing

In this case it looks like parameters and speed are correlated, as it takes more time the more parameters there are, but it's confounded by the fact that the models with more parameters seem to be handling bigger images.

Accuracy Vs Size

The fastai

plotter = inference[["param_count_x", "img_size",
                     "family", "model", "secs", "top1"]]
chart = altair.Chart(plotter).mark_circle().encode(
    x=altair.X("img_size", scale=altair.Scale(zero=False),
               axis=altair.Axis(title="Image Size")),
    y=altair.Y("top1", scale=altair.Scale(zero=False),
               axis=altair.Axis(title="Imagenet Accuracy")),
    size=altair.Size("secs", scale=altair.Scale(type="log")),
    tooltip=[altair.Tooltip("family", title="Architecture Family"),
             altair.Tooltip("model", title="Model"),
             altair.Tooltip("img_size", format=",", title="Image Size"),
             altair.Tooltip("top1", title="Accuracy"),
             altair.Tooltip("secs", title="Time (sec)", format=".2e")],
).properties(
    title="Accuracy Vs Image Size",
)

save_it(chart, "inference-accuracy-vs-size")

Figure Missing
