FastAI Cats and Dogs

What Is This?

This is a run-through of the fastai Computer Vision Quickstart that shows how to build an image classification model from a public dataset hosted on fastai's site. It is similar to the post on classifying rabbits and pigs except in the other post we create our own dataset by searching duckduckgo for images.

Importing

# python standard library
from pathlib import Path

As noted on Stack Overflow, FastAI does a lot of monkey patching, so if you just import something from where it's defined (to make it clearer where things are coming from) it might not have the methods or attributes you expect. In this case, for instance, the vision_learner function is defined in fastai.vision.learner but if you try and import it from there the object you get back won't have the to_fp16 method that we're going to use so you have to import it from fastai.vision.all instead. Since there's no good way to avoid using all I'll import objects from there but I'll try and also point to the original modules where things are defined to make it easier to look things up.

Module Import
fastai.data.external untar_data, URLs
fastai.data.transforms get_image_files
fastai.metrics error_rate
fastai.vision.augment Resize
fastai.vision.core PILImage
fastai.vision.data ImageDataLoaders
fastai.vision.learner vision_learner
torchvision.models.resnet resnet34


from fastai.vision.all import (
    ImageDataLoaders,
    PILImage,
    Resize, 
    URLs,
    error_rate,
    get_image_files,
    resnet34,
    untar_data,
    vision_learner,
)

Setting Up

This downloads the Oxford-IIIT Pet Dataset. Despite the name, there are only cats and dogs in the dataset (37 breeds across the species).

Function/Object Description Documentation Link
untar_data Function to download fastai datasets/weights External Data, function arguments
URLs Constants for datasets A brief description

By default this will download the data to ~/.fastai/data but both untar_data and URLs (note the s at the end is lowercase) take an argument c_key that allows changing this but I don't know what the difference is between using one or the other.

path = untar_data(URLs.PETS)/"images"
print(path)
/home/athena/.fastai/data/oxford-iiit-pet/images

The names of the files give the breed of the pet (either cat or dog) with dog names all in lower case (e.g. "yorkshire_terrire_9.jpg") and cats with the first initials capitalized (e.g. "Abyssinian_100.jpg"). So our function to categorize the training data will check if the first letter is a capital letter and label it True if it is, False if it isn't, using the following function.

def its_a_cat(filename: str) -> bool:
    """Decide if file is a picture of a cat

    Args:
     filename: name of file where first letter is capitalized if it's a cat

    Returns:
     True if first letter is capitalized (so it's a picture of a cat)
    """
    return filename[0].isupper()

This next bit creates a batch data loader for us.

Object Description Documentation
ImageDataLoaders Data loader with functions for images. ImageDataLoaders, from_name_func
get_image_files Recursively retrieve images from folders. docstring
Resize Resize each image (if you pass in one size it uses it for all dimensions). docstring


loader = ImageDataLoaders.from_name_func(
    path,
    get_image_files(path),
    valid_pct=0.2,
    seed=42,
    label_func=its_a_cat,
    item_tfms=Resize(224)
)

Now we create the model that learns to detect cats.

Object Description Documentation
vision_learner Builds a model for transfer learning. Arguments
resnet34 Residual Network model torchvision documentation
error_rate 1 - accuracy (the fraction that was incorrect) arguments
to_fp16 Use 16-bit (half-precision) floats Mixed Precision Training Explained


learner = vision_learner(
    loader, resnet34, metrics=error_rate)

cat_model = learner.to_fp16()

Pretty much all of this is inexplicable if you haven't used some kind of neural network library before, but that last call (``to_fp16``) seems especially mysterious. This first part is just about making sure things work, though, so I'll wait until I get to the more detailed explanations to figure it out, although their article "Mixed Precision Training Explained" explains it pretty well.

Train It

We're using a pre-trained model so we just have to do some transfer learning - freezing the weights of most of the layers and training the last layer to make a cat or not a cat classification.

For some reason fastai assumes that you'll only run it in a jupyter notebook and dumps out a progress bar with no simple way to disable it permanently. As a workaround I'll use the context-manager no_bar to turn off the progress bar temporarily.

Method Description Documentation
fine_tune Does transfer learning (presumably) None found, but here's the signatures for the freeze and unfreeze methods
no_bar Turn off the progress bar. docstring


with cat_model.no_bar():
    cat_model.fine_tune(1)
[0, 0.17085878551006317, 0.019044965505599976, 0.005412719678133726, '00:20']
[0, 0.05584857985377312, 0.01942548155784607, 0.0067658997140824795, '00:25']

Fastai really seems to want to force you to use their system the way they do - the output from fine_tune is printed to standard out and not returned as some kind of object so I can't re-format it to make it nicer looking here (using org-mode), but for reference, the columns for the two rows of output are:

  • epoch
  • train_loss
  • valid_loss
  • error_rate
  • time

Given these labels, the output of the last block shows that the error rate for the second epoch was 0.005, and it took about twenty and twenty-five seconds per epoch.

Some Test Images

We're going to apply our model to some images of cats and a dog to see what it tells us about the image. Since it's the same process for each image I'll create a function check_image to handle it.

Object Description Documentation
PILImage Object to represent images. docstring
create Load the image as PILImage load_image, PILBase (follow source link to see definition of create)


def check_image(path: str) -> None:
    """Loads the image and checks if it's a cat

    Args:
     path: string with path to the image
    """
    POSITIVE, NEGATIVE = " think", " don't think"

    image = PILImage.create(Path(path).expanduser())

    with cat_model.no_bar():
        ees_cat, _, probablilities = cat_model.predict(image)
    print(f"I{POSITIVE if ees_cat=='True' else NEGATIVE} this is a cat.")
    print(f"The probability that it's a cat is {probablilities[1].item():.2f}")
    return

A Cat

Here's our first test image.

test-cat.jpg

As you can see, it appears to be ridden with parasites, causing it to scratch uncontrollably (the toxoplasma isn't visible but assumed) -let's see how our classifier does at guessing that it's a cat.

check_image("~/test-cat.jpg")
I think this is a cat.
The probability that it's a cat is 1.00

So, it's pretty sure that this is a cat.

A Negative Test Image

We could try any image, but for now, since the dataset used dogs and cats, let's see if it thinks a dog is a cat.

test-dog.jpg

check_image("~/test-dog.jpg")
I don't think this is a cat.
The probability that it's a cat is 0.00

It's sure that this isn't a cat.

A Strange Cat

I tried to find images of cats that looked like dogs or vice-versa, but it turns out that they're pretty different looking things, so let's just try an unusual looking cat.

elf-cat.jpg

check_image("~/elf-cat.jpg")
I think this is a cat.
The probability that it's a cat is 1.00

The End

So there you go, not really exciting, which I suppose is sort of the point of fastai - it should be simple, almost boring, to do image classification. This is just a rehash of what they did, of course, a better check would be to try something different, but since this is the first take it'll have to do for now.

The top post for the quickstart posts is this one and the next post will be on Image Segmentation.

Sources

Fast AI

Test Images (Wikimedia commons)