FastAI Cats and Dogs
What Is This?
This is a run-through of the fastai Computer Vision Quickstart that shows how to build an image classification model from a public dataset hosted on fastai's site. It is similar to the post on classifying rabbits and pigs, except that in that post we created our own dataset by searching DuckDuckGo for images.
Importing
# python standard library
from pathlib import Path
As noted on Stack Overflow, FastAI does a lot of monkey patching, so if you just import something from the module where it's defined (to make it clearer where things are coming from) it might not have the methods or attributes you expect. In this case, for instance, the vision_learner function is defined in fastai.vision.learner, but if you try to import it from there the object you get back won't have the to_fp16 method that we're going to use, so you have to import it from fastai.vision.all instead. Since there's no good way to avoid using all, I'll import objects from there, but I'll also try to point to the original modules where things are defined to make it easier to look things up.
Module | Import |
---|---|
fastai.data.external | untar_data, URLs |
fastai.data.transforms | get_image_files |
fastai.metrics | error_rate |
fastai.vision.augment | Resize |
fastai.vision.core | PILImage |
fastai.vision.data | ImageDataLoaders |
fastai.vision.learner | vision_learner |
torchvision.models.resnet | resnet34 |
from fastai.vision.all import (
    ImageDataLoaders,
    PILImage,
    Resize,
    URLs,
    error_rate,
    get_image_files,
    resnet34,
    untar_data,
    vision_learner,
)
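To make the patching concrete, here's a minimal sketch of my own (not part of the quickstart) that you could run in a fresh interpreter; the exact module that adds to_fp16 may differ between fastai versions, so treat it as illustrative.

# Run in a fresh interpreter: Learner only gains to_fp16 once a module that
# patches it (pulled in as a side effect of fastai.vision.all) has been imported.
from fastai.learner import Learner

print(hasattr(Learner, "to_fp16"))  # expected: False

import fastai.vision.all  # imported only for its patching side effects

print(hasattr(Learner, "to_fp16"))  # expected: True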
Setting Up
This downloads the Oxford-IIIT Pet Dataset. Despite the name, there are only cats and dogs in the dataset (37 breeds across the two species).
Function/Object | Description | Documentation Link |
---|---|---|
untar_data | Function to download fastai datasets/weights | External Data, function arguments |
URLs | Constants for datasets | A brief description |
By default this will download the data to ~/.fastai/data, but both untar_data and URLs (note the s at the end is lowercase) take an argument c_key that lets you change this; I don't know what the difference is between setting it on one or the other.
path = untar_data(URLs.PETS)/"images"
print(path)
/home/athena/.fastai/data/oxford-iiit-pet/images
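To see the naming convention that the labeling function below relies on, you can peek at a few of the downloaded filenames (a quick check of my own; the exact names shown will depend on sort order).

# Print a handful of image names to inspect the cat/dog naming convention.
for image_path in sorted(path.glob("*.jpg"))[:3]:
    print(image_path.name)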
The names of the files give the breed of the pet (either a cat or a dog breed), with dog filenames all in lower case (e.g. "yorkshire_terrier_9.jpg") and cat filenames starting with a capital letter (e.g. "Abyssinian_100.jpg"). So our function to categorize the training data will check whether the first letter is a capital letter, labeling the file True if it is and False if it isn't, using the following function.
def its_a_cat(filename: str) -> bool:
    """Decide if file is a picture of a cat

    Args:
        filename: name of file where first letter is capitalized if it's a cat

    Returns:
        True if first letter is capitalized (so it's a picture of a cat)
    """
    return filename[0].isupper()
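As a quick sanity check (not in the original quickstart), the function does what we want on the two example filenames from above.

# The capitalized name is a cat, the lower-case name is a dog.
print(its_a_cat("Abyssinian_100.jpg"))       # True, so a cat
print(its_a_cat("yorkshire_terrier_9.jpg"))  # False, so a dog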
This next bit creates a batch data loader for us.
Object | Description | Documentation |
---|---|---|
ImageDataLoaders | Data loader with functions for images. | ImageDataLoaders, from_name_func |
get_image_files | Recursively retrieve images from folders. | docstring |
Resize | Resize each image (if you pass in one size it uses it for all dimensions). | docstring |
loader = ImageDataLoaders.from_name_func(
    path,
    get_image_files(path),
    valid_pct=0.2,
    seed=42,
    label_func=its_a_cat,
    item_tfms=Resize(224)
)
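Before training it's worth a quick look at what the loader produced; this is an inspection of my own (the counts and images aren't from the original quickstart).

# Rough check of the 80/20 split.
print(f"training items: {len(loader.train_ds)}")
print(f"validation items: {len(loader.valid_ds)}")

# In a notebook this displays a grid of images labeled True (cat) or False (dog).
loader.show_batch(max_n=9)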
Now we create the model that learns to detect cats.
Object | Description | Documentation |
---|---|---|
vision_learner | Builds a model for transfer learning. | Arguments |
resnet34 | Residual Network model | torchvision documentation |
error_rate | 1 - accuracy (the fraction that was incorrect) | arguments |
to_fp16 | Use 16-bit (half-precision) floats | Mixed Precision Training Explained |
learner = vision_learner(
    loader, resnet34, metrics=error_rate)
cat_model = learner.to_fp16()
Pretty much all of this is inexplicable if you haven't used some kind of neural network library before, but that last call (to_fp16) seems especially mysterious. This first part is just about making sure things work, though, so I'll wait until the more detailed posts to dig into it, although fastai's article "Mixed Precision Training Explained" explains it pretty well.
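For the curious, as far as I can tell from reading the fastai source (this is version-dependent, so treat it as a sketch rather than the canonical definition), to_fp16 is just a shortcut for attaching the mixed-precision callback to the learner.

# Roughly what learner.to_fp16() does under the hood.
from fastai.vision.all import MixedPrecision

equivalent_model = learner.add_cb(MixedPrecision())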
Train It
We're using a pre-trained model, so we just have to do some transfer learning - freezing the weights of most of the layers and training the last layer to make the cat-or-not-a-cat classification.
For some reason fastai assumes that you'll only run it in a Jupyter notebook and dumps out a progress bar with no simple way to disable it permanently. As a workaround I'll use the no_bar context manager to turn off the progress bar temporarily.
Method | Description | Documentation |
---|---|---|
fine_tune | Does transfer learning (presumably) | None found, but here's the signatures for the freeze and unfreeze methods |
no_bar | Turn off the progress bar. | docstring |
with cat_model.no_bar():
    cat_model.fine_tune(1)
[0, 0.17085878551006317, 0.019044965505599976, 0.005412719678133726, '00:20'] [0, 0.05584857985377312, 0.01942548155784607, 0.0067658997140824795, '00:25']
Fastai really seems to want to force you to use their system the way they do - the output from fine_tune is printed to standard output rather than returned as some kind of object, so I can't re-format it to make it nicer looking here (using org-mode), though there's a possible workaround sketched below. For reference, the columns for the two rows of output are:
- epoch
- train_loss
- valid_loss
- error_rate
- time
Given these labels, the output of the last block shows that the error rate was about 0.005 after the frozen epoch and about 0.007 after the fine-tuning epoch, and the two epochs took about twenty and twenty-five seconds respectively.
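If I were re-running the training, one possible workaround (a sketch of my own, not something this run used) would be fastai's CSVLogger callback, which writes the per-epoch metrics to a file that can be re-formatted later.

# Log the per-epoch metrics to a CSV file instead of relying on the printed table.
from fastai.vision.all import CSVLogger

with cat_model.no_bar():
    # append=True so both the frozen and unfrozen phases end up in the file
    # (each phase writes its own header row).
    cat_model.fine_tune(1, cbs=CSVLogger(fname="history.csv", append=True))

# The file is written relative to the learner's path.
print((cat_model.path/"history.csv").read_text())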
Some Test Images
We're going to apply our model to some images of cats and a dog to see what it tells us about each image. Since it's the same process for each image, I'll create a function check_image to handle it.
Object | Description | Documentation |
---|---|---|
PILImage | Object to represent images. | docstring |
create | Load the image as PILImage | load_image, PILBase (follow source link to see definition of create) |
def check_image(path: str) -> None:
    """Loads the image and checks if it's a cat

    Args:
        path: string with path to the image
    """
    POSITIVE, NEGATIVE = " think", " don't think"
    image = PILImage.create(Path(path).expanduser())
    with cat_model.no_bar():
        # predict returns the decoded label (a string), its index, and the class probabilities
        ees_cat, _, probabilities = cat_model.predict(image)
    print(f"I{POSITIVE if ees_cat == 'True' else NEGATIVE} this is a cat.")
    print(f"The probability that it's a cat is {probabilities[1].item():.2f}")
    return
A Cat
Here's our first test image.
As you can see, it appears to be ridden with parasites, causing it to scratch uncontrollably (the toxoplasma isn't visible but assumed) - let's see how our classifier does at guessing that it's a cat.
check_image("~/test-cat.jpg")
I think this is a cat. The probability that it's a cat is 1.00
So, it's pretty sure that this is a cat.
A Negative Test Image
We could try any image, but for now, since the dataset used dogs and cats, let's see if it thinks a dog is a cat.
check_image("~/test-dog.jpg")
I don't think this is a cat. The probability that it's a cat is 0.00
It's sure that this isn't a cat.
A Strange Cat
I tried to find images of cats that looked like dogs or vice-versa, but it turns out that they're pretty different looking things, so let's just try an unusual looking cat.
check_image("~/elf-cat.jpg")
I think this is a cat. The probability that it's a cat is 1.00
The End
So there you go, not really exciting, which I suppose is sort of the point of fastai - it should be simple, almost boring, to do image classification. This is just a rehash of what they did, of course; a better check would be to try something different, but since this is the first take it'll have to do for now.
The top post for this series of quickstart posts is this one, and the next post will be on Image Segmentation.