FastAI Cats and Dogs
What Is This?
This is a run-through of the fastai Computer Vision Quickstart that shows how to build an image classification model from a public dataset hosted on fastai's site. It is similar to the post on classifying rabbits and pigs, except that in that post we created our own dataset by searching DuckDuckGo for images.
Importing
# python standard library
from pathlib import Path
As noted on Stack Overflow, FastAI does a lot of monkey patching, so if you just import something from the module where it's defined (to make it clearer where things are coming from) it might not have the methods or attributes you expect. In this case, for instance, the vision_learner function is defined in fastai.vision.learner, but if you try to import it from there the object you get back won't have the to_fp16 method that we're going to use, so you have to import it from fastai.vision.all instead. Since there's no good way to avoid using all, I'll import objects from there, but I'll also try to point to the original modules where things are defined to make it easier to look things up.
Module | Import |
---|---|
fastai.data.external | untar_data, URLs |
fastai.data.transforms | get_image_files |
fastai.metrics | error_rate |
fastai.vision.augment | Resize |
fastai.vision.core | PILImage |
fastai.vision.data | ImageDataLoaders |
fastai.vision.learner | vision_learner |
torchvision.models.resnet | resnet34 |
from fastai.vision.all import (
    ImageDataLoaders,
    PILImage,
    Resize,
    URLs,
    error_rate,
    get_image_files,
    resnet34,
    untar_data,
    vision_learner,
)
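To make the patching concrete, here's a minimal sketch of my own (not part of the quickstart) that you could run in a fresh interpreter; the exact module that adds to_fp16 may differ between fastai versions, so treat it as illustrative.

# Run in a fresh interpreter: Learner only gains to_fp16 once a module that
# patches it (pulled in as a side effect of fastai.vision.all) has been imported.
from fastai.learner import Learner

print(hasattr(Learner, "to_fp16"))  # expected: False

import fastai.vision.all  # imported only for its patching side effects

print(hasattr(Learner, "to_fp16"))  # expected: True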
Setting Up
This downloads the Oxford-IIIT Pet Dataset. Despite the name, there are only cats and dogs in the dataset (37 breeds across the two species).
Function/Object | Description | Documentation Link |
---|---|---|
untar_data | Function to download fastai datasets/weights | External Data, function arguments |
URLs | Constants for datasets | A brief description |
By default this will download the data to ~/.fastai/data, but both untar_data and URLs (note the s at the end is lowercase) take an argument c_key that lets you change this; I don't know what the difference is between setting it on one or the other.
path = untar_data(URLs.PETS)/"images"
print(path)
/home/athena/.fastai/data/oxford-iiit-pet/images
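To see the naming convention that the labeling function below relies on, you can peek at a few of the downloaded filenames (a quick check of my own; the exact names shown will depend on sort order).

# Print a handful of image names to inspect the cat/dog naming convention.
for image_path in sorted(path.glob("*.jpg"))[:3]:
    print(image_path.name)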
The names of the files give the breed of the pet (either a cat or a dog breed), with dog filenames all in lower case (e.g. "yorkshire_terrier_9.jpg") and cat filenames starting with a capital letter (e.g. "Abyssinian_100.jpg"). So our function to categorize the training data will check whether the first letter is a capital letter, labeling the file True if it is and False if it isn't, using the following function.
def its_a_cat(filename: str) -> bool:
    """Decide if file is a picture of a cat

    Args:
        filename: name of file where first letter is capitalized if it's a cat

    Returns:
        True if first letter is capitalized (so it's a picture of a cat)
    """
    return filename[0].isupper()
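As a quick sanity check (not in the original quickstart), the function does what we want on the two example filenames from above.

# The capitalized name is a cat, the lower-case name is a dog.
print(its_a_cat("Abyssinian_100.jpg"))       # True, so a cat
print(its_a_cat("yorkshire_terrier_9.jpg"))  # False, so a dog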
This next bit creates a batch data loader for us.
Object | Description | Documentation |
---|---|---|
ImageDataLoaders | Data loader with functions for images. | ImageDataLoaders, from_name_func |
get_image_files | Recursively retrieve images from folders. | docstring |
Resize | Resize each image (if you pass in one size it uses it for all dimensions). | docstring |
loader = ImageDataLoaders.from_name_func(
    path,
    get_image_files(path),
    valid_pct=0.2,
    seed=42,
    label_func=its_a_cat,
    item_tfms=Resize(224)
)
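Before training it's worth a quick look at what the loader produced; this is an inspection of my own (the counts and images aren't from the original quickstart).

# Rough check of the 80/20 split.
print(f"training items: {len(loader.train_ds)}")
print(f"validation items: {len(loader.valid_ds)}")

# In a notebook this displays a grid of images labeled True (cat) or False (dog).
loader.show_batch(max_n=9)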
Now we create the model that learns to detect cats.
Object | Description | Documentation |
---|---|---|
vision_learner | Builds a model for transfer learning. | Arguments |
resnet34 | Residual Network model | torchvision documentation |
error_rate | 1 - accuracy (the fraction that was incorrect) | arguments |
to_fp16 | Use 16-bit (half-precision) floats | Mixed Precision Training Explained |
learner = vision_learner(
    loader, resnet34, metrics=error_rate)
cat_model = learner.to_fp16()
Pretty much all of this is inexplicable if you haven't used some kind of neural network library before, but that last call (to_fp16) seems especially mysterious. This first part is just about making sure things work, though, so I'll wait until the more detailed posts to dig into it, although fastai's article "Mixed Precision Training Explained" explains it pretty well.
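For the curious, as far as I can tell from reading the fastai source (this is version-dependent, so treat it as a sketch rather than the canonical definition), to_fp16 is just a shortcut for attaching the mixed-precision callback to the learner.

# Roughly what learner.to_fp16() does under the hood.
from fastai.vision.all import MixedPrecision

equivalent_model = learner.add_cb(MixedPrecision())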
Train It
We're using a pre-trained model, so we just have to do some transfer learning - freezing the weights of most of the layers and training the last layer to make the cat-or-not-a-cat classification.
For some reason fastai assumes that you'll only run it in a Jupyter notebook and dumps out a progress bar with no simple way to disable it permanently. As a workaround I'll use the no_bar context manager to turn off the progress bar temporarily.
Method | Description | Documentation |
---|---|---|
fine_tune | Does transfer learning (presumably) | None found, but here's the signatures for the freeze and unfreeze methods |
no_bar | Turn off the progress bar. | docstring |
with cat_model.no_bar():
    cat_model.fine_tune(1)
[0, 0.17085878551006317, 0.019044965505599976, 0.005412719678133726, '00:20'] [0, 0.05584857985377312, 0.01942548155784607, 0.0067658997140824795, '00:25']
Fastai really seems to want to force you to use their system the way they do - the output from fine_tune is printed to standard output rather than returned as some kind of object, so I can't re-format it to make it nicer looking here (using org-mode), though there's a possible workaround sketched below. For reference, the columns for the two rows of output are:
- epoch
- train_loss
- valid_loss
- error_rate
- time
Given these labels, the output of the last block shows that the error rate was about 0.005 after the frozen epoch and about 0.007 after the fine-tuning epoch, and the two epochs took about twenty and twenty-five seconds respectively.
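If I were re-running the training, one possible workaround (a sketch of my own, not something this run used) would be fastai's CSVLogger callback, which writes the per-epoch metrics to a file that can be re-formatted later.

# Log the per-epoch metrics to a CSV file instead of relying on the printed table.
from fastai.vision.all import CSVLogger

with cat_model.no_bar():
    # append=True so both the frozen and unfrozen phases end up in the file
    # (each phase writes its own header row).
    cat_model.fine_tune(1, cbs=CSVLogger(fname="history.csv", append=True))

# The file is written relative to the learner's path.
print((cat_model.path/"history.csv").read_text())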
Some Test Images
We're going to apply our model to some images of cats and a dog to see what it tells us about each image. Since it's the same process for each image, I'll create a function check_image to handle it.
Object | Description | Documentation |
---|---|---|
PILImage | Object to represent images. | docstring |
create | Load the image as PILImage | load_image, PILBase (follow source link to see definition of create) |
def check_image(path: str) -> None:
    """Loads the image and checks if it's a cat

    Args:
        path: string with path to the image
    """
    POSITIVE, NEGATIVE = " think", " don't think"
    image = PILImage.create(Path(path).expanduser())
    with cat_model.no_bar():
        # predict returns the decoded label (a string), its index, and the class probabilities
        ees_cat, _, probabilities = cat_model.predict(image)
    print(f"I{POSITIVE if ees_cat == 'True' else NEGATIVE} this is a cat.")
    print(f"The probability that it's a cat is {probabilities[1].item():.2f}")
    return
A Cat
Here's our first test image.
As you can see, it appears to be ridden with parasites, causing it to scratch uncontrollably (the toxoplasma isn't visible but assumed) - let's see how our classifier does at guessing that it's a cat.
check_image("~/test-cat.jpg")
I think this is a cat. The probability that it's a cat is 1.00
So, it's pretty sure that this is a cat.
A Negative Test Image
We could try any image, but for now, since the dataset used dogs and cats, let's see if it thinks a dog is a cat.
check_image("~/test-dog.jpg")
I don't think this is a cat. The probability that it's a cat is 0.00
It's sure that this isn't a cat.
A Strange Cat
I tried to find images of cats that looked like dogs or vice-versa, but it turns out that they're pretty different looking things, so let's just try an unusual looking cat.
check_image("~/elf-cat.jpg")
I think this is a cat. The probability that it's a cat is 1.00
The End
So there you go, not really exciting, which I suppose is sort of the point of fastai - it should be simple, almost boring, to do image classification. This is just a rehash of what they did, of course; a better check would be to try something different, but since this is the first take it'll have to do for now.
The top post for this series of quickstart posts is this one, and the next post will be on Image Segmentation.