Building an Object Detection Model with FastAI and IceVision

Sadiva Madaan
6 min read · Feb 25, 2022


· Installation
· Data Overview
· Dataset Preparation
· Image Transformations
· Training using FastAI
· Inference on a batch of images
· Inference on a Single Image
· Future Work
· Reference

I started using IceVision recently as I needed to create an object detection model for the latest Computer Vision Kaggle competition. In this blog, I will explain a few important concepts that I found particularly useful as I was getting my feet wet with the IceVision framework.

Installation

Installing IceVision is fairly easy. We can install it by running the following command in the terminal — pip install icevision[all]. For more detailed information about the installation process, you can check out the IceVision documentation, which is pretty straightforward.

Data Overview

We will be using the Humpback Whale Fluke Keypoints data, which has 1000 hand-annotated whale images. From these keypoints, we can easily get the bounding box for a whale. The following code shows how to visualise the keypoints for a particular image.

function to visualise keypoints
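Deriving a bounding box from a set of keypoints boils down to a min/max computation over their coordinates. A minimal sketch (the `pad` parameter is a hypothetical margin, not part of the dataset):

```python
def keypoints_to_bbox(keypoints, pad=0):
    """Compute an axis-aligned bounding box (xmin, ymin, xmax, ymax)
    from a list of (x, y) keypoints, with optional padding."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)

# Example: three fluke keypoints
print(keypoints_to_bbox([(10, 40), (55, 12), (90, 38)], pad=5))
# → (5, 7, 95, 45)
```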

Dataset Preparation

We will be using IceVision’s custom parser to prepare our datasets and dataloaders. To get a better understanding of how a custom parser works, we can use the following command to inspect the parser template.

IceVision Custom Parser

We will create a custom parser class which will inherit from IceVision’s Parser class. Following is the code for it —

We created the WhaleParser class, passing it the template record (i.e. the object detection record) and the data directory where all our images are present. The WhaleParser class adds bounding boxes and labels to every image. These details are supplied through a separate data frame that holds the label and bounding box information for every image.

The following image shows a single record. The whale is covered by a bounding box and is labeled as Whale.

Image Transformations

IceVision lays the foundation to easily integrate different augmentation libraries by using adapters. It implements an adapter for the popular Albumentations library. Following are the transformations we’ll apply to our images after we label them and get their bounding boxes.

  1. Image Presizing — It is an image augmentation technique that minimises data destruction while maintaining good performance. It adopts two strategies: first, resize images to dimensions significantly larger than the target training dimensions. Second, compose all of the common augmentation operations (including a resize to the final target size) into one, and perform the combined operation on the GPU only once at the end of processing.
  2. Normalization — It is important to normalize when we are using pre-trained models (in this case we will be using pre-trained YOLOv5). A pre-trained model works best on data distributed like the data it has already seen. If, for example, 0 is the minimum value in our dataset but was the average value in the pre-training data, then the two distributions will be very different.

The following lines apply these transformations to our train and validation datasets.

We can also have a look at some records after applying these transformations.

We have prepared our final dataset. I chose YOLOv5 for this particular task, but there are tons of other models available in the IceVision library that you can explore and experiment with.

Training using FastAI

One of the major reasons I like IceVision is that it allows you to train deep neural networks with easy-to-use, robust, high-performance libraries such as FastAI. To train the model using FastAI, we will first find a good learning rate using the LR finder.

A brief explanation of what the LR finder does in FastAI — it starts with a very small learning rate, uses it for one mini-batch, measures the loss afterwards, and then increases the learning rate by a certain factor (doubling every time). It keeps doing this until the loss gets worse; this is the point where we know we have gone too far. We then select a learning rate one order of magnitude less than where the minimum loss was achieved (the minimum divided by 10). After training for more than 30 epochs, we get a COCOMetric score around 0.84, and the following plot shows training and validation loss against the number of iterations.
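The training setup can be sketched as follows, assuming the datasets and parser from the previous sections; the backbone choice, batch size, and learning rate are illustrative:

```python
from icevision.all import *

model_type = models.ultralytics.yolov5
backbone = model_type.backbones.small(pretrained=True)
model = model_type.model(backbone=backbone,
                         num_classes=len(parser.class_map),
                         img_size=384)

train_dl = model_type.train_dl(train_ds, batch_size=16, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=16, num_workers=4, shuffle=False)

learn = model_type.fastai.learner(
    dls=[train_dl, valid_dl],
    model=model,
    metrics=[COCOMetric(metric_type=COCOMetricType.bbox)],
)

learn.lr_find()                              # plot loss vs. learning rate, pick ~min/10
learn.fine_tune(30, 1e-3, freeze_epochs=1)   # train for 30 epochs
```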

As we can see, the loss continues to decrease for the validation set.

Inference on a batch of images

It is very easy to get predictions on a batch of images. Below are a few predictions. We can see that our model has been pretty spot on most of the time in detecting whales and their bounding boxes.
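A minimal sketch of batch inference, assuming the trained `model` and the `valid_ds` dataset produced in the training section:

```python
from icevision.all import *

# Build an inference dataloader over the validation dataset and run
# the model over it, keeping the images so we can visualise results.
infer_dl = model_type.infer_dl(valid_ds, batch_size=8, shuffle=False)
preds = model_type.predict_from_dl(model=model, infer_dl=infer_dl, keep_images=True)

# Show a few predictions with their bounding boxes drawn on the images
show_preds(preds=preds[:4])
```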

Inference on a Single Image

IceVision makes it very easy to get a prediction on a single image. Not only does it compute the prediction, it also automatically adjusts the predicted boxes to the original image size. The output for a single-image prediction is a dictionary, which even lets us access the bounding box coordinates directly. IMO, this is super useful.

Bounding box coordinates predicted by our model for a single image
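Single-image inference can be sketched with IceVision's `end2end_detect`, assuming the trained `model`, the `valid_tfms` transforms, and the `parser` from earlier sections; the image path is a placeholder:

```python
from icevision.all import *
from PIL import Image

img = Image.open("sample_whale.jpg")

# end2end_detect applies the validation transforms, runs the model, and
# rescales the predicted boxes back to the original image size.
pred_dict = model_type.end2end_detect(
    img, valid_tfms, model,
    class_map=parser.class_map,
    detection_threshold=0.5,
)

# Each predicted BBox exposes its coordinates directly
for bbox in pred_dict["detection"]["bboxes"]:
    print(bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax)
```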

Future Work

I will be posting more tutorials and blogs related to Computer Vision and Machine Learning. I will also be exploring IceVision further for tasks related to Image Segmentation.

Reference

  1. https://airctic.com/
  2. https://www.kaggle.com/c/happy-whale-and-dolphin/
  3. https://www.kaggle.com/oewyn000/humpback-whale-fluke-keypoints
  4. https://www.kaggle.com/jprusso/whales-bounding-box

You can check out all the details and code from my github repo -

Connect with me on Linkedin —


Sadiva Madaan

I write about machine learning. (twitter — @sadiva_madaan)