A beginner’s guide to Fastai’s Image Dataloaders
Table Of Contents
· Importing all the necessary files and libraries
· Loading all the images using Fastai Dataloaders
∘ Image Dataloaders From Folder
∘ Image Dataloaders From Name Function
∘ Image Dataloader From Df
· Future Work
· Reference
∘ Connect with me on LinkedIn
I started using PyTorch and Fastai recently. Below, I outline three ways to load images with Fastai’s ImageDataLoaders. Loading images is a step you will need in image processing and in almost any computer vision problem.
Importing all the necessary files and libraries
The first step is to import the necessary libraries and download the data. For simplicity, I’m using the MNIST dataset. Fastai lets us download the whole dataset in just a few lines of code.
The code above imports all the libraries we need for this task. Its last line downloads the full MNIST dataset and extracts it into a local directory.
Loading all the images using Fastai Dataloaders
After downloading the images, the next step is to load them, and we’ll use a Fastai dataloader for that. The download gives us two folders, one for training and one for testing. Each of them contains one sub folder per digit (0 to 9).
Image Dataloaders From Folder
The first dataloader we’ll use is ImageDataLoaders.from_folder, because our images are already split into two separate folders (training and testing). By default, from_folder looks for folders named train and valid. The actual folder names can be passed through its train and valid arguments (train='training', valid='testing').
The code above loads all the images into a Fastai dataloader. We don’t need to label anything ourselves. The images of each class sit in their own sub folder, so the dataloader uses the name of that sub folder as the class label.
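The following minimal sketch makes the labeling rule concrete. It uses only the standard library and is not Fastai’s own code. Each image’s label is the name of its parent folder, and its split (training or testing) is the name of the folder above that. Fastai does something similar internally with its parent_label and GrandparentSplitter helpers. However, the function label_and_split and the tiny fake directory tree below are hypothetical illustrations.

```python
import tempfile
from pathlib import Path


def label_and_split(root, train="training", valid="testing"):
    """Hypothetical helper: label each image by its parent folder name and
    assign it to a split by its grandparent folder name."""
    train_items, valid_items = [], []
    for f in sorted(Path(root).rglob("*.png")):
        label = f.parent.name         # .../training/7/b.png -> "7"
        split = f.parent.parent.name  # .../training/7/b.png -> "training"
        if split == train:
            train_items.append((f.name, label))
        elif split == valid:
            valid_items.append((f.name, label))
    return train_items, valid_items


# Build a tiny fake MNIST-style tree: <root>/<split>/<digit>/<file>.png
with tempfile.TemporaryDirectory() as tmp:
    for split, digit, name in [("training", "0", "a.png"),
                               ("training", "7", "b.png"),
                               ("testing", "3", "c.png")]:
        folder = Path(tmp) / split / digit
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).touch()
    train_items, valid_items = label_and_split(tmp)

print(train_items)  # [('a.png', '0'), ('b.png', '7')]
print(valid_items)  # [('c.png', '3')]
```

Because the label comes entirely from the folder structure, this approach only works when every image already sits in a folder named after its class.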
As we can see above, Fastai took care of labeling the images for us.
Sometimes, however, the images are not sorted into per-class sub folders, and all of them sit in one big training folder instead. In that case this dataloader won’t work, because there is no folder name to use as the label.
Image Dataloaders From Name Function
To demonstrate this dataloader, we’ll use a cats and dogs dataset, the Oxford-IIIT Pet dataset. Again, we can download it to our directory very easily.
In the first line, we download the pets dataset. In this dataset, the file names of cat images start with an uppercase letter and those of dog images start with a lowercase letter. So we create an is_cat function that returns True for every cat image and False for every dog image. Finally, we call ImageDataLoaders.from_name_func and pass it the path of the images, the list of image file names, and label_func=is_cat. This lets Fastai label our images for training a convolutional neural network.
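The labeling rule itself is plain Python. Below is a small sketch of is_cat applied to a hypothetical handful of file paths. As far as I understand, from_name_func calls label_func on each file’s name rather than on the full path, which is why the sketch passes f.name.

```python
from pathlib import Path


def is_cat(name: str) -> bool:
    # In the pets dataset, cat breed file names start with an uppercase letter
    # (e.g. "Abyssinian_1.jpg") and dog breed file names with a lowercase one.
    return name[0].isupper()


# Hypothetical sample of image paths; only the file name is used for labeling.
files = [Path("images/Abyssinian_1.jpg"), Path("images/american_bulldog_10.jpg"),
         Path("images/Persian_3.jpg"), Path("images/pug_52.jpg")]
labels = {f.name: is_cat(f.name) for f in files}
print(labels)
```

Any function that maps a file name to a label works the same way. For example, a function that splits the name at the last underscore would return the breed instead of a cat/dog flag.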
Sometimes we’re given the paths of the images and their labels in a CSV file. Neither of the two dataloaders above handles that case, but Fastai has a separate dataloader for it.
Image Dataloader From Df
For this dataloader, we’ll use the MNIST dataset again, but in a different way. To show how it works, I’ll first create a dataframe with two columns, path and label.
The few lines of code above create a dataframe containing the path and label of each image. We iterate through every sub folder, save the paths and labels in lists, convert them into a dataframe, and concatenate it with the original dataframe. The final dataframe has one row per image, with the image path in the path column and its digit in the label column.
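The following minimal sketch shows this dataframe-building step. It assumes pandas is installed and a folder layout like MNIST’s training folder (one sub folder per digit). The helper build_label_df and the fake directory tree are hypothetical. The sketch builds one small frame per sub folder and concatenates them all at once with pd.concat.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_label_df(root):
    """Hypothetical helper: one row per image, with the image path (relative
    to root) in the "path" column and its sub folder name in "label"."""
    root = Path(root)
    parts = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        paths = [f.relative_to(root).as_posix() for f in sorted(sub.glob("*.png"))]
        parts.append(pd.DataFrame({"path": paths, "label": [sub.name] * len(paths)}))
    if not parts:
        return pd.DataFrame(columns=["path", "label"])
    # Concatenate the per-folder frames once, renumbering the index 0..n-1.
    return pd.concat(parts, ignore_index=True)


# Tiny fake tree mimicking MNIST's training folder: <root>/<digit>/<file>.png
with tempfile.TemporaryDirectory() as tmp:
    for digit, name in [("0", "a.png"), ("0", "b.png"), ("5", "c.png")]:
        (Path(tmp) / digit).mkdir(exist_ok=True)
        (Path(tmp) / digit / name).touch()
    df = build_label_df(tmp)

print(df)
```

The paths are stored relative to root. As far as I know, you would then pass the dataframe to ImageDataLoaders.from_df together with path=root, and Fastai prepends that path to each entry of the path column. Concatenating once at the end, rather than inside the loop, also avoids copying the growing dataframe on every iteration.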
Now we just pass this dataframe to the dataloader. Fastai takes care of splitting it into training and validation sets. If we ask for it through the item_tfms and batch_tfms arguments, it also resizes and augments the images.
The first line calls ImageDataLoaders.from_df with the dataframe we created. By default (valid_pct=0.2), Fastai randomly holds out 20% of the rows as the validation set. The second line displays 5 images from the validation set.
Now we can easily train or fine-tune a CNN such as ResNet or VGG on this data (which I’ll leave for readers to implement themselves).
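The 20% hold-out can be sketched in a few lines of standard-library Python. This follows the idea behind Fastai’s RandomSplitter: shuffle the indices, then cut off the first valid_pct of them for validation. However, random_split is a hypothetical helper written for illustration, not Fastai’s code.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=None):
    """Hypothetical helper: shuffle item indices and hold out valid_pct of
    them for validation. Returns (train indices, valid indices)."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)  # seeded RNG -> reproducible shuffle
    cut = int(valid_pct * n_items)
    return sorted(idxs[cut:]), sorted(idxs[:cut])


train_idx, valid_idx = random_split(100, valid_pct=0.2, seed=42)
print(len(train_idx), len(valid_idx))  # 80 20
```

Fixing the seed makes the split reproducible, which is also why Fastai’s dataloader factory methods accept a seed argument.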
Future Work
I’ll cover more tutorials related to Fastai and PyTorch in the future.