Facial Keypoint Detection: Detect the relevant features of a face in one go using a CNN and your own dataset in Python

Garima Nishad
4 min read · Mar 23, 2019

Facial key-points are relevant for a variety of tasks, such as face filters, emotion recognition, pose recognition, and so on. So if you're working on projects like these, keep reading!

In this project, facial key-points (also called facial landmarks) are the small magenta dots shown on each of the faces in the image below. Each training and test image contains a single face with 68 key-points, each given as (x, y) coordinates. These key-points mark important areas of the face: the eyes, corners of the mouth, the nose, etc.

Magenta dots showing key-points

Dataset used:
We'll be using the YouTube Faces Dataset, which contains videos of people collected from YouTube. This facial key-points dataset consists of 5770 colour images, each of which belongs to either a training set or a test set.

  • 3462 of these images are training images, for you to use as you create a model to predict key-points.
  • 2308 are test images, which will be used to test the accuracy of your model.

Now the question arises: "The input images are never all the same size, so how can a neural network work on them?"

Neural networks usually expect standardized images: a fixed size, a normalized range for color values and coordinates, and (for PyTorch) conversion from numpy lists and arrays to Tensors. Therefore, we need to perform some pre-processing.
For this you can use the following transforms:

  1. Normalize: to convert a color image to grayscale values with a range of [0,1] and normalize the keypoints to be in a range of about [-1, 1]
  2. Rescale: to rescale an image to a desired size.
  3. RandomCrop: to crop an image randomly.
  4. ToTensor: to convert numpy images to torch images.
Transform
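
To make the four transforms concrete, here is a minimal numpy-only sketch, not the project's actual code. The class names follow the list above, but the sample layout (a dict with "image" and "keypoints"), the square-only Rescale, the keypoint statistics (mean 100, std 50) and the ToChannelsFirst helper standing in for ToTensor are all assumptions made for illustration. Note that Normalize is applied after Rescale and RandomCrop here, so the keypoints are still in pixel units while the image is being resized and cropped.

```python
import numpy as np


class Rescale:
    """Resize to size x size with nearest-neighbour sampling (simplified)."""

    def __init__(self, size):
        self.size = size

    def __call__(self, sample):
        image, kpts = sample["image"], sample["keypoints"]
        h, w = image.shape[:2]
        rows = np.arange(self.size) * h // self.size
        cols = np.arange(self.size) * w // self.size
        scale = np.array([self.size / w, self.size / h])  # (x, y) scale factors
        return {"image": image[rows][:, cols], "keypoints": kpts * scale}


class RandomCrop:
    """Cut out a random size x size window and shift the keypoints with it."""

    def __init__(self, size, rng=None):
        self.size = size
        self.rng = rng or np.random.default_rng()

    def __call__(self, sample):
        image, kpts = sample["image"], sample["keypoints"]
        h, w = image.shape[:2]
        top = self.rng.integers(0, h - self.size + 1)
        left = self.rng.integers(0, w - self.size + 1)
        cropped = image[top:top + self.size, left:left + self.size]
        return {"image": cropped, "keypoints": kpts - np.array([left, top])}


class Normalize:
    """Grayscale image in [0, 1]; keypoints scaled to roughly [-1, 1]."""

    def __init__(self, kp_mean=100.0, kp_std=50.0):
        # Assumed keypoint statistics in pixels; tune them for your data.
        self.kp_mean, self.kp_std = kp_mean, kp_std

    def __call__(self, sample):
        gray = sample["image"][..., :3].mean(axis=2) / 255.0  # channel average
        kpts = (sample["keypoints"] - self.kp_mean) / self.kp_std
        return {"image": gray.astype(np.float32),
                "keypoints": kpts.astype(np.float32)}


class ToChannelsFirst:
    """Stand-in for ToTensor: H x W -> 1 x H x W (torch.from_numpy can wrap it)."""

    def __call__(self, sample):
        return {"image": sample["image"][np.newaxis, ...],
                "keypoints": sample["keypoints"]}


def apply_all(sample, transforms):
    for transform in transforms:
        sample = transform(sample)
    return sample


# Synthetic sample: a 300 x 400 colour image with 68 random keypoints.
rng = np.random.default_rng(0)
sample = {"image": rng.integers(0, 256, (300, 400, 3)).astype(np.uint8),
          "keypoints": rng.uniform(0, 300, (68, 2))}
out = apply_all(sample, [Rescale(250), RandomCrop(224), Normalize(), ToChannelsFirst()])
print(out["image"].shape, out["keypoints"].shape)
```

Rescaling to 250 before cropping to 224 leaves a little room for the crop to move around, which acts as a mild form of data augmentation.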

Now let's define our own Convolutional Neural Network that can learn from this data!

Follow these steps:

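  • Define a CNN with images as input and keypoints as output:
    The input image size is 224×224 px (the size produced by the transform earlier), and the output has 136 values, i.e. 136/2 = 68 (x, y) pairs for our desired 68 keypoints. Since these are coordinates rather than class scores, this is a regression problem.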
CNN Architecture
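
As a rough illustration of such a network, here is one possible PyTorch architecture. It is a sketch, not the architecture used in the project: the number of layers, the channel counts and the kernel sizes are assumptions. Only the 1×224×224 grayscale input and the 136 outputs come from the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Spatial sizes assume a 224 x 224 input and 2 x 2 max pooling.
        self.conv1 = nn.Conv2d(1, 32, 5)    # 224 -> 220, pooled to 110
        self.conv2 = nn.Conv2d(32, 64, 3)   # 110 -> 108, pooled to 54
        self.conv3 = nn.Conv2d(64, 128, 3)  # 54 -> 52, pooled to 26
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(128 * 26 * 26, 256)
        self.fc2 = nn.Linear(256, 136)      # 68 keypoints * 2 coordinates

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.flatten(start_dim=1)          # keep the batch dimension
        x = F.relu(self.fc1(x))
        return self.fc2(x)                  # raw coordinates, no activation


net = Net()
out = net(torch.zeros(2, 1, 224, 224))      # batch of two blank images
print(out.shape)
```

The last layer has no activation because the outputs are normalized coordinates that can be negative.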

You can add regularization at your discretion, but if you still need a hand, here you go:

Dropout
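
If dropout is new to you, this small sketch shows what `nn.Dropout` does in training mode versus evaluation mode. The probability of 0.4 is an arbitrary choice for the demo, not a value taken from the project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.4)   # assumed probability, tune it for your model
x = torch.ones(1, 1000)

drop.train()
y_train = drop(x)          # some entries zeroed, survivors scaled by 1 / (1 - p)

drop.eval()
y_eval = drop(x)           # dropout is switched off at inference time

print((y_train == 0).float().mean().item())
print(torch.equal(y_eval, x))
```

This is also why you should call `model.eval()` before testing: otherwise the network keeps dropping activations and its predictions become noisy.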
  • Construct the transformed FaceKeypointsDataset, just as before
FaceKeypointsDataset
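
A dataset class along these lines might look as follows. This is a simplified sketch: the real project reads image files and keypoint annotations from disk, whereas here in-memory arrays stand in for them, and the `to_tensor` helper and the random data are purely illustrative.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class FaceKeypointsDataset(Dataset):
    """Pairs each image with its 68 x 2 keypoint array."""

    def __init__(self, images, keypoints, transform=None):
        self.images = images
        self.keypoints = keypoints
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        sample = {"image": self.images[idx], "keypoints": self.keypoints[idx]}
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


def to_tensor(sample):
    # Hypothetical helper: adds the channel axis and converts to float Tensors.
    image = torch.from_numpy(sample["image"]).float().unsqueeze(0)
    keypoints = torch.from_numpy(sample["keypoints"]).float()
    return {"image": image, "keypoints": keypoints}


# Synthetic stand-in data: 8 grayscale images with normalized keypoints.
rng = np.random.default_rng(0)
images = rng.random((8, 224, 224), dtype=np.float32)
keypoints = rng.uniform(-1, 1, (8, 68, 2))

dataset = FaceKeypointsDataset(images, keypoints, transform=to_tensor)
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch = next(iter(loader))
print(batch["image"].shape, batch["keypoints"].shape)
```

The `DataLoader` takes care of batching and shuffling, so the training code only ever sees ready-made batches.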
  • Train the CNN on the training data, tracking loss
Loss & Optimization
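
Because the network predicts coordinates, a regression loss is the natural fit. The sketch below pairs `nn.SmoothL1Loss` with the Adam optimizer and runs a single update step. Both choices, the learning rate, and the tiny linear stand-in model are assumptions for illustration, so feel free to swap in `nn.MSELoss` or another optimizer.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in model so the snippet runs on its own.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 136))

criterion = nn.SmoothL1Loss()                         # regression loss
optimizer = optim.Adam(model.parameters(), lr=1e-3)   # assumed learning rate

images = torch.rand(4, 1, 32, 32)                     # fake batch of images
targets = torch.rand(4, 68, 2) * 2 - 1                # fake keypoints in [-1, 1]

before = model[1].weight.detach().clone()

optimizer.zero_grad()                                 # clear old gradients
predictions = model(images).view(-1, 68, 2)           # 136 values -> 68 (x, y) pairs
loss = criterion(predictions, targets)
loss.backward()                                       # compute gradients
optimizer.step()                                      # update the weights

print(loss.item())
```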
  • See how the trained model performs on test data

To quickly see how your model is training and decide whether you should modify its structure or hyperparameters, start with just one or two epochs. As you train, note how your model's loss behaves over time: does it decrease quickly at first and then slow down?

Use these initial observations to make changes to your model and decide on the best architecture before you train for many epochs and create a final model.

Training
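
A training loop that tracks the loss per epoch could be sketched like this. The `train` function, the tiny stand-in model and the random data are assumptions made so the example runs on its own. With the real network and data loader, only the last few lines would change.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, criterion, optimizer, n_epochs=2):
    """Train for a few epochs and return the average loss of each epoch."""
    history = []
    model.train()                      # enable dropout etc. during training
    for epoch in range(n_epochs):
        running_loss = 0.0
        for images, targets in loader:
            optimizer.zero_grad()
            predictions = model(images).view(targets.shape)
            loss = criterion(predictions, targets)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        history.append(running_loss / len(loader))
        print(f"epoch {epoch + 1}: avg loss {history[-1]:.4f}")
    return history


torch.manual_seed(0)
# Synthetic data: 16 small images with 68 normalized keypoints each.
data = TensorDataset(torch.rand(16, 1, 32, 32), torch.rand(16, 68, 2) * 2 - 1)
loader = DataLoader(data, batch_size=4, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 136))  # stand-in model
history = train(model, loader, nn.SmoothL1Loss(),
                optim.Adam(model.parameters(), lr=1e-3))
```

Plotting `history` after a couple of epochs is usually enough to tell whether the loss is heading in the right direction.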

If necessary, modify the CNN structure and the model's hyper-parameters so that it performs well.
Once you've found a good model, don't forget to save it, so that you can load and use it later!
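
Saving and loading can be sketched as below, using PyTorch's `state_dict` mechanism. The file name, the temporary directory and the small stand-in model are placeholders. In practice you would save your trained network to a path of your choice.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in; use your own network class here.
    return nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 136))


model = make_model()
path = os.path.join(tempfile.mkdtemp(), "keypoints_model.pt")

# Save only the learned parameters, not the whole Python object.
torch.save(model.state_dict(), path)

# Later: rebuild the same architecture and load the parameters into it.
restored = make_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()   # switch to inference mode before making predictions
```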

After you’ve trained a neural network to detect facial keypoints, you can then apply this network to any image that includes faces.

  • Detect all the faces in an image using a face detector (I have used a Haar Cascade detector in this project).

Output:

Face Detection using Haar Cascade
  • Pre-process those face images so that they are gray-scale, and transformed to a Tensor of the input size that your net expects. This step will be similar to the earlier pre-processing.
Grayscale Image
  • Use your trained model to detect facial keypoints on the image.
Detected Keypoints
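
The last two steps can be sketched together like this: crop the detected face, convert it to a grayscale Tensor of the expected size, run the model, and map the output back to pixel coordinates. The `predict_keypoints` helper, the stand-in model and the un-normalization constants (mean 100, std 50) are assumptions. Use whatever values you normalized the keypoints with during training.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


def predict_keypoints(model, image_rgb, box, size=224, kp_mean=100.0, kp_std=50.0):
    """Return the network input and 68 (x, y) keypoints in crop pixel units."""
    x, y, w, h = box
    roi = image_rgb[y:y + h, x:x + w].astype(np.float32)
    gray = roi.mean(axis=2) / 255.0                    # grayscale in [0, 1]
    tensor = torch.from_numpy(gray)[None, None]        # shape 1 x 1 x h x w
    tensor = F.interpolate(tensor, size=(size, size),
                           mode="bilinear", align_corners=False)
    model.eval()                                       # disable dropout
    with torch.no_grad():                              # no gradients needed
        output = model(tensor)
    keypoints = output.view(68, 2).numpy() * kp_std + kp_mean  # undo normalization
    return tensor, keypoints


torch.manual_seed(0)
# Stand-in for the trained network and for a real photo with a detected face.
model = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 136))
image = np.random.default_rng(0).integers(0, 256, (300, 300, 3)).astype(np.uint8)
face_input, keypoints = predict_keypoints(model, image, box=(50, 60, 120, 120))
print(face_input.shape, keypoints.shape)
```

The returned keypoints are relative to the 224×224 crop, so scale and shift them by the face box if you want to draw them on the original image.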

Wanna check out how to pull this code off in detail?
Check this project out on my GitHub: Facial Keypoint Detection

Garima Nishad

A Machine Learning Research scholar who loves to moonlight as a blogger.