POWER READ


Facial Analysis: Separating Facts From Hollywood

Aug 15, 2019 | 17m

Gain Actionable Insights Into:

  • How FA differs vastly from what is portrayed in the movies, or even in the news
  • The scope and limitations of FA systems, so that you can build an effective use case
  • The differences, myths and controversies in facial detection, recognition, and tracking

01

Getting Acquainted With Facial Analysis

We have all seen it in the movies: a camera tens of meters away that can zoom almost infinitely, down to the minute details, allowing operators to locate and track thousands of faces. They can recognise a particular person among thousands of others, and once they do, they can analyse that person’s face to determine gender, age, ethnicity, emotions, micro-expressions, personality, whether they’re lying, where they’re looking, or even what they’re about to do next.

But is this how it really works? What is technologically possible today and what is not? What’s Hollywood fiction and what’s reality? And what should we expect in the near future? This is what Facial Analysis, an umbrella term that comes under the general research field of Computer Vision (CV), is all about.

Computer Vision

CV is an interdisciplinary scientific and engineering field that focuses on developing techniques to help computers “see” and understand the content of images and videos. CV aims to mimic functions of human vision (not necessarily by copying it) and to solve the many visual subtasks that all of us perform effortlessly in our everyday lives, such as locating objects, reading text, navigating a room, recognising and reading faces, and so on.

CV requires an image-capturing device, such as a camera, and a processing unit, such as a PC’s processor, which analyses the image data using complex algorithms and extracts useful information from it. While cameras provide the majority of inputs in CV, any sensor that can produce images can be part of a CV system. Examples include radiographic sensors used to capture X-rays or inspect production lines, LIDARs, and Time of Flight sensors. Whatever image-capturing device you use, the pipeline is usually the same.

CV is also considered a part (or even a subset) of AI. In fact, CV has been one of the major drivers of AI advancements in the last few years, especially with the widespread adoption of Deep Learning (DL). DL is a subset of Machine Learning, itself a subset of AI, and it has revolutionised many aspects of CV such as object recognition. The breakthrough came in 2012, when a DL model called “AlexNet” won the annual ImageNet object recognition competition by a significant margin over the best models of previous years. AlexNet learned to recognise a thousand object categories from more than a million training images, with an error rate of 15.3% (the previous best was ~26.1%!). In sports terms, this is equivalent to smashing Usain Bolt’s world record in the 100-meter race from 9.58 seconds down to a superhuman 5.62 seconds! Such was the scientific impact that DL has dominated many aspects of CV ever since, Facial Analysis among them.
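The sprint comparison comes from scaling the world record by the same ratio as the two error rates. A quick check of that arithmetic, using only the figures quoted above (the variable names are purely illustrative):

```python
# Figures quoted in the text above.
alexnet_error = 15.3    # AlexNet error rate, in percent
previous_error = 26.1   # approximate previous best, in percent
bolt_record_s = 9.58    # 100 m world record, in seconds

# The new error expressed as a fraction of the old one.
ratio = alexnet_error / previous_error

# Shrink the sprint record by the same fraction.
equivalent_time_s = bolt_record_s * ratio

print(f"error ratio: {ratio:.3f}")
print(f"equivalent 100 m time: {equivalent_time_s:.2f} s")
```

The result rounds to the 5.62 seconds mentioned in the analogy.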

Facial Analysis (FA) is a series of computational tasks that extracts useful information from images or videos of faces, and is part of the CV field. Most people use the term “Face Recognition” to describe any technologies related to FA. In actuality, Face Recognition is just one part of this series of tasks that FA encompasses.

Face Detection

In order to analyse a face, first you need to locate it. This is called Face Detection. The objective here is to locate any areas of pixels in a photo or video that correspond to faces. Face Detection is not about the identity of a person (i.e. whether this face belongs to Bob, Susan or a wanted terrorist) but just whether this area of pixels is a face or not. The output of Face Detection is usually a “bounding box”, a square or rectangle that marks the pixel area in which a face exists.

By itself, Face Detection can give a lot of insightful information, depending on the application you’re targeting. For example, if you want to measure how many people visit your store every hour, or count how many people are standing outside your shop window, face detection can help you arrive at intelligent estimates. Face Detection is a mature technology, provided its operating specifications are met (see later section on this).
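To make the idea of a bounding box concrete, here is a minimal sketch of what a detector’s output might look like and how it could feed a simple head count. The `BoundingBox` class, its `score` field and the 0.5 cut-off are assumptions made for illustration; real detectors define their own output formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Rectangle marking a detected face, in pixel coordinates (hypothetical format)."""
    x: int        # left edge
    y: int        # top edge
    width: int
    height: int
    score: float  # detector confidence in [0, 1] (assumed field)


def count_faces(detections: list[BoundingBox], min_score: float = 0.5) -> int:
    """Count the detections that are confident enough to be treated as faces."""
    return sum(1 for box in detections if box.score >= min_score)


# Made-up detector output for a single frame.
frame_detections = [
    BoundingBox(40, 60, 80, 80, score=0.97),
    BoundingBox(300, 120, 64, 70, score=0.88),
    BoundingBox(500, 30, 20, 22, score=0.31),  # low confidence, ignored
]
print(count_faces(frame_detections))
```

Note that nothing in this output says who the faces belong to; it only says where they are.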

Face Tracking

Face Tracking applies only to videos: it follows a face found by the earlier face detection step across successive video frames. You can track multiple detected faces at the same time. Tracking doesn’t tell you the identity of the person, just that this is the same face that was detected several moments before. Face Tracking is useful when there are multiple faces in a space and you don’t want to re-detect (and perhaps re-count) the same face. It is particularly useful if you’re counting people in attendance.
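One simple way to decide that a face in the current frame is “the same face” as one in the previous frame is to check how much their bounding boxes overlap. The sketch below uses intersection-over-union (IoU) for that purpose; this is one common technique chosen for illustration, not necessarily what any particular product does, and the function names and the 0.3 threshold are assumptions.

```python
Box = tuple[int, int, int, int]  # (x, y, width, height) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes: 0 = no overlap, 1 = identical."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def update_tracks(tracks: dict[int, Box], detections: list[Box],
                  next_id: int, min_iou: float = 0.3) -> int:
    """Attach each detection to the best-overlapping existing track, or start
    a new track if nothing overlaps enough. Returns the next unused track ID."""
    unmatched = set(tracks)  # tracks not yet claimed in this frame
    for box in detections:
        best = max(unmatched, key=lambda t: iou(tracks[t], box), default=None)
        if best is not None and iou(tracks[best], box) >= min_iou:
            tracks[best] = box           # same face, new position
            unmatched.discard(best)
        else:
            tracks[next_id] = box        # a face we have not seen before
            next_id += 1
    return next_id


tracks: dict[int, Box] = {}
next_id = 0
frame1 = [(100, 100, 50, 50), (300, 100, 50, 50)]
frame2 = [(104, 102, 50, 50), (298, 101, 50, 50)]  # same two faces, slightly moved
for frame in (frame1, frame2):
    next_id = update_tracks(tracks, frame, next_id)
print(len(tracks))  # number of distinct faces counted so far
```

Four detections across two frames are counted as two faces, which is exactly the double-counting problem tracking is meant to avoid. A fuller tracker would also retire tracks that have not been matched for a while.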

Face Recognition

Once you have localised a face using face detection, you may need to know the identity of the person. Is this Bob, Susan or a wanted terrorist? For face recognition to work, you’ll need a database of known faces against which the newly detected, unknown face can be compared. Face recognition is commonly used for automated entrance control at office premises. You have a database of the faces of all employees, and once a new face is detected at the entrance of the office, it is matched against the existing faces of the authorised employees.

Face Verification

Face Verification attempts to answer the question: “given two faces, do they belong to the same person?” There is no database of known faces involved. That is, while you may conclude that the two faces belong to the same person, you still don’t know who this person is. Have you ever unlocked your mobile phone with your face? That’s face verification in action.
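In contrast to recognition, verification is a single one-to-one comparison. A minimal sketch, again assuming that faces have already been turned into feature vectors by some upstream model; the vectors and the 0.6 distance threshold are invented for illustration.

```python
import math


def same_person(face_a: list[float], face_b: list[float],
                max_distance: float = 0.6) -> bool:
    """One-to-one check: True if the two feature vectors are close enough."""
    return math.dist(face_a, face_b) <= max_distance


enrolled = [0.12, 0.85, 0.40]  # stored when the phone was set up (made-up values)
attempt = [0.15, 0.80, 0.43]   # the owner, on a different day
stranger = [0.90, 0.10, 0.75]  # somebody else

print(same_person(enrolled, attempt))
print(same_person(enrolled, stranger))
```

The answer is only ever yes or no. No names are involved, which is why no database of known faces is needed.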

Face Demographics

Once you’ve located a face in an image, there are dedicated algorithms to estimate the age, gender and race of the person, but not their identity. A typical output of such algorithms would look like this: “23 years old, Caucasian female”, or “elderly Asian male”. Usually, face demographics is used in digital signage or retail store analysis to generate aggregated statistics for the average demographic profile of the people who viewed the signage or entered the store.
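The aggregation step is straightforward once per-face estimates exist. The sketch below assumes each estimate arrives as a small record with `age` and `gender` fields; that format and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean

# Made-up per-face estimates, e.g. for everyone who viewed a sign in one hour.
viewers = [
    {"age": 23, "gender": "female"},
    {"age": 67, "gender": "male"},
    {"age": 31, "gender": "female"},
    {"age": 45, "gender": "male"},
]


def summarise(estimates: list[dict]) -> dict:
    """Collapse individual estimates into an anonymous aggregate profile."""
    total = len(estimates)
    gender_counts = Counter(e["gender"] for e in estimates)
    return {
        "count": total,
        "mean_age": mean(e["age"] for e in estimates),
        "gender_share": {g: n / total for g, n in gender_counts.items()},
    }


print(summarise(viewers))
```

Only the summary needs to be kept; the individual estimates can be discarded once they have been counted.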

Facial Emotions or Facial Expression Analysis

Specialised families of CV algorithms, known as Facial Emotion or Facial Expression Analysis, analyse the contortions of a face to estimate the emotion it portrays.

There are three major approaches to facial emotion analysis. The first attempts to detect a fixed number of predefined emotions on the face, such as the seven basic universal prototypical emotions: happy, surprised, afraid, angry, disgusted, sad, and neutral. This is the simplest form of facial emotion analysis, and the most widespread. The second approach aims to estimate the Valence (positive or negative, and to what extent) and Arousal (energetic or passive, and to what extent) of a facial expression. The last approach aims to understand exactly which muscle groups of the face (called facial action units) are activated in the current facial expression.
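The three approaches differ mainly in what kind of output they produce. The sketch below shows one plausible shape for each; all the numbers are invented, the value ranges are assumptions, and the action unit labels (AU4, AU6, AU12) are used only as example keys.

```python
# 1) Categorical: one score per predefined emotion (assumed to sum to 1).
categorical = {
    "happy": 0.81, "surprised": 0.07, "afraid": 0.01, "angry": 0.01,
    "disgusted": 0.01, "sad": 0.02, "neutral": 0.07,
}
top_emotion = max(categorical, key=categorical.get)

# 2) Dimensional: valence and arousal, each assumed to lie in [-1, 1].
valence, arousal = 0.7, 0.4


def describe(valence: float, arousal: float) -> str:
    """Turn a (valence, arousal) pair into a rough verbal description."""
    tone = "positive" if valence >= 0 else "negative"
    energy = "energetic" if arousal >= 0 else "passive"
    return f"{tone}, {energy}"


# 3) Action units: an activation level per facial muscle group.
action_units = {"AU4": 0.0, "AU6": 0.9, "AU12": 0.8}
active_units = sorted(au for au, level in action_units.items() if level >= 0.5)

print(top_emotion)
print(describe(valence, arousal))
print(active_units)
```

The first form is the easiest to read off, while the other two carry more detail about how intense an expression is and how it is formed.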

Face Gaze and Headpose Analysis

This type of facial analysis estimates the orientation of the face and the eyes relative to the camera: specifically, where the person is looking and which way their head is turned. It can be used in advertising to estimate whether a person is actually looking at an advertisement, and so count impressions, or to check whether a student is paying attention to the content presented by a teacher.
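Counting impressions from head pose can be sketched as a simple angle test. The example assumes that the camera sits next to the advertisement, that head pose arrives as yaw (left or right) and pitch (up or down) angles in degrees with 0 meaning “facing the camera”, and that the tolerance values are reasonable; all of these are assumptions for illustration.

```python
def is_facing_camera(yaw_deg: float, pitch_deg: float,
                     max_yaw: float = 20.0, max_pitch: float = 15.0) -> bool:
    """True if the head is turned towards the camera within the given tolerances."""
    return abs(yaw_deg) <= max_yaw and abs(pitch_deg) <= max_pitch


# Made-up (yaw, pitch) estimates for four passers-by.
poses = [(3.0, -2.0), (45.0, 5.0), (-10.0, 12.0), (8.0, -30.0)]

impressions = sum(is_facing_camera(yaw, pitch) for yaw, pitch in poses)
print(impressions)
```

Two of the four passers-by count as impressions here; the others are turned too far sideways or are looking down.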

Other Niche Face Analysis

Other types of face analysis include facial attractiveness (predicting how attractive a face is), facial skin analysis (estimating the condition of the facial skin), heart rate estimation (analysing minute colour fluctuations of the facial skin in an ordinary face video to estimate blood flow), drowsiness detection (estimating whether someone is sleepy), personality prediction and sexual orientation prediction (both by analysing facial features), and face synthesis (generating synthetic faces that may look indistinguishable from real ones, or fictional faces of non-existent people).

Some of these techniques are rather controversial, such as sexual orientation and personality prediction. However, research has already been published in these areas, and as long as there is abundant data to work with, there is a high chance that someone will eventually commercialise these models. Face synthesis has also attracted a lot of negative attention lately, with widely publicised “deep fakes”, where a person’s face is transferred onto the body of another, blurring the line between reality and animation. This technique has the potential to disrupt news, since it will become increasingly difficult to distinguish between a synthetic and a real face in a video.

A Reality Check

Hollywood and the news have created a distorted reality regarding CV and FA. While the news tends to over-emphasise success stories and under-report failures of technology, movies tend to show applications that are simply impossible. Let’s try to bust some of the most typical misconceptions.

Thinkfluencers

Dr. Vasileios Vonikakis

Head of the Technology Innovation Team

Panasonic