We have all seen it in the movies: a camera tens of meters away that can zoom in almost infinitely, down to the minutest details, allowing operators to locate and track thousands of faces. They can recognise a particular person among thousands of others and, once they do, analyse his or her face to determine gender, age, ethnicity, emotions, micro-expressions, personality, whether they're lying, where they're looking, or even what they're about to do next.
But is this how it really works? What is technologically possible today and what is not? What’s Hollywood fiction and what’s reality? And what should we expect in the near future? This is what Facial Analysis, an umbrella term that comes under the general research field of Computer Vision (CV), is all about.
CV is an interdisciplinary scientific and engineering field that develops techniques to help computers "see" and understand the content of images and videos. CV aims to mimic functions of human vision (not necessarily by copying it) and solve the many visual subtasks that all of us perform effortlessly in our everyday lives, such as locating objects, reading text, navigating a room, recognising and reading faces, and so on.
CV requires an image-capturing device, like a camera, and a processing unit, like a PC's processor, which analyses the image data using complex algorithms and extracts useful information from it. While cameras make up the majority of inputs in CV, any sensor that can produce images can be part of a CV system: radiographic sensors capturing X-rays or inspecting production lines, LIDAR, or Time-of-Flight sensors, for example. Whatever image-capturing device you use, the pipeline is usually the same.
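That capture-then-process pipeline can be sketched schematically. The function names and the trivial "analysis" below (mean brightness of a grayscale frame) are illustrative placeholders, not a real CV library's API:

```python
# Schematic CV pipeline: capture a frame -> process it -> extract information.
# A "frame" here is just a 2D list of grayscale pixel values (0-255);
# a real system would read frames from a camera, LIDAR, X-ray sensor, etc.

def capture_frame():
    """Stand-in for any image-capturing device."""
    return [
        [10, 20, 30],
        [40, 50, 60],
        [70, 80, 90],
    ]

def process(frame):
    """Stand-in for the processing unit: run an algorithm over the pixels.
    Here the 'useful information' is simply the mean brightness."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

frame = capture_frame()
info = process(frame)
print(info)  # 50.0
```

Real systems swap in far more sophisticated algorithms at the `process` step, but the overall shape of the pipeline stays the same regardless of the sensor.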
CV is also considered a part (or even a subset) of AI. In fact, CV has been one of the major drivers of AI advancements in the last few years, especially with the widespread adoption of Deep Learning (DL). DL is a subset of Machine Learning and AI which has revolutionised many aspects of CV, such as object recognition. The breakthrough came in 2012, when a DL model called "AlexNet" won the annual ImageNet object recognition competition by a significant margin over the best models of the previous years. AlexNet learned to recognise thousands of object categories, by analysing millions of images, with an error rate of 15.3% (the previous best was ~26.1%!). In sports terms, this is equivalent to smashing Usain Bolt's world record in the 100-meter race from 9.58sec down to a superhuman 5.62sec! Such was the scientific impact that, ever since, DL has dominated many aspects of CV, Facial Analysis among them.
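The sprinting analogy simply maps the same relative error reduction (from ~26.1% down to 15.3%) onto Bolt's 9.58-second record, and the arithmetic is easy to check:

```python
# AlexNet cut the ImageNet error rate from ~26.1% to 15.3%.
previous_error = 26.1
alexnet_error = 15.3

# Apply the same relative reduction to Bolt's 100-meter record (9.58 s).
bolt_record = 9.58
equivalent_time = bolt_record * (alexnet_error / previous_error)
print(round(equivalent_time, 2))  # 5.62
```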
Facial Analysis (FA) is a series of computational tasks that extract useful information from images or videos of faces, and is part of the CV field. Most people use the term "Face Recognition" to describe any technology related to FA. In actuality, Face Recognition is just one of the tasks that FA encompasses.
In order to analyse a face, you first need to locate it. This is called Face Detection. The objective here is to locate any area of pixels in a photo or video that corresponds to a face. Face Detection is not about the identity of a person (i.e. whether the face belongs to Bob, Susan or a wanted terrorist) but simply about whether an area of pixels is a face or not. The output of Face Detection is usually a "bounding box": a square or rectangle that marks the pixel area in which a face exists.
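In practice, a bounding box is just four numbers in pixel coordinates. Here is a minimal sketch in pure Python; the `Box` type is hypothetical, not any particular detector's output format:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box: top-left corner plus width and height,
    all in pixel coordinates, as a face detector would typically return."""
    x: int
    y: int
    w: int
    h: int

    def area(self) -> int:
        return self.w * self.h

    def contains(self, px: int, py: int) -> bool:
        """Is the pixel (px, py) inside this face region?"""
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

# A detector that finds two faces in a frame might report:
faces = [Box(x=120, y=45, w=64, h=64), Box(x=300, y=60, w=48, h=48)]
print(len(faces))                  # 2 faces detected
print(faces[0].area())             # 4096
print(faces[0].contains(150, 80))  # True
```

Note that nothing here says whose faces these are; that identity question belongs to Face Recognition, a separate task downstream of detection.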
By itself, Face Detection can provide a lot of insightful information, depending on the application you're targeting. For example, if you want to measure how many people visit your store every hour, or count how many people are standing outside your shop window, Face Detection can give you intelligent estimates. Face Detection is a mature technology when its operating specifications are met (see the later section on this).