2010 was a milestone year for face recognition. That’s when Facebook introduced a photo tagging feature with the ability to identify individuals in a photograph by matching faces to the pictures stored in a user’s profile. The feature was popular but frequently inaccurate. Getting the best results required the people in the photograph to look directly into the lens. Accuracy was also dependent on the quality of the user’s Facebook profile picture and other photos they were tagged in. Blurs caused by camera motion, reflective surfaces and light levels all had a negative impact on performance. But it was a start.

Fast forward nine years. Face recognition has been adopted by several industries, most notably law enforcement and home and commercial security. Biometric measures such as retinal scans and voice analysis are also useful in security applications, but face identification is still the preferred method.

Other biometric measures require users to physically interact with a device or to voluntarily position themselves next to a sensor. Think of pressing your palm against a reader, speaking directly into a microphone, or staring, unblinking, into a lens while a computer scans your retina. Measurements like these are impractical when it comes to identifying one individual in a large group of people moving through an airport.

Despite the inherent advantages of face recognition, the technology is still in its infancy. Here are four areas where the standard approach has failed to live up to its potential.

The limitations of standard face recognition technology

1) Low accuracy

Camera angles have a strong influence on how successfully a face can be detected and identified. Most existing models need to compare multiple angles, including profiles and full-frontal views, to achieve the best results. Facial hair, makeup, scarves, and hats can cause trouble. Ideally, the subject holds still, removes their eyeglasses, and looks into the lens; otherwise, several photos have to be taken from different angles. This makes training for face recognition extremely difficult.

2) Compute requirements

Whether they are analyzing images with an existing model or training a new one, traditional recognition algorithms need to run on a robust processor with a neural or GPU accelerator, and they need a persistent, high-bandwidth connection to the cloud. In fact, during training, most face recognition algorithms require multiple photos from thousands of people. Once the parent model is trained, it still has to be pushed to the cloud or run on expensive hardware to work for your specific face. This causes latency and security issues and delivers a poor user experience.

3) Inflexible deployment options

Standard technology requires developers to accommodate the need for fast processors and access to cloud-hosted servers. That rules out deploying face recognition apps in remote areas and on cheap devices. This limits the applications for face identification and forces developers building computer vision apps to make compromises on user experience, responsiveness, accuracy, and data security.

4) High cost

Unsurprisingly, incorporating face recognition capabilities into an existing app often requires a hardware upgrade.

Self-contained deep learning models

At Xnor, we realized that eliminating these restrictions required a completely new approach, so we started at the beginning: the learning models. Our computer vision technology is trained to operate in a range of environmental conditions. The facial signatures it produces allow faces to be analyzed accurately in live video streams at more than 30 FPS on GPU-enabled hardware and at 4 FPS on resource-constrained hardware, such as a CPU, regardless of changing lighting conditions, movement or camera angles.
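To put those frame rates in perspective, the short sketch below converts a frame rate into the time available to process each frame. It is plain arithmetic, not a description of Xnor's software; the function name frame_budget_ms and the hardware labels are made up for illustration.

```python
def frame_budget_ms(fps):
    """Return the processing time available per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Frame rates quoted in the text; the labels are illustrative only.
for label, fps in (("GPU-enabled hardware", 30), ("CPU-only hardware", 4)):
    print(f"{label}: {frame_budget_ms(fps):.1f} ms per frame")
```

At 30 FPS, detection and identification together have to finish in roughly 33 ms per frame. At 4 FPS the budget grows to 250 ms, which is why slower processors can still keep up with a live stream.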

In real life, people don’t stare directly at a lens, motionless, waiting for an algorithm to do its work. People are in motion. Expressions can change several times in the time it takes you to read this paragraph. Faces can be partially obscured by eyeglasses, a scarf, a hat, makeup or even earrings. Our deep learning models ensure accuracy regardless of the subject’s skin tone or fashion sensibilities.

Even better, training for an individual face can happen completely on-device, with as few as three images. This means you don’t need to take hundreds or thousands of photos of a face or use a large number of frames from a video. This makes our solution completely edge-enabled. There’s no need to rely on a cloud solution or risk downtime from network and service outages, and most importantly, it makes face identification possible on cheap hardware.
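One common way to enroll a face from a handful of images is to average their embedding vectors into a single template and compare new faces against it by cosine similarity. The sketch below illustrates that general technique with toy three-dimensional vectors. It is not a description of Xnor's method: the function names, the 0.8 threshold, and the vectors are all assumptions, and a real system would obtain its embeddings from a face model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def enroll(embeddings):
    """Average a few normalized embeddings into one unit-length template."""
    if not embeddings:
        raise ValueError("need at least one embedding to enroll")
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def identify(probe, templates, threshold=0.8):
    """Return the best-matching enrolled name, or None if below threshold."""
    best_name, best_score = None, threshold
    for name, template in templates.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy embeddings standing in for the output of a face model (hypothetical).
templates = {
    "alice": enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0]]),
    "bob": enroll([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]),
}

print(identify([0.95, 0.05, 0.05], templates))  # alice
print(identify([0.5, 0.5, 0.7], templates))     # None (no enrolled match)
```

Because enrollment here is only a few vector operations per image, it needs no network access and very little memory, which is the property that makes on-device training plausible on modest hardware.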

Speed and reliability

Xnor’s apps can detect and identify individual faces in real time, on-device (at up to 5 frames per second), using a commodity camera, or on embedded hardware with a processor clocked as low as 1 GHz. In fact, we’re currently running face recognition on an Ambarella S5L commodity chip. With no need for an internet connection, the real-world applications for these ML algorithms are enormous. It’s now possible to use advanced face identification features in remote locations, or in situations where maximizing uptime is essential.

Security

Our face recognition algorithms and training models can be run completely on-device, using a low-end processor. Personal information is stored on the device, not transmitted to the cloud for processing, where it can become vulnerable to security breaches. Taken together, these capabilities allow developers to build face identification apps that not only offer increased performance but also go further in protecting sensitive data.

A new approach yields new capabilities

In addition to enhancing performance, Xnor’s technology allows developers to integrate new capabilities into their applications, such as the ability to determine the subject’s age or gender, which direction they are looking, and whether the subject is happy, angry, scared, sad or surprised. This new technology will create new opportunities for developers to use face recognition in more powerful ways, in more scenarios, and, most importantly, on more devices.

Visit us to learn how to incorporate the next generation of face recognition into a broad range of applications.