
I’ve loved cars since I was a little boy. From classic cars to custom hot rods, I loved them all, but I was especially fascinated by the futuristic vehicles featured on TV. Depending on which generation you identify with, you might remember KITT from Knight Rider, the Batmobile, or the nameless DeLorean from Back to the Future. Not only were these cars fast, they could think, talk and sometimes even see.

AI has given us the first generation of autonomous cars — and it’s pretty impressive. But there is a host of next-generation, AI-enhanced features that go even further in providing convenience and ensuring passenger safety.

Auto-evolution: AI at the edge for cars

Xnor is focused on bringing computer vision to edge devices, so our technology is particularly valuable for automobiles and commercial vehicles. Every AI capability we offer – whether it involves person, object or face recognition – delivers a degree of speed and accuracy that, until recently, was only possible using a high-end processor augmented by a neural accelerator. We take that same level of performance, improve upon it, and make it available on an edge device, such as a 1 GHz ARM processor or a simple onboard computer.

Check out this demo of our computer vision technology:

Object detection capabilities

Crime prevention

For car sharing companies or taxis, the system can enforce security regulations by recognizing when passengers hold weapons or other objects that present a safety hazard.

Loss prevention

Using object detection, the system can remind a passenger to retrieve the phone or purse they left on the seat. Transportation and logistics companies could receive an alert if a package was not delivered at the end of a route.

Face recognition capabilities

Here are a few of the capabilities that can be incorporated into a line of vehicles using Xnor’s face recognition or action detection models.

Secure access

Using face recognition, a driver can be authenticated even before they enter a vehicle. The door could automatically open for people recognized by the car, making hands-free entry possible. Our technology would even allow the car to differentiate between children and adults. Commercial vehicles could use that information to restrict access to certain areas to authorized drivers.

Because all of this is done on-device, the data doesn’t need to be transmitted to the cloud, making the feature significantly more secure and practical.

Personalization

Once a driver or passenger is authenticated, the car could adjust settings to align with personal preferences, such as the position of the seat and steering column, interior temperature and infotainment system settings.

Driver awareness

ML-powered driver monitoring can tell when a driver is looking at a phone, instead of the road ahead. And if the driver becomes drowsy and their eyelids start to close, the system will know that too.

Emergency response

In the event of a crash or another emergency, the system can generate a passenger list, and notify someone if the driver does not respond to an audible alarm.

Passenger safety

Action detection models can be trained to detect specific gestures like fastening a seatbelt to ensure that everyone is buckled in.

Person and pet detection models can identify if a pet is left inside a car (a potentially dangerous situation on a hot day) or if an infant or small child is left behind, and then sound an alarm to notify the driver.

AI at the edge drives automotive innovation

Without recent advances in deep learning for computer vision, many of these features would be too difficult or expensive to implement.

Xnor’s AI technology is unique in that it delivers state-of-the-art performance on a commodity processor, using only the bare minimum for energy and memory.

Even with a simple onboard computer, Xnor models execute up to 10x faster than conventional solutions – while using up to 15x less memory and 30x less energy.

Taken together, all these capabilities make it both practical and profitable for automobile manufacturers to incorporate high-performance computer vision into a variety of applications for the commercial and consumer vehicle markets.

At Xnor, we’re fascinated by the creative and powerful ways our customers are working to incorporate machine learning into their line of cars and commercial vehicles. It’s not as cool as owning one of the super-smart, fast-talking exotic cars that my TV heroes used to drive, but it comes pretty close.

Read more about how you can incorporate the latest in computer vision into your line of vehicles.

Search for the term “the future of retailing” and you’ll see plenty of stories about physical retailers being marginalized by their dot-com counterparts. Some would say that physical stores are fading from the retail landscape. Quaint, but doomed. To understand why, consider the shopping experiences offered by each channel.

Online vs. Offline

For example, while checking the number of followers in their Instagram account, your future customer sees an image of their favorite celeb wearing shoes that they simply must have. Other distractions intervene, but after seeing several banner ads they finally click, swipe or tap their way to an online store. Thanks to cookies and ad tracking, the site already knows a great deal about the customer, from their purchase history down to their shoe size. The customer browses for products, reads reviews and compares items. With each click, the store knows a little bit more.

As the customer moves through the site, the convenience, selection and price advantage of shopping online becomes obvious. When they make a purchase, the customer can be rewarded for their loyalty with a coupon code, and the inventory system knows which item to reorder.

On the other hand, a retail store doesn’t know who you are the moment you walk in the door. They don’t know if you’ve bought from them – or from any of their competitors – before. They have no idea what color you like, or what shoe size you wear. Traditional retailers rely heavily on in-store displays or staff to guide customers through the store.

Now replay that scenario – but with one difference. This time it’s a physical store equipped with the latest generation in AI. Small cameras placed throughout the store use computer vision to provide an advanced level of retail analytics, possibly even better than what is available to online stores, while also creating a better experience for shoppers.

The Customer Journey in an AI-enabled Store

In this new scenario, a face recognition algorithm identifies customers and their demographics as they walk through the front door. Maybe this individual is a regular shopper and a member of your loyalty program. Based on their purchase history, you can send them a notification while they are in your store about new offerings that may be enticing to them.

As they move through the aisles, multiple cameras recognize that customer as the same person and track them throughout the store. Do the endcap displays attract their attention? Where do they stop and spend time? Does the location of a preferred product impact what else they buy nearby? Once your customers are at the check-out counter, payment can be as simple as a quick scan of their face.

On a larger scale, this data can be used to develop in-depth, real-time heatmaps without having to lift a finger. The information can also be bolstered with other AI capabilities such as emotion detection and action recognition in order to build highly detailed customer insights. Your customers and their paths through the store are now actionable data for your business, opening up a vast number of opportunities.

Security and Store Operations

The analytics you collect on the floor will impact your customers and their experiences, but there’s a slew of potential opportunities behind the scenes that can streamline operations for your business.

Surveillance and access control are important in-store functions for avoiding crime and unauthorized activity. Using Xnor’s AI capabilities, security can be enhanced with features like weapon or dangerous action detection. Secure areas can be better controlled with computer vision solutions like face recognition and person detection to make sure only the right people have access to restricted areas.

Another particularly valuable function is inventory management. Knowing when items are out of stock on the shelves helps to restock more efficiently. Creating efficient, real-time solutions for monitoring items also helps to keep vendors up-to-date on their products within your store as well as how they are performing. This can also be tied to traffic patterns so you can understand how often people are interacting with different products.

Gaining a competitive advantage

Many see the future of retail as being fully automated, but that shift won’t happen overnight. Retailers are beginning to introduce these capabilities piece by piece in order to stay ahead without having to completely overhaul operations. By incorporating AI solutions developed by Xnor, your store will avoid the headaches of conventional AI solutions. Xnor models can run on commodity devices, so you don’t need to upgrade your cameras or pay for expensive cloud-computing services (which are less secure). Running on-device also reduces latency and power consumption so your solutions will pick up that power-walker even on a battery-powered camera that you can place anywhere.

With Xnor’s computer vision models, physical stores can have the retail analytics they need to compete with their online counterparts – and help a loyal customer to find the perfect pair of shoes.

Visit Xnor to learn how the next generation in AI can help your retail store compete.

Mention Smart Appliances, and most people think of using a smartphone to turn on house lights as they pull in the driveway, arm security systems, control thermostats, or check if Amazon left a package on the front porch. Initially, that level of functionality was impressive. But so far, the value associated with Smart Appliances has been centered around heightened security and managing your home from a remote location.

It’s time Smart Appliances got an upgrade.

Smart Appliances V1

The first iterations of Smart Appliances were hampered by technical limitations. In some cases, the only smart thing about the earliest versions was a touch screen interface, Bluetooth connectivity and the option to use a mobile device to control the appliance. Advanced features like food detection, if they were used at all, were constrained by the limitations inherent in AI technology at that time. One of those limitations was the processing power needed to run an AI application. AI apps that could recognize and identify specific varieties of food required a robust processor with a neural or GPU accelerator, as well as an ample power source. Incorporating a power-hungry processor into the design of an energy-efficient appliance wasn’t practical. Such an app also required a persistent, high-bandwidth connection to the cloud. The resulting latency could delay system response to user input and create a poor customer experience. At any rate, aside from the onerous compute requirements, food detection models were still in their infancy. They were often inconsistent, and it was difficult to train them to identify new items.

The new generation of food identification technology promises to break through those barriers. With highly efficient algorithms, AI apps can be run on a small embedded device inside the appliance, without a persistent, high-bandwidth, internet connection.

Here are a few ways AI on the Edge can make a Smart refrigerator a little smarter:

  • Add items to a shopping list when they need to be replenished
  • Suggest a recipe based on the items you already have in your refrigerator
  • Make grocery shopping faster and more informed
  • Make recommendations for how best to store certain produce
  • Provide cooking tips for certain foods
  • Detect when there’s a spill inside

With this kind of upgrade, homeowners can use the new generation of Smart Appliances to reduce their monthly grocery bill, reduce waste, and save time at the grocery store.

Compact, efficient algorithms are the brains behind smart appliances

With Xnor’s efficient, on-device computer vision models, smart appliances are now becoming a reality. Xnor’s food identification models offer appliance manufacturers some specific advantages over conventional AI solutions:

Improved performance

The new generation of food identification technology brings AI to edge devices, so there’s no need for internet connectivity. When Smart Appliances aren’t tethered to an internet connection, they are more responsive. Plus, there’s no risk of downtime due to a network or service outage. That translates into a better experience for consumers.

Improved accuracy

Even an item as ubiquitous as a Granny Smith apple comes in a variety of shades, sizes, and shapes. Our highly efficient training models deliver substantially higher accuracy, making it possible to visually identify food items in less than ideal lighting conditions, even if they are partially obscured.

Reduced energy use

Keeping energy consumption to a minimum is a top priority for appliance manufacturers. Xnor’s food detection models have been shown to be up to 30x more energy efficient than conventional AI technology.

Lower costs

Without the need for fast, power-hungry processors, the cost of introducing these features comes way down. Combined with low energy use and internet-free, on-device computing, it’s now possible to incorporate advanced food detection capabilities into a range of products at multiple price points.

Bon Appétit

There’s a multitude of tasks involved in preparing a meal. By going beyond preserving and cooking food, refrigerators will begin to behave less like appliances and more like virtual sous-chefs. As a company that’s invested a significant amount of research in this area, we’d like to say, “Bon appétit!”

Visit us to learn how the next generation in food detection technology can boost the performance of your Smart Appliance.


2010 was a milestone year for face recognition. That’s when Facebook introduced a photo tagging feature with the ability to identify individuals in a photograph by matching faces to the pictures stored in a user’s profile. The feature was popular but frequently inaccurate. Getting the best results required the people in the photograph to look directly into the lens. Accuracy was also dependent on the quality of the user’s Facebook profile picture and other photos they were tagged in. Blurs caused by camera motion, reflective surfaces and light levels all had a negative impact on performance. But it was a start.

Flash forward nine years. Face recognition has been adopted by several industries, most notably in the areas of law enforcement and home and commercial security. Biometric measures such as retinal scans and voice analysis are also useful in security applications, but face identification is still the preferred method.

Other biometric measures require users to physically interact with a device or to voluntarily position themselves next to a sensor. Think of pressing your palm against a reader, speaking directly into a microphone, or staring, unblinking, into a lens while a computer scans your retina. Measurements like these are impractical when it comes to identifying one individual in a large group of people moving through an airport.

Despite the inherent advantages of face recognition, the technology is still in its infancy. Here are four areas where the standard approach has failed to live up to its potential.

The limitations of standard face recognition technology

1) Low accuracy

Camera angles have a strong influence on how successfully a face can be detected and identified. Most of the existing models need to compare multiple angles, including profiles and full-frontal views, to achieve the best results. Facial hair, makeup, scarves, and hats can cause trouble. Ideally, a subject holds still, removes their eyeglasses and looks into the lens, or a number of photos have to be taken from different angles. This makes training for face recognition extremely difficult.

2) Compute requirements

Whether it’s analyzing images at inference time or training a new model, traditional recognition algorithms need to run on a robust processor with a neural or GPU accelerator – and they need a persistent, high-bandwidth connection to the cloud. In fact, during training, most face recognition algorithms require multiple photos from thousands of people. Once the parent model is trained, it still has to be pushed to the cloud or run on expensive hardware to work for your specific face. This causes latency and security issues and delivers a poor user experience.

3) Inflexible deployment options

Standard technology requires developers to accommodate the need for fast processors and access to cloud-hosted servers. That rules out deploying face apps in remote areas and on cheap devices. This limits the applications for face identification and forces developers using computer vision apps to make compromises on user experience, responsiveness, accuracy, and data security.

4) High cost

Unsurprisingly, incorporating face recognition capabilities into an existing app often requires a hardware upgrade.

Self-contained deep learning models

At Xnor, we realized that eliminating these restrictions required a completely new approach, so we started at the beginning: the learning models. Our computer vision technology is trained to operate in a range of environmental conditions. The resulting models can accurately analyze faces in live video streams at more than 30 FPS on GPU-enabled hardware and at 4 FPS on resource-constrained hardware, such as a CPU-only device, regardless of changing lighting conditions, movement or camera angles.

In real life, people don’t stare directly at a lens, hold still, and wait for an algorithm to do its work. People are in motion. Expressions can change several times in the time it takes you to read this paragraph. Faces can be partially obscured by eyeglasses, a scarf, a hat, makeup or even earrings. Our deep learning models maintain accuracy regardless of the subject’s skin tone or fashion sensibilities.

Even better, the training for an individual face can happen completely on-device, with as few as three images. This means you don’t need to take hundreds or thousands of photos of a face or use a large number of frames from a video. This makes our solution completely edge-enabled. There’s no need to rely on a cloud solution or risk downtime with network and service outages, and most importantly, it makes face identification possible on inexpensive hardware.
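
As a purely hypothetical illustration of how few-shot, on-device enrollment can work in general (not a description of Xnor’s algorithm), the sketch below averages the embeddings of a few enrollment photos into a single face signature and then matches new faces by cosine similarity; the embedding function, image sizes and threshold are all stand-ins.

```python
# Hypothetical sketch of few-shot, on-device face enrollment and matching.
# The embedding model, threshold, and image sizes are illustrative stand-ins.
import numpy as np

def enroll(embed, enrollment_images):
    """Average the embeddings of a few enrollment photos into one signature."""
    signature = np.mean([embed(img) for img in enrollment_images], axis=0)
    return signature / np.linalg.norm(signature)

def matches(embed, signature, new_image, threshold=0.7):
    """Compare a new face crop to the stored signature by cosine similarity."""
    e = embed(new_image)
    e = e / np.linalg.norm(e)
    return float(np.dot(signature, e)) >= threshold

# Stand-in embedding model: any network that maps a face crop to a vector.
embed = lambda img: np.tanh(img.reshape(-1)[:128])

photos = [np.random.rand(64, 64) for _ in range(3)]   # "as few as three images"
signature = enroll(embed, photos)
print(matches(embed, signature, np.random.rand(64, 64)))
```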

Speed and reliability

Xnor’s apps can detect and identify individual faces in real time, on-device (at up to 5 frames per second), using a commodity camera or embedded hardware with a processor as small as 1 GHz. In fact, we’re currently running face recognition on an Ambarella S5L commodity chip. Without the need for an internet connection, the range of real-world applications for these ML algorithms is enormous. It’s now possible to use advanced face identification features in remote locations, or in situations where maximizing uptime is essential.

Security

Our face recognition algorithms and training models can be run completely on-device, using a low-end processor. Personal information is stored on the device, not transmitted to the cloud for processing, where it can become vulnerable to security breaches. Taken together, these capabilities allow developers to build face identification apps that not only offer increased performance but also go further in protecting sensitive data.

A new approach yields new capabilities

In addition to enhancing performance, Xnor’s technology allows developers to integrate new capabilities into their applications, such as the ability to determine the subject’s age or gender, which direction they are looking, and whether the subject is happy, angry, scared, sad or surprised. This new technology will create new opportunities for developers to use face recognition in more powerful ways, in more scenarios, and, most importantly, on more devices.

Visit us to learn how to incorporate the next generation of face recognition into a broad range of applications.

Machine vision has long been the holy grail for unlocking a number of real-world use cases for AI – think of home automation and security, autonomous vehicles, crop-picking robots, retail analytics, delivery drones or real-time threat detection. However, until recently, AI models for computer vision have been constrained to expensive, sophisticated hardware that often contains neural accelerators, or they had to be processed in the cloud on GPU- or TPU-enabled servers. Through Xnor’s groundbreaking research on YOLO, Xnor-Net, YOLO9000, Tiny YOLO and other AI models, conducted in coordination with the Allen Institute for AI, we’ve been able to move machine learning from expensive hardware and the cloud to simple, resource-constrained devices that can operate completely on-device and autonomously. This means you can run sophisticated deep learning models on embedded devices without the need for neural processors and without the need for a data connection to the cloud. For example, on a 1.4 GHz dual-core ARM chip with no GPU, we can run object detection with CPU utilization of only 55%, a memory footprint of only 20MB, and power consumption of less than 4.7W.

Object Detection

Let’s dig into one specific model that we’ve built – object detection. Object detection is a type of AI model that identifies categories of objects present in images or videos – think people, automobiles, animals, packages, signs, lights, etc. – and then localizes their presence by drawing a bounding box around each one. Utilizing a CNN (convolutional neural network), the model simultaneously predicts multiple bounding boxes and the classification probabilities for those boxes based on a trained model.
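
To make that concrete, here is a minimal, generic sketch of the post-processing step commonly used with single-shot detectors such as YOLO: keep the boxes whose confidence clears a threshold, then suppress overlapping duplicates with non-max suppression. This illustrates the general technique, not Xnor’s implementation, and the thresholds are arbitrary.

```python
# Generic post-processing for a single-shot detector's raw output:
# confidence filtering followed by non-max suppression (NMS).
import numpy as np

def iou(box, boxes):
    """Intersection-over-union between one box and an array of boxes [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def postprocess(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Keep confident boxes, then drop boxes that overlap a higher-scoring one."""
    keep_mask = scores >= conf_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)          # highest confidence first
    kept = []
    while order.size:
        i = order[0]
        kept.append(i)
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps < iou_thresh]
    return boxes[kept], scores[kept]

# Example: two overlapping candidates for one object, one box for another.
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 150]], float)
scores = np.array([0.9, 0.6, 0.8])
final_boxes, final_scores = postprocess(boxes, scores)
print(final_boxes, final_scores)   # the 0.6 duplicate is suppressed
```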

Traditionally these models have been resource intensive because of the model architecture – the number of layers (convolution, pooling and fully connected) – and the fact that most CNNs use 32-bit precision floating-point operations.

Xnor’s approach is different and we’ve summarized this approach below.

Xnorization (How It Works)

Our models are optimized to run more efficiently and up to 10x faster through a process we call Xnorization. This process contains five essential steps. First, we binarize the AI model. Second, we design a compact model. Third, we prune the model. Fourth, we optimize the loss function. Fifth, we optimize the model for the specific piece of hardware.

Let’s explore each of these in further detail.

Model Binarization

To reduce the compute required to run our object detection models, the first step is to retrain them into a binary neural network called Xnor-Net. In Binary-Weight-Networks, the filters are approximated with binary values. This makes convolutional operations up to 58x faster and delivers memory savings of up to 32x. Furthermore, these binary networks are simple, accurate, and efficient. In fact, the classification accuracy of a Binary-Weight-Network version of AlexNet is only 2.9% less than the full-precision AlexNet (in top-1 measure).

To do this, both the filters and the inputs to the convolutional layers are binarized, and the convolutions are approximated using primarily binary operations. Finally, the operations are parallelized on CPUs and optimized to reduce model size. This lets us shrink full-precision floating-point operations down to single-bit binary operations, making the network hyper efficient (a simple sketch of the weight-binarization idea follows the list below). Once completed, we have state-of-the-art accuracy for models that:

  • Are 10x faster
  • Can be 20-200x more power efficient
  • Need 8-15x less memory than traditional deep learning models
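
As a rough illustration of the weight-binarization step, the sketch below approximates a real-valued filter W with alpha * sign(W), where alpha is the mean absolute value of W, following the Binary-Weight-Network formulation in the XNOR-Net paper. It is a simplified sketch, not Xnor’s production code.

```python
# Binary-Weight-Network idea (simplified): W is approximated by alpha * sign(W),
# where alpha = mean(|W|) is the closed-form scale minimizing ||W - alpha * B||^2.
import numpy as np

def binarize_filter(W):
    """Return the binary filter B in {-1, +1} and its scaling factor alpha."""
    B = np.sign(W)
    B[B == 0] = 1                  # treat exact zeros as +1
    alpha = np.abs(W).mean()
    return B.astype(np.int8), alpha

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3, 64))    # one convolutional filter
B, alpha = binarize_filter(W)

# Convolving with alpha * B needs only additions/subtractions plus one multiply
# by alpha, instead of full floating-point multiply-accumulates.
approx = alpha * B
print("mean approximation error:", np.abs(W - approx).mean())
```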

Compact Model Design

The second critical piece is to design models that are compact. Without compact model design, the compute required for the model remains high. Our Xnorized models utilize a compact design to reduce the number of required operations and model size. We design as few layers and parameters into the model as possible. The model design is dependent on the hardware, but we take the same fundamental approach for each model.

Sparse Model Design

Third, a variety of techniques are used to prune the model’s operations and parameters. This reduces the model size and minimizes the operations necessary to provide accurate results. Here, most of the parameters are set to zero, leaving only a small number of non-zero parameters. By doing this, we can ignore all the computations for the zero parameters and only save the indexes and the values for the non-zero parameters.
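
Here is a toy sketch of what sparse parameter storage looks like in practice: prune all but the largest-magnitude weights, keep only their indexes and values, and compute with just those. The magnitude-based criterion and the keep fraction are illustrative assumptions, not Xnor’s actual pruning pipeline.

```python
# Toy sparse storage: keep only the indexes and values of the surviving weights.
import numpy as np

def prune(weights, keep_fraction=0.1):
    """Zero out all but the largest-magnitude weights; store the survivors sparsely."""
    flat = weights.ravel()
    k = max(1, int(keep_fraction * flat.size))
    threshold = np.sort(np.abs(flat))[-k]          # k-th largest magnitude
    indexes = np.flatnonzero(np.abs(flat) >= threshold)
    return indexes, flat[indexes]

def sparse_dot(indexes, values, x):
    """Dot product that only touches the surviving non-zero parameters."""
    return np.dot(values, x[indexes])

rng = np.random.default_rng(0)
w = rng.normal(size=256)        # one dense row of weights
x = rng.normal(size=256)
idx, vals = prune(w, keep_fraction=0.1)
print(len(vals), "of", w.size, "weights kept")
print("dense:", np.dot(w, x), " sparse approximation:", sparse_dot(idx, vals, x))
```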

Optimized Loss Functions

Fourth, we’ve built groundbreaking new techniques for retraining models using labels predicted by another trained model. Techniques like Label Refinery greatly increase accuracy by optimizing the loss function over a distribution of all possible categories. With Label Refinery, we rely on another neural network model to produce labels with three properties: 1) soft; 2) informative; and 3) dynamic.

Soft labels are able to categorize multiple objects in an image and can determine what percentage of the image is represented by what object category. Informative labels provide a range of categories with the relevant confidence, so, for example, if something is mislabeled as a cat, you can know that the second highest category is dog. Dynamic labels allow you to ensure that the random crop is labeling the correct object in the image by running the model dynamically as you sample over the image.

You can learn more about this technique here.

Hardware Optimization

Lastly, because we’re building models for all sorts of embedded devices, we need to optimize each model for different hardware platforms to provide the highest efficiency across a broad range of Intel and Arm CPUs, graphics processing units (GPUs), and FPGA devices. For example, we’ve partnered with Toradex and Ambarella to build person detection models that can be viewed here and here.

Results

By Xnorizing our models, we’re able to achieve cutting-edge results. We have miniaturized models that are < 1MB in size and can run on the smallest devices. The models have fewer operations, faster inference, higher frames per second, and low latency because they run on-device. And they require fewer joules per inference, which translates to lower power consumption.

More refined labels at training yield higher accuracy for on-device models

Hessam Bagherinezhad, Maxwell Horton, Mohammad Rastegari, Ali Farhadi

Advancements in deep learning opened the possibility for any camera to operate as a smart sensor capable of seeing and understanding a range of visual content. Deep neural networks now power the technology underlying most of our online interactions: social media, shopping, streaming TV and movies, and managing our photos. What led to the rise of deep learning? One basic reason is the parallel advancement in cloud computing and the accessibility of GPUs to train and run such large neural networks.

What happens when we move AI onto edge devices?

This accessibility to GPUs is why the early adoption of AI was mainly for problems that could be solved in the cloud. But there are many real-world applications that would benefit from the promise of AI: for example, smart devices in the home that can recognize food in fridges, drones that can chart their direction in the sky, and systems that improve the outcomes of search and rescue missions.

Smaller neural networks

The first step that enabled deep neural networks to run at the edge at all was to implement efficient network architectures that require radically less memory and compute. The three main approaches are:

Low-Precision Models

Part of what makes deep learning so computationally intense is the convolutional operations. A standard full-precision model relies on 32-bit floating-point convolutional operations, which for standard deep learning models with billions of operations means a massive level of compute is required to make even a single inference. Through our work on XNOR-NET, we have shown that one approach to reducing the model size is to drive convolutions down all the way from 32-bit to 1-bit. This approach has demonstrated the ability to radically reduce the memory and compute of deep learning models. Furthermore, given the binary nature of the operations, the models can be easily embedded into hardware that relies on binary operations.
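
To see why 1-bit values are so hardware friendly, consider the toy example below: a dot product between two {-1, +1} vectors reduces to a bitwise XNOR followed by a popcount. This is our own illustration of the principle, not code from the XNOR-NET repository.

```python
# A dot product over {-1, +1} vectors equals (#agreements - #disagreements),
# so it can be computed with XNOR + popcount instead of multiply-accumulates.
import numpy as np

def pack_bits(v):
    """Encode a {-1, +1} vector as an integer bit mask (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors using XNOR + popcount."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)   # 1 wherever the bits agree
    matches = bin(xnor).count("1")
    return 2 * matches - n                        # agreements minus disagreements

rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=64)
b = rng.choice([-1, 1], size=64)
assert binary_dot(pack_bits(a), pack_bits(b), 64) == int(np.dot(a, b))
```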

Sparse Models

Another way to reduce model size and computation is to use sparse parameters, where most of the parameters are set to zero and very few remain non-zero. This simply means that we can ignore all the computations for the zero parameters and only save the indexes and the values for the non-zero parameters. We have developed LCNN: Look-up based convolutional neural networks that benefit from the sparse structure of the model to improve the efficiency of the network.

Compact Network Design

MobileNets, pioneered by Google, are an example of a streamlined architecture that relies on depth-wise separable convolutions. These models make it easy to trade off between accuracy and latency depending on the constraints of the model’s application.
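
For reference, the sketch below shows the depth-wise separable building block that MobileNets are built from, written in PyTorch: a per-channel 3x3 depthwise convolution followed by a 1x1 pointwise convolution, which needs roughly 8x fewer parameters than a standard 3x3 convolution at the same width. It is a minimal illustration, not a full MobileNet.

```python
# Depthwise-separable convolution block: depthwise 3x3 conv + pointwise 1x1 conv.
import torch
import torch.nn as nn

def separable_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False)
separable = separable_conv(64, 128)

count = lambda m: sum(p.numel() for p in m.parameters())
print("standard 3x3 conv parameters:  ", count(standard))   # 73,728
print("depthwise-separable parameters:", count(separable))  # 9,152 (about 8x fewer)

x = torch.randn(1, 64, 56, 56)
print(separable(x).shape)   # torch.Size([1, 128, 56, 56])
```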

How can we improve accuracy for edge models?

Optimizing network architectures was a critical first step in getting deep learning models to run on edge devices. Now that the problem of how to run AI on the edge has been solved, we have been focused on new ways to improve performance. One approach that has been largely overlooked until now is data labeling, so we asked the question:

Are the current data labels for standard mainstream AI tasks good enough for training the models running in resource-constrained environments?

Training Data for Object Classification

To train a deep learning model to classify objects we need to provide a set of images that have been “tagged” with those objects. Until now, the field of computer vision has relied heavily on standard image sets such as ImageNet. Because they work to a large extent, researchers have not questioned how effective the labels are at providing perfect levels of generalization from training datasets to test sets.

Challenge 1: Incomplete label for one image

Sample image labeled ‘Persian cat’ in ImageNet’s training set

This image has the data label “Persian cat,” which means that we are training the model to learn that everything in this image should be classified as a Persian cat. As humans, it’s trivial for us to see that this image actually contains a Persian cat playing with a ball, where “ball” is considered an object category in this dataset but not labeled in this image. We understand that the cat is a separate object from the ball, that the cat is significantly bigger than the ball, and that the cat is living and the ball is non-living. Despite all of this complexity in the image, with only one training label, we are simply telling the network that this image = Persian cat. Hence, this labeling is incomplete.

Challenge 2: Inconsistencies from random cropping — cropping the wrong object

To prevent overfitting (where models perform poorly because of an inability to generalize to novel images), various training techniques have been introduced to prevent models from memorizing actual images. One approach commonly used is to take a randomly sampled crop from different areas of the image, known as crop-level augmentation, which can represent as little as 8% of the entire image. While this approach improves the model’s ability to generalize from training to test datasets, it introduces inconsistencies between the image label and the crop when there is more than one object in the image. For example, in the image below, a random crop may contain only the pixels from the ball, but still be labeled “Persian cat”.

Example of random cropping augmentation
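
For context, this is what the crop-level augmentation described above typically looks like in code; torchvision’s RandomResizedCrop, for example, samples crops covering as little as 8% of the image area by default, and the crop inherits the image-level label no matter what it contains. The input image here is a random stand-in.

```python
# Standard crop-level augmentation: a random crop of 8%-100% of the image area
# keeps the image-level label even if it misses the labeled object entirely.
import numpy as np
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0)),  # as little as 8% of the area
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Stand-in for a training image such as the "Persian cat with ball" example.
image = Image.fromarray(np.random.randint(0, 255, (500, 500, 3), dtype=np.uint8))
crop = augment(image)      # may contain only the ball, yet stays labeled "Persian cat"
print(crop.shape)          # torch.Size([3, 224, 224])
```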

Challenge 3: Inconsistencies from random cropping — similarities between crops

Another challenge of training models on image crops is that there are opportunities for objects with different image-level labels to look indistinguishable to the model, for example a patch of bread could look very similar to a patch of butternut squash. To maintain high-levels of accuracy, it’s important for the model to be able to distinguish between these two food categories.

Example of random crop from bread (left) and butternut squash (right)

Challenge 4: Similar or very different mis-categorizations deliver the same penalty to the model

As humans, we have all looked at a little dog that could be a cat, or thought a big dog looked like a bear. Currently, most models are designed so that any misclassification is penalized equally. The way current labels are determined does not provide any insight into related categories. For example, the model receives the same level of penalization whether it mis-categorizes a cat as a dog or as a mirror on a car.

Examples of the limitations from taxonomy dependency

Large, high-complexity models are able to learn despite these inconsistencies in the training data, but when models are lower precision, sparser, or more compact, these inconsistencies cannot be resolved and the accuracy of the model suffers.

How can we create training labels that deliver higher accuracies?

Given the challenges in data labeling identified above, we proposed a new iterative procedure to dynamically update ground truth labels using a visual model trained on the entire dataset. This new approach is called Label Refinery and relies on a neural network model to produce labels with the following properties that are consistent with the image content:

  • Soft
  • Informative
  • Dynamic

Soft labels are able to categorize multiple objects in an image and can determine what percentage of the image is represented by what object category. For the cat and ball example above, we are able to classify the image as 80% cat and 20% ball.

Informative labels provide a range of categories with the relevant confidence, so that if something is mislabeled as a cat, you can know that the second highest category is dog.

Dynamic labels — this approach to labeling allows you to ensure that the random crop is labeling the correct object in the image by running the model dynamically as you sample over the image.
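
A simplified sketch of the training loop this implies is shown below: a trained refinery network labels each random crop on the fly, and the student model is trained with a cross-entropy loss against that full soft distribution. This is our own condensed illustration; the released implementation is linked at the end of this post, and the stand-in models here are only placeholders.

```python
# Simplified Label Refinery-style training step: the refinery labels each crop
# dynamically, and the student is trained against those soft labels.
import torch
import torch.nn.functional as F

def refine_and_train_step(refinery, student, optimizer, crops):
    """One step with soft, informative, dynamic labels produced by the refinery."""
    with torch.no_grad():
        soft_labels = F.softmax(refinery(crops), dim=1)   # labels for these exact crops
    log_probs = F.log_softmax(student(crops), dim=1)
    loss = -(soft_labels * log_probs).sum(dim=1).mean()   # cross-entropy vs. soft labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with stand-in models (any pair of classifiers with matching outputs works).
refinery = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(student.parameters(), lr=0.01)
crops = torch.randn(8, 3, 32, 32)     # a batch of random crops
print(refine_and_train_step(refinery, student, optimizer, crops))
```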

How does label refinery impact accuracy?

We evaluated the Label Refinery approach on the standard ImageNet ILSVRC2012 classification challenge using a variety of model architectures.

The figure below shows the first label generated from ImageNet and goes on to show how the model refines the labels over time. The graphs show the line of “perfect generalization,” where a model generalizes perfectly from training data to test data. Our results show that with progressively more automatic label refining, model performance moves closer and closer to perfect generalization. This trend toward perfect generalization is reflected in the accuracy table below.

Figure taken from Label Refinery paper

Table taken from Label Refinery paper

Why is Label Refinery critical for boosting accuracy at the edge?

The results of implementing Label Refinery show interesting findings. We found that large models with a small generalization gap (the difference between training and test accuracy) are able to better handle situations where data labels are imprecise. This means that running Label Refinery has less impact on these models. However, where we see the biggest boost in performance is on models that have been compressed to optimize for small memory and compute, or models with a large generalization gap. This is why Label Refinery is critical for boosting accuracy on edge models running in resource-constrained environments.

With this approach, combined with Xnorized network architectures, we can now create models that are small, power efficient, low latency, and have a high degree of accuracy that can power smart devices as small as a doorbell and as mobile as a drone.

Source code is available on GitHub for Label Refinery —

https://github.com/hessamb/label-refinery

Paper on arXiv
https://arxiv.org/pdf/1805.02641.pdf

Source code is available on GitHub for XNOR-NET —

https://github.com/allenai/XNOR-Net
https://arxiv.org/abs/1603.05279

Learn how to deploy Edge AI models at Xnor.ai’s developer workshop

At the Edge AI Summit in San Francisco on December 11, we will show how we’re using the Label Refinery and multiple other algorithm and model optimizations to create real-time AI solutions on hardware as small as a Raspberry Pi Zero.