I’ve loved cars since I was a little boy. From classic cars to custom hot rods, I loved them all, but I was especially fascinated by the futuristic vehicles featured on TV. Depending on which generation you identify with, you might remember Kitt from Knight Rider, the Batmobile, or the nameless Delorean from Back to the Future. Not only were these cars fast, they could think, talk and sometimes even see.

AI has given us the first generation of autonomous cars, and it’s pretty impressive. But a host of next-generation, AI-enhanced features goes even further in providing convenience and ensuring passenger safety.

Auto-evolution: AI at the edge for cars

Xnor is focused on bringing computer vision to edge devices, so our technology is particularly valuable for automobiles and commercial vehicles. Every AI capability we offer – whether it involves person, object or face recognition – delivers a degree of speed and accuracy that, until recently, was only possible using a high-end processor augmented by a neural accelerator. We take that same level of performance, improve upon it, and make it available on an edge device, such as a 1 GHz ARM processor or a simple onboard computer.

Object detection capabilities

Crime prevention

For car sharing companies or taxis, the system can enforce security regulations by recognizing when passengers hold weapons or other objects that present a safety hazard.

Loss prevention

Using object detection, the system can remind a passenger to retrieve the phone or purse they left on the seat. Transportation and logistics companies could receive an alert if a package was not delivered at the end of a route.

Face recognition capabilities

Here are a few of the capabilities that can be incorporated into a line of vehicles using Xnor’s face recognition or action detection models.

Secure access

Using face recognition, a driver can be authenticated even before they enter a vehicle. The door could automatically open for people recognized by the car, making hands-free entry possible. Our technology would even allow the car to differentiate between children and adults. Commercial vehicles could use that information to control access to certain areas by authorizing drivers.

Because all of this is done on-device, the data doesn’t need to be transmitted to the cloud, making the feature significantly more secure and practical.


Once a driver or passenger is authenticated, the car could adjust settings to align with personal preferences, such as the position of the seat and steering column, interior temperature and infotainment system settings.

Driver awareness

ML-powered driver monitoring can tell when a driver is looking at a phone, instead of the road ahead. And if the driver becomes drowsy and their eyelids start to close, the system will know that too.

Emergency response

In the event of a crash or another emergency, the system can generate a passenger list, and notify someone if the driver does not respond to an audible alarm.

Passenger safety

Action detection models can be trained to detect specific gestures like fastening a seatbelt to ensure that everyone is buckled in.

Person and pet detection models can identify if a pet is left inside a car (a potentially dangerous situation on a hot day) or if an infant or small child is left behind, and then sound an alarm to notify the driver.

AI at the edge drives automotive innovation

Without recent advances in deep learning for computer vision, many of these features would be too difficult or expensive to implement.

Xnor’s AI technology is unique in that it delivers state-of-the-art performance on a commodity processor, using only the bare minimum for energy and memory.

Even with a simple onboard computer, Xnor models execute up to 10x faster than conventional solutions – while using up to 15x less memory and 30x less energy.

Taken together, all these capabilities make it both practical and profitable for automobile manufacturers to incorporate high-performance computer vision into a variety of applications for the commercial and consumer vehicle markets.

At Xnor, we’re fascinated by the creative and powerful ways our customers are working to incorporate machine learning into their line of cars and commercial vehicles. It’s not as cool as owning one of the super-smart, fast-talking exotic cars that my TV heroes used to drive, but it comes pretty close.

Read more about how you can incorporate the latest in computer vision into your line of vehicles.

Search for the term “the future of retailing” and you’ll see plenty of stories about physical retailers being marginalized by their dot-com counterparts. Some would say that physical stores are fading from the retail landscape. Quaint, but doomed. To understand why, consider the shopping experiences offered by each channel.

Online vs. Offline

For example, while checking the number of followers in their Instagram account, your future customer sees an image of their favorite celeb wearing shoes that they simply must have. Other distractions intervene, but after seeing several banner ads they finally click, swipe or tap their way to an online store. Thanks to cookies and ad tracking, the site already knows a great deal about the customer, from their purchase history down to their shoe size. The customer browses for products, reads reviews and compares items. With each click, the store knows a little bit more.

As the customer moves through the site, the convenience, selection and price advantage of shopping online become obvious. When they make a purchase, the customer can be rewarded for their loyalty with a coupon code, and the inventory system knows which item to reorder.

On the other hand, a retail store doesn’t know who you are the moment you walk in the door. They don’t know if you’ve bought from them – or from any of their competitors – before. They have no idea what color you like, or what shoe size you wear. Traditional retailers rely heavily on in-store displays or staff to guide customers through the store.

Now replay that scenario – but with one difference. This time it’s a physical store equipped with the latest generation in AI. Small cameras placed throughout the store use computer vision to provide an advanced level of retail analytics, possibly even better than what is available to online stores, while also creating a better experience for shoppers.

The Customer Journey in an AI-enabled Store

In this new scenario, a face recognition algorithm identifies customers and their demographics as they walk through the front door. Maybe this individual is a regular shopper and a member of your loyalty program. Based on their purchase history, you can send them a notification while they are in your store about new offerings that may be enticing to them.

As they move through the aisles, multiple cameras recognize that customer as the same person and track them throughout the store. Do the endcap displays attract their attention? Where do they stop and spend time? Does the location of a preferred product impact what else they buy nearby? Once your customers are at the check-out counter, payment can be as simple as a quick scan of their face.

On a larger scale, this data can be used to develop in-depth, real-time heatmaps without having to lift a finger. The information can also be bolstered with other AI capabilities such as emotion detection and action recognition in order to build highly detailed customer insights. Your customers and their paths through the store are now actionable data for your business, opening up a vast number of opportunities.
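As a rough illustration of how tracked positions could be turned into a heatmap, here is a minimal Python sketch. It assumes a person tracker that reports floor coordinates for each shopper at each time step; the function names, the coordinate format and the one-metre grid cell are all hypothetical and are not part of any Xnor API.

```python
from collections import Counter


def build_heatmap(positions, cell_size=1.0):
    """Count how many tracked positions fall into each grid cell.

    positions: iterable of (x, y) floor coordinates in metres, assumed to
    come from a person tracker (one sample per shopper per time step).
    """
    heatmap = Counter()
    for x, y in positions:
        cell = (int(x // cell_size), int(y // cell_size))
        heatmap[cell] += 1
    return heatmap


def busiest_cells(heatmap, top_n=3):
    """Return the top_n grid cells with the most recorded samples."""
    return heatmap.most_common(top_n)


# Made-up samples: three of them land in the same one-metre cell.
samples = [(0.2, 0.4), (0.8, 0.9), (1.5, 0.3), (0.5, 0.5), (3.1, 2.2)]
heat = build_heatmap(samples, cell_size=1.0)
print(busiest_cells(heat, top_n=1))
```

A real deployment would also need to map camera pixels to floor coordinates and to de-duplicate people seen by several cameras, which this sketch leaves out.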

Security and Store Operations

The analytics you collect on the floor will impact your customers and their experiences, but there’s a slew of potential opportunities behind the scenes that can streamline operations for your business.

Surveillance and access control are important in-store functions for avoiding crime and unauthorized activity. Using Xnor’s AI capabilities, security can be enhanced with features like weapon or dangerous action detection. Secure areas can be better controlled with computer vision solutions like face recognition and person detection to make sure only the right people have access to restricted areas.

Another particularly valuable function is inventory management. Knowing when items are out of stock on the shelves helps to restock more efficiently. Creating efficient, real-time solutions for monitoring items also helps to keep vendors up-to-date on their products within your store as well as how they are performing. This can also be tied to traffic patterns so you can understand how often people are interacting with different products.

Gaining a competitive advantage

Many see the future of retail as being fully automated, but that shift won’t happen overnight. Retailers are beginning to introduce these capabilities piece by piece in order to stay ahead without having to completely overhaul operations. By incorporating AI solutions developed by Xnor, your store will avoid the headaches of conventional AI solutions. Xnor models can run on commodity devices, so you don’t need to upgrade your cameras or pay for expensive cloud-computing services (which are less secure). Running on-device also reduces latency and power consumption so your solutions will pick up that power-walker even on a battery-powered camera that you can place anywhere.

With Xnor’s computer vision models, physical stores can have the retail analytics they need to compete with their online counterparts – and help a loyal customer to find the perfect pair of shoes.

Visit Xnor to learn how the next generation in AI can help your retail store compete.

Mention Smart Appliance, and most people think of using a smartphone to turn on house lights as they pull in the driveway, arm security systems, control thermostats, or check if Amazon left a package on the front porch. Initially, that level of functionality was impressive. But so far, the value associated with Smart Appliances has been centered around heightened security and managing your home from a remote location.

It’s time Smart Appliances got an upgrade.

Smart Appliances V1

The first iterations of Smart Appliances were hampered by technical limitations. In some cases, the only smart things about the earliest versions were touch screen interfaces, Bluetooth connectivity and the option to use a mobile device to control the appliance. Advanced features like food detection, if they were used at all, were constrained by the limitations inherent in AI technology at that time. One of those factors was the processing power needed to run an AI application. AI apps that could recognize and identify specific varieties of food required a robust processor with a neural or GPU accelerator, as well as an ample power source. Incorporating a power-hungry processor into the design of an energy-efficient appliance wasn’t practical. It also required a persistent, high-bandwidth connection to the cloud. The resulting latency could delay system response to user input and create a poor customer experience. At any rate, aside from the onerous compute requirement, food detection models were still in their infancy. They were often inconsistent, and it was difficult to train them to identify new items.

The new generation of food identification technology promises to break through those barriers. With highly efficient algorithms, AI apps can be run on a small embedded device inside the appliance, without a persistent, high-bandwidth internet connection.

Here are a few ways AI on the Edge can make a Smart refrigerator a little smarter:

  • Add items to a shopping list when they need to be replenished
  • Suggest a recipe based on the items you already have in your refrigerator
  • Make grocery shoppers faster and more informed
  • Make recommendations for how best to store certain produce
  • Provide cooking tips for certain foods
  • Detect when there’s a spill inside

With this kind of upgrade, homeowners can use the new generation of Smart Appliances to reduce their monthly grocery bill, reduce waste, and save time at the grocery store.

Compact, efficient algorithms are the brains behind smart appliances

With Xnor’s efficient, on-device computer vision models, smart appliances are now becoming a reality. Xnor’s food identification models offer appliance manufacturers some specific advantages over conventional AI solutions:

Improved performance

The new generation of food identification technology brings AI to edge devices, so there’s no need for internet connectivity. When Smart Appliances aren’t tethered to an internet connection, they are more responsive. Plus, there’s no risk of downtime due to a network or service outage. That translates into a better experience for consumers.

Improved accuracy

Even an item as ubiquitous as a Granny Smith apple comes in a variety of shades, sizes, and shapes. Our highly efficient training models deliver substantially higher accuracy, making it possible to visually identify food items in less than ideal lighting conditions, even if they are partially obscured.

Reduced energy use

Keeping energy consumption to a minimum is a top priority for appliance manufacturers. Xnor’s food detection models have been shown to be up to 30x more energy efficient than conventional AI technology.

Lower costs

Without the need for fast, power-hungry processors, the cost of introducing these features comes way down. Combined with low energy use and internet-free, on-device computing, it’s now possible to incorporate advanced food detection capabilities into a range of products at multiple price points.


There’s a multitude of tasks involved in preparing a meal. By going beyond preserving and cooking food, refrigerators will begin to behave less like an appliance, and more like a virtual sous-chef. As a company that’s invested a significant amount of research in this area, we’d like to say, “Bon-Appetit!”

Visit us to learn how the next generation in food detection technology can boost the performance of your Smart Appliance.


2010 was a milestone year for face recognition. That’s when Facebook introduced a photo tagging feature with the ability to identify individuals in a photograph by matching faces to the pictures stored in a user’s profile. The feature was popular but frequently inaccurate. Getting the best results required the people in the photograph to look directly into the lens. Accuracy was also dependent on the quality of the user’s Facebook profile picture and other photos they were tagged in. Blurs caused by camera motion, reflective surfaces and light levels all had a negative impact on performance. But it was a start.

Flash forward nine years. Face recognition has been adopted by several industries, most notably in the areas of law enforcement and home / commercial security. Biometric measures such as retinal scans and voice analysis are also useful in security applications, but face identification is still the preferred method.

Other biometric measures require users to physically interact with a device or to voluntarily position themselves next to a sensor. Think of pressing your palm against a reader, speaking directly into a microphone, or staring, unblinking, into a lens while a computer scans your retina. Measurements like these are impractical when it comes to identifying one individual in a large group of people moving through an airport.

Despite the inherent advantages of face recognition, the technology is still in its infancy. Here are four areas where the standard approach has failed to live up to its potential.

The limitations of standard face recognition technology

1) Low accuracy

Camera angles have a strong influence on how successfully a face can be detected and identified. Most of the existing models need to compare multiple angles, including profiles and full-frontal views, to achieve the best results. Facial hair, makeup, scarves, and hats can cause trouble. Ideally, a subject must hold still, remove their eyeglasses and look into the lens; otherwise, a number of photos have to be taken from different angles. This makes training for face recognition extremely difficult.

2) Compute requirements

Whether it’s analyzing images to run the model or training a new model, traditional recognition algorithms need to run on a robust processor with a neural or GPU accelerator – and they need a persistent, high-bandwidth connection to the cloud. In fact, during training, most face recognition algorithms require multiple photos from thousands of people. Once the parent model is trained, the model still has to be pushed to the cloud or run on expensive hardware to work for your specific face. This causes latency and security issues and delivers a poor user experience.

3) Inflexible deployment options

Standard technology requires developers to accommodate the need for fast processors and access to cloud-hosted servers. That rules out deploying face apps in remote areas and on cheap devices. This limits the applications for face identification and forces developers using computer vision apps to make compromises on user experience, responsiveness, accuracy, and data security.

4) High cost

Unsurprisingly, incorporating face recognition capabilities into an existing app often requires a hardware upgrade.

Self-contained deep learning models

At Xnor, we realized that eliminating these restrictions required a completely new approach, so we started at the beginning: the learning models. Our computer vision technology is trained to operate in a range of environmental conditions. The resulting facial signatures can accurately analyze faces in live video streams at more than 30 FPS on GPU-enabled hardware and at 4 FPS on resource-constrained hardware, such as a CPU, regardless of changing lighting conditions, movement or camera angles.

In real life, people don’t stare directly at a lens, without moving or waiting for an algorithm to do its work. People are in motion. Expressions can change several times in the time it takes you to read this paragraph. Faces can be partially obscured by eyeglasses, a scarf, a hat, makeup or even earrings. Our deep learning models ensure accuracy regardless of the subject’s skin tone or fashion sensibilities.

Even better, the training for the individual face can happen completely on-device, with as few as three images. This means you don’t need to take hundreds or thousands of photos of a face or use a large number of frames from a video.  This makes our solution completely edge-enabled. There’s no need to rely on a cloud solution or risk downtime with network and service outages, and most importantly, it makes face identification possible for cheap hardware.
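To make the idea of enrolling a face from only a few images more concrete, here is a minimal Python sketch. It assumes that some face model has already turned each image into a fixed-length embedding vector, and it uses a common, generic approach (averaging the embeddings into a template and matching by cosine similarity). This is an illustrative assumption, not a description of Xnor’s actual method; the function names, the three-dimensional vectors and the 0.8 threshold are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def enroll(embeddings):
    """Average a few embeddings of one person into a unit-length template."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return normalize(mean)


def cosine_similarity(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def identify(probe, templates, threshold=0.8):
    """Return the best-matching enrolled name, or None if nothing is close enough."""
    best_name, best_score = None, threshold
    for name, template in templates.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy 3-dimensional embeddings; real embeddings are much longer.
templates = {
    "alice": enroll([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0]]),
    "bob": enroll([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]),
}
print(identify([0.95, 0.05, 0.05], templates))  # close to alice's template
print(identify([0.5, 0.5, 0.7], templates))     # not close to anyone enrolled
```

Because enrollment here is just an average of three vectors, it needs no cloud round trip, which is the property the paragraph above describes.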

Speed and reliability

Xnor’s apps can detect and identify individual faces in real-time, on-device (at up to 5 frames per second), utilizing a commodity camera, or on embedded hardware running on a processor as small as 1 GHz. In fact, we’re currently running face recognition on an Ambarella S5L commodity chip. Without the need for an internet connection, the real applications for these ML algorithms are enormous. It’s now possible to use advanced face identification features in remote locations, or in situations where maximizing uptime is essential.


Our face recognition algorithms and training models can be run completely on-device, using a low-end processor. Personal information is stored on the device, not transmitted to the cloud for processing, where it can become vulnerable to security breaches. Taken together, these capabilities allow developers to build face identification apps that not only offer increased performance but also go further in protecting sensitive data.

A new approach yields new capabilities

In addition to enhancing performance, Xnor’s technology allows developers to integrate new capabilities into their applications, such as the ability to determine the subject’s age or gender, which direction they are looking, and whether the subject is happy, angry, scared, sad or surprised. This new technology will create new opportunities for developers to use face recognition in more powerful ways, in more scenarios, and, most importantly, on more devices.

Visit us to learn how to incorporate the next generation of face recognition into a broad range of applications.

Machine vision has long been the holy grail for unlocking a number of real-world use cases for AI – think of home automation and security, autonomous vehicles, crop picking robots, retail analytics, delivery drones or real-time threat detection. However, until recently, AI models for computer vision have been constrained to expensive, sophisticated hardware that often contains neural accelerators, or these models had to be processed in the cloud with GPU- or TPU-enabled servers. Through Xnor’s groundbreaking research, in coordination with the Allen Institute for AI, on YOLO, Xnor-Net, YOLO 9000, Tiny YOLO and other AI models, we’ve been able to move machine learning from expensive hardware and the cloud to simple, resource-constrained devices that can operate completely on-device and autonomously. This means you can run sophisticated deep learning models on embedded devices without the need for neural processors and without the need for a data connection to the cloud. For example, on a 1.4 GHz dual-core ARM chip with no GPU, we can run object detection with CPU utilization of only 55%, a memory footprint of only 20MB, and power consumption of less than 4.7W.

Object Detection

Let’s dig into one specific model that we’ve built – object detection. Object detection is a type of AI model that identifies categories of objects that are present in images or videos – think people, automobiles, animals, packages, signs, lights, etc. – and then localizes their presence by drawing a bounding box around them. Utilizing a CNN (convolutional neural network), the model is able to simultaneously draw multiple bounding boxes and then predict classification probabilities for those boxes based on a trained model.
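To show what the output of such a model can look like in practice, here is a minimal Python sketch. It assumes the detector returns, for each candidate box, the box coordinates plus a probability per category; the `Detection` class, the `decode` function and the 0.5 confidence cutoff are hypothetical names and values chosen for illustration, not Xnor’s API.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def top_class(class_probs):
    """Pick the category with the highest predicted probability."""
    label = max(class_probs, key=class_probs.get)
    return label, class_probs[label]


def decode(raw_predictions, min_confidence=0.5):
    """Keep only boxes whose best category clears the confidence cutoff."""
    detections = []
    for box, class_probs in raw_predictions:
        label, confidence = top_class(class_probs)
        if confidence >= min_confidence:
            detections.append(Detection(label, confidence, box))
    return detections


# Made-up model output: three candidate boxes with per-category probabilities.
raw = [
    ((10, 20, 110, 220), {"person": 0.92, "dog": 0.05, "car": 0.03}),
    ((200, 50, 260, 90), {"person": 0.30, "dog": 0.35, "car": 0.35}),
    ((300, 100, 420, 180), {"person": 0.02, "dog": 0.08, "car": 0.90}),
]
for det in decode(raw):
    print(det.label, det.confidence, det.box)
```

The middle box is dropped because no category is confident enough. A production pipeline would typically also merge overlapping boxes, which is omitted here.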

Traditionally these models have been resource intensive because of the model architecture – the number of layers (convolution, pooling and fully connected) – and the fact that most CNNs use 32-bit precision floating-point operations.

Xnor’s approach is different and we’ve summarized this approach below.

Xnorization (How It Works)

Our models are optimized to run more efficiently and up to 10x faster through a process we call Xnorization. This process contains five essential steps. First, we binarize the AI model. Second, we design a compact model. Third, we prune the model. Fourth, we optimize the loss function. Fifth, we optimize the model for the specific piece of hardware.

Let’s explore each of these in further detail.

Model Binarization

To reduce the compute required to run our object detection models, the first step is to retrain these models into a binary neural network called Xnor-Net. In Binary-Weight-Networks, the filters are approximated with binary values, which yields memory savings of up to 32x. These binary networks are simple, accurate, and efficient. In fact, the classification accuracy with a Binary-Weight-Network version of AlexNet is only 2.9% less than the full-precision AlexNet (in top-1 measure).

In the full Xnor-Net, both the filters and the input to convolutional layers are binary, so the convolutions can be approximated using primarily binary operations. This makes convolutional operations up to 58x faster while keeping the memory savings of up to 32x. Finally, the operations are parallelized in CPUs and optimized to reduce model size. This reduces each floating-point operation to something as small as a binary operation, making it hyper efficient. Once completed, we have state-of-the-art accuracy for models that:

  • Are 10x faster
  • Can be 20-200x more power efficient
  • Need 8-15x less memory than traditional deep learning models
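The arithmetic behind binarization can be sketched in a few lines of Python. The example below is a simplified illustration, not Xnor’s production code: `binarize` approximates real-valued weights as a scale factor times their signs (using the mean absolute value as the scale), and `binary_dot` computes the dot product of two +1/-1 vectors using only an XNOR and a bit count. All function names are hypothetical.

```python
def binarize(weights):
    """Approximate real weights as alpha * sign(w), with alpha = mean |w|."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def pack_bits(signs):
    """Pack a +1/-1 vector into an integer: bit i is 1 when signs[i] is +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
reference = sum(x * y for x, y in zip(a, b))
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(reference, fast)

alpha, signs = binarize([0.5, -1.5, 1.0, -1.0])
print(alpha, signs)
```

Each weight is stored in one bit instead of a 32-bit float, which is where a memory saving of up to 32x comes from, and a single machine instruction can process many of these one-bit multiplications at once.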

Compact Model Design

The second critical piece is to design models that are compact. Without compact model design, the compute required for the model remains high. Our Xnorized models utilize a compact design to reduce the number of required operations and model size. We design as few layers and parameters into the model as possible. The model design is dependent on the hardware, but we take the same fundamental approach for each model.

Sparse Model Design

Third, a variety of techniques are used to prune the model’s operations and parameters. This reduces the model size and minimizes the operations necessary to provide accurate results.  Here, most of the parameters are assigned zero as their value. The remaining parameters, which are very few, will be non-zero. By doing this, we can ignore all the computations for the zero parameters and only save the indexes and the values for the non-zero parameters.
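Here is a minimal Python sketch of that idea. It assumes simple magnitude-based pruning (zeroing every weight below a threshold), which is only one of many possible pruning techniques and is not necessarily the one Xnor uses; the function names and the 0.1 threshold are hypothetical.

```python
def prune(weights, threshold):
    """Set every weight whose magnitude is below the threshold to zero."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def to_sparse(weights):
    """Keep only the (index, value) pairs of the non-zero weights."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


def sparse_dot(sparse_weights, inputs):
    """Dot product that skips all computations for the zero weights."""
    return sum(w * inputs[i] for i, w in sparse_weights)


weights = [0.02, -0.9, 0.0, 0.01, 0.5, -0.03]
pruned = prune(weights, threshold=0.1)
sparse = to_sparse(pruned)
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

print(sparse)                      # only two of the six weights survive
print(sparse_dot(sparse, inputs))  # two multiplications instead of six
```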

Optimized Loss Functions

Fourth, we’ve built groundbreaking new techniques for retraining models on labels predicted by another model. Techniques like Label Refinery greatly increase accuracy by optimizing loss functions for a distribution of all possible categories. With Label Refinery, we actually rely on another neural network model to produce labels. These labels have the following properties: 1) Soft; 2) Informative; and 3) Dynamic.

Soft labels are able to categorize multiple objects in an image and can determine what percentage of the image is represented by what object category. Informative labels provide a range of categories with the relevant confidence, so, for example, if something is mislabeled as a cat, you can know that the second highest category is dog. Dynamic labels allow you to ensure that the random crop is labeling the correct object in the image by running the model dynamically as you sample over the image.
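The difference between a hard label and a soft label can be sketched in Python as follows. This is a generic illustration of training against a probability distribution produced by another model, under the assumption that the loss is a standard cross-entropy; the scores, category names and function names are made up and do not come from Label Refinery itself.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Loss of a prediction against a target distribution (hard or soft)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


categories = ["cat", "dog", "car"]

hard_label = [1.0, 0.0, 0.0]             # a single human-assigned category
soft_label = softmax([2.0, 1.5, -1.0])   # hypothetical scores from a labeling model
student = softmax([1.0, 1.2, -0.5])      # hypothetical scores from the model being trained

# The soft label is informative: it also ranks the runner-up categories.
ranked = sorted(zip(categories, soft_label), key=lambda kv: kv[1], reverse=True)
print(ranked)

print(cross_entropy(hard_label, student))  # only the "cat" probability matters
print(cross_entropy(soft_label, student))  # every category contributes
```

To make the labels dynamic as well, the labeling model would be run again on each random crop during training, so the soft label always describes what is actually visible in that crop.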

You can learn more about this technique here.

Hardware Optimization

Lastly, because we’re building models for all sorts of embedded devices, we need to optimize the model for different hardware platforms to provide the highest efficiency across a broad range of Intel and Arm CPUs, graphics processing units (GPUs), and FPGA devices. For example, we’ve partnered with Toradex and Ambarella to build person detection models that can be viewed here and here.


By Xnorizing our models, we’re able to achieve cutting edge results. We have miniaturized models that are < 1MB in size and can run on the smallest devices. The models have fewer operations, faster inferences and higher frames per second, and low latency because they are running on device. And, we have fewer joules per inference which translates to lower power consumption.

Andrew, one of Xnor’s engineers, showing a demo of Image Segmentation running off a webcam video feed, using 60 MB of memory and just the CPU – no GPU necessary.

With so many meetings involving participants from multiple locations, it’s no surprise that video conferencing has quickly become an essential collaboration tool. Best-in-class solutions allow users to share screens, access other desktops, chat, exchange files, and communicate via digital whiteboards. When done right, these capabilities add up to more than the long-distance equivalent of a face-to-face meeting. They provide a platform for a participatory experience that can break down corporate silos and boost productivity.

However, traditional video conferencing is plagued with a long list of vexing issues. A cluttered office or background distractions can draw a viewer’s attention away from the speaker. Poor image quality can detract from the content being presented. Frustrated with these technical and experiential imperfections, participants often use the time to catch up on their email and lose focus on the meeting.

Introducing AI-powered image segmentation

Image segmentation improves video by identifying the boundaries of people and objects, and isolating those pixels to enhance the focus or brightness separately from the rest of the image. It’s a technique that’s been around for years, but until now, two factors have delayed wide adoption.

First, traditional image segmentation involves billions of floating-point operations. That requires a significant amount of computing power, with a fast processor augmented by a GPU or a neural accelerator chip. Second, a lack of good training data and models makes it time-consuming to achieve smooth output. And when you do have enough data, training a model on it successfully requires expensive cloud resources. Often, only a large company can afford to invest the time and resources necessary to build image segmentation into their products. Xnor’s segmentation technology overcomes these blockers to give video conference providers precise control over a world-class video conferencing experience. Here’s what makes our image segmentation technology so revolutionary:

Flexible deployment options

Xnor can perform real-time image segmentation on embedded devices running on a 1 GHz Arm processor. For complex AI tasks, Xnor can also take advantage of GPUs, accelerators and neural processors running on servers or in the cloud.

A revolutionary learning model

Xnor image segmentation partitions video frames into distinct regions containing an instance of an object. The object may be a person, vehicle, animal, or any one of hundreds of objects. The attributes for each type of object are derived using an image-based training model. Xnor’s technology uses optimized pre-trained models and tuned algorithms to achieve substantially higher performance and accuracy than other models. Our core neural network model is the fastest and most accurate in the industry. Together, these deep learning models and revolutionary algorithms enable AI tasks to execute, in real-time, on streaming video and on form-factors as small as mobile handsets.

Low processor requirements

Traditional object detection and segmentation require an application to perform billions of floating-point operations. Xnor’s AI processing technology can execute up to 9x faster than other computer vision solutions by utilizing performance breakthroughs our researchers have discovered, such as YOLO object detection and XNOR-Net image classification. That kind of performance delivers an enhanced user experience on a wide variety of devices, including webcams, mobile phones, and even dedicated conferencing hardware running commodity processors.

AI image segmentation introduces new video conferencing capabilities

Xnor’s technology provides video conference providers with a new set of tools to enhance video conferencing, including:

Scene Optimization

Improve video quality by dynamically adjusting the exposure, brightness, contrast, and sharpness of different portions of the image.
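As a simple illustration of adjusting one portion of the image separately from the rest, the Python sketch below brightens only the pixels that a segmentation mask marks as belonging to a person. It assumes a grayscale frame stored as nested lists and a mask of 0s and 1s with the same shape; the function name and the gain value are hypothetical.

```python
def brighten_region(image, mask, gain=1.2):
    """Scale the brightness of pixels where mask is 1, clamped to 0..255.

    image: rows of grayscale values (0..255).
    mask:  rows of 0/1 flags with the same shape, assumed to come from a
           segmentation model (1 = pixel belongs to the person).
    """
    return [
        [min(255, round(p * gain)) if m else p for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


frame = [
    [100, 100, 100],
    [100, 200, 100],
    [100, 220, 100],
]
person_mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]
enhanced = brighten_region(frame, person_mask, gain=1.2)
print(enhanced)
```

The same pattern applies to contrast or sharpness: compute the adjustment everywhere the mask is set and leave the remaining pixels untouched.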

Background Blur and Replacement

A successful video conference has to hold the viewer’s attention, but distractions can make that difficult. You may want to encourage users to focus on the speaker, or perhaps a speaker has recorded a presentation in their office – and the whiteboard behind them contains sensitive information.

With Xnor’s real-time image segmentation you can dynamically isolate people and objects in a live video, then superimpose them anywhere in 2D, VR, or even augmented reality.

See it for yourself

See how easy it can be to transform ordinary video into an experience that will engage your viewers from the first frame to the last. Visit us to learn more.

Imagine being able to create more focus in your video conference, or transport users to a different world in a mobile app experience. Image segmentation, a computer vision machine learning task, makes this a reality by creating pixel-accurate image masks of detected objects. Computer vision is progressing at such a rapid rate that these tasks can now run on mobile handsets, and even Raspberry Pi-like devices with simple ARM processors. What’s most exciting is that developers can start creating these new experiences today. Let’s take a moment to think about what’s possible:

Social, Retail & Gaming Scenarios

Some of the most exciting new opportunities are in social, retail, and AR/VR. For social, gaming and photography apps — imagine superimposing users into completely different landscapes and scenery, or immersing them into a game. In retail, what if you could transport the user into a virtual fitting room or let them interact with products in a virtual showroom?

Image Segmentation for mobile & AR experiences

Productivity & Videoconferencing

Image segmentation can also enhance online meetings by eliminating background distractions. This is done by blurring out or completely changing the background in the video stream. This allows users to preserve privacy, make the environment appear more professional, or even make a conference call more productive by placing people together into a virtual conference room.
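To make the blur-or-replace idea concrete, here is a minimal sketch in Python with NumPy. It assumes a per-pixel person mask (1 for the speaker, 0 for everything else) is already available for each frame. The function names, the simple box blur, and the array layout are illustrative assumptions, not Xnor’s implementation.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Blur an H x W x C float image with a (2*radius+1)^2 box filter."""
    k = 2 * radius + 1
    h, w = image.shape[:2]
    # Repeat edge pixels so the output keeps the input size.
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(image, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def composite(frame: np.ndarray, mask: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Keep frame pixels where mask == 1 and background pixels where mask == 0."""
    m = mask.astype(np.float64)[..., None]  # H x W -> H x W x 1 for broadcasting
    return m * frame + (1.0 - m) * background


def blur_background(frame: np.ndarray, mask: np.ndarray, radius: int = 2) -> np.ndarray:
    """Leave the masked person sharp and blur everything else."""
    return composite(frame, mask, box_blur(frame, radius))


def replace_background(frame: np.ndarray, mask: np.ndarray, new_background: np.ndarray) -> np.ndarray:
    """Place the masked person on top of a different background image."""
    return composite(frame, mask, new_background)
```

Both effects reduce to the same compositing step; only the image used for the background pixels changes. A production pipeline would likely soften the mask edge to avoid a hard outline around the speaker.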

How It Works

Image segmentation partitions images and video frames into distinct regions, each containing the pixels of one instance of an object. The regions are identified by models trained on images to recognize different types of objects, such as people, vehicles, and animals. The result is a binary mask: a black-and-white image showing where the segmentation algorithm finds a match.

Segmentation mask isolating the dancer in the frame
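As a rough illustration of what a binary mask is, the sketch below assumes the model produces a per-pixel score between 0 and 1 for one object class. That output format, the 0.5 threshold, and the function names are assumptions made for the example. It thresholds the scores into a 0/1 mask, renders the mask as a black-and-white image, and finds the box that encloses the match.

```python
import numpy as np


def binary_mask(scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn H x W per-pixel scores in [0, 1] into a 0/1 mask."""
    return (scores >= threshold).astype(np.uint8)


def mask_to_image(mask: np.ndarray) -> np.ndarray:
    """Render a 0/1 mask as an 8-bit image: white (255) on a match, black (0) elsewhere."""
    return mask * np.uint8(255)


def bounding_box(mask: np.ndarray):
    """Return (top, left, bottom, right) of the matched pixels, or None if there are none."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])
```

Given a score map where a 3 x 3 block of pixels scores 0.9 and the rest score 0, `binary_mask` marks exactly those nine pixels, and `bounding_box` returns the corners of that block.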

Improving Training Data & Performance Optimization

With Xnor’s real-time image segmentation you can dynamically isolate people in live video and superimpose them anywhere in 2D, VR, or augmented reality. Capable of running solely on the CPU of devices or servers, Xnor’s segmentation algorithm can also take advantage of GPUs, accelerators and neural processors.

This article by our CTO, Mohammad Rastegari, shows just one of the ways we are improving deep learning accuracy and performance on devices. Advances like these also power our image segmentation offering, executing efficiently enough to run on mobile handsets and streaming camera video. Internal benchmarks indicate our approach performs up to 9x faster than standard solutions.

Until now, segmentation has been difficult to accomplish due to a lack of good training data, a lack of accurate deep learning models, and high processing requirements. The shortage of training data has made it nearly impossible for all but the largest companies to invest the time and resources necessary to create deep learning models that can identify and segment people and objects with high accuracy.

Additionally, traditional object detection and segmentation tasks perform billions of compute-intensive floating-point operations, which require bigger processors enhanced with GPUs or AI accelerator chips.

Xnor solves these problems by providing optimized pre-trained models and tuned algorithms that deliver higher performance and accuracy than other state-of-the-art models. By precisely training deep learning models and reducing the complexity of the algorithms, our AI scientists enable segmentation tasks to be executed in real time on streaming video on form factors as small as mobile handsets.

Want to learn more?

Visit us to learn more about image segmentation.

At Xnor we work on every aspect of computing platforms to optimize artificial intelligence and machine learning, from the software down to the hardware. We have a diverse set of skills, so it is easy to quickly build a prototype for an end-to-end project. Saman Naderiparizi, PhD, is a hardware engineer and is here to share an example of the types of problems our team solves.

Here he describes one of our projects showing how we were able to take a Raspberry Pi Zero and turn it into a real-time edge AI device:

Before joining Xnor, I was an electrical engineering PhD student at the University of Washington working on two core projects: developing cameras that harvest energy from radio signals such as WiFi, and streaming HD video without the need for any batteries. Leveraging these skills, I develop low-power hardware platforms to run deep neural network models.


The Raspberry Pi Zero powering our object classifier and emotion detection.


To demonstrate Xnor’s efficient machine learning models, I want to show you what we humorously call our “Thing Detector”. It is a battery-powered device running on a $5 Raspberry Pi Zero that utilizes the Raspberry Pi camera to do complex and accurate object classification and emotion detection in real time, while using a fraction of the compute capability present in desktop computers and cloud servers.


The xnorized models can recognize 80 object classes and detect emotions from facial expressions.


An image is fed to the Pi Zero and an inference is made. For object classification, if it sees a person, it chirps “person”. It can also infer human emotions such as happy, sad, angry, and scared. This is made possible by Xnor’s efficient binarized models working in concert with our efficient inference engine.

The model running on this tiny computer is capable of recognizing 80 types of objects at several frames a second and only takes up a few megabytes of space. It’s a low-power device, so it can run for about 5 hours on an onboard battery that has the same capacity as two AA batteries.

To understand what’s happening in the real world, a computer needs to process input from a variety of sources such as imagery, video, and audio. This requires computationally demanding processes to run quickly — all made more challenging when running on limited hardware like our Raspberry Pi Zero.

Taking an image and accurately inferring that it contains a human or other physical objects, or inferring emotion from a face, previously required billions of floating-point operations…per image. Depending on the usage scenario, this may need to be done on hundreds of frames per second in real time. Typically these workloads run on graphics processing units (GPUs). In contrast, an xnorized network contains binary (0 or 1) values, which allows floating-point operations such as multiply/add to be reduced to simple operations such as XNOR and POPCOUNT. Additionally, because the bit-width is reduced from 32-bit floating point to 1-bit binary values, the memory requirements of xnorized models shrink significantly. This is where the performance advantage of xnorized networks over traditional neural networks comes from. What we’re doing here would have needed significant hardware or cloud resources just a short while ago.
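To show how a multiply/add turns into XNOR and POPCOUNT, here is a small Python sketch, not our inference engine. It assumes each stored bit encodes a +1 or -1 value (bit 1 for +1, bit 0 for -1), which is the convention used in the XNOR-Net formulation. The function names are made up for this illustration.

```python
def pack_signs(values) -> int:
    """Pack a sequence of +1/-1 values into an integer; bit i is 1 where values[i] is +1."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def xnor_popcount_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two length-n +1/-1 vectors, computed from their packed bits."""
    low_n = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & low_n   # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")      # POPCOUNT: number of matching positions
    # Each match contributes +1 and each mismatch -1 to the dot product.
    return 2 * matches - n


def weight_bytes(num_weights: int, bits_per_weight: int) -> int:
    """Storage needed for the weights, rounded up to whole bytes."""
    return (num_weights * bits_per_weight + 7) // 8
```

For any two +1/-1 vectors, `xnor_popcount_dot` returns the same value as summing the element-wise products, but uses one XNOR and one bit count per machine word instead of one multiply and one add per element. `weight_bytes` shows the storage side: a million 32-bit weights take 4,000,000 bytes, while a million 1-bit weights take 125,000 bytes, a 32x reduction.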

While the prototype I’m showing you is an internal demo, we have successfully deployed production-quality models for our customers. We’re currently enabling new capabilities on $2 embedded chips and cameras that are components of everyday consumer appliances, mobile handsets, and home security devices.

I hope this gives you a glimpse into the intriguing opportunities we work on. If you find machine learning and AI as fascinating as I do, come join our team!

Nearly forty years ago Paul Allen and Bill Gates set an audacious goal to put a computer on every desk and in every home. Since then we’ve seen our lives change as computers became increasingly available, miniaturizing from expensive mainframes to tremendously powerful handheld smartphones that nearly anyone can access.

I believe we’re on the brink of a similar breakthrough with artificial intelligence, and we are about to witness the next computer revolution. Until now, AI has required vast amounts of computing power to create and run deep learning models, relegating it to research, confining it to expensive data centers, or leaving it under the control of an elite group of cloud computing vendors. Where AI is truly needed is at the edge (cameras, sensors, mobile devices and IoT), where AI can interact with the real world in real time.

Jon Gelsey, Carlo C del Mundo, and Stephanie Wang in Xnor’s office

I’m excited and honored to be joining Xnor as CEO, working alongside Xnor’s founders, Professor Ali Farhadi and Dr. Mohammad Rastegari, to enable AI on billions of devices such as cameras, phones, wearables, autonomous vehicles and IoT devices, where it previously wasn’t feasible. Ali and Mohammad’s breakthrough discoveries have dramatically shrunk the compute requirements for advanced AI functions such as computer vision and speech recognition. Xnor is revolutionizing what’s possible on edge devices, delivering sophisticated AI on small and inexpensive devices, such as powerful computer vision even on something like a $5 Raspberry Pi Zero. We are already working with companies accomplishing amazing things in autonomous vehicles, home security, and mobile devices.

Can AI Save Lives?

I’m also incredibly optimistic about the good that AI can bring to the world. Movies and science fiction often paint a dystopian future of how AI can be misused. Instead, I see many possibilities to improve lives, and perhaps even save them. One of my friends is an avid sailor, and I sometimes worry about what would happen if his boat capsized in a storm. In similar incidents in the recent past, people innovated by organizing crowdsourcing efforts, enlisting volunteers to scour satellite images of oceans spanning thousands of square miles for signs of survivors. As noble as these efforts were, they still amounted to looking for a needle in a haystack, with human eyes susceptible to fatigue reviewing imagery that quickly became out of date. I envision a future, already possible today, where autonomous search and rescue drones tirelessly traverse large expanses of ocean, equipped with cameras and utilizing deep learning to detect human life, boat wreckage, and survival gear in real time to expedite a rescue.

Imagine drones using AI for search and rescue missions

What else is in the realm of possibility to improve our existence? One of the emerging areas of AI is the detection of human emotion and behavioral intent to improve retail experiences, utilizing deep learning models that measure consumer intent and engagement through movement and behavior. Those same concepts could be used to alert us to potential terrorist activity and human trafficking, and to identify people in distress.

Most exciting journeys are rarely straight and can take a few surprise turns, but they are always memorable and worth venturing on. I’m looking forward to starting this one.

Learn more in our press release.

About Xnor

Xnor brings highly efficient AI to edge devices such as cameras, cars, drones, wearables and IoT devices. The Xnor platform allows product developers to run complex deep learning algorithms, previously restricted to the cloud, locally on a wide range of mobile and low-energy devices. Xnor is a venture-funded startup, founded on award-winning research conducted at the University of Washington and the Allen Institute for Artificial Intelligence. Xnor’s industry-leading technology is used by global corporations in aerospace, automotive, retail, photography, and consumer electronics.