Imagine being able to create more focus in your video-conference, or transport users to a different world in a mobile app experience. Image segmentation, a computer vision machine learning task, makes this a reality by creating pixel-accurate image masks of detected objects. Computer vision is progressing at such a rapid rate that these tasks can now run on mobile handsets, and even Raspberry-Pi like devices with simple ARM processors. What’s most exciting is that developers can start creating these new experiences today. Let’s take a moment to think about what’s possible:

Social, Retail & Gaming Scenarios

Some of the most exciting new opportunities are in social, retail, and AR/VR. For social, gaming and photography apps — imagine superimposing users into completely different landscapes and scenery, or immersing them into a game. In retail, what if you could transport the user into a virtual fitting room or let them interact with products in a virtual showroom?

Image Segmentation for mobile & AR experiences

Productivity & Videoconferencing

Image segmentation can also enhance online meetings by eliminating background distractions. This is done by blurring out or completely changing the background in the video stream. This allows users to preserve privacy, make the environment appear more professional, or even make a conference call more productive by placing people together into a virtual conference room.

How It Works

Image segmentation partitions images and video frames into distinct regions containing pixels of an instance of an object. These attributes are derived by training models with images to identify different types of objects like people, vehicles, and animals. A binary mask of an image is created, which is represented by a black and white image showing where the segmentation algorithm finds a match.

Segmentation mask isolating the dancer in the frame

Improving Training Data & Performance Optimization

With Xnor’s real-time image segmentation you can dynamically isolate people in live video and superimpose them anywhere in 2D, VR, or augmented reality. Capable of running solely on the CPU of devices or servers, Xnor’s segmentation algorithm can also take advantage of GPUs, accelerators and neural processors.

This article by our CTO, Mohammad Rastegari, shows just one of the ways we are improving deep learning accuracy and performance on devices. Advances like these also power our image segmentation offering, executing efficiently enough to run on mobile handsets and streaming camera video. Internal benchmarks indicate our approach performs up to 9x faster than standard solutions.

Until now, segmentation has been difficult to accomplish due to the lack of accurate deep learning models and high processing requirements. This lack of good training data has made it nearly impossible, except for the largest companies, to invest the time and resources necessary to create deep learning models that can identify and segment people and objects with high accuracy.

Additionally, traditional object detection and segmentation tasks perform billions of compute-intensive, floating point operations which require bigger processors that are enhanced with GPUs or AI accelerator chips.

Xnor solves these problems by providing optimized pre-trained models and tuned algorithms that perform with higher performance and accuracy than other state-of-the-art models. By precisely training deep learning models and reducing the complexity of the algorithms, our AI scientists enable segmentation tasks to be executed in real-time on streaming video on form-factors as small as mobile handsets.

Want to learn more?

Visit us at or click here to learn more about image segmentation.