Andrew, one of Xnor’s engineers, shows a demo of image segmentation running off a webcam video feed, using 60 MB of memory and just the CPU – no GPU necessary.

With so many meetings involving participants from multiple locations, it’s no surprise that video conferencing has quickly become an essential collaboration tool. Best-in-class solutions allow users to share screens, access other desktops, chat, exchange files, and communicate via digital whiteboards. When done right, these capabilities add up to more than the long-distance equivalent of a face-to-face meeting. They provide a platform for a participatory experience that can break down corporate silos and boost productivity.

However, traditional video conferencing is plagued with a long list of vexing issues. A cluttered office or background distractions can draw a viewer’s attention away from the speaker. Poor image quality can detract from the content being presented. Frustrated with these technical and experiential imperfections, participants often use the time to catch up on their email and lose focus on the meeting.

Introducing AI-powered image segmentation

Image segmentation improves video by identifying the boundaries of people and objects, and isolating those pixels to enhance the focus or brightness separately from the rest of the image. It’s a technique that’s been around for years, but until now, two factors have delayed wide adoption.

First, traditional image segmentation involves billions of floating-point operations, which demands significant computing power: a fast processor augmented with a GPU or a neural accelerator chip. Second, a lack of good training data and models makes it time-consuming to achieve smooth output. And even when you do have enough data, training models successfully requires running on expensive cloud resources. Often, only a large company can afford to invest the time and resources necessary to build image segmentation into its products. Xnor’s segmentation technology overcomes these blockers, giving video conference providers the precise control needed to deliver a world-class video conferencing experience. Here’s what makes our image segmentation technology so revolutionary:

Flexible deployment options

Xnor can perform real-time image segmentation on embedded devices running on a 1 GHz Arm processor. For complex AI tasks, Xnor can also take advantage of GPUs, accelerators and neural processors running on servers or in the cloud.

A revolutionary learning model

Xnor image segmentation partitions video frames into distinct regions, each containing an instance of an object. The object may be a person, vehicle, animal, or any one of hundreds of object types. The attributes for each type of object are derived using an image-based training model. Xnor’s technology uses optimized pre-trained models and tuned algorithms to achieve substantially higher performance and accuracy than other models. Our core neural network model is the fastest and most accurate in the industry. Together, these deep learning models and revolutionary algorithms enable AI tasks to execute in real time on streaming video, on form factors as small as mobile handsets.

Low processor requirements

Traditional object detection and segmentation require an application to perform billions of floating-point operations. Xnor’s AI processing technology can execute up to 9x faster than other computer vision solutions by utilizing performance breakthroughs our researchers have pioneered, such as YOLO object detection and XNOR-Net image classification. That kind of performance delivers an enhanced user experience on a wide variety of devices, including webcams, mobile phones, and even dedicated conferencing hardware running commodity processors.
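The core idea behind XNOR-Net can be sketched in a few lines: once weights and activations are binarized to {-1, +1}, a dot product reduces to a bitwise XNOR followed by a popcount, replacing floating-point multiplies with cheap bit operations. The toy sketch below illustrates the arithmetic only – it is not Xnor’s production implementation:

```python
def binarize(values):
    """Pack a list of {-1, +1} values into an int bitmask (+1 -> bit set)."""
    mask = 0
    for i, v in enumerate(values):
        if v > 0:
            mask |= 1 << i
    return mask

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors from their bitmasks."""
    # XNOR marks the positions where the two signs agree.
    agree = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    matches = bin(agree).count("1")   # popcount
    # Each agreement contributes +1 to the dot product, each disagreement -1.
    return 2 * matches - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(xnor_dot(binarize(a), binarize(b), len(a)))  # prints 0, same as sum(x*y)
```

On real hardware the popcount maps to a single instruction, which is where the large speedups over 32-bit floating-point convolutions come from.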

AI image segmentation introduces new video conferencing capabilities

Xnor’s technology provides video conference providers with a new set of tools to enhance video conferencing, including:

Scene Optimization

Improve video quality by dynamically adjusting the exposure, brightness, contrast, and sharpness of different portions of the image.
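Given a segmentation mask, adjusting only one region of the frame is straightforward. A minimal NumPy sketch of the idea – the function name, gain, and bias values here are illustrative, not Xnor’s API:

```python
import numpy as np

def brighten_region(frame, mask, gain=1.2, bias=10.0):
    """Boost brightness only where mask == 1 (e.g., the speaker's pixels)."""
    out = frame.astype(np.float32)
    region = mask.astype(bool)                 # HxW boolean person mask
    out[region] = out[region] * gain + bias    # linear gain/bias adjustment
    return np.clip(out, 0, 255).astype(np.uint8)

# Example: brighten only the left half of a mid-gray frame.
frame = np.full((4, 4, 3), 100, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:, :2] = 1
result = brighten_region(frame, mask)
```

The same masked-update pattern extends to exposure, contrast, or sharpness: compute the adjustment over the whole frame, then apply it only inside (or outside) the mask.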

Background Blur and Replacement

A successful video conference has to hold the viewer’s attention, but distractions can make that difficult. You may want to encourage users to focus on the speaker, or perhaps a speaker has recorded a presentation in their office – and the whiteboard behind them contains sensitive information.

With Xnor’s real-time image segmentation, you can dynamically isolate people and objects in a live video, then superimpose them anywhere in 2D video, VR, or even augmented reality.
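The superimposing step is a per-pixel selection between the live frame and a replacement scene. A minimal NumPy sketch – the names and shapes are illustrative, not Xnor’s API:

```python
import numpy as np

def replace_background(frame, mask, background):
    """Keep person pixels (mask == 1); take everything else from background."""
    m = mask[..., None].astype(bool)   # broadcast the HxW mask over RGB
    return np.where(m, frame, background)

frame = np.full((2, 2, 3), 200, dtype=np.uint8)    # live "person" frame
background = np.zeros((2, 2, 3), dtype=np.uint8)   # virtual scene
mask = np.array([[1, 0], [0, 1]], dtype=np.uint8)  # toy segmentation mask
out = replace_background(frame, mask, background)
```

A production pipeline would feather the mask edges and smooth it across frames, but the compositing itself is exactly this selection.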

See it for yourself

See how easy it can be to transform ordinary video into an experience that will engage your viewers from the first frame to the last. Visit us to learn more.

Imagine being able to create more focus in your video conference, or transport users to a different world in a mobile app experience. Image segmentation, a computer vision machine learning task, makes this a reality by creating pixel-accurate image masks of detected objects. Computer vision is progressing so rapidly that these tasks can now run on mobile handsets, and even on Raspberry Pi-like devices with simple Arm processors. What’s most exciting is that developers can start creating these new experiences today. Let’s take a moment to think about what’s possible:

Social, Retail & Gaming Scenarios

Some of the most exciting new opportunities are in social, retail, and AR/VR. For social, gaming and photography apps — imagine superimposing users into completely different landscapes and scenery, or immersing them into a game. In retail, what if you could transport the user into a virtual fitting room or let them interact with products in a virtual showroom?

Image Segmentation for mobile & AR experiences

Productivity & Videoconferencing

Image segmentation can also enhance online meetings by eliminating background distractions. This is done by blurring out or completely changing the background in the video stream. This allows users to preserve privacy, make the environment appear more professional, or even make a conference call more productive by placing people together into a virtual conference room.
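The blur-the-background effect can be sketched as: blur the whole frame, then blend the blurred and original frames using the person mask. A minimal NumPy box-filter sketch – a real product would use a proper Gaussian blur and a temporally smoothed mask:

```python
import numpy as np

def blur_background(frame, mask, kernel=5):
    """Blend a box-blurred copy of `frame` in wherever mask == 0.

    frame: HxWx3 uint8 image; mask: HxW, 1 = person, 0 = background.
    """
    pad = kernel // 2
    padded = np.pad(frame.astype(np.float32),
                    ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = frame.shape[:2]
    blurred = np.zeros_like(frame, dtype=np.float32)
    for dy in range(kernel):           # simple box filter: average the
        for dx in range(kernel):       # kernel x kernel neighborhood
            blurred += padded[dy:dy + h, dx:dx + w]
    blurred /= kernel * kernel
    m = mask[..., None].astype(np.float32)   # broadcast over RGB channels
    return (frame * m + blurred * (1 - m)).astype(np.uint8)
```

Replacing the background entirely is the same blend with a virtual scene substituted for the blurred copy.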

How It Works

Image segmentation partitions images and video frames into distinct regions, each containing the pixels of a single object instance. The attributes of each object type are derived by training models on images to identify different kinds of objects, such as people, vehicles, and animals. The output is a binary mask of the image: a black-and-white image showing where the segmentation algorithm finds a match.
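In code terms, a binary mask is just a per-pixel threshold on the model’s output. A minimal sketch, using random numbers to stand in for a real network’s per-pixel "person" probabilities (the 0.5 threshold is an illustrative choice):

```python
import numpy as np

# Stand-in for a segmentation network's output: one "person" probability
# per pixel. A real pipeline would get this array from the model.
rng = np.random.default_rng(0)
probs = rng.random((4, 6))

# Binary mask: 1 (white) where the algorithm finds a match, 0 (black) elsewhere.
mask = (probs > 0.5).astype(np.uint8)
```

Downstream effects – blur, replacement, per-region enhancement – all consume this mask.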

Segmentation mask isolating the dancer in the frame

Improving Training Data & Performance Optimization

With Xnor’s real-time image segmentation you can dynamically isolate people in live video and superimpose them anywhere in 2D, VR, or augmented reality. Capable of running solely on the CPU of devices or servers, Xnor’s segmentation algorithm can also take advantage of GPUs, accelerators and neural processors.

This article by our CTO, Mohammad Rastegari, shows just one of the ways we are improving deep learning accuracy and performance on devices. Advances like these also power our image segmentation offering, executing efficiently enough to run on mobile handsets and streaming camera video. Internal benchmarks indicate our approach performs up to 9x faster than standard solutions.

Until now, segmentation has been difficult to accomplish due to a lack of accurate deep learning models and high processing requirements. The scarcity of good training data has made it nearly impossible for all but the largest companies to invest the time and resources necessary to create deep learning models that can identify and segment people and objects with high accuracy.

Additionally, traditional object detection and segmentation tasks perform billions of compute-intensive floating-point operations, which require powerful processors enhanced with GPUs or AI accelerator chips.

Xnor solves these problems by providing optimized pre-trained models and tuned algorithms that achieve higher performance and accuracy than other state-of-the-art models. By precisely training deep learning models and reducing the complexity of the algorithms, our AI scientists enable segmentation tasks to execute in real time on streaming video, on form factors as small as mobile handsets.

Want to learn more?

Visit us to learn more about image segmentation.