Andrew, one of Xnor’s engineers, demonstrating image segmentation running on a live webcam feed, using 60 MB of memory and just the CPU – no GPU necessary.

With so many meetings involving participants from multiple locations, it’s no surprise that video conferencing has quickly become an essential collaboration tool. Best-in-class solutions allow users to share screens, access other desktops, chat, exchange files, and communicate via digital whiteboards. When done right, these capabilities add up to more than the long-distance equivalent of a face-to-face meeting. They provide a platform for a participatory experience that can break down corporate silos and boost productivity.

However, traditional video conferencing is plagued with a long list of vexing issues. A cluttered office or background distractions can draw a viewer’s attention away from the speaker. Poor image quality can detract from the content being presented. Frustrated with these technical and experiential imperfections, participants often use the time to catch up on their email and lose focus on the meeting.

Introducing AI-powered image segmentation

Image segmentation improves video by identifying the boundaries of people and objects, and isolating those pixels to enhance the focus or brightness separately from the rest of the image. It’s a technique that’s been around for years, but until now, two factors have delayed wide adoption.

First, traditional image segmentation involves billions of floating-point operations. That requires a significant amount of computing power: a fast processor augmented with a GPU or a neural accelerator chip. Second, a lack of good training data and models makes it time-consuming to achieve smooth output. And even when you do have enough data, training a model successfully requires expensive cloud resources. Often, only a large company can afford to invest the time and resources necessary to build image segmentation into its products. Xnor’s segmentation technology overcomes these blockers, giving video conference providers precise control over the video image and a world-class video conferencing experience. Here’s what makes our image segmentation technology so revolutionary:

Flexible deployment options

Xnor can perform real-time image segmentation on embedded devices running on a 1 GHz Arm processor. For complex AI tasks, Xnor can also take advantage of GPUs, accelerators, and neural processors running on servers or in the cloud.

A revolutionary learning model

Xnor image segmentation partitions video frames into distinct regions, each containing an instance of an object. The object may be a person, vehicle, animal, or any one of hundreds of other classes. The attributes for each type of object are derived using an image-based training model. Xnor’s technology uses optimized pre-trained models and tuned algorithms to achieve substantially higher performance and accuracy than other models. Our core neural network model is the fastest and most accurate in the industry. Together, these deep learning models and revolutionary algorithms enable AI tasks to execute in real time on streaming video, on form factors as small as mobile handsets.
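To make the idea of partitioning a frame concrete, here is a minimal sketch (illustrative only – not Xnor’s API or model) of what a segmentation output looks like: the model assigns every pixel a class ID, and isolating one class is a simple mask operation.

```python
import numpy as np

# Hypothetical class IDs for illustration: 0 = background, 1 = person.
PERSON = 1

mask = np.zeros((6, 8), dtype=np.uint8)   # tiny 6x8 "frame" for illustration
mask[1:5, 2:6] = PERSON                   # pixels the model labeled "person"

person_pixels = mask == PERSON            # boolean mask isolating the person
coverage = person_pixels.mean()           # fraction of the frame that is "person"
print(f"person covers {coverage:.0%} of the frame")
```

Every downstream effect described below – scene optimization, blur, replacement – starts from a per-pixel mask like this one.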

Low processor requirements

Traditional object detection and segmentation require an application to perform billions of floating-point operations. Xnor’s AI processing technology can execute up to 9x faster than other computer vision solutions by utilizing performance breakthroughs our researchers have discovered, such as YOLO object detection and XNOR-Net image classification. That kind of performance delivers an enhanced user experience on a wide variety of devices, including webcams, mobile phones, and even dedicated conferencing hardware running commodity processors.
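The speedup behind XNOR-Net comes from binarizing weights and activations to +1/-1, so that a dot product collapses into XNOR and popcount bit operations instead of floating-point multiply-adds. A minimal sketch of that trick (illustrative, not Xnor’s implementation):

```python
def binarize(v):
    """Pack a list of +1/-1 values into an integer bitmask, 1 bit per value."""
    bits = 0
    for i, x in enumerate(v):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two +1/-1 vectors given their bitmasks.
    Matching bits contribute +1, differing bits -1, so:
    dot = n - 2 * popcount(a XOR b)."""
    return n - 2 * bin(a_bits ^ b_bits).count("1")

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
# The bit-level result matches the ordinary floating-point dot product:
assert binary_dot(binarize(a), binarize(b), len(a)) == sum(x * y for x, y in zip(a, b))
```

On real hardware, one 64-bit XOR plus a popcount instruction replaces 64 multiply-adds, which is the source of the order-of-magnitude gains on commodity CPUs.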

AI image segmentation introduces new video conferencing capabilities

Xnor’s technology provides video conference providers with a new set of tools to enhance video conferencing, including:

Scene Optimization

Improve video quality by dynamically adjusting the exposure, brightness, contrast, and sharpness of different portions of the image.
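With a segmentation mask in hand, region-selective adjustment is straightforward. A minimal sketch (illustrative only – not Xnor’s API) that brightens only the pixels marked as "person" while leaving the background untouched:

```python
import numpy as np

frame = np.full((4, 4), 100, dtype=np.uint8)   # flat gray toy frame
person = np.zeros((4, 4), dtype=bool)
person[1:3, 1:3] = True                        # segmented person region

adjusted = frame.astype(np.int16)              # widen to avoid uint8 overflow
adjusted[person] += 40                         # boost brightness inside the mask
frame = np.clip(adjusted, 0, 255).astype(np.uint8)
```

The same masked-update pattern applies to exposure, contrast, or sharpening: compute the effect, then apply it only where the mask is true.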

Background Blur and Replacement

A successful video conference has to hold the viewer’s attention, but distractions can make that difficult. You may want to encourage users to focus on the speaker, or perhaps a speaker has recorded a presentation in their office – and the whiteboard behind them contains sensitive information.

With Xnor’s real-time image segmentation you can dynamically isolate people and objects in a live video, then superimpose them anywhere in a 2D scene, virtual reality, or even augmented reality.
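At its core, background replacement is a per-pixel select between the live frame and a substitute scene, driven by the segmentation mask. A minimal sketch under that assumption (illustrative only – not Xnor’s implementation):

```python
import numpy as np

h, w = 4, 6
frame      = np.full((h, w, 3), 200, dtype=np.uint8)  # live camera frame (toy data)
background = np.zeros((h, w, 3), dtype=np.uint8)      # replacement scene
person     = np.zeros((h, w), dtype=bool)
person[1:3, 2:4] = True                               # segmented person pixels

# Keep the person from the live frame; take everything else from the background.
composite = np.where(person[..., None], frame, background)
```

A production pipeline would soften the mask edge (alpha matting) to avoid a hard cut-out, and blurring instead of replacing the background is the same select with a blurred copy of the frame.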

See it for yourself

See how easy it can be to transform ordinary video into an experience that will engage your viewers from the first frame to the last. Visit us to learn more.