Head pose estimation without keypoints in MXNET/GLUON

By the Blueprint Team

In retail, sales data is commonly used to identify hot products in stores for marketing. For instance, products that sell well in established stores are usually marketed heavily in new stores. With the recent success of deep learning, especially Convolutional Neural Networks (CNNs) in computer vision, companies have been combining insights extracted from images with sales data to refine marketing strategies. We're going to discuss the particulars of deriving such data from head pose estimation with Euler angles in MXNET/GLUON.

Head pose estimation from an image is currently derived from two main methods: with and without facial keypoints, such as the eyes, ears, and nose. The accuracy of the keypoints approach depends on a correct 3D generic head model, which is usually difficult to obtain. The no-keypoints approach, however, works around the depth complexity of the keypoint approach and learns directly from 2D images with a multi-loss network. For instance, Ruiz et al., 2018 and Shao et al., 2019 developed no-keypoints models in PyTorch and TensorFlow, respectively, and their models outperform traditional face-landmark algorithms on several widely used data sets.

We adopted the no-keypoints approach and reimplemented the algorithm of Ruiz et al., 2018 in MXNET/GLUON (hereafter gazenet). Compared to other deep learning platforms, MXNET (with the GLUON API) provides the same simplicity and flexibility as PyTorch, but also allows data scientists to hybridize deep learning networks to leverage the performance optimizations of the symbolic graph. Moreover, MXNET/GLUON does not require specifying the input size of a network; instead, activation functions are passed directly to the fully connected and convolutional layers, and a name scope can be created to attach a unique prefix to each layer. Finally, its scalability and stability attract many retail companies to the MXNET/GLUON platform for their product deployment.
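As a minimal sketch of those GLUON features (not the production gazenet code), the snippet below defines a small HybridSequential network: no input size is declared up front, activations are passed directly to the layers, name_scope() assigns each layer a unique prefix, and hybridize() switches the imperative graph to a symbolic one.

```python
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

net = nn.HybridSequential()
with net.name_scope():                         # unique, prefixed layer names
    net.add(nn.Conv2D(channels=64, kernel_size=3, activation='relu'),
            nn.MaxPool2D(pool_size=2),
            nn.Flatten(),
            nn.Dense(128, activation='relu'),  # no input size needs to be declared
            nn.Dense(3))                       # e.g. one output per Euler angle

net.initialize(mx.init.Xavier())
net.hybridize()                                # leverage the symbolic graph

x = nd.random.uniform(shape=(1, 3, 224, 224))  # dummy RGB image batch
print(net(x).shape)                            # (1, 3)
```

Because initialization is deferred, the layer shapes are inferred automatically on the first forward pass, which is the behavior the paragraph above refers to.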

The gazenet algorithm takes in 3-channel (RGB) images and outputs the three Euler angles of a person's gazing direction, that is, yaw, roll, and pitch (as illustrated below). The bounding box of that person's face is provided by a face detector we modified and trained based on the paper of Najibi et al., 2017. Given the bounding box of a face, gazenet can estimate that person's gazing direction even when the person is looking sideways and when the video or images are in relatively low resolution, making facial landmarks hard to detect.
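A hedged sketch of that inference flow is shown below. The crop, the ImageNet normalization constants, and the assumption that the model returns the three angles directly are illustrative choices for this example; they are not taken from the gazenet repository.

```python
import mxnet as mx
from mxnet import image, nd

def estimate_gaze(gazenet, frame, bbox, ctx=mx.cpu()):
    """frame: HxWx3 uint8 NDArray; bbox: (x1, y1, x2, y2) from a face detector."""
    x1, y1, x2, y2 = [int(v) for v in bbox]
    face = frame[y1:y2, x1:x2, :]                          # crop the detected face
    face = image.imresize(face, 224, 224)                  # ResNet50 input size
    face = face.astype('float32') / 255.0
    mean = nd.array([0.485, 0.456, 0.406])                 # assumed ImageNet stats
    std = nd.array([0.229, 0.224, 0.225])
    face = (face - mean) / std
    face = face.transpose((2, 0, 1)).expand_dims(axis=0)   # NCHW batch of one
    yaw, pitch, roll = gazenet(face.as_in_context(ctx))    # three Euler angles
    return yaw, pitch, roll
```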

Similar to Ruiz et al., 2018 and Shao et al., 2019, gazenet employs a pre-trained ResNet50 (He et al., 2015) backbone followed by a fully connected layer. A softmax function is then used to derive the class scores, and multi-loss functions are used to classify and regress each angle. Its architecture is illustrated below. Gazenet achieves performance comparable to Ruiz et al., 2018 on the public AFLW2000 data set, with approximately 6.5 degrees average error across yaw, roll, and pitch. The open-source MXNET/GLUON implementation is available here: https://github.com/Cjiangbpcs/gazenet_mxJiang/blob/master/README.md. We also adopted gazenet in our video analytics product, Reflect.
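The snippet below sketches that multi-loss idea in GLUON: each Euler angle is discretized into bins, a fully connected head followed by a softmax yields class scores, and a regression term is computed on the softmax-weighted expectation over the bin centers. The bin count, angle range, and loss weight are illustrative assumptions rather than the exact gazenet settings.

```python
import mxnet as mx
from mxnet import nd, gluon
from mxnet.gluon import nn

num_bins = 66                      # e.g. 3-degree bins spanning roughly -99..+99 degrees
alpha = 0.5                        # weight of the regression term (assumed)

backbone = gluon.model_zoo.vision.resnet50_v1(pretrained=True).features
yaw_fc, pitch_fc, roll_fc = nn.Dense(num_bins), nn.Dense(num_bins), nn.Dense(num_bins)
for head in (yaw_fc, pitch_fc, roll_fc):
    head.initialize(mx.init.Xavier())

ce_loss = gluon.loss.SoftmaxCrossEntropyLoss()
mse_loss = gluon.loss.L2Loss()
bin_centers = nd.arange(num_bins) * 3 - 99    # degree value of each bin

def angle_loss(logits, bin_label, angle_label):
    # classification over the discretized angle bins ...
    cls = ce_loss(logits, bin_label)
    # ... plus regression on the expected angle (softmax-weighted bin centers)
    expected = nd.sum(nd.softmax(logits) * bin_centers, axis=1)
    return cls + alpha * mse_loss(expected, angle_label)
```

At training time, the backbone features would be passed through each of the three heads, and the yaw, pitch, and roll losses summed into a single objective.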

The estimated Euler angles can be further aggregated to provide insights on promising new products, driving marketing programs. This new data, together with traditional sales data, can provide retail stores with valuable information to design experiments and drive meaningful business impact.
