BlazePose: A skeleton detection model capable of obtaining 3D coordinates

This article introduces “BlazePose,” a machine learning model available for use with the ailia SDK. By combining the machine learning models published in ailia MODELS with the ailia SDK edge inference framework, you can easily add AI functionality to your applications.

Overview of BlazePose

BlazePose (Full Body) is a skeleton detection model developed by Google that can obtain 3D coordinates. It returns (x, y, z) coordinates for 33 keypoints, making it suitable for fitness applications and similar uses.

Source: https://pixabay.com/ja/photos/%E5%A5%B3%E3%81%AE%E5%AD%90-%E7%BE%8E%E3%81%97%E3%81%84-%E8%8B%A5%E3%81%84-%E3%83%9B%E3%83%AF%E3%82%A4%E3%83%88-5204299/

BlazePose: On-device Real-time Body Pose tracking

We present BlazePose, a lightweight convolutional neural network architecture for human pose estimation that is…

arxiv.org

On-device, Real-time Body Pose Tracking with MediaPipe BlazePose

Pose estimation from video plays a critical role enabling the overlay of digital content and information on top of the…

ai.googleblog.com

Input and output of BlazePose

BlazePose consists of two machine learning models: a Detector and an Estimator. The Detector extracts the region containing the person from the input image. The Estimator takes a 256×256 image of that person as input and outputs keypoints.

BlazePose’s keypoints consist of the following 33 points, which is more than the 17 points commonly used in COCO:

Keypoints of BlazePose (Source: https://developers.google.com/ml-kit/vision/pose-detection)

Architecture of BlazePose

The architecture of the Detector is based on SSD (Single Shot MultiBox Detector). It takes an input image of size (1,224,224,3) and outputs bounding boxes of size (1,2254,12) and confidence scores of size (1,2254,1). The 12 elements of each bounding box are in the format (x,y,w,h,kp1x,kp1y,…,kp4x,kp4y), where kp1x to kp4y represent additional keypoints. Each of the 2254 rows corresponds to one anchor, so the raw values must be decoded by applying that anchor's scale and offset.
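The anchor decoding step can be illustrated with a minimal sketch. It follows the usual SSD-style convention, which is an assumption here rather than something confirmed by the model itself: raw values are divided by a scale (assumed to equal the 224 input size), multiplied by the anchor size, and offset by the anchor center. The function name `decode_box` and the anchor layout (cx, cy, w, h) are hypothetical.

```python
def decode_box(raw, anchor, scale=224.0):
    """Decode one 12-element detector row against its anchor.

    raw    : (x, y, w, h, kp1x, kp1y, ..., kp4x, kp4y) as emitted by the model
    anchor : (cx, cy, w, h) in normalized image coordinates (assumed layout)
    scale  : divisor applied to raw values (assumed to equal the input size)
    """
    acx, acy, aw, ah = anchor

    # Box center is an offset from the anchor center; size is relative to the anchor.
    cx = raw[0] / scale * aw + acx
    cy = raw[1] / scale * ah + acy
    w = raw[2] / scale * aw
    h = raw[3] / scale * ah

    # The four additional keypoints are decoded like the box center.
    keypoints = []
    for i in range(4, 12, 2):
        kx = raw[i] / scale * aw + acx
        ky = raw[i + 1] / scale * ah + acy
        keypoints.append((kx, ky))

    # Return the box as (x_min, y_min, x_max, y_max) in normalized coordinates.
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    return {"box": box, "keypoints": keypoints}


# Example with a single anchor centered in the image.
row = [0.0, 0.0, 112.0, 112.0, 56.0, 0.0, 0.0, -56.0, 0.0, 0.0, 0.0, 0.0]
decoded = decode_box(row, anchor=(0.5, 0.5, 1.0, 1.0))
```

In practice this would be applied to each of the 2254 rows, keeping only rows whose confidence score passes a threshold.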

The Detector output can be used in two modes. In box mode, the bounding box is determined from (x,y) and (w,h). In alignment mode, the scale and angle are determined from (kp1x,kp1y) and (kp2x,kp2y), which makes it possible to predict a bounding box that includes rotation.

Source: https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html
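As a rough illustration of alignment mode, the sketch below derives a rotated square region from two keypoints. Several details are assumptions and not taken from the model: kp1 is treated as the region center, kp2 as a point whose distance and direction from kp1 define the size and the "up" direction, and the target angle of 90 degrees and the `scale_factor` parameter are illustrative. The function name `roi_from_alignment` is hypothetical.

```python
import math


def roi_from_alignment(kp1, kp2, target_angle_deg=90.0, scale_factor=1.0):
    """Estimate a rotated square region from two keypoints.

    kp1 : (x, y) assumed to be the center of the region
    kp2 : (x, y) assumed to mark the radius and the upward direction
    """
    dx = kp2[0] - kp1[0]
    dy = kp2[1] - kp1[1]

    # The side length is twice the kp1-kp2 distance, optionally enlarged.
    size = 2.0 * math.hypot(dx, dy) * scale_factor

    # Image y grows downward, so dy is negated before taking the angle.
    # The rotation is how far the kp1->kp2 direction deviates from the target.
    rotation = math.radians(target_angle_deg) - math.atan2(-dy, dx)

    # Wrap the rotation into [-pi, pi).
    rotation = (rotation + math.pi) % (2.0 * math.pi) - math.pi

    return {"center": kp1, "size": size, "rotation": rotation}


# An upright person: kp2 lies directly above kp1 in the image.
upright = roi_from_alignment((0.5, 0.5), (0.5, 0.25))

# A person lying sideways: kp2 lies to the right of kp1.
sideways = roi_from_alignment((0.5, 0.5), (0.75, 0.5))
```

With these assumptions, an upright pose yields zero rotation, and the cropped region can be rotated by the returned angle before being resized to the Estimator input.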

During training, the Estimator is supervised with heatmaps, but during inference it regresses the keypoints directly without computing heatmaps. This enables fast inference.

Tracking network architecture: regression with heatmap supervision (Source: https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)

The first output of the Estimator is a Landmarks tensor of size (1,195), and the second output is a flag of size (1,1). The 33 keypoints, each comprising (x,y,z,visibility,presence), account for 165 of the 195 Landmarks elements.
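The following sketch shows one way to split the flattened Landmarks output into per-keypoint records. It assumes that the values are stored keypoint by keypoint in (x,y,z,visibility,presence) order and that the 33 keypoints occupy the first 165 elements; both are assumptions, and the function name `split_landmarks` is hypothetical.

```python
NUM_KEYPOINTS = 33
VALUES_PER_KEYPOINT = 5  # x, y, z, visibility, presence


def split_landmarks(landmarks):
    """Split a flat Landmarks vector into 33 keypoint dictionaries.

    Elements beyond the first 33 * 5 = 165 are ignored in this sketch.
    """
    flat = list(landmarks)
    needed = NUM_KEYPOINTS * VALUES_PER_KEYPOINT
    if len(flat) < needed:
        raise ValueError(f"expected at least {needed} values, got {len(flat)}")

    keypoints = []
    for i in range(NUM_KEYPOINTS):
        start = i * VALUES_PER_KEYPOINT
        x, y, z, visibility, presence = flat[start:start + VALUES_PER_KEYPOINT]
        keypoints.append(
            {"x": x, "y": y, "z": z, "visibility": visibility, "presence": presence}
        )
    return keypoints


# Example with a dummy vector of 195 increasing values.
dummy_output = [float(v) for v in range(195)]
keypoints = split_landmarks(dummy_output)
```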

The z-value is in “image pixels” scale and can be treated on the same scale as x and y. The z-value is relative to the hips, with negative values indicating keypoints between the hips and the camera and positive values indicating keypoints behind the hips.
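To make the sign convention concrete, here is a small sketch that normalizes a z-value and reports which side of the hips the keypoint is on. Dividing by 256 assumes the coordinates are expressed in pixels of the 256×256 Estimator input, which is an assumption; the function name `describe_depth` is hypothetical.

```python
def describe_depth(z, input_size=256.0):
    """Return the normalized z-value and its position relative to the hips."""
    # Same normalization that would be applied to x and y (assumed pixel scale).
    z_normalized = z / input_size

    if z < 0:
        side = "in front of hips"  # between the hips and the camera
    elif z > 0:
        side = "behind hips"
    else:
        side = "at hip depth"
    return z_normalized, side


# A keypoint 64 pixels closer to the camera than the hips.
result = describe_depth(-64.0)
```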

Visibility and presence are stored as unbounded values in the range [min_float, max_float] and are converted to probabilities by applying a sigmoid. Visibility is the probability that a keypoint is within the frame and not occluded by other objects. Presence is the probability that a keypoint is within the frame.
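The sigmoid conversion can be sketched as follows. The numerically stable form avoids overflow for large negative inputs, which matters because the raw values are unbounded. The helper `is_usable` and its 0.5 threshold are hypothetical choices for illustration, not values taken from the model.

```python
import math


def sigmoid(value):
    """Map an unbounded raw score to a probability in [0, 1]."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    # For negative inputs, use the equivalent form that never overflows.
    e = math.exp(value)
    return e / (1.0 + e)


def is_usable(raw_visibility, raw_presence, threshold=0.5):
    """Return True when both probabilities reach the (assumed) threshold."""
    return sigmoid(raw_visibility) >= threshold and sigmoid(raw_presence) >= threshold


# Raw scores as they might appear in the Landmarks output.
visible = is_usable(3.2, 5.0)
hidden = is_usable(-4.0, 5.0)
```

An application might skip drawing keypoints for which this check fails, for example legs that are outside the camera frame.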

Model Card BlazePose GHUM 3D.pdf

drive.google.com

How to use BlazePose

To run BlazePose (Full Body) with the ailia SDK, use the following command. The -v 0 option selects the webcam as the video input.

$ python3 blazepose-fullbody.py -v 0

ailia-models/pose_estimation_3d/blazepose-fullbody at master · axinc-ai/ailia-models

github.com

Here is an example run. The size of each circle indicates the z-value.

https://www.youtube.com/watch?v=bPnO7d4ofb8

BlazePose (Upper Body) is also available for estimating only the upper body. MediaPipe initially released only the Upper Body model and released the Full Body model later. The two models differ in their specifications; for example, the Detector resolution is 128×128 for Upper Body. To use the Upper Body model, run the following command.

$ python3 blazepose.py -v 0

axinc-ai/ailia-models

github.com

AX Corporation is developing ailia SDK, a cross-platform AI inference engine that uses GPUs for high-speed processing. AX Corporation offers a total solution for AI, including consulting, model creation, SDK provision, development of AI-based applications and systems, and support. Please feel free to contact us with any inquiries.