{"id":2473,"date":"2021-06-28T09:00:31","date_gmt":"2021-06-28T01:00:31","guid":{"rendered":"https:\/\/blog.ailia.ai\/uncategorized\/blazepose-skeletal-detection-model-capable-of-obtaining-3d-coordinates\/"},"modified":"2025-05-20T14:46:30","modified_gmt":"2025-05-20T06:46:30","slug":"blazepose-skeletal-detection-model-capable-of-obtaining-3d-coordinates","status":"publish","type":"post","link":"https:\/\/blog.ailia.ai\/en\/tips-en\/blazepose-skeletal-detection-model-capable-of-obtaining-3d-coordinates\/","title":{"rendered":"BlazePose: A 3D Pose Estimation Model"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\" id=\"3f0c\"><strong>Overview<\/strong><\/h3>\n\n\n\n<p id=\"c7ca\"><em>BlazePose\u00a0<\/em>(Full Body) is a pose detection model developed by Google that can compute (x,y,z) coordinates of 33 skeleton keypoints. It can be used, for example, in fitness applications.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"480\" height=\"640\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-47.jpg\" alt=\"\" class=\"wp-image-287\"\/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/pixabay.com\/ja\/photos\/%E5%A5%B3%E3%81%AE%E5%AD%90-%E7%BE%8E%E3%81%97%E3%81%84-%E8%8B%A5%E3%81%84-%E3%83%9B%E3%83%AF%E3%82%A4%E3%83%88-5204299\/\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/pixabay.com\/ja\/photos\/%E5%A5%B3%E3%81%AE%E5%AD%90-%E7%BE%8E%E3%81%97%E3%81%84-%E8%8B%A5%E3%81%84-%E3%83%9B%E3%83%AF%E3%82%A4%E3%83%88-5204299\/<\/a><\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/arxiv.org\/abs\/2006.10204\" target=\"_blank\" rel=\"noreferrer noopener\">BlazePose: On-device Real-time Body Pose <\/a><a 
href=\"https:\/\/arxiv.org\/abs\/2006.10204\" target=\"_blank\" rel=\"noreferrer noopener\">tracking<\/a><a href=\"https:\/\/ai.googleblog.com\/2020\/08\/on-device-real-time-body-pose-tracking.html?source=post_page-----a6c588013009--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\"><\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/ai.googleblog.com\/2020\/08\/on-device-real-time-body-pose-tracking.html\" target=\"_blank\" rel=\"noreferrer noopener\">On-device, Real-time Body Pose Tracking with MediaPipe BlazePose<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"5820\"><strong>BlazePose input and output<\/strong><\/h3>\n\n\n\n<p id=\"37a8\"><em>BlazePose&nbsp;<\/em>consists of two machine learning models: a&nbsp;<em>Detector&nbsp;<\/em>and an&nbsp;<em>Estimator<\/em>. The&nbsp;<em>Detector&nbsp;<\/em>cuts out the human region from the input image, while the&nbsp;<em>Estimator&nbsp;<\/em>takes a 256&#215;256 resolution image of the detected person as input and outputs the keypoints.<\/p>\n\n\n\n<p id=\"9981\"><em>BlazePose&nbsp;<\/em>outputs the 33 keypoints according the following ordering convention. This is more points than the commonly used 17 keypoints of the COCO dataset.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"900\" height=\"505\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-46.png\" alt=\"\" class=\"wp-image-286\"\/><figcaption class=\"wp-element-caption\">BlazePose keypoints (Source: <a href=\"https:\/\/developers.google.com\/ml-kit\/vision\/pose-detection\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/developers.google.com\/ml-kit\/vision\/pose-detection<\/a>)<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1cc9\"><strong>Architecture<\/strong><\/h3>\n\n\n\n<p id=\"de63\">The&nbsp;<em>Detector&nbsp;<\/em>is an Single-Shot Detector(SSD) based architecture. 
Given an input image of shape (1,224,224,3), it outputs bounding boxes of shape (1,2254,12) and confidence scores of shape (1,2254,1). The 12 elements of each bounding box are of the form (x,y,w,h,kp1x,kp1y,\u2026,kp4x,kp4y), where kp1x to kp4y are additional keypoints. Each of the 2254 candidates has its own anchor, and the anchor\u2019s scale and offset must be applied to decode the raw values.<\/p>\n\n\n\n<p id=\"71ca\">There are two ways to use the&nbsp;<em>Detector<\/em>. In&nbsp;<em>box mode<\/em>, the bounding box is determined from its position (x,y) and size (w,h). In&nbsp;<em>alignment mode<\/em>, the scale and angle are determined from (kp1x,kp1y) and (kp2x,kp2y), so a bounding box including rotation can be predicted.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"210\" height=\"210\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-2.jpeg\" alt=\"\" class=\"wp-image-284\"\/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/ai.googleblog.com\/2020\/08\/on-device-real-time-body-pose-tracking.html\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/ai.googleblog.com\/2020\/08\/on-device-real-time-body-pose-tracking.html<\/a><\/figcaption><\/figure>\n\n\n\n<p id=\"5948\">The\u00a0<em>Estimator\u00a0<\/em>uses a heatmap during training, but for faster inference it regresses the keypoints directly, without the heatmap.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"440\" height=\"270\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-3.jpeg\" alt=\"\" class=\"wp-image-285\"\/><figcaption class=\"wp-element-caption\">Tracking network architecture: regression with heatmap supervision (Source: <a href=\"https:\/\/ai.googleblog.com\/2020\/08\/on-device-real-time-body-pose-tracking.html\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/ai.googleblog.com\/2020\/08\/on-device-real-time-body-pose-tracking.html<\/a>)<\/figcaption><\/figure>\n\n\n\n<p id=\"1339\">The first output of 
the&nbsp;<em>Estimator<\/em>&nbsp;is a (1,195) landmark tensor, and the second output is a (1,1) flag. The first 165 landmark elements encode (<em>x,y,z,visibility,presence<\/em>) for each of the 33 keypoints.<\/p>\n\n\n\n<p id=\"a4b5\">The&nbsp;<em>z<\/em>-values are relative to the person\u2019s hips: a keypoint lies between the hips and the camera when its value is negative, and behind the hips when its value is positive.<\/p>\n\n\n\n<p id=\"60d7\">The&nbsp;<em>visibility&nbsp;<\/em>and&nbsp;<em>presence&nbsp;<\/em>values are stored as unbounded raw scores and are converted to probabilities by applying a sigmoid function.&nbsp;<em>visibility&nbsp;<\/em>is the probability that a keypoint exists in the frame and is not occluded by another object;&nbsp;<em>presence&nbsp;<\/em>is the probability that a keypoint exists in the frame.<\/p>\n\n\n\n<p><a href=\"https:\/\/drive.google.com\/file\/d\/10WlcTvrQnR_R2TdTmKw0nkyRLqrwNkWU\/preview\" target=\"_blank\" rel=\"noreferrer noopener\">Model Card BlazePose GHUM 3D.pdf<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"eac3\"><strong>Usage<\/strong><\/h3>\n\n\n\n<p id=\"89a4\">Use the following command to run\u00a0<em>BlazePose (Full Body)<\/em>\u00a0with the ailia SDK.<\/p>\n\n\n\n<p><code>$ python3 blazepose-fullbody.py -v 0<\/code><\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/axinc-ai\/ailia-models\/tree\/master\/pose_estimation_3d\/blazepose-fullbody\" target=\"_blank\" rel=\"noreferrer noopener\">ailia-models\/pose_estimation_3d\/blazepose-fullbody at master \u00b7 
axinc-ai\/ailia-models<\/a><\/p>\n\n\n\n<p id=\"b4e3\">Here is a result on a sample video. The size of the circle at each keypoint indicates its z-value.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"ailia MODELS : BlazePose\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/bPnO7d4ofb8?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p id=\"09ef\"><em>BlazePose (Upper Body)<\/em>\u00a0can also be used to estimate only the upper body. Initially,\u00a0<em>MediaPipe\u00a0<\/em>released only the upper-body model; the full-body model came later. The specifications of the two models differ; for example, the detector resolution is 128&#215;128 for the upper-body model.<\/p>\n\n\n\n<p><code style=\"white-space: normal;\">$ python3 blazepose.py -v 0<\/code><\/p>\n\n\n\n<p id=\"09ef\"><a href=\"https:\/\/github.com\/axinc-ai\/ailia-models\/tree\/master\/pose_estimation\/blazepose\" target=\"_blank\" rel=\"noreferrer noopener\">axinc-ai\/ailia-models<\/a><\/p>\n\n\n\n<p id=\"51ef\"><a href=\"https:\/\/axinc.jp\/en\/\" rel=\"noreferrer noopener\" target=\"_blank\">ax Inc.<\/a>&nbsp;has developed&nbsp;<a href=\"https:\/\/ailia.jp\/en\/\" rel=\"noreferrer noopener\" target=\"_blank\">ailia SDK<\/a>, which enables cross-platform, GPU-based rapid inference.<\/p>\n\n\n\n<p id=\"51ef\">ax Inc. provides a wide range of services from consulting and model creation to the development of AI-based applications and SDKs. 
Feel free to\u00a0<a href=\"https:\/\/axinc.jp\/en\/\" target=\"_blank\" rel=\"noreferrer noopener\">contact us<\/a>\u00a0for any inquiry.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview BlazePose\u00a0(Full Body) is a pose detection model developed by Google that can compute (x,y,z) coordinates of 33 skeleton keypoints. It can be used, for example, in fitness applications. BlazePose: On-device Real-time Body Pose Tracking On-device, Real-time Body Pose Tracking with MediaPipe BlazePose BlazePose input and output BlazePose&nbsp;consists of two machine learning models: a&nbsp;Detector&nbsp;and an&nbsp;Estimator. The&nbsp;Detector&nbsp;crops the human region from the input image, while the&nbsp;Estimator&nbsp;takes a 256&#215;256 resolution image of the detected person as input and outputs the keypoints. BlazePose&nbsp;outputs 33 keypoints according to the following ordering convention. This is more points than the 17 keypoints commonly used in the COCO dataset. Architecture The&nbsp;Detector&nbsp;is a Single-Shot Detector (SSD) based architecture. 
[&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":2393,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[255],"tags":[266],"class_list":["post-2473","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tips-en","tag-ailiamodels-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2473","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/comments?post=2473"}],"version-history":[{"count":1,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2473\/revisions"}],"predecessor-version":[{"id":2475,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2473\/revisions\/2475"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/media\/2393"}],"wp:attachment":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/media?parent=2473"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/categories?post=2473"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/tags?post=2473"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}