{"id":2376,"date":"2021-01-14T09:00:16","date_gmt":"2021-01-14T01:00:16","guid":{"rendered":"https:\/\/blog.ailia.ai\/%e6%9c%aa%e5%88%86%e9%a1%9e\/hope-net-machine-learning-model-for-head-pose-estimation\/"},"modified":"2025-05-14T15:31:48","modified_gmt":"2025-05-14T07:31:48","slug":"hope-net-machine-learning-model-for-head-pose-estimation","status":"publish","type":"post","link":"https:\/\/blog.ailia.ai\/en\/tips-en\/hope-net-machine-learning-model-for-head-pose-estimation\/","title":{"rendered":"HOPE-Net : A Machine Learning Model for Estimating Face Orientation"},"content":{"rendered":"\n<p id=\"ca4e\"><strong>Overview<\/strong><\/p>\n\n\n\n<p id=\"c577\"><em>HOPE-Net&nbsp;<\/em>is a machine learning model released in October 2017 that computes the angles of a face in an input image around three axes (yaw, pitch, and roll).<\/p>\n\n\n\n<p><a href=\"https:\/\/arxiv.org\/abs\/1710.00925\" target=\"_blank\" rel=\"noreferrer noopener\">Fine-Grained Head Pose Estimation Without Keypoints<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"86be\"><strong>Architecture<\/strong><\/h3>\n\n\n\n<p id=\"ec0f\">Face orientation detection is an important technology used in gaze detection and in recognizing which object is being watched in a scene.<\/p>\n\n\n\n<p id=\"ecd0\">Face orientation detection usually works by detecting key points on the target face and converting those points from 2D to 3D using a standard head model. However, this approach has two drawbacks: the result depends heavily on the accuracy of the detected key points, and fitting the standard head model requires ad-hoc processing.<\/p>\n\n\n\n<p id=\"ec0f\"><em>HOPE-Net<\/em>&nbsp;uses&nbsp;<em>multi-loss convolutional neural networks<\/em>&nbsp;to estimate the orientation of faces in a single shot. 
Taking the face region found by a face detector as input,&nbsp;<em>ResNet50&nbsp;<\/em>extracts features, and fully connected layers compute the yaw, pitch, and roll angles.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"833\" height=\"336\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-52.png\" alt=\"\" class=\"wp-image-306\"\/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/arxiv.org\/pdf\/1710.00925\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/arxiv.org\/pdf\/1710.00925<\/a><\/figcaption><\/figure>\n\n\n\n<p id=\"a6ea\">HOPE-Net performs best on AFLW2000, a dataset made of the first 2000 images of the&nbsp;<em>Annotated Facial Landmarks in the Wild (AFLW)&nbsp;<\/em>dataset, which have been re-annotated with 68 3D landmarks.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/v2\/resize:fit:1162\/1*V2z389SqetIO1IIf97YjIA.png\" alt=\"\"\/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/arxiv.org\/pdf\/1710.00925\" rel=\"noreferrer noopener\" target=\"_blank\">https:\/\/arxiv.org\/pdf\/1710.00925<\/a><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"9c26\"><strong>HOPE-Net Usage<\/strong><\/h3>\n\n\n\n<p id=\"a250\">Use the following command to run HOPE-Net and detect face orientation from a web camera.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ python3 hopenet.py -v 0<\/code><\/pre>\n\n\n\n<p id=\"d3a0\">You can also use a faster version that uses&nbsp;<em>ShuffleNetV2&nbsp;<\/em>instead of&nbsp;<em>ResNet50&nbsp;<\/em>with the following command.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ python3 hopenet.py --lite -v 0<\/code><\/pre>\n\n\n\n<p><a 
href=\"https:\/\/github.com\/axinc-ai\/ailia-models\/tree\/master\/face_recognition\/hopenet\" target=\"_blank\" rel=\"noreferrer noopener\">axinc-ai\/ailia-models<\/a><\/p>\n\n\n\n<p id=\"bba8\">Here is the kind of result you can expect.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-provider-youtube wp-block-embed-youtube\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" width=\"560\" height=\"315\" src=\"https:\/\/www.youtube.com\/embed\/DA1SACACK0k?si=Bn83MlDdXKwr5nCI\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p id=\"619e\"><a href=\"https:\/\/axinc.jp\/en\/\" rel=\"noreferrer noopener\" target=\"_blank\">ax Inc.<\/a>&nbsp;has developed&nbsp;<a href=\"https:\/\/ailia.jp\/en\/\" rel=\"noreferrer noopener\" target=\"_blank\">ailia SDK<\/a>, which enables cross-platform, GPU-based rapid inference.<\/p>\n\n\n\n<p id=\"619e\">ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to&nbsp;<a href=\"https:\/\/axinc.jp\/en\/\" rel=\"noreferrer noopener\" target=\"_blank\">contact us<\/a>&nbsp;for any inquiry.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview HOPE-Net&nbsp;is a machine learning model released in October 2017 that computes the angles of a face in an input image around three axes (yaw, pitch, and roll). Fine-Grained Head Pose Estimation Without Keypoints Architecture Face orientation detection is an important technology used in gaze detection and in recognizing which object is being watched in a scene. 
Face orientation detection usually works by detecting key points on the target face and converting those points from 2D to 3D using a standard head model. However, this approach has two drawbacks: the result depends heavily on the accuracy of the detected key points, and fitting the standard head model requires ad-hoc processing. HOPE-Net&nbsp;uses&nbsp;multi-loss convolutional neural networks&nbsp;to [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":2108,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[255],"tags":[266],"class_list":["post-2376","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tips-en","tag-ailiamodels-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2376","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/comments?post=2376"}],"version-history":[{"count":2,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2376\/revisions"}],"predecessor-version":[{"id":2408,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2376\/revisions\/2408"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/media\/2108"}],"wp:attachment":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/media?parent=2376"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/categories?post=2376"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/tags?post=2376"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}