{"id":2486,"date":"2021-09-21T09:00:31","date_gmt":"2021-09-21T01:00:31","guid":{"rendered":"https:\/\/blog.ailia.ai\/?p=2486"},"modified":"2025-05-20T16:26:34","modified_gmt":"2025-05-20T08:26:34","slug":"3dobjectdetectionpytorch-3d-object-detection-model","status":"publish","type":"post","link":"https:\/\/blog.ailia.ai\/en\/tips-en\/3dobjectdetectionpytorch-3d-object-detection-model\/","title":{"rendered":"3DObjectDetectionPytorch : 3D Object Detection Model"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\" id=\"0021\"><strong>Overview<\/strong><\/h3>\n\n\n\n<p id=\"2a2a\"><em>3DObjectDetectionPytorch<\/em>\u00a0is a machine learning model that calculates 3D bounding boxes of objects. Other object detection models such as\u00a0<a href=\"https:\/\/medium.com\/axinc-ai\/yolov5-the-latest-model-for-object-detection-b13320ec516b\">YOLO<\/a> generally compute 2D bounding boxes, but this new model returns bounding boxes with depth information.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"720\" height=\"960\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-35.jpg\" alt=\"\" class=\"wp-image-256\"\/><figcaption class=\"wp-element-caption\">Source: Objectron dataset<\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/github.com\/sovrasov\/3d-object-detection.pytorch\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub &#8211; sovrasov\/3d-object-detection.pytorch<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Architecture<\/strong><\/h3>\n\n\n\n<p id=\"b85c\"><em>3DObjectDetectionPytorch<\/em>\u00a0can recognize the following 9 classes.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p id=\"2639\">OBJECTRON_CLASSES = (\u2018bike\u2019, 
\u2018book\u2019, \u2018bottle\u2019, \u2018cereal_box\u2019, \u2018camera\u2019, \u2018chair\u2019, \u2018cup\u2019, \u2018laptop\u2019, \u2018shoe\u2019)<\/p>\n<\/blockquote>\n\n\n\n<p id=\"cdcb\">First, the 2D bounding box of the object is computed with&nbsp;<em>MobileNetV2 SSD<\/em>, and then the 3D bounding box is calculated with a&nbsp;<em>MobileNetV3<\/em>&nbsp;regression model. This regression model takes a cropped image of shape (1, 3, 224, 224) as input and returns 9 keypoints (x, y) for each of the 9 classes, as a tensor of shape (9, 1, 9, 2).<\/p>\n\n\n\n<p id=\"7e27\"><em>3DObjectDetectionPytorch<\/em>&nbsp;was trained on the&nbsp;<em>Objectron<\/em>&nbsp;dataset, which is publicly available from&nbsp;<em>Google<\/em>.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1296\" height=\"1034\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-35-1.jpg\" alt=\"\" class=\"wp-image-257\"\/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/github.com\/google-research-datasets\/Objectron\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/google-research-datasets\/Objectron<\/a><\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/github.com\/google-research-datasets\/Objectron\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub \u2014 google-research-datasets\/Objectron<\/a><\/p>\n\n\n\n<p id=\"26e6\">The\u00a0<em>Objectron<\/em>\u00a0dataset was developed for AR applications and contains 15K annotated videos and 4M annotated images.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"824\" height=\"197\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-34.png\" alt=\"\" class=\"wp-image-255\"\/><figcaption class=\"wp-element-caption\">Source: <a 
href=\"https:\/\/github.com\/google-research-datasets\/Objectron\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/google-research-datasets\/Objectron<\/a><\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"8cec\"><strong>Usage<\/strong><\/h3>\n\n\n\n<p id=\"95e8\">Use the following command to run\u00a0<em>3DObjectDetectionPytorch<\/em>\u00a0with ailia SDK on a webcam video stream.<\/p>\n\n\n\n<p><code>$ python3 3d-object-detection.pytorch.py -v 0<\/code><\/p>\n\n\n\n<p id=\"7770\">Here is the result you can expect.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"ailia MODELS : 3DObjectDetectionPytorch\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/F8jjOikMFpQ?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p><a href=\"https:\/\/github.com\/axinc-ai\/ailia-models\/tree\/master\/object_detection_3d\/3d-object-detection.pytorch\" target=\"_blank\" rel=\"noreferrer noopener\">ailia-models\/object_detection_3d\/3d-object-detection.pytorch<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"7a6c\"><strong>Related topics<\/strong><\/h3>\n\n\n\n<p id=\"e29a\">ailia MODELS also includes Google\u2019s\u00a0<em>mediapipe_objectron<\/em>, another model trained on the Objectron dataset.<\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/axinc-ai\/ailia-models\/tree\/master\/object_detection_3d\/mediapipe_objectron\" target=\"_blank\" rel=\"noreferrer noopener\">ailia-models\/object_detection_3d\/mediapipe_objectron<\/a><\/p>\n\n\n\n<p id=\"35f0\"><a href=\"https:\/\/axinc.jp\/en\/\" rel=\"noreferrer noopener\" target=\"_blank\">ax Inc.<\/a>&nbsp;has developed&nbsp;<a href=\"https:\/\/ailia.jp\/en\/\" rel=\"noreferrer noopener\" target=\"_blank\">ailia SDK<\/a>, which enables cross-platform, GPU-based rapid inference.<\/p>\n\n\n\n<p>ax Inc. provides a wide range of services, from consulting and model creation to the development of AI-based applications and SDKs. Feel free to&nbsp;<a href=\"https:\/\/axinc.jp\/en\/\" rel=\"noreferrer noopener\" target=\"_blank\">contact us<\/a>&nbsp;for any inquiry.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview 3DObjectDetectionPytorch\u00a0is a machine learning model that calculates 3D bounding boxes of objects. Other object detection models such as\u00a0YOLO generally compute 2D bounding boxes, but this new model returns bounding boxes with depth information. GitHub &#8211; sovrasov\/3d-object-detection.pytorch Architecture 3DObjectDetectionPytorch\u00a0can recognize the following 9 classes. OBJECTRON_CLASSES = (\u2018bike\u2019, \u2018book\u2019, \u2018bottle\u2019, \u2018cereal_box\u2019, \u2018camera\u2019, \u2018chair\u2019, \u2018cup\u2019, \u2018laptop\u2019, \u2018shoe\u2019) First, the 2D bounding box of the object is computed with&nbsp;MobileNetV2 SSD, and then the 3D bounding box is calculated with a&nbsp;MobileNetV3&nbsp;regression model. This regression model takes a cropped image of shape (1, 3, 224, 224) as input and returns 9 keypoints (x, y) for each of the 9 classes, as a tensor of shape (9, 1, 9, 2). 
3DObjectDetectionPytorch&nbsp;was trained on the&nbsp;Objectron&nbsp;dataset, which is publicly available from&nbsp;Google. GitHub \u2014 google-research-datasets\/Objectron The\u00a0Objectron\u00a0dataset is [&hellip;]<\/p>\n","protected":false},"author":5,"featured_media":2422,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[255],"tags":[266],"class_list":["post-2486","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tips-en","tag-ailiamodels-en"],"acf":[],"_links":{"self":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2486","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/comments?post=2486"}],"version-history":[{"count":2,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2486\/revisions"}],"predecessor-version":[{"id":2488,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/posts\/2486\/revisions\/2488"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/media\/2422"}],"wp:attachment":[{"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/media?parent=2486"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/categories?post=2486"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.ailia.ai\/en\/wp-json\/wp\/v2\/tags?post=2486"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
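The two-stage flow described in the article (a MobileNetV2 SSD produces the 2D box, then a MobileNetV3 regressor predicts nine keypoints of the 3D box from a 224×224 crop) can be sketched in plain NumPy. Only the tensor shapes (1, 3, 224, 224) in and (9, 1, 9, 2) out come from the article; the helper names, the normalized-coordinate convention, and the dummy regressor output below are illustrative assumptions, not the repository's actual API.

```python
import numpy as np

def crop_and_resize(image, box, size=224):
    """Crop a 2D detection box from an HWC uint8 image and resize it.

    Nearest-neighbour resize in plain NumPy keeps the sketch dependency-free.
    Returns an NCHW float tensor of shape (1, 3, size, size), the input
    shape the regression model expects per the article.
    """
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    h, w = crop.shape[:2]
    ys = (np.arange(size) * h / size).astype(int)
    xs = (np.arange(size) * w / size).astype(int)
    resized = crop[ys[:, None], xs[None, :]]          # (size, size, 3)
    return resized.transpose(2, 0, 1)[None].astype(np.float32) / 255.0

def decode_keypoints(reg_out, class_id, box):
    """Map the regressor output of shape (9, 1, 9, 2) back to pixels.

    Assumption: reg_out[class_id, 0] holds 9 (x, y) keypoints of the 3D box
    normalized to [0, 1] relative to the 2D crop.
    """
    x0, y0, x1, y1 = box
    kpts = reg_out[class_id, 0]                        # (9, 2)
    scale = np.array([x1 - x0, y1 - y0], dtype=np.float32)
    return kpts * scale + np.array([x0, y0], dtype=np.float32)

# Demo with a dummy frame and a stand-in regressor output.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
box = (100, 50, 420, 370)                  # pretend output of the 2D SSD stage
inp = crop_and_resize(frame, box)          # (1, 3, 224, 224)
reg_out = np.full((9, 1, 9, 2), 0.5, dtype=np.float32)  # dummy MobileNetV3 output
kpts = decode_keypoints(reg_out, class_id=5, box=box)   # 5 = 'chair' in OBJECTRON_CLASSES
print(inp.shape, kpts.shape)
```

In the real pipeline the dummy `reg_out` would come from running the MobileNetV3 regression model on `inp`; the nine decoded points are then drawn as the projected corners and centroid of the 3D box.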