{"id":2512,"date":"2021-10-25T09:00:30","date_gmt":"2021-10-25T01:00:30","guid":{"rendered":"https:\/\/blog.ailia.ai\/uncategorized\/bytetrack-tracking-model-that-also-takes-into-account-the-low-accuracy-boundingbox\/"},"modified":"2025-05-20T17:41:26","modified_gmt":"2025-05-20T09:41:26","slug":"bytetrack-tracking-model-that-also-takes-into-account-the-low-accuracy-boundingbox","status":"publish","type":"post","link":"https:\/\/blog.ailia.ai\/en\/tips-en\/bytetrack-tracking-model-that-also-takes-into-account-the-low-accuracy-boundingbox\/","title":{"rendered":"ByteTrack : Tracking model that also considers low accuracy bounding boxes"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\" id=\"a913\"><strong>Overview<\/strong><\/h3>\n\n\n\n<p id=\"f84d\"><em>ByteTrack<\/em>\u00a0is a model for object tracking published in October 2021. By applying\u00a0<em>ByteTrack\u00a0<\/em>to the bounding box of people detected by\u00a0<a href=\"https:\/\/medium.com\/axinc-ai\/yolox-object-detection-model-exceeding-yolov5-d6cea6d3c4bc\"><em>YOLOX<\/em><\/a>, you can assign a unique ID to each person.\u00a0<em>ByteTrack<\/em>\u00a0is currently the state-of-the-art and outperforms\u00a0<em>SiamMOT\u00a0<\/em>and transformer-based tracking models.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"829\" height=\"484\" src=\"https:\/\/blog.ailia.ai\/wp-content\/uploads\/image-23.jpg\" alt=\"\" class=\"wp-image-226\"\/><figcaption class=\"wp-element-caption\">Source: <a href=\"https:\/\/github.com\/ifzhang\/ByteTrack\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/ifzhang\/ByteTrack<\/a><\/figcaption><\/figure>\n\n\n\n<p><a href=\"https:\/\/arxiv.org\/abs\/2110.06864?source=post_page-----244b994d5afb--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\"><\/a><a href=\"https:\/\/arxiv.org\/abs\/2110.06864\" target=\"_blank\" rel=\"noreferrer noopener\">ByteTrack: Multi-Object Tracking by Associating Every Detection Box<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/ifzhang\/ByteTrack?source=post_page-----244b994d5afb--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\"><\/a><a href=\"https:\/\/github.com\/ifzhang\/ByteTrack\" target=\"_blank\" rel=\"noreferrer noopener\">GitHub &#8211; ifzhang\/ByteTrack<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3413\"><strong>Architecture<\/strong><\/h3>\n\n\n\n<p id=\"13df\">In Multi-Object Tracking (MOT), object detection is first performed using models such as&nbsp;<a href=\"https:\/\/medium.com\/axinc-ai\/yolox-object-detection-model-exceeding-yolov5-d6cea6d3c4bc\"><em>YOLOX<\/em><\/a>, and a tracking algorithm is used to track objects in-between frames. However, in real-world applications, the result of object detection is sometimes incomplete, resulting in objects being ignored.<\/p>\n\n\n\n<p id=\"9b31\">Most object detection algorithms ignore bounding boxes with low confidence values. This is because there is a trade-off since accepting bounding boxes with low confidence values will improve the detection rate (True Positive), but will also cause False Positive.<\/p>\n\n\n\n<p id=\"46c7\">However, the question whether all bounding boxes with low confidence values should be removed or not is relevant. Even with a low confidence value, the object may still exist, and ignoring it would decrease the efficiency of the tracking model.<\/p>\n\n\n\n<p id=\"cd0c\">The following figure illustrates this problem. 
The following figure illustrates the problem. In frame `t1`, four people with confidence values above 0.5 are tracked. At frames `t2` and `t3`, however, the score of the person in the red bounding box drops from 0.8 to 0.4 and then from 0.4 to 0.1 due to occlusion. As a result, this person is lost.

![](https://blog.ailia.ai/wp-content/uploads/2021/10/1mzv_kvExvX23A8kEKIQC5A.webp)
Source: https://arxiv.org/pdf/2110.06864.pdf

*ByteTrack* solves this problem with a motion model that maintains a queue of *tracklets* storing the objects being tracked, and that also performs tracking and matching against bounding boxes with low confidence values.

The matching process uses an algorithm called *BYTE*. First, the position of each tracklet in the next frame is predicted with a *Kalman filter*; the predictions are then matched against the high-score detection boxes using *motion similarity*. Motion similarity is computed as the Intersection over Union (IoU) between the predicted box and the detected box, which measures how much they overlap (step (b) in the figure above shows the result of this first matching).

Next, the algorithm performs a second matching. Tracklets that could not be matched in the first pass (e.g. the red boxes in the previous image) are matched against the remaining detection boxes with lower confidence values (step (c) in the figure above shows the result of this second matching).
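To make the two-stage matching concrete, here is a minimal sketch of a BYTE-style association in Python. The thresholds, the `Track.predict()` interface, and the greedy matcher are illustrative assumptions, not the official implementation, which uses Hungarian assignment (via the `lap` library mentioned below) and its own Kalman filter:

```python
import numpy as np

HIGH_THRESH = 0.5   # detections above this enter the first matching
LOW_THRESH = 0.1    # detections in [LOW, HIGH) enter the second matching
MATCH_THRESH = 0.3  # minimum IoU to accept a match

def iou_matrix(a, b):
    """Pairwise IoU between (N,4) and (M,4) arrays of (x1,y1,x2,y2) boxes."""
    a, b = a[:, None, :], b[None, :, :]
    lt = np.maximum(a[..., :2], b[..., :2])
    rb = np.minimum(a[..., 2:], b[..., 2:])
    wh = np.clip(rb - lt, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_match(ious):
    """Greedy assignment on the IoU matrix (the paper uses Hungarian matching)."""
    matches, used_t, used_d = [], set(), set()
    for t, d in sorted(np.ndindex(ious.shape), key=lambda td: -ious[td]):
        if ious[t, d] >= MATCH_THRESH and t not in used_t and d not in used_d:
            matches.append((t, d)); used_t.add(t); used_d.add(d)
    return matches, used_t, used_d

def byte_associate(tracks, boxes, scores):
    """tracks: objects whose .predict() returns the Kalman-predicted (4,) box."""
    preds = np.array([t.predict() for t in tracks]).reshape(-1, 4)
    high = boxes[scores >= HIGH_THRESH]
    low = boxes[(scores >= LOW_THRESH) & (scores < HIGH_THRESH)]
    # Stage 1: match all tracklets against high-confidence detections.
    m1, used_t, _ = greedy_match(iou_matrix(preds, high))
    # Stage 2: leftover tracklets get a second chance against the
    # low-confidence detections instead of being dropped immediately.
    rest = [i for i in range(len(tracks)) if i not in used_t]
    m2, _, _ = greedy_match(iou_matrix(preds[rest], low))
    # (In the full algorithm, unmatched high-score boxes spawn new tracklets.)
    return m1, [(rest[t], d) for t, d in m2]
```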
The full algorithm is given in the pseudocode below. Since it is a simple tracking algorithm built around a Kalman filter, it is very fast.

![](https://blog.ailia.ai/wp-content/uploads/image-22.png)
Source: https://arxiv.org/pdf/2110.06864.pdf

Despite the simplicity of the method, *ByteTrack* achieves state-of-the-art object tracking.

![](https://blog.ailia.ai/wp-content/uploads/2021/10/16wnJTwYcHwJXCAt6VrTfKg.webp)
Source: https://arxiv.org/pdf/2110.06864.pdf

Here is an example of detection on each benchmark.

![](https://blog.ailia.ai/wp-content/uploads/image-23-1.jpg)
Source: https://arxiv.org/pdf/2110.06864.pdf

And a performance comparison with conventional methods.

![](https://blog.ailia.ai/wp-content/uploads/image-21.png)
Source: https://arxiv.org/pdf/2110.06864.pdf

The object detection step is based on [YOLOX](https://medium.com/axinc-ai/yolox-object-detection-model-exceeding-yolov5-d6cea6d3c4bc) trained on the *MOT17* and *MOT20* datasets, with a recognition resolution of 1440×800 for *MOT17* and 1600×896 for *MOT20*. *ByteTrack* itself therefore runs fast, but the object detection step dominates the processing time.
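For reference, YOLOX-style preprocessing typically resizes each frame to the recognition resolution while preserving the aspect ratio and padding the remainder, which is part of why the larger resolutions cost more detection time. A minimal sketch assuming the common letterbox scheme (the target size and pad value here are illustrative defaults):

```python
import cv2
import numpy as np

def letterbox(frame, target_w=1440, target_h=800, pad_value=114):
    """Resize keeping aspect ratio, pad the rest (YOLOX-style preprocessing)."""
    r = min(target_w / frame.shape[1], target_h / frame.shape[0])
    resized = cv2.resize(frame, (int(frame.shape[1] * r), int(frame.shape[0] * r)))
    canvas = np.full((target_h, target_w, 3), pad_value, dtype=frame.dtype)
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    return canvas, r  # r is needed to map detected boxes back to the original frame
```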
### Comparison with DeepSort

[DeepSort](https://medium.com/axinc-ai/deepsort-a-machine-learning-model-for-tracking-people-1170743b5984) uses a *ReID* identification model to link the bounding boxes of detected people between frames; for those that cannot be linked this way, the *Sort* step links them using the bounding-box motion predicted by a Kalman filter. However, this is only done for bounding boxes with high confidence values.

*ByteTrack* does not use *ReID*; it relies only on the bounding-box motion predicted by the Kalman filter to track people between frames. It is therefore technically similar to the *Sort* step used in *DeepSort*. Performance is improved, however, by splitting the matching into two steps: the first targets the bounding boxes with high confidence values, the second those with low confidence values.

### Usage

*ByteTrack* can be used with the ailia SDK by running the following command.

```
$ python3 bytetrack.py -v 0
```

You will need to install the `lap` library as a dependency.

```
$ pip3 install lap
```

To run faster, use the `-m` option to swap the object detection model for a lighter version of YOLOX, for example `yolox_s` as in the command below.

```
$ python3 bytetrack.py -v 0 -m yolox_s
```

[ailia-models/object_tracking/bytetrack](https://github.com/axinc-ai/ailia-models/tree/master/object_tracking/bytetrack)
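Internally, the sample follows the usual detector-plus-tracker pattern: detect boxes in each frame, hand them to the tracker, and draw the returned IDs. A schematic sketch of that loop; `Detector` and `ByteTracker` are placeholder names rather than the sample's actual classes, so see the repository above for the real interface:

```python
import cv2

detector = Detector("yolox_s")  # hypothetical detector wrapper
tracker = ByteTracker()         # hypothetical BYTE tracker wrapper

cap = cv2.VideoCapture(0)       # webcam, matching the -v 0 option
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes, scores = detector(frame)       # per-frame detections
    for track in tracker.update(boxes, scores):
        # Each track keeps the same ID across frames.
        x1, y1, x2, y2 = map(int, track.box)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, str(track.track_id), (x1, y1 - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("ByteTrack", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```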
[ax Inc.](https://axinc.jp/en/) has developed the [ailia SDK](https://ailia.jp/en/), which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services, from consulting and model creation to the development of AI-based applications and SDKs. Feel free to [contact us](https://axinc.jp/en/) with any inquiry.