ByteTrack: A tracking model that considers bounding boxes with low confidence.

ailia.AI Editorial Team

Introducing “ByteTrack,” a machine learning model that can be used with the ailia SDK. With the ailia SDK, an inference framework for edge devices, and the machine learning models published in ailia MODELS, you can easily implement AI functionality in your applications.

About ByteTrack:

ByteTrack is an object tracking model that was released in October 2021. Applied to the bounding boxes of people detected with YOLOX, it assigns a consistent ID to each individual across frames. It surpasses other trackers such as SiamMOT and Transformer-based models, achieving state-of-the-art (SoTA) performance.

Paper: ByteTrack: Multi-Object Tracking by Associating Every Detection Box (https://arxiv.org/abs/2110.06864)

GitHub: ifzhang/ByteTrack, a simple, fast and strong multi-object tracker (https://github.com/ifzhang/ByteTrack)

The Architecture of ByteTrack

In multi-object tracking (MOT), object detection is first performed with a model such as YOLOX, and a tracking algorithm then follows the detected objects across frames. In real-world applications, however, detection results are imperfect and some objects go undetected.

Specifically, traditional methods discard bounding boxes with low confidence values at the object detection stage. This reflects a trade-off: keeping low-confidence bounding boxes can raise the true-positive rate, but it also introduces more false positives.

However, should all low-confidence bounding boxes really be discarded? Even a box with a low confidence value may correspond to a real object, and deleting it degrades tracking performance.
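
As a concrete illustration, the first step of BYTE is simply to split the detector output into high-score and low-score boxes instead of throwing the low-score ones away. A minimal numpy sketch (the function name and threshold values are illustrative, not the exact ones used in the paper):

import numpy as np

def split_detections(boxes, scores, high_thresh=0.6, low_thresh=0.1):
    """Split detector output into high- and low-confidence sets.
    boxes  : (N, 4) array of [x1, y1, x2, y2]
    scores : (N,)   array of detection confidences"""
    high = scores >= high_thresh
    low = (scores >= low_thresh) & (scores < high_thresh)
    return (boxes[high], scores[high]), (boxes[low], scores[low])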

The issue is illustrated in the following figure. In frame t1, four objects with confidence values above 0.5 are tracked. However, in frames t2 and t3, occlusion occurs, causing the score of the red object to drop from 0.8 to 0.4, and then from 0.4 to 0.1. As a result, this object is ignored, leading to missed detections.

Source:https://arxiv.org/pdf/2110.06864.pdf

To address this issue, ByteTrack maintains a queue of tracklets (the objects being tracked) and uses a motion model to predict where each tracklet will appear next, which lets it also consider bounding boxes with low confidence values. Matching these low-confidence boxes against the tracklets resolves the missed detections.

Matching is performed with the BYTE algorithm. First, the position of each tracklet in the next frame is predicted with a Kalman filter, and the predictions are matched against the high-score detection boxes based on motion similarity. Motion similarity is scored with the Intersection over Union (IoU), which measures how much two boxes overlap. This corresponds to step (2) in the figure above.
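
For reference, the motion similarity used here reduces to an IoU matrix between the Kalman-predicted tracklet boxes and the detection boxes. A minimal numpy sketch (the box layout and function name are our own choices, not taken from the ByteTrack code):

import numpy as np

def iou_matrix(pred_boxes, det_boxes):
    """IoU between every predicted tracklet box (N, 4) and every detection box (M, 4),
    both in [x1, y1, x2, y2] format. Returns an (N, M) matrix."""
    pred = pred_boxes[:, None, :]   # (N, 1, 4)
    det = det_boxes[None, :, :]     # (1, M, 4)
    x1 = np.maximum(pred[..., 0], det[..., 0])
    y1 = np.maximum(pred[..., 1], det[..., 1])
    x2 = np.minimum(pred[..., 2], det[..., 2])
    y2 = np.minimum(pred[..., 3], det[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_d = (det[..., 2] - det[..., 0]) * (det[..., 3] - det[..., 1])
    return inter / (area_p + area_d - inter + 1e-6)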

Next, a second stage of matching is conducted. Objects in tracklets that could not be matched previously (the red box in the figure above) are then matched with detection boxes that have low confidence values. This corresponds to step (3) in the figure above.

The full algorithm is shown below. Because it is a simple tracking algorithm built around the Kalman filter, it runs quickly.

Source:https://arxiv.org/pdf/2110.06864.pdf
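
To make the two-stage matching concrete, here is a simplified sketch of the BYTE association step, reusing the split_detections and iou_matrix helpers from the sketches above and the lap library that is listed as a dependency in the usage section below. Track creation, lost-track buffers, and Kalman updates are omitted, and the matching thresholds are illustrative:

import numpy as np
import lap   # Jonker-Volgenant linear assignment solver

def associate(cost, thresh):
    """Assign rows to columns of a cost matrix; pairs with cost above thresh are rejected."""
    if cost.size == 0:
        return [], list(range(cost.shape[0]))
    _, x, _ = lap.lapjv(cost, extend_cost=True, cost_limit=thresh)
    matches = [(r, c) for r, c in enumerate(x) if c >= 0]
    unmatched_rows = [r for r, c in enumerate(x) if c < 0]
    return matches, unmatched_rows

def byte_associate(track_boxes, det_boxes, det_scores):
    """Two-stage BYTE association (simplified).
    track_boxes : (N, 4) Kalman-predicted positions of the existing tracklets
    det_boxes   : (M, 4) detections for the current frame
    det_scores  : (M,)   detection confidences"""
    (high_boxes, _), (low_boxes, _) = split_detections(det_boxes, det_scores)

    # Stage 1: match every tracklet against the high-confidence detections
    # (step (2) in the figure above).
    cost_high = 1.0 - iou_matrix(track_boxes, high_boxes)
    matches_1, unmatched = associate(cost_high, thresh=0.8)

    # Stage 2: tracklets left unmatched are matched against the
    # low-confidence detections (step (3) in the figure above).
    remaining = track_boxes[unmatched]
    cost_low = 1.0 - iou_matrix(remaining, low_boxes)
    matches_2, still_unmatched = associate(cost_low, thresh=0.5)

    # Note: indices in matches_2 refer to the 'remaining' and low-score subsets.
    return matches_1, matches_2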

Despite its simplicity, ByteTrack achieves state-of-the-art (SoTA) performance.

Source:https://arxiv.org/pdf/2110.06864.pdf

Here are example tracking results on various benchmarks.

Source:https://arxiv.org/pdf/2110.06864.pdf

Here is a performance comparison with traditional methods.

Source:https://arxiv.org/pdf/2110.06864.pdf

The object detection model is YOLOX, trained on the MOT17 and MOT20 datasets. Its input resolution is 1440×800 for MOT17 and 1600×896 for MOT20. So while the ByteTrack association itself runs quickly, the object detection step carries a heavy computational load.
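
For a rough sense of scale, 1440×800 is about 1.15 million pixels per frame and 1600×896 about 1.43 million, roughly three times the 640×640 input commonly used with lighter YOLOX variants, which is why switching to a smaller detection model (see the -m option below) speeds things up noticeably.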

Comparison with DeepSort

In DeepSort, a ReID model is used to associate people across frames, and for individuals who cannot be linked, the Kalman Filter is used to predict bounding box movements for frame-to-frame association. However, associations are made only with bounding boxes that have high confidence values.

ByteTrack, by contrast, does not use ReID; it relies solely on the Kalman-filter motion prediction of the bounding boxes to associate people across frames. Technically, this makes it closer to SORT, the predecessor of DeepSort. ByteTrack nonetheless improves performance by performing the association in two stages: first with the high-confidence bounding boxes, and then with the lower-confidence ones.
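
The practical difference shows up in how the association cost is built. A rough sketch of the two styles, reusing the iou_matrix helper from above (the embedding handling and the weighting are simplifications for illustration; DeepSort actually combines a Mahalanobis motion gate with the appearance distance):

import numpy as np

# SORT / ByteTrack style: motion only. The cost is 1 - IoU between the
# Kalman-predicted tracklet boxes and the detection boxes.
def motion_cost(track_boxes, det_boxes):
    return 1.0 - iou_matrix(track_boxes, det_boxes)

# DeepSort style: appearance + motion. Each tracklet and detection carries an
# L2-normalized ReID embedding, and cosine distance is mixed with a motion term.
def appearance_cost(track_embs, det_embs, track_boxes, det_boxes, alpha=0.9):
    cosine_dist = 1.0 - track_embs @ det_embs.T               # (N, M)
    motion_dist = 1.0 - iou_matrix(track_boxes, det_boxes)    # (N, M)
    return alpha * cosine_dist + (1.0 - alpha) * motion_dist  # weighting is illustrative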

How to Use ByteTrack

To use ByteTrack with the ailia SDK, use the following command:

$ python3 bytetrack.py -v 0

The lap library is a required dependency; install it with:

$ pip3 install lap

To run faster, switch to a lighter object detection model with the -m option:

$ python3 bytetrack.py -v 0 -m yolox_s

Sample code: ailia-models/object_tracking/bytetrack (https://github.com/axinc-ai/ailia-models/tree/master/object_tracking/bytetrack). The demo video is from https://vimeo.com/60139361, and the required ONNX model files are downloaded automatically.

AX Corporation develops the ailia SDK, a cross-platform software development kit that uses the GPU for high-speed inference, and specializes in putting AI into practical use. From consulting and model creation to providing the SDK, developing AI-powered applications and systems, and offering support, AX Corporation provides a total solution for AI. Please feel free to contact us with any inquiries.