Fast Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video Using CNN and Bounding Box Propagation

Sandy Ardianto, Hsueh-Ming Hang, and Wen-Huang Cheng

National Yang Ming Chiao Tung University


1. Video visualization

Green box: ground-truth
Blue box: YOLOv5 detection
Yellow box: bounding box propagation

fisheye-day-test-30072020_01_fisheye_day_test

fisheye-day-test-30072020_02_fisheye_day_test

fisheye-day-test-30072020_03_fisheye_day_test

fisheye-night-test-30072020_CLIP_20200628-210253

fisheye-night-test-30072020_CLIP_20200628-210808


Resources

Citation: Ardianto, Sandy, Hsueh-Ming Hang, and Wen-Huang Cheng. "Fast Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video Using CNN and Bounding Box Propagation." In 2022 IEEE International Conference on Image Processing (ICIP), pp. 1891-1895. IEEE, 2022.

Bibtex:

@inproceedings{ardianto2022fast,
  title={Fast Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video Using CNN and Bounding Box Propagation},
  author={Ardianto, Sandy and Hang, Hsueh-Ming and Cheng, Wen-Huang},
  booktitle={Proceedings of the International Conference on Image Processing (ICIP)},
  pages={1891--1895},
  year={2022},
  organization={IEEE}
}

Dataset Download:


2. Experiments

Unless stated otherwise, the experiment settings are as follows. Each image is resized to 640x640 and fed into the YOLOv5 object detector. The initial model is pretrained on the COCO dataset. Speed is measured on a PC with an Intel i9-9900X CPU and a single NVIDIA 1080Ti GPU.

2.1. Detection rates in three regions for YOLOv5 and our scheme

Table 1 shows that the detection rates in the outer ring are much lower than the overall average, particularly on the Night video.

Table 1: Comparison of YOLOv5 with and without bounding box propagation in each region
AP50          Night                       Day                         Both
              YOLOv5   +BBox Prop         YOLOv5   +BBox Prop         YOLOv5   +BBox Prop
Inner circle  0.892    0.886              0.917    0.911              0.864    0.875
Middle ring   0.969    0.962              0.942    0.934              0.965    0.955
Outer ring    0.425    0.655              0.723    0.878              0.695    0.842
Overall       0.541    0.720 (+17.9 pp)   0.820    0.882 (+6.2 pp)    0.772    0.854 (+8.2 pp)
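
The per-region evaluation requires assigning each bounding box to the inner circle, middle ring, or outer ring. The following is a minimal sketch of one way to do this, assuming the region is decided by the distance of the box center from the fisheye image center. The function name `region_of`, the image center, and the two radii are hypothetical values chosen for illustration and are not taken from the paper.

```python
import math


def region_of(box, center, r_inner, r_middle):
    """Return 'inner', 'middle', or 'outer' for a box given as (x1, y1, x2, y2).

    Assumption: the region is determined by the distance between the box
    center and the fisheye image center.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    dist = math.hypot(cx - center[0], cy - center[1])
    if dist <= r_inner:
        return "inner"
    if dist <= r_middle:
        return "middle"
    return "outer"


# Hypothetical geometry for a 640x640 frame (illustrative values only).
CENTER = (320.0, 320.0)
R_INNER, R_MIDDLE = 100.0, 200.0

boxes = [
    (300, 300, 340, 340),  # centered on the image center
    (400, 300, 460, 340),  # 110 px from the center
    (560, 560, 600, 600),  # near the image corner
]
regions = [region_of(b, CENTER, R_INNER, R_MIDDLE) for b in boxes]
print(regions)
```

Once every ground-truth and predicted box carries a region label, AP50 can be computed separately on each subset, which is how a per-region breakdown such as Table 1 could be produced.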

2.2. Bounding box size filtering results

Bounding box size filtering improves AP50 by about 2 percentage points (pp) on the Night video and by about 1 pp on the combined (Both) dataset.

Table 2: Bounding box size filtering results
Inputs  Inner Ring Size   Middle Ring Size  Outer Ring Size   # Cars                        AP50
        Min.     Max.     Min.     Max.     Min.     Max.     Ground Truth  Before  After   Before  After
Night   113      162      92       132      53       76       19,533        21,423  20,352  0.541   0.563
Day     217      311      166      238      94       135      22,973        24,195  23,846  0.820   0.826
Both    211      302      133      191      88       127      42,506        45,115  42,702  0.772   0.781
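
The filtering step can be sketched as a per-region range check. The limits below are the Night values from Table 2. Everything else is an assumption made for illustration: the definition of box size as the longer side of the box, the helper names `box_size` and `filter_by_size`, and the sample detections.

```python
# Night limits from Table 2: region -> (min size, max size).
NIGHT_LIMITS = {"inner": (113, 162), "middle": (92, 132), "outer": (53, 76)}


def box_size(box):
    """Assumed size measure: the longer side of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return max(x2 - x1, y2 - y1)


def filter_by_size(detections, limits):
    """Keep only detections whose size lies within the limits of their region.

    `detections` is a list of (box, region) pairs.
    """
    kept = []
    for box, region in detections:
        lo, hi = limits[region]
        if lo <= box_size(box) <= hi:
            kept.append((box, region))
    return kept


# Made-up detections for illustration.
detections = [
    ((0, 0, 120, 90), "inner"),    # size 120, within [113, 162]
    ((0, 0, 40, 30), "outer"),     # size 40, below the minimum of 53
    ((0, 0, 140, 100), "middle"),  # size 140, above the maximum of 132
]
kept = filter_by_size(detections, NIGHT_LIMITS)
print(len(detections), "->", len(kept))
```

Removing boxes that are implausibly small or large for their region lowers the number of detected cars toward the ground-truth count, which matches the Before and After car counts in Table 2.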

2.3. Modality variation

Table 3 shows the results when different input modalities are used for the keyframes and intermediate frames in our bounding box propagation scheme. Running YOLO detection on RGB input for every frame (both keyframes and intermediate frames) raises AP50 by about 1 pp (0.863 vs. 0.854), but the speed drops from 58 fps to 30 fps.

Table 3. Modality variation results
Keyframe  Intermediate Frame  AP50   Speed (fps)
RGB       RGB                 0.863  30
RGB       FD                  0.854  58
FD        RGB                 0.783  46
FD        FD                  0.676  84
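
The trade-off in Table 3 can be made explicit by expressing each configuration relative to the all-RGB baseline. The short script below only restates the numbers from Table 3. The helper name `compare_to_baseline` is introduced here for illustration.

```python
# (keyframe modality, intermediate-frame modality) -> (AP50, speed in fps), from Table 3.
TABLE3 = {
    ("RGB", "RGB"): (0.863, 30),
    ("RGB", "FD"): (0.854, 58),
    ("FD", "RGB"): (0.783, 46),
    ("FD", "FD"): (0.676, 84),
}


def compare_to_baseline(table, baseline):
    """Return {config: (AP50 change in pp, speedup factor)} relative to `baseline`."""
    base_ap, base_fps = table[baseline]
    result = {}
    for config, (ap, fps) in table.items():
        delta_pp = round((ap - base_ap) * 100, 1)
        speedup = round(fps / base_fps, 2)
        result[config] = (delta_pp, speedup)
    return result


relative = compare_to_baseline(TABLE3, ("RGB", "RGB"))
for (key, inter), (delta_pp, speedup) in relative.items():
    print(f"{key:>3}/{inter:<3}  AP50 {delta_pp:+.1f} pp  speed x{speedup:.2f}")
```

Read this way, the RGB keyframe with FD intermediate frame configuration gives up less than 1 pp of AP50 for nearly twice the frame rate, while using FD for keyframes costs considerably more accuracy.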

2.4. Ablation study

We propose three bounding box propagation algorithms: (a) Keyframe to Intermediate-frame (K → I), (b) Intermediate-frame to Keyframe (I → K), and (c) High Confidence Car (HCC) propagation. Table 4 reports the performance of all possible combinations of these three algorithms. A single propagation algorithm may not be effective on its own, but their combined use performs better than any individual algorithm.

Table 4. Evaluation results for combinations of the propagation algorithms
K → I I → K HCC AP50 Speed (fps)
0.759 74
0.771 68
0.728 69
0.753 66
0.778 65
0.802 62
0.757 62
0.854 58

2.5. Car validation check network

We evaluate different models and input sizes for the neural network used in the car validation check.

Table 5. Car validation network results on various models and input sizes
Model     Input Size  Network Accuracy (%)  AP50   Speed (fps)
ResNet18  32x32       96.8                  0.848  62
ResNet18  64x64       97.2                  0.853  60
ResNet18  128x128     98.9                  0.854  58
ResNet50  32x32       99.0                  0.856  54
ResNet50  64x64       99.2                  0.857  53
ResNet50  128x128     99.5                  0.862  52

2.6. Parameters of the High Confidence Car (HCC) propagation

The HCC propagation algorithm has three parameters: the confidence threshold used to select high confidence cars, and the two upper limits in the stopping rule, namely the number of existing car attempts and the number of validation failure attempts. As shown in Table 6, a confidence threshold of around 0.8 gives the best results. Tables 7 and 8 show that AP50 improves by nearly 1 pp when the validation failure limit increases from one to two. In contrast, the existing car checking limit provides only a small improvement, particularly when its value is higher than 2. For the upper limits of the attempts, Fig. 1 indicates a trade-off between accuracy and speed. The accuracy appears to saturate when the attempt limit is higher than 3. Hence, we choose 3 as the upper limit for both the existing car and validation failure attempts.
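
To illustrate how the three parameters interact, the following is a minimal sketch of a seed selection step and a stopping rule. It is not the authors' implementation. It assumes that a car is selected when its confidence is at least the threshold, that both attempt counters are cumulative, and that propagation stops as soon as either counter reaches its limit. The function names and the outcome labels ("new", "existing", "fail") are hypothetical.

```python
def select_high_confidence(detections, threshold=0.8):
    """Keep (box_id, confidence) pairs whose confidence reaches the threshold."""
    return [d for d in detections if d[1] >= threshold]


def propagate_until_stop(outcomes, max_existing=3, max_fail=3):
    """Count new boxes produced before the stopping rule triggers.

    `outcomes` lists, frame by frame, what happened to the propagated box:
      "new"      - it passes validation and matches no already detected car
      "existing" - it coincides with an already detected car
      "fail"     - it fails the car validation check
    """
    existing_attempts = 0
    fail_attempts = 0
    new_boxes = 0
    for outcome in outcomes:
        if outcome == "existing":
            existing_attempts += 1
        elif outcome == "fail":
            fail_attempts += 1
        elif outcome == "new":
            new_boxes += 1
        else:
            raise ValueError(f"unknown outcome: {outcome}")
        # Assumed stopping rule: stop once either limit is reached.
        if existing_attempts >= max_existing or fail_attempts >= max_fail:
            break
    return new_boxes


# Made-up inputs for illustration.
seeds = select_high_confidence([("a", 0.95), ("b", 0.60), ("c", 0.80)])
outcomes = ["new", "fail", "new", "existing", "fail", "fail", "new"]
print(seeds)
print(propagate_until_stop(outcomes, max_existing=3, max_fail=3))
print(propagate_until_stop(outcomes, max_existing=3, max_fail=1))
```

With higher limits the propagation tolerates more unsuccessful steps before giving up, which can recover more cars but costs additional validation checks per seed. This is consistent with the accuracy and speed trade-off described above.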

Table 6. Threshold variation on high confidence car propagation

Threshold AP50
0.5 0.684
0.6 0.795
0.7 0.837
0.8 0.854
0.9 0.853

Table 7. Existing and failure attempts limit - AP50

Fail \ Existing  1      2      3      4      5
1                0.831  0.839  0.845  0.849  0.852
2                0.839  0.847  0.851  0.854  0.856
3                0.845  0.851  0.854  0.856  0.858
4                0.849  0.854  0.856  0.858  0.859
5                0.852  0.856  0.858  0.859  0.859

Table 8. Existing and failure attempts limit - Speed (fps)

Fail \ Existing  1      2      3      4      5
1                64     63     62     61     59
2                63     62     61     59     57
3                62     61     59     57     55
4                61     59     57     54     53
5                59     57     55     53     51


Fig. 1. High confidence car performance and speed trade-off
(Same number of existing and fail attempts)
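
Since Fig. 1 uses the same limit for the existing and fail attempts, its data points correspond to the diagonals of Tables 7 and 8. The sketch below computes the marginal AP50 gain and the speed cost of each increase of the limit. The helper name `marginal_changes` is introduced here for illustration.

```python
LIMITS = [1, 2, 3, 4, 5]
AP50 = [0.831, 0.847, 0.854, 0.858, 0.859]  # diagonal of Table 7
FPS = [64, 62, 59, 54, 51]                  # diagonal of Table 8


def marginal_changes(ap50, fps):
    """Return (AP50 gain in pp, fps lost) for each step to the next limit."""
    changes = []
    for i in range(1, len(ap50)):
        gain_pp = round((ap50[i] - ap50[i - 1]) * 100, 1)
        fps_lost = fps[i - 1] - fps[i]
        changes.append((gain_pp, fps_lost))
    return changes


steps = marginal_changes(AP50, FPS)
for limit, (gain_pp, fps_lost) in zip(LIMITS[1:], steps):
    print(f"limit {limit - 1} -> {limit}: +{gain_pp} pp AP50, -{fps_lost} fps")
```

The gain per step shrinks from 1.6 pp to 0.1 pp, while going from a limit of 3 to 4 costs 5 fps for only 0.4 pp, which supports the choice of 3 as the operating point.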


3. Visualization results

Box color scheme:

Green box: ground-truth, blue box: YOLOv5 detection, yellow box: new box produced by propagation


(Top) YOLOv5 detection and ground-truth

(Middle) YOLOv5 detection, ground-truth, and new bboxes produced by the propagation process

(Bottom) The entire fisheye image: the red window indicates the event of interest.



Fig. 2. Cars occluded by a tree are often undetected by YOLO (red window). They can be recovered by a local propagation pass.



Fig. 3. Cars waiting close together at a traffic light (in the red window) are not detected by YOLO. The missing cars are recovered by a local propagation pass.



Fig. 4. Cars are not detected by YOLO (in the red window) due to fisheye distortion. They are recovered by the High Confidence Car propagation.



Fig. 5. Cars are not detected by YOLO due to their bright headlights (in the red window). They are recovered by the backward pass of High Confidence Car propagation.