1. Video visualization
Green box: ground-truth
Blue box: YOLOv5 detection
Yellow box: bounding box propagation
fisheye-day-test-30072020_01_fisheye_day_test
fisheye-day-test-30072020_02_fisheye_day_test
fisheye-day-test-30072020_03_fisheye_day_test
fisheye-night-test-30072020_CLIP_20200628-210253
fisheye-night-test-30072020_CLIP_20200628-210808
Resources
Citation: Ardianto, Sandy, Hsueh-Ming Hang, and Wen-Huang Cheng. "Fast Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video Using CNN and Bounding Box Propagation." In 2022 IEEE International Conference on Image Processing (ICIP), pp. 1891-1895. IEEE, 2022.
Bibtex:
@inproceedings{ardianto2022fast,
  title={Fast Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video Using CNN and Bounding Box Propagation},
  author={Ardianto, Sandy and Hang, Hsueh-Ming and Cheng, Wen-Huang},
  booktitle={Proceedings of the International Conference on Image Processing (ICIP)},
  pages={1891--1895},
  year={2022},
  organization={IEEE}
}
Dataset Download:
2. Experiments
In our experiments, each image is resized to 640x640 and fed into the YOLOv5 object detector. The initial model is pretrained on the COCO dataset. Speed is measured on a PC with an Intel i9-9900X CPU and a single NVIDIA 1080Ti GPU. Unless stated otherwise, the experiment settings are as follows (a configuration sketch follows the list):
- Test data: Night video + Day video (both)
- Modality: keyframes: RGB; intermediate frames: frame difference
- Segment size: 3 (KF: 2, IF: 1)
- High confidence car propagation: threshold 0.8; existing-car and validation-failure attempt limits: 3 each
- Car validation network: ResNet18, input size 128x128
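For concreteness, the settings above can be gathered into a small configuration. The sketch below is ours, assuming a standard PyTorch/torch.hub YOLOv5 workflow; the names (`EXPERIMENT_CFG`, `load_detector`) are illustrative and not taken from the released code.

```python
# Illustrative configuration mirroring the experiment settings listed above.
import torch

EXPERIMENT_CFG = {
    "input_size": 640,             # images resized to 640x640 before YOLOv5
    "keyframe_modality": "rgb",    # keyframes: RGB
    "intermediate_modality": "fd", # intermediate frames: frame difference
    "segment_size": 3,             # per segment: 2 keyframes + 1 intermediate frame
    "hcc_threshold": 0.8,          # high-confidence car threshold
    "hcc_existing_limit": 3,       # stopping rule: existing-car attempts
    "hcc_fail_limit": 3,           # stopping rule: validation-failure attempts
}

def load_detector():
    # YOLOv5 pretrained on COCO, loaded through the standard torch.hub entry point.
    return torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
```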
2.1. Detection rates in three regions for YOLOv5 and our scheme
Table 1 shows that the detection rates in the outer ring are much lower than the overall average, particularly on the Night video, and that bounding box propagation recovers much of this loss. (A sketch of how detections can be assigned to regions follows the table.)
Table 1: Per-region AP50 for YOLOv5 and YOLOv5 with bounding box propagation
| AP50 | Night: YOLOv5 | Night: +BBox Prop | Day: YOLOv5 | Day: +BBox Prop | Both: YOLOv5 | Both: +BBox Prop |
|---|---|---|---|---|---|---|
| Inner circle | 0.892 | 0.886 | 0.917 | 0.911 | 0.864 | 0.875 |
| Middle ring | 0.969 | 0.962 | 0.942 | 0.934 | 0.965 | 0.955 |
| Outer ring | 0.425 | 0.655 | 0.723 | 0.878 | 0.695 | 0.842 |
| Overall | 0.541 | 0.720 (+17.9 pp) | 0.820 | 0.882 (+6.2 pp) | 0.772 | 0.854 (+8.2 pp) |
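For reference, the per-region statistics can be computed by assigning each detection to a region from the radial distance of its box center to the fisheye image center. The sketch below is ours; the radius fractions used for the inner/middle/outer boundaries are placeholders, since the exact boundaries are not specified here.

```python
import math

def assign_region(box, image_size=640, r_inner=0.33, r_middle=0.66):
    """Assign a detection to the inner circle, middle ring, or outer ring.

    `box` is (x1, y1, x2, y2) in pixels. The radius fractions r_inner and
    r_middle are illustrative placeholders, not the paper's values.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    center = image_size / 2.0
    r = math.hypot(cx - center, cy - center) / center  # normalized radius, 0 at center
    if r <= r_inner:
        return "inner"
    elif r <= r_middle:
        return "middle"
    return "outer"
```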
2.2. Bounding box size filtering results
The bounding box size filtering step improves AP50 by about 2 percentage points (pp) on the Night video and by about 1 pp on the full (Both) dataset. (A filtering sketch follows Table 2.)
Table 2: Bounding box size filtering results
| Inputs | Inner Ring Min. Size | Inner Ring Max. Size | Middle Ring Min. Size | Middle Ring Max. Size | Outer Ring Min. Size | Outer Ring Max. Size | # Cars (Ground Truth) | # Cars (Before) | # Cars (After) | AP50 (Before) | AP50 (After) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Night | 113 | 162 | 92 | 132 | 53 | 76 | 19,533 | 21,423 | 20,352 | 0.541 | 0.563 |
| Day | 217 | 311 | 166 | 238 | 94 | 135 | 22,973 | 24,195 | 23,846 | 0.820 | 0.826 |
| Both | 211 | 302 | 133 | 191 | 88 | 127 | 42,506 | 45,115 | 42,702 | 0.772 | 0.781 |
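A minimal sketch of the size filtering step, assuming "size" means the longer side of the box in pixels and that the per-region limits come from Table 2; both assumptions are ours.

```python
def size_filter(detections, limits):
    """Drop boxes whose size falls outside the per-region [min, max] range.

    `detections`: list of dicts with "box" (x1, y1, x2, y2) and "region".
    `limits`: e.g. {"inner": (113, 162), "middle": (92, 132), "outer": (53, 76)}
    taken from Table 2 (Night row). We assume "size" is the longer box side
    in pixels, which is not stated explicitly.
    """
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        size = max(x2 - x1, y2 - y1)
        lo, hi = limits[det["region"]]
        if lo <= size <= hi:
            kept.append(det)
    return kept
```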
2.3. Modality variation
Table 3 shows the results of using different input modalities (RGB or frame difference, FD) in our bounding box propagation scheme. When every frame (both keyframes and intermediate frames) uses RGB YOLO detection, accuracy is about 1 pp higher, but the speed drops from 58 to 30 fps. A sketch of the FD input computation follows the table.
Table 3. Modality variation results
| Keyframe | Intermediate Frame | AP50 | Speed (fps) |
|---|---|---|---|
| RGB | RGB | 0.863 | 30 |
| RGB | FD | 0.854 | 58 |
| FD | RGB | 0.783 | 46 |
| FD | FD | 0.676 | 84 |
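The frame-difference (FD) input for intermediate frames could be built as in the sketch below: an absolute difference of grayscale frames replicated to three channels so it fits the detector's input shape. The exact FD formulation used in the paper may differ.

```python
import cv2
import numpy as np

def frame_difference(prev_frame, curr_frame):
    """Build a frame-difference (FD) input from two consecutive BGR frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)
    # Replicate to 3 channels so the FD image matches the RGB input shape.
    return np.repeat(diff[:, :, None], 3, axis=2)
```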
2.4. Ablation study
We propose three bounding box propagation algorithms: (a) keyframe to intermediate frame (K → I), (b) intermediate frame to keyframe (I → K), and (c) High Confidence Car (HCC) propagation. Table 4 reports the performance of all combinations of these three algorithms. A single propagation algorithm alone may not be very effective, but their combined use outperforms any individual one; a simplified sketch of the K → I step follows the table.
Table 4. Evaluation result with various features
| K → I | I → K | HCC | AP50 | Speed (fps) |
|---|---|---|---|---|
|  |  |  | 0.759 | 74 |
| • |  |  | 0.771 | 68 |
|  | • |  | 0.728 | 69 |
|  |  | • | 0.753 | 66 |
| • | • |  | 0.778 | 65 |
| • |  | • | 0.802 | 62 |
|  | • | • | 0.757 | 62 |
| • | • | • | 0.854 | 58 |
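As a rough illustration of the K → I step, the sketch below copies any keyframe box that has no sufficiently overlapping box in the intermediate frame. The IoU-based matching rule and its threshold are our assumptions, not the paper's exact procedure.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def propagate_k_to_i(keyframe_boxes, intermediate_boxes, iou_thresh=0.3):
    """Keyframe-to-intermediate (K -> I) propagation, simplified:
    copy every keyframe box that has no matching box in the intermediate frame."""
    new_boxes = [kb for kb in keyframe_boxes
                 if all(iou(kb, ib) < iou_thresh for ib in intermediate_boxes)]
    return intermediate_boxes + new_boxes
```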
2.5. Car validation check network
We evaluate different models and input sizes for the neural network used in the car validation check; a minimal model definition follows Table 5.
Table 5. Car validation network results on various models and input sizes
| Model | Input Size | Network Accuracy (%) | AP50 | Speed (fps) |
|---|---|---|---|---|
| ResNet18 | 32x32 | 96.8 | 0.848 | 62 |
| ResNet18 | 64x64 | 97.2 | 0.853 | 60 |
| ResNet18 | 128x128 | 98.9 | 0.854 | 58 |
| ResNet50 | 32x32 | 99.0 | 0.856 | 54 |
| ResNet50 | 64x64 | 99.2 | 0.857 | 53 |
| ResNet50 | 128x128 | 99.5 | 0.862 | 52 |
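A minimal definition of the car validation network, assuming a binary car / not-car classifier built from a torchvision ResNet18 operating on 128x128 crops as in Table 5; training details are omitted.

```python
import torch
from torchvision import models

def build_car_validator():
    # ResNet18 with a 2-way head: car vs. not-car (128x128 crops, per Table 5).
    return models.resnet18(num_classes=2)

# Usage sketch: crop a propagated box, resize to 128x128, and classify.
validator = build_car_validator().eval()
dummy_crop = torch.randn(1, 3, 128, 128)   # stand-in for a resized box crop
with torch.no_grad():
    probs = validator(dummy_crop).softmax(dim=1)
    is_car = probs[0, 1].item() > 0.5
```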
2.6. Parameters of the High Confidence Car (HCC) propagation
The HCC propagation algorithm has three parameters: the confidence threshold used to select high confidence cars, and the two upper limits in the stopping rule, namely the existing-car attempts and the validation-failure attempts. As shown in Table 6, a confidence threshold around 0.8 gives the best results. Tables 7 and 8 show nearly 1 pp accuracy improvement when the validation failure limit increases from one to two. On the other hand, the existing-car checking limit provides only a small improvement, particularly when its value is higher than 2. For the attempt limits, Fig. 1 indicates a trade-off between accuracy and speed, and the accuracy saturates once the limit exceeds 3. Hence, we choose 3 as the upper limit for both the existing-car and validation-failure attempts. A sketch of the stopping rule follows Fig. 1.
Table 6. Threshold variation on high confidence car propagation
| Threshold | AP50 |
|---|---|
| 0.5 | 0.684 |
| 0.6 | 0.795 |
| 0.7 | 0.837 |
| 0.8 | 0.854 |
| 0.9 | 0.853 |
Table 7. Existing and failure attempts limit - AP50
| Existing \ Fail | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.831 | 0.839 | 0.845 | 0.849 | 0.852 |
| 2 | 0.839 | 0.847 | 0.851 | 0.854 | 0.856 |
| 3 | 0.845 | 0.851 | 0.854 | 0.856 | 0.858 |
| 4 | 0.849 | 0.854 | 0.856 | 0.858 | 0.859 |
| 5 | 0.852 | 0.856 | 0.858 | 0.859 | 0.859 |
Table 8. Existing and failure attempts limit - Speed (fps)
| Existing \ Fail | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 64 | 63 | 62 | 61 | 59 |
| 2 | 63 | 62 | 61 | 59 | 57 |
| 3 | 62 | 61 | 59 | 57 | 55 |
| 4 | 61 | 59 | 57 | 54 | 53 |
| 5 | 59 | 57 | 55 | 53 | 51 |
Fig. 1. High confidence car performance and speed trade-off
(Same number of existing and fail attempts)
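The sketch below is our simplified reading of the HCC stopping rule with the chosen parameters (threshold 0.8, both attempt limits 3); the exact bookkeeping in the paper may differ.

```python
def hcc_should_continue(track, detector_confidence, validator_says_car,
                        threshold=0.8, existing_limit=3, fail_limit=3):
    """One update of the HCC stopping rule (simplified interpretation).

    `track` holds two counters. If the detector already finds this car with
    high confidence, count an "existing car" attempt; if the car validation
    network rejects the propagated box, count a "validation failure" attempt.
    Propagation stops once either counter exceeds its limit.
    """
    if detector_confidence >= threshold:
        track["existing"] = track.get("existing", 0) + 1
    elif not validator_says_car:
        track["fail"] = track.get("fail", 0) + 1
    return (track.get("existing", 0) <= existing_limit
            and track.get("fail", 0) <= fail_limit)
```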
3. Visualization results
Box color scheme:
Green box: ground-truth, blue box: YOLOv5 detection, yellow box: new box produced by propagation
(Top) YOLOv5 detection and ground-truth
(Middle) YOLOv5 detection, ground-truth, and new bboxes produced by the propagation process
(Bottom) The entire fisheye image: the red window indicates the event of interest.
Fig. 2. Cars occluded by a tree are often undetected by YOLO (red window). They can be recovered by a local propagation pass.
Fig. 3. While waiting at a traffic light, cars that are close together (in the red window) are not detected. The missing cars are recovered by a local propagation pass.
Fig. 4. Cars are not detected by YOLO (in the red window) due to fisheye distortion. They are recovered by the High Confidence Car propagation.
Fig. 5. Cars are not detected by YOLO due to their bright headlights (in the red window). They are recovered by the backward pass of High Confidence Car propagation.