We organize the first large-scale Tiny Object Detection (TOD) challenge, which has a competition track on tiny person detection. We provide 18433 normal person boxes and 16909 dense boxes in the training set. In Figure 1, WIDER Face holds a similar absolute scale distribution to TinyPerson. Xuehui Yu, Yuqi Gong, Nan Jiang, Qixiang Ye, Zhenjun Han; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1) The persons in TinyPerson are quite tiny compared with those in other representative datasets, as shown in Figure 1 and Table 1, which is the main characteristic of TinyPerson; 2) The aspect ratio of persons in TinyPerson has a large variance, as given in Table. The tiny relative size results in more false positives and a serious positive/negative imbalance, because massive and complex backgrounds are introduced in a real scenario. The relative size of objects in TinyPerson is smaller than that in CityPersons, as shown in the bottom-right of Figure 1. Wi and Hi denote the width and height of image Ii, respectively. The intuition of our approach is to align the object scales of the dataset used for pre-training with those of the dataset used for detector training. We define the probability density function of objects' size, which is used to transform the probability distribution of objects' size in the extra dataset. Our approach is inspired by the Human Cognition Process, and Scale Match can better utilize the existing annotated data and make the detector more sophisticated. Dataset for person detection: Pedestrian detection has always been a hot issue in computer vision. Scale Match for Tiny Person Detection. The tiny interval [2, 20] is partitioned into 3 sub-intervals: tiny1[2, 8], tiny2[8, 12], tiny3[12, 20]. For the second step, a uniform sampling algorithm is used.
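The partition of tiny[2, 20] into tiny1, tiny2, and tiny3 can be sketched as a small lookup. This is only an illustration: the function name `size_interval` is hypothetical, and because the text writes every interval as closed, the assignment of the shared boundaries 8 and 12 to the lower interval is an assumption.

```python
def size_interval(size):
    """Return the tiny sub-interval that an object's size falls in, or None.

    Assumption: the shared boundaries (8 and 12) are assigned to the lower
    interval, because the text lists all three intervals as closed.
    """
    if 2 <= size <= 8:
        return "tiny1"
    if 8 < size <= 12:
        return "tiny2"
    if 12 < size <= 20:
        return "tiny3"
    return None  # outside tiny[2, 20]


if __name__ == "__main__":
    for s in (5, 10, 20, 25):
        print(s, size_interval(s))
```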
With the detector pre-trained on SM COCO, we obtain a 3.22% improvement in APtiny50 (Table 7). Transforming only the mean of objects' size to that in TinyPerson is inefficient. Scale Match for Tiny Person Detection Xuehui Yu, Yuqi Gong, Nan Jiang, Qixiang Ye, Zhenjun Han Visual object detection has achieved unprecedented advances with the rise of deep convolutional neural networks. However, detecting tiny objects (for example, tiny persons of less than 20 pixels) in large-scale images remains not well investigated. Several small target datasets, including WiderFace [25] and TinyNet [19], have been reported. The extremely small objects raise a grand challenge for feature representation, while the massive and complex backgrounds aggravate the challenge. Then, we obtain a new dataset, COCO100, by setting the shorter edge of each image to 100 and keeping the height-width ratio unchanged. Inspired by the Human Cognitive Process, in which humans become more sophisticated at scale-related tasks when they learn more about objects of a similar scale, we propose an easy but efficient scale transformation approach for tiny person detection that keeps the scale consistent between TinyPerson and the extra dataset. With MSM COCO as the pre-training dataset, the performance further improves to 47.29% APtiny50 (Table 7). Since some images in TinyPerson contain dense objects, DETECTIONS_PER_IMG (the maximum number of result boxes the detector outputs per image) is set to 200. Accordingly, we propose a simple yet effective Scale Match approach to align the object scales between the two datasets for favorable tiny-object representation. It is known that the histogram Equalization and Matching algorithms for image enhancement keep the monotonic changes of pixel values. Therefore, we use P2, P3, P4, P5, P6 of FPN instead of P3, P4, P5, P6, P7 for RetinaNet, which is similar to Faster RCNN-FPN. The TinyPerson dataset was used for the TOD Challenge and is publicly released.
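The COCO100 construction described above (shorter edge set to 100, height-width ratio unchanged) amounts to a single scale factor per image. A minimal sketch follows; the function name `resize_shorter_edge` and the rounding of the output size to whole pixels are assumptions, not details taken from the text.

```python
def resize_shorter_edge(width, height, target=100):
    """Return (new_width, new_height, scale) so the shorter edge equals target.

    The same scale is applied to both edges, which keeps the height-width
    ratio unchanged. Rounding to whole pixels is an assumption.
    """
    scale = target / min(width, height)
    return round(width * scale), round(height * scale), scale


if __name__ == "__main__":
    new_w, new_h, scale = resize_shorter_edge(640, 480)
    print(new_w, new_h, scale)
```

Bounding boxes would be multiplied by the same `scale` so that they stay aligned with the resized image.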
We follow this idea and change the size monotonically, as shown in Figure 6. Resizing only these objects would destroy the image structure. The 1st Tiny Object Detection (TOD) Challenge aims to encourage research in developing novel and accurate methods for tiny object detection in images which have wide views, with a current focus on tiny person detection. A common approach is to train a model on the extra datasets as a pre-trained model and then fine-tune it on a task-specific dataset. We build the baseline for tiny person detection and experimentally find that the scale mismatch could deteriorate the feature representation and the detectors. Person/pedestrian detection is an important topic in the computer vision community. Then the absolute size and relative size of an object are calculated. For the size of objects mentioned in the following, we use the objects' absolute size by default. In Table 4, the MRtiny50 of tiny CityPersons is 40% lower than that of CityPersons. INPUT: Dtrain (train dataset of D). Firstly, videos with a high resolution are collected from different websites. Best detector: With MS COCO, RetinaNet and FreeAnchor achieve better performance than Faster RCNN-FPN. Although the pedestrians in those datasets have a relatively high resolution and a large size, this situation is not suitable for tiny object detection. The publicly available datasets are quite different from TinyPerson in object type and scale distribution, as shown in Figure 1. The scale factor incrementally scales the detection resolution between MinSize and MaxSize.
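The text above refers to an object's absolute and relative size without reproducing the formulas. The sketch below assumes the usual definitions: absolute size is the square root of the box area, and relative size is the square root of the box area divided by the image area. Both function names are hypothetical.

```python
import math


def absolute_size(w, h):
    """Assumed definition: square root of the bounding-box area."""
    return math.sqrt(w * h)


def relative_size(w, h, img_w, img_h):
    """Assumed definition: square root of box area over image area."""
    return math.sqrt((w * h) / (img_w * img_h))


if __name__ == "__main__":
    # A 4x9 box inside a 40x90 image.
    print(absolute_size(4, 9))
    print(relative_size(4, 9, 40, 90))
```

Under these assumed definitions, up-sampling an image changes every object's absolute size but leaves its relative size unchanged.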
Spatial information: Due to the size of the tiny object, spatial information may be more important than a deeper network model. This normalization converts values to floats in the range 0-1: the scale parameter of blobFromImg, passed in network.setInput( , , scale, ), normalizes all intensity values into the range 0-1. While the region-based methods are complex and time-consuming, single-stage detectors, such as YOLO [20] and SSD [17], are proposed to accelerate the processing speed but with a performance drop, especially on tiny objects. To better quantify the effect of the tiny relative size, we obtain two new datasets, 3*3 tiny CityPersons and 3*3 TinyPerson, by directly 3*3 up-sampling tiny CityPersons and TinyPerson, respectively. We annotate 72651 objects with bounding boxes by hand. Mapping an object's size s in dataset E to ^s with a monotone function f makes the distribution of ^s the same as Psize(^s;Dtrain). Scale Match can transform the distribution of size to that of the task-specific dataset, as shown in Figure 5. In The IEEE Winter Conference on Applications of Computer Vision. February 2, 2020. Scale Match for Tiny Person Detection (WACV 2020); official link of the dataset: ucas-vg/TinyBenchmark. Therefore, a more efficient rectified histogram (as shown in Algorithm 2) is proposed. For TinyPerson, RetinaNet [15], FCOS [23], and Faster RCNN-FPN, which are representatives of one-stage anchor-based detectors, anchor-free detectors, and two-stage anchor-based detectors respectively, are selected for experimental comparisons.
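The monotone mapping f from a size s in E to ^s can be illustrated with empirical quantile matching, the same idea as histogram matching on pixel values. This is a stand-in sketch, not the paper's algorithm: the function name `monotone_size_map` and the use of raw sorted samples instead of a K-bin histogram are assumptions.

```python
from bisect import bisect_right


def monotone_size_map(sizes_e, sizes_d):
    """Build a non-decreasing f so that f(sizes_e) follows sizes_d's distribution.

    Assumption: empirical quantile (CDF) matching on raw samples, used here
    as a simple stand-in for a histogram-based implementation.
    """
    src = sorted(sizes_e)
    dst = sorted(sizes_d)

    def f(s):
        # Empirical CDF value of s within E, in [0, 1].
        rank = bisect_right(src, s) / len(src)
        # Index of the matching quantile within D, clamped to a valid range.
        idx = min(len(dst) - 1, max(0, int(rank * len(dst)) - 1))
        return dst[idx]

    return f


if __name__ == "__main__":
    f = monotone_size_map([10, 20, 30, 40], [2, 4, 6, 8])
    print([f(s) for s in (10, 20, 25, 30, 40)])
```

Because f is non-decreasing, an object that is larger than another in E is never mapped to a smaller size, which preserves the size ordering of objects.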
They are not applicable to scenarios where persons are spread over a large area and at a very long distance, e.g., marine search and rescue on a helicopter platform. Different from tiny CityPersons, the images in TinyPerson are captured from far away in real scenes. We choose ResNet50 as the backbone. The IOU threshold is set to 0.5 for performance evaluation. However, a detector pre-trained on MS COCO yields very limited improvement on TinyPerson, since the object size in MS COCO is quite different from that in TinyPerson. TinyPerson represents persons at a quite low resolution, mainly less than 20 pixels, in maritime and beach scenes. R-CNN adopted a region proposal-based method based on selective search and then used a Conv-Net to classify the scale-normalized proposals. TinyNet involves remote sensing target detection at a long distance. For Caltech or CityPersons, the IOU criterion is adopted for performance evaluation. [14] proposed feature pyramid networks that use a top-down architecture with lateral connections as an elegant multi-scale feature warping method. These images are collected from real-world scenarios based on UAVs. In this paper, we simply adopt the first way of handling ignore regions. However, in TinyPerson, most ignore regions are much larger than a person. Training runs for 12 epochs; the base learning rate is set to 0.01 and is decayed by a factor of 0.1 after 6 epochs and after 10 epochs.
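The IOU criterion with a 0.5 threshold mentioned above can be sketched as follows. Boxes are written as (x, y, w, h); treating (x, y) as the top-left corner is an assumption, and the helper names `iou` and `is_match` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes.

    Assumption: (x, y) is the top-left corner of the box.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_match(detection, ground_truth, threshold=0.5):
    """A detection counts as a match when its IOU reaches the threshold."""
    return iou(detection, ground_truth) >= threshold


if __name__ == "__main__":
    print(iou((0, 0, 10, 10), (5, 0, 10, 10)))
    print(is_match((0, 0, 10, 10), (2, 0, 10, 10)))
```

For a box only a few pixels wide, a shift of one or two pixels already changes the IOU substantially, which is one reason a fixed threshold is demanding for tiny objects.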
INPUT: K (integer, the number of bins in the histogram used to estimate Psize(s;Dtrain)). Scale Match is applied to all objects in E to get T(E); when there are a large number of targets in E, Psize(s;T(E)) will be close to Psize(s;D). WiderFace mainly focuses on face detection, as shown in Figure. In recent years, with the development of convolutional neural networks (CNNs), the performance of classification, detection and segmentation on some classical datasets, such as ImageNet, has far exceeded that of traditional machine learning algorithms. The region convolutional neural network (R-CNN) has become the popular detection architecture. Sample ^s: We first sample a bin index with respect to the probabilities in H, and then sample ^s from a uniform probability distribution whose min and max sizes equal R[k]− and R[k]+. Such diversity enables models trained on TinyPerson to generalize well to more scenes, e.g., long-distance human target detection followed by rescue. FCOS: Fully convolutional one-stage object detection. We can ignore the mean, but the scale is important. Therefore, we cut the original images into sub-images with overlap during training and testing. Different from objects at proper scales, tiny objects are much more challenging due to the extremely small object size and low signal-to-noise ratio, as shown in Figure 1. The scenarios of existing person/pedestrian benchmarks [2][6][24][5][4][8], e.g., CityPersons [27], are mainly at a near or middle distance. However, for TinyPerson, the same up-sampling strategy obtains limited performance improvement. The proposed Scale Match approach improves the detection performance over the state-of-the-art detector (FPN) by a significant margin (~5%).
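The two-stage sampling of ^s described above (a bin index drawn according to H, then a uniform draw inside that bin's range) can be sketched with the standard library. The function name `sample_size` and the representation of R as a list of (low, high) pairs standing for R[k]− and R[k]+ are assumptions.

```python
import random


def sample_size(H, R, rng=random):
    """Sample a target size from a histogram.

    H[k] is the probability of bin k; R[k] is an assumed (low, high) pair
    standing for R[k]- and R[k]+ in the text.
    """
    # Step 1: pick a bin index with probability proportional to H[k].
    k = rng.choices(range(len(H)), weights=H, k=1)[0]
    # Step 2: sample uniformly within the chosen bin's size range.
    low, high = R[k]
    return rng.uniform(low, high)


if __name__ == "__main__":
    rng = random.Random(0)
    H = [0.5, 0.3, 0.2]
    R = [(2, 8), (8, 12), (12, 20)]
    print([round(sample_size(H, R, rng), 2) for _ in range(5)])
```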
The ignore region is always a group of persons (not a single person) or something else that can be treated neither as foreground (positive sample) nor as background (negative sample). For true object detection, the keypoint-based approaches suggested above work better. NOTE: N is the number of objects in dataset D; Gij(Dtrain) is the j-th object in the i-th image of dataset Dtrain. FreeAnchor: Learning to match anchors for visual object detection. How can we use extra public datasets with lots of data to help train a model for a specific task, e.g., tiny person detection? The performance drops significantly when the object's size becomes tiny. In this paper, without loss of generality, MS COCO is used as the extra dataset, and Scale Match is used for the scale transformation T. Gij=(xij,yij,wij,hij) represents the j-th object in image Ii of dataset E. The Scale Match approach can be simply described as three steps: Resize the object with scale ratio c, then ^Gij←(xij∗c,yij∗c,wij∗c,hij∗c), where ^Gij is the result after Scale Match. CityPersons: A diverse dataset for pedestrian detection. OK, today's post covers a recent paper in the small-object detection area: Scale Match for Tiny Person Detection. The paper follows a fairly classic pattern, a new dataset plus a new benchmark: it proposes a new small-object detection dataset and a small-object detection method. It is hard to achieve high localization precision in TinyPerson due to the tiny objects' absolute and relative size. But it obtained poor performance on TinyPerson, due to the great difference in relative scale and aspect ratio, which further demonstrates the great challenge of the proposed TinyPerson.
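The resize step ^Gij←(xij∗c,yij∗c,wij∗c,hij∗c) can be sketched for a single box. Only this step is spelled out in the text above, so the way c is obtained here (a sampled target size divided by the box's current size, with size taken as the square root of the box area) is an assumption, as is the function name `scale_match_box`.

```python
import math


def scale_match_box(box, target_size):
    """Rescale one (x, y, w, h) box so that its size becomes target_size.

    Assumptions: size is sqrt(w * h), and the scale ratio is
    c = target_size / size. Returns the rescaled box and c.
    """
    x, y, w, h = box
    c = target_size / math.sqrt(w * h)
    return (x * c, y * c, w * c, h * c), c


if __name__ == "__main__":
    new_box, c = scale_match_box((10, 20, 40, 90), target_size=6.0)
    print(new_box, c)
```

In practice the whole image would be resized by the same ratio c so that the object keeps its surrounding context, and `target_size` would come from a sampler over the target dataset's size distribution.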
