Overview
Object detection is one of the core tasks in computer vision. This article traces the evolution of object detection from traditional methods to deep learning.
Object Detection Timeline

```mermaid
gantt
    title Object detection timeline
    dateFormat YYYY
    section Traditional methods
    Viola-Jones   :2001, 2005
    HOG + SVM     :2005, 2010
    DPM           :2008, 2016
    section Deep learning
    R-CNN         :2014, 2015
    SPP-Net       :2014, 2015
    Fast R-CNN    :2015, 2016
    Faster R-CNN  :2015, 2017
    YOLOv1        :2015, 2016
    SSD           :2016, 2017
    YOLOv2-v3     :2016, 2018
    RetinaNet     :2017, 2018
    YOLOv4-v5     :2020, 2021
```
Traditional Object Detection Methods
Pipeline

```mermaid
flowchart TB
    subgraph PIPE[Traditional detection pipeline]
        IMG[Input image] --> REGION[Region proposal]
        REGION --> FEATURE[Feature extraction]
        FEATURE --> SVM[Classifier]
        SVM --> OUT[Detections]
    end
    subgraph PROP[Region proposal methods]
        REGION --> SL[Sliding window]
        REGION --> SS[Selective Search]
        REGION --> EDGE[Edge detection]
    end
    subgraph FEAT[Feature extractors]
        FEATURE --> HOG[HOG features]
        FEATURE --> SIFT[SIFT/ORB]
        FEATURE --> HAAR[Haar features]
    end
```
Viola-Jones Detector

```python
import numpy as np

class ViolaJonesDetector:
    """Viola-Jones face detector (schematic: the trained classifiers and
    the sliding_window / pyramid_down / classify helpers are omitted)."""

    def __init__(self):
        self.classifiers = []
        self.integral_image = None

    def compute_integral_image(self, image):
        """Integral image: ii[y, x] = sum of image[:y+1, :x+1]."""
        return np.cumsum(np.cumsum(image, axis=0), axis=1)

    def haar_features(self, image):
        """Compute the three basic Haar-like feature types."""
        features = []
        features.append(self.edge_feature(image))
        features.append(self.line_feature(image))
        features.append(self.center_feature(image))
        return features

    def detect(self, image, scale_factor=1.1):
        """Multi-scale detection over an image pyramid."""
        detections = []
        scale = 1.0
        # Stop once the pyramid level is smaller than the 24x24 window
        while image.shape[0] >= 24:
            for y, x in self.sliding_window(image, (24, 24), step=2):
                window = image[y:y+24, x:x+24]
                features = self.haar_features(window)
                if self.classify(features):
                    # Coordinates are in pyramid-level space; multiply by
                    # `scale` to map them back to the original image
                    detections.append((x, y, scale))
            image = self.pyramid_down(image, scale_factor)
            scale *= scale_factor
        return self.non_max_suppression(detections)
```
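The integral image is what makes Haar features cheap: once it is built, the sum of any rectangle costs four array lookups regardless of the rectangle's size. A minimal sketch (the `rect_sum` helper is illustrative, not part of the detector above):

```python
import numpy as np

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of image[y0:y1, x0:x1] in O(1) via the integral image.

    ii[y, x] holds the sum of all pixels in image[:y+1, :x+1].
    """
    total = ii[y1 - 1, x1 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1 - 1]
    if x0 > 0:
        total -= ii[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

img = np.arange(16, dtype=np.float64).reshape(4, 4)
ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
print(rect_sum(ii, 1, 1, 3, 3))  # 30.0 (= img[1:3, 1:3].sum() = 5+6+9+10)
```

A Haar feature is then just a signed combination of a few such rectangle sums, which is why thousands of features per window stay affordable.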
Two-Stage Detectors
R-CNN Family Architectures

```mermaid
flowchart TB
    subgraph RCNN[R-CNN]
        IMG1[Image] --> REGION1["Selective Search, ~2000 regions"]
        REGION1 --> WARP1[Warp regions]
        WARP1 --> CNN1["CNN features, one forward pass per region"]
        CNN1 --> SVM1[SVM classifiers]
        SVM1 --> BOX1[Bounding-box regression]
    end
    subgraph FAST[Fast R-CNN]
        IMG2[Image] --> CNN2["Whole-image CNN, shared features"]
        IMG2 --> REGION2[Selective Search]
        CNN2 --> ROIP[RoI Pooling]
        REGION2 --> ROIP
        ROIP --> FC2[Fully connected layers]
        FC2 --> CLS2[Classification + regression]
    end
    subgraph FASTER[Faster R-CNN]
        IMG3[Image] --> CNN3[Shared CNN]
        CNN3 --> RPN[Region Proposal Network]
        RPN --> PROPOSAL[Proposals]
        CNN3 --> ROI3[RoI Align]
        PROPOSAL --> ROI3
        ROI3 --> HEAD3[Detection head]
    end
```
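The RoI Pooling step shared by Fast and Faster R-CNN takes a variable-sized feature-map crop per proposal and max-pools it onto a fixed grid, so the fully connected head always sees the same shape. A minimal single-channel numpy sketch (a real implementation handles batched multi-channel tensors and sub-pixel bin boundaries; RoI Align further removes the integer quantisation):

```python
import numpy as np

def roi_pool(fmap, roi, output_size=7):
    """Max-pool a feature-map crop to a fixed output_size x output_size grid.

    fmap: (H, W) single-channel feature map;
    roi: (x1, y1, x2, y2) in integer feature-map coordinates.
    """
    x1, y1, x2, y2 = roi
    crop = fmap[y1:y2, x1:x2]
    # Split rows and columns into output_size nearly equal bins
    row_bins = np.array_split(np.arange(crop.shape[0]), output_size)
    col_bins = np.array_split(np.arange(crop.shape[1]), output_size)
    out = np.empty((output_size, output_size))
    for i, rb in enumerate(row_bins):
        for j, cb in enumerate(col_bins):
            out[i, j] = crop[np.ix_(rb, cb)].max()
    return out

fmap = np.random.rand(32, 32)
print(roi_pool(fmap, (4, 2, 20, 18)).shape)  # (7, 7)
```

Because every proposal comes out as the same 7×7 grid, a single fully connected head can score all of them, which is what lets Fast R-CNN share one backbone pass across ~2000 proposals.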
Faster R-CNN Implementation

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50
from torchvision.ops import nms

class FasterRCNN(nn.Module):
    """Faster R-CNN object detector (schematic; ROIHead not shown)."""

    def __init__(self, num_classes=80):
        super().__init__()
        self.num_classes = num_classes
        # Use the ResNet-50 trunk without its avgpool/fc head as the backbone
        resnet = resnet50(pretrained=True)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.backbone_out_channels = 2048
        self.rpn = RegionProposalNetwork(
            in_channels=self.backbone_out_channels,
            num_anchors=9
        )
        self.roi_head = ROIHead(
            in_channels=self.backbone_out_channels,
            num_classes=num_classes
        )

    def forward(self, images, targets=None):
        features = self.backbone(images)
        rpn_logits, rpn_boxes, proposals = self.rpn(features, targets)
        if self.training:
            roi_boxes, labels, bbox_targets = self.roi_head(
                features, proposals, targets
            )
            return rpn_logits, rpn_boxes, roi_boxes, labels, bbox_targets
        else:
            class_logits, box_regression = self.roi_head(features, proposals)
            return class_logits, box_regression, proposals


class RegionProposalNetwork(nn.Module):
    """Region Proposal Network (RPN)."""

    def __init__(self, in_channels, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, 3, padding=1)
        self.cls_logits = nn.Conv2d(512, num_anchors * 2, 1)  # object / background
        self.bbox_pred = nn.Conv2d(512, num_anchors * 4, 1)   # box deltas

    def forward(self, features, targets=None):
        x = F.relu(self.conv(features))
        logits = self.cls_logits(x)
        bbox_reg = self.bbox_pred(x)
        proposals = self.generate_proposals(logits, bbox_reg)
        return logits, bbox_reg, proposals

    def generate_proposals(self, logits, bbox_reg):
        """Generate region proposals.

        A full implementation first decodes the predicted deltas against
        the anchors into (x1, y1, x2, y2) boxes; this sketch simply
        flattens the raw outputs to per-anchor scores and boxes.
        """
        scores = F.softmax(logits.reshape(-1, 2), dim=1)[:, 1]
        boxes = bbox_reg.reshape(-1, 4)
        keep = nms(boxes, scores, iou_threshold=0.7)
        return boxes[keep]
```
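The `num_anchors=9` above comes from the Faster R-CNN convention of crossing three scales with three aspect ratios at every feature-map location. A minimal sketch of generating the nine base anchors (the paper's scales 128/256/512 and ratios 0.5/1/2; each anchor keeps area ≈ scale², with h = scale·√r and w = scale/√r):

```python
import numpy as np

def make_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate the 3x3 = 9 base anchors centred at the origin."""
    anchors = []
    for s in scales:
        for r in ratios:
            h = s * np.sqrt(r)   # taller for r > 1
            w = s / np.sqrt(r)   # wider for r < 1
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])  # x1, y1, x2, y2
    return np.array(anchors)

anchors = make_anchors()
print(anchors.shape)  # (9, 4); area w*h = s^2 regardless of the ratio
```

At inference these nine templates are translated to every feature-map cell, which is why the RPN heads predict 9×2 scores and 9×4 deltas per location.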
One-Stage Detectors
YOLO vs SSD Architecture

```mermaid
flowchart LR
    subgraph YOLO
        YIMG[Image] --> YGRID["S×S grid"]
        YGRID --> YFEAT[Feature extraction]
        YFEAT --> YPRED["Direct prediction: B×(4+1+C) per cell"]
    end
    subgraph SSD
        SIMG[Image] --> SFEAT[Multi-scale feature maps]
        SFEAT --> SCONV[Convolutional predictors]
        SCONV --> SPRED["Per-layer predictions over default boxes"]
    end
```
YOLOv3 Implementation

```python
import torch
import torch.nn as nn

class YOLOv3(nn.Module):
    """YOLOv3 object detector (schematic; Darknet53 and
    concat_outputs not shown)."""

    # Standard YOLOv3 COCO anchors (w, h), grouped small / medium / large
    DEFAULT_ANCHORS = [
        [(10, 13), (16, 30), (33, 23)],
        [(30, 61), (62, 45), (59, 119)],
        [(116, 90), (156, 198), (373, 326)],
    ]

    def __init__(self, num_classes=80, anchors=None):
        super().__init__()
        self.num_classes = num_classes
        self.anchors = anchors or self.DEFAULT_ANCHORS
        self.backbone = Darknet53()  # yields three feature maps
        # Three detection heads, one per feature-map scale
        self.detect_small = YOLOLayer(self.anchors[0], num_classes)
        self.detect_medium = YOLOLayer(self.anchors[1], num_classes)
        self.detect_large = YOLOLayer(self.anchors[2], num_classes)

    def forward(self, x):
        features = self.backbone(x)
        out_small = self.detect_small(features[0])
        out_medium = self.detect_medium(features[1])
        out_large = self.detect_large(features[2])
        if self.training:
            return out_small, out_medium, out_large
        return self.concat_outputs(out_small, out_medium, out_large)


class YOLOLayer(nn.Module):
    """YOLO detection head: a 1x1 conv producing A*(5+C) channels."""

    def __init__(self, anchors, num_classes):
        super().__init__()
        self.anchors = anchors
        self.num_classes = num_classes
        self.num_anchors = len(anchors)
        # 5 = (tx, ty, tw, th) box offsets + 1 objectness score
        out_channels = self.num_anchors * (5 + num_classes)
        self.conv = nn.Conv2d(512, out_channels, 1)

    def forward(self, x):
        return self.conv(x)
```
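The raw `YOLOLayer` output still has to be decoded: per the YOLOv2/v3 formulation, the centre offsets pass through a sigmoid and are added to the grid-cell index, while width and height are exponentials scaled by the anchor. A minimal single-prediction sketch (the stride and anchor values below are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, stride):
    """Decode one raw YOLO prediction into pixel coordinates.

    (cx, cy) is the grid-cell index; the sigmoid keeps the centre
    inside that cell, and `stride` maps cells back to pixels.
    """
    bx = (sigmoid(tx) + cx) * stride
    by = (sigmoid(ty) + cy) * stride
    bw = anchor_w * np.exp(tw)
    bh = anchor_h * np.exp(th)
    return bx, by, bw, bh

# An all-zero prediction sits at the cell centre with an anchor-sized box:
print(decode_box(0, 0, 0, 0, cx=3, cy=5, anchor_w=116, anchor_h=90, stride=32))
# (112.0, 176.0, 116.0, 90.0)
```

Constraining the centre to its own cell is what stabilises YOLO training compared with the unbounded offsets of the original region-proposal parameterisation.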
Performance Comparison

| Detector | mAP@0.5 | FPS | Pros | Cons |
|---|---|---|---|---|
| R-CNN | 66.0% | 0.5 | High accuracy | Extremely slow |
| Fast R-CNN | 70.0% | 2.0 | Shared features | Slow proposals |
| Faster R-CNN | 78.8% | 5.0 | End-to-end | Relatively slow |
| YOLOv1 | 63.4% | 45 | Very fast | Lower accuracy |
| YOLOv3 | 78.6% | 50 | Fast and accurate | Weak on small objects |
| YOLOv4 | 83.0% | 40 | Real-time, high accuracy | More complex |
| RetinaNet | 80.5% | 12 | Good speed/accuracy balance | Relatively slow |
| SSD | 76.8% | 25 | Multi-scale | Weak on small objects |
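The mAP@0.5 column means a prediction counts as a true positive when its IoU (intersection over union) with a ground-truth box is at least 0.5. A minimal IoU sketch for corner-format boxes:

```python
def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
```

mAP then averages, over classes, the area under the precision-recall curve built from these matches, so the same detector scores differently at stricter IoU thresholds.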
Detection Pipeline Summary

```mermaid
flowchart TB
    START[Input image] --> RESIZE[Resize]
    RESIZE --> ENCODE[Encode / normalize]
    ENCODE --> FEATURE[Feature extraction]
    FEATURE --> BRANCH{Detection paradigm}
    BRANCH --> TWO[Two-Stage]
    BRANCH --> ONE[One-Stage]
    TWO --> RPN[Region proposal]
    RPN --> ROI[RoI Pooling]
    ROI --> CLS[Classification + regression]
    ONE --> GRID[Grid / anchors]
    GRID --> PRED[Direct prediction]
    CLS --> POST1[Post-processing]
    PRED --> POST2[Post-processing]
    POST1 --> NMS[Non-maximum suppression]
    POST2 --> NMS
    NMS --> OUTPUT[Detections]
```
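The NMS stage that both branches of the pipeline converge on can be sketched in a few lines: sort boxes by score, greedily keep the highest-scoring one, and discard any remaining box whose IoU with a kept box exceeds the threshold. A minimal sketch:

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    # Highest score first; suppress overlapping lower-scoring boxes
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too heavily
```

In practice detectors run NMS per class; variants such as Soft-NMS decay the scores of overlapping boxes instead of deleting them outright.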
Summary

```mermaid
mindmap
  root((Object detection))
    Traditional methods
      Viola-Jones
      HOG+SVM
      DPM
    Two-Stage
      R-CNN
      SPP-Net
      Fast R-CNN
      Faster R-CNN
    One-Stage
      YOLO family
      SSD
      RetinaNet
    Anchor-Free
      CenterNet
      FCOS
      CornerNet
    Metrics
      mAP
      FPS
      IoU
```
Object detection has evolved from traditional methods to deep learning, progressing from two-stage to one-stage architectures and from anchor-based to anchor-free designs.