Object Detection Overview: From Traditional Methods to Deep Learning


Overview

Object detection is one of the core tasks in computer vision. This article surveys the development of object detection, from traditional methods to deep learning.

Object Detection Timeline

gantt
    title History of object detection
    dateFormat  YYYY
    section Traditional methods
    Viola-Jones        :2001, 2005
    HOG + SVM          :2005, 2010
    DPM                :2008, 2016
    section Deep learning
    R-CNN              :2014, 2015
    SPP-Net            :2014, 2015
    Fast R-CNN         :2015, 2016
    Faster R-CNN       :2015, 2017
    YOLOv1             :2015, 2016
    SSD                :2016, 2017
    YOLOv2-v3          :2016, 2018
    RetinaNet          :2017, 2018
    YOLOv4-v5          :2020, 2021

Traditional Object Detection Methods

Pipeline

flowchart TB
    subgraph Traditional detection pipeline
        IMG[Input image] --> REGION[Region proposal generation]
        REGION --> FEATURE[Feature extraction]
        FEATURE --> SVM[Classification]
        SVM --> OUT[Detections]
    end

    subgraph Region proposal methods
        REGION --> SL[Sliding window]
        REGION --> SS[Selective search]
        REGION --> EDGE[Edge-based]
    end

    subgraph Feature extractors
        FEATURE --> HOG[HOG features]
        FEATURE --> SIFT[SIFT/ORB]
        FEATURE --> HAAR[Haar features]
    end
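The "sliding window" proposal strategy in the diagram above can be sketched directly. This is a hypothetical helper (not from any library), pure NumPy:

```python
import numpy as np

def sliding_windows(image, window=(24, 24), step=8):
    """Yield (y, x, patch) for every window position that fits in the image."""
    h, w = image.shape[:2]
    wh, ww = window
    for y in range(0, h - wh + 1, step):
        for x in range(0, w - ww + 1, step):
            yield y, x, image[y:y + wh, x:x + ww]

img = np.zeros((48, 48))
positions = [(y, x) for y, x, _ in sliding_windows(img, (24, 24), step=8)]
# a 4x4 grid of positions: y, x each in {0, 8, 16, 24}
```

The step size trades recall against cost: with `step=1` every location is scored, which is why classical detectors needed extremely cheap per-window features.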

The Viola-Jones Detector

import numpy as np

class ViolaJonesDetector:
    """Viola-Jones face detector (sketch)."""

    def __init__(self):
        self.classifiers = []  # cascade of boosted classifiers
        self.integral_image = None

    def compute_integral_image(self, image):
        """Compute the integral image (cumulative sum over both axes)."""
        return np.cumsum(np.cumsum(image, axis=0), axis=1)

    def haar_features(self, image):
        """Compute Haar-like features."""
        features = []
        # two-rectangle (edge) features
        features.append(self.edge_feature(image))
        # three-rectangle (line) features
        features.append(self.line_feature(image))
        # four-rectangle (center-surround) features
        features.append(self.center_feature(image))
        return features

    def detect(self, image, scale_factor=1.1):
        """Multi-scale detection with a 24x24 sliding window."""
        detections = []
        scale = 1.0

        # keep shrinking the image until the window no longer fits
        while min(image.shape[:2]) > 24:
            for y, x in self.sliding_window(image, (24, 24), step=2):
                window = image[y:y+24, x:x+24]
                features = self.haar_features(window)

                if self.classify(features):
                    detections.append((x, y, scale))

            image = self.pyramid_down(image, scale_factor)
            scale *= scale_factor

        return self.non_max_suppression(detections)
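The integral image is what makes Haar features cheap: after the two cumulative sums above, the sum over any rectangle takes just four array lookups. A small self-contained demonstration (the zero-padded border is a convenience for the sketch, not part of the original code):

```python
import numpy as np

def box_sum(ii, y0, x0, y1, x1):
    """Sum of image[y0:y1, x0:x1] read from a zero-padded integral image."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16, dtype=float).reshape(4, 4)
# pad so that ii[y, x] == sum of img[:y, :x]
ii = np.zeros((5, 5))
ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)

assert box_sum(ii, 1, 1, 3, 3) == img[1:3, 1:3].sum()  # 5+6+9+10 = 30
```

Each Haar feature is a difference of a few such rectangle sums, so its cost is constant regardless of window size.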

Two-Stage Detectors

The R-CNN Family

flowchart TB
    subgraph R-CNN
        IMG1[Image] --> REGION1["Selective search, ~2000 regions"]
        REGION1 --> WARP1[Warp each region]
        WARP1 --> CNN1[CNN features per region]
        CNN1 --> SVM1[SVM classifiers]
        SVM1 --> BOX1[Bounding-box regression]
    end
    subgraph Fast R-CNN
        IMG2[Image] --> CNN2[Whole-image CNN, shared features]
        IMG2 --> REGION2[Selective search]
        CNN2 --> ROIP[ROI Pooling]
        REGION2 --> ROIP
        ROIP --> FC2[Fully connected layers]
        FC2 --> CLS2[Classification + regression]
    end
    subgraph Faster R-CNN
        IMG3[Image] --> CNN3[Shared CNN]
        CNN3 --> RPN[Region Proposal Network]
        RPN --> PROPOSAL[Proposals]
        CNN3 --> ROI3[ROI Align]
        PROPOSAL --> ROI3
        ROI3 --> HEAD3[Detection head]
    end
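The ROI Pooling step that distinguishes Fast R-CNN can be sketched in a few lines of NumPy. This is a simplified single-channel version with integer bin boundaries, not the bilinear ROI Align used by later models:

```python
import numpy as np

def roi_pool(feature, roi, output_size=2):
    """Max-pool the roi = (y0, x0, y1, x1) region of a 2-D feature map
    into an output_size x output_size grid (simplified ROI Pooling)."""
    y0, x0, y1, x1 = roi
    out = np.zeros((output_size, output_size))
    ys = np.linspace(y0, y1, output_size + 1).astype(int)
    xs = np.linspace(x0, x1, output_size + 1).astype(int)
    for i in range(output_size):
        for j in range(output_size):
            # guarantee each bin covers at least one cell
            cell = feature[ys[i]:max(ys[i + 1], ys[i] + 1),
                           xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max()
    return out

feat = np.arange(36, dtype=float).reshape(6, 6)
pooled = roi_pool(feat, (0, 0, 4, 4), output_size=2)
# each 2x2 bin keeps its maximum: [[7, 9], [19, 21]]
```

Because every ROI is pooled to the same fixed grid, proposals of any size can share one set of fully connected layers downstream.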

A Faster R-CNN Implementation

import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50
from torchvision.ops import nms

class FasterRCNN(nn.Module):
    """Faster R-CNN object detector (sketch)."""

    def __init__(self, num_classes=80):
        super().__init__()
        self.num_classes = num_classes

        # backbone
        self.backbone = resnet50(pretrained=True)
        self.backbone_out_channels = 2048

        # RPN
        self.rpn = RegionProposalNetwork(
            in_channels=self.backbone_out_channels,
            num_anchors=9
        )

        # ROI head
        self.roi_head = ROIHead(
            in_channels=self.backbone_out_channels,
            num_classes=num_classes
        )

    def forward(self, images, targets=None):
        # feature extraction
        features = self.backbone(images)

        # RPN generates candidate regions
        rpn_logits, rpn_boxes, proposals = self.rpn(features, targets)

        # ROI classification and regression
        if self.training:
            roi_boxes, labels, bbox_targets = self.roi_head(
                features, proposals, targets
            )
            return rpn_logits, rpn_boxes, roi_boxes, labels, bbox_targets
        else:
            class_logits, box_regression = self.roi_head(features, proposals)
            return class_logits, box_regression, proposals

class RegionProposalNetwork(nn.Module):
    """Region Proposal Network."""

    def __init__(self, in_channels, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, 3, padding=1)
        self.cls_logits = nn.Conv2d(512, num_anchors * 2, 1)  # foreground/background
        self.bbox_pred = nn.Conv2d(512, num_anchors * 4, 1)   # box offsets

    def forward(self, features, targets=None):
        x = F.relu(self.conv(features))

        # classification: does the anchor contain an object?
        logits = self.cls_logits(x)
        # regression: offsets relative to the anchor box
        bbox_reg = self.bbox_pred(x)

        # generate proposals
        proposals = self.generate_proposals(logits, bbox_reg)

        return logits, bbox_reg, proposals

    def generate_proposals(self, logits, bbox_reg):
        """Generate candidate regions."""
        # foreground probability per anchor
        scores = F.softmax(logits, dim=1)[:, 1]
        # NMS post-processing
        keep = nms(bbox_reg, scores, iou_threshold=0.7)
        return bbox_reg[keep]
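The `nms` call used by `generate_proposals` can be written out explicitly. Below is a minimal greedy NMS in NumPy, a sketch of the standard algorithm independent of torchvision:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-maximum suppression; boxes are (x0, y0, x1, y1)."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the top-scoring box against the remaining boxes
        x0 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y0 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x1 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y1 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        # discard boxes that overlap the kept one too much
        order = order[1:][iou <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
# the second box overlaps the first with IoU ~0.68 and is suppressed
assert nms(boxes, scores, iou_threshold=0.5) == [0, 2]
```

The same routine closes out nearly every detector in this article; only the IoU threshold and the score source change.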

One-Stage Detectors

YOLO vs. SSD

flowchart LR
    subgraph YOLO
        YIMG[Image] --> YGRID["S×S grid"]
        YGRID --> YFEAT[Feature extraction]
        YFEAT --> YPRED["Direct prediction: B×(4+1+C) per cell"]
    end
    subgraph SSD
        SIMG[Image] --> SFEAT[Multi-scale feature maps]
        SFEAT --> SCONV[Convolutional predictors]
        SCONV --> SPRED[Predictions over default boxes at each scale]
    end

A YOLOv3 Implementation

import torch.nn as nn

class YOLOv3(nn.Module):
    """YOLOv3 object detector (sketch)."""

    def __init__(self, num_classes=80, anchors=None):
        super().__init__()
        self.num_classes = num_classes
        self.anchors = anchors or self.default_anchors

        # backbone: Darknet-53
        self.backbone = Darknet53()

        # detection heads at three scales
        self.detect_small = YOLOLayer(self.anchors[0], num_classes)   # 13×13 grid (large objects)
        self.detect_medium = YOLOLayer(self.anchors[1], num_classes)  # 26×26 grid
        self.detect_large = YOLOLayer(self.anchors[2], num_classes)   # 52×52 grid (small objects)

    def forward(self, x):
        # feature extraction
        features = self.backbone(x)

        # detect at each scale
        out_small = self.detect_small(features[0])
        out_medium = self.detect_medium(features[1])
        out_large = self.detect_large(features[2])

        if self.training:
            return out_small, out_medium, out_large
        else:
            # merge the multi-scale predictions
            return self.concat_outputs(out_small, out_medium, out_large)

class YOLOLayer(nn.Module):
    """YOLO detection layer."""

    def __init__(self, anchors, num_classes):
        super().__init__()
        self.anchors = anchors
        self.num_classes = num_classes
        self.num_anchors = len(anchors)

        out_channels = self.num_anchors * (5 + num_classes)
        self.conv = nn.Conv2d(512, out_channels, 1)

    def forward(self, x):
        # x: [B, 512, S, S]
        prediction = self.conv(x)
        # to be reshaped as [B, num_anchors, S, S, 5 + num_classes],
        # where 5 = tx, ty, tw, th, objectness
        return prediction
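Turning the raw (tx, ty, tw, th) outputs into pixel boxes follows the YOLOv2/v3 parameterization: the center offsets pass through a sigmoid and are added to the grid-cell index, while the width and height scale an anchor prior exponentially. A minimal decoder (the helper name and the example numbers are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """YOLOv2/v3 box decoding: cell index (cx, cy), anchor prior (pw, ph)."""
    bx = (sigmoid(tx) + cx) * stride   # center x in pixels
    by = (sigmoid(ty) + cy) * stride   # center y in pixels
    bw = pw * np.exp(tw)               # width in pixels
    bh = ph * np.exp(th)               # height in pixels
    return bx, by, bw, bh

# zero offsets land the box at the cell center with the anchor's size
bx, by, bw, bh = decode_box(0, 0, 0, 0, cx=6, cy=6, pw=116, ph=90, stride=32)
# bx = by = (0.5 + 6) * 32 = 208; bw = 116, bh = 90
```

The sigmoid confines each center to its own cell, which is what makes the grid assignment stable during training.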

Performance Comparison

| Detector | mAP@0.5 | FPS | Strengths | Weaknesses |
|---|---|---|---|---|
| R-CNN | 66.0% | 0.5 | High accuracy | Extremely slow |
| Fast R-CNN | 70.0% | 2.0 | Shared features | Slow proposal step |
| Faster R-CNN | 78.8% | 5.0 | End-to-end | Still slow |
| YOLOv1 | 63.4% | 45 | Very fast | Lower accuracy |
| YOLOv3 | 78.6% | 50 | Fast and accurate | Weak on small objects |
| YOLOv4 | 83.0% | 40 | Real-time, high accuracy | Somewhat complex |
| RetinaNet | 80.5% | 12 | Balanced | Slower |
| SSD | 76.8% | 25 | Multi-scale | Weak on small objects |
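The mAP column averages per-class average precision, i.e. the area under each class's precision-recall curve. A compact AP computation in the interpolated-precision style used by Pascal VOC (a sketch; real evaluators also handle matching detections to ground truth at a given IoU):

```python
import numpy as np

def average_precision(recalls, precisions):
    """Area under the precision-recall curve with interpolated precision."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # sum rectangle areas wherever recall changes
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

ap = average_precision(np.array([0.5, 1.0]), np.array([1.0, 0.5]))
# rectangle areas: 0.5*1.0 + 0.5*0.5 = 0.75
```

mAP@0.5 in the table means detections count as true positives when their IoU with a ground-truth box exceeds 0.5, and the AP values are averaged over classes.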

Detection Pipeline Summary

flowchart TB
    START[Input image] --> RESIZE[Resize]
    RESIZE --> ENCODE[Encode / normalize]
    ENCODE --> FEATURE[Feature extraction]

    FEATURE --> BRANCH{Detection paradigm}
    BRANCH --> TWO[Two-Stage]
    BRANCH --> ONE[One-Stage]

    TWO --> RPN[Region proposals]
    RPN --> ROI[ROI Pooling]
    ROI --> CLS[Classification + regression]

    ONE --> GRID[Grid / anchors]
    GRID --> PRED[Direct prediction]

    CLS --> POST1[Post-processing]
    PRED --> POST2[Post-processing]

    POST1 --> NMS[Non-maximum suppression]
    POST2 --> NMS
    NMS --> OUTPUT[Detections]

Summary

mindmap
  root((Object Detection))
    Traditional methods
      Viola-Jones
      HOG+SVM
      DPM
    Two-Stage
      R-CNN
      SPP-Net
      Fast R-CNN
      Faster R-CNN
    One-Stage
      YOLO family
      SSD
      RetinaNet
    Anchor-Free
      CenterNet
      FCOS
      CornerNet
    Evaluation metrics
      mAP
      FPS
      IoU

Object detection has evolved from traditional methods to deep learning, moving from two-stage to one-stage architectures and from anchor-based to anchor-free designs along the way.
