Development of Convolutional Neural Networks (A Brief Introduction to the Development of CNNs)

Abstract: This article reviews the milestone developments of CNNs, covering LeNet, AlexNet, VGG, GoogLeNet, ResNet, and some of ResNet's variants.

Introduction

This article mainly follows a survey paper (see below) to summarize the history of CNN development.

Wang, Wei, et al. "Development of convolutional neural network and its application in image classification: a survey." Optical Engineering 58.4 (2019): 040901.

This article is organized into the following six parts:

  1. About the 2018 ACM A.M. Turing Award
  2. LeNet
  3. AlexNet
  4. VGG16/19
  5. GoogLeNet
  6. ResNet and its improvements

Overview

The performance of several classic CNNs in the ILSVRC competition is summarized below; the number of layers keeps increasing while the classification error keeps dropping. Each of these classic networks is described in detail later.

[Figure: ILSVRC results of classic CNNs]

Some background on ILSVRC (the ImageNet Large Scale Visual Recognition Challenge):

[Figure: introduction to the ILSVRC competition]

About the 2018 ACM A.M. Turing Award

This part begins with a brief introduction to the 2018 Turing Award, since LeNet and AlexNet, discussed later, are closely tied to the award recipients. (Fathers of the Deep Learning Revolution Receive ACM A.M. Turing Award: Bengio, Hinton and LeCun Ushered in Major Breakthroughs in Artificial Intelligence.)

[Figure: the 2018 ACM A.M. Turing Award recipients]

Geoffrey Hinton

  • Backpropagation: In a 1986 paper, “Learning Internal Representations by Error Propagation,” co-authored with David Rumelhart and Ronald Williams, Hinton demonstrated that the backpropagation algorithm allows neural networks to discover their own internal representations of data.
  • Boltzmann Machines: In 1983, with Terrence Sejnowski, Hinton invented Boltzmann Machines
  • Improvements to convolutional neural networks: In 2012, with his students, Alex Krizhevsky and Ilya Sutskever, Hinton improved convolutional neural networks using rectified linear neurons and dropout regularization. In the prominent ImageNet competition, Hinton and his students almost halved the error rate for object recognition and reshaped the computer vision field. (AlexNet)

Yann LeCun

  • Convolutional neural networks: In the 1980s, LeCun developed convolutional neural networks; in the late 1980s, while working at the University of Toronto and Bell Labs, he was the first to train a convolutional neural network system on images of handwritten digits.
  • Improving backpropagation algorithms: LeCun proposed an early version of the backpropagation algorithm (backprop), and gave a clean derivation of it based on variational principles.
  • Broadening the vision of neural networks: LeCun is also credited with developing a broader vision for neural networks as a computational model for a wide range of tasks, introducing in early work a number of concepts now fundamental in AI.

Yoshua Bengio

  • Probabilistic models of sequences: In the 1990s, Bengio combined neural networks with probabilistic models of sequences, such as hidden Markov models.
  • High-dimensional word embeddings and attention: In 2000, Bengio authored the landmark paper, “A Neural Probabilistic Language Model,” that introduced high-dimensional word embeddings as a representation of word meaning.
  • Generative adversarial networks: Since 2010, Bengio’s papers on generative deep learning, in particular the Generative Adversarial Networks (GANs) developed with Ian Goodfellow, have spawned a revolution in computer vision and computer graphics.

LeNet

LeNet was proposed by Yann LeCun in his 1998 paper. It is a classic CNN architecture built from convolution layers, pooling layers, and fully connected layers. (Some specific comments are listed below.)

Ref: LeCun et al., “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998).

[Figure: LeNet-5 architecture]
  • LeNet-5 is a classical CNN architecture. The combination of convolution layer, pooling layer, and fully connected layer still forms the basic components of modern deep CNNs (a minimal code sketch follows this list).
  • LeNet-5 has a groundbreaking significance for the development of deep CNNs.
  • Due to insufficient hardware computing and data, LeNet-5 did not attract enough attention after it was proposed.
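
To make the structure concrete, here is a minimal PyTorch sketch of a LeNet-5-style network (the layer sizes follow the original 32×32 grayscale input; module and variable names are my own):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Minimal LeNet-5-style network: conv -> pool -> conv -> pool -> three FC layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32x1 -> 28x28x6
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28x6 -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14x6 -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10x16 -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: one forward pass on a dummy 32x32 grayscale image
out = LeNet5()(torch.randn(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 10])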

AlexNet

AlexNet is a milestone in the development of deep CNNs and triggered a new wave of neural network research. (AlexNet achieved excellent results in the ILSVRC competition, setting off a new surge of interest in CNNs.)

Compared with LeNet-5, AlexNet introduces the following changes:

  1. ReLU activation function. ReLU can introduce both nonlinearity and sparsity into the network. Sparsity can activate neurons selectively or in a distributed manner. It can learn relatively sparse features and achieve automatic dissociation.
  2. Data augmentation. AlexNet uses label-preserving transformations to artificially enlarge the dataset. The forms of data augmentation include generating image translations, horizontal reflections, and altering the intensities of the RGB channels in training images.
  3. Dropout. Neurons can be discarded from the network according to a certain probability to reduce network model parameters and prevent overfitting.
  4. Training on two NVIDIA GTX 580 3GB GPUs. With the development of GPU parallel computing ability, this method speeds up network training.
  5. Local response normalization (LRN). (No longer common; the later VGG experiments found it ineffective.)
  6. Overlapping pooling. The pooling step size is smaller than the corresponding edge of pooling kernel. (Overlapping Pooling is the pooling with stride smaller than the kernel size while Non-Overlapping Pooling is the pooling with stride equal to or larger than the kernel size.)
  7. Ensemble of 7 CNNs, reducing the error rate from 18.2% to 15.4%. (A code sketch of several of these ideas follows this list.)
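
As a rough illustration of how ReLU, overlapping pooling, and dropout look in code, here is a hypothetical PyTorch fragment; only the first two convolution stages are shown, and the layer sizes are illustrative rather than the exact AlexNet configuration:

```python
import torch
import torch.nn as nn

# AlexNet-flavoured fragment illustrating ReLU, overlapping pooling and dropout.
net = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),   # 227x227x3 -> 55x55x96
    nn.ReLU(inplace=True),                        # ReLU nonlinearity instead of tanh/sigmoid
    nn.MaxPool2d(kernel_size=3, stride=2),        # overlapping pooling: stride (2) < kernel (3)
    nn.Conv2d(96, 256, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),        # 13x13x256 at this point
    nn.Flatten(),
    nn.Dropout(p=0.5),                            # randomly drop units during training
    nn.Linear(256 * 13 * 13, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 1000),
)

print(net(torch.randn(1, 3, 227, 227)).shape)     # torch.Size([1, 1000])
```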

The overall AlexNet architecture is shown below:

[Figure: AlexNet architecture]

Except for the three layers indicated by the arrows, the top and bottom halves (one per GPU) do not "interact" with each other. The figure below lists the detailed AlexNet parameters; the original image can be found on AlexNet's Baidu Baike page.

[Figure: detailed AlexNet layer parameters]

ZFNet

ZFNet was the winner of ILSVRC 2013. It makes some small modifications to AlexNet, mainly shrinking the convolution kernel of the first layer.

[Figure: ZFNet architecture]
  • The network has made minor improvements on AlexNet;
  • It changed the size of the convolution kernel in AlexNet's first layer from 11×11 to 7×7.
  • It changed the step size of the convolution kernel from 4 to 2.

Compared with a single AlexNet model, the ZFNet model reduces the top-5 error rate by 1.7%, which confirms the effectiveness of these changes.

VGG16/19

The following networks, VGG and GoogLeNet, push toward greater depth: VGG has up to 19 layers, while GoogLeNet has 22. VGG is named after the Visual Geometry Group at the University of Oxford, the group that proposed it.

VGG network structure

The overall structure of the VGG network is shown below; in fact it does not differ much from the original CNN layout:

[Figure: VGG network architecture]

Main improvements of VGG

The main contribution of VGG is a thorough evaluation of networks of increasing depth using an architecture with very small (3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16 to 19 weight layers. (In other words, VGG's main change is to shrink the convolution kernels in order to increase the network depth.) The original paper evaluates several different depths, which is how the final VGG16/19 configurations were obtained. The paper is:

Ref: K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Int. Conf. Learn. Represent. (2015).

Explanation: why smaller convolution filters work

VGG's main improvement is the use of very small (3 × 3) convolution filters; the following explains why this is feasible.

Note that the receptive field of two stacked 3 × 3 convolutions is equivalent to that of one 5 × 5 convolution, and the receptive field of three stacked 3 × 3 convolutions is equivalent to that of one 7 × 7 convolution, as illustrated below.

[Figure: receptive field of stacked 3 × 3 convolutions]

As the figure shows, after two 3 × 3 convolutions, a single output pixel sees a 5 × 5 region of the original input, just like a single 5 × 5 convolution. Stacking small filters in this way brings two benefits:

  • First, it contains three ReLU layers instead of one, making the decision function more discriminative.
  • Second, it reduces the number of parameters. For example, if the input and output both have C channels, three 3 × 3 convolution layers require 3 × (3 × 3 × C × C) = 27 × C × C parameters, while one 7 × 7 convolution layer requires 7 × 7 × C × C = 49 × C × C parameters (a quick check in code follows this list).
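
As a quick sanity check of these parameter counts, here is a tiny Python snippet (plain arithmetic, no framework needed):

```python
# Parameters of stacked 3x3 convolutions vs. one large convolution,
# assuming C input channels and C output channels and ignoring biases.
C = 256

three_3x3 = 3 * (3 * 3 * C * C)   # three stacked 3x3 layers: 27 * C^2
one_7x7   = 7 * 7 * C * C         # a single 7x7 layer:       49 * C^2

print(three_3x3, one_7x7)          # 1769472 3211264
print(three_3x3 / one_7x7)         # ~0.55, i.e. roughly 45% fewer parameters
```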

GoogLeNet/Inception v1 to v3

GoogLeNet overview (network structure)

GoogLeNet reaches 22 layers. Its two main contributions are:

  • GoogLeNet broadens the network structure and skillfully proposes the inception module. (The inception module has several versions, each introduced below; this structure makes the network wider.)
  • The network with the inception module allows the model to better describe the input data content while further increasing the depth and width of the network model. (At the same time, the inception module lets the network go deeper, which improves classification performance.)

[Figure: GoogLeNet architecture]

The discussion below has two parts: the overall changes in GoogLeNet, and the Inception modules v1 to v3.

Overall changes in GoogLeNet

Global Average Pooling

GoogLeNet removes the traditional fully connected layers at the end of the network and instead averages each feature map of the last convolutional layer (as shown below). The benefit is a large reduction in the number of parameters.

[Figure: replacing the fully connected layers with global average pooling]

We can see the parameter reduction as follows (a small code sketch follows this list):

  • Number of weights (connections) = 7×7×1024×1024 = 51.3M (with a fully connected layer)
  • In GoogLeNet, global average pooling is used near the end of the network, averaging each feature map from 7×7 down to 1×1. Number of weights = 0 (average pooling needs no parameters)
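
A minimal PyTorch sketch of this comparison, assuming a 7×7×1024 feature map (the head names and sizes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)             # feature map from the last conv layer

# FC head: flatten 7x7x1024 and connect to 1024 units -> 7*7*1024*1024 weights
fc_head = nn.Sequential(nn.Flatten(), nn.Linear(7 * 7 * 1024, 1024))

# GAP head: average each 7x7 map down to 1x1 -> no weights at all
gap = nn.AdaptiveAvgPool2d(1)
pooled = gap(x).flatten(1)                 # shape (1, 1024)

print(sum(p.numel() for p in fc_head.parameters()))  # 51381248, i.e. ~51.4M (weights + biases)
print(sum(p.numel() for p in gap.parameters()))      # 0
```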

Auxiliary Classifiers for Training

Looking at the overall GoogLeNet structure above, the network has three outputs. The intermediate softmax branches are auxiliary classifiers used only to assist training: because the network is deep, they help combat the vanishing gradient problem.

[Figure: auxiliary classifiers in GoogLeNet]

Their role is summarized below (a sketch of the combined loss follows this list):

  • The loss is added to the total loss, with weight 0.3.
  • The authors claim it can be used for combating the vanishing gradient problem, while also providing regularization.
  • And it is NOT used in testing or inference time.
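
A rough sketch of how the auxiliary losses could be combined during training (the 0.3 weight comes from the list above; the function and argument names are placeholders):

```python
import torch.nn.functional as F

def total_loss(main_logits, aux1_logits, aux2_logits, targets):
    """Training loss: main classifier plus two auxiliary classifiers, each weighted by 0.3.
    At inference time only main_logits would be used."""
    main = F.cross_entropy(main_logits, targets)
    aux1 = F.cross_entropy(aux1_logits, targets)
    aux2 = F.cross_entropy(aux2_logits, targets)
    return main + 0.3 * (aux1 + aux2)
```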

Inception v1 to v3

The Inception modules v1 to v3 are introduced below.

Inception v1

The inception module is used to widen the network. A naive inception module, however, requires too much computation, which led to the inception module with dimension reduction: it adds 1×1 convolutions to reduce the number of channels.

[Figure: naive inception module vs. inception module with dimension reduction]
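
For concreteness, here is a minimal PyTorch sketch of an inception module with dimension reduction; the four parallel branches and their channel counts are illustrative:

```python
import torch
import torch.nn as nn

class InceptionWithReduction(nn.Module):
    """Four parallel branches; 1x1 convolutions reduce channels before the 3x3/5x5 paths."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)                      # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1), nn.ReLU(),
                                nn.Conv2d(96, 128, 3, padding=1))           # 1x1 -> 3x3
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2))            # 1x1 -> 5x5
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))                    # pool -> 1x1

    def forward(self, x):
        # Concatenate the branch outputs along the channel dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

out = InceptionWithReduction(192)(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28])   (64 + 128 + 32 + 32 channels)
```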

Let us compare the amount of computation of the two kinds of inception module. First, the naive inception module:

[Figure: computation of the naive inception module]

Next, the inception module with dimension reduction:

[Figure: computation of the inception module with dimension reduction]

By adding 1×1 convolution layers (shown in the figure below), the memory used by the whole module is roughly halved.

[Figure: 1×1 convolution for channel reduction]

The main properties of the 1×1 convolution are summarized below (a worked example in code follows this list):

  • The spatial size of the feature maps stays the same.
  • It compresses the number of filters (channels), e.g., from 64 down to 32.
  • The amount of computation per step is reduced. For example, with 28×28 output maps, 192 input channels, and 256 output channels:
    • a direct 3×3 convolution costs (28×28×256)×(3×3×192) multiplications;
    • inserting a 1×1 convolution that first reduces the channels to 64 costs (28×28×64)×(1×1×192)+(28×28×256)×(3×3×64);
    • i.e., (28×28)×(3×3×192×256) > (28×28)×(192×64+3×3×64×256).
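
A quick check of these multiplication counts in Python (the 192/64/256 channel numbers are taken from the example above):

```python
# Multiplications for one 3x3 convolution path producing 28x28 output maps.
H = W = 28
C_in, C_mid, C_out = 192, 64, 256

direct  = (H * W * C_out) * (3 * 3 * C_in)       # single 3x3, 192 -> 256 channels
reduced = (H * W * C_mid) * (1 * 1 * C_in) \
        + (H * W * C_out) * (3 * 3 * C_mid)      # 1x1 down to 64, then 3x3 up to 256

print(f"{direct/1e6:.1f}M vs {reduced/1e6:.1f}M multiplications")  # 346.8M vs 125.2M
```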

Inception v2

Inception v2 introduces two main changes:

  • A batch normalization (BN) layer is added to normalize the output of each layer toward an N(0, 1) Gaussian distribution, so that the network converges faster and can be initialized more freely.
  • In the module, a stack of two 3 × 3 convolution kernels replaces the 5 × 5 convolution kernels of the inception v1 module, thus increasing the network depth. (A small sketch follows below.)

[Figure: inception v2 module]
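
A minimal sketch of both ideas together, replacing one 5×5 convolution by two 3×3 convolutions, each followed by batch normalization (the channel sizes here are illustrative):

```python
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    """3x3 convolution followed by batch normalization and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),   # normalize activations so training converges faster
        nn.ReLU(inplace=True),
    )

# Two stacked 3x3 conv-BN-ReLU blocks cover the same receptive field as one 5x5 convolution.
factorized_5x5 = nn.Sequential(conv_bn_relu(64, 96), conv_bn_relu(96, 96))
```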

Inception v3

Inception v3 has two main changes:

  • Spatial factorization into asymmetric convolutions. (As shown below, an n×n convolution is split into a 1×n convolution followed by an n×1 convolution, which further reduces parameters and deepens the network.)
  • The network width has increased, and the network input has changed from 224×224 to 299×299.

[Figure: inception v3 module with asymmetric convolutions]

To understand the asymmetric convolution:

[Figure: factorizing a 3×3 convolution into a 1×3 and a 3×1 convolution]
  • In the figure above, a 1×3 convolution followed by a 3×1 convolution covers the same receptive field as a single 3×3 convolution.
  • The number of parameters is reduced by 1/3, i.e., to 2/3 of the original (a one-line check follows this list).
  • If the input and output both have C channels, the 3×3 convolution needs 3×3×C×C parameters, while the factorized version needs (3×1×C×C)+(1×3×C×C).
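
A one-line check of that 2/3 ratio in Python:

```python
C = 64                                            # any channel count works; the ratio is independent of C
full       = 3 * 3 * C * C                        # one 3x3 convolution
factorized = (3 * 1 * C * C) + (1 * 3 * C * C)    # 3x1 followed by 1x3
print(factorized / full)                          # 0.666..., i.e. 2/3 of the parameters
```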

ResNet

The idea behind ResNet

VGG and GoogLeNet improved classification performance by increasing network depth, so can we simply keep making networks deeper? In practice, no.

[Figure: the degradation problem: a deeper plain network has higher training and test error]
  • A problem arises: the degradation problem. (From the error curves above, the deeper network actually performs worse.)
  • This cannot be interpreted as overfitting, since an overfit model would at least do better on the training set. The degradation problem shows that deep networks cannot be optimized easily and well.

So is there a way to increase network depth without running into this degradation?

ResNet's idea, or rather its hypothesis, is the following; it is realized through the residual learning block.

The authors argue that stacking layers shouldn't degrade the network performance, because we could simply stack identity mappings (layers that do nothing) on top of the current network, and the resulting architecture would perform the same.

This indicates that the deeper model should not produce a training error higher than its shallower counterparts.

ResNet is therefore built by stacking many residual blocks, as shown below. Image source: Review: ResNet — Winner of ILSVRC 2015 (Image Classification, Localization, Detection).

[Figure: ResNet built from stacked residual blocks]

ResNet achieved excellent results in the competitions of that year.

[Figure: ResNet competition results in 2015]

Understanding the residual block

The figure below shows a basic residual block; here is a brief explanation (a minimal code sketch follows the figure).

  • Suppose the mapping we originally want to learn is H(x).
  • With a residual block, H(x) = F(x) + x, so the layers only need to learn F(x) = H(x) − x. F(x) can be understood as the residual; in other words, the network learns the residual.
  • Intuitively, if the parameters of the residual block are all set to 0, making the network deeper does not change the result: the output is still x.

[Figure: a basic residual block]
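
A minimal PyTorch sketch of such a block (identity shortcut only; the channel count and the lack of downsampling are simplifying assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: output = ReLU(F(x) + x), where F is two 3x3 conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))   # F(x), first conv
        out = self.bn2(self.conv2(out))         # F(x), second conv
        return F.relu(out + x)                  # add the identity shortcut, then ReLU

y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```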

Some improvements to the residual block

Below is a brief list of improvements to the residual block; for some of them only a figure is shown, and the corresponding papers can be consulted for details.

Use Bottleneck

Like GoogLeNet, this design uses 1×1 convolution kernels, allowing the network to become deeper.

[Figure: bottleneck residual block]
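
A sketch of the bottleneck branch, assuming 256 input channels reduced to 64 for the 3×3 convolution (the exact numbers are illustrative; the identity shortcut from the basic block is still added around it):

```python
import torch.nn as nn

# Bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand.
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),              # reduce channels with a 1x1 convolution
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),    # cheap 3x3 on the reduced channels
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 256, kernel_size=1),              # expand back so the shortcut can be added
)
```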

Identity Mappings in Deep Residual Networks

[Figure: Identity Mappings in Deep Residual Networks]

Wide Residual Networks

[Figure: Wide Residual Networks]

ResNeXt

[Figure: ResNeXt]

Deep Networks with Stochastic Depth

During training, each block has a certain probability of being dropped, which feels somewhat like Dropout applied to whole blocks.

[Figure: Deep Networks with Stochastic Depth]
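
A rough sketch of the idea, randomly skipping a block's residual branch during training (this is a simplified version; `residual_fn` and `survival_prob` are placeholders):

```python
import torch

def stochastic_block(x, residual_fn, survival_prob=0.8, training=True):
    """Apply a residual block whose residual branch is randomly skipped during training."""
    if training:
        if torch.rand(1).item() < survival_prob:
            return x + residual_fn(x)    # block is kept
        return x                         # block is dropped: pure identity
    # At test time the residual branch is scaled by its survival probability.
    return x + survival_prob * residual_fn(x)
```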

Densely Connected CNN

The last ResNet-style improvement introduced here is DenseNet, whose main structure is shown below:

[Figure: DenseNet architecture]
  • Connects all layers directly with each other. (This also helps combat the vanishing gradient problem.)
  • In this novel architecture, the input of each layer consists of the feature maps of all earlier layers, and its output is passed to each subsequent layer.
  • The feature maps are aggregated with depth-concatenation.

In practice, the network is also organized into dense blocks, as shown below.

[Figure: dense blocks in DenseNet]
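
A minimal PyTorch sketch of a dense block with depth-wise concatenation (the growth rate and layer count are illustrative):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer takes all earlier feature maps as input, and its output is concatenated on."""
    def __init__(self, in_channels, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate, 3, padding=1),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Input of each layer = concatenation of all earlier feature maps.
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)

y = DenseBlock(64)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 192, 32, 32])   (64 + 4*32 channels)
```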

The following figure summarizes DenseNet.

[Figure: summary of DenseNet]

Summary

The classic CNNs above can be summarized with the following two figures.

As the networks evolved, their depth increased and their accuracy went up.

[Figure: depth and accuracy of classic CNNs over time]

Next, look at the relationship between model complexity and accuracy: VGG requires a large amount of computation overall, and later networks such as GoogLeNet made improvements aimed at reducing the number of parameters, as shown below.

[Figure: accuracy vs. computational complexity of classic CNNs]
王 茂南
