An Introduction to ResNet and a PyTorch Implementation of ResNet

王茂南

2020年4月10日07:59:27

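Summary: An introduction to ResNet, followed by a from-scratch implementation in PyTorch. Most importantly, this post records the detailed steps of training with PyTorch, so the code here can serve as a reference for other training tasks later.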

Introduction

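We touched on ResNet earlier in Development of Convolutional Neural Network (a brief history of CNNs), but we did not go through its implementation in detail, nor did we run experiments with PyTorch ourselves. This post implements ResNet in PyTorch and runs a simple test on a cats-vs-dogs dataset. We used this dataset before in the post on storing image data as npy files, where the test used a simple CNN; this time we test with ResNet.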

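Although this post is nominally an introduction to ResNet, what I really want is to record the whole process of training with PyTorch: how the training loop is written, how validation is run on the validation set, how the best model is saved, and so on. It can serve as a reference whenever these pieces need to be reused.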

For the complete code, see the GitHub link:

Other references

A detailed PyTorch implementation of ResNet: Residual Networks: Implementing ResNet in Pytorch
An introduction to the principles behind ResNet: Deep Residual Network Architectural Design
Notes on the ResNet paper (highly recommended reading): ResNet Paper Notes

About ResNet

ResNet was proposed in 2015. It can be very deep without running into vanishing gradients. (ResNet can add many layers with strong performance, while previous architectures had a drop off in the effectiveness with each additional layer.)

ResNet addresses the vanishing-gradient problem with identity shortcut connections. That is, in ResNet the input to a block is carried forward unchanged and added to the output of the block's layers, so those layers only need to learn the residual.

Where the idea comes from

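Experiments show that accuracy can actually drop as a network gets deeper. So suppose there is some number of layers N that is optimal; every layer beyond N could then simply be an identity mapping.
However, it is hard for a network to learn, on its own, weights for those later layers that amount to an identity mapping.
This leads to the idea of residual learning.
- Suppose the mapping we originally wanted to learn is y=H(x)
- We now look for a new mapping F(x) such that y=F(x)+x, which gives H(x)=F(x)+x => F(x)=H(x)-x
- Building an identity mapping is now easy: set F(x) to 0 and we get y=x
Below is a diagram of a residual block.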
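
In code, the relation y=F(x)+x can be sketched roughly as follows. This is a minimal illustration rather than the implementation used later in this post: the class name `ResidualBlock`, the choice of two 3x3 convolutions for F, and the tensor sizes are all assumptions made for the example, and the ReLU that the original ResNet applies after the addition is left out so that the identity case is easy to see.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Hypothetical block computing y = F(x) + x with a shape-preserving F."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): the residual branch. padding=1 keeps height and width unchanged.
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity shortcut: add the input back onto the residual branch.
        return self.residual(x) + x


block = ResidualBlock(channels=8)
x = torch.randn(2, 8, 16, 16)

# Force F(x) = 0 by zeroing the scale of the last BatchNorm (its bias starts at 0).
# The whole block then collapses to the identity mapping y = x.
nn.init.zeros_(block.residual[-1].weight)
y = block(x)
print(torch.equal(y, x))
```

Zeroing a single scale parameter is enough to get the identity here, which is the point of the reformulation: the network only has to push F(x) toward 0 instead of fitting an identity with a stack of nonlinear layers.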

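In practice, however, the shortcut connection (the part that directly adds x) may need to multiply x by a matrix. If the output of F(x) has a different dimension from x, then x must be mapped to that dimension as well: the element-wise addition is performed on the two feature maps, channel by channel, so their dimensions must be the same.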

There are two ways to make x and F(x) have the same dimensions:

- A projection matrix
- Padding with extra zeros to increase the dimension (this adds no parameters)
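
The two options can be compared with a small sketch. The shapes below (64 to 128 channels, spatial size halved) and the helper name `zero_pad_shortcut` are assumptions chosen for illustration; a random tensor stands in for F(x), and the projection is written as a 1x1 convolution with stride 2, which is one common way to realize a projection matrix on feature maps.

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.randn(2, 64, 32, 32)    # input to the shortcut
fx = torch.randn(2, 128, 16, 16)  # stand-in for F(x): channels doubled, size halved

# Option 1: projection shortcut, a 1x1 convolution with stride 2 (adds parameters).
projection = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)
out_proj = fx + projection(x)


# Option 2: parameter-free shortcut (hypothetical helper):
# subsample spatially, then append all-zero channels.
def zero_pad_shortcut(x: torch.Tensor, out_channels: int, stride: int = 2) -> torch.Tensor:
    x = x[:, :, ::stride, ::stride]
    extra = out_channels - x.shape[1]
    # F.pad reads pairs from the last dimension backwards:
    # (W_left, W_right, H_top, H_bottom, C_front, C_back)
    return F.pad(x, (0, 0, 0, 0, 0, extra))


out_pad = fx + zero_pad_shortcut(x, 128)

print(out_proj.shape, out_pad.shape)
print(sum(p.numel() for p in projection.parameters()))  # weights added by option 1
```

Both variants produce a tensor that can be added to F(x); the difference is that the projection learns 64 * 128 weights, while zero padding learns nothing.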

Some special design choices in ResNet

Bottleneck Design

ResNet uses a bottleneck design: a 1x1 convolution first reduces the number of channels going in, and another 1x1 convolution increases it again at the output.

- Sandwich a 3x3 conv layer between two 1x1 conv layers
- Similar complexity
- Better representation
- The 1x1 convs reduce the tensor dimensionality for the 3x3 conv layer
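
The "similar complexity" point can be checked with a quick weight count. The sketch below assumes bias-free convolutions and uses the channel widths 256 and 64; the helper `conv_weights` and the two baselines it is compared against are choices made for this illustration.

```python
def conv_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Number of weights in a bias-free 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size


# Bottleneck on a 256-channel input: 1x1 (256->64), 3x3 (64->64), 1x1 (64->256).
bottleneck = (
    conv_weights(256, 64, 1) + conv_weights(64, 64, 3) + conv_weights(64, 256, 1)
)

# Baseline A: two 3x3 convolutions working on 64 channels.
basic_64 = 2 * conv_weights(64, 64, 3)

# Baseline B: two 3x3 convolutions working directly on 256 channels.
plain_256 = 2 * conv_weights(256, 256, 3)

print(bottleneck)  # 69632
print(basic_64)    # 73728
print(plain_256)   # 1179648
```

So the bottleneck handles a 256-channel input for roughly the cost of a two-layer block on 64 channels, and far less than running the 3x3 convolutions at full width.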

A detailed example follows.

Example: in the following figure, a 256-dimensional (256 channels) input is fed into a 1x1 convolution that maps it to 64 channels, then a 3x3 convolution that keeps it at 64 channels, and finally a 1x1 convolution that maps it back to the original dimensionality of 256 channels.
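
The same 256 -> 64 -> 64 -> 256 flow can be traced in PyTorch. This is a simplified sketch under assumptions: the class name `Bottleneck`, the 14x14 spatial size, and the omission of batch normalization are choices made here to keep the example short.

```python
import torch
from torch import nn


class Bottleneck(nn.Module):
    """Hypothetical bottleneck block: 1x1 reduce, 3x3, 1x1 expand, plus shortcut."""

    def __init__(self, channels: int = 256, reduced: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, kernel_size=1, bias=False)
        self.conv = nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False)
        self.expand = nn.Conv2d(reduced, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.reduce(x))  # 256 -> 64 channels
        out = self.relu(self.conv(out))  # 64 -> 64 channels, same spatial size
        out = self.expand(out)           # 64 -> 256 channels
        return self.relu(out + x)        # identity shortcut, then activation


block = Bottleneck()
x = torch.randn(1, 256, 14, 14)

# Trace the channel count through each stage of the residual branch.
h1 = block.reduce(x)
h2 = block.conv(h1)
h3 = block.expand(h2)
print(h1.shape[1], h2.shape[1], h3.shape[1])

y = block(x)
print(y.shape == x.shape)
```

Because the last 1x1 convolution restores 256 channels, the shortcut can add x directly without any projection.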