@mShuaiZhao
2018-01-04T11:20:12.000000Z
CNN
2017.12
Case Studies
Get intuition for how to design effective networks from these cases.
LeNet-5(1998)
input 32x32x1
conv(5x5,s=1,no padding) 28x28x6
average pooling(f=2, s=2) 14x14x6
conv(5x5,s=1,no padding) 10x10x16
average pooling(f=2, s=2) 5x5x16
FC 120
FC 84
output
60K parameters
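A minimal PyTorch sketch of the LeNet-5 layout above (modernized: ReLU activations rather than the original sigmoid/tanh, and the output is assumed to be a 10-way classifier, which the notes leave unspecified):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1),   # 32x32x1 -> 28x28x6
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),       # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5, stride=1),   # -> 10x10x16
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),       # -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(5 * 5 * 16, 120),                  # FC 120
            nn.ReLU(),
            nn.Linear(120, 84),                          # FC 84
            nn.ReLU(),
            nn.Linear(84, num_classes),                  # output
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```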
AlexNet(2012)
input 227x227x3
conv(11x11,s=4,no padding) 55x55x96
max pooling(f=3,s=2) 27x27x96
conv(5x5, same padding) 27x27x256
max pooling(f=3,s=2) 13x13x256
conv(3x3, same padding) 13x13x384
conv(3x3, same padding) 13x13x384
conv(3x3, same padding) 13x13x256
max pooling(f=3,s=2) 6x6x256=9216
FC 4096
FC 4096
FC 1000
softmax
ReLU activation function
60M parameters
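A sketch of the same AlexNet dimensions in PyTorch; dropout and local response normalization from the original paper are omitted, and the softmax is assumed to be folded into the loss:

```python
import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),    # 227x227x3 -> 55x55x96
    nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),   # same padding -> 27x27x256
    nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 13x13x256
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x384
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x384
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x256
    nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 6x6x256 = 9216
    nn.Flatten(),
    nn.Linear(9216, 4096), nn.ReLU(),                          # FC 4096
    nn.Linear(4096, 4096), nn.ReLU(),                          # FC 4096
    nn.Linear(4096, 1000),                                     # FC 1000, softmax in the loss
)
```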
Local Response Normalization (this type of layer isn't really used much anymore)
Normalizes each position in the volume across its channels.
The idea is that you don't want too many neurons with a very high activation.
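For reference, PyTorch ships such a layer as nn.LocalResponseNorm; the hyperparameters below are the ones from the AlexNet paper, used here purely as an illustration:

```python
import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)  # AlexNet settings
x = torch.randn(1, 96, 55, 55)   # the 55x55x96 volume from AlexNet's first conv layer
print(lrn(x).shape)              # shape unchanged: torch.Size([1, 96, 55, 55])
```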
VGG-16
input 224x224x3
conv(3x3, same padding) x2 224x224x64
pool(f=2, s=2) 112x112x64
conv(3x3, same padding) x2 112x112x128
pool(f=2, s=2) 56x56x128
...
138M parameters
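A sketch of the repeating VGG-16 pattern: stacks of 3x3 same-padding convolutions followed by a 2x2 max pool, with the channel count doubling (64, 128, 256, 512) while the spatial size halves. The vgg_block helper is just for illustration:

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    # num_convs 3x3 "same" convolutions, then a 2x2 max pool that halves H and W
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1), nn.ReLU()]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

vgg16_features = nn.Sequential(
    vgg_block(3, 64, 2),     # 224x224x3 -> 112x112x64
    vgg_block(64, 128, 2),   # -> 56x56x128
    vgg_block(128, 256, 3),  # -> 28x28x256
    vgg_block(256, 512, 3),  # -> 14x14x512
    vgg_block(512, 512, 3),  # -> 7x7x512
)
```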
The main benefit of a very deep network is that it can represent very complex functions. It can also learn features at many different levels of abstraction, from edges (at the lower layers) to very complex features (at the deeper layers).
Very, very deep neural networks are difficult to train because of vanishing and exploding gradient types of problems.
Residual block
main path through this set of layers
shortcut / skip connection
the shortcut activation is added before the final ReLU
Residual Network
"plain network"
In reality, the training error of a plain network gets worse if you pick a network that's too deep. With ResNet, though, even as the number of layers gets deeper, the training error can keep going down, even when training a network with over a hundred layers.
By taking these intermediate activations a[l] and letting them feed much deeper into the network, ResNets really help with the vanishing and exploding gradient problems and allow you to train much deeper networks without an appreciable loss in performance. At some point the gains plateau and going even deeper doesn't help that much more, but ResNets are genuinely effective at training very deep networks.
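A minimal sketch of the identity residual block described above, where the shortcut activation is added to the main-path output before the final ReLU, i.e. a[l+2] = g(z[l+2] + a[l]); it assumes the input and output shapes match:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        shortcut = x                    # skip connection
        out = F.relu(self.conv1(x))     # main path
        out = self.conv2(out)
        return F.relu(out + shortcut)   # add before the final ReLU

print(ResidualBlock(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```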
1x1 convolution
with a single input channel it just multiplies each value by a number; with many channels it acts like a fully connected layer applied at every position
also called Network in Network
Using 1x1 convolution
a non-trivial operation
changes n_C (the number of channels) and adds a non-linearity
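A small illustration of a 1x1 convolution shrinking n_C from 192 to 32 while leaving height and width unchanged and adding a non-linearity (sizes chosen to match the bottleneck example later in these notes):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)              # a 28x28x192 volume
conv1x1 = nn.Conv2d(192, 32, kernel_size=1)  # 1x1x192 filters, 32 of them
y = torch.relu(conv1x1(x))
print(y.shape)                               # torch.Size([1, 32, 28, 28])
```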
[Szegedy et al., 2014. Going deeper with convolutions]
Motivation for the inception network
What the inception network, or an inception layer, says is: instead of choosing which filter size you want in a conv layer, or even whether you want a convolutional layer or a pooling layer, let's do them all.
The problem of computational cost
Direct 5x5 conv:
input : 28x28x192
conv : 5x5x192, 32 filters
output : 28x28x32
multiplications needed: 28x28x32 x 5x5x192 ≈ 120M

With a 1x1 conv first:
input : 28x28x192
1x1 conv : 1x1x192, 32 filters --> 28x28x32
5x5 conv : 5x5x32, 32 filters --> output 28x28x32
multiplications needed: 28x28x32x192 + 28x28x32x5x5x32 ≈ 25M
This significantly reduces the amount of computation.
In practice, shrinking the size of the volume this sharply turns out not to hurt the network's final performance.
The 1x1 conv layer is called the "bottleneck" layer.
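The multiplication counts for the two designs, computed with the filter counts written above (plain arithmetic, so the exact numbers depend on those choices):

```python
# multiplications = (output positions) x (multiplies per output value)
direct     = 28 * 28 * 32 * (5 * 5 * 192)            # single 5x5 conv: ~120.4M
bottleneck = 28 * 28 * 32 * 192 \
           + 28 * 28 * 32 * (5 * 5 * 32)             # 1x1 conv + 5x5 conv: ~24.9M
print(direct, bottleneck, direct / bottleneck)        # roughly a 5x reduction
```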
Inception module
GoogLeNet
We NEED TO GO DEEPER.
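A sketch of one inception module: 1x1, 3x3, 5x5 and pooling branches computed in parallel and concatenated along the channel dimension. The branch filter counts here are illustrative, not the exact ones from the GoogLeNet paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch3 = nn.Sequential(                      # 1x1 bottleneck, then 3x3
            nn.Conv2d(in_ch, 96, kernel_size=1), nn.ReLU(),
            nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(                      # 1x1 bottleneck, then 5x5
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(                  # max pool, then 1x1 to shrink channels
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1))

    def forward(self, x):
        outs = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat([F.relu(o) for o in outs], dim=1)  # concatenate along channels

print(InceptionModule(192)(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```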
Transfer learning: freeze some layers
if you only have a small training set
one approach: don't retrain the already-trained layers; train only the layers you add
if you have a large training set
you can freeze fewer layers
you can also take only the network architecture and retrain it from scratch
transfer learning is something to seriously consider unless you have a very large dataset to train everything from scratch
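A hedged sketch of the "freeze the pretrained layers, train only what you add" idea, using a torchvision ResNet-50 pretrained on ImageNet as an example; the number of classes is a made-up value:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)

for param in model.parameters():        # freeze all pretrained layers
    param.requires_grad = False

num_classes = 5                          # example value for your own small dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new, trainable output layer

# With a larger training set you would freeze fewer layers (or none)
# and fine-tune more of the network.
```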
common augmentation method
Mirroring
horizontal flip
Random Cropping
take random crops of the image
Rotation, shearing, local warping...
Color shifting
add or subtract values from each of the R, G, B channels
gives robustness to changes in lighting
Implementing distortions during training
a CPU thread loads the data and applies the distortions
another CPU/GPU process runs the training
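A sketch of these augmentations with torchvision transforms (mirroring, random cropping, color shifting); the parameter values are illustrative, and in practice the pipeline runs in the CPU data-loading workers while the GPU trains:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),                      # mirroring
    transforms.RandomResizedCrop(224),                      # random cropping
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4),                 # color shifting
    transforms.ToTensor(),
])
```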
Data vs. hand-engineering
little data (more hand-engineering)
object detection --> image recognition --> speech recognition (roughly from less to more available data)
lots of data (simpler algorithms, less hand-engineering)
two sources of knowledge:
labeled data (x, y)
hand-engineered features / network architecture / other components
When there isn't enough data, more time goes into hand-engineering and improving the network architecture.
Tips for doing well on benchmarks/winning competitions
Ensembling
Train several networks independently and average their outputs.
Might give a 1% or 2% improvement.
Increases the computational cost, so it's not very practical in production.
Multi-crop at test time
run classifier on multiple versions of test images and average results
10-crop
Take crops of the center and four corners of the image and of its mirrored copy, run them all through the network, and average the results.
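A sketch of 10-crop evaluation using torchvision's TenCrop (four corners plus center of the image and of its mirror image), averaging the classifier's outputs; the resize and crop sizes are example values:

```python
import torch
from torchvision import transforms

ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),   # shape: (10, 3, 224, 224)
])

def predict_10crop(model, pil_image):
    crops = ten_crop(pil_image)        # 10 crops of one image
    with torch.no_grad():
        logits = model(crops)          # (10, num_classes)
    return logits.mean(dim=0)          # average the results
```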
Use open source code
use architectures of networks published in the literature
use open source implementations if possible
use pretrained models and fine-tune on your dataset