@mShuaiZhao
2018-01-04T11:20:12.000000Z
CNN
2017.12
Case Studies
Get intuition for how to design effective networks from these cases.
LeNet-5(1998)
input 32x32x1
conv(5x5,s=1,no padding) 28x28x6
average pooling(f=2, s=2) 14x14x6
conv(5x5,s=1,no padding) 10x10x16
average pooling(f=2, s=2) 5x5x16
FC 120
FC 84
output
60K parameters
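A minimal PyTorch sketch of the LeNet-5 layout above (modernized: ReLU activations rather than the original sigmoid/tanh, and the output is assumed to be a 10-way classifier, which the notes leave unspecified):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1),   # 32x32x1 -> 28x28x6
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),       # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5, stride=1),   # -> 10x10x16
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),       # -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(5 * 5 * 16, 120),                  # FC 120
            nn.ReLU(),
            nn.Linear(120, 84),                          # FC 84
            nn.ReLU(),
            nn.Linear(84, num_classes),                  # output
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```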
AlexNet(2012)
input 227x227x3
conv(11x11,s=4,no padding) 55x55x96
max pooling(f=3,s=2) 27x27x96
conv(5x5, same padding) 27x27x256
max pooling(f=3,s=2) 13x13x256
conv(3x3, same padding) 13x13x384
conv(3x3, same padding) 13x13x384
conv(3x3, same padding) 13x13x256
max pooling(f=3,s=2) 6x6x256=9216
FC 4096
FC 4096
FC 1000
softmax
ReLU activation function
60M parameters
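A sketch of the same AlexNet dimensions in PyTorch; dropout and local response normalization from the original paper are omitted, and the softmax is assumed to be folded into the loss:

```python
import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),    # 227x227x3 -> 55x55x96
    nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),   # same padding -> 27x27x256
    nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 13x13x256
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x384
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x384
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),  # -> 13x13x256
    nn.MaxPool2d(kernel_size=3, stride=2),                     # -> 6x6x256 = 9216
    nn.Flatten(),
    nn.Linear(9216, 4096), nn.ReLU(),                          # FC 4096
    nn.Linear(4096, 4096), nn.ReLU(),                          # FC 4096
    nn.Linear(4096, 1000),                                     # FC 1000, softmax in the loss
)
```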
Local Response Normalization (this type of layer isn't really used much anymore)
Normalizes each position in the volume across its channels.
The idea is that you don't want too many neurons with a very high activation.
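For reference, PyTorch ships such a layer as nn.LocalResponseNorm; the hyperparameters below are the ones from the AlexNet paper, used here purely as an illustration:

```python
import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)  # AlexNet settings
x = torch.randn(1, 96, 55, 55)   # the 55x55x96 volume from AlexNet's first conv layer
print(lrn(x).shape)              # shape unchanged: torch.Size([1, 96, 55, 55])
```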
VGG-16
input 224x224x3
conv(3x3, same padding) x2 224x224x64
pool(f=2, s=2) 112x112x64
conv(3x3, same padding) x2 112x112x128
pool(f=2, s=2) 56x56x128
...
138M parameters
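A sketch of the repeating VGG-16 pattern: stacks of 3x3 same-padding convolutions followed by a 2x2 max pool, with the channel count doubling (64, 128, 256, 512) while the spatial size halves. The vgg_block helper is just for illustration:

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    # num_convs 3x3 "same" convolutions, then a 2x2 max pool that halves H and W
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1), nn.ReLU()]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

vgg16_features = nn.Sequential(
    vgg_block(3, 64, 2),     # 224x224x3 -> 112x112x64
    vgg_block(64, 128, 2),   # -> 56x56x128
    vgg_block(128, 256, 3),  # -> 28x28x256
    vgg_block(256, 512, 3),  # -> 14x14x512
    vgg_block(512, 512, 3),  # -> 7x7x512
)
```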
The main benefit of a very deep network is that it can represent very complex functions. It can also learn features at many different levels of abstraction, from edges (at the lower layers) to very complex features (at the deeper layers).
Very, very deep neural networks are difficult to train because of vanishing and exploding gradient types of problems.
Residual block
main path through this set of layers
shortcut / skip connection
the shortcut activation is added before the final ReLU
Residual Network
"plain network"
In reality, the training error of a plain network gets worse if you pick a network that's too deep. With ResNet, though, even as the number of layers gets deeper, the training error can keep going down, even when training a network with over a hundred layers.
By taking these intermediate activations a[l] and letting them feed much deeper into the network, ResNets really help with the vanishing and exploding gradient problems and allow you to train much deeper networks without an appreciable loss in performance. At some point the gains plateau and going even deeper doesn't help that much more, but ResNets are genuinely effective at training very deep networks.
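A minimal sketch of the identity residual block described above, where the shortcut activation is added to the main-path output before the final ReLU, i.e. a[l+2] = g(z[l+2] + a[l]); it assumes the input and output shapes match:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        shortcut = x                    # skip connection
        out = F.relu(self.conv1(x))     # main path
        out = self.conv2(out)
        return F.relu(out + shortcut)   # add before the final ReLU

print(ResidualBlock(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```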
1x1 convolution
with a single input channel it just multiplies each value by a number; with many channels it acts like a fully connected layer applied at every position
also called Network in Network
Using 1x1 convolution
a non-trivial operation
changes n_C (the number of channels) and adds a non-linearity
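A small illustration of a 1x1 convolution shrinking n_C from 192 to 32 while leaving height and width unchanged and adding a non-linearity (sizes chosen to match the bottleneck example later in these notes):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 192, 28, 28)              # a 28x28x192 volume
conv1x1 = nn.Conv2d(192, 32, kernel_size=1)  # 1x1x192 filters, 32 of them
y = torch.relu(conv1x1(x))
print(y.shape)                               # torch.Size([1, 32, 28, 28])
```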
[Szegedy et al., 2014. Going deeper with convolutions]
Motivation for the inception network
What the inception network, or an inception layer, says is: instead of choosing which filter size you want in a conv layer, or even whether you want a convolutional layer or a pooling layer, let's do them all.
The problem of computational cost
Direct 5x5 conv:
input : 28x28x192
conv : 5x5x192, 32 filters
output : 28x28x32
multiplications needed: 28x28x32 x 5x5x192 ≈ 120M

With a 1x1 conv first:
input : 28x28x192
1x1 conv : 1x1x192, 32 filters --> 28x28x32
5x5 conv : 5x5x32, 32 filters --> output 28x28x32
multiplications needed: 28x28x32x192 + 28x28x32x5x5x32 ≈ 25M
This significantly reduces the amount of computation.
In practice, shrinking the size of the volume this sharply turns out not to hurt the network's final performance.
The 1x1 conv layer is called the "bottleneck" layer.
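The multiplication counts for the two designs, computed with the filter counts written above (plain arithmetic, so the exact numbers depend on those choices):

```python
# multiplications = (output positions) x (multiplies per output value)
direct     = 28 * 28 * 32 * (5 * 5 * 192)            # single 5x5 conv: ~120.4M
bottleneck = 28 * 28 * 32 * 192 \
           + 28 * 28 * 32 * (5 * 5 * 32)             # 1x1 conv + 5x5 conv: ~24.9M
print(direct, bottleneck, direct / bottleneck)        # roughly a 5x reduction
```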
Inception module
GoogLeNet
We NEED TO GO DEEPER.
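A sketch of one inception module: 1x1, 3x3, 5x5 and pooling branches computed in parallel and concatenated along the channel dimension. The branch filter counts here are illustrative, not the exact ones from the GoogLeNet paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch3 = nn.Sequential(                      # 1x1 bottleneck, then 3x3
            nn.Conv2d(in_ch, 96, kernel_size=1), nn.ReLU(),
            nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(                      # 1x1 bottleneck, then 5x5
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(                  # max pool, then 1x1 to shrink channels
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1))

    def forward(self, x):
        outs = [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)]
        return torch.cat([F.relu(o) for o in outs], dim=1)  # concatenate along channels

print(InceptionModule(192)(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```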
Transfer learning: freeze some layers
if you only have a small training set
one approach: don't retrain the already-trained layers; train only the layers you add
if you have a large training set
you can freeze fewer layers
you can also take only the network architecture and retrain it from scratch
transfer learning is something to seriously consider unless you have a very large dataset to train everything from scratch
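A hedged sketch of the "freeze the pretrained layers, train only what you add" idea, using a torchvision ResNet-50 pretrained on ImageNet as an example; the number of classes is a made-up value:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)

for param in model.parameters():        # freeze all pretrained layers
    param.requires_grad = False

num_classes = 5                          # example value for your own small dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new, trainable output layer

# With a larger training set you would freeze fewer layers (or none)
# and fine-tune more of the network.
```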
common augmentation method
Mirroring
horizontal flip
Random Cropping
take random crops of the image
Rotation, shearing, local warping...
Color shifting
add or subtract values from each of the R, G, B channels
gives robustness to changes in lighting
Implementing distortions during training
a CPU thread loads the data and applies the distortions
another CPU/GPU process runs the training
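A sketch of these augmentations with torchvision transforms (mirroring, random cropping, color shifting); the parameter values are illustrative, and in practice the pipeline runs in the CPU data-loading workers while the GPU trains:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),                      # mirroring
    transforms.RandomResizedCrop(224),                      # random cropping
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4),                 # color shifting
    transforms.ToTensor(),
])
```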
Data vs. hand-engineering
little data (more hand-engineering)
object detection --> image recognition --> speech recognition (roughly from less to more available data)
lots of data (simpler algorithms, less hand-engineering)
two sources of knowledge:
labeled data (x, y)
hand-engineered features / network architecture / other components
When there isn't enough data, more time goes into hand-engineering and improving the network architecture.
Tips for doing well on benchmarks/winning competitions
Ensembling
Train several networks independently and average their outputs.
Might give a 1% or 2% improvement.
Increases the computational cost, so it's not very practical in production.
Multi-crop at test time
run classifier on multiple versions of test images and average results
10-crop
Take crops of the center and four corners of the image and of its mirrored copy, run them all through the network, and average the results.
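A sketch of 10-crop evaluation using torchvision's TenCrop (four corners plus center of the image and of its mirror image), averaging the classifier's outputs; the resize and crop sizes are example values:

```python
import torch
from torchvision import transforms

ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),   # shape: (10, 3, 224, 224)
])

def predict_10crop(model, pil_image):
    crops = ten_crop(pil_image)        # 10 crops of one image
    with torch.no_grad():
        logits = model(crops)          # (10, num_classes)
    return logits.mean(dim=0)          # average the results
```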
Use open source code
use architectures of networks published in the literature
use open source implementations if possible
use pretrained models and fine-tune on your dataset