@haoqiang 2018-08-24T04:29:04.000000Z

Work Summary

MINIVISION -- HaoQiang

1. Outline

Watermarks Removal
- Background
- Model
- Model Compression
- Deployment
Super Resolution
- Background
- Models
- Model Optimization

Watermarks Removal

1. Background

For our face recognition task on ID card images, watermarks are the most challenging problem: they occlude parts of the face and degrade image quality.

Pipeline:

2. pix2pix

《Image-to-Image Translation with Conditional Adversarial Networks》

Model

Generator (UNet)

Discriminator

Loss Function

Content Loss (L1)

$loss_{L1}=\frac{1}{r^2WH}\sum_{x=1}^{rW}{\sum_{y=1}^{rH}{\left|(I_{real})_{x,y}-G(I_{water})_{x,y}\right|}}$

Adversarial Loss

$loss_{Adv}=\frac{1}{N}\sum_{n=1}^{N}{-\log{D(G(I_{water}))}}$

Combined Loss

$loss = 10 \times loss_{L1} + loss_{Adv}$
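As a rough illustration of how the three formulas fit together, the NumPy sketch below evaluates them on toy arrays. The function names, the array shapes, and the `eps` guard inside the logarithm are assumptions made for this example and are not taken from the original training code; the L1 term is computed as a plain mean over all pixels.

```python
import numpy as np

def l1_loss(real, fake):
    """Mean absolute pixel difference between the clean image and G(I_water)."""
    return float(np.mean(np.abs(real - fake)))

def adversarial_loss(d_fake, eps=1e-12):
    """Mean of -log D(G(I_water)) over the batch; d_fake holds probabilities."""
    return float(np.mean(-np.log(d_fake + eps)))

def generator_loss(real, fake, d_fake, l1_weight=10.0):
    """Weighted sum for the generator: 10 * loss_L1 + loss_Adv."""
    return l1_weight * l1_loss(real, fake) + adversarial_loss(d_fake)

# Toy data standing in for clean images, generator outputs and discriminator scores.
real = np.zeros((2, 4, 4, 3))
fake = np.full((2, 4, 4, 3), 0.1)
d_fake = np.array([0.5, 0.5])

print(generator_loss(real, fake, d_fake))
```

The weight of 10 on the L1 term keeps the output close to the clean image, while the adversarial term pushes it toward realistic texture.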

3. Model Compression

Reduce the number of convolution kernels
Replace concatenation (Concat) with element-wise addition (Add)
Use depthwise separable convolution

《MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications》

Computational cost of a standard convolution:

$D_K \times D_K \times N \times M \times D_F \times D_F$

Computational cost of a depthwise separable convolution:

$D_K \times D_K \times M \times D_F \times D_F + N \times M \times D_F \times D_F$

Here $D_K$ is the kernel size, $M$ the number of input channels, $N$ the number of output channels, and $D_F$ the spatial size of the feature map.

Reduction in computation:

$\frac{D_K \times D_K \times M \times D_F \times D_F + N \times M \times D_F \times D_F}{D_K \times D_K \times N \times M \times D_F \times D_F}=\frac{1}{N}+\frac{1}{D_K^2}$

With a 3 x 3 depthwise kernel ( $D_K=3$ ), a depthwise separable convolution uses 8 to 9 times less computation than a standard convolution.
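The ratio can be checked numerically. The short Python sketch below plugs a hypothetical layer (3 x 3 kernel, 128 input and 128 output channels, 64 x 64 feature map) into the two cost formulas; the layer dimensions are chosen only for illustration and do not describe the actual model.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution: D_K * D_K * M * N * D_F * D_F."""
    return d_k * d_k * m * n * d_f * d_f

def separable_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise

# Hypothetical layer dimensions, used only to illustrate the formula.
d_k, m, n, d_f = 3, 128, 128, 64

ratio = separable_conv_cost(d_k, m, n, d_f) / standard_conv_cost(d_k, m, n, d_f)
print("cost ratio:", ratio)
print("closed form 1/N + 1/D_K^2:", 1 / n + 1 / d_k ** 2)
print("speed-up factor:", 1 / ratio)
```

The speed-up approaches 9 as the number of output channels N grows, since the 1/N term vanishes and only 1/D_K^2 remains.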

Model	Model Size	Inference Time (CPU)
Model Before Compression	122.6 MB	112 ms
Model After Compression	0.63 MB	37 ms

Result

Deployment

Process
Train the model and save the weights --> Define the inference graph and load the weights --> Freeze the graph and export a '.pb' file --> Run inference with the OpenCV dnn module in C++

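// model: path to the frozen '.pb' file; img: input image as a cv::Mat
dnn::Net net = dnn::readNetFromTensorflow(model);
Mat inputBlob = dnn::blobFromImage(img);
net.setInput(inputBlob);  // the blob must be bound to the network before forward()
Mat output = net.forward("generator/tanh");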
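Because the output node is a tanh, the values returned by the forward pass lie in [-1, 1] and still have to be mapped back to 8-bit pixels. The NumPy sketch below shows that mapping on a dummy array. It assumes the blob uses the NCHW layout of OpenCV's dnn module and that training images were scaled linearly to [-1, 1]; neither is stated in these notes, and the function name is hypothetical. The same arithmetic carries over to C++.

```python
import numpy as np

def tanh_blob_to_image(blob):
    """Convert a (1, C, H, W) blob with values in [-1, 1] to an (H, W, C) uint8 image."""
    chw = blob[0]                          # drop the batch dimension
    hwc = np.transpose(chw, (1, 2, 0))     # CHW -> HWC
    scaled = (hwc + 1.0) * 127.5           # assumed mapping: [-1, 1] -> [0, 255]
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

# Dummy blob: first channel at -1, second at 0, third at +1.
blob = np.zeros((1, 3, 2, 2), dtype=np.float32)
blob[0, 0] = -1.0
blob[0, 2] = 1.0

image = tanh_blob_to_image(blob)
print(image.shape, image.dtype)
```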

Notice:

For operations along the channel dimension, set axis=3 explicitly rather than axis=-1 (for instance, tf.concat([feature1, feature2], axis=3)).
Remove Dropout layers and make sure BatchNorm layers are in inference mode.
Make sure the model graph has exactly one input and one output.

Super Resolution

1. Background

Images lose resolution and quality after compression encoding, so we need to recover clearer images to augment the data and improve model performance.

2. Models

pix2pix

Generator

Discriminator (Patch GAN)

SRGAN

《Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network》

《Is the deconvolution layer the same as a convolutional layer?》

tf.depth_to_space(x, block_size=2)
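To make the depth-to-space (sub-pixel) upsampling step concrete, here is a NumPy sketch that moves groups of channels into spatial positions for NHWC tensors with a block size of 2. It is intended to mirror what `tf.depth_to_space` does for the default NHWC layout, but it is an illustration written for these notes and not TensorFlow's implementation; the function name is hypothetical.

```python
import numpy as np

def depth_to_space_nhwc(x, block_size=2):
    """Rearrange (B, H, W, C * bs * bs) into (B, H * bs, W * bs, C)."""
    b, h, w, c = x.shape
    bs = block_size
    if c % (bs * bs) != 0:
        raise ValueError("channel count must be divisible by block_size ** 2")
    c_out = c // (bs * bs)
    # Split the channel axis into (row offset, column offset, output channel).
    x = x.reshape(b, h, w, bs, bs, c_out)
    # Interleave the offsets with the spatial axes: (B, H, bs, W, bs, C_out).
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, h * bs, w * bs, c_out)

# One pixel with four channels becomes a 2 x 2 patch with one channel.
x = np.arange(4).reshape(1, 1, 1, 4)
y = depth_to_space_nhwc(x, block_size=2)
print(y.shape)
print(y[0, :, :, 0])
```

The layer only rearranges values, so the convolution before it has to produce block_size^2 times as many channels as the upsampled output needs.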

Loss Function

Content Loss (perceptual)
《Perceptual Losses for Real-Time Style Transfer and Super-Resolution》

$loss_{Per} = \frac{1}{W_i H_i}\sum_{x=1}^{W_i}{\sum_{y=1}^{H_i}{(\phi_i(I_{HR})_{x,y}-\phi_i(G(I_{LR}))_{x,y})^2}}$

$\phi_i(I)$ is the feature map of VGG19 layer $i$.

Combined Loss

$loss = loss_{Per} + 10^{-3} \times loss_{Adv}$
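The following NumPy sketch shows how the perceptual and adversarial terms could be combined. In practice the feature maps would come from a VGG19 layer; here they are dummy arrays. Treating the squared difference at each position as a sum over channels, the `eps` guard in the logarithm, and all function names are assumptions made for this example.

```python
import numpy as np

def perceptual_loss(feat_hr, feat_sr):
    """Squared feature difference summed over the map, divided by W_i * H_i."""
    h, w = feat_hr.shape[:2]
    return float(np.sum((feat_hr - feat_sr) ** 2) / (w * h))

def adversarial_loss(d_fake, eps=1e-12):
    """Mean of -log D(G(I_LR)) over the batch."""
    return float(np.mean(-np.log(d_fake + eps)))

def srgan_generator_loss(feat_hr, feat_sr, d_fake, adv_weight=1e-3):
    """loss_Per + 10^-3 * loss_Adv."""
    return perceptual_loss(feat_hr, feat_sr) + adv_weight * adversarial_loss(d_fake)

# Dummy (H, W, C) feature maps standing in for VGG19 activations.
feat_hr = np.ones((4, 4, 8))
feat_sr = np.zeros((4, 4, 8))
d_fake = np.array([0.5])

print(srgan_generator_loss(feat_hr, feat_sr, d_fake))
```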
