@devilloser 2018-07-19T07:57:34.000000Z

CVPR2018 video action


A Closer Look at Spatiotemporal Convolutions for Action Recognition

Paper link:
A Closer Look at Spatiotemporal Convolutions for Action Recognition

Contributions

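1) Examines the roles of 3D and 2D convolutions within residual networks.
2) Proposes the R(2+1)D convolution: each N×t×d×d 3D convolution is replaced by an N×1×d×d 2D (spatial) convolution followed by an M×t×1×1 1D (temporal) convolution, where N is the number of input channels and M the number of intermediate channels.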
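The paper chooses the intermediate channel count M so that the (2+1)D block has roughly as many parameters as the 3D convolution it replaces. The sketch below does that bookkeeping. It assumes biases are ignored and introduces a separate output-channel count `n_out` for the temporal convolution. The function names are hypothetical, and this is parameter arithmetic only, not a network implementation.

```python
def conv3d_params(n_in: int, n_out: int, t: int, d: int) -> int:
    """Weights in n_out 3D filters of size n_in x t x d x d (biases ignored)."""
    return n_out * n_in * t * d * d


def r2plus1d_params(n_in: int, m: int, n_out: int, t: int, d: int) -> int:
    """Weights in a (2+1)D block: m spatial filters of size n_in x 1 x d x d,
    followed by n_out temporal filters of size m x t x 1 x 1 (biases ignored)."""
    spatial = m * n_in * d * d
    temporal = n_out * m * t
    return spatial + temporal


def r2plus1d_mid_channels(n_in: int, n_out: int, t: int, d: int) -> int:
    """Largest m whose (2+1)D block has no more weights than the 3D conv:
    floor(t * d^2 * n_in * n_out / (d^2 * n_in + t * n_out))."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


m = r2plus1d_mid_channels(64, 64, t=3, d=3)
print(m)  # 144
print(conv3d_params(64, 64, 3, 3), r2plus1d_params(64, m, 64, 3, 3))  # 110592 110592
```

Because the (2+1)D parameter count is linear in m, taking the floor of the ratio gives the largest M that does not exceed the 3D budget. The extra capacity comes from the added nonlinearity, not from extra weights.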

Advantages

1) It doubles the number of nonlinearities in the network, thanks to the additional ReLU between the 2D and 1D convolutions in each block.
2) Factorizing the 3D convolution into separate spatial and temporal components makes optimization easier.

MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition

Paper link:
MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition

Contributions

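Mixed 2D/3D Convolutional Tube (MiCT), which integrates 2D CNNs with 3D convolution modules to produce deeper, more informative feature maps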

Advantages

Efficient: reduces the training complexity of each round of spatiotemporal fusion

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

Paper link:
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition


Contributions

optical flow guided feature
A 1×1 conv can be used to reduce the number of channels.
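A 1×1 convolution reduces channels because it is a per-pixel linear map. The same c_out × c_in weight matrix is applied independently at every spatial position, so only the channel dimension changes. The sketch below assumes a feature map stored as nested lists in [channel][row][col] order. `conv1x1` is a hypothetical helper, not a library call.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # layout: [channel][row][col]


def conv1x1(x: Tensor3, weight: List[List[float]], bias: List[float]) -> Tensor3:
    """1x1 convolution: at every pixel, map the c_in input values to c_out outputs
    with the same (c_out x c_in) weight matrix. Spatial size is unchanged."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if any(len(row) != c_in for row in weight) or len(bias) != len(weight):
        raise ValueError("weight must be c_out x c_in and bias must have c_out entries")
    return [
        [[bias[o] + sum(w_row[c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
         for i in range(h)]
        for o, w_row in enumerate(weight)
    ]


# 4 input channels -> 2 output channels on a 2x2 map
x = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]
y = conv1x1(x, weight=[[0.25] * 4, [1.0, -1.0, 0.0, 0.0]], bias=[0.0, 0.0])
print(len(y), y[0][0][0], y[1][0][0])  # 2 1.0 0.0
```

The cost is c_out × c_in weights, independent of the spatial size. This makes the 1×1 conv a cheap way to shrink channels before a more expensive operation.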

Edge detection operators:

Sobel operator

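$G_x=\begin{bmatrix} 1& 0 &-1 \\ 2& 0& -2\\ 1& 0& -1 \end{bmatrix}*\text{feature map}$
$G_y=\begin{bmatrix} 1& 2 &1 \\ 0& 0&0\\ -1& -2& -1 \end{bmatrix}*\text{feature map}$
$G=\sqrt{{G_x}^2+{G_y}^2}$
Here $*$ denotes 2D convolution, and a position is treated as an edge when $G$ exceeds a chosen threshold.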
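The Sobel formulas above can be sketched as follows. The sketch makes several assumptions:

- The input is a single-channel image or feature map stored as nested lists.
- The convolution is a true convolution (kernel flipped), computed on the "valid" region only, with no padding.
- The threshold is a free parameter.

All function names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]

SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def conv2d_valid(img: Grid, k: Grid) -> Grid:
    """True 2D convolution (kernel flipped), 'valid' region only."""
    kh, kw = len(k), len(k[0])
    h, w = len(img), len(img[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    # flipping the kernel is what distinguishes convolution from correlation
                    s += k[kh - 1 - a][kw - 1 - b] * img[i + a][j + b]
            row.append(s)
        out.append(row)
    return out


def sobel_edges(img: Grid, threshold: float) -> List[List[bool]]:
    """Mark positions whose gradient magnitude G = sqrt(Gx^2 + Gy^2) exceeds threshold."""
    gx = conv2d_valid(img, SOBEL_X)
    gy = conv2d_valid(img, SOBEL_Y)
    return [[math.hypot(x, y) > threshold for x, y in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


img = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]  # vertical step edge
print(conv2d_valid(img, SOBEL_X)[0])    # [4.0, 4.0, 0.0]
print(sobel_edges(img, threshold=2.0))  # [[True, True, False], [True, True, False]]
```

With true convolution, the $G_x$ kernel above responds positively where intensity increases to the right. The 3-pixel-wide kernel flags both windows that straddle the step.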

Laplace operator

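$\begin{bmatrix} z_1& z_2 &z_3 \\ z_4& z_5& z_6\\ z_7& z_8& z_9 \end{bmatrix}$ (3×3 mask, $z_5$ at the center)
Empirical values: $z_5=4$, $z_2=z_4=z_6=z_8=-1$, and the corners $z_1=z_3=z_7=z_9=0$.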
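Below is a minimal sketch of the 4-neighbor Laplace mask (center 4, direct neighbors -1, corners 0) applied to a single-channel grid, assuming nested lists and no padding. Because the mask is symmetric, convolution and correlation coincide. Because its weights sum to zero, the response is zero on flat regions and on linear ramps, and non-zero where intensity curves, as around edges and isolated points. `laplace_response` is a hypothetical helper name.

```python
from typing import List

Grid = List[List[float]]

# 4-neighbor Laplace mask, laid out row by row: center 4, N/S/E/W -1, corners 0
LAPLACE = [[0, -1, 0],
           [-1, 4, -1],
           [0, -1, 0]]


def laplace_response(img: Grid) -> Grid:
    """Apply the symmetric 3x3 mask on the interior (no padding)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(1, h - 1):
        out.append([sum(LAPLACE[a][b] * img[i - 1 + a][j - 1 + b]
                        for a in range(3) for b in range(3))
                    for j in range(1, w - 1)])
    return out


ramp = [[float(j) for j in range(5)] for _ in range(4)]  # intensity grows linearly
spot = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(laplace_response(ramp))  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(laplace_response(spot))  # [[4.0]]
```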

Non-local Neural Networks

Paper link:
Non-local Neural Networks

Contributions

Applies the non-local means algorithm to convolutional networks.

Implementation

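Non-local means:
$y_i=\frac{1}{C(x)}\sum_{\forall j}f(x_i,x_j)g(x_j)$
where $i$ indexes an output position, $j$ enumerates all positions, $f$ measures the affinity between $x_i$ and $x_j$, and $g$ computes a representation of the input at position $j$.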
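Here is a minimal sketch of the generic non-local operation. It evaluates g at $x_j$ (the position being attended to) and normalizes by $C(x)=\sum_{\forall j}f(x_i,x_j)$, which is the normalization the paper uses for the Gaussian choice of f. The sketch makes three assumptions:

- All space-time positions are flattened into one list of feature vectors.
- g is the identity for illustration (the paper uses a learned linear embedding).
- f is the plain Gaussian $e^{x_i^Tx_j}$.

All names are hypothetical.

```python
import math
from typing import Callable, List

Vec = List[float]


def non_local(xs: List[Vec],
              f: Callable[[Vec, Vec], float],
              g: Callable[[Vec], Vec]) -> List[Vec]:
    """y_i = (1 / C(x)) * sum_j f(x_i, x_j) * g(x_j), with C(x) = sum_j f(x_i, x_j)."""
    gx = [g(x) for x in xs]          # representation of every position j
    dim = len(gx[0])
    ys = []
    for xi in xs:
        w = [f(xi, xj) for xj in xs]  # affinity of position i with every position j
        c = sum(w)                    # normalization factor C(x)
        ys.append([sum(wj * gj[k] for wj, gj in zip(w, gx)) / c for k in range(dim)])
    return ys


def gaussian(xi: Vec, xj: Vec) -> float:
    """f(x_i, x_j) = exp(x_i^T x_j)."""
    return math.exp(sum(a * b for a, b in zip(xi, xj)))


def identity(v: Vec) -> Vec:
    return list(v)


ys = non_local([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], gaussian, identity)
```

With this normalization, each $y_i$ is a weighted average of $g(x_j)$ over all positions. Identical inputs, such as positions 0 and 2 above, therefore receive identical outputs no matter how far apart they are.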

Gaussian

$f(x_i,x_j)=e^{x_i^Tx_j}$

embedded Gaussian

$f(x_i,x_j)=e^{\theta(x_i)^T\phi(x_j)}$
set $C(x)=\sum_{\forall j}f(x_i,x_j)$
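With the embedded Gaussian and $C(x)=\sum_{\forall j}f(x_i,x_j)$, the weights $\frac{1}{C(x)}f(x_i,x_j)$ for a fixed $i$ are exactly a softmax over $j$ of $\theta(x_i)^T\phi(x_j)$. This is why the paper relates this form to self-attention. The sketch below assumes θ and φ are linear maps given as weight matrices (the paper implements them as learned 1×1 convolutions). It computes only the normalized weights, using a max-subtracted softmax for numerical stability. The helper names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vec) -> Vec:
    mx = max(scores)                      # subtract the max to avoid overflow in exp
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embedded_gaussian_weights(xs: List[Vec], w_theta: Mat, w_phi: Mat) -> List[Vec]:
    """Row i holds f(x_i, x_j) / C(x) for every j, where
    f = exp(theta(x_i)^T phi(x_j)) and C(x) = sum_j f(x_i, x_j)."""
    thetas = [matvec(w_theta, x) for x in xs]
    phis = [matvec(w_phi, x) for x in xs]
    return [softmax([sum(a * b for a, b in zip(t, p)) for p in phis]) for t in thetas]


xs = [[1.0, 2.0], [0.5, -1.0], [0.0, 0.3]]
wt = [[1.0, 0.0], [0.0, 1.0]]
wp = [[0.5, 0.0], [0.0, 2.0]]
A = embedded_gaussian_weights(xs, wt, wp)
```

Multiplying the rows of `A` by the stacked $g(x_j)$ vectors gives $y_i$.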

dot product

$f(x_i,x_j)=\theta(x_i)^T\phi(x_j)$
set $C(x)=N$, where $N$ is the number of positions in $x$

Concatenation

$f(x_i,x_j)=ReLU(W_f^T[\theta(x_i),\phi(x_j)])$
set $C(x)=N$
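The concatenation form can be sketched as follows, assuming $\theta(x_i)$, $\phi(x_j)$ and $g(x_j)$ have already been computed and are passed in as lists. Here $w_f$ is a weight vector whose length equals the two embedding sizes combined. With $C(x)=N$, the weights are not rescaled to sum to one, and the ReLU can zero out a pair entirely. The function names are hypothetical.

```python
from typing import List

Vec = List[float]


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def concat_affinity(theta_i: Vec, phi_j: Vec, w_f: Vec) -> float:
    """f(x_i, x_j) = ReLU(w_f^T [theta(x_i), phi(x_j)])."""
    z = theta_i + phi_j  # list concatenation: [theta(x_i), phi(x_j)]
    if len(w_f) != len(z):
        raise ValueError("w_f must match the concatenated embedding length")
    return relu(sum(w * v for w, v in zip(w_f, z)))


def non_local_concat(thetas: List[Vec], phis: List[Vec], gs: List[Vec], w_f: Vec) -> List[Vec]:
    """y_i = (1 / N) * sum_j f(x_i, x_j) g(x_j), with N the number of positions."""
    n = len(gs)
    dim = len(gs[0])
    ys = []
    for t in thetas:
        w = [concat_affinity(t, p, w_f) for p in phis]
        ys.append([sum(wj * g[k] for wj, g in zip(w, gs)) / n for k in range(dim)])
    return ys


ys = non_local_concat([[1.0], [2.0]], [[1.0], [-3.0]], [[1.0], [10.0]], [1.0, 1.0])
print(ys)  # [[1.0], [1.5]]
```

In this toy input, the pairs involving position 1 get a negative pre-activation. The ReLU sets their weight to 0, so position 1's large $g$ value never reaches either output.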

Summary

Ways to extract temporal information:
non-local networks
3D convolutions
re-extracting information from the feature maps of well-established models
