@zhenni94
2015-09-09T15:15:23.000000Z
Installation on Windows: VS2013 + CUDA 7.0 + Python 2.7
Reference:
If using CPU only:
- Add -DCPU_ONLY -UUSE_CUDNN to Property -> C/C++ -> Command Line -> Additional Options.
- Add #define CPU_ONLY to caffe/util/device_alternate.hpp.
- Add return nullptr; in the functions gpu_data() and mutable_gpu_data() in SyncedMemory.cpp.
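For reference, a minimal sketch of the resulting SyncedMemory stub (assuming the upstream CPU-only pattern from device_alternate.hpp; mutable_gpu_data() is analogous):
const void* SyncedMemory::gpu_data() {
#ifndef CPU_ONLY
  to_gpu();
  return (const void*)gpu_ptr_;
#else
  NO_GPU;          // fatal log in CPU-only builds (see device_alternate.hpp)
  return nullptr;  // satisfies the compiler's return-path check
#endif
}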
Install Python to C:\Python27 ($Python27_ROOT). For the pycaffe project:
- Configuration Properties -> C/C++ -> General -> Additional Include Directories: add $Python27_ROOT\Lib\site-packages\numpy\core\include; $Python27_ROOT\include;
- Configuration Properties -> Linker -> General: add $Python27_ROOT\libs.
Add $Graphviz2.38_ROOT\bin to the PATH.
Install the Python packages listed in /python/requirements.txt (with easydict added): run python setup.py install, or go to $Python27_ROOT\Scripts and run pip install PACKAGE_NAME. lmdb may fail to install, which does not affect the project much. Copy cv2.pyd from $opencv_ROOT/build/python/2.7/x64 to $Python27_ROOT/lib/site-packages.
Build cython_bbox and cython_nms:
- Copy the caffe_windows_root/python directory to fast_rcnn_root/caffe-fast-rcnn.
- In fast_rcnn_root/lib/utils/nms.pyx, line 25: change np.int_t to np.intp_t.
- In fast_rcnn_root/lib/setup.py, lines 18 & 23: remove "-Wno-cpp" and "-Wno-unused-function"; leaving an empty [] is fine.
- cd to fast_rcnn_root/lib and run python setup.py install.
Solution
Step 1: Open the appropriate Visual C++ 2008 Command Prompt
Open the Start menu or Start screen, and search for "Visual C++ 2008 32-bit Command Prompt" (if your python is 32-bit) or "Visual C++ 2008 64-bit Command Prompt" (if your python is 64-bit). Run it. The command prompt should say Visual C++ 2008 ... in the title bar.
Step 2: Set environment variables
Set these environment variables in the command prompt you just opened.
SET DISTUTILS_USE_SDK=1
SET MSSdk=1
Reference http://bugs.python.org/issue23246
Step 3: Build and install
cd to the package you want to build, and run python setup.py build, then python setup.py install. If you want to install into a virtualenv, activate it before you build.
Copy cython_bbox.pyd and cython_nms.pyd from $Python27_ROOT/Lib/site-packages/utils to $fast_rcnn_ROOT/lib/utils.
In $fast_rcnn_ROOT/data/script: change wget to curl, open Git Bash, and run sh ./*.sh (choose the .sh file to run).
cd to $fast_rcnn_ROOT and run python tools/demo.py.
Modifications if using caffe-windows:
Modify caffe/proto/caffe.proto, then run extract_proto.bat and build:
//!!!!!Remove !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
// remove ROIPooling in V1Layer
// Message that stores parameters used by PythonLayer
message PythonParameter {
optional string module = 1;
optional string layer = 2;
// This value is set to the attribute `param_str_` of your custom
// `PythonLayer` object in Python before calling `setup()` method. This could
// be a number, a string, a dictionary in Python dict format or JSON etc. You
// may parse this string in `setup` method and use them in `forward` and
// `backward`.
//!!!!!Add !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
optional string param_str = 3 [default = ''];
}
Modify caffe/include/python_layer.h and caffe/src/caffe/layer_factory.cpp for python_layer.
In layers/roi_pooling_layer.cpp and layers/smooth_L1_loss_layer.cpp, add fast_rcnn_layers.h and remove the corresponding code from vision_layer.h and loss_layer.h; the files contain the REGISTER_LAYER calls for the two new layers.
In fast-rcnn-master\lib\datasets\__init__.py: change MATLAB = 'matlab' to MATLAB = 'matlab.exe'.
Reference: Run Fast R-CNN
Tutorial for Caffe: Caffe tutorial
Tutorial for Layers
The layer-related header files are:
common_layers.hpp
data_layers.hpp
layer.hpp
loss_layers.hpp
neuron_layers.hpp
vision_layers.hpp
Among these, layer.hpp defines the abstract base class; the other five headers all build on it through inheritance. The layer.hpp header includes these headers:
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/util/device_alternate.hpp"
In device_alternate.hpp, #ifdef CPU_ONLY defines macros that cancel the GPU calls:
#define STUB_GPU(classname)
#define STUB_GPU_FORWARD(classname, funcname)
#define STUB_GPU_BACKWARD(classname, funcname)
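Roughly what STUB_GPU expands to (a sketch following upstream device_alternate.hpp; the exact signatures vary across Caffe versions):
#define STUB_GPU(classname) \
template <typename Dtype> \
void classname<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom, \
    const vector<Blob<Dtype>*>& top) { NO_GPU; } \
template <typename Dtype> \
void classname<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top, \
    const vector<bool>& propagate_down, \
    const vector<Blob<Dtype>*>& bottom) { NO_GPU; }
NO_GPU simply logs a fatal error, so any GPU entry point aborts in a CPU-only build.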
A layer has three main members:
LayerParameter layer_param_; // the layer parameters stored in the protobuf file
vector<shared_ptr<Blob<Dtype>>> blobs_; // the layer's parameters (weights), used in the program
vector<bool> param_propagate_down_; // whether to compute the diff of each blob parameter, i.e., propagate the error
The Layer constructor explicit Layer(const LayerParameter& param) : layer_param_(param)
tries to read parameters from the protobuf file. ( The only thing we do is to copy blobs if there are any. )
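The constructor body is essentially (from layer.hpp, lightly trimmed):
explicit Layer(const LayerParameter& param) : layer_param_(param) {
  // The only thing we do is to copy blobs if there are any.
  if (layer_param_.blobs_size() > 0) {
    blobs_.resize(layer_param_.blobs_size());
    for (int i = 0; i < layer_param_.blobs_size(); ++i) {
      blobs_[i].reset(new Blob<Dtype>());
      blobs_[i]->FromProto(layer_param_.blobs(i));  // deserialize stored weights
    }
  }
}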
Its three main interfaces:
virtual void SetUp(const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top)
inline Dtype Forward(const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top);
inline void Backward(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>* bottom);
SetUp must be implemented for the actual parameter settings, initializing each kind of parameter. Forward and Backward implement the forward computation and the backward update; the input is always bottom and the output is top. Backward takes a propagate_down argument that indicates whether this Layer back-propagates the gradients.
The concrete implementations of Forward and Backward dispatch on Caffe::mode(), i.e., they compute on the cpu or the gpu. Both have corresponding interfaces Forward_cpu, Forward_gpu and Backward_cpu, Backward_gpu. These interfaces are all virtual, and the actual computation depends on the layer type. (Note: some layers have no GPU implementation, so the CPU computation is wrapped in as a fallback.) There is also a ToProto interface, which writes the Layer's parameters to a protocol buffer file.
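A sketch of the Caffe::mode() dispatch described above, in the old-API signatures used in this post:
template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  switch (Caffe::mode()) {
  case Caffe::CPU:
    return Forward_cpu(bottom, top);
  case Caffe::GPU:
    return Forward_gpu(bottom, top);  // may itself fall back to the CPU path
  default:
    LOG(FATAL) << "Unknown caffe mode.";
    return Dtype(0);
  }
}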
The data_layers.hpp header includes these headers:
#include "boost/scoped_ptr.hpp"
#include "hdf5.h"
#include "leveldb/db.h"
#include "lmdb.h"
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/filler.hpp"
#include "caffe/internal_thread.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
data_layer is the input layer for raw data and sits at the very bottom of the network. It can read data from the leveldb or lmdb databases, directly from memory, from hdf5, or even from raw images.
A brief introduction to these databases:
- LevelDB is a high-performance key/value store from Google. It is simple to call, and the data is compressed with Snappy, which reportedly improves efficiency by reducing disk I/O.
- LMDB (Lightning Memory-Mapped Database) is a key/value store similar to LevelDB but apparently performs even better; its homepage advertises "ultra-fast, ultra-compact".
- HDF (Hierarchical Data Format) is a file format and library designed for storing and processing large volumes of scientific data. The most popular version today is HDF5, whose files contain two basic kinds of data objects:
- group: like a folder; can contain multiple datasets or sub-groups;
- dataset: the actual data, which can be a multi-dimensional array or a more complex data type.
caffe/filler.hpp fills in the initial parameters at network initialization time, according to the layer definition. The code below is straightforward: it fills the parameters according to the type specified in FillerParameter.
// A function to get a specific filler from the specification given in
// FillerParameter. Ideally this would be replaced by a factory pattern,
// but we will leave it this way for now.
template <typename Dtype>
Filler<Dtype>* GetFiller(const FillerParameter& param) {
const std::string& type = param.type();
if (type == "constant") {
return new ConstantFiller<Dtype>(param);
} else if (type == "gaussian") {
return new GaussianFiller<Dtype>(param);
} else if (type == "positive_unitball") {
return new PositiveUnitballFiller<Dtype>(param);
} else if (type == "uniform") {
return new UniformFiller<Dtype>(param);
} else if (type == "xavier") {
return new XavierFiller<Dtype>(param);
} else {
CHECK(false) << "Unknown filler name: " << param.type();
}
return (Filler<Dtype>*)(NULL);
}
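A typical usage (a hedged sketch of what layers do in their SetUp, e.g., initializing a weight blob with Xavier filling):
FillerParameter filler_param;
filler_param.set_type("xavier");
shared_ptr<Filler<Dtype> > weight_filler(GetFiller<Dtype>(filler_param));
weight_filler->Fill(this->blobs_[0].get());  // writes into the blob's cpu_data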
internal_thread.hpp wraps the pthread functions; a subclass gets its own thread, whose main job is to fetch the next batch of data in the background while the current batch is being computed.
Once the data is in, computation follows: the common sigmoid, tanh, and so on. These computations are abstracted into the NeuronLayer class in neuron_layers.hpp. This layer only performs the concrete computation, so it explicitly defines ExactNumBottomBlobs() and ExactNumTopBlobs() to both return the constant 1: one input blob, one output blob.
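For example, a ReLU-style forward pass is a pure one-to-one mapping (a simplified sketch in the old-API style of this post; the real ReLULayer also supports a negative_slope parameter):
template <typename Dtype>
void ReLULayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const int count = bottom[0]->count();
  for (int i = 0; i < count; ++i) {
    top_data[i] = std::max(bottom_data[i], Dtype(0));  // y = max(x, 0)
  }
}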
NeuronLayer only handles simple one-to-one computation; the remaining, more complex computations all go into common_layers.hpp: ArgMaxLayer, ConcatLayer, FlattenLayer, SoftmaxLayer, SplitLayer, SliceLayer, and other operations that add to, remove from, or modify blobs.
The data_layer and common_layer above are intermediate computation layers. Although they take part in back-propagation, the propagation originates from loss_layer, the very end of the network. Because this layer computes the error, it takes 2 input blobs and produces 1 output blob.
vision_layer mainly covers image convolution operations: convolution, pooling, and LRN (Local Response Normalization) all live here. It also contains an im2col implementation, mainly to speed up convolution; see the Convolution Layer section below.
caffe_cpu_gemm: C ← αA × B + βC
// A: M*K; B: K*N; C : M*N
template <typename Dtype>
void caffe_cpu_gemm(const CBLAS_TRANSPOSE TransA,
const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
const Dtype alpha, const Dtype* A, const Dtype* B, const Dtype beta,
Dtype* C)
caffe_cpu_gemv: Y ← αAX + βY
// A: M*N; X: N*1; Y: M*1
void caffe_cpu_gemv<float>(const CBLAS_TRANSPOSE TransA, const int M,
const int N, const float alpha, const float* A, const float* x,
const float beta, float* y)
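A toy call to make the row-major layout concrete (values chosen here purely for illustration):
// C(2x2) = 1.0 * A(2x3) * B(3x2) + 0.0 * C
float A[6] = {1, 2, 3, 4, 5, 6};  // rows: [1 2 3], [4 5 6]
float B[6] = {1, 0, 0, 1, 1, 1};  // rows: [1 0], [0 1], [1 1]
float C[4];                       // overwritten, since beta == 0
caffe_cpu_gemm<float>(CblasNoTrans, CblasNoTrans, 2, 2, 3, 1.f, A, B, 0.f, C);
// C is now {4, 5, 10, 11}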
Reference for the BLAS functions: url
Tutorial: http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
Variables of the InnerProduct layer:
- M_: the number of samples
- K_: the feature length of a single sample
- N_: the number of output neurons
// y <- wx (y <- xw')
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, M_, N_, K_, (Dtype)1.,
bottom_data, weight, (Dtype)0., top_data);
if (bias_term_) {
// y <- y + b
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, 1, (Dtype)1.,
bias_multiplier_.cpu_data(),
this->blobs_[1]->cpu_data(), (Dtype)1., top_data);
}
Backward pass (update): gradients are computed for the weights, the bias, and the bottom data (bottom[0]->mutable_cpu_diff()):
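In matrix form (a summary inferred from the shapes above: top_diff is M_×N_, bottom_data is M_×K_, and W = blobs_[0] is N_×K_):
∂E/∂W ← top_diff′ × bottom_data (N_×K_)
∂E/∂b ← top_diff′ × bias_multiplier (N_×1)
∂E/∂bottom ← top_diff × W (M_×K_)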
// Gradient with respect to weight
caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_, (Dtype)1.,
top_diff, bottom_data, (Dtype)0., this->blobs_[0]->mutable_cpu_diff());
// Gradient with respect to bias
caffe_cpu_gemv<Dtype>(CblasTrans, M_, N_, (Dtype)1., top_diff,
bias_multiplier_.cpu_data(), (Dtype)0.,
this->blobs_[1]->mutable_cpu_diff());
// Gradient with respect to bottom data
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, K_, N_, (Dtype)1.,
top_diff, this->blobs_[0]->cpu_data(), (Dtype)0.,
bottom[0]->mutable_cpu_diff());
Tutorial: http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
im2col converts the image patches into a matrix (columns), so the computation becomes a multiplication of matrices.
template <typename Dtype>
void im2col_cpu(const Dtype* data_im, const int channels,
const int height, const int width, const int kernel_h, const int kernel_w,
const int pad_h, const int pad_w,
const int stride_h, const int stride_w,
Dtype* data_col) {
int height_col = (height + 2 * pad_h - kernel_h) / stride_h + 1;
int width_col = (width + 2 * pad_w - kernel_w) / stride_w + 1;
int channels_col = channels * kernel_h * kernel_w;
for (int c = 0; c < channels_col; ++c) {
int w_offset = c % kernel_w;
int h_offset = (c / kernel_w) % kernel_h;
int c_im = c / kernel_h / kernel_w;
for (int h = 0; h < height_col; ++h) {
for (int w = 0; w < width_col; ++w) {
int h_pad = h * stride_h - pad_h + h_offset;
int w_pad = w * stride_w - pad_w + w_offset;
if (h_pad >= 0 && h_pad < height && w_pad >= 0 && w_pad < width)
data_col[(c * height_col + h) * width_col + w] =
data_im[(c_im * height + h_pad) * width + w_pad];
else
data_col[(c * height_col + h) * width_col + w] = 0;
}
}
}
}
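A toy example of what im2col_cpu produces (values chosen for illustration): one 3×3 single-channel image, a 2×2 kernel, stride 1, no padding, so height_col = width_col = 2 and channels_col = 4.
float im[9] = {1, 2, 3,
               4, 5, 6,
               7, 8, 9};
float col[16];
im2col_cpu<float>(im, 1, 3, 3, 2, 2, 0, 0, 1, 1, col);
// Each row of col gathers one kernel offset over all 4 output positions:
// row 0 (offset 0,0): 1 2 4 5
// row 1 (offset 0,1): 2 3 5 6
// row 2 (offset 1,0): 4 5 7 8
// row 3 (offset 1,1): 5 6 8 9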
After applying the im2col function, the computation is similar to the InnerProduct layer:
for (int n = 0; n < this->num_; ++n) {
this->forward_cpu_gemm(bottom_data + bottom[i]->offset(n), weight,
top_data + top[i]->offset(n));
if (this->bias_term_) {
const Dtype* bias = this->blobs_[1]->cpu_data();
this->forward_cpu_bias(top_data + top[i]->offset(n), bias);
}
}
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
const Dtype* weights, Dtype* output, bool skip_im2col) {
const Dtype* col_buff = input;
if (!is_1x1_) {
if (!skip_im2col) {
conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
}
col_buff = col_buffer_.cpu_data();
}
for (int g = 0; g < group_; ++g) {
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ /
group_, conv_out_spatial_dim_, kernel_dim_ / group_,
(Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
(Dtype)0., output + output_offset_ * g);
}
}
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_bias(Dtype* output,
const Dtype* bias) {
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_,
height_out_ * width_out_, 1, (Dtype)1., bias, bias_multiplier_.cpu_data(),
(Dtype)1., output);
}
Backward pass:
// Bias gradient, if necessary.
if (this->bias_term_ && this->param_propagate_down_[1]) {
Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
for (int n = 0; n < this->num_; ++n) {
this->backward_cpu_bias(bias_diff, top_diff + top[i]->offset(n));
}
}
for (int n = 0; n < this->num_; ++n) {
// gradient w.r.t. weight. Note that we will accumulate diffs.
if (this->param_propagate_down_[0]) {
this->weight_cpu_gemm(bottom_data + bottom[i]->offset(n),
top_diff + top[i]->offset(n), weight_diff);
}
// gradient w.r.t. bottom data, if necessary.
if (propagate_down[i]) {
this->backward_cpu_gemm(top_diff + top[i]->offset(n), weight,
bottom_diff + bottom[i]->offset(n));
}
}
where the helper functions are:
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::backward_cpu_bias(Dtype* bias,
const Dtype* input) {
caffe_cpu_gemv<Dtype>(CblasNoTrans, num_output_, height_out_ * width_out_, 1.,
input, bias_multiplier_.cpu_data(), 1., bias);
}
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::weight_cpu_gemm(const Dtype* input,
const Dtype* output, Dtype* weights) {
const Dtype* col_buff = input;
if (!is_1x1_) {
conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
col_buff = col_buffer_.cpu_data();
}
for (int g = 0; g < group_; ++g) {
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, conv_out_channels_ / group_,
kernel_dim_ / group_, conv_out_spatial_dim_,
(Dtype)1., output + output_offset_ * g, col_buff + col_offset_ * g,
(Dtype)1., weights + weight_offset_ * g);
}
}
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::backward_cpu_gemm(const Dtype* output,
const Dtype* weights, Dtype* input) {
Dtype* col_buff = col_buffer_.mutable_cpu_data();
if (is_1x1_) {
col_buff = input;
}
for (int g = 0; g < group_; ++g) {
caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, kernel_dim_ / group_,
conv_out_spatial_dim_, conv_out_channels_ / group_,
(Dtype)1., weights + weight_offset_ * g, output + output_offset_ * g,
(Dtype)0., col_buff + col_offset_ * g);
}
if (!is_1x1_) {
conv_col2im_cpu(col_buff, input);
}
}
Differences between the deconvolution layer and the convolution layer:
Forward: change forward_cpu_gemm to backward_cpu_gemm.
Backward: change backward_cpu_gemm to forward_cpu_gemm.
this->backward_cpu_gemm(bottom_data + bottom[i]->offset(n), weight,
top_data + top[i]->offset(n));
this->forward_cpu_gemm(top_diff + top[i]->offset(n), weight,
bottom_diff + bottom[i]->offset(n),
this->param_propagate_down_[0]);
SoftmaxLayer (Multinomial Logistic Loss not included):
// Copyright 2013 Yangqing Jia
//
#include <algorithm>
#include <vector>
#include "caffe/layer.hpp"
#include "caffe/vision_layers.hpp"
#include "caffe/util/math_functions.hpp"
using std::max;
namespace caffe {
/**
* Set up the softmax layer.
*/
template <typename Dtype>
void SoftmaxLayer<Dtype>::SetUp(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top) {
CHECK_EQ(bottom.size(), 1) << "Softmax Layer takes a single blob as input.";
CHECK_EQ(top->size(), 1) << "Softmax Layer takes a single blob as output.";
// allocate space for the output
(*top)[0]->Reshape(bottom[0]->num(), bottom[0]->channels(),
bottom[0]->height(), bottom[0]->width());
// sum_multiplier_ is all 1s, used to help the computation; think of it as a row vector (a matrix with one row)
sum_multiplier_.Reshape(1, bottom[0]->channels(),
bottom[0]->height(), bottom[0]->width());
Dtype* multiplier_data = sum_multiplier_.mutable_cpu_data();
for (int i = 0; i < sum_multiplier_.count(); ++i) {
multiplier_data[i] = 1.;
}
// allocate the temporary variable scale_, of size num; think of it as a column vector
scale_.Reshape(bottom[0]->num(), 1, 1, 1);
}
template <typename Dtype>
void SoftmaxLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
vector<Blob<Dtype>*>* top) {
const Dtype* bottom_data = bottom[0]->cpu_data();
Dtype* top_data = (*top)[0]->mutable_cpu_data();
Dtype* scale_data = scale_.mutable_cpu_data();
// view the output as num slices with dim elements each
int num = bottom[0]->num();
int dim = bottom[0]->count() / bottom[0]->num();
memcpy(top_data, bottom_data, sizeof(Dtype) * bottom[0]->count());
// we need to subtract the max to avoid numerical issues, compute the exp,
// and then normalize.
// find the maximum of each slice
for (int i = 0; i < num; ++i) {
scale_data[i] = bottom_data[i*dim];
for (int j = 0; j < dim; ++j) {
scale_data[i] = max(scale_data[i], bottom_data[i * dim + j]);
}
}
// subtraction, computed via matrix multiplication: top_data has num slices, and each element subtracts its slice's maximum. Very clever.
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num, dim, 1, -1.,
scale_data, sum_multiplier_.cpu_data(), 1., top_data);
// C = alpha*op( A )*op( B ) + beta*C
// Perform exponentiation (elementwise exp)
caffe_exp<Dtype>(num * dim, top_data, top_data);
// sum after exp: each slice sums into scale_data
caffe_cpu_gemv<Dtype>(CblasNoTrans, num, dim, 1., top_data,
sum_multiplier_.cpu_data(), 0., scale_data);
// Do division: each slice is divided by its own sum
for (int i = 0; i < num; ++i) {
caffe_scal<Dtype>(dim, Dtype(1.) / scale_data[i], top_data + i * dim);
}
}
template <typename Dtype>
Dtype SoftmaxLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
const bool propagate_down,
vector<Blob<Dtype>*>* bottom) {
const Dtype* top_diff = top[0]->cpu_diff();
const Dtype* top_data = top[0]->cpu_data();
Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
Dtype* scale_data = scale_.mutable_cpu_data();
int num = top[0]->num();
int dim = top[0]->count() / top[0]->num();
memcpy(bottom_diff, top_diff, sizeof(Dtype) * top[0]->count());
// Compute inner1d(top_diff, top_data) and subtract them from the bottom diff
for (int i = 0; i < num; ++i) {
scale_data[i] = caffe_cpu_dot<Dtype>(dim, top_diff + i * dim,
top_data + i * dim);  // for each slice, the dot product of top_diff and top_data
}
// subtraction: each element of bottom_diff subtracts its slice's dot product
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num, dim, 1, -1.,
scale_data, sum_multiplier_.cpu_data(), 1., bottom_diff);
// elementwise multiplication
caffe_mul<Dtype>(top[0]->count(), bottom_diff, top_data, bottom_diff);
return Dtype(0);
}
INSTANTIATE_CLASS(SoftmaxLayer);
} // namespace caffe
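For reference, the math the two passes implement per slice of dim elements (a summary of the code above): the forward pass computes y_i ← exp(x_i − max_j x_j) / Σ_k exp(x_k − max_j x_j), and the backward pass computes ∂E/∂x_i ← y_i · (∂E/∂y_i − Σ_k (∂E/∂y_k) · y_k), which is exactly the dot-product subtraction followed by the elementwise multiplication.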
layer_factory: There are two ways to register a layer. Assuming that we have a layer like:
template <typename Dtype>
class MyAwesomeLayer : public Layer<Dtype> {
// your implementations
};
If its type is its C++ class name, but without the "Layer" at the end ("MyAwesomeLayer" -> "MyAwesome"), register it with:
REGISTER_LAYER_CLASS(MyAwesome);
Or, if the layer is to be created by another creator function:
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetMyAwesomeLayer(const LayerParameter& param) {
// your implementation
}
(for example, when your layer has multiple backends, see GetConvolutionLayer for a use case), then you can register the creator function instead, like REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer).
Note: Each layer type should only be registered once.
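A hedged end-to-end sketch of the creator route, modeled on the GetConvolutionLayer pattern (MyAwesomeLayer is the hypothetical class above):
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetMyAwesomeLayer(const LayerParameter& param) {
  // Inspect param here to pick between backends if the layer has several.
  return shared_ptr<Layer<Dtype> >(new MyAwesomeLayer<Dtype>(param));
}
REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer);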
Add the new layer's parameter message in ./src/caffe/proto/caffe.proto; declare the layer's class in ./include/caffe/***layers.hpp (where * means common_layers.hpp, data_layers.hpp, neuron_layers.hpp, vision_layers.hpp, loss_layers.hpp, etc.); create new .cpp and .cu files under ./src/caffe/layers/ and implement the class there; add test code for the layer in ./src/caffe/gtest/, testing the layer's forward and backward passes as well as its speed. Suppose the new layer is named NEW:
1. In src/proto, under LayerParameter's LayerType, add NEW = A_NUMBER;
2. In src/layer_factory.cpp, add case LayerParameter_LayerType_NEW: return new NewLayer<Dtype>(param);
3. Under src/layers/, add the new_layer.cpp and new_layer.cu code;
4. Add the declaration in include/caffe/vision_layers.hpp (or in another .hpp such as common_layer.hpp or neuron_layer.hpp, depending on the type of layer being added);
5. Add the corresponding registration code in upgrade_proto.cpp.
Add hole_h and hole_w.