@zhenni94
2015-09-09T15:15:23.000000Z
Installation on Windows: VS2013 + CUDA 7.0 + Python 2.7
Reference:
If using CPU only:
- Add -DCPU_ONLY -UUSE_CUDNN to Property -> C/C++ -> Command Line -> Additional Options
- Add #define CPU_ONLY to caffe/util/device_alternate.hpp
- Make gpu_data() and mutable_gpu_data() in SyncedMemory.cpp return nullptr;

Python: C:\Python27 ($Python27_ROOT)
- pycaffe project: Configuration Properties -> C/C++ -> General -> Additional Include Directories: $Python27_ROOT\Lib\site-packages\numpy\core\include; $Python27_ROOT\include;
- Configuration Properties -> Linker -> General: $Python27_ROOT\libs
- $Graphviz2.38_ROOT\bin
- Python packages (those in /python/requirements.txt, with easydict added): python setup.py install, or run pip install PACKAGE_NAME from $Python27_ROOT\Scripts; lmdb may fail to install, which does not affect the project much
- Copy cv2.pyd from $opencv_ROOT\build\python\2.7\x64 to $Python27_ROOT\Lib\site-packages

cython_bbox and cython_nms:
- Copy the caffe_windows_root/python directory to fast_rcnn_root/caffe-fast-rcnn
- fast_rcnn_root/lib/utils/nms.pyx: line 25, change np.int_t to np.intp_t
- fast_rcnn_root/lib/setup.py: lines 18 & 23, remove "-Wno-cpp" and "-Wno-unused-function"; leaving the empty [] is fine
- cd fast_rcnn_root/lib and run python setup.py install

Solution
Step 1: Open the appropriate Visual C++ 2008 Command Prompt
Open the Start menu or Start screen, and search for "Visual C++ 2008 32-bit Command Prompt" (if your python is 32-bit) or "Visual C++ 2008 64-bit Command Prompt" (if your python is 64-bit). Run it. The command prompt should say Visual C++ 2008 ... in the title bar.
Step 2: Set environment variables
Set these environment variables in the command prompt you just opened.
SET DISTUTILS_USE_SDK=1
SET MSSdk=1
Reference http://bugs.python.org/issue23246
Step 3: Build and install
cd to the package you want to build, and run python setup.py build, then python setup.py install. If you want to install into a virtualenv, activate it before you build.
- Copy cython_bbox.pyd and cython_nms.pyd from $python27_ROOT/Lib/site-packages/utils to $fast_rcnn_ROOT/lib/utils
- In $fast_rcnn_ROOT/data/script: change wget to curl, then open Git Bash and run: sh ./*.sh
- cd to $fast_rcnn_ROOT and run python tools/demo.py (sh file to run)

Modifications if using caffe-windows
Modify caffe/proto/caffe.proto, run extract_proto.bat, and build:
//!!!!! Remove !!!!!
// remove ROIPooling in V1Layer

// Message that stores parameters used by PythonLayer
message PythonParameter {
  optional string module = 1;
  optional string layer = 2;
  // This value is set to the attribute `param_str_` of your custom
  // `PythonLayer` object in Python before calling the `setup()` method. This could
  // be a number, a string, a dictionary in Python dict format or JSON etc. You
  // may parse this string in `setup` method and use them in `forward` and
  // `backward`.
  //!!!!! Add !!!!!
  optional string param_str = 3 [default = ''];
}
Modify caffe/include/python_layer.h and caffe/src/caffe/layer_factory.cpp, the parts about python_layer.
- Add layers/roi_pooling_layer.cpp and layers/smooth_L1_loss_layer.cpp, add fast_rcnn_layers.h, and remove the corresponding code in vision_layers.hpp and loss_layers.hpp. fast_rcnn_layers.h contains the REGISTER_LAYER registrations for the two new layers.
- fast-rcnn-master\lib\datasets\__init__.py: change MATLAB = 'matlab' to MATLAB = 'matlab.exe'

Reference: Run Fast R-CNN
Tutorial for caffe : Caffe tutorial
Tutorial for Layers
The layer-related header files are:
- common_layers.hpp
- data_layers.hpp
- layer.hpp
- loss_layers.hpp
- neuron_layers.hpp
- vision_layers.hpp
Among these, layer.hpp is the abstract base class; the other five headers are all derived from it. The layer.hpp header itself includes these headers:
#include "caffe/blob.hpp"#include "caffe/common.hpp"#include "caffe/proto/caffe.pb.h"#include "caffe/util/device_alternate.hpp"
In device_alternate.hpp, #ifdef CPU_ONLY guards a set of macros that stub out the GPU calls:
#define STUB_GPU(classname)
#define STUB_GPU_FORWARD(classname, funcname)
#define STUB_GPU_BACKWARD(classname, funcname)
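For reference, in a CPU_ONLY build STUB_GPU expands roughly as follows (a sketch based on the macro's purpose; the exact signatures depend on the Caffe version), so that any GPU entry point simply aborts:

```cpp
// Rough sketch: the GPU entry points exist but abort via NO_GPU (a LOG(FATAL)).
#define STUB_GPU(classname) \
template <typename Dtype> \
void classname<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom, \
    vector<Blob<Dtype>*>* top) { NO_GPU; } \
template <typename Dtype> \
void classname<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top, \
    const vector<bool>& propagate_down, \
    vector<Blob<Dtype>*>* bottom) { NO_GPU; }
```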
A layer has three main members:
LayerParameter layer_param_;               // the layer parameters stored in the protobuf file
vector<shared_ptr<Blob<Dtype> > > blobs_;  // the layer's learnable parameters, the blobs actually used by the program
vector<bool> param_propagate_down_;        // whether to compute the diff of each parameter blob, i.e. whether to propagate the error
The constructor of the Layer class, explicit Layer(const LayerParameter& param) : layer_param_(param), tries to read parameters from the protobuf definition. (The only thing we do is to copy blobs if there are any.)
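Roughly, the body of that constructor looks like the following sketch (copying any blobs serialized in the LayerParameter into blobs_; minor details vary by Caffe version):

```cpp
explicit Layer(const LayerParameter& param) : layer_param_(param) {
  // The only thing we do is to copy blobs if there are any.
  if (layer_param_.blobs_size() > 0) {
    blobs_.resize(layer_param_.blobs_size());
    for (int i = 0; i < layer_param_.blobs_size(); ++i) {
      blobs_[i].reset(new Blob<Dtype>());
      blobs_[i]->FromProto(layer_param_.blobs(i));
    }
  }
}
```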
Its three main interfaces are:
virtual void SetUp(const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top);
inline Dtype Forward(const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top);
inline void Backward(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>* bottom);
SetUp must be implemented for the actual parameter settings and initializes the various kinds of parameters; Forward and Backward implement the forward computation and the backward update. The input is always bottom and the output is top; Backward additionally takes a propagate_down argument indicating whether this layer back-propagates gradients to its inputs.
Inside the concrete implementations of Forward and Backward, the work is dispatched according to Caffe::mode(), i.e. the computation runs on the CPU or the GPU. Both have corresponding interfaces: Forward_cpu / Forward_gpu and Backward_cpu / Backward_gpu. These interfaces are virtual, and the actual computation depends on the layer type (note: some layers have no GPU implementation, so the wrapper falls back to the CPU computation). In addition, a ToProto interface is implemented that writes the layer's parameters to a protocol buffer file.
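The dispatch looks roughly like this (a simplified sketch of the Forward wrapper in layer.hpp, using the old-style signatures shown above; the GPU case falls back to the CPU path when a layer has no GPU kernel):

```cpp
template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  switch (Caffe::mode()) {
  case Caffe::CPU:
    return Forward_cpu(bottom, top);
  case Caffe::GPU:
    return Forward_gpu(bottom, top);  // default implementation falls back to Forward_cpu
  default:
    LOG(FATAL) << "Unknown caffe mode.";
    return Dtype(0);
  }
}
```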
The data_layers.hpp header includes these headers:
#include "boost/scoped_ptr.hpp"#include "hdf5.h"#include "leveldb/db.h"#include "lmdb.h"#include "caffe/blob.hpp"#include "caffe/common.hpp"#include "caffe/filler.hpp"#include "caffe/internal_thread.hpp"#include "caffe/layer.hpp"#include "caffe/proto/caffe.pb.h"
data_layer is the input layer for raw data and sits at the very bottom of the network. It can read data from the leveldb or lmdb databases, directly from memory, from HDF5 files, or even from raw images.
A brief introduction to these storage backends:
- LevelDB is a high-performance key/value store developed by Google. It is simple to use, and the data is compressed with Snappy, which is said to be quite efficient and to reduce disk I/O.
- LMDB (Lightning Memory-Mapped Database) is a key/value store similar to LevelDB, but apparently performs even better; its homepage describes it as "ultra-fast, ultra-compact".
- HDF (Hierarchical Data Format) is a file format and accompanying library designed for storing and processing large volumes of scientific data. The most popular version today is HDF5, whose files contain two basic kinds of objects:
  - group: like a folder, it can contain multiple datasets or sub-groups;
  - dataset: the actual data, which can be a multidimensional array or a more complex data type.
caffe/filler.hpp is used when the network is initialized: it fills in the initial parameters according to the layer definition. The code below is straightforward; it creates the appropriate filler for the type specified in FillerParameter.
// A function to get a specific filler from the specification given in
// FillerParameter. Ideally this would be replaced by a factory pattern,
// but we will leave it this way for now.
template <typename Dtype>
Filler<Dtype>* GetFiller(const FillerParameter& param) {
  const std::string& type = param.type();
  if (type == "constant") {
    return new ConstantFiller<Dtype>(param);
  } else if (type == "gaussian") {
    return new GaussianFiller<Dtype>(param);
  } else if (type == "positive_unitball") {
    return new PositiveUnitballFiller<Dtype>(param);
  } else if (type == "uniform") {
    return new UniformFiller<Dtype>(param);
  } else if (type == "xavier") {
    return new XavierFiller<Dtype>(param);
  } else {
    CHECK(false) << "Unknown filler name: " << param.type();
  }
  return (Filler<Dtype>*)(NULL);
}
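For context, a layer typically uses GetFiller in its setup code to initialize its weight blob, roughly like this (a sketch following the pattern in InnerProductLayer; the exact parameter accessor depends on the layer and Caffe version):

```cpp
// Fill the weight blob according to the filler specified in the layer's prototxt.
shared_ptr<Filler<Dtype> > weight_filler(GetFiller<Dtype>(
    this->layer_param_.inner_product_param().weight_filler()));
weight_filler->Fill(this->blobs_[0].get());
```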
internal_thread.hpp wraps the pthread functions. A subclass that inherits from it gets its own thread, whose main job is to fetch the next batch of data in the background while the current batch is being processed.
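The usage pattern is roughly the following sketch (the InternalThreadEntry name follows the InternalThread class of this Caffe generation; treat the exact signatures as an assumption):

```cpp
// A prefetching data layer inherits from InternalThread and overrides the
// thread entry point, which fills a prefetch buffer in the background.
class PrefetchingDataLayer : public InternalThread {
 protected:
  virtual void InternalThreadEntry() {
    // Load the next batch from the database into the prefetch blob here,
    // while the main thread consumes the current batch.
  }
};
```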
Once the data is in, computation follows, e.g. the common sigmoid, tanh, etc. These operations are abstracted into the NeuronLayer class in neuron_layers.hpp. This layer type only performs element-wise computation, so it explicitly defines ExactNumBottomBlobs() and ExactNumTopBlobs() to return the constant 1: one input blob, one output blob.
NeuronLayer only handles simple one-to-one computations, while the remaining, more complex operations all live in common_layers.hpp: ArgMaxLayer, ConcatLayer, FlattenLayer, SoftmaxLayer, SplitLayer, SliceLayer and other operations that add, remove or rearrange blobs.
The data_layer and common_layer described above are intermediate computation layers. Although they take part in back-propagation, the propagated gradient originates from loss_layer, the very end of the network. Because a loss layer computes an error, its input is two blobs and its output is one blob.
vision_layer mainly contains the image convolution operations: convolution, pooling, LRN (Local Response Normalization) and so on. It also contains an im2col implementation, mainly used to speed up convolution; see the Convolution Layer discussion below.
caffe_cpu_gemm : C ← αA × B + βC
// A: M*K; B: K*N; C: M*N
template <typename Dtype>
void caffe_cpu_gemm(const CBLAS_TRANSPOSE TransA,
    const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
    const Dtype alpha, const Dtype* A, const Dtype* B, const Dtype beta,
    Dtype* C)
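A tiny usage example (the matrices here are made up for illustration): computing C = A × B for a 2×3 A and a 3×2 B:

```cpp
float A[6] = {1, 2, 3,
              4, 5, 6};   // 2x3
float B[6] = {1, 0,
              0, 1,
              1, 1};      // 3x2
float C[4];               // 2x2 result
// C = 1.0 * A * B + 0.0 * C  ->  C = {4, 5, 10, 11}
caffe_cpu_gemm<float>(CblasNoTrans, CblasNoTrans, 2, 2, 3, 1.f, A, B, 0.f, C);
```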
caffe_cpu_gemv: Y ← αAX + βY
// A: M*N; X: N*1; Y: M*1
void caffe_cpu_gemv<float>(const CBLAS_TRANSPOSE TransA, const int M,
    const int N, const float alpha, const float* A, const float* x,
    const float beta, float* y)
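And a correspondingly small example (made-up values) of the matrix-vector product y = A x:

```cpp
float A[6] = {1, 2, 3,
              4, 5, 6};   // 2x3
float x[3] = {1, 1, 1};
float y[2];
// y = 1.0 * A * x + 0.0 * y  ->  y = {6, 15}
caffe_cpu_gemv<float>(CblasNoTrans, 2, 3, 1.f, A, x, 0.f, y);
```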
Reference for the BLAS functions: url
Tutorial: http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
Variables (in the InnerProduct layer):
- M_: the number of samples (the batch size)
- K_: the feature length of a single sample
- N_: the number of output neurons
// y <- wx (y <- xw')
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, M_, N_, K_, (Dtype)1.,
    bottom_data, weight, (Dtype)0., top_data);
if (bias_term_) {
  // y <- y + b
  caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, 1, (Dtype)1.,
      bias_multiplier_.cpu_data(), this->blobs_[1]->cpu_data(), (Dtype)1., top_data);
}
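In terms of shapes, the two calls above compute (just restating the gemm arguments):

top (M_×N_) ← bottom_data (M_×K_) × weightᵀ (K_×N_)
top (M_×N_) ← top + bias_multiplier (M_×1) × bias (1×N_)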
Update (backward pass): the gradients are written to this->blobs_[0]->mutable_cpu_diff() (weights), this->blobs_[1]->mutable_cpu_diff() (bias) and bottom[0]->mutable_cpu_diff() (bottom data):
// Gradient with respect to weight
caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_, (Dtype)1.,
    top_diff, bottom_data, (Dtype)0., this->blobs_[0]->mutable_cpu_diff());
// Gradient with respect to bias
caffe_cpu_gemv<Dtype>(CblasTrans, M_, N_, (Dtype)1., top_diff,
    bias_multiplier_.cpu_data(), (Dtype)0., this->blobs_[1]->mutable_cpu_diff());
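In matrix form, the three gradients (the two above and the bottom-data gradient shown next) are:

weight_diff (N_×K_) ← top_diffᵀ (N_×M_) × bottom_data (M_×K_)
bias_diff (N_×1) ← top_diffᵀ (N_×M_) × bias_multiplier (M_×1)
bottom_diff (M_×K_) ← top_diff (M_×N_) × weight (N_×K_)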
// Gradient with respect to bottom data
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, K_, N_, (Dtype)1.,
    top_diff, this->blobs_[0]->cpu_data(), (Dtype)0.,
    bottom[0]->mutable_cpu_diff());

ReLULayer
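A NeuronLayer such as ReLU does purely element-wise work; a minimal sketch of its CPU forward and backward (ignoring the negative_slope option, loop bodies only):

```cpp
// Forward: top = max(bottom, 0), applied element-wise.
for (int i = 0; i < count; ++i) {
  top_data[i] = std::max(bottom_data[i], Dtype(0));
}
// Backward: the gradient only flows where the input was positive.
for (int i = 0; i < count; ++i) {
  bottom_diff[i] = top_diff[i] * (bottom_data[i] > 0);
}
```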
Tutorial: http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
im2col converts the image patches into a matrix of columns, so the convolution becomes a multiplication of matrices.
template <typename Dtype>
void im2col_cpu(const Dtype* data_im, const int channels,
    const int height, const int width, const int kernel_h, const int kernel_w,
    const int pad_h, const int pad_w,
    const int stride_h, const int stride_w,
    Dtype* data_col) {
  int height_col = (height + 2 * pad_h - kernel_h) / stride_h + 1;
  int width_col = (width + 2 * pad_w - kernel_w) / stride_w + 1;
  int channels_col = channels * kernel_h * kernel_w;
  for (int c = 0; c < channels_col; ++c) {
    int w_offset = c % kernel_w;
    int h_offset = (c / kernel_w) % kernel_h;
    int c_im = c / kernel_h / kernel_w;
    for (int h = 0; h < height_col; ++h) {
      for (int w = 0; w < width_col; ++w) {
        int h_pad = h * stride_h - pad_h + h_offset;
        int w_pad = w * stride_w - pad_w + w_offset;
        if (h_pad >= 0 && h_pad < height && w_pad >= 0 && w_pad < width)
          data_col[(c * height_col + h) * width_col + w] =
              data_im[(c_im * height + h_pad) * width + w_pad];
        else
          data_col[(c * height_col + h) * width_col + w] = 0;
      }
    }
  }
}
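A tiny made-up call for intuition: a single-channel 3×3 image with a 2×2 kernel, stride 1 and no padding yields a 2×2 output map, so data_col has channels·kernel_h·kernel_w = 4 rows of height_col·width_col = 4 values each:

```cpp
float data_im[9]  = {1, 2, 3,
                     4, 5, 6,
                     7, 8, 9};   // 1 channel, 3x3
float data_col[16];              // 4 rows (one per kernel offset) x 4 columns
im2col_cpu<float>(data_im, 1, 3, 3, 2, 2, 0, 0, 1, 1, data_col);
// Row 0 (kernel offset (0,0)): {1, 2, 4, 5}; row 3 (offset (1,1)): {5, 6, 8, 9}.
```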
After applying the im2col function, the computation is just like the InnerProduct layer:
for (int n = 0; n < this->num_; ++n) {
  this->forward_cpu_gemm(bottom_data + bottom[i]->offset(n), weight,
      top_data + top[i]->offset(n));
  if (this->bias_term_) {
    const Dtype* bias = this->blobs_[1]->cpu_data();
    this->forward_cpu_bias(top_data + top[i]->offset(n), bias);
  }
}
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
    const Dtype* weights, Dtype* output, bool skip_im2col) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    if (!skip_im2col) {
      conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    }
    col_buff = col_buffer_.cpu_data();
  }
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ /
        group_, conv_out_spatial_dim_, kernel_dim_ / group_,
        (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)0., output + output_offset_ * g);
  }
}
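Per group, the gemm above therefore computes (shapes read off the call's M, N, K arguments):

output (conv_out_channels_/group_ × conv_out_spatial_dim_) ← weights (conv_out_channels_/group_ × kernel_dim_/group_) × col_buff (kernel_dim_/group_ × conv_out_spatial_dim_)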
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_bias(Dtype* output,
    const Dtype* bias) {
  caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_,
      height_out_ * width_out_, 1, (Dtype)1., bias, bias_multiplier_.cpu_data(),
      (Dtype)1., output);
}
The backward pass of the convolution layer:
// Bias gradient, if necessary.
if (this->bias_term_ && this->param_propagate_down_[1]) {
  Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
  for (int n = 0; n < this->num_; ++n) {
    this->backward_cpu_bias(bias_diff, top_diff + top[i]->offset(n));
  }
}
for (int n = 0; n < this->num_; ++n) {
  // gradient w.r.t. weight. Note that we will accumulate diffs.
  if (this->param_propagate_down_[0]) {
    this->weight_cpu_gemm(bottom_data + bottom[i]->offset(n),
        top_diff + top[i]->offset(n), weight_diff);
  }
  // gradient w.r.t. bottom data, if necessary.
  if (propagate_down[i]) {
    this->backward_cpu_gemm(top_diff + top[i]->offset(n), weight,
        bottom_diff + bottom[i]->offset(n));
  }
}
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::backward_cpu_bias(Dtype* bias,
    const Dtype* input) {
  caffe_cpu_gemv<Dtype>(CblasNoTrans, num_output_, height_out_ * width_out_, 1.,
      input, bias_multiplier_.cpu_data(), 1., bias);
}
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::weight_cpu_gemm(const Dtype* input,
    const Dtype* output, Dtype* weights) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    col_buff = col_buffer_.cpu_data();
  }
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasTrans, conv_out_channels_ / group_,
        kernel_dim_ / group_, conv_out_spatial_dim_,
        (Dtype)1., output + output_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)1., weights + weight_offset_ * g);
  }
}
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::backward_cpu_gemm(const Dtype* output,
    const Dtype* weights, Dtype* input) {
  Dtype* col_buff = col_buffer_.mutable_cpu_data();
  if (is_1x1_) {
    col_buff = input;
  }
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, kernel_dim_ / group_,
        conv_out_spatial_dim_, conv_out_channels_ / group_,
        (Dtype)1., weights + weight_offset_ * g, output + output_offset_ * g,
        (Dtype)0., col_buff + col_offset_ * g);
  }
  if (!is_1x1_) {
    conv_col2im_cpu(col_buff, input);
  }
}
Differences between the deconvolution layer and the convolution layer:
Forward: change forward_cpu_gemm to backward_cpu_gemm
Backward: change backward_cpu_gemm to forward_cpu_gemm
this->backward_cpu_gemm(bottom_data + bottom[i]->offset(n), weight,
    top_data + top[i]->offset(n));
this->forward_cpu_gemm(top_diff + top[i]->offset(n), weight,
    bottom_diff + bottom[i]->offset(n),
    this->param_propagate_down_[0]);
The multinomial logistic loss is not included here (this is the plain SoftmaxLayer):
// Copyright 2013 Yangqing Jia
//
#include <algorithm>
#include <vector>

#include "caffe/layer.hpp"
#include "caffe/vision_layers.hpp"
#include "caffe/util/math_functions.hpp"

using std::max;

namespace caffe {

/**
 * Set up the softmax layer
 */
template <typename Dtype>
void SoftmaxLayer<Dtype>::SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top) {
  CHECK_EQ(bottom.size(), 1) << "Softmax Layer takes a single blob as input.";
  CHECK_EQ(top->size(), 1) << "Softmax Layer takes a single blob as output.";
  // Allocate space for the output
  (*top)[0]->Reshape(bottom[0]->num(), bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  // sum_multiplier_ is all ones; it is a helper for the computation and can be
  // viewed as a row vector (a matrix with a single row)
  sum_multiplier_.Reshape(1, bottom[0]->channels(),
      bottom[0]->height(), bottom[0]->width());
  Dtype* multiplier_data = sum_multiplier_.mutable_cpu_data();
  for (int i = 0; i < sum_multiplier_.count(); ++i) {
    multiplier_data[i] = 1.;
  }
  // Allocate space for the temporary variable scale_, of size num;
  // it can be viewed as a column vector
  scale_.Reshape(bottom[0]->num(), 1, 1, 1);
}

template <typename Dtype>
void SoftmaxLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  Dtype* scale_data = scale_.mutable_cpu_data();
  // View the output as num rows, each with dim elements
  int num = bottom[0]->num();
  int dim = bottom[0]->count() / bottom[0]->num();
  memcpy(top_data, bottom_data, sizeof(Dtype) * bottom[0]->count());
  // we need to subtract the max to avoid numerical issues, compute the exp,
  // and then normalize.
  // Find the maximum of each row
  for (int i = 0; i < num; ++i) {
    scale_data[i] = bottom_data[i*dim];
    for (int j = 0; j < dim; ++j) {
      scale_data[i] = max(scale_data[i], bottom_data[i * dim + j]);
    }
  }
  // Subtraction, done via a matrix product: each of the num rows of top_data
  // has its own maximum subtracted from every element. Very elegant.
  caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num, dim, 1, -1.,
      scale_data, sum_multiplier_.cpu_data(), 1., top_data);
  // C = alpha*op( A )*op( B ) + beta*C
  // Perform exponentiation (element-wise exp)
  caffe_exp<Dtype>(num * dim, top_data, top_data);
  // Sum after exp: the sum of each row goes into scale_data
  caffe_cpu_gemv<Dtype>(CblasNoTrans, num, dim, 1., top_data,
      sum_multiplier_.cpu_data(), 0., scale_data);
  // Do division: divide each row by its own sum
  for (int i = 0; i < num; ++i) {
    caffe_scal<Dtype>(dim, Dtype(1.) / scale_data[i], top_data + i * dim);
  }
}

template <typename Dtype>
Dtype SoftmaxLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const bool propagate_down, vector<Blob<Dtype>*>* bottom) {
  const Dtype* top_diff = top[0]->cpu_diff();
  const Dtype* top_data = top[0]->cpu_data();
  Dtype* bottom_diff = (*bottom)[0]->mutable_cpu_diff();
  Dtype* scale_data = scale_.mutable_cpu_data();
  int num = top[0]->num();
  int dim = top[0]->count() / top[0]->num();
  memcpy(bottom_diff, top_diff, sizeof(Dtype) * top[0]->count());
  // Compute inner1d(top_diff, top_data) and subtract them from the bottom diff
  for (int i = 0; i < num; ++i) {
    // For each row, the dot product of top_diff and top_data
    scale_data[i] = caffe_cpu_dot<Dtype>(dim, top_diff + i * dim,
        top_data + i * dim);
  }
  // Subtraction: from each row of bottom_diff, subtract that row's dot product
  caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num, dim, 1, -1.,
      scale_data, sum_multiplier_.cpu_data(), 1., bottom_diff);
  // Element-wise multiplication
  caffe_mul<Dtype>(top[0]->count(), bottom_diff, top_data, bottom_diff);
  return Dtype(0);
}

INSTANTIATE_CLASS(SoftmaxLayer);

}  // namespace caffe
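As a worked summary of Backward_cpu above: for each sample, the code computes

bottom_diff_j = top_data_j × (top_diff_j − Σ_k top_diff_k × top_data_k)

which is exactly the Jacobian-vector product of the softmax (the dot product is subtracted via the gemm broadcast, then the element-wise multiplication applies top_data).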
(layer_factory) There are two ways to register a layer. Assuming that we have a layer like
template <typename Dtype>
class MyAwesomeLayer : public Layer<Dtype> {
  // your implementations
};
and its type is its C++ class name, but without the "Layer" at the end ("MyAwesomeLayer" -> "MyAwesome"). If the layer is going to be created simply by its constructor, register it with:
REGISTER_LAYER_CLASS(MyAwesome);
Otherwise, if the layer is going to be created by another creator function, in the form

template <typename Dtype>
Layer<Dtype>* GetMyAwesomeLayer(const LayerParameter& param) {
  // your implementation
}
(for example, when your layer has multiple backends, see GetConvolutionLayer for a use case), then you can register the creator function instead, like REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer)
Note: Each layer type should only be registered once.
To add a new layer:
- In ./src/caffe/proto/caffe.proto, add the parameter message for the layer.
- In ./include/caffe/***layers.hpp, declare the layer class; * stands for one of common_layers.hpp, data_layers.hpp, neuron_layers.hpp, vision_layers.hpp, loss_layers.hpp, etc.
- Under ./src/caffe/layers/, create new .cpp and .cu files implementing the class.
- In ./src/caffe/gtest/, add test code for the layer, testing its forward and backward passes as well as its speed.

Assume the newly added layer is named NEW:
1. In the LayerType enum of LayerParameter in src/proto (caffe.proto), add NEW = A_NUMBER;
2. In src/layer_factory.cpp, add case LayerParameter_LayerType_NEW: return new NewLayer<Dtype>(param);
3. Under src/layers/, add the new_layer.cpp and new_layer.cu code (a minimal class skeleton is sketched after this list);
4. Add the class declaration in include/caffe/vision_layers.hpp (or possibly another .hpp such as common_layer.hpp or neuron_layer.hpp, depending on the type of the new layer);
5. Add the corresponding registration code in upgrade_proto.cpp.
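As referenced in step 3, a minimal declaration for the hypothetical NEW layer might look like this (purely illustrative names; the signatures follow the old-style interfaces shown earlier in this post and vary across Caffe versions):

```cpp
template <typename Dtype>
class NewLayer : public Layer<Dtype> {
 public:
  explicit NewLayer(const LayerParameter& param) : Layer<Dtype>(param) {}
  virtual void SetUp(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      vector<Blob<Dtype>*>* top);
  virtual Dtype Backward_cpu(const vector<Blob<Dtype>*>& top,
      const bool propagate_down, vector<Blob<Dtype>*>* bottom);
};
```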

Add hole_h and hole_w
