@natsumi 2017-04-17T10:15:36.000000Z · word count: 4954 · views: 758

# The BP Algorithm as Implemented in encog

Machine Learning

http://blog.csdn.net/yt7589/article/details/52277407

## Gradient Computation in the Backpropagation Pass

Let j index the neurons of layer 0, i the neurons of layer 1, h the neurons of layer 2, and so on (layers are numbered backwards from the output, so layer 0 is the output layer).
$t_j$ denotes the ideal output value of output node j.
E denotes the total error over all output nodes, i.e.
$E = \frac{1}{2} \sum_j \left( t_j - y_{0j} \right)^2$
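As a quick sanity check, the definition above can be computed directly. This is a minimal sketch; the class name and the sample values are made up for illustration:

```java
public class SquaredError {
    // E = 1/2 * sum_j (t_j - y_j)^2, summed over the output nodes
    public static double error(double[] ideal, double[] actual) {
        double e = 0.0;
        for (int j = 0; j < ideal.length; j++) {
            double d = ideal[j] - actual[j];
            e += d * d;
        }
        return 0.5 * e;
    }

    public static void main(String[] args) {
        double[] t = {1.0, 0.0};   // ideal outputs
        double[] y = {0.8, 0.3};   // actual outputs
        System.out.println(error(t, y));   // ≈ 0.5 * (0.04 + 0.09) = 0.065
    }
}
```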

### Output layer

\begin{align} \Delta w_{ij} & = - \varepsilon \frac{\partial E}{\partial w_{ij}} \\ & = - \varepsilon \frac{\partial E}{\partial z_{0j}} \cdot \frac{\partial z_{0j}}{\partial w_{ij}} \\ & = - \varepsilon \cdot \frac{\partial E}{\partial y_{0j}} \cdot \frac{\partial y_{0j}}{\partial z_{0j}} \cdot \frac{\partial z_{0j}}{\partial w_{ij}} \\ \end{align}

\begin{align} \Delta w_{ij} & = -\varepsilon \cdot \left[ - (t_j - y_{0j}) \right] \cdot \left[ y_{0j}(1-y_{0j}) \right] \cdot y_{1i} \\ & = \varepsilon \cdot (t_j - y_{0j}) \cdot \left[ y_{0j}(1-y_{0j}) \right] \cdot y_{1i} \end{align}
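Plugging concrete numbers into this result for a sigmoid output layer gives the following sketch (the class name and the sample values are hypothetical):

```java
public class OutputLayerUpdate {
    // Δw_ij = ε * (t_j - y_0j) * y_0j * (1 - y_0j) * y_1i, assuming a sigmoid output
    public static double deltaW(double epsilon, double t, double y0, double y1) {
        return epsilon * (t - y0) * y0 * (1 - y0) * y1;
    }

    public static void main(String[] args) {
        // ε = 0.5, ideal t_j = 1.0, output y_0j = 0.8, hidden output y_1i = 0.6
        System.out.println(deltaW(0.5, 1.0, 0.8, 0.6));   // ≈ 0.0096
    }
}
```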

### Layer 1

\begin{align} \Delta w_{hi} & = - \varepsilon \frac{\partial E}{\partial w_{hi}} \\ & = - \varepsilon \sum_j \left( \frac{\partial E_j}{\partial z_{0j}} \cdot \frac{\partial z_{0j}}{\partial y_{1i}} \right) \cdot \frac{\partial y_{1i}}{\partial z_{1i}} \cdot \frac{\partial z_{1i}}{\partial w_{hi}}\\ & = -\varepsilon \sum_j \left( w_{ij} \frac{\partial E_j}{\partial z_{0j}} \right) \cdot \left[ y_{1i}(1-y_{1i}) \right] \cdot y_{2h} \end{align}
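In code form, the update for a hidden weight — with the output-layer errors pushed back through the weights $w_{ij}$ — could look like this (hypothetical names and values, sigmoid activations throughout, squared error):

```java
public class HiddenLayerUpdate {
    // Δw_hi = -ε * Σ_j (w_ij * dE_j/dz_0j) * y_1i(1 - y_1i) * y_2h,
    // with dE_j/dz_0j = -(t_j - y_0j) * y_0j * (1 - y_0j) for squared error + sigmoid
    public static double deltaW(double epsilon, double[] t, double[] y0,
                                double[] wOut, double y1, double y2) {
        double sum = 0.0;
        for (int j = 0; j < t.length; j++) {
            double dEdz = -(t[j] - y0[j]) * y0[j] * (1 - y0[j]);
            sum += wOut[j] * dEdz;
        }
        return -epsilon * sum * y1 * (1 - y1) * y2;
    }

    public static void main(String[] args) {
        // one output node: ε = 0.5, t = 1.0, y_0 = 0.8, w_ij = 0.4, y_1i = 0.6, y_2h = 0.9
        System.out.println(deltaW(0.5, new double[]{1.0}, new double[]{0.8},
                new double[]{0.4}, 0.6, 0.9));   // ≈ 0.0013824
    }
}
```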

## The Implementation in encog

```java
//StochasticGradientDescent.java
private void processLevel(final int currentLevel) {
    final int fromLayerIndex = flat.getLayerIndex()[currentLevel + 1];
    final int toLayerIndex = flat.getLayerIndex()[currentLevel];
    final int fromLayerSize = flat.getLayerCounts()[currentLevel + 1];
    final int toLayerSize = flat.getLayerFeedCounts()[currentLevel];
    double dropoutRate = 0;

    final int index = this.flat.getWeightIndex()[currentLevel];
    final ActivationFunction activation = this.flat
            .getActivationFunctions()[currentLevel];

    // handle weights
    // array references are made method local to avoid one indirection
    final double[] layerDelta = this.layerDelta;
    final double[] weights = this.flat.getWeights();
    final double[] gradients = this.gradients;
    final double[] layerOutput = this.flat.getLayerOutput();
    final double[] layerSums = this.flat.getLayerSums();
    int yi = fromLayerIndex;
    // ... loop that computes the gradients (shown below)
}
```

```java
public void process(final MLDataPair pair) {
    errorCalculation = new ErrorCalculation();

    double[] actual = new double[this.flat.getOutputCount()];
    flat.compute(pair.getInputArray(), actual);

    errorCalculation.updateError(actual, pair.getIdealArray(), pair.getSignificance());

    // Calculate error for the output layer.
    this.errorFunction.calculateError(
            flat.getActivationFunctions()[0], this.flat.getLayerSums(), this.flat.getLayerOutput(),
            pair.getIdeal().getData(), actual, this.layerDelta, 0,
            pair.getSignificance());

    // Apply regularization, if requested.
    if (this.l1 > Encog.DEFAULT_DOUBLE_EQUAL
            || this.l2 > Encog.DEFAULT_DOUBLE_EQUAL) {
        double[] lp = new double[2];
        calculateRegularizationPenalty(lp);
        for (int i = 0; i < actual.length; i++) {
            double p = (lp[0] * this.l1) + (lp[1] * this.l2);
            this.layerDelta[i] += p;
        }
    }

    // Propagate backwards (chain rule from calculus).
    for (int i = this.flat.getBeginTraining(); i < this.flat
            .getEndTraining(); i++) {
        processLevel(i);
    }
}
```

```java
private ErrorFunction errorFunction = new CrossEntropyErrorFunction();
```

The call to this.errorFunction.calculateError(...) invokes the calculateError method of the CrossEntropyErrorFunction class, shown below. It simply computes, for each output node, the difference between the ideal value and the actual output (scaled by the significance factor) and writes it into the positions of layerDelta that correspond to the output nodes. What the other positions of layerDelta hold becomes clear from the gradient loop below; they can be viewed as intermediate quantities of the gradient computation.

```java
//CrossEntropyErrorFunction.java
@Override
public void calculateError(ActivationFunction af, double[] b, double[] a,
        double[] ideal, double[] actual, double[] error, double derivShift,
        double significance) {

    for (int i = 0; i < actual.length; i++) {
        error[i] = (ideal[i] - actual[i]) * significance;
    }
}
```

```java
for (int y = 0; y < fromLayerSize; y++) {
    final double output = layerOutput[yi];
    double sum = 0;

    int wi = index + y;
    final int loopEnd = toLayerIndex + toLayerSize;

    for (int xi = toLayerIndex; xi < loopEnd; xi++, wi += fromLayerSize) {
        gradients[wi] += output * layerDelta[xi];
        sum += weights[wi] * layerDelta[xi];
    }
    layerDelta[yi] = sum
            * (activation.derivativeFunction(layerSums[yi], layerOutput[yi]));

    yi++;
}
```

processLevel is called from the output layer back toward the input layer, so when the output layer's gradients are computed, only the output-layer entries of layerDelta have been assigned (ideal minus actual, times the significance factor; "error" for short). Multiplying layerDelta directly by output yields the gradients of all weights from layer 1 to the output layer. layerDelta corresponds to the first factor of the formula below, and output — the output of a node in layer currentLevel + 1 — to the third. The second factor seems to be missing, but it has in fact been cancelled away: for a sigmoid output layer trained with cross-entropy error, the $y_{0j}(1-y_{0j})$ of the activation derivative cancels against the derivative of the cross-entropy, so $\partial E / \partial z_{0j} = -(t_j - y_{0j})$ and the stored (ideal − actual) is already the complete output delta.

$-\frac{\partial E}{\partial w_{ij}} = (t_j - y_{0j}) \cdot \left[ y_{0j}(1-y_{0j}) \right] \cdot y_{1i}$
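The cancellation of the middle factor can be checked numerically: for a sigmoid output with cross-entropy error, $\partial E/\partial z$ collapses to $y - t$, which is exactly the negative of what CrossEntropyErrorFunction stores. A small sketch (the class name and sample values are made up):

```java
public class CrossEntropyDelta {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    // cross-entropy error for a single sigmoid output node
    static double crossEntropy(double t, double y) {
        return -(t * Math.log(y) + (1 - t) * Math.log(1 - y));
    }

    // numeric dE/dz by central difference
    public static double numericDeDz(double t, double z) {
        double eps = 1e-6;
        return (crossEntropy(t, sigmoid(z + eps)) - crossEntropy(t, sigmoid(z - eps))) / (2 * eps);
    }

    public static void main(String[] args) {
        double t = 1.0, z = 0.37;   // arbitrary illustrative values
        double y = sigmoid(z);
        // dE/dy * dy/dz = (y - t): the y(1-y) sigmoid derivative cancels out
        System.out.println(Math.abs(numericDeDz(t, z) - (y - t)) < 1e-8);   // true
    }
}
```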

sum can be understood as the summation in the formula below. Once the inner loop ends, sum is multiplied by the derivative of the activation function, i.e. the product of the first two factors of the formula, and the result is assigned to layerDelta[yi] for node i (the index of node i within the node arrays is yi). This value is not consumed in the current call; it is prepared here and then used by the following call, processLevel(currentLevel + 1).

$\frac{\partial E}{\partial w_{hi}} = \sum_j \left( w_{ij} \frac{\partial E_j}{\partial z_{0j}} \right) \cdot \left[ y_{1i}(1-y_{1i}) \right] \cdot y_{2h}$
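Putting the loop's logic together, the quantity it accumulates into gradients can be verified against a finite-difference check on a tiny 2-2-1 sigmoid network with cross-entropy error. Everything below — class name, network sizes, weights — is a hypothetical sketch, not Encog code:

```java
public class GradientCheck {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    static double crossEntropy(double t, double y) {
        return -(t * Math.log(y) + (1 - t) * Math.log(1 - y));
    }

    // forward pass of a 2-2-1 sigmoid network; fills hOut with hidden activations
    static double forward(double[] x, double[][] wH, double[] wO, double[] hOut) {
        for (int i = 0; i < 2; i++) {
            hOut[i] = sigmoid(wH[i][0] * x[0] + wH[i][1] * x[1]);
        }
        return sigmoid(wO[0] * hOut[0] + wO[1] * hOut[1]);
    }

    // |analytic - numeric| for the gradient of the first hidden-to-output weight
    public static double check() {
        double[] x = {0.3, 0.7};
        double[][] wH = {{0.1, -0.2}, {0.4, 0.25}};
        double[] wO = {0.5, -0.3};
        double t = 1.0;

        double[] h = new double[2];
        double y = forward(x, wH, wO, h);

        // as in the loop above: output delta is (t - y) for cross-entropy + sigmoid,
        // and gradients[wi] += output * layerDelta[xi]
        double analytic = (t - y) * h[0];

        // numeric -dE/dwO[0] by central difference
        double eps = 1e-6;
        wO[0] += eps;
        double ePlus = crossEntropy(t, forward(x, wH, wO, new double[2]));
        wO[0] -= 2 * eps;
        double eMinus = crossEntropy(t, forward(x, wH, wO, new double[2]));
        double numeric = -(ePlus - eMinus) / (2 * eps);

        return Math.abs(analytic - numeric);
    }

    public static void main(String[] args) {
        // the accumulated value matches -dE/dw: gradients[] holds the descent direction
        System.out.println(check() < 1e-8);   // true
    }
}
```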
