@atry
2017-11-16T02:04:53.000000Z
Neural Networks and Functional Programming
In the previous articles of this series, we looked at the correspondence between deep learning and functional programming, and at how to build functional-style neural networks with DeepLearning.scala. You may be curious how DeepLearning.scala provides these capabilities. Over the next few articles, I will reveal the internal details of how DeepLearning.scala implements them.
In this article, we start with polymorphic functions.
DeepLearning.scala has a built-in matrix multiplication function, `dot`. `dot` takes two multi-dimensional arrays (`INDArray`) as arguments and returns a computational graph of a multi-dimensional array. For example, it can be used like this:
```scala
val ndArray1: INDArray = ???
val ndArray2: INDArray = ???
val ndArrayLayer: INDArrayLayer = dot(ndArray1, ndArray2)
```
If we use this `dot` function to implement a fully connected layer, one of the two arguments will be a weight, `INDArrayWeight`. For example:
```scala
val x: INDArray = ???
val w: INDArrayWeight = ???
val y: INDArrayLayer = dot(x, w)
```
Moreover, a neural network usually has multiple layers, and every layer except the first takes the previous layer's output as its input. In that case, one of `dot`'s arguments will be the computational graph output by another layer, an `INDArrayLayer`. For example:
```scala
val x1: INDArray = ???
val w1: INDArrayWeight = ???
val x2: INDArrayLayer = dot(x1, w1)
val w2: INDArrayWeight = ???
val y: INDArrayLayer = dot(x2, w2)
```
As a result, we need to define a single `dot` function that supports all of the usages above, which means it must accept arguments of several different types.
Ideally, each of the two parameters should accept any of the three types `INDArray`, `INDArrayLayer`, and `INDArrayWeight`, which amounts to nine signatures:
```scala
def dot(operand0: INDArray, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArray, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArray, operand1: INDArrayWeight): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArrayWeight): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArrayWeight): INDArrayLayer
```
Overloading this many functions would be far too redundant.
The DeepLearning type class

Our approach is to define `DeepLearning`, a dependently typed type class in the Aux pattern, and use simulacrum to generate the tedious boilerplate code:
```scala
@simulacrum.typeclass
trait DeepLearning[Differentiable] {
  type Data
  type Delta
  def forward(differentiable: Differentiable): Do[Tape[Data, Delta]]
}

object DeepLearning {
  type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
    type Data = Data0
    type Delta = Delta0
  }
}
```
`DeepLearning` is a dependently typed type class whose `Data` and `Delta` members represent the value type of the computational graph and the derivative type used in backpropagation, respectively. When we summon a `DeepLearning` instance for some `Differentiable` type, `Data` and `Delta` can therefore be resolved at compile time. For example, DeepLearning.scala's built-in plugins provide `DeepLearning.Aux[INDArray, INDArray, INDArray]`, `DeepLearning.Aux[INDArrayLayer, INDArray, INDArray]`, and `DeepLearning.Aux[INDArrayWeight, INDArray, INDArray]`:
```scala
implicit def indArrayLiteralDeepLearning: DeepLearning.Aux[INDArray, INDArray, INDArray] = ???
implicit def indArrayLayerDeepLearning: DeepLearning.Aux[INDArrayLayer, INDArray, INDArray] = ???
implicit def indArrayWeightDeepLearning: DeepLearning.Aux[INDArrayWeight, INDArray, INDArray] = ???
```
With these instances, summoning `DeepLearning[INDArray]`, `DeepLearning[INDArrayLayer]`, or `DeepLearning[INDArrayWeight]` infers both `Data` and `Delta` as `INDArray` at compile time:
```scala
val summonINDArrayDeepLearning = DeepLearning[INDArray]
type INDArrayData = summonINDArrayDeepLearning.Data
type INDArrayDelta = summonINDArrayDeepLearning.Delta

val summonINDArrayLayerDeepLearning = DeepLearning[INDArrayLayer]
type INDArrayLayerData = summonINDArrayLayerDeepLearning.Data
type INDArrayLayerDelta = summonINDArrayLayerDeepLearning.Delta

val summonINDArrayWeightDeepLearning = DeepLearning[INDArrayWeight]
type INDArrayWeightData = summonINDArrayWeightDeepLearning.Data
type INDArrayWeightDelta = summonINDArrayWeightDeepLearning.Delta
```
In the code above, `INDArrayData`, `INDArrayDelta`, `INDArrayLayerData`, `INDArrayLayerDelta`, `INDArrayWeightData`, and `INDArrayWeightDelta` are all `INDArray`.
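The Aux pattern itself can be demonstrated with a small, dependency-free sketch. All names below (`Tensorish`, `AuxDemo`, and so on) are hypothetical and not part of DeepLearning.scala; the point is that the implicit instance refines the abstract type member, so it is known at compile time after summoning:

```scala
// A minimal sketch of a dependently typed type class in the Aux pattern.
// All names here are hypothetical illustrations, not the real library.
trait Tensorish[T] {
  type Data                      // plays the role of DeepLearning's Data / Delta
  def data(t: T): Data
}

object Tensorish {
  type Aux[T, Data0] = Tensorish[T] { type Data = Data0 }

  // Summoner that preserves the refinement, like the one simulacrum generates.
  def apply[T](implicit instance: Tensorish[T]): instance.type = instance

  // This instance fixes Data = Double, so summoning Tensorish[Int]
  // resolves Data to Double at compile time.
  implicit val intTensorish: Tensorish.Aux[Int, Double] = new Tensorish[Int] {
    type Data = Double
    def data(t: Int): Double = t.toDouble
  }
}

object AuxDemo {
  val summoned = Tensorish[Int]
  val d: Double = summoned.data(3)   // compiles because summoned.Data is Double
}
```

If `Data` were a second type parameter instead of a type member, every call site would have to spell it out; the Aux alias lets callers constrain it only when they need to.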
If, on the other hand, we summon `DeepLearning[DoubleLayer]`, then because the following implicit value exists:
```scala
implicit def doubleLayerDeepLearning: DeepLearning.Aux[DoubleLayer, Double, Double] = ???
```
`Data` and `Delta` will both be `Double`:
```scala
val summonDoubleLayerDeepLearning = DeepLearning[DoubleLayer]
type DoubleLayerData = summonDoubleLayerDeepLearning.Data
type DoubleLayerDelta = summonDoubleLayerDeepLearning.Delta
```
Implementing dot with the DeepLearning type class

With the `DeepLearning` type class in place, we give `dot`'s two parameters the generic types `Operand0` and `Operand1`, and use implicit `DeepLearning.Aux` parameters to prove that both are differentiable multi-dimensional arrays:
```scala
def dot[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = {
  val do0: Do[Tape[INDArray, INDArray]] = deeplearning0.forward(operand0)
  val do1: Do[Tape[INDArray, INDArray]] = deeplearning1.forward(operand1)
  ???
}
```
This way, for `deeplearning0` and `deeplearning1` to satisfy the types `DeepLearning.Aux[Operand0, INDArray, INDArray]` and `DeepLearning.Aux[Operand1, INDArray, INDArray]`, they can only be `DeepLearning.Aux[INDArray, INDArray, INDArray]`, `DeepLearning.Aux[INDArrayLayer, INDArray, INDArray]`, or `DeepLearning.Aux[INDArrayWeight, INDArray, INDArray]`, which in turn restricts `Operand0` and `Operand1` to `INDArray`, `INDArrayLayer`, or `INDArrayWeight`.
Since every `DeepLearning` instance implements the `forward` method, `dot` can uniformly convert `Operand0` and `Operand1` into `Do[Tape[INDArray, INDArray]]` internally.
As a result, `dot` accepts every kind of multi-dimensional array in its parameters, including computational graphs and weights, and handles them all uniformly.
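The same constraint technique can be reproduced in a dependency-free sketch. Here `VecLike` and `dotProduct` are hypothetical stand-ins for `DeepLearning.Aux` and `dot`: one generic definition accepts exactly those types for which implicit evidence exists, and nothing else.

```scala
// Hypothetical sketch: one generic definition instead of nine overloads.
object DotSketch {
  final case class Vec(values: List[Double])

  // Stand-in for DeepLearning.Aux[T, INDArray, INDArray]: evidence that
  // T can be viewed as a vector.
  trait VecLike[T] {
    def toVec(t: T): Vec
  }

  object VecLike {
    implicit val plainVec: VecLike[Vec] = (v: Vec) => v
    implicit val fromDouble: VecLike[Double] = (d: Double) => Vec(List(d))
  }

  // Accepts any A and B with VecLike evidence; dotProduct(true, Vec(...))
  // would fail to compile because no VecLike[Boolean] exists.
  def dotProduct[A, B](a: A, b: B)(implicit la: VecLike[A], lb: VecLike[B]): Double = {
    val (va, vb) = (la.toVec(a), lb.toVec(b))
    va.values.zip(vb.values).map { case (x, y) => x * y }.sum
  }
}
```

Like `dot`'s implicit parameters, the `VecLike` evidence both restricts the argument types and supplies the conversion to a common representation.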
Although our `dot` supports the nine signatures above, sometimes that is still not enough. For example, the `max` function should support element-wise comparison between two multi-dimensional arrays as well as comparison between a multi-dimensional array and a scalar `Double`, so that `max(ndArray, 0.0)` implements the ReLU activation function.
Ideally, `max` should support four generic signatures:
```scala
def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = ???

def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, Double, Double],
    deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = ???

def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deeplearning1: DeepLearning.Aux[Operand1, Double, Double]
): INDArrayLayer = ???

def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
    implicit
    deeplearning0: DeepLearning.Aux[Operand0, Double, Double],
    deeplearning1: DeepLearning.Aux[Operand1, Double, Double]
): DoubleLayer = ???
```
Unfortunately, the Scala compiler does not accept overloads defined this way, for two reasons: all four definitions share the same non-implicit parameter list `(Operand0, Operand1)`, so they are indistinguishable duplicate definitions; and since overload resolution happens before implicit search, the compiler has no way to determine `Operand0` and `Operand1` and therefore cannot choose an overload. Instead, we use `Poly` from Shapeless to solve the overloading problem.
We define `max` as a `Poly2`:
```scala
object max extends Poly2
```
Then we provide the four `max.Case` instances described above:
```scala
implicit def maxDoubleDouble[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
    deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}

implicit def maxDoubleINDArray[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
    deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}

implicit def maxINDArrayDouble[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}

implicit def maxINDArrayINDArray[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
    deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
```
Depending on whether `Operand0` and `Operand1` are plain values, computational graphs, or weights, each of these `Case` definitions expands into nine concrete cases.
In the end, a call to `max` supports 4 × 9 = 36 cases, equivalent to 36 signatures.
For example:
```scala
val operand0: DoubleWeight = ???
val operand1: INDArrayLayer = ???
max(operand0, operand1)
```
After the implicit parameter is resolved, the function call is equivalent to:
```scala
max(operand0, operand1)(
  maxDoubleINDArray[DoubleWeight, INDArrayLayer](doubleWeightDeepLearning, indArrayLayerDeepLearning)
)
```
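To see what `Poly2` buys us, here is a hand-rolled, dependency-free analogue (the name `myMax` and the use of `List[Double]` in place of `INDArray` are hypothetical): each would-be overload becomes an implicit `Case` instance, and the result type is a type member computed during implicit search, which sidesteps both problems with ordinary overloading.

```scala
// Hypothetical hand-rolled analogue of shapeless.Poly2, for illustration only.
object myMax {
  trait Case[A, B] {
    type Result
    def apply(a: A, b: B): Result
  }
  object Case {
    type Aux[A, B, R] = Case[A, B] { type Result = R }
  }

  // Dispatch happens via implicit search instead of overload resolution.
  def apply[A, B](a: A, b: B)(implicit c: Case[A, B]): c.Result = c(a, b)

  implicit val doubleDouble: Case.Aux[Double, Double, Double] =
    new Case[Double, Double] {
      type Result = Double
      def apply(a: Double, b: Double): Double = math.max(a, b)
    }

  // Like max(ndArray, 0.0): array vs. scalar, applied element-wise.
  implicit val listDouble: Case.Aux[List[Double], Double, List[Double]] =
    new Case[List[Double], Double] {
      type Result = List[Double]
      def apply(a: List[Double], b: Double): List[Double] = a.map(math.max(_, b))
    }
}
```

Because the instances live inside `object myMax`, they sit in the implicit scope of `myMax.Case` and are found without any import, just as `max.at` cases are in shapeless.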
Besides polymorphic functions, DeepLearning.scala's built-in plugins also provide polymorphic methods used as infix operators, such as the four arithmetic operations. These polymorphic methods are implemented by forwarding to `shapeless.Poly2`:
```scala
object + extends Poly2
object - extends Poly2
object * extends Poly2
object / extends Poly2

implicit final class PolymorphicOps[Operand0](operand0: Operand0) {
  def +[Operand1](operand1: Operand1)(implicit methodCase: +.Case[Operand0, Operand1]): methodCase.Result =
    methodCase(operand0, operand1)
  def -[Operand1](operand1: Operand1)(implicit methodCase: -.Case[Operand0, Operand1]): methodCase.Result =
    methodCase(operand0, operand1)
  def *[Operand1](operand1: Operand1)(implicit methodCase: *.Case[Operand0, Operand1]): methodCase.Result =
    methodCase(operand0, operand1)
  def /[Operand1](operand1: Operand1)(implicit methodCase: /.Case[Operand0, Operand1]): methodCase.Result =
    methodCase(operand0, operand1)
}
```
For example:
```scala
implicit def doubleDivINDArray[Operand0, Operand1](
    implicit
    deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
    deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = /.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}

val operand0: DoubleWeight = ???
val operand1: INDArrayLayer = ???
operand0 / operand1
```
After the implicit parameter is resolved, the call is equivalent to:
```scala
PolymorphicOps(operand0)./(operand1)(
  doubleDivINDArray[DoubleWeight, INDArrayLayer](doubleWeightDeepLearning, indArrayLayerDeepLearning)
)
```
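The same forwarding pattern can be sketched without shapeless (all names below are hypothetical): an implicit class provides the infix syntax and delegates to whatever `Case` instance implicit search finds, so the operator's result type varies with its operand types.

```scala
// Hypothetical dependency-free sketch of PolymorphicOps-style forwarding.
object InfixSketch {
  object plus {
    trait Case[A, B] {
      type Result
      def apply(a: A, b: B): Result
    }
    implicit val intInt: Case[Int, Int] { type Result = Int } =
      new Case[Int, Int] {
        type Result = Int
        def apply(a: Int, b: Int): Int = a + b
      }
  }

  // Infix syntax forwards to the Case instance found by implicit search.
  implicit final class PlusOps[A](private val a: A) {
    def |+|[B](b: B)(implicit c: plus.Case[A, B]): c.Result = c(a, b)
  }

  val demo: Int = 1 |+| 2   // expands to PlusOps(1).|+|(2)(plus.intInt)
}
```

Adding a new implicit `Case` instance, say for `Int` and `List[Int]`, would extend `|+|` to a new pair of operand types without touching `PlusOps`, which is exactly the extensibility the article describes.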
Through the `DeepLearning` type class and `shapeless.Poly2`, we support both polymorphic functions and polymorphic methods. Polymorphic functions and methods implemented this way are extensible: adding a new implicit value is all it takes to support a new signature for the same function.
Like the other features, the implicit values introduced in this article can also be provided by plugins. In the next article of this series, I will reveal the internal implementation details of DeepLearning.scala's plugin system. You will find that the core of this powerful plugin system is surprisingly simple.