@atry 2017-11-16T02:04:53.000000Z

Neural Networks and Functional Programming (6): Dependently Typed Type Classes and Polymorphic Functions

Neural Networks and Functional Programming

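In the previous articles of this series, we learned about the correspondence between deep learning and functional programming, and how to build functional-style neural networks with DeepLearning.scala. You may be wondering how DeepLearning.scala provides these capabilities. In the next few articles, I will reveal the internal details of how DeepLearning.scala implements the following features: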

Polymorphic functions
Backpropagation
Plugins

In this article, we start with polymorphic functions.

Motivation

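DeepLearning.scala has a built-in matrix multiplication function, dot. dot takes two multidimensional arrays (INDArray) as arguments and returns a computation graph that produces a multidimensional array. For example: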

val ndArray1: INDArray = ???
val ndArray2: INDArray = ???
val ndArrayLayer: INDArrayLayer = dot(ndArray1, ndArray2)

If we use dot to implement a fully connected layer, one of its two arguments will be a weight, INDArrayWeight. For example:

val x: INDArray = ???
val w: INDArrayWeight = ???
val y: INDArrayLayer = dot(x, w)

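Moreover, a neural network usually has several layers. Every layer except the first takes the previous layer's output as its input, so one of dot's arguments may also be INDArrayLayer, the computation graph output by another layer. For example: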

val x1: INDArray = ???
val w1: INDArrayWeight = ???
val x2: INDArrayLayer = dot(x1, w1)
val w2: INDArrayWeight = ???
val y: INDArrayLayer = dot(x2, w2)

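As a result, a dot function that supports all of the usages above must accept a variety of argument types.
Ideally, both arguments should accept INDArray, INDArrayLayer and INDArrayWeight, which gives nine signatures in total: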

def dot(operand0: INDArray, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArray): INDArrayLayer
def dot(operand0: INDArray, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArrayLayer): INDArrayLayer
def dot(operand0: INDArray, operand1: INDArrayWeight): INDArrayLayer
def dot(operand0: INDArrayLayer, operand1: INDArrayWeight): INDArrayLayer
def dot(operand0: INDArrayWeight, operand1: INDArrayWeight): INDArrayLayer

Writing out that many overloads would be far too redundant.

The `DeepLearning` type class

Our approach is to define DeepLearning, a dependently typed type class following the Aux pattern, and to let simulacrum generate the tedious boilerplate:

@simulacrum.typeclass
trait DeepLearning[Differentiable] {
  type Data
  type Delta
  def forward(differentiable: Differentiable): Do[Tape[Data, Delta]]
}
object DeepLearning {
  type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
    type Data = Data0
    type Delta = Delta0
  }
}

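DeepLearning is a dependently typed type class: its type members Data and Delta denote, respectively, the value type of the computation graph and the type of the derivative used in backpropagation. Summoning a DeepLearning instance for a given Differentiable therefore lets the compiler work out Data and Delta at compile time. For example, the built-in plugins of DeepLearning.scala provide DeepLearning.Aux[INDArray, INDArray, INDArray], DeepLearning.Aux[INDArrayLayer, INDArray, INDArray] and DeepLearning.Aux[INDArrayWeight, INDArray, INDArray]: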

implicit def indArrayLiteralDeepLearning: DeepLearning.Aux[INDArray, INDArray, INDArray] = ???
implicit def indArrayLayerDeepLearning: DeepLearning.Aux[INDArrayLayer, INDArray, INDArray] = ???
implicit def indArrayWeightDeepLearning: DeepLearning.Aux[INDArrayWeight, INDArray, INDArray] = ???

So summoning DeepLearning[INDArray], DeepLearning[INDArrayLayer] or DeepLearning[INDArrayWeight] infers both Data and Delta as INDArray at compile time:

val summonINDArrayDeepLearning = DeepLearning[INDArray]
type INDArrayData = summonINDArrayDeepLearning.Data
type INDArrayDelta = summonINDArrayDeepLearning.Delta
val summonINDArrayLayerDeepLearning = DeepLearning[INDArrayLayer]
type INDArrayLayerData = summonINDArrayLayerDeepLearning.Data
type INDArrayLayerDelta = summonINDArrayLayerDeepLearning.Delta
val summonINDArrayWeightDeepLearning = DeepLearning[INDArrayWeight]
type INDArrayWeightData = summonINDArrayWeightDeepLearning.Data
type INDArrayWeightDelta = summonINDArrayWeightDeepLearning.Delta

In the code above, INDArrayData, INDArrayDelta, INDArrayLayerData, INDArrayLayerDelta, INDArrayWeightData and INDArrayWeightDelta are all INDArray.

By contrast, if we summon DeepLearning[DoubleLayer], then, because the following implicit value exists:

implicit def doubleLayerDeepLearning: DeepLearning.Aux[DoubleLayer, Double, Double] = ???

Data and Delta will both be Double:

val summonDoubleLayerDeepLearning = DeepLearning[DoubleLayer]
type DoubleLayerData = summonDoubleLayerDeepLearning.Data
type DoubleLayerDelta = summonDoubleLayerDeepLearning.Delta
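The following is a minimal, dependency-free sketch of the same Aux pattern, so that the compile-time computation of Data and Delta can be tried without DeepLearning.scala, ND4J or simulacrum. It is not the library's code: Tape, DoubleLayer and DoubleWeight are hypothetical toy stand-ins, forward returns a Tape directly instead of Do[Tape[Data, Delta]], the summoner apply is written by hand (in the article, simulacrum generates the boilerplate), and the weight's update rule (gradient descent with learning rate 0.1) is an assumption.

```scala
// Hypothetical toy stand-in for the library's Tape: a forward value plus a
// callback that receives the gradient (Delta) during backpropagation.
final case class Tape[Data, Delta](data: Data, backward: Delta => Unit)

// Hypothetical toy differentiable types (not DeepLearning.scala's classes).
final case class DoubleLayer(tape: () => Tape[Double, Double])
final class DoubleWeight(var value: Double)

trait DeepLearning[Differentiable] {
  type Data
  type Delta
  def forward(differentiable: Differentiable): Tape[Data, Delta]
}

object DeepLearning {
  type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
    type Data = Data0
    type Delta = Delta0
  }

  // Hand-written summoner. Returning Aux instead of DeepLearning[Differentiable]
  // keeps the concrete Data and Delta visible at the call site.
  def apply[Differentiable](implicit instance: DeepLearning[Differentiable])
      : Aux[Differentiable, instance.Data, instance.Delta] = instance

  implicit val doubleLiteralDeepLearning: Aux[Double, Double, Double] =
    new DeepLearning[Double] {
      type Data = Double
      type Delta = Double
      // A literal is a constant, so its gradient is discarded.
      def forward(literal: Double): Tape[Double, Double] = Tape(literal, _ => ())
    }

  implicit val doubleLayerDeepLearning: Aux[DoubleLayer, Double, Double] =
    new DeepLearning[DoubleLayer] {
      type Data = Double
      type Delta = Double
      def forward(layer: DoubleLayer): Tape[Double, Double] = layer.tape()
    }

  implicit val doubleWeightDeepLearning: Aux[DoubleWeight, Double, Double] =
    new DeepLearning[DoubleWeight] {
      type Data = Double
      type Delta = Double
      // Assumed update rule: gradient descent with learning rate 0.1.
      def forward(weight: DoubleWeight): Tape[Double, Double] =
        Tape(weight.value, delta => weight.value -= 0.1 * delta)
    }
}

object SummonExample {
  val summonDoubleWeightDeepLearning = DeepLearning[DoubleWeight]
  type DoubleWeightData = summonDoubleWeightDeepLearning.Data
  type DoubleWeightDelta = summonDoubleWeightDeepLearning.Delta

  // These evidences compile only because Data and Delta were resolved to Double.
  val dataIsDouble: DoubleWeightData =:= Double = implicitly[DoubleWeightData =:= Double]
  val deltaIsDouble: DoubleWeightDelta =:= Double = implicitly[DoubleWeightDelta =:= Double]
}
```

If the summoner were declared to return plain DeepLearning[Differentiable], Data and Delta would stay abstract at the call site and the =:= evidences above would no longer compile.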

Implementing `dot` with the `DeepLearning` type class

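With the DeepLearning type class in place, we declare dot's two parameters with generic types Operand0 and Operand1, and use implicit DeepLearning.Aux parameters as evidence that both are differentiable multidimensional arrays.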

def dot[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
  implicit
  deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
  deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = {
  // Whatever the concrete operand types are, both become Do[Tape[INDArray, INDArray]].
  val do0: Do[Tape[INDArray, INDArray]] = deeplearning0.forward(operand0)
  val do1: Do[Tape[INDArray, INDArray]] = deeplearning1.forward(operand1)
  ??? // the matrix multiplication and its backward pass are omitted here
}

For deeplearning0 and deeplearning1 to have the types DeepLearning.Aux[Operand0, INDArray, INDArray] and DeepLearning.Aux[Operand1, INDArray, INDArray], each of them can only be one of the instances listed above: DeepLearning.Aux[INDArray, INDArray, INDArray], DeepLearning.Aux[INDArrayLayer, INDArray, INDArray] or DeepLearning.Aux[INDArrayWeight, INDArray, INDArray]. This in turn restricts Operand0 and Operand1 to INDArray, INDArrayLayer or INDArrayWeight.

Since every DeepLearning instance implements forward, dot can uniformly convert both Operand0 and Operand1 into Do[Tape[INDArray, INDArray]].

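In this way, a single dot accepts every kind of multidimensional-array argument, including computation graphs of arrays and array weights, and handles them all uniformly.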
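A dependency-free sketch of the same idea follows, under deliberately simplified assumptions: Vector[Double] stands in for INDArray, the hypothetical VecLayer and VecWeight stand in for INDArrayLayer and INDArrayWeight, forward returns a Tape directly instead of a Do, and this dot computes the inner product of two 1-D vectors (returning a DoubleLayer) rather than a matrix product. The weight's learning rate of 0.1 is also an assumption.

```scala
// Hypothetical toy stand-ins: Vector[Double] plays INDArray,
// VecLayer plays INDArrayLayer and VecWeight plays INDArrayWeight.
final case class Tape[Data, Delta](data: Data, backward: Delta => Unit)
final case class VecLayer(tape: () => Tape[Vector[Double], Vector[Double]])
final class VecWeight(var value: Vector[Double])
final case class DoubleLayer(tape: () => Tape[Double, Double])

trait DeepLearning[Differentiable] {
  type Data
  type Delta
  def forward(differentiable: Differentiable): Tape[Data, Delta]
}

object DeepLearning {
  type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
    type Data = Data0
    type Delta = Delta0
  }
  type V = Vector[Double]

  implicit val vecLiteralDeepLearning: Aux[V, V, V] = new DeepLearning[V] {
    type Data = V
    type Delta = V
    def forward(literal: V): Tape[V, V] = Tape(literal, _ => ()) // constants ignore gradients
  }
  implicit val vecLayerDeepLearning: Aux[VecLayer, V, V] = new DeepLearning[VecLayer] {
    type Data = V
    type Delta = V
    def forward(layer: VecLayer): Tape[V, V] = layer.tape()
  }
  implicit val vecWeightDeepLearning: Aux[VecWeight, V, V] = new DeepLearning[VecWeight] {
    type Data = V
    type Delta = V
    // Assumed update rule: gradient descent with learning rate 0.1.
    def forward(weight: VecWeight): Tape[V, V] =
      Tape(weight.value, delta => {
        weight.value = weight.value.zip(delta).map { case (w, d) => w - 0.1 * d }
      })
  }
}

object Functions {
  import DeepLearning.V

  // One generic definition replaces the nine overloads: each implicit Aux proves
  // its operand is a differentiable vector and tells dot how to run it forward.
  def dot[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, V, V],
      deepLearning1: DeepLearning.Aux[Operand1, V, V]
  ): DoubleLayer = DoubleLayer { () =>
    val tape0 = deepLearning0.forward(operand0)
    val tape1 = deepLearning1.forward(operand1)
    val result = tape0.data.zip(tape1.data).map { case (a, b) => a * b }.sum
    Tape(result, (delta: Double) => {
      // The gradient of a·b is b with respect to a, and a with respect to b.
      tape0.backward(tape1.data.map(_ * delta))
      tape1.backward(tape0.data.map(_ * delta))
    })
  }
}
```

Passing a DoubleLayer to Functions.dot does not compile, because no DeepLearning.Aux[DoubleLayer, V, V] exists in this sketch; the implicit parameters are what narrow the accepted argument types.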

Polymorphic functions

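Although our dot supports all nine signatures above, that is sometimes still not enough. For example, max should support element-wise comparison between two multidimensional arrays, and also comparison between a multidimensional array and a scalar Double, so that we can write max(ndArray, 0.0) to implement the ReLU activation function.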

Ideally, max would support four times as many signatures, written as four generic overloads:

def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
  implicit
  deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
  deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = ???
def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
  implicit
  deeplearning0: DeepLearning.Aux[Operand0, Double, Double],
  deeplearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
): INDArrayLayer = ???
def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
  implicit
  deeplearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
  deeplearning1: DeepLearning.Aux[Operand1, Double, Double]
): INDArrayLayer = ???
def max[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
  implicit
  deeplearning0: DeepLearning.Aux[Operand0, Double, Double],
  deeplearning1: DeepLearning.Aux[Operand1, Double, Double]
): DoubleLayer = ???

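Unfortunately, the Scala compiler rejects such overloads, for two reasons:

After type erasure, the four methods have identical signatures, so the generated Java bytecode would clash.
The Scala compiler must pick an overload before it searches for implicit parameters. These four overloads differ only in the constraints their implicit parameters place on Operand0 and Operand1, so before the implicit search the compiler cannot tell which one applies.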

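We solve the overloading problem with Shapeless's Poly.

We define max as a Poly2:

import shapeless.Poly2
object max extends Poly2

Then we provide the four max.Case instances corresponding to the signatures above: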

implicit def maxDoubleDouble[Operand0, Operand1](
  implicit
  deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
  deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
implicit def maxDoubleINDArray[Operand0, Operand1](
  implicit
  deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
  deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
implicit def maxINDArrayDouble[Operand0, Operand1](
  implicit
  deepLearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
  deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
implicit def maxINDArrayINDArray[Operand0, Operand1](
  implicit
  deepLearning0: DeepLearning.Aux[Operand0, INDArray, INDArray],
  deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = max.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}

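Each of these Cases further expands into 9 cases, depending on whether Operand0 and Operand1 are plain values, computation graphs or weights.

In total, calls to max support 4 × 9 = 36 cases, equivalent to 36 signatures.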

For example:

val operand0: DoubleLayer = ???
val operand1: INDArrayLayer = ???
max(operand0, operand1)

Once the implicit parameter has been found, the call is equivalent to:

max(operand0, operand1)(maxDoubleINDArray[DoubleLayer, INDArrayLayer](doubleLayerDeepLearning, indArrayLayerDeepLearning))
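To show the dispatch mechanism without depending on shapeless, the sketch below hand-rolls a small MiniPoly2 trait that imitates shapeless.Poly2: a Case is an implicit value selected by the argument types, and each Case may declare its own Result type. MiniPoly2 is a hypothetical simplification, not shapeless's API; Vector[Double] stands in for INDArray, DoubleLayer and VecLayer are toy types, and only two of the four max cases are written out.

```scala
// Hypothetical toy stand-ins: Vector[Double] plays the role of INDArray.
final case class Tape[Data, Delta](data: Data, backward: Delta => Unit)
final case class DoubleLayer(tape: () => Tape[Double, Double])
final case class VecLayer(tape: () => Tape[Vector[Double], Vector[Double]])

trait DeepLearning[Differentiable] {
  type Data
  type Delta
  def forward(differentiable: Differentiable): Tape[Data, Delta]
}
object DeepLearning {
  type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
    type Data = Data0
    type Delta = Delta0
  }
  implicit val doubleLiteralDeepLearning: Aux[Double, Double, Double] = new DeepLearning[Double] {
    type Data = Double
    type Delta = Double
    def forward(literal: Double): Tape[Double, Double] = Tape(literal, _ => ())
  }
  implicit val doubleLayerDeepLearning: Aux[DoubleLayer, Double, Double] = new DeepLearning[DoubleLayer] {
    type Data = Double
    type Delta = Double
    def forward(layer: DoubleLayer): Tape[Double, Double] = layer.tape()
  }
  implicit val vecLayerDeepLearning: Aux[VecLayer, Vector[Double], Vector[Double]] =
    new DeepLearning[VecLayer] {
      type Data = Vector[Double]
      type Delta = Vector[Double]
      def forward(layer: VecLayer): Tape[Vector[Double], Vector[Double]] = layer.tape()
    }
}

// A tiny imitation of shapeless.Poly2 (NOT shapeless itself).
trait MiniPoly2 {
  trait Case[A, B] { type Result; def apply(a: A, b: B): Result }
  object Case { type Aux[A, B, R] = Case[A, B] { type Result = R } }
  final class CaseBuilder[A, B] {
    def apply[R](f: (A, B) => R): Case.Aux[A, B, R] =
      new Case[A, B] { type Result = R; def apply(a: A, b: B): R = f(a, b) }
  }
  def at[A, B]: CaseBuilder[A, B] = new CaseBuilder[A, B]
  // The single entry point: the implicit Case, not overloading, picks the behaviour.
  def apply[A, B](a: A, b: B)(implicit c: Case[A, B]): c.Result = c(a, b)
}

object max extends MiniPoly2 {
  type V = Vector[Double]

  // Scalar vs. scalar: the result is a DoubleLayer.
  implicit def maxDoubleDouble[Operand0, Operand1](
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
      deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
  ): Case.Aux[Operand0, Operand1, DoubleLayer] = at[Operand0, Operand1] { (operand0, operand1) =>
    DoubleLayer { () =>
      val tape0 = deepLearning0.forward(operand0)
      val tape1 = deepLearning1.forward(operand1)
      // Only the larger input receives the gradient (ties go to the left operand).
      Tape(math.max(tape0.data, tape1.data), (delta: Double) => {
        if (tape0.data >= tape1.data) tape0.backward(delta) else tape1.backward(delta)
      })
    }
  }

  // Array vs. scalar, e.g. max(x, 0.0) for ReLU: the result is a VecLayer.
  implicit def maxVecDouble[Operand0, Operand1](
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, V, V],
      deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
  ): Case.Aux[Operand0, Operand1, VecLayer] = at[Operand0, Operand1] { (operand0, operand1) =>
    VecLayer { () =>
      val tape0 = deepLearning0.forward(operand0)
      val tape1 = deepLearning1.forward(operand1)
      val arrayWins = tape0.data.map(_ >= tape1.data) // elements taken from the array
      Tape(tape0.data.map(math.max(_, tape1.data)), (delta: V) => {
        tape0.backward(delta.zip(arrayWins).map { case (d, wins) => if (wins) d else 0.0 })
        tape1.backward(delta.zip(arrayWins).collect { case (d, false) => d }.sum)
      })
    }
  }
  // maxDoubleVec and maxVecVec would follow the same pattern.
}
```

Every candidate Case is tried during implicit search, but for a given pair of argument types only one has satisfiable DeepLearning.Aux parameters, so there is no ambiguity; and since the cases are differently named implicit defs behind one apply method, there are no overloads whose erased signatures could clash.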

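Polymorphic methods

Besides polymorphic functions, DeepLearning.scala's built-in plugins also provide some polymorphic infix methods, such as the four arithmetic operators. These polymorphic methods are implemented by forwarding to shapeless.Poly2 objects: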

import shapeless.Poly2

object Operators {
  object + extends Poly2
  object - extends Poly2
  object * extends Poly2
  object / extends Poly2
}
import Operators._

implicit final class PolymorphicOps[Operand0](operand0: Operand0) {
  // Inside this class the methods +, -, * and / shadow the objects of the same
  // names, so the Poly2 objects are referenced through Operators.
  def +[Operand1](operand1: Operand1)(
    implicit methodCase: Operators.+.Case[Operand0, Operand1]
  ): methodCase.Result = methodCase(operand0, operand1)
  def -[Operand1](operand1: Operand1)(
    implicit methodCase: Operators.-.Case[Operand0, Operand1]
  ): methodCase.Result = methodCase(operand0, operand1)
  def *[Operand1](operand1: Operand1)(
    implicit methodCase: Operators.*.Case[Operand0, Operand1]
  ): methodCase.Result = methodCase(operand0, operand1)
  def /[Operand1](operand1: Operand1)(
    implicit methodCase: Operators./.Case[Operand0, Operand1]
  ): methodCase.Result = methodCase(operand0, operand1)
}

For example:

implicit def doubleDivINDArray[Operand0, Operand1](
  implicit
  deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
  deepLearning1: DeepLearning.Aux[Operand1, INDArray, INDArray]
) = /.at[Operand0, Operand1] { (operand0, operand1) =>
  ???
}
val operand0: DoubleLayer = ???
val operand1: INDArrayLayer = ???
operand0 / operand1

Once the implicit parameter has been found, the call is equivalent to:

PolymorphicOps(operand0)./(operand1)(doubleDivINDArray[DoubleLayer, INDArrayLayer](doubleLayerDeepLearning, indArrayLayerDeepLearning))
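The forwarding trick can also be sketched without dependencies. This version strips away DeepLearning and Tape so that only the forwarding remains: MiniPoly2 is a hypothetical, minimal imitation of shapeless.Poly2 (not shapeless itself), Vector[Double] stands in for INDArray, and only the / operator is shown. The Poly object is referenced as Operators./ inside PolymorphicOps because the class's own / method would shadow an unqualified /.

```scala
// A tiny imitation of shapeless.Poly2 (NOT shapeless itself).
trait MiniPoly2 {
  trait Case[A, B] { type Result; def apply(a: A, b: B): Result }
  object Case { type Aux[A, B, R] = Case[A, B] { type Result = R } }
  final class CaseBuilder[A, B] {
    def apply[R](f: (A, B) => R): Case.Aux[A, B, R] =
      new Case[A, B] { type Result = R; def apply(a: A, b: B): R = f(a, b) }
  }
  def at[A, B]: CaseBuilder[A, B] = new CaseBuilder[A, B]
}

object Operators {
  // Hypothetical cases over plain values; Vector[Double] stands in for INDArray.
  object / extends MiniPoly2 {
    implicit val doubleDivVector: Case.Aux[Double, Vector[Double], Vector[Double]] =
      at[Double, Vector[Double]] { (scalar, vector) => vector.map(scalar / _) }
    implicit val vectorDivDouble: Case.Aux[Vector[Double], Double, Vector[Double]] =
      at[Vector[Double], Double] { (vector, scalar) => vector.map(_ / scalar) }
  }
}

object Syntax {
  implicit final class PolymorphicOps[Operand0](operand0: Operand0) {
    // The infix method only forwards; the implicit Case decides what `/` means.
    def /[Operand1](operand1: Operand1)(
        implicit methodCase: Operators./.Case[Operand0, Operand1]
    ): methodCase.Result = methodCase(operand0, operand1)
  }
}
```

Because Double already has its own / methods, 2.0 / Vector(...) reaches PolymorphicOps only after none of Double's overloads accepts a Vector; ordinary arithmetic such as 6.0 / 3.0 is unaffected.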

Conclusion

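With the DeepLearning type class and shapeless.Poly2, we support both polymorphic functions and polymorphic methods. Polymorphic functions and methods implemented this way are extensible: adding a new implicit value is enough to support a new signature for a function of the same name.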
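The extensibility claim can be illustrated with a small, dependency-free sketch: Library.plus is written once against DeepLearning.Aux, and a weight type added later (as a plugin might add one) becomes a valid argument merely by supplying one implicit value. Tape, Library.plus and DoubleWeight are hypothetical; placing the instance in the companion object, and the learning rate of 0.1, are assumptions of this sketch, not a description of how DeepLearning.scala's plugins deliver their implicits.

```scala
// Hypothetical toy stand-in for the library's Tape.
final case class Tape[Data, Delta](data: Data, backward: Delta => Unit)

trait DeepLearning[Differentiable] {
  type Data
  type Delta
  def forward(differentiable: Differentiable): Tape[Data, Delta]
}
object DeepLearning {
  type Aux[Differentiable, Data0, Delta0] = DeepLearning[Differentiable] {
    type Data = Data0
    type Delta = Delta0
  }
  implicit val doubleLiteralDeepLearning: Aux[Double, Double, Double] = new DeepLearning[Double] {
    type Data = Double
    type Delta = Double
    def forward(literal: Double): Tape[Double, Double] = Tape(literal, _ => ())
  }
}

// Written once and never edited again.
object Library {
  def plus[Operand0, Operand1](operand0: Operand0, operand1: Operand1)(
      implicit
      deepLearning0: DeepLearning.Aux[Operand0, Double, Double],
      deepLearning1: DeepLearning.Aux[Operand1, Double, Double]
  ): Tape[Double, Double] = {
    val tape0 = deepLearning0.forward(operand0)
    val tape1 = deepLearning1.forward(operand1)
    // The derivative of a + b is 1 for both inputs, so both receive the same delta.
    Tape(tape0.data + tape1.data, delta => { tape0.backward(delta); tape1.backward(delta) })
  }
}

// Added later: a new differentiable type plus a single implicit value. The
// companion object is part of the implicit scope, so Library.plus now accepts
// DoubleWeight without any change to Library.
final class DoubleWeight(var value: Double)
object DoubleWeight {
  implicit val doubleWeightDeepLearning: DeepLearning.Aux[DoubleWeight, Double, Double] =
    new DeepLearning[DoubleWeight] {
      type Data = Double
      type Delta = Double
      // Assumed update rule: gradient descent with learning rate 0.1.
      def forward(weight: DoubleWeight): Tape[Double, Double] =
        Tape(weight.value, delta => weight.value -= 0.1 * delta)
    }
}
```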

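Like every other feature, the implicit values introduced in this article can be provided by plugins. In the next article of this series, I will reveal the internals of DeepLearning.scala's plugin system. You will find that, powerful as the plugin system is, its core is remarkably simple.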
