@Team 2019-06-08T06:08:02.000000Z

It's Finally Here! A Getting-Started Guide to TensorFlow 2.0 (Part 1)

叶虎

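TensorFlow is the most widely used framework in deep learning, but compared with PyTorch, a dynamic-graph framework, TensorFlow with its static graphs (Graph mode) is genuinely harder to use. Fortunately, TensorFlow recently added an eager mode that matches PyTorch's dynamic execution. Going further, Google has now released a brand-new version, TensorFlow 2.0. Compared with 1.x, 2.0 is not a routine update but a major upgrade (although only a preview release is available so far). In short, TensorFlow 2.0 uses eager execution by default and reorganizes many previously messy modules. Version 2.0 will undoubtedly replace 1.x over time, so it is worth picking up TensorFlow 2.0 early. This article gives a concise introduction to TensorFlow 2.0 to help you get started quickly.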

Eager Execution

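TensorFlow's eager execution is a form of imperative programming, consistent with native Python: when you run an operation, the result is returned immediately. TensorFlow has traditionally used Graph mode instead: you first build a computation graph, then open a Session and feed in actual data before anything is executed and a result is produced. Eager execution is clearly more concise and makes code easier to debug, which is one reason PyTorch is simpler and easier to use. A simple example: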

import tensorflow as tf

x = tf.ones((2, 2), dtype=tf.dtypes.float32)
y = tf.constant([[1, 2],
                 [3, 4]], dtype=tf.dtypes.float32)
z = tf.matmul(x, y)
print(z)
# tf.Tensor(
# [[4. 6.]
#  [4. 6.]], shape=(2, 2), dtype=float32)
print(z.numpy())
# [[4. 6.]
#  [4. 6.]]

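As you can see, under eager execution each operation returns a tf.Tensor that holds concrete values, rather than a symbolic handle to a node in a computation graph as in Graph mode. Because results are visible immediately, this is very helpful for debugging. Furthermore, calling the tf.Tensor.numpy() method returns the NumPy array corresponding to the Tensor.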
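The conversion also works in the other direction. The short sketch here is an added illustration, not part of the original example: it assumes NumPy is installed alongside TensorFlow, and the variable names are made up. It passes a NumPy array straight into a TensorFlow op and then converts the result back.

```python
import numpy as np
import tensorflow as tf

# A NumPy array can be passed directly to a TensorFlow op.
a = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
t = tf.multiply(a, 2.0)  # the result is an eager tf.Tensor

# .numpy() hands the value back as a NumPy array.
b = t.numpy()
print(t.dtype)
print(b)
```

In eager mode this round trip involves no Session and no placeholders, which is what makes mixing TensorFlow with ordinary NumPy code convenient.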

Another benefit of eager execution is that native Python features can be used, such as the conditional below:

random_value = tf.random.uniform([], 0, 1)
x = tf.reshape(tf.range(0, 4), [2, 2])
print(random_value)
if random_value.numpy() > 0.5:
    y = tf.matmul(x, x)
else:
    y = tf.add(x, x)

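This kind of dynamic control flow works because a Tensor produced by eager execution exposes its value as a NumPy value, which avoids the tf.cond and tf.while_loop operators required in Graph mode.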
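The same idea applies to loops. As a minimal sketch (an added illustration with made-up variable names), a plain Python while loop can test a Tensor's value on every iteration in eager mode:

```python
import tensorflow as tf

x = tf.constant(100.0)
steps = 0
# The loop condition reads the concrete value of the Tensor each time.
while x.numpy() > 1.0:
    x = x / 2.0
    steps += 1

print(steps, x.numpy())
```

In Graph mode the same loop would have to be expressed with tf.while_loop, because a symbolic Tensor has no value to compare while the graph is being built.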

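Another important question is how gradients are computed in eager mode. In Graph mode, the gradient graph is built alongside the forward graph when the model is constructed, so gradients are easy to compute once data is fed in. Eager execution is dynamic, however, so the operations have to be recorded on every run in order to compute gradients. This is done with tf.GradientTape, which tracks the executed operations. Here is an example: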

w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w + 2. * w + 5.
grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 4.]], shape=(1, 1), dtype=float32)

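With eager execution, each tape records the operations executed inside its context; the tape is valid only for that computation and is used to compute the corresponding gradients. PyTorch is also a dynamic-graph framework, but unlike TensorFlow, each Tensor that requires gradients carries a grad_fn that tracks the history of operations used for differentiation.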
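By default a tape can be used for only one call to gradient(), which is one way to see that it covers just the current computation. The sketch here is an added illustration: it uses the persistent=True argument to reuse the tape, and tape.watch() because a constant Tensor, unlike a trainable tf.Variable, is not tracked automatically.

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)  # constants are not watched automatically
    y = x * x      # y = x^2
    z = y * y      # z = x^4

dy_dx = tape.gradient(y, x)  # 2 * x
dz_dx = tape.gradient(z, x)  # 4 * x^3
del tape  # release the resources held by the persistent tape

print(dy_dx.numpy(), dz_dx.numpy())
```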

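The eager execution that TensorFlow 2.0 adopts by default makes code more concise and easier to debug. In terms of performance, however, eager execution loses some ground compared with Graph mode. This is not hard to understand: native Graph mode builds a static graph first and only then executes it, which gives it advantages in distributed training, performance optimization, and production deployment. Fortunately, TensorFlow 2.0 introduces tf.function and AutoGraph to narrow the performance gap between eager execution and Graph mode; the core idea is to convert a subset of Python syntax into high-performance graph operations.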

AutoGraph

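AutoGraph was already available in TensorFlow 1.x. Its main purpose is to convert common Python code into Graph code that TensorFlow supports. A typical case: in TensorFlow we used to need operators such as tf.while_loop and tf.cond to implement dynamic control flow, but now we can write code with native Python for and if statements and let AutoGraph convert it into code that TensorFlow supports, as in the following example: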

def square_if_positive(x):
    if x > 0:
        x = x * x
    else:
        x = 0.0
    return x
# eager mode
print('Eager results: %2.2f, %2.2f' % (square_if_positive(tf.constant(9.0)),
                                       square_if_positive(tf.constant(-9.0))))
# graph mode
tf_square_if_positive = tf.autograph.to_graph(square_if_positive)
with tf.Graph().as_default():
    # The result works like a regular op: takes tensors in, returns tensors.
    # You can inspect the graph using tf.compat.v1.get_default_graph().as_graph_def()
    g_out1 = tf_square_if_positive(tf.constant( 9.0))
    g_out2 = tf_square_if_positive(tf.constant(-9.0))
    with tf.compat.v1.Session() as sess:
        print('Graph results: %2.2f, %2.2f\n' % (sess.run(g_out1), sess.run(g_out2)))

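Above we defined a function square_if_positive that uses Python's native if statement internally. This is fine under TensorFlow 2.0 eager execution, but it is not supported by TensorFlow 1.x Graph mode. With AutoGraph, however, the function can be converted into a Graph function that you can think of as a regular TensorFlow op and run in Graph mode. (TF 2 has no Session; that is a TF 1.x feature, which TF 2 exposes through tf.compat.v1.) Note the difference between eager mode and Graph mode: the results are the same, but Graph mode is more efficient.
In essence, AutoGraph converts Python code into native TensorFlow code. We can look at the converted code: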

print(tf.autograph.to_code(square_if_positive))
#################################################
from __future__ import print_function
def tf__square_if_positive(x):
  try:
    with ag__.function_scope('square_if_positive'):
      do_return = False
      retval_ = None
      cond = ag__.gt(x, 0)
      def if_true():
        with ag__.function_scope('if_true'):
          x_1, = x,
          x_1 = x_1 * x_1
          return x_1
      def if_false():
        with ag__.function_scope('if_false'):
          x = 0.0
          return x
      x = ag__.if_stmt(cond, if_true, if_false)
      do_return = True
      retval_ = x
      return retval_
  except:
    ag__.rewrite_graph_construction_error(ag_source_map__)
tf__square_if_positive.autograph_info__ = {}

As you can see, the code generated by AutoGraph defines two branch functions and then calls the if_stmt op, which presumably works like tf.cond.
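For comparison, here is roughly what the same logic looks like when written by hand with tf.cond. This is an added sketch, not AutoGraph's actual output, and the function name square_if_positive_cond is made up for the illustration.

```python
import tensorflow as tf

def square_if_positive_cond(x):
    # tf.cond takes a predicate plus one callable per branch.
    return tf.cond(x > 0,
                   lambda: x * x,              # branch taken when x > 0
                   lambda: tf.constant(0.0))   # branch taken otherwise

pos = square_if_positive_cond(tf.constant(9.0))
neg = square_if_positive_cond(tf.constant(-9.0))
print(pos.numpy(), neg.numpy())
```

Both branches must return values of the same dtype, which is why the else branch returns a float constant here.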
AutoGraph supports many Python features, for example loops:

def sum_even(items):
    s = 0
    for c in items:
        if c % 2 > 0:
            continue
        s += c
    return s
print('Eager result: %d' % sum_even(tf.constant([10,12,15,20])))
tf_sum_even = tf.autograph.to_graph(sum_even)
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    print('Graph result: %d\n\n' % sess.run(tf_sum_even(tf.constant([10,12,15,20]))))

AutoGraph supports most Python features, but it still has limitations; see Capabilities and Limitations for details.
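To get a feel for what AutoGraph saves you from writing, here is a hand-written counterpart of the sum_even loop built on tf.while_loop and tf.cond. It is an added sketch under the assumption that the input is a 1-D int32 Tensor; the helper names (sum_even_while, keep_going, add_if_even) are made up, and the code AutoGraph actually generates looks different.

```python
import tensorflow as tf

def sum_even_while(items):
    n = tf.size(items)

    def keep_going(i, s):
        # Loop condition: stop once every element has been visited.
        return i < n

    def add_if_even(i, s):
        c = items[i]
        # Skip odd elements (the `continue` in the Python version).
        s = tf.cond(c % 2 > 0, lambda: s, lambda: s + c)
        return i + 1, s

    # Loop variables: the index i and the running sum s.
    _, total = tf.while_loop(keep_going, add_if_even,
                             [tf.constant(0), tf.constant(0)])
    return total

x = tf.constant([10, 12, 15, 20])
eager_total = sum_even_while(x)                # runs eagerly
graph_total = tf.function(sum_even_while)(x)   # runs as a Graph
print(int(eager_total), int(graph_total))
```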

Also note that a function converted by AutoGraph can still run in eager mode, but it will not be faster than the original. You can compare:

import timeit

x = tf.constant([10, 12, 15, 20])
print("Eager at original code:", timeit.timeit(lambda: sum_even(x), number=100))
print("Eager at autograph code:", timeit.timeit(lambda: tf_sum_even(x), number=100))
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    graph_op = tf_sum_even(tf.constant([10, 12, 15, 20]))
    sess.run(graph_op)  # warm-up call, excluded from the timing
    print("Graph at autograph code:", timeit.timeit(lambda: sess.run(graph_op), number=100))
##########################################
# Eager at original code: 0.05176109499999981
# Eager at autograph code: 0.11203173799999977
# Graph at autograph code: 0.03418808900000059

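The results show that execution in Graph mode is the most efficient, the original code in eager mode comes next, and the AutoGraph-converted code run in eager mode is the slowest.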

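So in TensorFlow 2.0 we generally do not use tf.autograph directly, because it brings no speedup under eager execution. To actually reach Graph-mode efficiency, we rely on a more powerful tool: tf.function.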

Performance Optimization: tf.function

Although eager execution is more concise, Graph mode performs better. To narrow this performance gap, TensorFlow 2.0 introduces tf.function. Here is the official description of tf.function:

function constructs a callable that executes a TensorFlow graph (tf.Graph) created by tracing the TensorFlow operations in func. This allows the TensorFlow runtime to apply optimizations and exploit parallelism in the computation defined by func.

Put simply, tf.function builds a Graph from the TensorFlow operations inside a function func, so that calling it executes that Graph, which gives better performance. For example:

def f(x, y):
    print(x, y)
    return tf.reduce_mean(tf.multiply(x ** 2, 3) + y)
g = tf.function(f)
x = tf.constant([[2.0, 3.0]])
y = tf.constant([[3.0, -2.0]])
# `f` and `g` will return the same value, but `g` will be executed as a
# TensorFlow graph.
assert f(x, y).numpy() == g(x, y).numpy()
# tf.Tensor([[2. 3.]], shape=(1, 2), dtype=float32) tf.Tensor([[ 3. -2.]], shape=(1, 2), dtype=float32)
# Tensor("x:0", shape=(1, 2), dtype=float32) Tensor("y:0", shape=(1, 2), dtype=float32)

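As in the example above, a function wrapped by tf.function runs in Graph mode. You can think of it as a TF op that encapsulates a Graph: calling it directly still returns a Tensor result immediately, but internally it executes efficiently. When we print a Tensor inside the function, eager execution prints the Tensor's value, whereas Graph mode prints a Tensor handle on which numpy() cannot be called to get a value, just as in TF 1.x Graph mode.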
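A consequence worth seeing once: plain Python statements inside the function body run while the Graph is being traced, not on every call. The sketch here is an added illustration; trace_log and add_one are made-up names.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper list, filled only while tracing

@tf.function
def add_one(x):
    trace_log.append("tracing")  # plain Python: runs while the Graph is built
    return x + 1

first = add_one(tf.constant(1))
second = add_one(tf.constant(2))  # same shape and dtype: the Graph is reused

print(len(trace_log), int(first), int(second))
```

The second call returns a fresh result, but the Python-level append does not run again because the already traced Graph is executed directly.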
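Because a function decorated with tf.function executes as a Graph, it usually runs faster than in eager mode, and the gap is more pronounced when the Graph contains many small operations. Compare the performance gap for a convolution and for an LSTM cell: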

import timeit
conv_layer = tf.keras.layers.Conv2D(100, 3)
@tf.function
def conv_fn(image):
  return conv_layer(image)
image = tf.zeros([1, 200, 200, 100])
# warm up
conv_layer(image); conv_fn(image)
print("Eager conv:", timeit.timeit(lambda: conv_layer(image), number=10))
print("Function conv:", timeit.timeit(lambda: conv_fn(image), number=10))
# For a plain convolution the gap is small
# Eager conv: 0.44013839924952197
# Function conv: 0.3700763391782858
lstm_cell = tf.keras.layers.LSTMCell(10)
@tf.function
def lstm_fn(input, state):
  return lstm_cell(input, state)
input = tf.zeros([10, 10])
state = [tf.zeros([10, 10])] * 2
# warm up
lstm_cell(input, state); lstm_fn(input, state)
print("eager lstm:", timeit.timeit(lambda: lstm_cell(input, state), number=10))
print("function lstm:", timeit.timeit(lambda: lstm_fn(input, state), number=10))
# The LSTM cell runs many small ops, so Graph execution is much faster
# eager lstm: 0.025562446062237565
# function lstm: 0.0035498656569271647

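To use tf.function flexibly you need to understand the mechanism behind it, so here is a brief account. In TF 1.x, you first create a static computation graph and then create a Session to actually run different computations: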

import tensorflow as tf
x = tf.placeholder(tf.float32)
y = tf.square(x)
z = tf.add(x, y)
sess = tf.Session()
z0 = sess.run([z], feed_dict={x: 2.})        # 6.0
z1 = sess.run([z], feed_dict={x: 2., y: 2.}) # 4.0

Although only one graph is defined above, the two different sess.run calls (at runtime) actually execute two different programs, or subgraphs:

def compute_z0(x):
  return tf.add(x, tf.square(x))
def compute_z1(x, y):
  return tf.add(x,  y)

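Here we have wrapped the two different subgraphs in two Python functions. Going one step further, we no longer need a Session: calling each function simply invokes the corresponding computation graph. This is what tf.function does: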

import tensorflow as tf
@tf.function
def compute_z1(x, y):
  return tf.add(x, y)
@tf.function
def compute_z0(x):
  return compute_z1(x, tf.square(x))
z0 = compute_z0(2.)
z1 = compute_z1(2., 2.)

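In other words, tf.function internally manages a collection of Graphs and controls their execution. Another issue is that although the function body defines a fixed series of operations, different inputs require different computation graphs. For example, if the input Tensors differ in shape or dtype, the computation graphs differ. Fortunately, tf.function supports this kind of polymorphism: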

# Functions are polymorphic
@tf.function
def double(a):
  print("Tracing with", a)
  return a + a
print(double(tf.constant(1)))
print(double(tf.constant(1.1)))
print(double(tf.constant([1, 2])))
# Tracing with Tensor("a:0", shape=(), dtype=int32)
# tf.Tensor(2, shape=(), dtype=int32)
# Tracing with Tensor("a:0", shape=(), dtype=float32)
# tf.Tensor(2.2, shape=(), dtype=float32)
# Tracing with Tensor("a:0", shape=(2,), dtype=int32)
# tf.Tensor([2 4], shape=(2,), dtype=int32)

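Note the print inside the function: when the shape or dtype of the input tensor changes, what is printed changes accordingly, so the corresponding (static) computation graphs are different. Behind this polymorphism, tf.function traces a separate computation graph for each case. Specifically, a function f decorated by tf.function accepts some Tensors and returns zero or more Tensors. When the decorated function F is executed: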

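A "trace_cache_key" is determined from the shapes and dtypes of the input Tensors;
Each "trace_cache_key" maps to a Graph. When a new "trace_cache_key" is needed, f is traced to build a new Graph; if the "trace_cache_key" already exists, the existing Graph is simply looked up in the cache;
The input Tensors are fed into this Graph, which is executed to produce the output Tensors.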
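The caching steps above can be observed by counting traces. The sketch here is an added illustration: trace_log and counts are made-up helpers, and the last three calls rely on the fact that tf.function treats Python scalars as constants, so each distinct Python value gets its own trace.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper: gets one entry per trace
counts = []     # number of traces recorded after each stage

@tf.function
def double(a):
    trace_log.append("tracing")  # Python side effect, runs only while tracing
    return tf.add(a, a)

double(tf.constant(1))
double(tf.constant(2))       # same shape and dtype: cache hit
counts.append(len(trace_log))

double(tf.constant(1.1))     # new dtype: new trace
counts.append(len(trace_log))

double(tf.constant([1, 2]))  # new shape: new trace
counts.append(len(trace_log))

double(1)                    # Python int: traced per distinct value
double(2)
double(1)                    # value seen before: cache hit
counts.append(len(trace_log))

print(counts)
```

Passing Python numbers in a loop is therefore a common way to grow the cache by accident; wrapping them in tf.constant avoids it.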

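This polymorphism is useful, because sometimes we do want to pass in Tensors with different shapes or dtypes. But as the number of "trace_cache_key" entries grows, you end up caching a large number of Graphs, which is something to watch out for. In addition, tf.function provides input_signature, a parameter that uses tf.TensorSpec to specify the shapes and dtypes of the Tensors passed to the function, as in the following example: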

@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def f(x):
    return tf.add(x, 1.)
print(f(tf.constant(1.0)))  # tf.Tensor(2.0, shape=(), dtype=float32)
print(f(tf.constant([1.0,]))) # tf.Tensor([2.], shape=(1,), dtype=float32)
print(f(tf.constant([1])))  # ValueError: Python inputs incompatible with input_signature

Here the dtype of the input Tensor must be float32 but the shape is unrestricted; a type mismatch raises an error.
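An input_signature is also a way to keep the Graph cache small. In this added sketch (total and trace_log are made-up names), a signature with an unknown dimension, shape=[None], lets one traced Graph serve vectors of any length:

```python
import tensorflow as tf

trace_log = []  # hypothetical helper: gets one entry per trace

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.int32)])
def total(values):
    trace_log.append("tracing")
    return tf.reduce_sum(values)

a = total(tf.constant([1, 2]))
b = total(tf.constant([1, 2, 3]))  # different length, same Graph

print(int(a), int(b), len(trace_log))
```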

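Another parameter of tf.function is autograph, which defaults to True. This means AutoGraph is applied automatically when the Graph is built, so you can use native Python conditionals and loops inside the function, because AutoGraph converts them into Graph code based on tf.cond and tf.while_loop. Note that a branch or loop is converted only if it depends on Tensors. When autograph is False, any branch or loop that depends on Tensors raises an error, as in the following example: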

def sum_even(items):
  s = 0
  for c in items:
    if c % 2 > 0:
      continue
    s += c
  return s
sum_even_autograph_on = tf.function(sum_even, autograph=True)
sum_even_autograph_off = tf.function(sum_even, autograph=False)
x = tf.constant([10, 12, 15, 20])
sum_even(x) # OK 
sum_even_autograph_on(x) # OK
sum_even_autograph_off(x) # TypeError: Tensor objects are only iterable when eager execution is enabled

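This is easy to understand: once tf.function is applied the code runs in Graph mode, where Tensors cannot be iterated over, but AutoGraph converts the loop into Graph code, so the call succeeds. In most cases we leave autograph enabled, which is the default.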
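The same holds for conditionals that depend on a Tensor. In this added sketch (the function and variable names are made up), the if statement is converted when autograph is on, while with autograph off the symbolic Tensor cannot be used as a Python bool and a TypeError is raised:

```python
import tensorflow as tf

def clip_negative(x):
    if x > 0:  # the condition depends on a Tensor
        y = x
    else:
        y = tf.zeros_like(x)
    return y

clip_on = tf.function(clip_negative, autograph=True)
clip_off = tf.function(clip_negative, autograph=False)

kept = clip_on(tf.constant(3.0))
clipped = clip_on(tf.constant(-2.0))

try:
    clip_off(tf.constant(3.0))
    failed = False
except TypeError:
    # Without AutoGraph, `if x > 0` tries to treat a symbolic Tensor as a bool.
    failed = True

print(kept.numpy(), clipped.numpy(), failed)
```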

Most importantly, tf.function can be applied to class methods and can reference tf.Variable, as in the following example:

class ScalarModel(object):
  def __init__(self):
    self.v = tf.Variable(0)
  @tf.function
  def increment(self, amount):
    self.v.assign_add(amount)
model1 = ScalarModel()
model1.increment(tf.constant(3))
assert int(model1.v) == 3
model1.increment(tf.constant(4))
assert int(model1.v) == 7
model2 = ScalarModel()  # model1 and model2 own different variables
model2.increment(tf.constant(5))
assert int(model2.v) == 5

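As discussed later, this feature can be used when building models with tf.keras. The example also shows that side-effecting operations such as assign_add, which change the value of a Variable, can be used inside a function; this matters for model training.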
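To make the link to training concrete, here is a minimal sketch of a training step that combines tf.function, tf.GradientTape, and an in-place Variable update. It is an added illustration on a toy problem (fitting y = 2x with a single weight); the names w and train_step, the data, and the learning rate are assumptions chosen for the example.

```python
import tensorflow as tf

w = tf.Variable(0.0)  # created once, outside the traced function

@tf.function
def train_step(x, y, lr):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x - y))
    grad = tape.gradient(loss, w)
    w.assign_sub(lr * grad)  # side effect: updates the Variable in place
    return loss

x = tf.constant([1.0, 2.0, 3.0])
y = 2.0 * x  # toy data generated with slope 2
for _ in range(50):
    loss = train_step(x, y, tf.constant(0.05))

print(w.numpy(), loss.numpy())
```

The learning rate is passed as a Tensor rather than a Python float so that changing it later would not trigger a new trace.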

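As mentioned earlier, Python's native print function prints only the Tensor handle, once, while the Graph is being built. To print the actual value of a Tensor, use tf.print: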

@tf.function
def print_element(items):
    for c in items:
        tf.print(c)
x = tf.constant([1, 5, 6, 8, 3])
print_element(x)

This is as far as we go with tf.function here; in practice there are more subtleties to be aware of. For details, see "TensorFlow 2.0: Functions, not Sessions".

References

TensorFlow official website
