@zqbinggong 2018-05-29T05:44:41.000000Z

tf.Learn

Notes from the book 《TensorFlow实战》

Distributed Estimator

Introduction to custom models

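tf.Learn contains a wide range of machine learning and deep learning classes.
It also accepts custom models. The model function may currently have any of the following signatures:
- (features, target) --> (predictions, loss, train_op)
- (features, target, mode) --> (predictions, loss, train_op)
- (features, target, mode, params) --> (predictions, loss, train_op)
- - mode identifies the stage in which the function is being used, such as training, evaluating, or prediction; these common modes are defined in ModeKeys
- - params holds the parameters that the custom model can tune; additional arguments can be supplied when calling fit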
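
To make the three signatures concrete, here is a framework-free sketch of how a caller could inspect a model function and pass only the arguments it declares. It does not use tf.contrib.learn: `call_model_fn`, the three toy model functions, and the string constants `TRAIN`, `EVAL`, and `INFER` are hypothetical stand-ins for illustration, and the strings returned in place of tensors are placeholders.

```python
import inspect

# Hypothetical stand-ins for the mode names; the real constants live in ModeKeys.
TRAIN, EVAL, INFER = "train", "eval", "infer"


def call_model_fn(model_fn, features, target, mode, params=None):
    """Call model_fn with only the optional arguments its signature declares."""
    accepted = inspect.signature(model_fn).parameters
    kwargs = {}
    if "mode" in accepted:
        kwargs["mode"] = mode
    if "params" in accepted:
        kwargs["params"] = params
    return model_fn(features, target, **kwargs)


def model_two_args(features, target):
    return "predictions", "loss", "train_op"


def model_with_mode(features, target, mode):
    # In this toy model, a training op is only produced while training.
    train_op = "train_op" if mode == TRAIN else None
    return "predictions", "loss", train_op


def model_with_params(features, target, mode, params):
    # params carries values the custom model can tune, e.g. a learning rate.
    return "predictions", "loss", "train_op(lr=%s)" % params["learning_rate"]


result_plain = call_model_fn(model_two_args, [1.0], [0], TRAIN)
result_eval = call_model_fn(model_with_mode, [1.0], [0], EVAL)
result_params = call_model_fn(
    model_with_params, [1.0], [0], TRAIN, {"learning_rate": 0.1})
```

The point of the sketch is only that the same caller can serve all three signatures; how the library itself does this is not shown in these notes.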

import tensorflow as tf
from tensorflow.contrib import layers
from tensorflow.contrib import learn
# sklearn.cross_validation was replaced by sklearn.model_selection
from sklearn import datasets, model_selection
def my_model(features, target):
    # one-hot encode the three iris classes
    target = tf.one_hot(target, 3, 1, 0)
    # stack fully connected layers with 10, 20, and 10 hidden units
    features = layers.stack(features, layers.fully_connected, [10,20,10])
    prediction, loss = learn.models.logistic_regression_zero_init(features, target)
    train_op = layers.optimize_loss(
        loss, tf.contrib.framework.get_global_step(), optimizer='Adagrad', learning_rate=0.1)
    # return the predicted class and the class probabilities, plus loss and train_op
    return {'class': tf.argmax(prediction, 1), 'prob': prediction}, loss, train_op
iris = datasets.load_iris()
x_train, x_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=35)
classifier = learn.Estimator(model_fn=my_model)
classifier.fit(x_train, y_train, steps=700)
predictions = classifier.predict(x_test)
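
The custom model above relies on two small transformations: one-hot encoding of the integer labels and an argmax over the class probabilities. The following pure-Python sketch mirrors what those two steps compute on plain lists. The helper names `one_hot` and `argmax_rows` are made up for this illustration and only imitate the behavior of the TensorFlow ops on small inputs.

```python
def one_hot(labels, depth, on_value=1, off_value=0):
    """Turn integer class labels into rows with on_value at the label index."""
    return [[on_value if i == label else off_value for i in range(depth)]
            for label in labels]


def argmax_rows(rows):
    """Return the index of the largest entry of each row (first one on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Three iris-style labels encoded over 3 classes.
encoded = one_hot([0, 2, 1], 3)

# Two rows of made-up class probabilities reduced to predicted classes.
classes = argmax_rows([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
```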

Building your own machine learning Estimator

The learn.Estimator class (inherits from BaseEstimator)

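Parameters:
- model_fn
- model_dir
- config : configuration object
- params : will be passed into 'model_fn'
- feature_engineering_fn
Basic methods:
- _get_train_op(features, labels): optimizes the model's parameters on each training iteration; to implement your own Estimator, override this method with your own logic
- _get_eval_ops(features, labels, metrics): lets subclasses of BaseEstimator evaluate each training iteration with custom metrics; many ready-to-use metrics can be found in contrib.metrics
- _get_predict_ops(features): implements custom prediction; the predicted values can be post-processed here, for example by converting probabilities into predicted labels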
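
The override pattern described for `_get_predict_ops` can be sketched without TensorFlow. In the sketch below, `SketchBaseEstimator` and `ThresholdEstimator` are hypothetical classes written for this note, not the tf.contrib.learn classes, and the "model" is a placeholder that treats each feature value as a probability.

```python
class SketchBaseEstimator:
    """Hypothetical stand-in for a base estimator with an overridable hook."""

    def _get_predict_ops(self, features):
        raise NotImplementedError("subclasses provide the prediction logic")

    def predict(self, features):
        # The public method delegates to the hook that subclasses override.
        return self._get_predict_ops(features)


class ThresholdEstimator(SketchBaseEstimator):
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def _raw_probabilities(self, features):
        # Placeholder model: each feature value is taken as P(class = 1).
        return list(features)

    def _get_predict_ops(self, features):
        # Post-process probabilities into hard 0/1 predictions.
        return [1 if p >= self.threshold else 0
                for p in self._raw_probabilities(features)]


labels = ThresholdEstimator(threshold=0.5).predict([0.1, 0.5, 0.9])
```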
TensorForestEstimator

Tuning RunConfig runtime parameters

RunConfig is a class in Learn that helps users tune runtime parameters, for example:
- num_cores : the number of cores to use
- num_ps_replicas : the number of parameter servers

Experiment and LearnRunner

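The former is an easy-to-use class for setting up a model experiment; it holds everything needed to build the model, such as the Estimator, the training data, and the evaluation metrics.
The latter is a module that makes running experiments more convenient.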

Deep learning Estimators

Deep neural networks

Taking DNNClassifier as an example:

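# fit and evaluate accept many other arguments that allow further custom logic
# first build the data in _input_fn; two feature columns (age and language) are then built with the layers module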
def _input_fn(num_epochs=None):
    features = {'age': tf.train.limit_epochs(tf.constant([[.8],[.2],[.1]]), num_epochs=num_epochs),
                # returns the tensor num_epochs times, then raises an OutOfRange error
                'language': tf.SparseTensor(values=['en', 'fr', 'zh'],
                                            indices=[[0, 0], [0, 1], [2, 0]], dense_shape=[3, 2])
                # dense.shape = dense_shape, dense[tuple(indices[i])] = values[i]
            }
    return features, tf.constant([[1], [0], [0]], dtype=tf.int32) # features and labels
language_column = tf.contrib.layers.sparse_column_with_hash_bucket(
    'language', hash_bucket_size=20)
feature_columns = [
    tf.contrib.layers.embedding_column(language_column, dimension=1),
    tf.contrib.layers.real_valued_column('age')
]
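# pass the feature columns, the hidden units of each layer, the number of classes, etc. to DNNClassifier to build the model
classifier = tf.contrib.learn.DNNClassifier(
    n_classes=2,
    feature_columns=feature_columns,  # feature_columns act like placeholders: they are fed by the return value of the input_fn passed to fit()
    # weight_column_name=  , for weighted data; the weights then also have to be added to feature_columns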
    hidden_units=[3, 3],
    config=tf.contrib.learn.RunConfig(tf_random_seed=1)
)
classifier.fit(input_fn=_input_fn, steps=100)
scores = classifier.evaluate(input_fn=_input_fn, steps=1)
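
Two ideas in the example above are easy to check by hand: how the `indices`, `values`, and `dense_shape` of the sparse 'language' feature map onto a dense grid, and how a hash-bucket column turns a string into a bucket id. The sketch below does both in pure Python. `sparse_to_dense` and `hash_bucket` are helper names made up for this note, and the MD5-based hash is an assumption chosen for illustration; TensorFlow uses its own hash function, so the actual bucket ids will differ.

```python
import hashlib


def sparse_to_dense(indices, values, dense_shape, default=""):
    """Fill a rows x cols grid so that dense[r][c] = values[i] for indices[i] = [r, c]."""
    rows, cols = dense_shape
    dense = [[default] * cols for _ in range(rows)]
    for (r, c), value in zip(indices, values):
        dense[r][c] = value
    return dense


def hash_bucket(value, hash_bucket_size):
    """Map a string to an id in [0, hash_bucket_size) with a stable hash (illustrative only)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


# Same indices, values, and shape as the 'language' feature above.
dense = sparse_to_dense([[0, 0], [0, 1], [2, 0]], ["en", "fr", "zh"], [3, 2])

# Bucket ids for the three languages with hash_bucket_size=20.
bucket_ids = [hash_bucket(v, 20) for v in ["en", "fr", "zh"]]
```

The first example has two languages, the second has none, and the third has one, which is why a sparse representation is a natural fit here.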

Wide and deep model

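This model combines a deep neural network with logistic regression. Google's research found that combining different features in two different ways better reflects what matters in the application and gives more effective recommendations, which is similar to an ensemble.
Compared with DNNClassifier and LinearClassifier, it exposes more parameters, and different feature columns can be routed to either the DNN part or the linear part. For example, the age column above can be given to the DNN and the language column to the linear model.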
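
As a rough illustration of how the two parts can be combined, the sketch below adds a linear ("wide") logit computed from the language feature to a one-hidden-layer ("deep") logit computed from the age feature and passes the sum through a sigmoid. It is a hand-written toy, not the library's implementation: all function names and weight values are assumptions invented for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def wide_logit(language, weights, bias):
    """Linear part: one learned weight per categorical value, plus a bias."""
    return weights[language] + bias


def deep_logit(age, hidden_weights, output_weights):
    """Deep part: one ReLU hidden layer over the numeric feature."""
    hidden = [max(0.0, w * age + b) for w, b in hidden_weights]
    return sum(h * w for h, w in zip(hidden, output_weights))


def wide_and_deep_probability(language, age, wide_params, deep_params):
    """Sum the two logits and squash the result into a probability."""
    logit = wide_logit(language, *wide_params) + deep_logit(age, *deep_params)
    return sigmoid(logit)


# Made-up parameters, chosen only so the arithmetic is easy to follow.
wide_params = ({"en": 0.4, "fr": -0.2, "zh": 0.0}, 0.1)
deep_params = ([(1.0, 0.0), (-1.0, 0.5)], [0.5, -0.5])

probability = wide_and_deep_probability("en", 0.8, wide_params, deep_params)
```

With these values the wide logit is 0.5 and the deep logit is 0.4, so the probability is the sigmoid of 0.9.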

Machine learning Estimators

DataFrame

Monitors

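Monitors provide logging and ways to supervise and control the training process, so that users can see clearly whether the model is training effectively.
TensorFlow has five log levels, in increasing order of severity: debug, info, warn, error, fatal. Calling tf.logging.set_verbosity(tf.logging.INFO) sets the level to INFO.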
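
The effect of a verbosity threshold can be seen with Python's standard logging module, used here as an analogy rather than as the tf.logging API itself. The logger name `verbosity_sketch` and the `ListHandler` class are made up for this example; note that the standard library names the top two levels WARNING and CRITICAL.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted records in a list instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelname, record.getMessage()))


logger = logging.getLogger("verbosity_sketch")
logger.propagate = False          # keep records away from the root logger
handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)     # analogous to setting the verbosity to INFO

logger.debug("dropped: below the INFO threshold")
logger.info("kept: training step finished")
logger.warning("kept: metric did not improve")
logger.error("kept: checkpoint could not be written")
logger.critical("kept: unrecoverable failure")

emitted_levels = [level for level, _ in handler.messages]
```

Only messages at or above the chosen level are recorded, so the debug message is filtered out.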
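High-level Monitor classes:
- CaptureVariable stores the values of a specified variable in a collection
- PrintTensor prints the value of a Tensor
- SummarySaver saves the protocol buffers needed for Summaries
- ValidationMonitor prints several evaluation metrics during training and monitors training so that it can be stopped early to prevent overfitting
The generated log and checkpoint files can also be visualized with TensorBoard.
Taking ValidationMonitor as an example: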

# load the data (np.int was an alias of the built-in int, so int is used directly)
iris_train = tf.contrib.learn.datasets.base.load_csv(filename='', target_dtype=int)
iris_test = tf.contrib.learn.datasets.base.load_csv(filename='', target_dtype=int)
# assumption: the four iris measurements are described by one real-valued feature column
feature_columns = [tf.contrib.layers.real_valued_column('', dimension=4)]
# define a metrics dict to evaluate the model
validation_metrics = {'accuracy': tf.contrib.metrics.streaming_accuracy,
                      'precision': tf.contrib.metrics.streaming_precision,
                      'recall': tf.contrib.metrics.streaming_recall}
# use the metrics dict to construct the validation_monitor
validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
    iris_test.data,
    iris_test.target, # data and targets used to evaluate the model
    every_n_steps=50, # run this monitor every 50 steps
    metrics=validation_metrics,
    early_stopping_metric='loss', # early stopping is based on the 'loss' metric
    early_stopping_metric_minimize=True, # True means the early-stopping metric should be minimized
    early_stopping_rounds=200
)
# next, construct a DNNClassifier with 3 hidden layers of 10, 15, and 10 units
# note that several monitors can be passed to fit, each watching a different aspect of training
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 15, 10],
    n_classes=3,
    model_dir='',
    config=tf.contrib.learn.RunConfig(save_checkpoints_secs=2)
)
classifier.fit(x=iris_train.data, y=iris_train.target, steps=10000, monitors=[validation_monitor])
accuracy_score = classifier.evaluate(x=iris_test.data, y=iris_test.target)['accuracy']
# 'accuracy' matches the key used in the validation_metrics dict
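
The early-stopping rule configured above can be sketched in a few lines of plain Python. The sketch assumes the rule is "stop once the monitored metric has not improved for more than early_stopping_rounds steps", which is how the parameters read; the function name `first_stopping_step` and the loss values are made up, and the library's exact bookkeeping may differ.

```python
def first_stopping_step(evaluations, early_stopping_rounds, minimize=True):
    """Return (stop_step, best_step, best_value); stop_step is None if never triggered.

    evaluations is a list of (step, metric_value) pairs, one per monitor run.
    """
    best_value, best_step = None, None
    for step, value in evaluations:
        if best_value is None:
            improved = True
        elif minimize:
            improved = value < best_value
        else:
            improved = value > best_value
        if improved:
            best_value, best_step = value, step
        elif step - best_step > early_stopping_rounds:
            # No improvement for more than early_stopping_rounds steps.
            return step, best_step, best_value
    return None, best_step, best_value


# Made-up validation losses, evaluated every 50 steps.
evaluations = [(50, 0.90), (100, 0.60), (150, 0.55), (200, 0.58),
               (250, 0.57), (300, 0.59), (350, 0.60), (400, 0.61)]

stop_step, best_step, best_value = first_stopping_step(
    evaluations, early_stopping_rounds=200)
```

In this run the best loss is reached at step 150, and training would be stopped at step 400, the first evaluation more than 200 steps later.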
