2.3-TensorFlow 2-Basic Tutorial-Estimator

HJZ11

Category: Deep Learning 3 - TensorFlow

Article link: https://blog.csdn.net/HJZ11/article/details/108715631


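1. Premade Estimators

An Estimator is TensorFlow's high-level representation of a complete model, designed for easy scaling and asynchronous training.
In TensorFlow 2.0, the Keras API can accomplish many of the same tasks and is considered an easier API to learn.
TensorFlow provides a collection of premade tf.estimator classes (for example, LinearRegressor) that implement common machine learning algorithms.

To write a TensorFlow program based on premade Estimators, you must complete the following tasks:

Create one or more input functions.
Define the model's feature columns.
Instantiate an Estimator, specifying the feature columns and various hyperparameters.
Call one or more methods on the Estimator object, passing an appropriate input function as the data source.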

import tensorflow as tf

def input_fn(features, labels, training=True, batch_size=256):
    """An input function for training or evaluating."""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat the data when in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)
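
To make the shuffle, repeat and batch steps of the input function more concrete, here is a minimal pure-Python sketch that imitates their behaviour on a small list. It is not tf.data; the helper names example_stream and make_batches are hypothetical and exist only for this illustration.

```python
import itertools
import random


def example_stream(rows, training, seed=0):
    """Yield rows once in order (evaluation) or shuffled and repeated forever (training)."""
    if not training:
        yield from rows
        return
    if not rows:
        return
    rng = random.Random(seed)
    while True:  # like repeat(): start a new epoch when the previous one ends
        epoch = list(rows)
        rng.shuffle(epoch)  # like shuffle(): a fresh order for every epoch
        yield from epoch


def make_batches(stream, batch_size):
    """Group consecutive examples into lists of at most batch_size items."""
    iterator = iter(stream)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


rows = list(range(6))

# Evaluation: one ordered pass, so the last batch may be smaller.
eval_batches = list(make_batches(example_stream(rows, training=False), batch_size=4))

# Training: the stream never ends, so we take only the first five batches.
train_batches = list(itertools.islice(
    make_batches(example_stream(rows, training=True), batch_size=4), 5))

print(eval_batches)
print([len(batch) for batch in train_batches])
```

Because batching happens after repeating, a training batch can span two epochs. This is also why an explicit step count is needed when training on a repeated dataset.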

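# Feature columns describe how to use the input.
# `train` is assumed to be the pandas DataFrame of training features; it is not defined in this article.
my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))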

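Instantiate an Estimator
The Iris problem is a classic classification problem. Fortunately, TensorFlow provides several premade classifier Estimators, including:

 1. tf.estimator.DNNClassifier, for deep models that perform multi-class classification
 2. tf.estimator.DNNLinearCombinedClassifier, for wide and deep models
 3. tf.estimator.LinearClassifier, for classifiers based on linear models

For the Iris problem, tf.estimator.DNNClassifier seems like the best choice. You can instantiate this Estimator as follows: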

# Build a deep neural network with two hidden layers of 30 and 10 nodes respectively.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30, 10],
    # The model must choose between three classes.
    n_classes=3)

# Train the model.
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)

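Evaluate the trained model
Now that the model has been trained, you can get some statistics on its performance. The following code block evaluates the accuracy of the trained model on the test data: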

eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))

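Make predictions (inference) with the trained model
We now have a trained model that produces good evaluation results. We can use it to predict the species of an Iris flower from some unlabeled measurements. As with training and evaluation, prediction takes a single function call: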

# Generate predictions from the model.
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

# Named predict_input_fn so that it does not shadow the training input_fn defined earlier.
def predict_input_fn(features, batch_size=256):
    """An input function for prediction."""
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

predictions = classifier.predict(
    input_fn=lambda: predict_input_fn(predict_x))

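The predict method returns a Python iterable that yields a dictionary of prediction results for each example. The following code prints a few predictions and their probabilities: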

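# Class names indexed by class ID; the order is assumed to match the label encoding of the data.
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * probability, expec))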
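
For intuition about how a class ID relates to the probabilities in a prediction dictionary, the following standalone sketch applies a softmax to a list of raw scores and picks the most likely class. The scores are made up for illustration, and the code does not call TensorFlow.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


SPECIES = ['Setosa', 'Versicolor', 'Virginica']
logits = [0.2, 2.5, 1.1]  # made-up scores for one flower

probabilities = softmax(logits)
# The predicted class is the index of the largest probability.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)

print('Prediction is "{}" ({:.1f}%)'.format(
    SPECIES[class_id], 100 * probabilities[class_id]))
```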

2. Linear Models

Build a linear model with Estimators
Titanic dataset
Full code walkthrough

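# clear_output is assumed to come from IPython (a notebook environment).
from IPython.display import clear_output

# feature_columns, train_input_fn and eval_input_fn are assumed to be defined
# beforehand; their definitions are not shown in this article.
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)

clear_output()
print(result)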

age_x_gender = tf.feature_column.crossed_column(['age', 'sex'], hash_bucket_size=100)

# After adding the combination feature to the model, train the model again:

derived_feature_columns = [age_x_gender]
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns+derived_feature_columns)
linear_est.train(train_input_fn)
result = linear_est.evaluate(eval_input_fn)

clear_output()
print(result)
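
The idea behind a crossed column with hash_bucket_size=100 can be sketched in plain Python: each combination of feature values is hashed and reduced to a bucket index, so the model can learn one weight per bucket. This is only an illustration. The helper crossed_bucket is hypothetical, and MD5 stands in for whatever hash function TensorFlow uses internally.

```python
import hashlib


def crossed_bucket(values, hash_bucket_size):
    """Map a combination of feature values to a bucket index in [0, hash_bucket_size)."""
    key = '_X_'.join(str(value) for value in values)
    digest = hashlib.md5(key.encode('utf-8')).hexdigest()
    return int(digest, 16) % hash_bucket_size


# A few made-up passengers.
passengers = [
    {'age': 22, 'sex': 'male'},
    {'age': 22, 'sex': 'female'},
    {'age': 38, 'sex': 'female'},
]

buckets = [crossed_bucket((p['age'], p['sex']), hash_bucket_size=100)
           for p in passengers]
print(buckets)
```

Different combinations can collide in the same bucket. A larger hash_bucket_size reduces collisions at the cost of more weights.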

3. Boosted Trees

Training a Boosted Trees model in TensorFlow

An end-to-end demonstration of training a gradient boosting model with decision trees using the tf.estimator API

Titanic dataset
Gradient Boosting Estimators can use both numeric and categorical features.

# Before training a Boosted Trees model, first train a linear classifier (a logistic regression model). It is best practice to start with a simpler model to establish a baseline.
linear_est = tf.estimator.LinearClassifier(feature_columns)

# Train the model.
linear_est.train(train_input_fn, max_steps=100)

# Evaluate.
result = linear_est.evaluate(eval_input_fn)
clear_output()
print(pd.Series(result))

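# Next, train a Boosted Trees model. Boosted Trees support both regression (BoostedTreesRegressor) and classification (BoostedTreesClassifier). Because the goal is to predict a survived-or-not label, you will use BoostedTreesClassifier.

# Since the data fits into memory, using the entire dataset per layer is faster.
# Above, one batch is defined as the entire dataset.
n_batches = 1
est = tf.estimator.BoostedTreesClassifier(feature_columns,
                                          n_batches_per_layer=n_batches)

# The model stops training once the specified number of trees is built,
# not based on the number of steps.
est.train(train_input_fn, max_steps=100)

# Evaluate.
result = est.evaluate(eval_input_fn)
clear_output()
print(pd.Series(result))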
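
As a rough illustration of what "boosting" means, the sketch below fits a sequence of one-split trees (stumps) to a tiny made-up regression dataset, where each stump is trained on the residuals left by the previous ones. It assumes squared loss and a single feature, and it is not the algorithm implemented by BoostedTreesClassifier. The function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, history


# Made-up data with a jump between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

base, stumps, mse_history = boost(xs, ys)
print(mse_history[0], mse_history[-1])
```

The training error shrinks as trees are added, which is why the number of trees, rather than the number of steps, is the natural stopping criterion.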

4. Understanding Boosted Trees Models

Gradient Boosted Trees: model understanding

Local interpretability

Output directional feature contributions (DFCs) to explain individual predictions:
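pred_dicts = list(est.experimental_predict_with_explanations(eval_input_fn))
df_dfc = pd.DataFrame([pred['dfc'] for pred in pred_dicts])  # per-example DFCs, used by the plots in this section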
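
A useful property of DFCs is that the contributions for one example, added to the bias, give that example's predicted probability. The sketch below shows this with a single hypothetical explanation. All numbers are made up, and the dictionary only mimics the 'bias' and 'dfc' entries of a prediction.

```python
# Hypothetical explanation for one passenger; all numbers are made up.
explanation = {
    'bias': 0.40,
    'dfc': {'sex': 0.20, 'age': -0.05, 'fare': 0.10, 'class': 0.02},
}

# The bias plus all contributions reconstructs the predicted probability.
probability = explanation['bias'] + sum(explanation['dfc'].values())

# Rank features by the magnitude of their contribution to this prediction.
ranked = sorted(explanation['dfc'].items(),
                key=lambda item: abs(item[1]), reverse=True)

for name, contribution in ranked:
    print('{:>5}: {:+.2f}'.format(name, contribution))
print('predicted probability: {:.2f}'.format(probability))
```

A positive contribution pushes the prediction towards survival and a negative one pushes it away, so the sign matters when reading a single explanation.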


Global feature importances

You may also want to understand the model as a whole rather than individual predictions. Next, you will compute and use:

Gain-based feature importances, using est.experimental_feature_importances
Permutation feature importances
Aggregate DFCs, using est.experimental_predict_with_explanations

1. Gain-based feature importances
TensorFlow's Boosted Trees estimators have a built-in function, est.experimental_feature_importances, for computing gain-based feature importances.

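import seaborn as sns
sns_colors = sns.color_palette('colorblind')  # assumed palette; any list of colors works

importances = est.experimental_feature_importances(normalize=True)
df_imp = pd.Series(importances)

# Visualize the importances.
N = 8
ax = (df_imp.iloc[0:N][::-1]
    .plot(kind='barh',
          color=sns_colors[0],
          title='Gain feature importances',
          figsize=(10, 6)))
ax.grid(False, axis='y')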
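
To illustrate what normalized gain-based importances look like, the sketch below divides made-up per-feature gains by their total so that the shares sum to 1, then sorts them. It assumes that normalization means dividing by the total gain, and it does not call TensorFlow.

```python
# Hypothetical total split gains per feature; the numbers are made up.
raw_gains = {'sex': 120.0, 'fare': 45.0, 'age': 60.0, 'class': 75.0}

total = sum(raw_gains.values())
normalized = {name: gain / total for name, gain in raw_gains.items()}

# Sort from most to least important, as one would before plotting.
ranking = sorted(normalized.items(), key=lambda item: item[1], reverse=True)

for name, share in ranking:
    print('{:>5}: {:.3f}'.format(name, share))
```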


Mean absolute DFCs
You can also average the absolute values of the DFCs to understand a feature's impact at a global level.

# Plot.
dfc_mean = df_dfc.abs().mean()
N = 8
sorted_ix = dfc_mean.abs().sort_values()[-N:].index  # Average and sort by absolute value.
ax = dfc_mean[sorted_ix].plot(kind='barh',
                       color=sns_colors[1],
                       title='Mean |directional feature contributions|',
                       figsize=(10, 6))
ax.grid(False, axis='y')
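
The reason for taking absolute values before averaging is that positive and negative contributions would otherwise cancel out. The following sketch, with made-up DFCs for three passengers, compares the signed mean with the mean of absolute values.

```python
# Hypothetical DFCs for three passengers; all numbers are made up.
dfc_rows = [
    {'sex': 0.20, 'age': -0.05, 'fare': 0.10},
    {'sex': -0.30, 'age': 0.02, 'fare': -0.04},
    {'sex': 0.25, 'age': -0.08, 'fare': 0.01},
]
feature_names = list(dfc_rows[0])
n = len(dfc_rows)

# Signed mean: opposite-signed contributions cancel each other.
mean_signed = {f: sum(row[f] for row in dfc_rows) / n for f in feature_names}
# Mean of absolute values: measures how strongly a feature moves predictions overall.
mean_abs = {f: sum(abs(row[f]) for row in dfc_rows) / n for f in feature_names}

for f in feature_names:
    print('{:>4}: signed {:+.3f}, absolute {:.3f}'.format(f, mean_signed[f], mean_abs[f]))
```

Here 'sex' has a small signed mean but the largest absolute mean, so the signed average alone would understate its influence.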


You can also see how DFCs vary as a feature's value varies.

FEATURE = 'fare'
feature = pd.Series(df_dfc[FEATURE].values, index=dfeval[FEATURE].values).sort_index()
# Pass x and y as keywords; recent seaborn releases no longer accept them positionally.
ax = sns.regplot(x=feature.index.values, y=feature.values, lowess=True)
ax.set_ylabel('contribution')
ax.set_xlabel(FEATURE)
ax.set_xlim(0, 100)
plt.show()


2. Permutation feature importances
def permutation_importances(est, X_eval, y_eval, metric, features):
    """
    For each column in turn, shuffle its values and observe the effect on the evaluation set.

    A similar approach can be applied during training; see the section on "Drop-column importance"
    in the article at http://explained.ai/rf-importance/index.html.
    """
    baseline = metric(est, X_eval, y_eval)
    imp = []
    for col in features:
        save = X_eval[col].copy()
        X_eval[col] = np.random.permutation(X_eval[col])
        m = metric(est, X_eval, y_eval)
        X_eval[col] = save
        imp.append(baseline - m)
    return np.array(imp)

def accuracy_metric(est, X, y):
    """Accuracy of a TensorFlow estimator."""
    eval_input_fn = make_input_fn(X,
                                  y=y,
                                  shuffle=False,
                                  n_epochs=1)
    return est.evaluate(input_fn=eval_input_fn)['accuracy']

features = CATEGORICAL_COLUMNS + NUMERIC_COLUMNS
importances = permutation_importances(est, dfeval, y_eval, accuracy_metric,
                                      features)
df_imp = pd.Series(importances, index=features)

sorted_ix = df_imp.abs().sort_values().index
ax = df_imp[sorted_ix][-5:].plot(kind='barh', color=sns_colors[2], figsize=(10, 6))
ax.grid(False, axis='y')
ax.set_title('Permutation feature importance')
plt.show()
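
The permutation procedure can be tried without TensorFlow. In the sketch below, a hypothetical rule-based toy_model stands in for a trained estimator: it uses the 'signal' column and ignores 'noise', so shuffling 'noise' should not change accuracy while shuffling 'signal' should reduce it. The data and the model are made up for illustration.

```python
import random


def toy_model(row):
    """Stand-in for a trained model: predicts 1 when 'signal' exceeds 0.5 and ignores 'noise'."""
    return 1 if row['signal'] > 0.5 else 0


def accuracy(rows, labels):
    return sum(toy_model(row) == y for row, y in zip(rows, labels)) / len(labels)


def permutation_importance(rows, labels, column, rng):
    """Accuracy drop after shuffling one column across the rows."""
    baseline = accuracy(rows, labels)
    shuffled_values = [row[column] for row in rows]
    rng.shuffle(shuffled_values)
    # Copy each row with the shuffled value so the original data stays untouched.
    permuted = [dict(row, **{column: value})
                for row, value in zip(rows, shuffled_values)]
    return baseline - accuracy(permuted, labels)


rng = random.Random(0)
rows = [{'signal': i % 2, 'noise': rng.random()} for i in range(40)]
labels = [row['signal'] for row in rows]

importances = {column: permutation_importance(rows, labels, column, rng)
               for column in ('signal', 'noise')}
print(importances)
```

Copying the rows instead of overwriting a column in place is a design choice for this sketch; it avoids having to restore the original values afterwards.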


The official site has some nice animated figures, but they could not be included here.

5. From a Keras Model to an Estimator Model

TensorFlow Estimators are supported in TensorFlow 2.x releases up to 2.15 (the tf.estimator API was removed in TensorFlow 2.16), and can be created from new and existing tf.keras models.

import tensorflow as tf
import tensorflow_datasets as tfds

# Build a simple, fully-connected network (i.e. a multi-layer perceptron):
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3)
])

# Compile the model and get a summary.

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer='adam')
model.summary()

# Create an input function.
def input_fn():
  split = tfds.Split.TRAIN
  dataset = tfds.load('iris', split=split, as_supervised=True)
  # The feature key must match the Keras model's input name (here 'dense_input').
  dataset = dataset.map(lambda features, labels: ({'dense_input':features}, labels))
  dataset = dataset.batch(32).repeat()
  return dataset

# Test out your input_fn.
for features_batch, labels_batch in input_fn().take(1):
  print(features_batch)
  print(labels_batch)

# Create an Estimator from the tf.keras model.
# A tf.keras.Model can be trained with the tf.estimator API by converting the model to a tf.estimator.Estimator object with tf.keras.estimator.model_to_estimator.
import tempfile
model_dir = tempfile.mkdtemp()
keras_estimator = tf.keras.estimator.model_to_estimator(
    keras_model=model, model_dir=model_dir)

# Train and evaluate the estimator.
keras_estimator.train(input_fn=input_fn, steps=500)
eval_result = keras_estimator.evaluate(input_fn=input_fn, steps=10)
print('Eval result: {}'.format(eval_result))
