[Getting Started 6] Building Input Functions with tf.estimator

dongfeig54321

Tags: artificial intelligence, python

Original link: http://www.cnblogs.com/yinghuali/p/7681917.html

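This tutorial introduces how to create input functions in tf.estimator. You'll get an overview of how to construct an input_fn to preprocess and feed data into your models. Then you'll implement an input_fn that feeds training, evaluation, and prediction data into a neural network regressor for predicting median house values.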

Custom Input Pipelines with input_fn

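The input_fn is used to pass feature and target data to the train, evaluate, and predict methods of the Estimator. You can do feature engineering or preprocessing inside the input_fn. Here's an example taken from the tf.estimator Quickstart tutorial: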

import numpy as np
import tensorflow as tf

# IRIS_TRAINING (path to the training CSV) and classifier (an Estimator)
# are defined earlier in the quickstart tutorial.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING, target_dtype=int, features_dtype=np.float32)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(training_set.data)},
    y=np.array(training_set.target),
    num_epochs=None,
    shuffle=True)

classifier.train(input_fn=train_input_fn, steps=2000)

Anatomy of an input_fn

The following code illustrates the basic skeleton for an input function:

def my_input_fn():

    # Preprocess your data here...

    # ...then return 1) a mapping of feature columns to Tensors with
    # the corresponding feature data, and 2) a Tensor containing labels
    return feature_cols, labels

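The body of the input function contains the specific logic for preprocessing your input data, such as scrubbing out bad examples or feature scaling.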

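Input functions must return the following two values, which contain the final feature and label data to be fed into your model (as shown in the code skeleton above):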

feature_cols

A dict containing key/value pairs that map feature column names to Tensors (or SparseTensors) containing the corresponding feature data.

labels

A Tensor containing your label (target) values: the values your model aims to predict.
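The return contract can be sketched without TensorFlow. In the minimal example below, plain Python lists stand in for Tensors, and the function name, its arguments, and the sample rows are all hypothetical; the preprocessing step simply drops examples with a missing value.

```python
def sketch_input_fn(rows, feature_names, label_name):
    """Hypothetical sketch: returns (feature_cols, labels) from row dicts.

    A real input_fn returns Tensors; lists stand in for them here.
    """
    needed = feature_names + [label_name]
    # Preprocessing: scrub out examples that have a missing value.
    clean = [r for r in rows if all(r[k] is not None for k in needed)]
    # 1) dict mapping feature column names to their data
    feature_cols = {k: [r[k] for r in clean] for k in feature_names}
    # 2) the label (target) values
    labels = [r[label_name] for r in clean]
    return feature_cols, labels


# Made-up rows for illustration only.
rows = [
    {"rm": 6.0, "age": 65.0, "medv": 24.0},
    {"rm": None, "age": 70.0, "medv": 21.0},  # dropped by the scrubbing step
    {"rm": 5.0, "age": 30.0, "medv": 20.0},
]
feature_cols, labels = sketch_input_fn(rows, ["rm", "age"], "medv")
```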

Converting Feature Data to Tensors

If your feature/label data is a Python array or is stored in pandas dataframes or numpy arrays, you can use the following methods to construct an input_fn:

import numpy as np
# numpy input_fn.
my_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(x_data)},
    y=np.array(y_data),
    ...)

import pandas as pd
# pandas input_fn.
my_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=pd.DataFrame({"x": x_data}),
    y=pd.Series(y_data),
    ...)

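For sparse, categorical data (data where the majority of values are 0), you'll instead want to populate a SparseTensor, which is instantiated with three arguments: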

dense_shape

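The shape of the tensor. Takes a list indicating the number of elements in each dimension. For example, dense_shape=[3,6] specifies a two-dimensional 3x6 tensor, dense_shape=[2,3,4] specifies a three-dimensional 2x3x4 tensor, and dense_shape=[9] specifies a one-dimensional tensor with 9 elements.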

indices

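The indices of the elements in your tensor that contain nonzero values. Takes a list of terms, where each term is itself a list containing the index of a nonzero element. (Elements are zero-indexed, i.e., [0,0] is the index of the element in the first column of the first row of a two-dimensional tensor.) For example, indices=[[1,3], [2,4]] specifies that the elements with indexes [1,3] and [2,4] have nonzero values.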

values

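A one-dimensional tensor of values. Term i in values corresponds to term i in indices and specifies its value. For example, given indices=[[1,3], [2,4]], the parameter values=[18, 3.6] specifies that element [1,3] of the tensor has a value of 18 and element [2,4] has a value of 3.6.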

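The following code defines a two-dimensional SparseTensor with 3 rows and 5 columns. The element with index [0,1] has a value of 6, and the element with index [2,4] has a value of 0.5 (all other values are 0):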

sparse_tensor = tf.SparseTensor(indices=[[0,1], [2,4]],
                                values=[6, 0.5],
                                dense_shape=[3, 5])

This corresponds to the following dense tensor:

[[0, 6, 0, 0, 0]
 [0, 0, 0, 0, 0]
 [0, 0, 0, 0, 0.5]]

For more on SparseTensor, see tf.SparseTensor.
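To see how indices, values, and dense_shape fit together, the expansion to a dense layout can be reproduced in plain Python. The helper name to_dense is hypothetical and handles only the two-dimensional case; it is a sketch of the idea, not how TensorFlow implements the conversion.

```python
def to_dense(indices, values, dense_shape):
    """Expand a 2-D sparse description into nested lists (illustration only)."""
    n_rows, n_cols = dense_shape
    # Start with all zeros, then write each listed value at its index.
    dense = [[0] * n_cols for _ in range(n_rows)]
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense


dense = to_dense(indices=[[0, 1], [2, 4]], values=[6, 0.5], dense_shape=[3, 5])
for row in dense:
    print(row)
# [0, 6, 0, 0, 0]
# [0, 0, 0, 0, 0]
# [0, 0, 0, 0, 0.5]
```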

Passing input_fn Data to Your Model

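To feed data to your model for training, you simply pass the input function you've created to your train operation as the value of the input_fn parameter, e.g.: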

classifier.train(input_fn=my_input_fn, steps=2000)

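Note that the input_fn parameter must receive a function object (i.e., input_fn=my_input_fn), not the return value of a function call (input_fn=my_input_fn()). This means that if you try to pass parameters to the input_fn in your train call, as in the following code, it will result in a TypeError: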

classifier.train(input_fn=my_input_fn(training_set), steps=2000)
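The reason is easier to see with a stand-in for the estimator. In the sketch below, fake_train is a hypothetical function that, like train, calls input_fn itself; passing an already-computed return value leaves it with nothing to call. The exact error message inside TensorFlow may differ.

```python
def fake_train(input_fn, steps):
    """Hypothetical stand-in for Estimator.train: it invokes input_fn itself."""
    features, labels = input_fn()
    return len(labels)


def my_input_fn():
    # Plain lists stand in for Tensors in this sketch.
    return {"x": [1.0, 2.0]}, [0, 1]


# Passing the function object works: fake_train calls it when needed.
n_examples = fake_train(input_fn=my_input_fn, steps=1)

# Passing the return value fails: a (dict, list) tuple is not callable.
try:
    fake_train(input_fn=my_input_fn(), steps=1)
    error = None
except TypeError as exc:
    error = exc
```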

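However, if you'd like to be able to parameterize your input function, there are other ways to do so. You can employ a wrapper function that takes no arguments as your input_fn and use it to invoke your input function with the desired parameters. For example: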

def my_input_fn(data_set):
  ...

def my_input_fn_training_set():
  return my_input_fn(training_set)

classifier.train(input_fn=my_input_fn_training_set, steps=2000)

Alternatively, you can use Python's functools.partial function to construct a new function object with all parameter values fixed:

import functools

classifier.train(
    input_fn=functools.partial(my_input_fn, data_set=training_set),
    steps=2000)

A third option is to wrap your input_fn invocation in a lambda and pass it to the input_fn parameter:

classifier.train(input_fn=lambda: my_input_fn(training_set), steps=2000)
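The three approaches are interchangeable: each produces a zero-argument callable that closes over the data set. The sketch below checks this with a hypothetical fake_train stand-in and made-up data, so it runs without TensorFlow.

```python
import functools


def fake_train(input_fn):
    """Hypothetical stand-in for Estimator.train: calls input_fn with no args."""
    return input_fn()


def my_input_fn(data_set):
    # Split (feature, label) pairs; lists stand in for Tensors.
    features = {"x": [row[0] for row in data_set]}
    labels = [row[1] for row in data_set]
    return features, labels


training_set = [(1.0, 0), (2.0, 1)]  # made-up data


def my_input_fn_training_set():
    return my_input_fn(training_set)


via_wrapper = fake_train(my_input_fn_training_set)
via_partial = fake_train(functools.partial(my_input_fn, data_set=training_set))
via_lambda = fake_train(lambda: my_input_fn(training_set))
```

Which one to use is mostly a matter of taste: the named wrapper reads best in stack traces, while partial and lambda avoid defining one wrapper per data set.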

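One big advantage of designing your input pipeline to accept a data set argument, as shown above, is that you can pass the same input_fn to the evaluate and predict operations by just changing the data set argument, e.g.: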

classifier.evaluate(input_fn=lambda: my_input_fn(test_set), steps=2000)

This approach improves code maintainability: there is no need to define a separate input_fn (e.g. input_fn_train, input_fn_test, input_fn_predict) for each type of operation.

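Finally, you can use the methods in tf.estimator.inputs to create an input_fn from numpy or pandas data sets. The additional benefit is that you can use more arguments, such as num_epochs and shuffle, to control how the input_fn iterates over the data: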

import pandas as pd

def get_input_fn_from_pandas(data_set, num_epochs=None, shuffle=True):
  return tf.estimator.inputs.pandas_input_fn(
      x=pd.DataFrame(...),
      y=pd.Series(...),
      num_epochs=num_epochs,
      shuffle=shuffle)

import numpy as np

def get_input_fn_from_numpy(data_set, num_epochs=None, shuffle=True):
  return tf.estimator.inputs.numpy_input_fn(
      x={...},
      y=np.array(...),
      num_epochs=num_epochs,
      shuffle=shuffle)

A Neural Network Model for Boston House Values

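In the remainder of this tutorial, you'll write an input function for preprocessing a subset of Boston housing data pulled from the UCI Housing Data Set, and use it to feed data to a neural network regressor for predicting median house values.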

The Boston CSV data sets you'll use to train your neural network contain the following feature data for Boston suburbs:

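Feature	Description
CRIM	Crime rate per capita
ZN	Fraction of residential land zoned to permit 25,000+ sq ft lots
INDUS	Fraction of land that is non-retail business
NOX	Concentration of nitric oxides in parts per 10 million
RM	Average rooms per dwelling
AGE	Fraction of owner-occupied residences built before 1940
DIS	Distance to Boston-area employment centers
TAX	Property tax rate per $10,000
PTRATIO	Student-teacher ratio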

The label your model will predict is MEDV, the median value of owner-occupied residences in thousands of dollars.

Setup

Download the following data sets: boston_train.csv, boston_test.csv, and boston_predict.csv.

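The following sections provide a step-by-step walkthrough of how to create an input function, feed these data sets into a neural network regressor, train and evaluate the model, and make house value predictions. The full, final code is listed at the end of this post.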

Importing the Housing Data

To start, set up your imports (including pandas and tensorflow) and set logging verbosity to INFO for more detailed log output:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import itertools

import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

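Define the column names for the data set in COLUMNS. To distinguish features from the label, also define FEATURES and LABEL. Then read the three CSVs (train, test, and predict) into pandas DataFrames: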

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
            "age", "dis", "tax", "ptratio"]
LABEL = "medv"

training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
                           skiprows=1, names=COLUMNS)
test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
                       skiprows=1, names=COLUMNS)
prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
                             skiprows=1, names=COLUMNS)
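What these read_csv arguments do can be mimicked with the standard library alone. The sketch below is an approximation under stated assumptions: the two-line sample file is made up (its header text and values are placeholders, not the real data), and next(reader) plays the role of skiprows=1.

```python
import csv
import io

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = COLUMNS[:-1]
LABEL = "medv"

# Made-up file contents: one header row to skip, one data row.
sample_csv = (
    "ignored,header,row\n"
    "0.1, 0.0, 5.0, 0.5, 6.0, 50.0, 4.0, 300.0, 15.0, 24.0\n"
)

reader = csv.reader(io.StringIO(sample_csv), skipinitialspace=True)
next(reader)  # skip the first row, like skiprows=1
# Name each value by position, like names=COLUMNS.
rows = [dict(zip(COLUMNS, map(float, line))) for line in reader]

# Separate the nine feature columns from the label column.
features = {k: [r[k] for r in rows] for k in FEATURES}
labels = [r[LABEL] for r in rows]
```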

Defining FeatureColumns and Creating the Regressor

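Next, create a list of FeatureColumns for the input data, which formally specify the set of features to use for training. Because all features in the housing data set contain continuous values, you can create their FeatureColumns using the tf.feature_column.numeric_column() function: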

feature_cols = [tf.feature_column.numeric_column(k) for k in FEATURES]

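NOTE: For a more in-depth overview of feature columns, see this introduction, and for an example that illustrates how to define FeatureColumns for categorical data, see the Linear Model Tutorial.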

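Now, instantiate a DNNRegressor for the neural network regression model. You'll need to provide two arguments here: hidden_units, a hyperparameter specifying the number of nodes in each hidden layer (here, two hidden layers with 10 nodes each), and feature_columns, containing the list of FeatureColumns you just defined. (The example also passes model_dir, the directory in which checkpoints are saved.)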

regressor = tf.estimator.DNNRegressor(feature_columns=feature_cols,
                                      hidden_units=[10, 10],
                                      model_dir="/tmp/boston_model")
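For a rough sense of model size, the trainable weights implied by hidden_units=[10, 10] can be counted by hand. The sketch below assumes plain fully connected layers with one bias per node, 9 input features, and a single output; count_parameters is a hypothetical helper, and the real DNNRegressor may create additional variables (for example, optimizer state).

```python
def count_parameters(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases of a fully connected network (assumed layout)."""
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    # Each layer has fan_in * fan_out weights plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


n_params = count_parameters(n_inputs=9, hidden_units=[10, 10])
print(n_params)  # 221 = (9*10 + 10) + (10*10 + 10) + (10*1 + 1)
```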

Building the input_fn

To pass input data into the regressor, write a factory method that accepts a pandas Dataframe and returns an input_fn:

def get_input_fn(data_set, num_epochs=None, shuffle=True):
  return tf.estimator.inputs.pandas_input_fn(
      x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
      y=pd.Series(data_set[LABEL].values),
      num_epochs=num_epochs,
      shuffle=shuffle)

Note that the input data is passed into input_fn in the data_set argument, which means the function can process any of the DataFrames you've imported: training_set, test_set, and prediction_set.

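Two additional arguments are provided:

num_epochs

Controls the number of epochs to iterate over the data. For training, set this to None, so the input_fn keeps returning data until the required number of train steps is reached. For evaluation and prediction, set this to 1, so the input_fn iterates over the data once and then raises OutOfRangeError. That error signals the Estimator to stop evaluating or predicting.

shuffle

Whether to shuffle the data. For evaluation and prediction, set this to False, so the input_fn iterates over the data sequentially. For training, set this to True.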
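The effect of these two arguments can be imitated with a plain Python generator. This is a hypothetical sketch, not TensorFlow's implementation: iterate_examples and its seed argument are made up for illustration, and the generator simply ending plays the role that OutOfRangeError plays for the Estimator.

```python
import itertools
import random


def iterate_examples(data, num_epochs=None, shuffle=True, seed=0):
    """Hypothetical generator mimicking the num_epochs / shuffle semantics."""
    rng = random.Random(seed)
    epoch = 0
    # num_epochs=None means "repeat forever".
    while num_epochs is None or epoch < num_epochs:
        order = list(data)
        if shuffle:
            rng.shuffle(order)
        for example in order:
            yield example
        epoch += 1


# Evaluate/predict style: one ordered pass, after which the generator ends.
eval_examples = list(iterate_examples([1, 2, 3], num_epochs=1, shuffle=False))

# Train style: an endless shuffled stream; the caller stops after 7 "steps".
train_examples = list(itertools.islice(iterate_examples([1, 2, 3]), 7))
```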

Training the Regressor

To train the neural network regressor, run train with the training_set passed to the input_fn as follows:

regressor.train(input_fn=get_input_fn(training_set), steps=5000)

You should see log output similar to the following, which reports training loss for every 100 steps:

INFO:tensorflow:Step 1: loss = 483.179
INFO:tensorflow:Step 101: loss = 81.2072
INFO:tensorflow:Step 201: loss = 72.4354
...
INFO:tensorflow:Step 1801: loss = 33.4454
INFO:tensorflow:Step 1901: loss = 32.3397
INFO:tensorflow:Step 2001: loss = 32.0053
INFO:tensorflow:Step 4801: loss = 27.2791
INFO:tensorflow:Step 4901: loss = 27.2251
INFO:tensorflow:Saving checkpoints for 5000 into /tmp/boston_model/model.ckpt.
INFO:tensorflow:Loss for final step: 27.1674.

Making Predictions

Finally, you can use the model to predict median house values for the prediction_set, which contains feature data but no labels for six examples:

y = regressor.predict(
    input_fn=get_input_fn(prediction_set, num_epochs=1, shuffle=False))
# .predict() returns an iterator of dicts; convert to a list and print
# predictions
predictions = list(p["predictions"] for p in itertools.islice(y, 6))
print("Predictions: {}".format(str(predictions)))

Your results should contain six house-value predictions in thousands of dollars, e.g.:

Predictions: [ 33.30348587  17.04452896  22.56370163  34.74345398  14.55953979
  19.58005714]
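Because predict returns a lazy iterator of dicts, itertools.islice is a convenient way to take only the first few results. The sketch below uses a hypothetical fake_predict generator with placeholder values in place of a trained regressor.

```python
import itertools


def fake_predict():
    """Hypothetical stand-in for regressor.predict: yields one dict per example."""
    for value in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]:  # placeholder values
        yield {"predictions": [value]}


# Take at most six results; the seventh is never requested from the iterator.
first_six = [p["predictions"] for p in itertools.islice(fake_predict(), 6)]
```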

The complete code is as follows:

#  Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
#  Licensed under the Apache License, Version 2.0 (the "License");
#  you may not use this file except in compliance with the License.
#  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.
"""DNNRegressor with custom input_fn for Housing dataset."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import itertools

import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
            "age", "dis", "tax", "ptratio"]
LABEL = "medv"


def get_input_fn(data_set, num_epochs=None, shuffle=True):
  return tf.estimator.inputs.pandas_input_fn(
      x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
      y=pd.Series(data_set[LABEL].values),
      num_epochs=num_epochs,
      shuffle=shuffle)


def main(unused_argv):
  # Load datasets
  training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
                             skiprows=1, names=COLUMNS)
  test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
                         skiprows=1, names=COLUMNS)

  # Set of 6 examples for which to predict median house values
  prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
                               skiprows=1, names=COLUMNS)

  # Feature cols
  feature_cols = [tf.feature_column.numeric_column(k) for k in FEATURES]

  # Build 2 layer fully connected DNN with 10, 10 units respectively.
  regressor = tf.estimator.DNNRegressor(feature_columns=feature_cols,
                                        hidden_units=[10, 10],
                                        model_dir="boston_model")

  # Train
  regressor.train(input_fn=get_input_fn(training_set), steps=5000)

  # Evaluate loss over one epoch of test_set.
  ev = regressor.evaluate(
      input_fn=get_input_fn(test_set, num_epochs=1, shuffle=False))
  loss_score = ev["loss"]
  print("Loss: {0:f}".format(loss_score))

  # Print out predictions over a slice of prediction_set.
  y = regressor.predict(
      input_fn=get_input_fn(prediction_set, num_epochs=1, shuffle=False))
  # .predict() returns an iterator of dicts; convert to a list and print
  # predictions
  predictions = list(p["predictions"] for p in itertools.islice(y, 6))
  print("Predictions: {}".format(str(predictions)))

if __name__ == "__main__":
  tf.app.run()

Reposted from: https://www.cnblogs.com/yinghuali/p/7681917.html
