第一阶段-入门详细图文讲解tensorflow1.4 -(八)tf.estimator构建数据预处理bostonHouse

本次训练的目的:模型预测的标签是MEDV,即波士顿自住住宅的中值,以千美元计。

Feature Description
CRIM 人均犯罪率
ZN 住宅用地面积分为25,000平方英尺以上
INDUS 非零售业务的土地部分
NOX 一氧化氮浓度每1000万份中
RM 每间住宅的平均房间数
AGE 在1940年之前建成的自住住房的比例
DIS 距离波士顿地区就业中心
TAX 每10,000美元的房产税税率
PTRATIO 师生比例

Download the following data sets: boston_train.csv, boston_test.csv, boston_predict.csv.
百度网盘地址:https://pan.baidu.com/s/1nuFeq9N

step1,导入房屋数据

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import itertools

import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

使用tf.logging.INFO打印一些细节。

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
            "age", "dis", "tax", "ptratio"]
LABEL = "medv"

training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
                           skiprows=1, names=COLUMNS)
test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
                       skiprows=1, names=COLUMNS)
prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
                             skiprows=1, names=COLUMNS)

设置一些COLUMNS 名,注意区分FEATURES ,LABEL 。使用panda的dataFrame读取CSV文件。

step2,定义特征列并创建一个回归算法

feature_cols = [tf.feature_column.numeric_column(k) for k in FEATURES]

注意:包含所有输入的特征。更多特征选择请看,the Doucement

regressor = tf.estimator.DNNRegressor(feature_columns=feature_cols,
                                      hidden_units=[10, 10],
                                      model_dir="/tmp/boston_model")

第一个深度神经网络回归,包含两个隐藏层分别是10个节点。

step3,构建一个输入的预处理函数

def get_input_fn(data_set, num_epochs=None, shuffle=True):
  return tf.estimator.inputs.pandas_input_fn(
      x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
      y = pd.Series(data_set[LABEL].values),
      num_epochs=num_epochs,
      shuffle=shuffle)

上述是一个工程方法,用于将dataFrame输入到get_input_fn函数,return的结果直接给回归器使用。当然这个工程方法可以处理train ,test ,predict的数据。

step4,训练神经网络回归器

regressor.train(input_fn=get_input_fn(training_set), steps=5000)

每一百步输出损失值。

INFO:tensorflow:Step 1: loss = 483.179
INFO:tensorflow:Step 101: loss = 81.2072
INFO:tensorflow:Step 201: loss = 72.4354
...
INFO:tensorflow:Step 1801: loss = 33.4454
INFO:tensorflow:Step 1901: loss = 32.3397
INFO:tensorflow:Step 2001: loss = 32.0053
INFO:tensorflow:Step 4801: loss = 27.2791
INFO:tensorflow:Step 4901: loss = 27.2251
INFO:tensorflow:Saving checkpoints for 5000 into /tmp/boston_model/model.ckpt.
INFO:tensorflow:Loss for final step: 27.1674.

step5,评估模型

ev = regressor.evaluate(
    input_fn=get_input_fn(test_set, num_epochs=1, shuffle=False))
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))

获得结果:

INFO:tensorflow:Eval steps [0,1) for training step 5000.
INFO:tensorflow:Saving evaluation summary for 5000 step: loss = 11.9221
Loss: 11.922098

step6,做预测

y = regressor.predict(
    input_fn=get_input_fn(prediction_set, num_epochs=1, shuffle=False))
# .predict() returns an iterator of dicts; convert to a list and print
# predictions
predictions = list(p["predictions"] for p in itertools.islice(y, 6))
print("Predictions: {}".format(str(predictions)))

结果应该包含6个房子房价的中值的结果:

Predictions: [ 33.30348587  17.04452896  22.56370163  34.74345398  14.55953979
  19.58005714]

ImportError: No module named ‘pandas’
使用pip命令安装一下依赖

pip install --user  pandas 

合并上述代码:

# -*- coding: utf-8 -*-
"""
Created on Wed Dec 20 11:14:00 2017

@author: suncl
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import itertools

import pandas as pd
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
           "dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
            "age", "dis", "tax", "ptratio"]
LABEL = "medv"


def input_fn(data_set):
  feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
  labels = tf.constant(data_set[LABEL].values)
  return feature_cols, labels


def main(unused_argv):
  # Load datasets
  training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
                             skiprows=1, names=COLUMNS)
  test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
                         skiprows=1, names=COLUMNS)

  # Set of 6 examples for which to predict median house values
  prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
                               skiprows=1, names=COLUMNS)

  # Feature cols
  feature_cols = [tf.contrib.layers.real_valued_column(k)
                  for k in FEATURES]

  # Build 2 layer fully connected DNN with 10, 10 units respectively.
  regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols,
                                            hidden_units=[10, 10],
                                            model_dir="/tmp/boston_model")

  # Fit
  regressor.fit(input_fn=lambda: input_fn(training_set), steps=5000)

  # Score accuracy
  ev = regressor.evaluate(input_fn=lambda: input_fn(test_set), steps=1)
  loss_score = ev["loss"]
  print("Loss: {0:f}".format(loss_score))

  # Print out predictions
  y = regressor.predict(input_fn=lambda: input_fn(prediction_set))
  # .predict() returns an iterator; convert to a list and print predictions
  predictions = list(itertools.islice(y, 6))
  print("Predictions: {}".format(str(predictions)))

if __name__ == "__main__":
  tf.app.run()

结果:

INFO:tensorflow:Starting evaluation at 2017-12-20-03:21:21
INFO:tensorflow:Restoring parameters from /tmp/boston_model\model.ckpt-5000
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2017-12-20-03:21:21
INFO:tensorflow:Saving dict for global step 5000: global_step = 5000, loss = 14.032
Loss: 14.031997

Predictions: [34.09457, 19.621197, 22.47575, 35.776104, 15.688363, 19.804939]

本blog结束。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值