The goal of this exercise: the label the model will predict is MEDV, the median value of owner-occupied homes in Boston, in thousands of dollars. The input features are:
| Feature | Description |
| --- | --- |
| CRIM | Per capita crime rate |
| ZN | Proportion of residential land zoned for lots over 25,000 square feet |
| INDUS | Proportion of non-retail business acres |
| NOX | Nitric oxide concentration (parts per 10 million) |
| RM | Average number of rooms per dwelling |
| AGE | Proportion of owner-occupied units built before 1940 |
| DIS | Weighted distances to Boston employment centers |
| TAX | Full-value property-tax rate per $10,000 |
| PTRATIO | Pupil-teacher ratio |
Download the following data sets: boston_train.csv, boston_test.csv, boston_predict.csv.
Baidu Netdisk download link: https://pan.baidu.com/s/1nuFeq9N
Step 1: Import the housing data
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import pandas as pd
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.INFO)
Setting the verbosity to tf.logging.INFO makes TensorFlow log more detail, such as the training loss, while the model runs.
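If the INFO-level progress messages are too chatty, other verbosity levels work the same way; for example (a minimal variation, not required for this tutorial):

# Only show warnings and errors instead of per-step progress messages.
tf.logging.set_verbosity(tf.logging.WARN)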
COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
"dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
"age", "dis", "tax", "ptratio"]
LABEL = "medv"
training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
skiprows=1, names=COLUMNS)
test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
skiprows=1, names=COLUMNS)
prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
skiprows=1, names=COLUMNS)
Define the COLUMNS names, taking care to distinguish FEATURES from LABEL, then read the CSV files into pandas DataFrames.
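Before going further, a quick sanity check that the CSV files were parsed as expected can save debugging time later. A minimal sketch, assuming the three files sit in the working directory:

# Inspect the parsed data: shapes, first rows, and the label distribution.
print(training_set.shape, test_set.shape, prediction_set.shape)
print(training_set.head())
print(training_set[LABEL].describe())  # MEDV statistics, in thousands of dollars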
Step 2: Define the feature columns and create the regressor
feature_cols = [tf.feature_column.numeric_column(k) for k in FEATURES]
Note: every input feature is included here. For more ways to define feature columns, see the documentation.
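Since every column in this data set is numeric, numeric_column is all we need. Purely for illustration, other column types described in the documentation are built in the same style; the boundaries below are made up, not tuned:

# Example only: feed "rm" as a bucketized column instead of a raw value.
rm_numeric = tf.feature_column.numeric_column("rm")
rm_buckets = tf.feature_column.bucketized_column(
    rm_numeric, boundaries=[4.0, 5.0, 6.0, 7.0, 8.0])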
regressor = tf.estimator.DNNRegressor(feature_columns=feature_cols,
                                      hidden_units=[10, 10],
                                      model_dir="/tmp/boston_model")
This builds a deep neural network regressor with two hidden layers of 10 units each, saving its checkpoints under /tmp/boston_model.
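The constructor exposes a few more knobs if you want to experiment. The following is a hedged sketch with illustrative, untuned values, written to a separate model_dir so it does not overwrite the checkpoints above:

# Example only: a deeper network with dropout and an explicit optimizer.
deeper_regressor = tf.estimator.DNNRegressor(
    feature_columns=feature_cols,
    hidden_units=[64, 32, 16],
    dropout=0.1,
    optimizer=tf.train.AdamOptimizer(learning_rate=0.001),
    model_dir="/tmp/boston_model_deeper")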
Step 3: Build the input function
def get_input_fn(data_set, num_epochs=None, shuffle=True):
  return tf.estimator.inputs.pandas_input_fn(
      x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
      y=pd.Series(data_set[LABEL].values),
      num_epochs=num_epochs,
      shuffle=shuffle)
get_input_fn above is a factory method: given a DataFrame, it returns an input function that can be passed straight to the regressor. The same helper handles the training, test, and prediction data.
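Concretely, the three phases only differ in the arguments passed to the same helper; a minimal usage sketch:

# Training: cycle through the data indefinitely (num_epochs=None) and shuffle.
train_input_fn = get_input_fn(training_set)
# Evaluation and prediction: a single pass over the data, in fixed order.
eval_input_fn = get_input_fn(test_set, num_epochs=1, shuffle=False)
predict_input_fn = get_input_fn(prediction_set, num_epochs=1, shuffle=False)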
Step 4: Train the neural network regressor
regressor.train(input_fn=get_input_fn(training_set), steps=5000)
The loss is logged every 100 steps:
INFO:tensorflow:Step 1: loss = 483.179
INFO:tensorflow:Step 101: loss = 81.2072
INFO:tensorflow:Step 201: loss = 72.4354
...
INFO:tensorflow:Step 1801: loss = 33.4454
INFO:tensorflow:Step 1901: loss = 32.3397
INFO:tensorflow:Step 2001: loss = 32.0053
...
INFO:tensorflow:Step 4801: loss = 27.2791
INFO:tensorflow:Step 4901: loss = 27.2251
INFO:tensorflow:Saving checkpoints for 5000 into /tmp/boston_model/model.ckpt.
INFO:tensorflow:Loss for final step: 27.1674.
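Because the estimator saves checkpoints to model_dir as it goes, training can optionally be continued later; calling train again should resume from the latest checkpoint in /tmp/boston_model rather than starting from scratch. A minimal sketch:

# Optional: continue training from the checkpoint saved at step 5000.
regressor.train(input_fn=get_input_fn(training_set), steps=1000)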
Step 5: Evaluate the model
ev = regressor.evaluate(
    input_fn=get_input_fn(test_set, num_epochs=1, shuffle=False))
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))
This produces:
INFO:tensorflow:Eval steps [0,1) for training step 5000.
INFO:tensorflow:Saving evaluation summary for 5000 step: loss = 11.9221
Loss: 11.922098
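The same evaluation can be cross-checked by computing the root-mean-square error on the test set directly from the model's predictions (regressor.predict is covered in the next step). A minimal sketch, reusing get_input_fn and the standard library's math module:

import math

# Predict on the test set in a fixed order and compare with the true labels.
test_preds = regressor.predict(
    input_fn=get_input_fn(test_set, num_epochs=1, shuffle=False))
predicted = [float(p["predictions"][0]) for p in test_preds]
actual = test_set[LABEL].tolist()
mse = sum((a - b) ** 2 for a, b in zip(actual, predicted)) / len(actual)
print("Test RMSE (thousands of dollars): {0:f}".format(math.sqrt(mse)))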
Step 6: Make predictions
y = regressor.predict(
    input_fn=get_input_fn(prediction_set, num_epochs=1, shuffle=False))
# .predict() returns an iterator of dicts; convert to a list and print
# predictions
predictions = list(p["predictions"] for p in itertools.islice(y, 6))
print("Predictions: {}".format(str(predictions)))
The output should contain the predicted median values for the six houses:
Predictions: [ 33.30348587 17.04452896 22.56370163 34.74345398 14.55953979
19.58005714]
If running the code raises ImportError: No module named 'pandas', install the missing dependency with pip:
pip install --user pandas
Putting the code above together into one script (note that this combined version uses the older tf.contrib.learn and tf.contrib.layers APIs rather than tf.estimator, but the workflow is the same):
# -*- coding: utf-8 -*-
"""
Created on Wed Dec 20 11:14:00 2017
@author: suncl
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
import pandas as pd
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.INFO)
COLUMNS = ["crim", "zn", "indus", "nox", "rm", "age",
"dis", "tax", "ptratio", "medv"]
FEATURES = ["crim", "zn", "indus", "nox", "rm",
"age", "dis", "tax", "ptratio"]
LABEL = "medv"
def input_fn(data_set):
feature_cols = {k: tf.constant(data_set[k].values) for k in FEATURES}
labels = tf.constant(data_set[LABEL].values)
return feature_cols, labels
def main(unused_argv):
# Load datasets
training_set = pd.read_csv("boston_train.csv", skipinitialspace=True,
skiprows=1, names=COLUMNS)
test_set = pd.read_csv("boston_test.csv", skipinitialspace=True,
skiprows=1, names=COLUMNS)
# Set of 6 examples for which to predict median house values
prediction_set = pd.read_csv("boston_predict.csv", skipinitialspace=True,
skiprows=1, names=COLUMNS)
# Feature cols
feature_cols = [tf.contrib.layers.real_valued_column(k)
for k in FEATURES]
# Build 2 layer fully connected DNN with 10, 10 units respectively.
regressor = tf.contrib.learn.DNNRegressor(feature_columns=feature_cols,
hidden_units=[10, 10],
model_dir="/tmp/boston_model")
# Fit
regressor.fit(input_fn=lambda: input_fn(training_set), steps=5000)
# Score accuracy
ev = regressor.evaluate(input_fn=lambda: input_fn(test_set), steps=1)
loss_score = ev["loss"]
print("Loss: {0:f}".format(loss_score))
# Print out predictions
y = regressor.predict(input_fn=lambda: input_fn(prediction_set))
# .predict() returns an iterator; convert to a list and print predictions
predictions = list(itertools.islice(y, 6))
print("Predictions: {}".format(str(predictions)))
if __name__ == "__main__":
tf.app.run()
The result:
INFO:tensorflow:Starting evaluation at 2017-12-20-03:21:21
INFO:tensorflow:Restoring parameters from /tmp/boston_model\model.ckpt-5000
INFO:tensorflow:Evaluation [1/1]
INFO:tensorflow:Finished evaluation at 2017-12-20-03:21:21
INFO:tensorflow:Saving dict for global step 5000: global_step = 5000, loss = 14.032
Loss: 14.031997
Predictions: [34.09457, 19.621197, 22.47575, 35.776104, 15.688363, 19.804939]
That concludes this blog post.