Summary: notes on the TensorFlow Estimator example code.
The example consists of premade_estimator.py and iris_data.py.
iris_data.py reads the training data and test data and defines the data format the estimator uses.
-----------------------------------------------------------------------------------------------------------
1. Modifications to iris_data.py
iris_data.py downloads the training set and the test set from:
http://download.tensorflow.org/data/iris_training.csv
http://download.tensorflow.org/data/iris_test.csv
In practice, however, these downloads fail. The same two files are available here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_training.csv
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_test.csv
Download them, save them as xlsx, and change the download/read part of iris_data.py to:
import pandas

# Column names as defined near the top of the original iris_data.py
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

def load_data(y_name='Species'):
    # x: features, y: labels
    train = pandas.read_excel('iris_training.xlsx', names=CSV_COLUMN_NAMES, header=0)
    train_features, train_labels = train, train.pop(y_name)
    test = pandas.read_excel('iris_test.xlsx', names=CSV_COLUMN_NAMES, header=0)
    test_features, test_labels = test, test.pop(y_name)
    return (train_features, train_labels), (test_features, test_labels)
In other words, the original download helper (maybe_download in the original iris_data.py) can be deleted, and load_data is rewritten to use read_excel. Different versions of the example name the returned variables differently (some use train_features / train_labels, others train_x / train_y); renaming them consistently to features and labels makes the code easier to read.
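A quick sanity check of the rewritten load_data (a minimal sketch; the expected output assumes the standard iris_training/iris_test files with 120 training rows and labels encoded as 0/1/2):
import iris_data

(train_features, train_labels), (test_features, test_labels) = iris_data.load_data()
print(train_features.shape)    # expected (120, 4): four feature columns, Species popped into labels
print(train_labels.unique())   # labels are a pandas Series of integer class ids: [0 1 2]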
------------------------------------------------------------------------------------------------------------
2. premade_estimator.py
Import the tensorflow and iris_data modules:
import tensorflow as tf
import iris_data
Fetch the training and test data from iris_data:
# Fetch the data
(train_features, train_labels), (test_features, test_labels) = iris_data.load_data()
-------------------------------------------------------------------------------------------------------------
Build a tf.feature_column entry for each column of train_features:
my_feature_columns = []
for key in train_features.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
Where:
tf.feature_column
# tools for ingesting and representing features
tf.feature_column.numeric_column(...)
# represents real-valued or numerical features
Each key in train_features is added to tf.feature_column as a numeric column.
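For the four iris features this loop produces four numeric columns. A minimal sketch of inspecting the result (the column names are assumed to come from CSV_COLUMN_NAMES above):
# my_feature_columns now holds one numeric_column per feature
for column in my_feature_columns:
    print(column.key)   # e.g. SepalLength, SepalWidth, PetalLength, PetalWidth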
-------------------------------------------------------------------------------------------------------------
Instantiate an estimator:
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3)
Where:
tf.estimator.DNNClassifier
# a classifier for TensorFlow DNN models
feature_columns
# the feature columns that feed the model its input
hidden_units = [m, n]
# the length of hidden_units defines the number of hidden layers;
# m and n define the number of nodes in each layer
n_classes
# the number of label classes to predict
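As a hypothetical variation (not from the original code) to illustrate how hidden_units works: a three-element list builds three hidden layers, one per element.
# Hypothetical: three hidden layers with 30, 20 and 10 nodes respectively
deeper_classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[30, 20, 10],   # list length 3 -> three hidden layers
    n_classes=3)                 # still three iris species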
-------------------------------------------------------------------------------------------------------------
Train the model
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_features, train_labels, args.batch_size),
    steps=args.train_steps)
train_input_fn is the function defined in iris_data:
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset
Breaking down the pieces used in train_input_fn:
tf.data.Dataset
# A Dataset represents an input pipeline as a collection of elements (nested structures
# of tensors) and a "logical plan" of transformations that act on those elements.
# It is a high-level TensorFlow API for reading data and converting it into the format train() needs.
tf.data.Dataset.from_tensor_slices
# Creates a Dataset whose elements are slices of the given tensors.
dataset.shuffle
# Randomly shuffles the elements of this dataset; randomly ordered training examples generally train better.
# tf.data.Dataset.shuffle is the function that randomizes the examples.
dataset.repeat
# Repeats this dataset count times
dataset.batch
# Combines consecutive elements of this dataset into batches
(dict(features), labels)  # the features (a dict) and labels (a pandas Series) combined into a tuple
The first argument of DNNClassifier.train, input_fn, must be a function that provides input data for training as minibatches, and that function must return a tf.data.Dataset object or a tuple.
Note the lambda expression used when passing input_fn: a lambda expression is a one-line function, also known in other languages as an anonymous function. When a function is only needed in one place, a lambda is a convenient substitute and behaves exactly like an ordinary function.
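A minimal sketch of why the lambda is needed: train expects input_fn to be a callable that takes no arguments, so the lambda captures the data and batch size and defers the actual call. An equivalent named function (my_train_input_fn is a hypothetical name) would be:
# Equivalent to the lambda above
def my_train_input_fn():
    return iris_data.train_input_fn(train_features, train_labels, args.batch_size)

classifier.train(input_fn=my_train_input_fn, steps=args.train_steps)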
-------------------------------------------------------------------------------------------------------------
Evaluate the model
To evaluate the model's effectiveness, every estimator provides an evaluate method:
eval_result = classifier.evaluate(
    input_fn=lambda: eval_input_fn(test_features, test_labels, args.batch_size))
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
Note that evaluating the model uses the test data set. classifier.evaluate is called the same way as train; eval_result is a dict, and its 'accuracy' entry is what the print statement formats. eval_input_fn is defined as:
def eval_input_fn(features, labels, batch_size):
    features = dict(features)
    if labels is None:
        # No labels, use only features.
        inputs = features
    else:
        inputs = (features, labels)

    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    # Batch the examples
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)

    # Return the dataset.
    return dataset
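A short sketch of the two call patterns the if/else branch supports (the batch size here is chosen arbitrarily; the labels=None form is what a prediction call would use):
# With labels: dataset elements are (features_dict, label) pairs, as used by evaluate
eval_ds = eval_input_fn(test_features, test_labels, batch_size=32)

# Without labels: dataset elements are feature dicts only
predict_ds = eval_input_fn(test_features, labels=None, batch_size=32)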
Note:
# The assert statement exists in almost every programming language. When you write
assert condition
# you are telling the program to test that condition and trigger an error if it is false.
# In Python it is roughly equivalent to:
if not condition:
    raise AssertionError()
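A tiny concrete example of the assert behaviour (values are illustrative only):
x = 5
assert x > 0                    # condition is true: nothing happens
assert x > 10, "x too small"    # condition is false: raises AssertionError: x too small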
------------------------------------------------------------------------------------------------------------
3. Summary
How to build an estimator
How to evaluate an estimator
How to prepare the data an estimator is built with