These are my own notes on the official documentation.
To write a program based on TensorFlow's premade Estimators, follow these steps:
1. Create one or more input functions.
2. Define the model's feature columns (feature_column).
3. Instantiate the Estimator, passing in the feature columns and any hyperparameters.
4. Call one or more methods on the Estimator object, passing the appropriate input function as the data source.
(1) An input function is a function that returns a tf.data.Dataset, and the returned dataset yields a two-element tuple:
features: a Python dict whose keys are feature names and whose values are arrays holding that feature's value for every example.
label: an array holding the label value of every example.
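A minimal sketch of such an input function (the feature values and labels below are made-up toy data, not the real iris set):

import tensorflow as tf

def input_fn():
    # Hypothetical toy data: two feature columns, three examples.
    features = {'SepalLength': [5.1, 4.9, 6.3],
                'SepalWidth':  [3.5, 3.0, 2.9]}
    labels = [0, 0, 2]
    # from_tensor_slices turns the (features, labels) pair into a Dataset
    # that yields one (feature-dict, label) example at a time.
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)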
(2) feature_column describes how the model should use the raw input data. When creating the Estimator, you pass in feature_columns to tell the model which features it will receive. In this example the input consists of 4 numeric values, so we create feature columns telling the Estimator model to represent each feature as a 32-bit float (the default dtype of tf.feature_column.numeric_column).
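That step looks like this (column names taken from the full program below):

import tensorflow as tf

# One numeric_column per raw input feature; the dtype defaults to tf.float32.
feature_columns = [
    tf.feature_column.numeric_column(key=key)
    for key in ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
]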
(3) Instantiate the Estimator.
Iris is a classic classification problem, and TensorFlow provides ready-made models for it; this example uses tf.estimator.DNNClassifier.
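For instance, with the same hyperparameters as the full program below (two hidden layers of 10 units, 3 output classes):

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,  # the columns from step (2)
    hidden_units=[10, 10],            # two hidden layers, 10 units each
    n_classes=3)                      # Setosa, Versicolor, Virginica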
(4) Train, evaluate, and test.
Call the model's train() method to fit it, then evaluate() to measure it on the test set. Each method takes an input function; wrapping it in a lambda binds the data and batch size while still giving the Estimator a zero-argument callable to invoke.
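In sketch form (batch_size and train_steps stand in for the parsed command-line arguments in the full program):

classifier.train(input_fn=lambda: trainInFunc(Train_X, Train_Y, batch_size),
                 steps=train_steps)
# evaluate() returns a dict of metrics, e.g. eval_result['accuracy'].
eval_result = classifier.evaluate(
    input_fn=lambda: testInFunc(Test_X, Test_Y, batch_size))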
Code:
import argparse

import pandas as pd
import tensorflow as tf

TRAIN_URL = "http://download.tensorflow.org/data/iris_training.csv"
TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']


def maybeDownload():
    # Download the CSV files (if not already cached) and return local paths.
    pathTrain = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)
    pathTest = tf.keras.utils.get_file(TEST_URL.split('/')[-1], TEST_URL)
    return pathTrain, pathTest


def loadData(label_name='Species'):
    pathTrain, pathTest = maybeDownload()
    Train = pd.read_csv(pathTrain, names=CSV_COLUMN_NAMES, header=0)
    Test = pd.read_csv(pathTest, names=CSV_COLUMN_NAMES, header=0)
    # pop() removes the label column, leaving only the features.
    Train_X, Train_Y = Train, Train.pop(label_name)
    Test_X, Test_Y = Test, Test.pop(label_name)
    return (Train_X, Train_Y), (Test_X, Test_Y)


def trainInFunc(features, labels, batchsize):
    # Input function for training: shuffle, repeat indefinitely, and batch.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(1000).repeat().batch(batchsize)
    return dataset


def testInFunc(features, labels, batchsize):
    # Input function for evaluation/prediction: no shuffle, no repeat.
    features = dict(features)
    if labels is None:  # prediction: features only
        inputs = features
    else:               # evaluation: (features, labels) pairs
        inputs = (features, labels)
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    assert batchsize is not None, "batch_size must not be None"
    dataset = dataset.batch(batchsize)
    return dataset


parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', default=100, type=int, help='batch size')
parser.add_argument('--train_steps', default=1000, type=int,
                    help='number of training steps')


def main(argv):
    arg = parser.parse_args(argv[1:])
    (Train_X, Train_Y), (Test_X, Test_Y) = loadData()
    # One numeric feature column (float32 by default) per input feature.
    feature_column = []
    for key in Test_X.keys():
        feature_column.append(tf.feature_column.numeric_column(key=key))
    classifier = tf.estimator.DNNClassifier(hidden_units=[10, 10],
                                            feature_columns=feature_column,
                                            n_classes=3)
    # Use the parsed command-line arguments rather than hard-coded values.
    classifier.train(lambda: trainInFunc(Train_X, Train_Y, arg.batch_size),
                     steps=arg.train_steps)
    # evaluate() returns a dict of metrics, not a single accuracy number.
    eval_result = classifier.evaluate(
        lambda: testInFunc(Test_X, Test_Y, arg.batch_size))
    print(eval_result)


if __name__ == '__main__':
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run(main)
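The script can be run from the command line, e.g. (the file name here is just an example):

python iris_premade_estimator.py --batch_size 100 --train_steps 1000

What gets printed is the metrics dict returned by evaluate(), which includes keys such as 'accuracy' and 'loss'.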