Summary: notes on the TensorFlow Estimator example code.
The example consists of premade_estimator.py and iris_data.py.
iris_data.py reads the training data and test data and defines the data format the estimator uses.
-----------------------------------------------------------------------------------------------------------
1. Modifications to iris_data.py
iris_data.py downloads the training set and the test set from:
http://download.tensorflow.org/data/iris_training.csv
http://download.tensorflow.org/data/iris_test.csv
In practice, however, these downloads fail. The same two files are available here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_training.csv
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_test.csv
Download them, save them as xlsx, and change the download/read part of iris_data.py to:
import pandas

# Column names as defined near the top of the original iris_data.py
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

def load_data(y_name='Species'):
    # x: features, y: labels
    train = pandas.read_excel('iris_training.xlsx', names=CSV_COLUMN_NAMES, header=0)
    train_features, train_labels = train, train.pop(y_name)
    test = pandas.read_excel('iris_test.xlsx', names=CSV_COLUMN_NAMES, header=0)
    test_features, test_labels = test, test.pop(y_name)
    return (train_features, train_labels), (test_features, test_labels)
In other words, the original download helper (maybe_download in the original iris_data.py) can be deleted, and load_data is rewritten to use read_excel. Different versions of the example name the returned variables differently (some use train_features / train_labels, others train_x / train_y); renaming them consistently to features and labels makes the code easier to read.
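A quick sanity check of the rewritten load_data (a minimal sketch; the expected output assumes the standard iris_training/iris_test files with 120 training rows and labels encoded as 0/1/2):
import iris_data

(train_features, train_labels), (test_features, test_labels) = iris_data.load_data()
print(train_features.shape)    # expected (120, 4): four feature columns, Species popped into labels
print(train_labels.unique())   # labels are a pandas Series of integer class ids: [0 1 2]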
------------------------------------------------------------------------------------------------------------
2. premade_estimator.py
Import the tensorflow and iris_data modules:
import tensorflow as tf
import iris_data
Fetch the training and test data from iris_data:
# Fetch the data
(train_features, train_labels), (test_features, test_labels) = iris_data.load_data()
-------------------------------------------------------------------------------------------------------------
Build a tf.feature_column entry for each column of train_features:
my_feature_columns = []
for key in train_features.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
Where:
tf.feature_column
# tools for ingesting and representing features
tf.feature_column.numeric_column(...)
# represents real-valued or numerical features
Each key in train_features is added to tf.feature_column as a numeric column.
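For the four iris features this loop produces four numeric columns. A minimal sketch of inspecting the result (the column names are assumed to come from CSV_COLUMN_NAMES above):
# my_feature_columns now holds one numeric_column per feature
for column in my_feature_columns:
    print(column.key)   # e.g. SepalLength, SepalWidth, PetalLength, PetalWidth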
-------------------------------------------------------------------------------------------------------------
Instantiate an estimator:
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[10, 10],
    n_classes=3)
Where:
tf.estimator.DNNClassifier
# a classifier for TensorFlow DNN models
feature_columns
# the feature columns that feed the model its input
hidden_units = [m, n]
# the length of hidden_units defines the number of hidden layers;
# m and n define the number of nodes in each layer
n_classes
# the number of label classes to predict
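As a hypothetical variation (not from the original code) to illustrate how hidden_units works: a three-element list builds three hidden layers, one per element.
# Hypothetical: three hidden layers with 30, 20 and 10 nodes respectively
deeper_classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[30, 20, 10],   # list length 3 -> three hidden layers
    n_classes=3)                 # still three iris species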
-------------------------------------------------------------------------------------------------------------
Train the model
classifier.train(
    input_fn=lambda: iris_data.train_input_fn(train_features, train_labels, args.batch_size),
    steps=args.train_steps)
train_input_fn is the function defined in iris_data:
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset
Breaking down the pieces used in train_input_fn:
tf.data.Dataset
# A Dataset represents an input pipeline as a collection of elements (nested structures
# of tensors) and a "logical plan" of transformations that act on those elements.
# It is a high-level TensorFlow API for reading data and converting it into the format train() needs.
tf.data.Dataset.from_tensor_slices
# Creates a Dataset whose elements are slices of the given tensors.
dataset.shuffle
# Randomly shuffles the elements of this dataset; randomly ordered training examples generally train better.
# tf.data.Dataset.shuffle is the function that randomizes the examples.
dataset.repeat
# Repeats this dataset count times
dataset.batch
# Combines consecutive elements of this dataset into batches
(dict(features), labels)  # the features (a dict) and labels (a pandas Series) combined into a tuple
The first argument of DNNClassifier.train, input_fn, must be a function that provides input data for training as minibatches, and that function must return a tf.data.Dataset object or a tuple.
Note the lambda expression used when passing input_fn: a lambda expression is a one-line function, also known in other languages as an anonymous function. When a function is only needed in one place, a lambda is a convenient substitute and behaves exactly like an ordinary function.
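A minimal sketch of why the lambda is needed: train expects input_fn to be a callable that takes no arguments, so the lambda captures the data and batch size and defers the actual call. An equivalent named function (my_train_input_fn is a hypothetical name) would be:
# Equivalent to the lambda above
def my_train_input_fn():
    return iris_data.train_input_fn(train_features, train_labels, args.batch_size)

classifier.train(input_fn=my_train_input_fn, steps=args.train_steps)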
-------------------------------------------------------------------------------------------------------------
Evaluate the model
To evaluate the model's effectiveness, every estimator provides an evaluate method:
eval_result = classifier.evaluate(
    input_fn=lambda: eval_input_fn(test_features, test_labels, args.batch_size))
print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
Note that evaluating the model uses the test data set. classifier.evaluate is called the same way as train; eval_result is a dict, and its 'accuracy' entry is what the print statement formats. eval_input_fn is defined as:
def eval_input_fn(features, labels, batch_size):
    features = dict(features)
    if labels is None:
        # No labels, use only features.
        inputs = features
    else:
        inputs = (features, labels)

    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices(inputs)

    # Batch the examples
    assert batch_size is not None, "batch_size must not be None"
    dataset = dataset.batch(batch_size)

    # Return the dataset.
    return dataset
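A short sketch of the two call patterns the if/else branch supports (the batch size here is chosen arbitrarily; the labels=None form is what a prediction call would use):
# With labels: dataset elements are (features_dict, label) pairs, as used by evaluate
eval_ds = eval_input_fn(test_features, test_labels, batch_size=32)

# Without labels: dataset elements are feature dicts only
predict_ds = eval_input_fn(test_features, labels=None, batch_size=32)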
Note:
# The assert statement exists in almost every programming language. When you write
assert condition
# you are telling the program to test that condition and trigger an error if it is false.
# In Python it is roughly equivalent to:
if not condition:
    raise AssertionError()
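A tiny concrete example of the assert behaviour (values are illustrative only):
x = 5
assert x > 0                    # condition is true: nothing happens
assert x > 10, "x too small"    # condition is false: raises AssertionError: x too small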
------------------------------------------------------------------------------------------------------------
3. Summary
How to build an estimator
How to evaluate an estimator
How to prepare the data an estimator is built with