Reference:
tf.contrib.learn Quickstart
TensorFlow’s high-level machine learning API (tf.contrib.learn) makes it easy to configure, train, and evaluate a variety of machine learning models. In this tutorial, you’ll use tf.contrib.learn to construct a neural network classifier and train it on the Iris data set to predict flower species based on sepal/petal geometry. You’ll write code to perform the following five steps:
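1. Download the CSV files containing the Iris training/test data
2. Construct a neural network classifier
3. Fit the model using the training data
4. Evaluate the accuracy of the model
5. Classify new samples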
Complete Neural Network Source Code:
Here is the full code for the neural network classifier:
# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TEST = "iris_test.csv"

# Load datasets
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=np.int,
    features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST,
    target_dtype=np.int,
    features_dtype=np.float32)

# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]

# Build 3 layer DNN with 10, 20, 10 units respectively
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 20, 10],
    n_classes=3,
    model_dir="/tmp/iris_model")

# Fit model
classifier.fit(x=training_set.data, y=training_set.target, steps=2000)

# Evaluate accuracy
accuracy_score = classifier.evaluate(x=test_set.data, y=test_set.target)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))

# Classify two new flower samples
new_samples = np.array(
    [[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
y = list(classifier.predict(new_samples, as_iterable=True))
print('Predictions: {}'.format(str(y)))
CSV download links:
1. iris_training.csv: http://download.tensorflow.org/data/iris_training.csv
2. iris_test.csv: http://download.tensorflow.org/data/iris_test.csv
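One way to fetch the two files is a small helper built on the Python standard library. This is only a sketch: the helper name download_if_missing is hypothetical (it is not part of TensorFlow), and it assumes the two URLs listed above are still reachable.

```python
import os
from urllib.request import urlretrieve  # Python 3; on Python 2 use urllib.urlretrieve

# File names and URLs copied from the download links above.
IRIS_URLS = {
    "iris_training.csv": "http://download.tensorflow.org/data/iris_training.csv",
    "iris_test.csv": "http://download.tensorflow.org/data/iris_test.csv",
}


def download_if_missing(url, path):
    """Download url to path unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(path):
        return False
    urlretrieve(url, path)
    return True


# Usage (needs network access, so it is left commented out here):
# for name, url in IRIS_URLS.items():
#     download_if_missing(url, name)
```

Skipping files that already exist keeps repeated runs of the tutorial script from downloading the data again.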
Load the Iris CSV data to TensorFlow:
The Iris data set contains 150 rows of data, comprising 50 samples from each of three related Iris species: Iris setosa, Iris virginica, and Iris versicolor.
From left to right, Iris setosa (by Radomil, CC BY-SA 3.0), Iris versicolor (by Dlanglois, CC BY-SA 3.0), and Iris virginica (by Frank Mayfield, CC BY-SA 2.0).
Each row contains the following data for each flower sample: sepal length, sepal width, petal length, petal width, and flower species. Flower species are represented as integers, with 0 denoting Iris setosa, 1 denoting Iris versicolor, and 2 denoting Iris virginica, as shown below:
Sepal Length  Sepal Width  Petal Length  Petal Width  Species
5.1           3.5          1.4           0.2          0
4.9           3.0          1.4           0.2          0
4.7           3.2          1.3           0.2          0
…             …            …             …            …
7.0           3.2          4.7           1.4          1
6.4           3.2          4.5           1.5          1
6.9           3.1          4.9           1.5          1
…             …            …             …            …
6.5           3.0          5.2           2.0          2
6.2           3.4          5.4           2.3          2
5.9           3.0          5.1           1.8          2
For this tutorial, the Iris data has been randomized and split into two separate CSVs:
1. A training set of 120 samples (iris_training.csv)
2. A test set of 30 samples (iris_test.csv)
Place these files in the same directory as your Python code.
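The provided CSVs are already split, so you do not need to do this yourself. Purely as an illustration of what "randomized and split" means, a 120/30 split of 150 rows could be sketched as follows. The function name shuffle_and_split and the fixed seed are assumptions made for this example; the tutorial does not say how the original split was produced.

```python
import random


def shuffle_and_split(rows, n_train, seed=0):
    """Return (train, test) after shuffling a copy of rows with a fixed seed."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 150 placeholder row indices stand in for the 150 Iris samples.
all_rows = list(range(150))
train_rows, test_rows = shuffle_and_split(all_rows, n_train=120)
print(len(train_rows), len(test_rows))  # 120 30
```

Shuffling before splitting matters here because the raw Iris rows are grouped by species; an unshuffled split would leave the test set with a single species.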
Step 1: First, import TensorFlow and numpy:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
import numpy as np
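Step 2: Load the data sets with the load_csv_with_header() method in learn.datasets.base. The method takes three required arguments:
1. filename: the path to the CSV file
2. target_dtype: the numpy datatype of the data set’s target value
3. features_dtype: the numpy datatype of the data set’s feature values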
Here, the target (the value you’re training the model to predict) is flower species, which is an integer from 0–2, so the appropriate numpy datatype is np.int:
# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TEST = "iris_test.csv"
# Load datasets
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING, target_dtype=np.int, features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST, target_dtype=np.int, features_dtype=np.float32)
Datasets in tf.contrib.learn are named tuples; you can access feature data and target values via the data and target fields. Here, training_set.data and training_set.target contain the feature data and target values for the training set, respectively, and test_set.data and test_set.target contain feature data and target values for the test set.
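To make the data/target structure concrete, here is a rough standard-library stand-in for the loader. It is not the TensorFlow implementation: the names Dataset and load_csv_sketch are hypothetical, it returns plain lists where the real loader returns numpy arrays, and it assumes the first CSV row is a header whose first two fields are the sample count and the feature count.

```python
import collections
import csv
import io

# Hypothetical stand-in for the named tuple returned by the real loader.
Dataset = collections.namedtuple("Dataset", ["data", "target"])


def load_csv_sketch(csv_file, target_dtype=int, features_dtype=float):
    """Read a header row, then rows of features followed by one target."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Assumption: header starts with "<n_samples>,<n_features>,..."
    n_samples, n_features = int(header[0]), int(header[1])
    data, target = [], []
    for row in reader:
        data.append([features_dtype(v) for v in row[:n_features]])
        target.append(target_dtype(row[-1]))
    assert len(data) == n_samples, "header count does not match rows"
    return Dataset(data=data, target=target)


# Three rows taken from the table shown earlier, with an assumed header.
sample = io.StringIO(u"3,4,setosa,versicolor,virginica\n"
                     u"5.1,3.5,1.4,0.2,0\n"
                     u"7.0,3.2,4.7,1.4,1\n"
                     u"6.5,3.0,5.2,2.0,2\n")
iris = load_csv_sketch(sample)
print(iris.data[0], iris.target)  # [5.1, 3.5, 1.4, 0.2] [0, 1, 2]
```

The last column of each row becomes the target and the remaining columns become the features, which is why target_dtype and features_dtype are specified separately.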
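Step 3: Fit the DNNClassifier to the Iris training data. You will use training_set.data and training_set.target to train your model, and later test_set.data and test_set.target to evaluate its accuracy. Before that, however, you need to construct the model.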
Construct a Deep Neural Network Classifier:
tf.contrib.learn offers a variety of predefined models, called Estimators, which you can use “out of the box” to run training and evaluation operations on your data. Here, you’ll configure a Deep Neural Network Classifier model to fit the Iris data. Using tf.contrib.learn, you can instantiate your DNNClassifier with just a couple of lines of code:
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]
# Build 3 layer DNN with 10, 20, 10 units respectively
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 20, 10],
    n_classes=3,
    model_dir="/tmp/iris_model")
The code above first defines the model’s feature columns, which specify the data type for the features in the data set. All the feature data is continuous, so tf.contrib.layers.real_valued_column is the appropriate function to use to construct the feature columns. There are four features in the data set (sepal length, sepal width, petal length, and petal width), so dimension must be set to 4 to hold all the data.
Then, the code creates a DNNClassifier model using the following arguments:
1. feature_columns=feature_columns. The set of feature columns defined above.
2. hidden_units=[10, 20, 10]. Three hidden layers, containing 10, 20, and 10 neurons, respectively.
3. n_classes=3. Three target classes, representing the three Iris species.
4. model_dir=/tmp/iris_model. The directory in which TensorFlow will save checkpoint data during model training. For more on logging and monitoring with TensorFlow, see Logging and Monitoring Basics with tf.contrib.learn.
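To get a feel for the size of the network these arguments describe, the sketch below lists the layer shapes and counts the trainable parameters. It assumes plain fully connected layers with one bias per unit, which is the usual reading of hidden_units; the helper names are made up for this example and are not TensorFlow functions.

```python
def dense_layer_shapes(n_features, hidden_units, n_classes):
    """Return (inputs, outputs) for each fully connected layer in order."""
    sizes = [n_features] + list(hidden_units) + [n_classes]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    """Weights (n_in * n_out) plus one bias per output unit, summed."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


# 4 input features, hidden layers of 10, 20 and 10 units, 3 output classes.
shapes = dense_layer_shapes(4, [10, 20, 10], 3)
print(shapes)                    # [(4, 10), (10, 20), (20, 10), (10, 3)]
print(count_parameters(shapes))  # 513
```

Under these assumptions the model has only a few hundred parameters, which is a sensible scale for a training set of 120 samples.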
Fit the DNNClassifier to the Iris Training Data:
Now that you’ve configured your DNN classifier model, you can fit it to the Iris training data using the fit method. Pass as arguments your feature data (training_set.data), target values (training_set.target), and the number of steps to train (here, 2000):
# Fit model
classifier.fit(x=training_set.data, y=training_set.target, steps=2000)
The state of the model is preserved in the classifier, which means you can train iteratively if you like. For example, the above is equivalent to the following:
classifier.fit(x=training_set.data,y=training_set.target,steps=1000)
classifier.fit(x=training_set.data,y=training_set.target,steps=1000)
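The idea that state carries over between fit calls can be illustrated with a toy model that has nothing to do with TensorFlow. ToyModel below is a hypothetical class that runs gradient descent on a single weight; because the weight lives on the object, two calls of 1000 steps follow exactly the same updates as one call of 2000 steps.

```python
class ToyModel(object):
    """Fits a single weight w to minimize (w - 3)^2 by gradient descent."""

    def __init__(self, learning_rate=0.01):
        self.w = 0.0  # model state, kept between fit() calls
        self.learning_rate = learning_rate

    def fit(self, steps):
        for _ in range(steps):
            gradient = 2.0 * (self.w - 3.0)
            self.w -= self.learning_rate * gradient
        return self


one_call = ToyModel().fit(steps=2000)
two_calls = ToyModel().fit(steps=1000).fit(steps=1000)
print(one_call.w == two_calls.w)  # True
```

In the real classifier the preserved state also includes the checkpoints written to model_dir, so this toy only captures the general principle.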
However, if you’re looking to track the model while it trains, you’ll likely want to instead use a TensorFlow monitor to perform logging operations. See the tutorial “Logging and Monitoring Basics with tf.contrib.learn” (https://www.tensorflow.org/versions/r0.12/tutorials/monitors/index.html) for more on this topic.
Evaluate Model Accuracy:
You’ve fit your DNNClassifier model on the Iris training data; now, you can check its accuracy on the Iris test data using the evaluate method. Like fit, evaluate takes feature data and target values as arguments, and returns a dict with the evaluation results. The following code passes the Iris test data (test_set.data and test_set.target) to evaluate and prints the accuracy from the results:
# evaluate returns a dict, so pick out the "accuracy" entry before formatting
accuracy_score = classifier.evaluate(x=test_set.data, y=test_set.target)["accuracy"]
print('Accuracy: {0:f}'.format(accuracy_score))
Run the full script, and check the accuracy results:
Accuracy: 0.966667
Your accuracy result may vary a bit, but should be higher than 90%. Not bad for a relatively small data set!
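Accuracy here is the fraction of test samples classified correctly, and a score of 0.966667 is what you get when 29 of the 30 test samples are right. The sketch below reproduces that arithmetic with made-up labels; the targets and predictions lists are hypothetical and are not the actual contents of iris_test.csv.

```python
def accuracy(predictions, targets):
    """Fraction of predictions that match their targets."""
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / float(len(targets))


# Hypothetical labels: 30 test samples with exactly one misclassified.
targets = [0] * 10 + [1] * 10 + [2] * 10
predictions = list(targets)
predictions[15] = 2  # one versicolor sample predicted as virginica

print('Accuracy: {0:f}'.format(accuracy(predictions, targets)))  # Accuracy: 0.966667
```

With only 30 test samples, each misclassification moves the score by about 3.3 percentage points, which explains why results vary between runs in fairly coarse jumps.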
Classify New Samples:
Use the estimator’s predict() method to classify new samples. For example, say you have these two new flower samples:
Sepal Length  Sepal Width  Petal Length  Petal Width
6.4           3.2          4.5           1.5
5.8           3.1          5.0           1.7
You can predict their species with the following code:
# Classify two new flower samples
new_samples = np.array([[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
y = list(classifier.predict(new_samples, as_iterable=True))
print('Predictions: {}'.format(str(y)))
The predict() method returns one prediction for each sample, which the code collects into a list:
Predictions: [1, 2]
The model thus predicts that the first sample is Iris versicolor, and the second sample is Iris virginica.
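Turning the integer predictions back into species names is a simple lookup, using the 0/1/2 encoding described earlier in this tutorial. The names SPECIES and label_names are introduced only for this sketch.

```python
# Index in this list matches the integer label used in the CSV files.
SPECIES = ["Iris setosa", "Iris versicolor", "Iris virginica"]


def label_names(predictions):
    """Map integer class labels to species names."""
    return [SPECIES[int(p)] for p in predictions]


print(label_names([1, 2]))  # ['Iris versicolor', 'Iris virginica']
```

The int() call lets the same helper accept numpy integer values, such as those returned by the classifier, as well as plain Python ints.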
Additional Resources:
1. More reference information on tf.contrib.learn: https://www.tensorflow.org/versions/r0.12/api_docs/python/contrib.learn.html
2. How to create linear models with tf.contrib.learn: https://www.tensorflow.org/versions/r0.12/tutorials/linear/overview.html
3. Building your own estimator with tf.contrib.learn: http://terrytangyuan.github.io/2016/07/08/understand-and-build-tensorflow-estimator/
4. Visualizing neural network training in the browser: http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.40365&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false
5. More tutorials on neural networks: CNN (https://www.tensorflow.org/versions/r0.12/tutorials/deep_cnn/index.html), RNN (https://www.tensorflow.org/versions/r0.12/tutorials/recurrent/index.html)