Stanford CS20SI: TensorFlow for Deep Learning Research, course notes

Latest recommended article published on 2020-08-29 17:02:34

键盘里的青春


Reposted from: 平凡_

Lecture note 1: Introduction to TensorFlow

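1 TF Learn (tf.contrib.learn)
TensorFlow has a simplified interface, TF Learn (tf.contrib.learn), which provides ready-made models that users can simply call. It was deliberately built to mimic scikit-learn, to smooth the transition from the scikit-learn world of one-liner machine learning into the more open world of building ML models of different shapes. In fact, TF Learn started out as an independent project called Scikit Flow (SKFlow).

TF Learn lets you load data, construct a model, fit the model to the training data, and evaluate its accuracy, each in a single line. Models you can call in one line in TF Learn include LinearClassifier, LinearRegressor, and DNNClassifier. Google has good tutorials on building custom models with TF Learn.

The DNNClassifier example below is one of the TF Learn examples in the TensorFlow Examples repository on GitHub.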

import tensorflow as tf
from sklearn import cross_validation, metrics  # from older scikit-learn releases

# Load dataset
iris = tf.contrib.learn.datasets.load_dataset('iris')
x_train, x_test, y_train, y_test = cross_validation.train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)
# Build a 3-layer DNN with 10, 20, 10 units respectively
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(x_train)
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns, hidden_units=[10, 20, 10], n_classes=3)
# Fit and predict
classifier.fit(x_train, y_train, steps=200)
predictions = list(classifier.predict(x_test, as_iterable=True))
score = metrics.accuracy_score(y_test, predictions)
print('Accuracy: {0:f}'.format(score))

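As you can see, TF Learn lets you load data in one line and split it in another, and you choose the number of hidden units in each layer of the deep neural network. In this case, the first layer has 10 hidden units, the second has 20, and the third has 10. You can also specify the number of label classes, the type of optimizer (for example, gradient descent), and the initial weights.

You can visit the TF Learn examples repository for more examples. Note, however, that most of the models there are built with deprecated functions, so you will see many warnings if you call them.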

2 TF-Slim (tf.contrib.slim)
Another simple API is TF-Slim, which simplifies building, training, and evaluating neural networks.

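3 High-level APIs on top of TensorFlow
Many high-level APIs have been built on top of TensorFlow.
Some of the most popular are Keras (keras@GitHub), TFLearn (tflearn@GitHub), and Pretty Tensor (prettytensor@GitHub).

Note: do not confuse the high-level API TFLearn (no space between TF and Learn) with the simplified interface TF Learn. TFLearn supports most recent deep learning models, such as ConvNets, LSTM, BiRNN, ResNets, and generative networks, and features such as BatchNorm and PReLU. TFLearn is developed by Aymeric Damien.

Spoiler alert: Aymeric will give a guest lecture in a few weeks.

However, TensorFlow's main purpose is not to provide out-of-the-box machine learning. Instead, TensorFlow provides a broad suite of functions and classes that let users define models from scratch. This is more complicated but offers much more flexibility. You can build almost any architecture you can think of in TensorFlow.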

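Data flow graphs:

The next key phrase in the definition is “data flow graphs”. It means that TensorFlow does all of its computation in graphs. See the slides for the first lecture (from slide 23) for more details.

A session also allocates memory to store the current values of variables.
As you can see, the values of our variables are only valid within one session. If we try to query a value in a second session, TensorFlow raises an error because the variable has not been initialized in that session.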

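Lecture note 2: TensorFlow Ops

1 Fun with TensorBoard

TensorFlow brings together constants, variables, and operators, collectively called ops. It is not just a software library but a suite of software that includes TensorFlow, TensorBoard, and TensorServing. To make the most of TensorFlow, we should know how to use all of them in conjunction with one another. In this lecture, we first cover TensorBoard.

TensorBoard is graph visualization software included with any standard TensorFlow installation. In Google's own words: “The computations you'll use TensorFlow for, like training a massive deep neural network, can be complex and confusing. To make it easier to understand, debug, and optimize TensorFlow programs, we've included a suite of visualization tools called TensorBoard.”

When fully configured, TensorBoard looks something like the figure below (image from TensorBoard's website).

When a user performs certain operations in a TensorBoard-activated TensorFlow program, those operations are exported to an event file. TensorBoard converts these event files into graphs that give insight into a model's behavior. Learning to use TensorBoard early, and using it together with TensorFlow, makes working with TensorFlow much more enjoyable and productive.

Let's write our first TensorFlow program and visualize it with TensorBoard.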

import tensorflow as tf
a = tf.constant(2)
b = tf.constant(3)
x = tf.add(a, b)
with tf.Session() as sess:
    print(sess.run(x))

To activate TensorBoard for this program, add this statement after you have built the graph, right before running the training loop.

writer = tf.summary.FileWriter(logs_dir, sess.graph)

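The statement above creates a writer object that writes operations to an event file stored in the folder logs_dir. You can replace logs_dir with a path such as './graphs', as follows: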

import tensorflow as tf
a = tf.constant(2)
b = tf.constant(3)
x = tf.add(a, b)

with tf.Session() as sess:
       writer = tf.summary.FileWriter('./graphs', sess.graph)
       print(sess.run(x))

# close the writer when you’re done using it
writer.close()

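Next, go to the terminal and run the program. Make sure that your current working directory is the same as the one from which you ran your Python code.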

$ python [yourprogram.py]
$ tensorboard --logdir="./graphs"

Open your browser and go to http://localhost:6006/ (or the link you get back after running the tensorboard command).

Go to the Graphs tab and you will see something like this:

The graph has 3 nodes:

a = tf.constant(2)
b = tf.constant(3)
x = tf.add(a, b)

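“Const” and “Const_1” correspond to a and b, and the node “Add” corresponds to x. The names a, b, and x are Python variable names for our convenience; they mean nothing to TensorFlow internally. To make TensorBoard display your own names, name the ops explicitly, as follows: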

a = tf.constant([2, 2], name="a")
b = tf.constant([3, 6], name="b")
x = tf.add(a, b, name="add")

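Now if you run TensorBoard again, you see this graph:

The graph itself defines the ops and their dependencies, but it does not display values. It only cares about values when we run the session with some values to fetch. As a quick reminder, in case you have forgotten: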

tf.Session.run(fetches,feed_dict=None,options=None,run_metadata=None)

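Note: if you have run your code several times, there will be multiple event files in '~/dev/cs20si/graphs/lecture01'. TensorBoard shows only the latest graph and displays a warning about multiple event files. To get rid of the warning, delete the event files you no longer need.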

2 Constant types

See the official documentation.

You can create constants of scalar or tensor values.

tf.constant(value, dtype=None, shape=None, name='Const', verify_shape=False)
# constant of 1d tensor (vector)
a = tf.constant([2, 2], name="vector")
# constant of 2x2 tensor (matrix)
b = tf.constant([[0, 1], [2, 3]], name="b")

You can create tensors whose elements have a specific value.

These are similar to numpy.zeros, numpy.zeros_like, numpy.ones, and numpy.ones_like.

tf.zeros(shape, dtype=tf.float32, name=None)
# create a tensor of the given shape with all elements set to zero
tf.zeros([2, 3], tf.int32) ==> [[0, 0, 0], [0, 0, 0]]

tf.zeros_like(input_tensor, dtype=None, name=None, optimize=True)
# create a tensor of the same shape and type as input_tensor (unless type is
# specified), but with all elements set to zero
# input_tensor is [[0, 1], [2, 3], [4, 5]]
tf.zeros_like(input_tensor) ==> [[0, 0], [0, 0], [0, 0]]

tf.ones(shape, dtype=tf.float32, name=None)
# create a tensor of the given shape with all elements set to one
tf.ones([2, 3], tf.int32) ==> [[1, 1, 1], [1, 1, 1]]

tf.ones_like(input_tensor, dtype=None, name=None, optimize=True)
# create a tensor of the same shape and type as input_tensor (unless type is
# specified), but with all elements set to one
# input_tensor is [[0, 1], [2, 3], [4, 5]]
tf.ones_like(input_tensor) ==> [[1, 1], [1, 1], [1, 1]]

tf.fill(dims, value, name=None)
# create a tensor filled with a scalar value
tf.fill([2, 3], 8) ==> [[8, 8, 8], [8, 8, 8]]

You can create constants that are sequences:

tf.linspace(start, stop, num, name=None)
# create a sequence of num evenly spaced values beginning at start. If num > 1,
# consecutive values differ by (stop - start) / (num - 1), so that the last one
# is exactly stop.
# start, stop, num must be scalars
# comparable to but slightly different from numpy.linspace
# numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
tf.linspace(10.0, 13.0, 4, name="linspace") ==> [10.0 11.0 12.0 13.0]

tf.range(start, limit=None, delta=1, dtype=None, name='range')
# create a sequence of numbers that begins at start and extends by increments of
# delta up to but not including limit
# slightly different from range in Python
# 'start' is 3, 'limit' is 18, 'delta' is 3
tf.range(start, limit, delta) ==> [3, 6, 9, 12, 15]
# 'start' is 3, 'limit' is 1, 'delta' is -0.5
tf.range(start, limit, delta) ==> [3, 2.5, 2, 1.5]
# 'limit' is 5
tf.range(limit) ==> [0, 1, 2, 3, 4]
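
The spacing rule for tf.linspace is easy to misread, so here is a minimal sketch of it in plain Python. The helper name linspace is hypothetical and this is not TensorFlow's implementation; it only illustrates that the step is (stop - start) / (num - 1), which makes the last value land exactly on stop.

```python
def linspace(start, stop, num):
    """Return num evenly spaced floats from start to stop, both included."""
    if num == 1:
        return [float(start)]
    # The parentheses matter: the step is the full span divided by (num - 1).
    step = (stop - start) / float(num - 1)
    return [start + i * step for i in range(num)]


values = linspace(10.0, 13.0, 4)
print(values)
```

With start=10.0, stop=13.0, and num=4 the step is 1.0, matching the tf.linspace example above.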

Note that unlike NumPy or Python sequences, TensorFlow sequences are not iterable.

for _ in np.linspace(0, 10, 4): # OK
for _ in tf.linspace(0, 10, 4): # TypeError("'Tensor' object is not iterable.")
for _ in range(4): # OK
for _ in tf.range(4): # TypeError("'Tensor' object is not iterable.")

You can also generate random constants from certain distributions.

tf.random_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None)
tf.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None,
name=None)
tf.random_uniform(shape, minval=0, maxval=None, dtype=tf.float32, seed=None,
name=None)
tf.random_shuffle(value, seed=None, name=None)
tf.random_crop(value, size, seed=None, name=None)
tf.multinomial(logits, num_samples, seed=None, name=None)
tf.random_gamma(shape, alpha, beta=None, dtype=tf.float32, seed=None, name=None)

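3 Math operations

TensorFlow's math ops are pretty standard and quite similar to NumPy's.

See the official documentation for more, since listing math ops is tedious.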

a = tf.constant([3, 6])
b = tf.constant([2, 2])
tf.add(a, b) # >> [5 8]
tf.add_n([a, b, b]) # >> [7 10]. Equivalent to a + b + b
tf.mul(a, b) # >> [6 12] because mul is element wise
tf.matmul(a, b) # >> ValueError
tf.matmul(tf.reshape(a, shape=[1, 2]), tf.reshape(b, shape=[2, 1])) # >> [[18]]
tf.div(a, b) # >> [1 3]
tf.mod(a, b) # >> [1 0]
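
To make the difference between the element-wise product and the matrix product concrete, here is a small plain-Python sketch using the same vectors. It does not call TensorFlow; the variable names elementwise and inner are chosen here for illustration only.

```python
a = [3, 6]
b = [2, 2]

# Element-wise product: multiply matching positions, like tf.mul(a, b).
elementwise = [x * y for x, y in zip(a, b)]

# A (1 x 2) by (2 x 1) matrix product collapses to one number,
# like the reshaped tf.matmul call above.
inner = sum(x * y for x, y in zip(a, b))

print(elementwise)
print(inner)
```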

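Below is the table of ops in Python, courtesy of the authors of the book “Fundamentals of Deep Learning”.

4 Data types

Python native types:

TensorFlow accepts Python native types such as Python boolean values, numeric values (integers, floats), and strings. Single values are converted to 0-d tensors (scalars), lists of values to 1-d tensors (vectors), lists of lists to 2-d tensors (matrices), and so on. The example below is adapted and modified from the book “TensorFlow for Machine Intelligence”.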

t_0 = 19 # Treated as a 0-d tensor, or "scalar"
tf.zeros_like(t_0) # ==> 0
tf.ones_like(t_0) # ==> 1
t_1 = [b"apple", b"peach", b"grape"] # treated as a 1-d tensor, or "vector"
tf.zeros_like(t_1) # ==> ['' '' '']
tf.ones_like(t_1) # ==> TypeError: Expected string, got 1 of type 'int' instead.
t_2 = [[True, False, False],
 [False, False, True],
 [False, True, False]] # treated as a 2-d tensor, or "matrix"
tf.zeros_like(t_2) # ==> 3x3 tensor, all elements are False
tf.ones_like(t_2) # ==> 3x3 tensor, all elements are True

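TensorFlow native types:

Like NumPy, TensorFlow has its own data types, such as tf.int32 and tf.float32, as you have seen. Below is a list of current TensorFlow data types, taken from TensorFlow's official documentation.

NumPy data types

By now you have probably noticed the similarity between NumPy and TensorFlow. TensorFlow was designed to integrate seamlessly with NumPy, the package that has become the lingua franca of data science.

TensorFlow's data types are based on those of NumPy; in fact, np.int32 == tf.int32 returns True. You can pass NumPy types to TensorFlow ops.

For example: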

tf.ones([2, 2], np.float32) ==> [[1.0 1.0], [1.0 1.0]]

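Remember our old friend tf.Session.run(fetches)? If the requested object is a Tensor, the output is a NumPy array.

TL;DR: most of the time, TensorFlow types and NumPy types can be used interchangeably.

Note 1: There is a catch for the string data type. For numeric and boolean types, TensorFlow and NumPy dtypes match down the line. However, tf.string does not have an exact match in NumPy because of the way NumPy handles strings. TensorFlow can still import string arrays from NumPy perfectly fine, as long as you do not specify a dtype in NumPy.

Note 2: Both TensorFlow and NumPy are n-d array libraries. NumPy provides the ndarray type, but it does not offer methods to create tensor functions and automatically compute derivatives, nor does it support GPUs. So TensorFlow still wins.

Note 3: Using Python types to specify TensorFlow objects is quick and easy, and it is a good idea for example code. However, there is an important pitfall. Python types cannot explicitly state the data type, whereas TensorFlow's types are more specific. For example, all integers are the same type in Python, but TensorFlow has 8-bit, 16-bit, 32-bit, and 64-bit integers available. Therefore, if you use a Python type, TensorFlow has to infer which data type you mean.

You can convert your data to the appropriate type before passing it to TensorFlow, but some data types may still be difficult to declare correctly, such as complex numbers. Because of this, it is common to create hand-defined Tensor objects as NumPy arrays. However, always use TensorFlow types when possible, because TensorFlow and NumPy may each evolve to a point where this compatibility no longer exists.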

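5 Variables

Constants have been fun, but I think you are now ready to learn about variables. The differences between the two are:
1 A constant is constant. A variable can be assigned to, and its value can be changed.
2 A constant's value is stored in the graph and is replicated wherever the graph is loaded. A variable is stored separately and may live on a parameter server.

Point 2 means that constants are stored in the graph definition. When constants take up a lot of memory, the graph becomes slow to load every time. To see the graph's definition and what is stored in it, simply print out the graph's protobuf. Protobuf stands for protocol buffer, “Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data: think XML, but smaller, faster, and simpler.”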

import tensorflow as tf
my_const = tf.constant([1.0, 2.0], name="my_const")
print(tf.get_default_graph().as_graph_def())

Output:

node {
 name: "my_const"
 op: "Const"
 attr {
 key: "dtype"
 value {
 type: DT_FLOAT
 }
 }
 attr {
 key: "value"
 value {
 tensor {
 dtype: DT_FLOAT
 tensor_shape {
 dim {
 size: 2
 }
 }
 tensor_content: "\000\000\200?\000\000\000@"
 }
 }
 }
}
versions {
 producer: 17
}
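
The tensor_content field above looks opaque, but it appears to be just the raw bytes of the two float32 values. The sketch below decodes it with Python's struct module, assuming little-endian byte order (an assumption about the machine that produced the output, not something the protobuf states).

```python
import struct

# The bytes printed in tensor_content, written as a Python bytes literal.
raw = b"\000\000\200?\000\000\000@"

# "<2f" means: little-endian, two 32-bit floats.
values = struct.unpack("<2f", raw)
print(values)
```

Decoding gives back the two values of my_const, which shows how a constant's data is embedded directly in the graph definition.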

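Declaring variables:
To declare a variable, you create an instance of the class tf.Variable. Note that it is written tf.constant (lowercase) but tf.Variable, not tf.variable, because tf.constant is an op, while tf.Variable is a class.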

#create variable a with scalar value
a = tf.Variable(2, name="scalar")
#create variable b as a vector
b = tf.Variable([2, 3], name="vector")
#create variable c as a 2x2 matrix
c = tf.Variable([[0, 1], [2, 3]], name="matrix")
# create variable W as 784 x 10 tensor, filled with zeros
W = tf.Variable(tf.zeros([784,10]))

tf.Variable holds several ops:

x = tf.Variable(...)
x.initializer # init
x.value() # read op
x.assign(...) # write op
x.assign_add(...)
# and more

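You have to initialize variables before using them. If you try to evaluate a variable before initializing it, you run into FailedPreconditionError: Attempting to use uninitialized value. The easiest way is to initialize all variables at once with tf.global_variables_initializer().

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

Note that you use sess.run() here to run the initializer op, not to fetch any value. To initialize only a subset of variables, use tf.variables_initializer() with a list of the variables you want to initialize.

init_ab = tf.variables_initializer([a, b], name="init_ab")
with tf.Session() as sess:
    sess.run(init_ab)

You can also initialize each variable separately using tf.Variable.initializer.

# create variable W as 784 x 10 tensor, filled with zeros
W = tf.Variable(tf.zeros([784,10]))
with tf.Session() as sess:
    sess.run(W.initializer)

Another way to initialize a variable is to restore it from a saved file. We will talk about this in a few weeks.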

Evaluating the value of a variable
If we print the initialized variable, we only see the tensor object.

# W is a random 700 x 10 variable object
W = tf.Variable(tf.truncated_normal([700, 10]))
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W)
>> Tensor("Variable/read:0", shape=(700, 10), dtype=float32)

To get the value of a variable, we need to use eval().

# W is a random 700 x 10 variable object
W = tf.Variable(tf.truncated_normal([700, 10]))
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W.eval())
>> [[-0.76781619 -0.67020458 1.15333688 ..., -0.98434633 -1.25692499
 -0.90904623]
[-0.36763489 -0.65037876 -1.52936983 ..., 0.19320194 -0.38379928
 0.44387451]
[ 0.12510735 -0.82649058 0.4321366 ..., -0.3816964 0.70466036
 1.33211911]
...,
[ 0.9203397 -0.99590844 0.76853162 ..., -0.74290705 0.37568584
 0.64072722]
[-0.12753558 0.52571583 1.03265858 ..., 0.59978199 -0.91293705
 -0.02646019]
[ 0.19076447 -0.62968266 -1.97970271 ..., -1.48389161 0.68170643
 1.46369624]]

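Assigning values to variables
We can assign a value to a variable using tf.Variable.assign().

W = tf.Variable(10)
W.assign(100)
with tf.Session() as sess:
    sess.run(W.initializer)
    print(W.eval()) # >> 10

Why 10 and not 100? W.assign(100) does not assign the value 100 to W; it creates an assign op that does so. For this op to take effect, we have to run it in a session.

W = tf.Variable(10)
assign_op = W.assign(100)
with tf.Session() as sess:
    sess.run(assign_op)
    print(W.eval()) # >> 100

Note that we did not have to initialize W in this case, because assign() does it for us. In fact, the initializer op is the assign op that assigns the variable's initial value to the variable itself.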

# in the source code
self._initializer_op = state_ops.assign(self._variable, self._initial_value,
 validate_shape=validate_shape).op

An interesting example:

# create a variable whose original value is 2
a = tf.Variable(2, name="scalar")
# assign a * 2 to a and call that op a_times_two
a_times_two = a.assign(a * 2)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # have to initialize a, because a_times_two op depends on the value of a
    sess.run(init)
    sess.run(a_times_two) # >> 4
    sess.run(a_times_two) # >> 8
    sess.run(a_times_two) # >> 16

TensorFlow assigns a * 2 to a every time the op a_times_two is run.
For simply incrementing and decrementing variables,
TensorFlow provides the methods tf.Variable.assign_add() and tf.Variable.assign_sub().
Unlike tf.Variable.assign(), tf.Variable.assign_add() and tf.Variable.assign_sub() do not initialize your variable for you, because these ops depend on the current value of the variable.

W = tf.Variable(10)
with tf.Session() as sess:
    sess.run(W.initializer)
    print(sess.run(W.assign_add(10))) # >> 20
    print(sess.run(W.assign_sub(2))) # >> 18

Because TensorFlow sessions maintain variable values separately, each session has its own current value for a variable defined in the graph:

W = tf.Variable(10)
sess1 = tf.Session()
sess2 = tf.Session()
sess1.run(W.initializer)
sess2.run(W.initializer)
print(sess1.run(W.assign_add(10))) # >> 20
print(sess2.run(W.assign_sub(2))) # >> 8
print(sess1.run(W.assign_add(100))) # >> 120
print(sess2.run(W.assign_sub(50))) # >> -42
sess1.close()
sess2.close()

Of course, you can also declare a variable that depends on other variables. Suppose you want to declare U = W * 2:

# W is a random 700 x 10 tensor
W = tf.Variable(tf.truncated_normal([700, 10]))
U = tf.Variable(W * 2)

In this case, you should use initialized_value() to make sure that W is initialized before its value is used to initialize U.

U = tf.Variable(W.initialized_value() * 2)

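6 InteractiveSession

You sometimes see InteractiveSession instead of Session. The only difference is that an InteractiveSession makes itself the default session, so you can call run() or eval() without explicitly naming the session. This is convenient in interactive shells and IPython notebooks, but it may complicate things when you have multiple sessions to run.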

sess = tf.InteractiveSession()
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b
# We can just use 'c.eval()' without passing 'sess'
print(c.eval())
sess.close()

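tf.InteractiveSession.close() closes an InteractiveSession.

tf.get_default_session() returns the default session for the current thread. The returned session is the innermost session on which a Session or Session.as_default() context has been entered.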

7 Control dependencies

Sometimes we have two independent ops but want to specify which one should run first. In that case, use tf.Graph.control_dependencies(control_inputs).

For example:

# your graph g has 5 ops: a, b, c, d, e
with g.control_dependencies([a, b, c]):
 # `d` and `e` will only run after `a`, `b`, and `c` have executed.
 d = ...
 e = …

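8 Placeholders and feed_dict

Recall from the first lecture that a TensorFlow program often has two phases:
Phase 1: assemble a graph.
Phase 2: use a session to execute operations in the graph.
Therefore, we can assemble the graph first without knowing the values needed for computation. This is equivalent to defining a function of x, y without knowing the values of x, y.
For example, f(x, y) = x * 2 + y.
x, y are placeholders for the actual values.
With the graph assembled, we, or our clients, can later supply our own data when we need to execute the computation.
To define a placeholder, we use: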

tf.placeholder(dtype, shape=None, name=None)

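dtype is a required parameter that specifies the data type of the placeholder's value.

shape specifies the shape of the tensor that the placeholder accepts as its actual value. shape=None means that tensors of any shape are accepted. Using shape=None makes it easy to construct graphs but hard to debug. You should define the shape of your placeholders in as much detail as possible.

You can also give your placeholder a name, as you can with any other op in TensorFlow.

More information on placeholders is in the official documentation.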

# create a placeholder of type float 32-bit, shape is a vector of 3 elements
a = tf.placeholder(tf.float32, shape=[3])
# create a constant of type float 32-bit, shape is a vector of 3 elements
b = tf.constant([5, 5, 5], tf.float32)
# use the placeholder as you would a constant or a variable
c = a + b # Short for tf.add(a, b)
# If we try to fetch c without feeding a, we run into an error.
with tf.Session() as sess:
    print(sess.run(c))
>> InvalidArgumentError (a placeholder must be fed a value)

This raises an error because to compute c we need the value of a, but a is just a placeholder with no actual value. We have to feed an actual value into a first.

with tf.Session() as sess:
    # feed [1, 2, 3] to placeholder a via the dict {a: [1, 2, 3]}
    # fetch value of c
    print(sess.run(c, {a: [1, 2, 3]}))
>> [6. 7. 8.]
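
As a rough analogy only, the define-first, feed-later pattern can be mimicked in plain Python with a closure. The names build_c and c below are hypothetical and this is not how TensorFlow works internally; it just shows a computation being described before its input exists.

```python
def build_c(b):
    """Phase 1: describe the computation c = a + b without knowing a."""
    def c(a):
        # Phase 2: the caller supplies a, much like feed_dict supplies a placeholder.
        return [ai + bi for ai, bi in zip(a, b)]
    return c


c = build_c([5.0, 5.0, 5.0])   # nothing is computed yet
out = c([1.0, 2.0, 3.0])       # now the value of a is "fed"
print(out)
```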

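Let's see how this looks in TensorBoard. Add

writer = tf.summary.FileWriter('./my_graph', sess.graph)

and type the following in the terminal:

$ tensorboard --logdir='my_graph'

As we can see, a placeholder is treated like any other op, and 3 is the shape of the placeholder.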

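In the previous example, we fed one single value to the placeholder. What if we want to feed multiple data points? This is a reasonable need, since we often have to run computations over many data points in our training or test set.
We can feed as many data points to the placeholder as we want by iterating through the dataset and feeding in one value at a time.

with tf.Session() as sess:
    for a_value in list_of_a_values:
        print(sess.run(c, {a: a_value}))

You can feed values to tensors that are not placeholders. To check whether a tensor is feedable, use: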

tf.Graph.is_feedable(tensor)

# create Operations, Tensors, etc (using the default graph)
a = tf.add(2, 5)
b = tf.mul(a, 3)
# start up a `Session` using the default graph
sess = tf.Session()
# define a dictionary that says to replace the value of `a` with 15
replace_dict = {a: 15}
# Run the session, passing in `replace_dict` as the value to `feed_dict`
sess.run(b, feed_dict=replace_dict) # returns 45

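feed_dict can be extremely useful for testing your model. When you have a large graph and only want to test certain parts, you can provide dummy values so that TensorFlow does not waste time on unnecessary computation.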

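9 The trap of lazy loading

One of the most common TensorFlow non-bug bugs I see (and used to commit) is what my friend Danijar and I call “lazy loading”. Lazy loading is a term for a programming pattern in which you defer declaring or initializing an object until it is needed. In the context of TensorFlow, it means deferring the creation of an op until you need to compute it. For example, this is normal loading: you create the op z when you assemble the graph.

x = tf.Variable(10, name='x')
y = tf.Variable(20, name='y')
z = tf.add(x, y)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    for _ in range(10):
        sess.run(z)
    writer.close()

This is what happens when someone decides to be clever and uses lazy loading to save one line of code:

x = tf.Variable(10, name='x')
y = tf.Variable(20, name='y')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    for _ in range(10):
        sess.run(tf.add(x, y)) # create the op add only when you need to compute it
    writer.close()

Let's look at their graphs in TensorBoard. The graph for normal loading looks as we expected.

Lazy loading:

Well, the node “Add” is missing, which is understandable since we added the node after writing the graph to the FileWriter. This makes the graph harder to read, but it is not a bug.
So, what's the big deal?
Let's look at the graph definition. Remember that to print the graph definition, we use:

print(tf.get_default_graph().as_graph_def())

The protobuf for the graph with normal loading has only 1 node “Add”: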

node {
 name: "Add"
 op: "Add"
 input: "x/read"
 input: "y/read"
 attr {
 key: "T"
 value {
 type: DT_INT32
 }
 }
}

On the other hand, the protobuf for the graph with lazy loading has 10 copies of the node “Add”. It adds a new node “Add” every time you want to compute z!

node {
 name: "Add"
 op: "Add"
 ...
}
node {
 name: "Add_9"
 op: "Add"
 ...
}

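You might think: “This is stupid. Why would I want to compute the same value more than once?” and believe that this is a bug nobody would ever commit. It happens more often than you think. For example, you might want to compute the same loss function or make a prediction every certain number of training samples. Before you know it, you have computed it thousands of times and added thousands of unnecessary nodes to your graph. Your graph definition becomes bloated, slow to load, and expensive to pass around.

There are two ways to avoid this bug. First, separate the definition of ops from their execution whenever you can. When that is not possible because you want to group related ops into classes, you can use Python properties to ensure that a function builds its op only once, the first time it is called. See Danijar Hafner's blog post for details.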
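
The "build it only once" idea can be sketched in plain Python with a caching property. Everything here is a stand-in: the Graph class just records node names, and lazy_property and Model are hypothetical names chosen for this illustration, not code from the referenced blog post or from TensorFlow.

```python
import functools


class Graph(object):
    """Toy stand-in for a TensorFlow graph: it only records node names."""

    def __init__(self):
        self.nodes = []

    def add_node(self, name):
        self.nodes.append(name)
        return name


def lazy_property(func):
    """Run func on first access, cache the result, and reuse it afterwards."""
    attr = "_cache_" + func.__name__

    @property
    @functools.wraps(func)
    def wrapper(self):
        if not hasattr(self, attr):
            setattr(self, attr, func(self))
        return getattr(self, attr)

    return wrapper


class Model(object):
    def __init__(self, graph):
        self.graph = graph

    @lazy_property
    def add_op(self):
        # Executes only once, so the graph gains a single "Add" node.
        return self.graph.add_node("Add")


graph = Graph()
model = Model(graph)
for _ in range(10):
    model.add_op  # ten accesses, one node
print(len(graph.nodes))
```

Without the cache, each of the ten accesses would append another node, which is the bloat described above.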

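Lecture note 3: Linear and logistic regression in TensorFlow

We have covered a lot in the first two lectures, which introduced several key concepts. If anything in these notes is unfamiliar, go back to the first two lectures and make sure you have a grasp of the following:

Graphs and sessions
TF ops: constants, variables, functions
TensorBoard
Lazy loading

We now have the fundamentals of TensorFlow down. Yes, we are that fast! Let's put them together and see what we can do.

1 Linear regression in TensorFlow

Let's start with a simple linear regression example. I hope you are all already familiar with linear regression. If not, you can read up on it first.

Problem: We often hear that insurance companies use factors such as the number of fires and thefts in a neighborhood to gauge how safe the neighborhood is. My question is whether this is redundant: is there a relationship between the number of fires and thefts in a neighborhood, and if there is, can we find it?

In other words, can we find a function f such that, if X is the number of fires and Y is the number of thefts, Y = f(X)?

Given such a relationship, if we know the number of fires in a particular area, can we predict the number of thefts in that area?

We have a dataset collected by the U.S. Commission on Civil Rights.

The dataset is described as follows:

Name: Fire and Theft in Chicago
X = fires per 1000 housing units
Y = thefts per 1000 population

Each pair is taken from an area of Chicago with a different zip code; there are 42 areas in total.

Solution: First, assume that the relationship between the number of fires and thefts is linear: Y = wX + b

We need to find the parameters w and b, using the squared error as the loss function. The program is as follows: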

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import xlrd

DATA_FILE = "data/fire_theft.xls"

# Step 1: read in data from the .xls file
book = xlrd.open_workbook(DATA_FILE, encoding_override="utf-8")
sheet = book.sheet_by_index(0)
data = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1

# Step 2: create placeholders for input X (number of fire) and label Y (number of theft)
X = tf.placeholder(tf.float32, name="X")
Y = tf.placeholder(tf.float32, name="Y")

# Step 3: create weight and bias, initialized to 0
w = tf.Variable(0.0, name="weights")
b = tf.Variable(0.0, name="bias")

# Step 4: construct model to predict Y (number of theft) from the number of fire
Y_predicted = X * w + b

# Step 5: use the square error as the loss function
loss = tf.square(Y - Y_predicted, name="loss")

# Step 6: using gradient descent with learning rate of 0.001 to minimize loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)

with tf.Session() as sess:
    # Step 7: initialize the necessary variables, in this case, w and b
    sess.run(tf.global_variables_initializer())
    # Step 8: train the model
    for i in range(100): # run 100 epochs
        for x, y in data:
            # Session runs train_op to minimize loss
            sess.run(optimizer, feed_dict={X: x, Y:y})
    # Step 9: output the values of w and b
    w_value, b_value = sess.run([w, b])

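After training for 100 epochs, we get an average squared loss of 1372.77701716 with w = 1.62071 and b = 16.9162. The loss is quite large.

The fit is not good. Would a quadratic function Y = wXX + uX + b do better?
Let's try it. We only need to add another variable u and change the formula for Y_predicted.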

# Step 3: create variables: weights_1, weights_2, bias. All are initialized to 0
w = tf.Variable(0.0, name="weights_1")
u = tf.Variable(0.0, name="weights_2")
b = tf.Variable(0.0, name="bias")
# Step 4: predict Y (number of theft) from the number of fire
Y_predicted = X * X * w + X * u + b
# Step 5: Profit!

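After 10 epochs, we get an average squared loss of 797.335975976 with w, u, b = [0.071343 0.010234 0.00143057]

This takes less time to converge than the linear function, but it still does not fit well because of a few outliers on the right. Using Huber loss instead of MSE, or a third-degree polynomial as the model function, might work better. You can try it yourself.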
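
For readers who have not met Huber loss, here is a minimal plain-Python sketch for a single data point. The function name huber_loss and the default delta=1.0 are choices made for this illustration; the notes do not state which threshold was used.

```python
def huber_loss(label, prediction, delta=1.0):
    """Quadratic for small residuals, linear for large ones."""
    residual = abs(label - prediction)
    if residual <= delta:
        return 0.5 * residual ** 2
    # Beyond delta the penalty grows linearly, so outliers weigh less
    # than they would under squared error.
    return delta * residual - 0.5 * delta ** 2


small = huber_loss(1.0, 0.5)    # inside the quadratic zone
large = huber_loss(10.0, 0.0)   # an outlier, penalised linearly
print(small, large)
```

For the outlier, half the squared error would be 50, while the Huber loss is 9.5, which is why the fit is pulled less toward extreme points.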

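With Huber loss and the quadratic model, I got a result that is less affected by the outliers:

So, how do I know that my model is correct?

1 Use the correlation coefficient R-squared

If you do not know what R-squared is, see the Minitab blog.

Here is the gist of R-squared:
R-squared is a statistical measure of how close the data are to the fitted regression line.

It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.

The definition of R-squared is fairly straightforward: it is the fraction of the variation in the response variable that is explained by a linear model. R-squared = Explained variation / Total variation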
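
A minimal sketch of computing R-squared in plain Python follows. It uses the common form 1 - (residual variation / total variation), which equals explained / total for a least-squares linear fit with an intercept; the function name r_squared and the sample numbers are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for paired lists of numbers."""
    mean_y = sum(y_true) / float(len(y_true))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)               # total variation
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))    # unexplained part
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)                  # predictions match exactly
baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])   # always predict the mean
print(perfect, baseline)
```

A perfect fit scores 1, and a model that only ever predicts the mean scores 0.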

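2 Run on a test set

We learned in machine learning class that it all comes down to validation and testing. So the obvious first method is to test our model on a test set.

Having separate datasets for training, validation, and testing is great, but it means less data for training. There is a lot of literature on what to do when we do not have much data, such as k-fold cross validation.

3 Test our model with dummy data

Another way is to test the model on dummy data. For example, in this case, we can create dummy data with a known linear relationship to test the model. Let's create 100 data points (X, Y) such that Y ~ 3 * X, and see whether our model outputs w = 3, b = 0.

Generate dummy data: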

# each value y is approximately linear but with some random noise
X_input = np.linspace(-1, 1, 100)
Y_input = X_input * 3 + np.random.randn(X_input.shape[0]) * 0.5

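We use NumPy arrays for X_input and Y_input to support iteration later (when we feed the inputs into the placeholders X and Y).

It fits great!

Moral of the story: dummy data is a lot easier to handle than real-world data, because dummy data is generated to fit the assumptions of our model. Real-world data is rarely this well behaved!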
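
The dummy-data sanity check can also be sketched without TensorFlow. The version below uses plain Python and full-batch gradient descent on the mean squared error, which is a simplification of the per-sample updates in the notes; the seed, learning rate, and step count are arbitrary choices for this illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# 100 points with Y roughly 3 * X plus Gaussian noise (std 0.5).
n = 100
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
ys = [3.0 * x + random.gauss(0.0, 0.5) for x in xs]

w, b = 0.0, 0.0
learning_rate = 0.1
for _ in range(2000):
    # Gradients of mean((w * x + b - y) ** 2) with respect to w and b.
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
```

The recovered w should land near 3 and b near 0, with a gap that reflects the injected noise.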

Analyzing the code:

The code for our model is pretty straightforward, except for two lines:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)
sess.run(optimizer, feed_dict={X: x, Y:y})

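I remember being confused the first time I ran into code like this. Two questions:

1 Why is train_op (here, the op named optimizer) in the fetches list of tf.Session.run()?
2 How does TensorFlow know which variables to update?

We can actually pass any TensorFlow op as fetches to tf.Session.run(). TensorFlow executes the part of the graph that those ops depend on. In this case, train_op has the purpose of minimizing loss, and loss depends on the variables w and b.

From the graph, you can see that the giant node GradientDescentOptimizer depends on 3 nodes: weights, bias, and gradients (which are computed automatically for us).

Optimizer

GradientDescentOptimizer means that our update rule is gradient descent. TensorFlow does the differentiation for us automatically, then updates the values of w and b to minimize the loss. Autodiff is amazing!

By default, the optimizer trains all the trainable variables that the objective function depends on. If there are variables you do not want to train, set the keyword trainable to False when you declare them. A common example is the variable global_step, which you will see in many TensorFlow models; it keeps track of how many times you have run your model.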

global_step = tf.Variable(0, trainable=False, dtype=tf.int32)
learning_rate = 0.01 * 0.99 ** tf.cast(global_step, tf.float32)
increment_step = global_step.assign_add(1)
optimizer = tf.train.GradientDescentOptimizer(learning_rate) # learning rate can be a tensor
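
To see what the decaying learning rate above evaluates to, here is the same formula in plain Python. The function name decayed_learning_rate is hypothetical; only the constants 0.01 and 0.99 come from the snippet.

```python
def decayed_learning_rate(global_step, base_rate=0.01, decay=0.99):
    """Exponential decay: base_rate * decay ** global_step."""
    return base_rate * decay ** global_step


rates = [decayed_learning_rate(step) for step in range(3)]
print(rates)
```

Each training step shrinks the rate by 1%, so after 100 steps it is roughly 37% of the starting value.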

The full definition of the class tf.Variable:

tf.Variable(initial_value=None, trainable=True, collections=None,
validate_shape=True, caching_device=None, name=None, variable_def=None, dtype=None,
expected_shape=None, import_scope=None)

You can also ask the optimizer to compute the gradients of specific variables, and you can modify the gradients it computed yourself.

# create an optimizer.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# compute the gradients for a list of variables.
grads_and_vars = optimizer.compute_gradients(loss, <list of variables>)
# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
# need to the 'gradient' part, for example, subtract each of them by 1.
subtracted_grads_and_vars = [(gv[0] - 1.0, gv[1]) for gv in grads_and_vars]
# ask the optimizer to apply the subtracted gradients.
optimizer.apply_gradients(subtracted_grads_and_vars)

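More on computing gradients:

The optimizer classes automatically compute derivatives on your graph, but creators of new optimizers and expert users can call the lower-level function below: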

tf.gradients(ys, xs, grad_ys=None, name='gradients',
colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None)

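This method constructs symbolic partial derivatives of the sum of ys with respect to each x in xs. ys and xs are each a Tensor or a list of tensors. grad_ys is a list of Tensor holding the gradients received by the ys. The list must be the same length as ys.

Technical detail: this is especially useful when training only part of a model. For example, we can use tf.gradients() to take the derivative G of the loss with respect to the middle layer. Then we use an optimizer to minimize the difference between the middle-layer output M and M + G. This only updates the lower half of the network.

List of optimizers:

Gradient descent is not the only update rule that TensorFlow supports. Below is the list of optimizers TensorFlow supports as of 1/20/2017. The names are self-explanatory. You can visit the official documentation for more details: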

tf.train.GradientDescentOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdagradDAOptimizer
tf.train.MomentumOptimizer
tf.train.AdamOptimizer
tf.train.FtrlOptimizer
tf.train.ProximalGradientDescentOptimizer
tf.train.ProximalAdagradOptimizer
tf.train.RMSPropOptimizer

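Sebastian Ruder compared these optimizers on his blog. If you are too lazy to read it, here is the conclusion:

“RMSprop is an extension of Adagrad that deals with its radically diminishing learning rates. It is identical to Adadelta, except that Adadelta uses the RMS of parameter updates in the numerator update rule. Adam, finally, adds bias-correction and momentum to RMSprop. Insofar, RMSprop, Adadelta, and Adam are very similar algorithms that do well in similar circumstances. Kingma et al. show that its bias-correction helps Adam slightly outperform RMSprop towards the end of optimization as gradients become sparser. Insofar, Adam might be the best overall choice.”

TL;DR: use AdamOptimizer.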

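A question for discussion:

What real-world problems can we solve with linear regression? Can you write a quick program to do so?

2 Logistic regression in TensorFlow

We cannot talk about linear regression without mentioning logistic regression. Let's use logistic regression to solve a good old problem: classification on the MNIST dataset.

MNIST (the Modified National Institute of Standards and Technology database) is probably one of the most popular databases used for training image processing systems. It is a database of handwritten digits. The images look like this:

Each image is 28 x 28 pixels, flattened into a 1-d tensor of size 784. Each comes with a label; for example, images in the first row are labelled 0, those in the second row 1, and so on. The dataset is hosted on the linked website.

TF Learn (the simplified interface of TensorFlow) has a script that lets you load the MNIST dataset from Yann LeCun's website and split it into a train set, a validation set, and a test set.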

from tensorflow.examples.tutorials.mnist import input_data
MNIST = input_data.read_data_sets("/data/mnist", one_hot=True)

One-hot encoding
In digital circuits, one-hot refers to a group of bits among which the legal combinations of
values are only those with a single high (1) bit and all the others low (0).
In this case, one-hot encoding means that if the output of the image is the digit 7, then the
output will be encoded as a vector of 10 elements with all elements being 0, except for the
element at index 7 which is 1.
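
One-hot encoding is simple enough to show in a few lines of plain Python. The helper name one_hot is hypothetical; the loader above produces the equivalent vectors itself when called with one_hot=True.

```python
def one_hot(label, n_classes=10):
    """Return a list of n_classes zeros with a single 1 at index label."""
    vec = [0] * n_classes
    vec[label] = 1
    return vec


encoded = one_hot(7)
print(encoded)
```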

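MNIST is a TensorFlow Datasets object. It has 55,000 data points of training data (MNIST.train), 10,000 data points of test data (MNIST.test), and 5,000 data points of validation data (MNIST.validation).

Building a logistic regression model is very similar to building a linear regression model. However, now we have a lot more data. We learned in CS229 that computing the gradient after every single data point is painfully slow. One way around this is to process the data in batches. Fortunately, TensorFlow has great support for batching data.

To do batch logistic regression, we just need to change the dimensions of X_placeholder and Y_placeholder so that they can hold batch_size data points.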

X = tf.placeholder(tf.float32, [batch_size, 784], name="image")
Y = tf.placeholder(tf.float32, [batch_size, 10], name="label")

When you feed data to the placeholders, instead of feeding one data point at a time, you feed batch_size data points.

X_batch, Y_batch = MNIST.train.next_batch(batch_size)
sess.run(train_op, feed_dict={X: X_batch, Y:Y_batch})
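
What next_batch hands back can be pictured with a small plain-Python generator. This is only a sketch of the batching idea: the name iterate_batches is hypothetical, and unlike the real loader it neither shuffles nor wraps around, it simply drops a final partial batch.

```python
def iterate_batches(data, batch_size):
    """Yield consecutive full batches of batch_size items; drop the remainder."""
    for start in range(0, len(data) - batch_size + 1, batch_size):
        yield data[start:start + batch_size]


data = list(range(10))
batches = list(iterate_batches(data, 4))
print(batches)
```

With 10 items and a batch size of 4 there are int(10 / 4) = 2 full batches, the same count the full implementation computes as n_batches.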

Here is the full implementation:

import time
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# Step 1: Read in data
# using TF Learn's built in function to load MNIST data to the folder data/mnist
MNIST = input_data.read_data_sets("/data/mnist", one_hot=True)
# Step 2: Define parameters for the model
learning_rate = 0.01
batch_size = 128
n_epochs = 25
# Step 3: create placeholders for features and labels
# each image in the MNIST data is of shape 28*28 = 784
# therefore, each image is represented with a 1x784 tensor
# there are 10 classes for each image, corresponding to digits 0 - 9.
# each label is one hot vector.
X = tf.placeholder(tf.float32, [batch_size, 784])
Y = tf.placeholder(tf.float32, [batch_size, 10])
# Step 4: create weights and bias
# w is initialized to random variables with mean of 0, stddev of 0.01
# b is initialized to 0
# shape of w depends on the dimension of X and Y so that Y = tf.matmul(X, w)
# shape of b depends on Y
w = tf.Variable(tf.random_normal(shape=[784, 10], stddev=0.01), name="weights")
b = tf.Variable(tf.zeros([1, 10]), name="bias")
# Step 5: predict Y from X and w, b
# the model that returns probability distribution of possible label of the image
# through the softmax layer
# a batch_size x 10 tensor that represents the possibility of the digits
logits = tf.matmul(X, w) + b
# Step 6: define loss function
# use softmax cross entropy with logits as the loss function
# compute mean cross entropy, softmax is applied internally
entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)
loss = tf.reduce_mean(entropy) # computes the mean over examples in the batch
# Step 7: define training op
# using gradient descent with learning rate of 0.01 to minimize cost
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    n_batches = int(MNIST.train.num_examples/batch_size)
    for i in range(n_epochs): # train the model n_epochs times
        for _ in range(n_batches):
            X_batch, Y_batch = MNIST.train.next_batch(batch_size)
            sess.run([optimizer, loss], feed_dict={X: X_batch, Y:Y_batch})
    # average loss should be around 0.35 after 25 epochs
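
The loss in Step 6 applies softmax and cross entropy in one op. The plain-Python sketch below spells out the same calculation for a single example; the function names are chosen for this illustration and it is not TensorFlow's implementation.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, one_hot_label):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(math.log(p) for t, p in zip(one_hot_label, probs) if t)


# Two equally scored classes: the true class gets probability 0.5.
loss = softmax_cross_entropy([0.0, 0.0], [1, 0])
print(loss)
```

With equal logits the loss is log(2); it shrinks toward 0 as the logit of the true class grows relative to the others.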

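Running on my Mac, the batched version of the model with batch size 128 runs in 0.5 seconds, while the non-batched model takes 24 seconds! Note, however, that a larger batch size typically requires more epochs, because it performs fewer update steps. See the linked discussion for details.

We can test the model, since we have a test set. Let's see how to do that in TensorFlow: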

# test the model (run this inside the tf.Session block used for training)
n_batches = int(MNIST.test.num_examples/batch_size)
total_correct_preds = 0
for i in range(n_batches):
    X_batch, Y_batch = MNIST.test.next_batch(batch_size)
    # fetch only loss and logits; fetching optimizer here would train on test data
    loss_batch, logits_batch = sess.run([loss, logits],
                                        feed_dict={X: X_batch, Y: Y_batch})
    preds = tf.nn.softmax(logits_batch)
    correct_preds = tf.equal(tf.argmax(preds, 1), tf.argmax(Y_batch, 1))
    # similar to numpy.count_nonzero(boolarray) :(
    accuracy = tf.reduce_sum(tf.cast(correct_preds, tf.float32))
    total_correct_preds += sess.run(accuracy)
print("Accuracy {0}".format(total_correct_preds/MNIST.test.num_examples))
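
The accuracy bookkeeping can be checked by hand on a toy batch. This plain-Python sketch uses made-up logits and labels; it relies on the fact that softmax preserves ordering, so the argmax of the logits equals the argmax of the softmax output. The helper names are hypothetical.

```python
def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=lambda i: values[i])


def count_correct(logits_batch, labels_batch):
    """Count rows where the predicted class matches the one-hot label."""
    return sum(1 for logits, label in zip(logits_batch, labels_batch)
               if argmax(logits) == argmax(label))


# Made-up batch of 3 examples with 3 classes each.
logits_batch = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
labels_batch = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]

correct = count_correct(logits_batch, labels_batch)
accuracy = correct / float(len(labels_batch))
print(correct, accuracy)
```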

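We reach 90% accuracy after 10 epochs. This is about what we can get from a linear classifier.

Note: TensorFlow has a feeder (dataset parser) for MNIST, but do not count on it having one for every dataset. You should learn how to write your own data parser.

The graph in TensorBoard: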