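TensorFlow Notes
Reference courses: MOOC (icourse163): "Artificial Intelligence Practice", Peking University; NetEase Cloud Classroom: Andrew Ng, Machine Learning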
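I. Overview
(A) Concepts
- Turing test: the questioner and the respondent are kept apart, and the questioner puts arbitrary questions to the machine. If more than 30% of the questioners believe the respondent is a human rather than a machine, the algorithm passes the Turing test.
- Perceptron: a single-layer neural network; it cannot compute the XOR function
- BP: the backpropagation algorithm
- SVM: support vector machine
  - avoids the parameter-related shortcomings of neural networks
  - avoids local optima
- DBN: deep belief network
- CNN: convolutional neural network
- Artificial intelligence: machines that simulate human consciousness and thinking
- Machine learning: if a program's performance P on a task T improves as its experience E grows, the program learns from experience
  - Three elements:
    - data
    - algorithms
    - computing power
- The machine learning process:
  - Single-neuron model: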
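II. A Quick Tour of Python Syntax
(A) Some Linux commands
pwd : print the current directory (pwd prints an absolute path starting from the root directory)
ls : list the files and directories in the current path
mkdir newName : create a folder named newName in the current path
cd name : enter the folder name
sudo rm -r filename : delete a folder and everything in it (rm may still ask for confirmation in some cases)
sudo rm -rf filename : delete a folder and everything in it, forcing the deletion without any prompt
Ubuntu vim:
vim filename.py : open, or create, the text file filename.py
python filename.py : run the Python file filename.py
[esc] :q : quit vim
[esc] :wq : save and quit
[esc] :q! : quit without saving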
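(B) Basic Python syntax
1. Basics
\ : escape character, e.g. \t means a tab
% : placeholder; each % in a format string is replaced by the value written after the % operator, e.g. "i am counting %s" % i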
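2. Lists: list[num]
list_name[start:end] : slice; the interval is half-open (start included, end excluded)
list_name[:] : all elements
list_name[start:end:step] : starting at start, take one element every step elements; the step has a direction (a negative step walks backwards), and end may be omitted
list_name[index] = new_value : modify an element
del list_name[index] : delete an element
list_name.insert(index, new_element) : insert new_element at position index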
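Here is a minimal Python 3 sketch that exercises the slicing and editing rules above. The list `nums` and its values are arbitrary illustrations and do not come from the course:

```python
nums = [0, 1, 2, 3, 4, 5, 6, 7]

first_three = nums[0:3]   # half-open: indices 0, 1 and 2
everything = nums[:]      # a (shallow) copy of the whole list
evens = nums[0::2]        # start at 0, take every 2nd element, end omitted
backwards = nums[::-1]    # a negative step walks from the end to the start

nums[1] = 10              # modify the element at index 1
del nums[2]               # delete the element at index 2
nums.insert(0, -1)        # insert -1 before index 0

print(first_three, evens, backwards[:3])
print(nums)               # [-1, 0, 10, 3, 4, 5, 6, 7]
```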
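3. Tuples: tuple(num)
- Once a tuple is defined, it cannot be changed
4. Dictionaries: dic = {key1: value1, key2: value2, ...}
dic[key_x] : indexing; gives value_x
- Example:
dic = {1:"123", "name":"Mike", "height":178}
- Indexing:
dic["name"] gives "Mike"
dic[key_i] = new_value_i : modify
del dic[key_i] : delete
dic[new_key] = new_value : insert (assigning to a key that does not exist yet adds it)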
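The same dictionary operations as a runnable Python 3 sketch. The extra key `"age"` and the new height value are made up for illustration:

```python
dic = {1: "123", "name": "Mike", "height": 178}

name = dic["name"]   # indexing by key gives "Mike"
dic["height"] = 180  # existing key: its value is modified
del dic[1]           # the entry with key 1 is deleted
dic["age"] = 20      # new key: a new entry is inserted

print(name)
print(dic)           # {'name': 'Mike', 'height': 180, 'age': 20}
```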
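5. Conditionals
(1)
if condition:
    do_something
(2)
if condition_1:
    task_1
else:
    task_2
(3)
if condition_1:
    task_1
elif condition_2:
    task_2
...
else:
    task_n
- Notes:
  - Python uses indentation (left alignment) to express the nesting of code blocks
  - Error:
SyntaxError: Non-ASCII character '\xe8' in file a.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for detail
Cause and fix: without a source-encoding declaration, Python 2 cannot decode the Chinese (non-ASCII) characters in the file. Add #coding:utf-8 as the first line of the .py file.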
6. Loops
(1)
for variable in range(start, end):
    do_something
(2)
for variable in list_name:
    do_something
(3)
while condition:
    do_something
(4) Use break to terminate a loop
- An example:
code:
for i in range(0, 5):
    print "i am counting %s" % i
Output:
i am counting 0
i am counting 1
i am counting 2
i am counting 3
i am counting 4
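7. Functions
(1) Defining a function:
def function_name(parameters):
    function_body
(2) Calling a function:
function_name(arguments)
(3) Built-in functions: functions that ship with the Python interpreter
e.g. abs(num)  # absolute value
8. Modules
A module is a collection of functions. Import it first, then use it
import time
time.asctime()  # returns the current time as a string
9. Packages
A package contains multiple modules
from PIL import Image  # import the Image module from the PIL package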
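10. Classes, objects and instantiation
- Class: a collection of functions; a mold from which objects can be instantiated
- Instantiation: obj = ClassName()
- Object: an individual instantiated from a class; it does the actual work
- Object-oriented programming: the programmer refines a class repeatedly, then instantiates objects, and the objects call the class's functions to carry out concrete operations
Defining a class:
class ClassName(ParentClassName):
    methods
(1) When a function is defined inside a class, the syntax requires its first parameter to be self
For example:
class Animal:
    def breath(self):
        print "breathing"
(2) The __init__ function runs automatically when a new object is instantiated and gives the new object its initial values
For example:
class Cats(Animal):
    def __init__(self, spots):
        self.spots = spots
    def catch_mouse(self):
        print "catch mouse"
Instantiation:
kitty = Cats(10)
print kitty.spots  # 10
kitty.catch_mouse()  # catch mouse
(3) An object calls a class function with object_name.function_name(),
and accesses a class variable with object_name.variable_name
(4) Inside a class, a function that calls its own or its parent class's functions or variables must prefix them with self.,
that is, self.function_name
or self.variable_name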
- A complete example
animal.py
class Animals():
    def breath(self):
        print "breathing"
    def move(self):
        print "moving"
    def eat(self):
        print "eating food"

class Mammals(Animals):
    def breastfeed(self):
        print "feeding young"

class Cats(Mammals):
    def __init__(self, spots):
        self.spots = spots
    def catch_mouse(self):
        print "catching mouse"
    def left_foot_forward(self):
        print "left foot forward"
    def left_foot_backward(self):
        print "left foot backward"
    def dance(self):
        self.left_foot_forward()
        self.left_foot_backward()
        self.left_foot_backward()
        self.left_foot_forward()

kitty = Cats(10)
print kitty.spots
kitty.dance()
kitty.breastfeed()
kitty.move()
Output:
10
left foot forward
left foot backward
left foot backward
left foot forward
feeding young
moving
11. Files
(1) Writing: open -> store -> close
import pickle  # import the pickle module
file_var = open("path/filename (e.g. save.dat)", "wb")  # open
pickle.dump(variable_to_write, file_var)  # store
file_var.close()  # close
For example:
The variable (data) to write:
game_data = {
    "position": "N2 E3",
    "pocket": ["key", "knife"],
    "money": 160
}
Write it to the file save.dat:
save_file = open("save.dat", "wb")
pickle.dump(game_data, save_file)
save_file.close()
(2) Reading: open -> load -> close
import pickle
file_var = open("path/filename", "rb")  # open
content_var = pickle.load(file_var)  # load
file_var.close()  # close
For example:
load_file = open("save.dat", "rb")
load_game_data = pickle.load(load_file)
load_file.close()
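Here is a sketch of the same pickle round trip in Python 3. It assumes a temporary directory (from `tempfile`) instead of the current directory. The `with` statement closes each file automatically, so no explicit close step is needed:

```python
import os
import pickle
import tempfile

game_data = {
    "position": "N2 E3",
    "pocket": ["key", "knife"],
    "money": 160,
}

# illustrative choice: write into a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "save.dat")

# write: open -> dump; leaving the with block closes the file
with open(path, "wb") as save_file:
    pickle.dump(game_data, save_file)

# read: open -> load; leaving the with block closes the file
with open(path, "rb") as load_file:
    load_game_data = pickle.load(load_file)

print(load_game_data)
```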
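Supplement
- Python has no access-control keywords (such as C++'s private), but it uses naming conventions for access control:
(1) A single leading underscore means "protected" (only the class itself and its subclasses should use it), e.g. _foo; this is a convention and is not enforced
(2) A double leading underscore means "private", e.g. __foo; Python mangles the name to _ClassName__foo
(3) Double leading and trailing underscores mark special methods, e.g. __init__()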
III. Some Modules
(A) The turtle module
1. Problems and fixes
No module named _tkinter
- python2.7:
sudo apt-get install python-tk
- python3:
sudo apt-get install python3-tk
2. Basic operations
import turtle : import the turtle module
t = turtle.Pen() : instantiate an object named t from the Pen class
t.forward(n) : move t forward by n pixels
t.backward(n) : move t back by n pixels
t.left(n) : turn t left by n degrees
t.right(n) : turn t right by n degrees
t.reset() : reset t
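(B) The matplotlib module
1. Installation
sudo pip install matplotlib
2. Purpose
- Graphical visualization
3. Usage
# import the modules
import numpy as np
import matplotlib.pyplot as plt
# plot the data points
plt.scatter(x_coordinates, y_coordinates, c="color")
plt.show()
# build grid coordinates covering the plotting area
xx, yy = np.mgrid[start:stop:step, start:stop:step]
# flatten the x and y coordinates (each into one row) and pair them into a matrix that collects every grid point in the area
grid = np.c_[xx.ravel(), yy.ravel()]
# feed the grid points into the neural network; probs holds the value of each point (how red or how blue it is)
probs = sess.run(y, feed_dict={x: grid})
probs = probs.reshape(xx.shape)
# draw the contour line and display the figure
plt.contour(x_coordinates, y_coordinates, height_of_each_point, levels=[contour_height])
plt.show()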
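IV. The TensorFlow Framework
(A) Tensors, computation graphs and sessions
- A TensorFlow-based NN: tensors represent the data, a computation graph builds the neural network, a session executes the graph, and optimizing the weights (parameters) on the connections produces the model
- Tensor: a multi-dimensional array (list)
- A tensor can represent data of rank 0 to rank n (the number of nested square brackets gives the rank)
- Rank: the number of dimensions of a tensor
- Data types:
tf.float32 : 32-bit floating point; tf.int32 : 32-bit integer
tf.constant(value) : defines a constant
- Computation graph (graph): describes the computation of a neural network. It only builds the computation and does not run it. A graph holds one or more computation nodes (neurons)
$ Y = XW = x_1 w_1 + x_2 w_2 $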
- An example: multiplying two constant matrices
Create the file tf3_1.py
import tensorflow as tf
a = tf.constant([[1.0, 2.0]])
b = tf.constant([[3.0], [4.0]])
y = tf.matmul(a, b)
print y
Output
Tensor("MatMul:0", shape=(1, 1), dtype=float32)
Analysis (printing y shows only the tensor's description, not its value; a session is needed to compute the value):
- MatMul:0 : the result is a tensor named MatMul:0
- shape($x_1, x_2, x_3, ...$): the dimensions of the tensor. The number of $x_i$ is the number of dimensions, and each $x_i$ is the length of the array along that dimension
- dtype : the data type
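To make rank and shape concrete, here is a small pure-Python helper. It is hypothetical and not part of TensorFlow. It computes the shape of a rectangular nested list, and the rank is the length of that shape, i.e. the number of nested brackets:

```python
def shape_of(data):
    """Shape of a rectangular nested list; a bare scalar has shape ()."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        # assumes all sub-lists at one level have equal length, so the first is representative
        data = data[0] if data else None
    return tuple(shape)

for value in (5, [1, 2, 3], [[1.0, 2.0]], [[3.0], [4.0]]):
    s = shape_of(value)
    print("value=%r rank=%d shape=%r" % (value, len(s), s))
```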
- Session: executes the node operations in the computation graph
with tf.Session() as sess:
    print sess.run(node_to_compute)
- An example
Create the file tf3_2.py
import tensorflow as tf
x = tf.constant([[1.0, 2.0]])
w = tf.constant([[3.0], [4.0]])
y = tf.matmul(x, w)
print y
with tf.Session() as sess:
    print sess.run(y)
Output
Tensor("MatMul:0", shape=(1, 1), dtype=float32)
[[11.]]
Note: warnings like the following may appear:
Tensor("MatMul:0", shape=(1, 1), dtype=float32)
2019-07-13 17:57:09.100298: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-07-13 17:57:09.100369: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-07-13 17:57:09.100393: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-07-13 17:57:09.100412: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-07-13 17:57:09.100443: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[[11.]]
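These warnings appear because the CPU supports some acceleration instructions that this TensorFlow build was not compiled to use. The messages can be hidden as follows:
(1) xxx@aaa:~/tf$ vim ~/.bashrc
open the .bashrc file in the home directory
(2) Append the line export TF_CPP_MIN_LOG_LEVEL=2
this lowers TensorFlow's log level; save and quit
(3) source ~/.bashrc
make the new configuration take effect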
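The value [[11.]] printed by tf3_2.py is an ordinary matrix product, $1 \times 3 + 2 \times 4 = 11$. The plain-Python sketch below reproduces it. The `matmul` helper is a hypothetical stand-in for `tf.matmul` and uses no graph or session:

```python
def matmul(a, b):
    """Multiply an m x n matrix a by an n x p matrix b (both nested lists)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

x = [[1.0, 2.0]]     # 1 x 2, same values as in tf3_2.py
w = [[3.0], [4.0]]   # 2 x 1
print(matmul(x, w))  # [[11.0]]
```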
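(B) Forward propagation
- Parameters: the weights $w_i$, represented as variables and usually given random initial values
w = tf.Variable(tf.random_normal([2,3], stddev=2, mean=0, seed=1))
tf.random_normal() : normally distributed random numbers
tf.truncated_normal() : normally distributed random numbers with overly deviant points removed (values more than 2 standard deviations from the mean are re-drawn)
tf.random_uniform() : uniformly distributed random numbers
tf.random_normal([2,3]) : produces a 2×3 matrix
stddev=2 : standard deviation 2
mean=0 : mean 0
seed=1 : random seed; with a fixed seed the same values are generated on every run, without it they differ each time
- The standard deviation, mean and random seed can be omitted if there is no special requirement
tf.zeros : all-zeros array, e.g. tf.zeros([3,2], tf.int32)
tf.ones : all-ones array, e.g. tf.ones([3,2], tf.int32)
tf.fill : array filled with a given value, e.g. tf.fill([3,2], 6)
tf.constant : values given directly, e.g. tf.constant([3,2,1]) produces [3,2,1]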
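Here is a rough sketch of what "truncated" means. The pure-Python function below is hypothetical and built on the standard `random` module rather than TensorFlow. It draws normal samples and re-draws any value more than 2 standard deviations from the mean, which is the rule the TensorFlow documentation gives for `tf.truncated_normal`:

```python
import random

def truncated_normal(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n normal samples, re-drawing any that land more than 2 stddev from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2 * stddev:   # keep only values within 2 standard deviations
            samples.append(v)
    return samples

vals = truncated_normal(1000, mean=0.0, stddev=2.0, seed=1)
print(min(vals), max(vals))   # both lie inside [-4, 4]
```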
- Neural network implementation process:
Training process:
1. Prepare the dataset, extract features, and feed them to the neural network as input
2. Forward propagation: build the NN structure from input to output (build the computation graph first, then execute it with a session)
   NN forward-propagation algorithm ==> compute the output
3. Backpropagation: feed large amounts of feature data to the NN and iteratively optimize the NN parameters
   NN backpropagation algorithm ==> optimize the parameters and train the model
Usage process:
4. Use the trained model to predict and classify
- Forward propagation: build the model and perform inference
- An example of a fully connected network
A batch of parts is produced. The volume $x_1$ and weight $x_2$ of each part are fed into the NN as features, and the NN outputs a single number.
- Describing the computation in TensorFlow:
(1) $X$ is the 1×2 input matrix; $W^{(\text{layer})}_{\text{previous node},\ \text{next node}}$ are the parameters to be optimized
(2) $W^{(1)} = \begin{bmatrix} w_{1,1}^{(1)} & w_{1,2}^{(1)} & w_{1,3}^{(1)} \\ w_{2,1}^{(1)} & w_{2,2}^{(1)} & w_{2,3}^{(1)} \end{bmatrix}$ is a 2×3 matrix
(3) $a^{(1)} = [a_{11}, a_{12}, a_{13}] = XW^{(1)}$ is a 1×3 matrix
(4) $W^{(2)} = \begin{bmatrix} w_{1,1}^{(2)} \\ w_{2,1}^{(2)} \\ w_{3,1}^{(2)} \end{bmatrix}$ is a 3×1 matrix
(5) $y = a^{(1)}W^{(2)}$
Expressed as two statements:
a = tf.matmul(X, W1)
y = tf.matmul(a, W2)
a: the first computation layer (the first network layer)
$W^{(1)}$: the first-layer parameters
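- Both variable initialization and computing graph nodes must be done in a session:
with tf.Session() as sess:
    sess.run()
- Variable initialization: pass tf.global_variables_initializer() to sess.run
init_op = tf.global_variables_initializer()
sess.run(init_op)
- Computing graph nodes: pass the node to be computed to sess.run
sess.run(y)
- Use tf.placeholder to reserve a slot for the input, and feed the data through feed_dict in sess.run()
# feed one group of data:
x = tf.placeholder(tf.float32, shape=(1, 2))
sess.run(y, feed_dict={x: [[0.5, 0.6]]})
# feed several groups of data:
x = tf.placeholder(tf.float32, shape=(None, 2))
sess.run(y, feed_dict={x: [[0.1, 0.2], [0.3, 0.4], [0.4, 0.5]]})
Note: in shape=(x, y), x is the number of groups of data fed into the network (None means the number is not known in advance), and y is the number of features per group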
- An example
#coding:utf-8
# a simple two-layer neural network (fully connected)
import tensorflow as tf

# define the input and the parameters
# use a placeholder for the input
x = tf.placeholder(tf.float32, shape=(None, 2))
w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

# define the forward-propagation process
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

# compute the result in a session
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    print "the result of y is:\n", sess.run(y, feed_dict={x: [[0.7, 0.5], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5]]})
    print "w1:\n", sess.run(w1)
    print "w2:\n", sess.run(w2)
Output:
the result of y is:
[[3.0904665]
[1.2236414]
[1.7270732]
[2.2305048]]
w1:
[[-0.8113182 1.4845988 0.06532937]
[-2.4427042 0.0992484 0.5912243 ]]
w2:
[[-0.8113182 ]
[ 1.4845988 ]
[ 0.06532937]]
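The forward pass is just two matrix products, so the printed outputs can be checked without TensorFlow. This sketch copies the printed `w1` and `w2` values and redoes `a = x·w1`, `y = a·w2` in plain Python. The `matmul` helper is hypothetical. The results should agree with the TensorFlow output up to float32 rounding:

```python
def matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix (nested lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# parameter values copied from the printed w1 and w2 above
w1 = [[-0.8113182, 1.4845988, 0.06532937],
      [-2.4427042, 0.0992484, 0.5912243]]
w2 = [[-0.8113182], [1.4845988], [0.06532937]]

x = [[0.7, 0.5], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5]]  # the 4 x 2 batch passed through feed_dict

a = matmul(x, w1)   # 4 x 3 hidden layer
y = matmul(a, w2)   # 4 x 1 output
for row in y:
    print("%.7f" % row[0])
```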
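(C) Backpropagation
- Backpropagation: trains the model parameters by running gradient descent on all parameters, so that the NN model's loss on the training data is minimized
- Loss function (loss): the gap between the prediction ($y$) and the known answer ($y_{\_}$)
- Mean squared error MSE: $MSE(y_{\_}, y) = \frac{\sum_{i=1}^{n}(y - y_{\_})^2}{n}$
loss = tf.reduce_mean(tf.square(y_ - y))
- Backpropagation training methods, all with reducing the loss as the optimization goal:
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)  # gradient descent
train_step = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss)  # Momentum optimizer
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)  # Adam optimizer
- Learning rate: determines how far the parameters move in each update (usually a small value such as 0.001, depending on the case)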
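The following pure-Python sketch shows how the MSE loss and a gradient-descent step fit together. It assumes a toy one-weight model `y = w * x` with made-up data whose true answer is `y_ = 2x`. Each update moves `w` against the gradient of the loss, so the loss shrinks:

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of (y - y_)^2 over the n samples."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

xs = [1.0, 2.0, 3.0]   # toy inputs (illustrative)
ys = [2.0, 4.0, 6.0]   # known answers y_ = 2x

w = 0.0
learning_rate = 0.05
losses = []
for step in range(5):
    preds = [w * x for x in xs]
    losses.append(mse(ys, preds))
    # d(loss)/dw = (2/n) * sum((w*x - y_) * x)
    grad = 2.0 / len(xs) * sum((p - t) * x for p, t, x in zip(preds, ys, xs))
    w -= learning_rate * grad   # gradient-descent update
    print("step %d: loss=%.4f, w=%.4f" % (step, losses[-1], w))
```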
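- An example:
Description: a batch of parts has two features, volume and weight, and a label: qualified or not. Use a neural network to predict and classify the parts.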
#coding:utf-8
# 0. import modules and generate a simulated dataset
import tensorflow as tf
import numpy as np  # scientific computing module
BATCH_SIZE = 8  # feed 8 groups of data at a time
SEED = 23455

# generate random numbers based on the seed
rdm = np.random.RandomState(SEED)
# a 32x2 random matrix: 32 groups of (volume, weight) as the input dataset
X = rdm.rand(32, 2)
# for each row of X: if the sum is less than 1, the label is 1 (qualified), otherwise 0
# these are the labels (correct answers); since there is no real dataset, samples and labels are simulated
Y_ = [[int(x0 + x1 < 1)] for (x0, x1) in X]
print "X:\n", X
print "Y_:\n", Y_

# 1. define the NN input, parameters and output, and the forward-propagation process
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))

w1 = tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

# 2. define the loss function and the backpropagation method
loss_mse = tf.reduce_mean(tf.square(y - y_))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)
#train_step = tf.train.MomentumOptimizer(0.001, 0.9).minimize(loss_mse)
#train_step = tf.train.AdamOptimizer(0.001).minimize(loss_mse)

# 3. create a session and train for STEPS rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # print the current (untrained) parameter values
    print "w1:\n", sess.run(w1)
    print "w2:\n", sess.run(w2)
    print "\n"

    # train the model
    STEPS = 3000
    for i in range(STEPS):
        start = (i * BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 500 == 0:
            total_loss = sess.run(loss_mse, feed_dict={x: X, y_: Y_})
            print("After %d training step(s), loss_mse on all data is %g" % (i, total_loss))

    # print the trained parameter values
    print "\n"
    print "w1:\n", sess.run(w1)
    print "w2:\n", sess.run(w2)
- Output:
X:
[[0.83494319 0.11482951]
[0.66899751 0.46594987]
[0.60181666 0.58838408]
[0.31836656 0.20502072]
[0.87043944 0.02679395]
[0.41539811 0.43938369]
[0.68635684 0.24833404]
[0.97315228 0.68541849]
[0.03081617 0.89479913]
[0.24665715 0.28584862]
[0.31375667 0.47718349]
[0.56689254 0.77079148]
[0.7321604 0.35828963]
[0.15724842 0.94294584]
[0.34933722 0.84634483]
[0.50304053 0.81299619]
[0.23869886 0.9895604 ]
[0.4636501 0.32531094]
[0.36510487 0.97365522]
[0.73350238 0.83833013]
[0.61810158 0.12580353]
[0.59274817 0.18779828]
[0.87150299 0.34679501]
[0.25883219 0.50002932]
[0.75690948 0.83429824]
[0.29316649 0.05646578]
[0.10409134 0.88235166]
[0.06727785 0.57784761]
[0.38492705 0.48384792]
[0.69234428 0.19687348]
[0.42783492 0.73416985]
[0.09696069 0.04883936]]
Y_:
[[1], [0], [0], [1], [1], [1], [1], [0], [1], [1], [1], [0], [0], [0], [0], [0], [0], [1], [0], [0], [1], [1], [0], [1], [0], [1], [1], [1], [1], [1], [0], [1]]
w1:
[[-0.8113182 1.4845988 0.06532937]
[-2.4427042 0.0992484 0.5912243 ]]
w2:
[[-0.8113182 ]
[ 1.4845988 ]
[ 0.06532937]]
After 0 training step(s), loss_mse on all data is 5.13118
After 500 training step(s), loss_mse on all data is 0.429111
After 1000 training step(s), loss_mse on all data is 0.409789
After 1500 training step(s), loss_mse on all data is 0.399923
After 2000 training step(s), loss_mse on all data is 0.394146
After 2500 training step(s), loss_mse on all data is 0.390597
w1:
[[-0.7000663 0.9136318 0.08953571]
[-2.3402493 -0.14641267 0.58823055]]
w2:
[[-0.06024267]
[ 0.91956186]
[-0.0682071 ]]
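The `start`/`end` arithmetic in the training loop cycles through the 32 samples in batches of 8. The sketch below shows the windows it produces. It assumes the dataset size is a multiple of `BATCH_SIZE`; otherwise the last slice would simply come out shorter:

```python
BATCH_SIZE = 8
DATASET_SIZE = 32   # X has 32 rows in the example above

windows = []
for i in range(6):
    start = (i * BATCH_SIZE) % DATASET_SIZE
    end = start + BATCH_SIZE
    windows.append((start, end))
    print("step %d feeds rows %d..%d" % (i, start, end - 1))
```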
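(D) The standard template for building a neural network: preparation, forward propagation, backpropagation, iteration
- Preparation:
import
constant definitions
generate the dataset
- Forward propagation:
x =
y_ =
w1 =
w2 =
a =
y =
- Backpropagation: define the loss function and the backpropagation method
loss =
train_step =
- Create a session and train for STEPS rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 3000  # number of iterations
    for i in range(STEPS):
        start =
        end =
        sess.run(train_step, feed_dict={...})
- Note: real tasks involve large amounts of data, so print is commonly used to show how the parameters change during training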
V. Neural Network Optimization
(A) Loss functions
- The 1943 McCulloch-Pitts neuron model
  - $f(\sum x_i w_i + b)$
  - $f$ is the activation function
  - $b$ is the bias
- Activation functions
  - relu: $f(x) = \max(x, 0) = \begin{cases} 0 & x \le 0 \\ x & x > 0 \end{cases}$
tf.nn.relu()
  - sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$
tf.nn.sigmoid()
  - tanh: $f(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}$
tf.nn.tanh()
- NN complexity: usually described by the number of NN layers and the number of NN parameters
  - Number of layers = number of hidden layers + 1 output layer
  - Total parameters = total W + total b
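Here is a pure-Python sketch of the three activation functions and of the complexity count, using only the standard `math` module. The helper names are hypothetical. The count assumes one bias per hidden or output neuron, for example in a 2-3-1 network:

```python
import math

def relu(x):
    return max(x, 0.0)                        # f(x) = max(x, 0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))         # f(x) = 1 / (1 + e^-x)

def tanh(x):
    # f(x) = (1 - e^-2x) / (1 + e^-2x); fine for moderate x, overflows for very negative x
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))

def count_layers_and_params(layer_sizes):
    """layer_sizes lists neuron counts from the input to the output layer, e.g. [2, 3, 1]."""
    n_layers = len(layer_sizes) - 1           # hidden layers + 1 output layer; the input layer is not counted
    n_w = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))   # one weight per connection
    n_b = sum(layer_sizes[1:])                # one bias per hidden or output neuron
    return n_layers, n_w + n_b

print(relu(-2.0), relu(3.0))
print(sigmoid(0.0))
print(tanh(1.0), math.tanh(1.0))              # the formula agrees with math.tanh
print(count_layers_and_params([2, 3, 1]))     # 2 layers; 2*3 + 3*1 = 9 weights plus 3 + 1 = 4 biases
```

For very negative inputs, `math.tanh` is the robust choice, since the explicit formula overflows in `math.exp`.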
- NN optimization covers the loss function (loss), the learning rate (learning_rate), the exponential moving average (ema) and regularization
- Loss function (loss): the gap between the prediction (y) and the known answer (y_)
- NN optimization goal: minimize the loss, using one of
  - mean squared error: mse (Mean Squared Error)
  - a custom loss function
  - cross entropy: ce (Cross Entropy)
- Mean squared error mse: $MSE(y_{\_}, y) = \frac{\sum_{i=1}^{n}(y - y_{\_})^2}{n}$
loss_mse = tf.reduce_mean(tf.square(y_ - y))
An example: predict the daily sales y of yogurt, where x1 and x2 are factors that affect daily sales. Before modeling, the data to collect are the daily x1, x2 and sales y_, i.e. the known answer; ideally production equals sales. Here a dataset X, Y_ is synthesized as y_ = x1 + x2 plus noise in -0.05~0.05. The goal is to fit a function that predicts sales.
The code:
#coding:utf-8
# over-predicting and under-predicting have the same cost here
# 0. import modules and generate the dataset
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8
SEED = 23455

rdm = np.random.RandomState(SEED)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + rdm.rand()/10.0 - 0.05] for (x1, x2) in X]  # noise in [-0.05, 0.05)

# 1. define the NN input, parameters and output, and the forward-propagation process
x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)

# 2. define the loss function and the backpropagation method
# the loss is MSE and the backpropagation method is gradient descent
loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

# 3. create a session and train for STEPS rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 32
        end = (i*BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 500 == 0:
            print "After %d traning steps, w1 is: " % (i)
            print sess.run(w1), "\n"
    print "Final w1 is :\n", sess.run(w1)
Output:
After 0 traning steps, w1 is:
[[-0.80974597]
[ 1.4852903 ]]
After 500 traning steps, w1 is:
[[-0.46074435]
[ 1.641878 ]]
After 1000 traning steps, w1 is:
[[-0.21939856]
[ 1.6984766 ]]
After 1500 traning steps, w1 is:
[[-0.04415595]
[ 1.7003176 ]]
After 2000 traning steps, w1 is:
[[0.08942621]
[1.673328 ]]
After 2500 traning steps, w1 is:
[[0.19583555]
[1.6322677 ]]
After 3000 traning steps, w1 is:
[[0.28375748]
[1.5854434 ]]
After 3500 traning steps, w1 is:
[[0.35848638]
[1.5374472 ]]
After 4000 traning steps, w1 is:
[[0.42332518]
[1.4907393 ]]
After 4500 traning steps, w1 is:
[[0.48040026]
[1.4465574 ]]
After 5000 traning steps, w1 is:
[[0.53113604]
[1.4054536 ]]
After 5500 traning steps, w1 is:
[[0.5765325]
[1.3675941]]
...
After 16000 traning steps, w1 is:
[[0.95107025]
[1.0415728 ]]
After 16500 traning steps, w1 is:
[[0.9560928]
[1.037164 ]]
After 17000 traning steps, w1 is:
[[0.96064115]
[1.0331714 ]]
After 17500 traning steps, w1 is:
[[0.96476096]
[1.0295546 ]]
After 18000 traning steps, w1 is:
[[0.9684917]
[1.0262802]]
After 18500 traning steps, w1 is:
[[0.9718707]
[1.0233142]]
After 19000 traning steps, w1 is:
[[0.974931 ]
[1.0206276]]
After 19500 traning steps, w1 is:
[[0.9777026]
[1.0181949]]
Final w1 is :
[[0.98019385]
[1.0159807 ]]
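As the number of iterations grows, both parameters approach 1.
Fitted result:
$ y = 0.98x_1 + 1.02x_2 $
(In practice, however, when forecasting sales, predicting too much loses cost and predicting too little loses profit, and the two losses are usually not equal. So next we use a custom loss function.)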
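The same fit can be sketched in pure Python. This sketch makes a few assumptions: the data come from Python's `random.Random(23455)` rather than NumPy's `RandomState`, so the samples differ from those above; `w` starts at 0; and each step uses the full 32-sample batch with a larger learning rate of 0.5 instead of mini-batches of 8 with 0.001. The weights should still settle near 1:

```python
import random

rng = random.Random(23455)   # Python's generator: the samples differ from NumPy's RandomState(23455)
X = [(rng.random(), rng.random()) for _ in range(32)]
Y_ = [x1 + x2 + rng.random() / 10.0 - 0.05 for (x1, x2) in X]  # y_ = x1 + x2 + noise in [-0.05, 0.05)

w = [0.0, 0.0]
learning_rate = 0.5          # larger than 0.001 because every step uses all 32 samples
n = len(X)
for step in range(2000):
    grad = [0.0, 0.0]
    for (x1, x2), target in zip(X, Y_):
        err = w[0] * x1 + w[1] * x2 - target   # y - y_
        grad[0] += 2.0 * err * x1 / n          # d(MSE)/dw for the x1 weight
        grad[1] += 2.0 * err * x2 / n          # d(MSE)/dw for the x2 weight
    w = [w[0] - learning_rate * grad[0], w[1] - learning_rate * grad[1]]

print("weights: %.3f, %.3f" % (w[0], w[1]))
```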
- Custom loss function
If profit ≠ cost, the loss produced by mse does not maximize profit
Custom loss function $loss(y_{\_}, y) = \sum_n f(y_{\_}, y)$
, where y_ is the standard answer from the dataset and y is the computed prediction
$f(y_{\_}, y) = \begin{cases} PROFIT \cdot (y_{\_} - y) & y < y_{\_} & \text{predicted too little, profit is lost} \\ COST \cdot (y - y_{\_}) & y \ge y_{\_} & \text{predicted too much, cost is lost} \end{cases}$
loss = tf.reduce_sum(tf.where(tf.greater(y, y_), COST*(y - y_), PROFIT*(y_ - y)))
For example, when predicting yogurt sales, the cost (COST) is 1 yuan and the profit (PROFIT) is 9 yuan.
Predicting too little loses 9 yuan of profit; predicting too much loses 1 yuan of cost.
Under-predicting costs more, so we want the function to predict on the high side.
The code:
#coding:utf-8
# custom loss: under-predicting (lost profit) costs more than over-predicting (lost cost)
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8
SEED = 23455
COST = 1
PROFIT = 9

rdm = np.random.RandomState(SEED)
X = rdm.rand(32, 2)
Y_ = [[x1 + x2 + rdm.rand()/10.0 - 0.05] for (x1, x2) in X]

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))
w1 = tf.Variable(tf.random_normal([2, 1], stddev=1, seed=1))
y = tf.matmul(x, w1)

loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_)*COST, (y_ - y)*PROFIT))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 32
        end = (i*BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 500 == 0:
            print "After %d training steps, w1 is: " % (i)
            print sess.run(w1), "\n"
    print "Final w1 is :\n", sess.run(w1)
The fitted result:
Final w1 is :
[[1.020171 ]
[1.0425103]]
Both parameters are greater than 1, so the model leans toward predicting more.
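To see why this loss pushes predictions upward, here is a plain-Python sketch of the same piecewise loss with COST = 1 and PROFIT = 9. The function name is hypothetical. Missing by the same amount costs nine times more when the prediction is too low:

```python
COST = 1
PROFIT = 9

def custom_loss(y_true, y_pred):
    """Sum of COST*(y - y_) when y > y_ (over-prediction) and PROFIT*(y_ - y) otherwise."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if p > t:                     # mirrors tf.greater(y, y_)
            total += COST * (p - t)
        else:
            total += PROFIT * (t - p)
    return total

print(custom_loss([1.0], [1.5]))      # over by 0.5:  1 * 0.5 = 0.5
print(custom_loss([1.0], [0.5]))      # under by 0.5: 9 * 0.5 = 4.5
```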
- Cross entropy ce (Cross Entropy): measures the distance between two probability distributions
$H(y_{\_}, y) = -\sum y_{\_} \cdot \log y$
ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))
- values of y below $10^{-12}$ are set to $10^{-12}$, and values above 1.0 are set to 1.0
- softmax():
  - When the n outputs $(y_1, y_2, ..., y_n)$ of an n-class problem pass through softmax(), they satisfy the requirements of a probability distribution: $\forall x, P(X=x) \in [0,1]$ and $\sum_x P(X=x) = 1$
  - $softmax(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{n} e^{y_j}}$
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)
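Here is a pure-Python sketch of softmax followed by the clipped cross entropy, for a single sample with a made-up logit vector and a one-hot label. With a one-hot label, the cross entropy reduces to minus the log of the probability assigned to the correct class. That is what the sparse softmax cross-entropy op computes from the class index:

```python
import math

def softmax(logits):
    m = max(logits)                                 # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_pred, eps=1e-12):
    # clip like tf.clip_by_value(y, 1e-12, 1.0), then H(y_, y) = -sum(y_ * log(y))
    return -sum(t * math.log(min(max(p, eps), 1.0)) for t, p in zip(y_true, y_pred))

logits = [2.0, 1.0, 0.1]        # made-up network outputs for 3 classes
probs = softmax(logits)
label = [1, 0, 0]               # one-hot answer: the sample belongs to class 0
ce = cross_entropy(label, probs)

print(probs, sum(probs))
print(ce, -math.log(probs[0]))  # the two numbers agree
```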
(B) Learning rate
- Learning rate learning_rate: the size of each parameter update
$w_{n+1} = w_n - learning\_rate \cdot \nabla$
- $w_{n+1}$: the updated parameter
- $w_n$: the current parameter
- learning_rate: the learning rate
- $\nabla$: the gradient (derivative) of the loss function
An example
#coding:utf-8
# let the loss be loss = (w+1)^2 with w initialized to the constant 5. Backpropagation finds the optimal w, i.e. the w with the smallest loss
import tensorflow as tf
# define the parameter w to optimize, initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))
# define the loss function
loss = tf.square(w + 1)
# define the backpropagation method
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
# create a session and train for 40 rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print "After %s steps: w is %f, loss is %f.\n" % (i, w_val, loss_val)
Output:
After 0 steps: w is 2.600000, loss is 12.959999.
After 1 steps: w is 1.160000, loss is 4.665599.
After 2 steps: w is 0.296000, loss is 1.679616.
After 3 steps: w is -0.222400, loss is 0.604662.
After 4 steps: w is -0.533440, loss is 0.217678.
After 5 steps: w is -0.720064, loss is 0.078364.
After 6 steps: w is -0.832038, loss is 0.028211.
After 7 steps: w is -0.899223, loss is 0.010156.
After 8 steps: w is -0.939534, loss is 0.003656.
After 9 steps: w is -0.963720, loss is 0.001316.
After 10 steps: w is -0.978232, loss is 0.000474.
After 11 steps: w is -0.986939, loss is 0.000171.
After 12 steps: w is -0.992164, loss is 0.000061.
After 13 steps: w is -0.995298, loss is 0.000022.
After 14 steps: w is -0.997179, loss is 0.000008.
After 15 steps: w is -0.998307, loss is 0.000003.
After 16 steps: w is -0.998984, loss is 0.000001.
After 17 steps: w is -0.999391, loss is 0.000000.
After 18 steps: w is -0.999634, loss is 0.000000.
After 19 steps: w is -0.999781, loss is 0.000000.
After 20 steps: w is -0.999868, loss is 0.000000.
After 21 steps: w is -0.999921, loss is 0.000000.
After 22 steps: w is -0.999953, loss is 0.000000.
After 23 steps: w is -0.999972, loss is 0.000000.
After 24 steps: w is -0.999983, loss is 0.000000.
After 25 steps: w is -0.999990, loss is 0.000000.
After 26 steps: w is -0.999994, loss is 0.000000.
After 27 steps: w is -0.999996, loss is 0.000000.
After 28 steps: w is -0.999998, loss is 0.000000.
After 29 steps: w is -0.999999, loss is 0.000000.
After 30 steps: w is -0.999999, loss is 0.000000.
After 31 steps: w is -1.000000, loss is 0.000000.
After 32 steps: w is -1.000000, loss is 0.000000.
After 33 steps: w is -1.000000, loss is 0.000000.
After 34 steps: w is -1.000000, loss is 0.000000.
After 35 steps: w is -1.000000, loss is 0.000000.
After 36 steps: w is -1.000000, loss is 0.000000.
After 37 steps: w is -1.000000, loss is 0.000000.
After 38 steps: w is -1.000000, loss is 0.000000.
After 39 steps: w is -1.000000, loss is 0.000000.
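Re-running with other learning rates shows the trade-off: a learning rate that is too large makes w oscillate without converging, and one that is too small converges slowly.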
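The update rule is simple enough to replay in plain Python. In this sketch, the helper `descend` is hypothetical. Each step applies $w_{n+1} = w_n - learning\_rate \cdot 2(w_n + 1)$. With 0.2 it reproduces the table above; with 1.0 it bounces between 5 and -7 forever; with 0.001 it barely moves in 40 steps:

```python
def descend(learning_rate, steps=40, w=5.0):
    """Minimize loss = (w + 1)^2 by gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2.0 * (w + 1.0)          # derivative of (w + 1)^2
        w = w - learning_rate * grad    # w_{n+1} = w_n - learning_rate * grad
    return w

for lr in (0.2, 1.0, 0.001):
    w = descend(lr)
    print("learning_rate=%g: w=%f, loss=%f" % (lr, w, (w + 1.0) ** 2))
```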
- Exponentially decaying learning rate:
  - The learning rate is updated dynamically according to how many rounds of BATCH_SIZE have been run
  - $learning\_rate = LEARNING\_RATE\_BASE \times LEARNING\_RATE\_DECAY^{\frac{global\_step}{LEARNING\_RATE\_STEP}}$
where learning_rate is the resulting learning rate; LEARNING_RATE_BASE is the initial learning rate; LEARNING_RATE_DECAY is the decay rate, in (0, 1); global_step is the number of BATCH_SIZE rounds run so far; and LEARNING_RATE_STEP is how many rounds pass between learning-rate updates, usually $\frac{\text{total number of samples}}{BATCH\_SIZE}$
global_step = tf.Variable(0, trainable=False)  # counter of how many rounds have run
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)  # with staircase=True the learning rate decays in steps; with False it decreases along a smooth curve
An example
#coding:utf-8
# let the loss be loss = (w+1)^2, with w initialized to the constant 5
# an exponentially decaying learning rate descends quickly early on, so it converges in fewer training rounds
import tensorflow as tf

LEARNING_RATE_BASE = 0.1  # initial learning rate
LEARNING_RATE_DECAY = 0.99  # learning-rate decay rate
LEARNING_RATE_STEP = 1  # update the learning rate after this many rounds of BATCH_SIZE; usually total samples / BATCH_SIZE

# counter of how many rounds of BATCH_SIZE have run; starts at 0 and is marked as not trainable
global_step = tf.Variable(0, trainable=False)
# define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)
# define the parameter to optimize, initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))
# define the loss function
loss = tf.square(w + 1)
# define the backpropagation method; passing global_step makes minimize() increment it
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
# create a session and train for 40 rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print "\nAfter %s steps: global_step is %f, w is %f\n learning_rate is %f\n loss is %f" % (i, global_step_val, w_val, learning_rate_val, loss_val)
- Output
After 0 steps: global_step is 1.000000, w is 3.800000
learning_rate is 0.099000
loss is 23.040001
After 1 steps: global_step is 2.000000, w is 2.849600
learning_rate is 0.098010
loss is 14.819419
After 2 steps: global_step is 3.000000, w is 2.095001
learning_rate is 0.097030
loss is 9.579033
After 3 steps: global_step is 4.000000, w is 1.494386
learning_rate is 0.096060
loss is 6.221961
After 4 steps: global_step is 5.000000, w is 1.015167
learning_rate is 0.095099
loss is 4.060896
After 5 steps: global_step is 6.000000, w is 0.631886
learning_rate is 0.094148
loss is 2.663051
After 6 steps: global_step is 7.000000, w is 0.324608
learning_rate is 0.093207
loss is 1.754587
After 7 steps: global_step is 8.000000, w is 0.077684
learning_rate is 0.092274
loss is 1.161403
...
After 33 steps: global_step is 34.000000, w is -0.989550
learning_rate is 0.071055
loss is 0.000109
After 34 steps: global_step is 35.000000, w is -0.991035
learning_rate is 0.070345
loss is 0.000080
After 35 steps: global_step is 36.000000, w is -0.992297
learning_rate is 0.069641
loss is 0.000059
After 36 steps: global_step is 37.000000, w is -0.993369
learning_rate is 0.068945
loss is 0.000044
After 37 steps: global_step is 38.000000, w is -0.994284
learning_rate is 0.068255
loss is 0.000033
After 38 steps: global_step is 39.000000, w is -0.995064
learning_rate is 0.067573
loss is 0.000024
After 39 steps: global_step is 40.000000, w is -0.995731
learning_rate is 0.066897
loss is 0.000018
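Here is a plain-Python replay of the same schedule and updates. It is a sketch that assumes each update uses the learning rate computed from `global_step` before the counter is incremented, which matches the printed table:

```python
import math

LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
LEARNING_RATE_STEP = 1

def decayed_lr(global_step, staircase=True):
    """LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** (global_step / LEARNING_RATE_STEP)."""
    exponent = global_step / LEARNING_RATE_STEP
    if staircase:
        exponent = math.floor(exponent)  # the rate only drops at whole multiples of LEARNING_RATE_STEP
    return LEARNING_RATE_BASE * LEARNING_RATE_DECAY ** exponent

w, global_step = 5.0, 0
history = []
for i in range(40):
    lr = decayed_lr(global_step)   # rate used for this update
    w -= lr * 2.0 * (w + 1.0)      # gradient step on loss = (w + 1)^2
    global_step += 1               # minimize(..., global_step=global_step) increments the counter
    history.append((global_step, w, decayed_lr(global_step), (w + 1.0) ** 2))

for row in history[:2] + history[-1:]:
    print("global_step=%d w=%f learning_rate=%f loss=%f" % row)
```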
(C) Exponential moving average (ema)
- Moving average (shadow value): records the average of every parameter (both w and b) over a recent period of time, which improves the generalization of the model
- When a parameter changes, its shadow follows it slowly
- $\text{shadow} = \text{decay} \times \text{shadow} + (1 - \text{decay}) \times \text{parameter}$
- $\text{initial shadow} = \text{initial parameter value}$
- $\text{decay}\ (MOVING\_AVERAGE\_DECAY) = \min\{MOVING\_AVERAGE\_DECAY, \frac{1 + \text{step}}{10 + \text{step}}\}$
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)  # decay rate, current step
ema_op = ema.apply([])  # list in [] the parameters whose moving average should be computed; tf.trainable_variables() can replace [] to list all trainable parameters automatically
with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')
The moving average of a parameter can be read back with:
ema.average(param_name)
An example
#coding:utf-8
import tensorflow as tf

w1 = tf.Variable(0, dtype=tf.float32)
global_step = tf.Variable(0, trainable=False)
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)

ema_op = ema.apply(tf.trainable_variables())

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    print sess.run([w1, ema.average(w1)])

    sess.run(tf.assign(w1, 1))
    sess.run(ema_op)
    print sess.run([w1, ema.average(w1)])

    sess.run(tf.assign(global_step, 100))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print sess.run([w1, ema.average(w1)])

    for i in range(5):
        sess.run(ema_op)
        print sess.run([w1, ema.average(w1)])
Output
[0.0, 0.0]
[1.0, 0.9]
[10.0, 1.6445453]
[10.0, 2.3281732]
[10.0, 2.955868]
[10.0, 3.532206]
[10.0, 4.061389]
[10.0, 4.547275]
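- Initially w1 is 0 and its moving average is 0. After w1 is set to 1, the moving average becomes 0.9. After the step counter is set to 100 and w1 to 10, the moving average moves closer to 10 with every update.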
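The printed shadow values can be reproduced with a few lines of plain Python. `ema_update` is a hypothetical helper that implements the shadow formula and the `min(...)` decay rule above:

```python
MOVING_AVERAGE_DECAY = 0.99

def ema_update(shadow, param, step):
    """shadow = decay * shadow + (1 - decay) * param, decay = min(MOVING_AVERAGE_DECAY, (1 + step) / (10 + step))."""
    decay = min(MOVING_AVERAGE_DECAY, (1.0 + step) / (10.0 + step))
    return decay * shadow + (1.0 - decay) * param

shadow = 0.0                                 # initial shadow = initial w1 = 0
shadow = ema_update(shadow, 1.0, step=0)     # w1 assigned 1 while global_step is still 0
first = shadow
values = []
for _ in range(6):                           # w1 assigned 10, global_step set to 100, ema_op run 6 times
    shadow = ema_update(shadow, 10.0, step=100)
    values.append(shadow)

print(first)
print(values)
```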
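(D) Regularization
- Overfitting: the model is very accurate on the training set but responds poorly to new data it has not been trained on
- Regularization mitigates overfitting: it adds a model-complexity term to the loss function that penalizes the weights W, which weakens the influence of noise in the training data (b is usually not regularized)
$loss = loss(y, y_{\_}) + REGULARIZER \cdot loss(w)$
- loss(y, y_): the model's loss over all parameters, i.e. the gap between y and y_ (for example mse or cross entropy)
- REGULARIZER: a hyperparameter giving the share of the parameters w in the total loss, i.e. the regularization weight
- loss(w): the regularization term of the parameters w
loss_w = tf.contrib.layers.l1_regularizer(REGULARIZER)(w)  # REGULARIZER * sum of |w|
loss_w = tf.contrib.layers.l2_regularizer(REGULARIZER)(w)  # REGULARIZER * (sum of w^2) / 2
# choose one of the two regularizers above
tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(REGULARIZER)(w))  # add the regularization loss of w to the 'losses' collection
loss = cem + tf.add_n(tf.get_collection('losses'))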
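Here is a pure-Python sketch of what the two regularizers add to the loss, using a made-up 2×2 weight matrix and a stand-in data loss of 0.25. It assumes TensorFlow 1.x semantics, where `l2_regularizer` is built on `tf.nn.l2_loss` and therefore halves the sum of squares:

```python
REGULARIZER = 0.01

def l1_loss(w, scale):
    """scale * sum(|w|), the value l1_regularizer(scale)(w) is described as computing."""
    return scale * sum(abs(v) for row in w for v in row)

def l2_loss(w, scale):
    """scale * sum(w^2) / 2; the halving comes from tf.nn.l2_loss."""
    return scale * sum(v * v for row in w for v in row) / 2.0

w = [[1.0, -2.0], [3.0, 0.5]]            # illustrative weights
data_loss = 0.25                         # hypothetical stand-in for cem or loss_mse
losses = [l2_loss(w, REGULARIZER)]       # plays the role of the 'losses' collection
total_loss = data_loss + sum(losses)     # like cem + tf.add_n(tf.get_collection('losses'))

print(l1_loss(w, REGULARIZER), l2_loss(w, REGULARIZER), total_loss)
```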
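An example:
Random points are scattered on the plane. A point is red if the sum of the squares of its x and y coordinates is less than 2, and blue otherwise. Fit a curve that separates the red points from the blue points.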
1 #coding:utf-8
2 import tensorflow as tf
3 import numpy as np
4 import matplotlib.pyplot as plt
5 BATCH_SIZE = 30
6 seed = 2
7
8 rdm = np.random.RandomState(seed)
9
10 X = rdm.randn(300,2)
11 Y_ = [int(x0*x0 + x1*x1 < 2) for (x0,x1) in X]
12 Y_c = [['red' if y else 'blue'] for y in Y_]
13
14 X = np.vstack(X).reshape(-1,2)#n行2列
15 Y_ = np.vstack(Y_).reshape(-1,1)
16 print X
17 print Y_
18 print Y_c
19
20 plt.scatter(X[:,0], X[:,1], c = np.squeeze(Y_c))#X[:,0]表示取X第一行元素
21 plt.show()
22
23 def get_weight(shape, regularizer):
24 w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
25 tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
26 return w
27
28 def get_bias(shape):
29 b = tf.Variable(tf.constant(0.01, shape=shape))
30 return b
31
32 x = tf.placeholder(tf.float32, shape=(None, 2))
33 y_ = tf.placeholder(tf.float32, shape=(None, 1))
34
35 w1 = get_weight([2,11], 0.01)
36 b1 = get_bias([11])
37 y1 = tf.nn.relu(tf.matmul(x, w1)+b1)
38
39 w2 = get_weight([11,1], 0.01)
40 b2 = get_bias([1])
41 y = tf.matmul(y1, w2)+b2#输出层不激活
42
43 loss_mse = tf.reduce_mean(tf.square(y-y_))
44 loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))
45
46 train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_mse)
47
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 40000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x:X[start:end], y_:Y_[start:end]})
        if i % 2000 == 0:
            loss_mse_v = sess.run(loss_mse, feed_dict={x:X, y_:Y_})
            print("After %d steps, loss is:%f" %(i, loss_mse_v))

    # evaluate the network on a grid over [-3, 3) x [-3, 3) with step 0.01
    xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x:grid})
    probs = probs.reshape(xx.shape)

    print("w1:\n", sess.run(w1))
    print("b1:\n", sess.run(b1))
    print("w2:\n", sess.run(w2))
    print("b2:\n", sess.run(b2))

plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])  # draw the boundary where the output equals 0.5
plt.show()

# backpropagation with regularization: minimize loss_total
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_total)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 40000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_:Y_[start:end]})
        if i % 2000 == 0:
            loss_v = sess.run(loss_total, feed_dict={x:X, y_:Y_})
            print("After %d steps, loss is:%f" %(i, loss_v))

    xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x:grid})
    probs = probs.reshape(xx.shape)
    print("w1:\n", sess.run(w1))
    print("b1:\n", sess.run(b1))
    print("w2:\n", sess.run(w2))
    print("b2:\n", sess.run(b2))

plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()
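- The model trained with regularization (loss_total) generalizes better: the L2 term keeps the weights small, so its decision boundary is smoother and follows the noisy points less closely.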
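For reference, tf.contrib.layers.l2_regularizer(scale)(w) evaluates to scale * sum(w**2) / 2 (it is built on tf.nn.l2_loss). The pure-Python sketch below mimics how loss_total = loss_mse + tf.add_n(tf.get_collection('losses')) adds one such term per weight matrix to the data loss. Plain lists stand in for tensors, and the helper names and weight values are hypothetical.

```python
def l2_penalty(weights, scale):
    """scale * sum(w^2) / 2, the value l2_regularizer(scale)(w) computes."""
    return scale * sum(w * w for row in weights for w in row) / 2.0


def mse(y, y_true):
    """Mean squared error, like tf.reduce_mean(tf.square(y - y_))."""
    return sum((a - b) ** 2 for a, b in zip(y, y_true)) / len(y)


def total_loss(y, y_true, weight_matrices, scale):
    """Data loss plus one L2 term per weight matrix (the 'losses' collection)."""
    losses = [l2_penalty(w, scale) for w in weight_matrices]  # tf.get_collection
    return mse(y, y_true) + sum(losses)                       # tf.add_n


w1 = [[1.0, -2.0], [0.5, 0.0]]
w2 = [[3.0], [1.0]]
print(total_loss([0.8, 0.1], [1.0, 0.0], [w1, w2], 0.01))
```

A larger scale makes large weights more expensive, which is what pushes the optimizer toward the smoother boundary described above.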
(5) Standard template for building a neural network
- forward.py
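Forward propagation: build the network and define its structure.
import tensorflow as tf

def forward(x, regularizer):
    w = ...  # weights of each layer, created with get_weight()
    b = ...  # biases of each layer, created with get_bias()
    y = ...  # output computed from x, w and b
    return y

def get_weight(shape, regularizer):
    w = tf.Variable(...)  # initial value, e.g. tf.random_normal(shape)
    # add the L2 regularization loss of each w to the 'losses' collection; it is summed into the total loss later
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):  # shape: number of biases b in this layer
    b = tf.Variable(...)  # initial value, e.g. tf.constant(0.01, shape=shape)
    return b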
- backward.py
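Backpropagation: train the network and optimize its parameters.
import tensorflow as tf
import forward

def backward():
    x = tf.placeholder(...)
    y_ = tf.placeholder(...)
    y = forward.forward(x, REGULARIZER)
    global_step = tf.Variable(0, trainable=False)  # step counter, not trained

    # loss with regularization
    # gap between y and y_, either the mean squared error:
    loss_mse = tf.reduce_mean(tf.square(y - y_))
    # or the cross entropy:
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cem = tf.reduce_mean(ce)
    # after adding regularization (use loss_mse or cem as the gap):
    loss = cem + tf.add_n(tf.get_collection('losses'))

    # exponentially decaying learning rate
    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        NUM_SAMPLES / BATCH_SIZE,  # NUM_SAMPLES: total number of samples in the dataset
        LEARNING_RATE_DECAY,
        staircase=True)

    # moving average: see the code in the previous chapter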
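The decayed rate can be checked by hand. Below is a pure-Python sketch of the formula tf.train.exponential_decay applies, decayed rate = base * decay ** (global_step / decay_steps), with the exponent floored when staircase=True. It illustrates the formula only and is not TensorFlow's code; the numbers come from the example backward.py (300 samples, BATCH_SIZE = 30, so decay_steps = 10).

```python
import math


def exponential_decay(base_lr, global_step, decay_steps, decay_rate, staircase=False):
    """base_lr * decay_rate ** (global_step / decay_steps); exponent floored if staircase."""
    exponent = global_step / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # the rate only changes once every decay_steps steps
    return base_lr * decay_rate ** exponent


for step in (0, 9, 10, 25, 1000):
    print(step, exponential_decay(0.001, step, 300 / 30, 0.999, staircase=True))
```

Note that global_step only advances if it is passed to minimize(..., global_step=global_step); otherwise the rate stays at LEARNING_RATE_BASE for the whole run.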
An example:
- opt4_8_generateds.py
#coding:utf-8
import numpy as np
import matplotlib.pyplot as plt

seed = 2

def generateds():
    rdm = np.random.RandomState(seed)
    X = rdm.randn(300,2)
    Y_ = [int(x0*x0 + x1*x1 < 2) for (x0, x1) in X]
    Y_c = [['red' if y else 'blue'] for y in Y_]
    X = np.vstack(X).reshape(-1,2)
    Y_ = np.vstack(Y_).reshape(-1,1)

    return X, Y_, Y_c
- opt4_8_forward.py
#coding:utf-8
import tensorflow as tf

def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b

def forward(x, regularizer):
    w1 = get_weight([2,11], regularizer)
    b1 = get_bias([11])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

    w2 = get_weight([11,1], regularizer)
    b2 = get_bias([1])
    y = tf.matmul(y1, w2) + b2

    return y
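To see what the forward() function above computes for one input, here is a pure-Python sketch of the same structure: matmul, add bias, ReLU, then a linear output layer with no activation. Plain lists replace tensors, the helper names are hypothetical, and the hidden layer has 2 units with made-up weights instead of 11 trained ones, so the arithmetic stays readable.

```python
def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add_bias(m, b):
    """Add bias vector b to every row of m (TensorFlow broadcasts the same way)."""
    return [[v + bj for v, bj in zip(row, b)] for row in m]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def forward(x, w1, b1, w2, b2):
    """Two-layer net: ReLU hidden layer, linear (non-activated) output layer."""
    y1 = relu(add_bias(matmul(x, w1), b1))
    return add_bias(matmul(y1, w2), b2)


x = [[1.0, 2.0]]                 # one point, shape (1, 2)
w1 = [[1.0, -1.0], [1.0, -1.0]]  # shape (2, 2): 2 hidden units
b1 = [0.01, 0.01]
w2 = [[2.0], [3.0]]              # shape (2, 1)
b2 = [0.01]
print(forward(x, w1, b1, w2, b2))  # hidden pre-activations [3.01, -2.99]; ReLU zeroes the second
```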
- backward.py
#coding:utf-8
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import opt4_8_generateds
import opt4_8_forward

STEPS = 40000
BATCH_SIZE = 30
LEARNING_RATE_BASE = 0.001
LEARNING_RATE_DECAY = 0.999
REGULARIZER = 0.01

def backward():
    x = tf.placeholder(tf.float32, shape=(None, 2))
    y_ = tf.placeholder(tf.float32, shape=(None, 1))

    X, Y_, Y_c = opt4_8_generateds.generateds()

    y = opt4_8_forward.forward(x, REGULARIZER)

    global_step = tf.Variable(0, trainable=False)

    learning_rate = tf.train.exponential_decay(
        LEARNING_RATE_BASE,
        global_step,
        300/BATCH_SIZE,
        LEARNING_RATE_DECAY,
        staircase=True)

    loss_mse = tf.reduce_mean(tf.square(y-y_))
    loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))

    # passing global_step makes minimize() increment it, so the learning rate actually decays
    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss_total, global_step=global_step)

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        for i in range(STEPS):
            start = (i*BATCH_SIZE) % 300
            end = start + BATCH_SIZE
            sess.run(train_step, feed_dict={x: X[start:end], y_:Y_[start:end]})
            if i % 2000 == 0:
                loss_v = sess.run(loss_total, feed_dict={x:X, y_:Y_})
                print("After %d steps, loss is: %f" % (i, loss_v))

        xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
        grid = np.c_[xx.ravel(), yy.ravel()]
        probs = sess.run(y, feed_dict={x:grid})
        probs = probs.reshape(xx.shape)

    plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
    plt.contour(xx, yy, probs, levels=[.5])
    plt.show()

if __name__ == '__main__':
    backward()
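The training loop walks through the 300 samples in order and wraps around with %. A pure-Python sketch of the slice bounds it produces (the helper name batch_bounds is hypothetical):

```python
def batch_bounds(step, batch_size, num_samples):
    """Slice bounds used in the loop: start = (step * batch_size) % num_samples."""
    start = (step * batch_size) % num_samples
    end = start + batch_size
    return start, end


# 300 samples, BATCH_SIZE = 30: steps 0..9 cover the data once, step 10 starts over
bounds = [batch_bounds(i, 30, 300) for i in range(12)]
print(bounds)
```

Because 300 is a multiple of 30, every batch is full. With a batch size that does not divide the sample count, end can exceed 300 (for example batch_bounds(9, 32, 300) gives (288, 320)), and X[288:320] silently returns only the 12 remaining rows.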
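6. Fully connected network basics
(1) The MNIST dataset
- The MNIST dataset contains 70,000 images: 60,000 for training and 10,000 for testing (read_data_sets further splits the 60,000 training images into 55,000 for training and 5,000 for validation). Each image is 28*28 pixels (= 784, i.e. a 1-D array of length 784).
- Each element of the array is between 0 and 1: the black background is 0, white is 1, and the closer a value is to 1, the whiter the pixel.
- Each image has a label, a 1-D array of length 10; element i is the probability that the image shows digit i (with one_hot=True the true digit's entry is 1 and all others are 0).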
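The two array layouts described above can be sketched in pure Python: flattening a 28x28 image into 784 values (assumed row-major, as NumPy's reshape does) and encoding a digit as a one-hot label. The helper names flatten and one_hot are hypothetical.

```python
def flatten(image):
    """Turn a 28x28 image (list of rows) into one list of 784 pixel values."""
    return [pixel for row in image for pixel in row]


def one_hot(digit, num_classes=10):
    """Label for `digit` as a length-10 list: 1.0 at the digit's index, 0.0 elsewhere."""
    return [1.0 if i == digit else 0.0 for i in range(num_classes)]


# A black 28x28 image with one white pixel in row 1, column 2
image = [[0.0] * 28 for _ in range(28)]
image[1][2] = 1.0
pixels = flatten(image)
print(len(pixels), pixels.index(1.0))  # row-major position: 1 * 28 + 2
print(one_hot(3))
```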
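from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('./data/', one_hot=True)  # './data/' is the save path; labels are stored as one-hot vectors
# number of samples in each subset
print("train data size:", mnist.train.num_examples)
print("validation data size:", mnist.validation.num_examples)
print("test data size:", mnist.test.num_examples)
# label and image of one sample
mnist.train.labels[0]  # label of the training sample with the given index (here 0)
mnist.train.images[0]  # image of the training sample with the given index (here 0)
# take a small batch of data to feed into the network for training
BATCH_SIZE = 200  # feed 200 images at a time
xs, ys = mnist.train.next_batch(BATCH_SIZE)  # next BATCH_SIZE images and labels from the (shuffled) training set, assigned to xs and ys
print("xs shape:", xs.shape)  # (200, 784): 200 rows, 784 pixels per row
print("ys shape:", ys.shape)  # (200, 10): 200 rows, 10 label entries per row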
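Some useful functions
- tf.get_collection(""): takes all variables from a collection and returns them as a list
- tf.add_n([]): adds the corresponding elements of the tensors in a list
- tf.cast(x, dtype): converts x to type dtype
- tf.argmax(x, axis): returns the index of the maximum value along axis, e.g. tf.argmax([1,0,0], 0) searches the first (and only) dimension and returns 0
- os.path.join("home", "name"): function from the os module, returns the path string "home/name"
- str.split(): splits a string at the given separator and returns the list of pieces
- with tf.Graph().as_default() as g: nodes defined inside this block belong to the computation graph g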
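On small inputs, the list-oriented functions above behave like the following pure-Python analogues. Here add_n and argmax are hypothetical helpers operating on lists, not TensorFlow calls (tf.argmax does not guarantee which index wins on ties, while this argmax returns the first); os.path.join and str.split are the real standard-library functions, and "mnist_model-1001" is a made-up file name.

```python
import os


def add_n(tensors):
    """Element-wise sum of equally long lists, like tf.add_n on same-shape tensors."""
    return [sum(values) for values in zip(*tensors)]


def argmax(values):
    """Index of the largest value in a 1-D list (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


print(add_n([[1, 2], [3, 4], [5, 6]]))  # [9, 12]
print(argmax([1, 0, 0]))                 # 0
print(os.path.join("home", "name"))      # home/name on Linux
print("mnist_model-1001".split("-"))     # ['mnist_model', '1001']
```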
Saving and loading a model
# save
saver = tf.train.Saver()  # instantiate the saver object
with tf.Session() as sess:
    for i in range(STEPS):
        if i % N == 0:  # save every N rounds
            saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)
# load
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)  # directory the model was saved to
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
# instantiate a saver that restores the moving-average values
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)  # moving-average decay
ema_restore = ema.variables_to_restore()
saver = tf.train.Saver(ema_restore)
# compute accuracy
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
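The accuracy computation compares, row by row, the predicted class (argmax of y) with the labelled class (argmax of the one-hot y_), casts the booleans to floats and averages them. A pure-Python sketch with made-up outputs for four samples (helper names hypothetical):

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(y, y_true):
    """Fraction of rows where the predicted class equals the labelled class."""
    correct = [argmax(p) == argmax(t) for p, t in zip(y, y_true)]  # like tf.equal(...)
    return sum(1.0 if c else 0.0 for c in correct) / len(correct)  # tf.cast, then tf.reduce_mean


y = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]  # network outputs
y_true = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]                  # one-hot labels
print(accuracy(y, y_true))  # 0.75: the third sample is misclassified
```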
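(2) Building a fully connected network in modules
Problems encountered
- TensorFlow IOError: [Errno socket error] [Errno 104] Connection reset by peer
Solution: this is a network problem. Check whether http://yann.lecun.com/exdb/mnist/ is reachable and adjust the network configuration (e.g. get past the firewall); once the site is reachable, the download works.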
backward.py
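def backward(mnist):
    x = ...
    y_ = ...
    y = ...
    global_step = ...
    loss = ...
    # <regularization, exponentially decaying learning rate, moving average>
    train_step = ...
    saver = tf.train.Saver()  # instantiate the saver
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # initialize all variables
        for i in range(STEPS):
            sess.run(train_step, feed_dict={x: ..., y_: ...})
            if i % N == 0:  # every N rounds
                print(...)  # report progress
                saver.save(...)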
# loss function `loss` including regularization
# add to backward.py:
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)
loss = cem + tf.add_n(tf.get_collection('losses'))
# add to forward.py (inside get_weight):
if regularizer is not None:
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
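The cem term is the mean of per-sample cross entropies. For one sample, sparse_softmax_cross_entropy_with_logits computes -log(softmax(logits)[label]), and tf.argmax(y_, 1) is what turns the one-hot label into that class index. A pure-Python sketch of the per-sample value; the helper names are hypothetical, and subtracting the max logit is a standard numerical-stability trick, not necessarily TensorFlow's exact implementation.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """-log(softmax(logits)[label]) for one sample; label is the class index."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def one_hot_to_index(one_hot):
    """Per-row equivalent of tf.argmax(y_, 1): position of the 1 in a one-hot label."""
    return one_hot.index(max(one_hot))


logits = [2.0, 1.0, 0.1]
y_true = [1, 0, 0]
ce = sparse_softmax_cross_entropy(logits, one_hot_to_index(y_true))
print(ce)
```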
# learning rate learning_rate
# add to backward.py:
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE,
    global_step,
    LEARNING_RATE_STEP,
    LEARNING_RATE_DECAY,
    staircase=True)
# moving average ema
# add to backward.py:
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
ema_op = ema.apply(tf.trainable_variables())
with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')  # running train_op runs both the training step and the moving-average update
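ExponentialMovingAverage keeps a shadow copy of each trainable variable and updates it with shadow = decay * shadow + (1 - decay) * variable. Passing global_step as num_updates makes the effective decay min(decay, (1 + step) / (10 + step)), so the average follows the variables quickly early in training. A pure-Python sketch of one update (helper names hypothetical):

```python
def ema_decay(decay, num_updates=None):
    """Effective decay: min(decay, (1 + n) / (10 + n)) when num_updates n is given."""
    if num_updates is None:
        return decay
    return min(decay, (1.0 + num_updates) / (10.0 + num_updates))


def ema_update(shadow, value, decay, num_updates=None):
    """One step of shadow = d * shadow + (1 - d) * value."""
    d = ema_decay(decay, num_updates)
    return d * shadow + (1 - d) * value


# The shadow starts at the variable's initial value 0; the variable then becomes 1
shadow = 0.0
shadow = ema_update(shadow, 1.0, 0.99, num_updates=0)  # at step 0, d = min(0.99, 0.1) = 0.1
print(shadow)
```

Without num_updates the same update would move the shadow only 1% of the way (0.01), which is why the step-dependent decay matters at the start of training.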
test.py
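import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# MOVING_AVERAGE_DECAY and MODEL_SAVE_PATH are the constants used in backward.py
TEST_INTERVAL_SECS = 5  # seconds to wait between two evaluations (value chosen for illustration)

def test(mnist):
    with tf.Graph().as_default() as g:
        # define x, y_ and y
        x = ...
        y_ = ...
        y = ...
        # instantiate a saver that restores the moving-average values
        ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY)
        ema_restore = ema.variables_to_restore()
        saver = tf.train.Saver(ema_restore)
        # compute accuracy
        correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
        while True:
            with tf.Session() as sess:
                ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)  # load the checkpoint state
                if ckpt and ckpt.model_checkpoint_path:  # a saved model exists: restore it
                    saver.restore(sess, ckpt.model_checkpoint_path)  # restore the session
                    global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]  # recover the step count
                    accuracy_score = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})  # compute accuracy
                    print("After %s training step(s), test accuracy = %g" % (global_step, accuracy_score))
                else:  # no model yet
                    print("No checkpoint file found")
                    return
            time.sleep(TEST_INTERVAL_SECS)  # wait before checking for a newer checkpoint

def main():
    mnist = input_data.read_data_sets("./data/", one_hot=True)
    test(mnist)

if __name__ == '__main__':
    main()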
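When global_step is passed, saver.save appends "-<step>" to the checkpoint path, which is why test() can recover the step count with split. A pure-Python sketch of both directions; the directory "./model/" and name "mnist_model" are hypothetical, and the recovered step is a string, not an int.

```python
import os


def checkpoint_prefix(save_dir, model_name, step):
    """Path that saver.save(..., global_step=step) writes to: '<path>-<step>'."""
    return "%s-%d" % (os.path.join(save_dir, model_name), step)


def step_from_checkpoint(path):
    """Same parsing as in test(): last path component, then the part after the last '-'."""
    return path.split('/')[-1].split('-')[-1]


path = checkpoint_prefix("./model/", "mnist_model", 1001)
print(path)                        # ./model/mnist_model-1001
print(step_from_checkpoint(path))  # 1001
```

Taking the last piece after '-' also works when the model name itself contains dashes.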