(Continuously updated) TensorFlow Study Notes

TensorFlow notes

Reference courses: MOOC "Artificial Intelligence Practice" (Peking University); NetEase Cloud Classroom: Andrew Ng's Machine Learning


I. Overview

(1) Concepts

  1. Turing test: the questioner and the responder are kept apart and the questioner asks the machine random questions; if more than 30% of the questioners judge the responder to be a human rather than a machine, the algorithm passes the Turing test
  2. Perceptron: a single-layer neural network; it cannot compute the XOR (exclusive-or) function
  3. BP: the backpropagation algorithm
  4. SVM: support vector machine
    • avoids the difficulty of choosing neural-network parameters
    • avoids local optima
  5. DBN: deep belief network
  6. CNN: convolutional neural network
  7. Artificial intelligence: machines simulating human consciousness and thinking
  8. Machine learning: if a program's performance P on a task T improves as experience E increases, the program is said to learn from experience
    • Three elements:
      • data
      • algorithms
      • computing power
  9. The machine-learning process:

[Diagram: the machine-learning process — historical data is fed as input for training to produce a model; new data is fed to the model for prediction, producing results.]
  10. The single-neuron model:
[Diagram: inputs 1–3 (values value1–value3) are summed and passed through a nonlinear function to produce the output.]

II. Python syntax review

(1) Some Linux commands

  1. pwd: print the current directory
    (pwd prints the absolute path starting from the root directory)
  2. ls: list the files and directories under the current path
  3. mkdir newName: create a folder called newName under the current path
  4. cd name: enter the folder name
  5. sudo rm -r filename: recursively delete a folder (prompting before removal)
  6. sudo rm -rf filename: force-delete a folder recursively without prompting

Ubuntu vim:

  1. vim filename.py opens (or creates) a text file named filename
  2. python filename.py runs the Python file named filename
  3. [esc] :q quit vim
  4. [esc] :wq save and quit
  5. [esc] :q! quit without saving

(2) Python basic syntax

1. Basics

\: escape character, e.g. \t stands for a tab
%: placeholder; the value listed after % is substituted at the corresponding position (see the sketch below)
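A minimal sketch of the escape character and the % placeholder (the variable names and values are made up for illustration):

#coding:utf-8
name = "Mike"
height = 178
print "name:\t%s" % name                          # \t inserts a tab, %s is filled by name
print "name: %s, height: %d cm" % (name, height)  # several placeholders take a tuple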

2. Lists  list[num]
  1. list[start:stop] is a half-open interval: start is included, stop is excluded
  2. list[:] accesses all elements
  3. list[start:stop:step] takes one element every step elements starting from start; note that step has a direction; stop may be omitted
  4. list[index] = new_value modifies an element
  5. del list[index] deletes an element
  6. list.insert(index, new_element) inserts an element
3. Tuples  tuple(num)
  1. once defined, a tuple cannot be changed
4. Dictionaries  dic{key1:value1, key2:value2, ...}
  1. dic[key_x] indexes value_x
  • e.g.: dic = {1:"123", "name":"Mike", "height":178}
  • indexing: dic["name"] gives "Mike"
  2. dic[key_i] = new_value_i modifies an entry
  3. del dic[key_i] deletes an entry
  4. dic[key_i] = new_value_i inserts a new entry (a short sketch of these list and dictionary operations follows below)
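A minimal sketch of the list and dictionary operations above (the variable names and values are made up for illustration):

#coding:utf-8
fruits = ["apple", "pear", "peach", "plum"]
print fruits[0:2]             # ['apple', 'pear'] -- start included, stop excluded
print fruits[::2]             # ['apple', 'peach'] -- every 2nd element
fruits[1] = "banana"          # modify
fruits.insert(1, "grape")     # insert at index 1
del fruits[0]                 # delete

dic = {1:"123", "name":"Mike", "height":178}
print dic["name"]             # Mike
dic["name"] = "Tom"           # modify
dic["weight"] = 70            # insert a new key
del dic["height"]             # delete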
5. Conditionals

(1)

if condition :
    do_something

(2)

    if condition_1 :
        do_task_1
    else :
        do_task_2

(3)

    if condition_1 :
        do_task_1
    elif condition_2 :
        do_task_2
    .
    .
    .
    else :
        do_task_n
  • Notes:
    1. Python uses indentation (left alignment) to express code structure
    2. Error: SyntaxError: Non-ASCII character '\xe8' in file a.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for detail
      Cause and fix: Chinese characters cannot be encoded by default; add #coding:utf-8 as the first line of the .py file
6. Loops

(1)

 for variable in range(start, end) :
     do_something

(2)

 for variable in list_name :
     do_something

(3)

 while condition :
     do_something

(4) Use break to terminate a loop

  • An example:
code:
 for i in range(0,5) :
    print "i am counting %s" %i
Result:
i am counting 0
i am counting 1
i am counting 2
i am counting 3
i am counting 4
7. Functions

(1) Defining a function:

def function_name (parameter_list) :
    function_body

(2) Calling a function:

function_name (parameter_list)

(3) Built-in functions: functions that come with the Python interpreter

e.g. abs(num) # absolute value (a short sketch follows below)
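A minimal sketch of defining and calling a function, together with a built-in (the function name is made up for illustration):

#coding:utf-8
def difference(a, b):      # define a function
    return abs(a - b)      # abs() is a built-in function

print difference(3, 8)     # call it: prints 5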
8. Modules

A module is a collection of functions: import it first, then use it

import time
time.asctime() # prints the current time
9. Packages

A package contains multiple modules

from PIL import Image # import the Image module from the PIL package
10. Classes, objects, instantiation
  • Class: a collection of functions; a template from which objects are instantiated
  • Instantiation: object = Class()
  • Object: an instantiated individual that actually performs the concrete work
  • Object-oriented: the programmer repeatedly revises and optimizes the class, instantiates objects from it, and the objects call the class's functions to carry out concrete operations

Defining a class:

class ClassName (ParentClass) :
    member functions

(1) When defining a function inside a class, the syntax requires that its first parameter be self
For example:

class Animal:
    def breath(self):
        print "breathing"

(2) The __init__ function runs automatically when a new object is instantiated and is used to give the new object its initial values
For example:

class Cats(Animal):
    def __init__(self, spots):
        self.spots = spots
    def catch_mouse(self):
        print "catch mouse"

Instantiation:

kitty = Cats(10)
print kitty.spots # 10
kitty.catch_mouse() # catch mouse

(3) An object calls a class function with object_name.function_name()
An object accesses a class variable with object_name.variable_name
(4) When a function defined inside a class calls a function or variable of its own class or a parent class, it must be prefixed with self., i.e. self.function_name() or self.variable_name

  • A complete example
    animal.py
class Animals():
    def breath(self):
        print "breathing"
    def move(self):
        print "moving"
    def eat(self):
        print "eating food"
class Mammals(Animals):
    def breastfeed(self):
        print "feeding young"
class Cats(Mammals):
    def __init__(self, spots):
        self.spots = spots
    def catch_mouse(self):
        print "catching mouse"
    def left_foot_forward(self):
        print "left foot forward"
    def left_foot_backward(self):
        print "left foot backward"
    def dance(self):
        self.left_foot_forward()
        self.left_foot_backward()
        self.left_foot_backward()
        self.left_foot_forward()
kitty = Cats(10)
print kitty.spots
kitty.dance()
kitty.breastfeed()
kitty.move()

Output:

10    
left foot forward    
left foot backward      
left foot backward    
left foot forward    
feeding young
moving
11. Files

(1) Writing: open -> dump -> close

import pickle # import the pickle module
file_variable = open("file_path_and_name (e.g. save.dat)", "wb") # open
pickle.dump(variable_to_write, file_variable) # dump
file_variable.close() # close

For example:

# the data to write
game_data = {
    "position":"N2 E3",
    "pocket":["key","knife"],
    "money":160
}
# write it to the file save.dat:
save_file = open("save.dat","wb")
pickle.dump(game_data, save_file)
save_file.close()

(2) Reading: open -> load -> close

import pickle
file_variable = open("file_path_and_name","rb") # open
content_variable = pickle.load(file_variable) # load
file_variable.close() # close

For example:

load_file = open("save.dat","rb")
load_game_data = pickle.load(load_file)
load_file.close()
Supplement
  1. Although Python has no access-control keywords (such as C++'s private), naming conventions provide a degree of access control (a short sketch follows below)
    (1) single leading underscore = protected, intended for access only by the class itself and its subclasses, e.g. _foo
    (2) double leading underscore = private, e.g. __foo
    (3) leading and trailing double underscores = special method, e.g. __init__()
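A minimal sketch of these naming conventions (the class and attribute names are made up for illustration):

#coding:utf-8
class Account:
    def __init__(self, owner):     # special method, runs on instantiation
        self.owner = owner
        self._balance = 0          # "protected" by convention
        self.__pin = "1234"        # "private": name-mangled to _Account__pin

acc = Account("Mike")
print acc._balance                 # accessible, but by convention outside code should not touch it
print acc._Account__pin            # the mangled name shows __pin is only hidden, not truly private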

III. Some modules

(1) The turtle module

1. Some problems and fixes
  • No module named _tkinter
    • python2.7 : sudo apt-get install python-tk
    • python3 : sudo apt-get install python3-tk
2. Basic operations (see the sketch below)

import turtle imports the turtle module
t = turtle.Pen() instantiates an object called t from the Pen class
t.forward(n) moves t forward by n pixels
t.backward(n) moves t backward by n pixels
t.left(n) turns t left by n degrees
t.right(n) turns t right by n degrees
t.reset() resets t
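A minimal sketch that draws a square with the calls above (the side length of 100 pixels is arbitrary):

#coding:utf-8
import turtle

t = turtle.Pen()          # instantiate a Pen object called t
for i in range(4):
    t.forward(100)        # move 100 pixels forward
    t.left(90)            # turn 90 degrees to the left
turtle.mainloop()         # keep the window open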

(2) The matplotlib module

1. Installation
  • sudo pip install matplotlib
2. Purpose
  • graphical visualization
3. Operations (a self-contained sketch follows below)
# import the module
import matplotlib.pyplot as plt
# visualize data points
plt.scatter(x_coords, y_coords, c="color")
plt.show()
# visualize the axes as a grid of coordinate points
xx, yy = np.mgrid[start:stop:step, start:stop:step]
# flatten the x and y coordinates (each into one row) and stack them into a matrix,
# collecting every grid point in the region
grid = np.c_[xx.ravel(), yy.ravel()]
# feed the collected points to the network and assign the result to probs
# (a value quantifying how red or blue each point is)
probs = sess.run(y, feed_dict={x:grid})
probs = probs.reshape(xx.shape)
# color the points and draw the contour
plt.contour(x_axis_values, y_axis_values, height_of_each_point, levels=[contour_height])
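A minimal self-contained sketch of scatter() and contour() without the neural-network part (the circle of radius 1 is an arbitrary choice, used only to have something to contour):

#coding:utf-8
import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(100, 2)                      # 100 random 2-D points
colors = ['red' if x0*x0 + x1*x1 < 1 else 'blue' for (x0, x1) in X]
plt.scatter(X[:, 0], X[:, 1], c=colors)          # draw the points

xx, yy = np.mgrid[-3:3:.1, -3:3:.1]              # grid covering the region
zz = xx*xx + yy*yy                               # "height" of every grid point
plt.contour(xx, yy, zz, levels=[1])              # contour line where the height equals 1
plt.show()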

IV. The TensorFlow framework

(1) Tensors, computation graphs, sessions

  1. A TensorFlow-based NN: tensors represent the data, a computation graph describes the neural network, a session executes the graph, and the weights (parameters) on the edges are optimized to obtain the model
  2. Tensor: a multidimensional array (list)
    • a tensor can represent data of rank 0 to n (the number of nested square brackets is the rank)
  3. Rank: the number of dimensions of a tensor
  4. Data types:
    • tf.float32: 32-bit floating point
    • tf.int32: 32-bit integer
    • tf.constant(value): defines a constant
  5. Computation graph: describes the computation of the neural network; it only builds the graph, it does not compute. It holds one or more computation nodes (neurons)
[Diagram: a computation-graph node — inputs x1, x2 connect to the output y through weights w1, w2.]

$ Y = XW = x_1 w_1 + x_2 w_2 $

  • An example: multiplying two constant matrices

Create the file tf3_1.py

import tensorflow as tf
a = tf.constant([[1.0,2.0]])
b = tf.constant([[3.0],[4.0]])

y = tf.matmul(a, b)
print y

Output

Tensor("matmul:0", shape=(1,1), dtype=float32)

Analysis:

  • matmul:0: the result is a tensor named matmul:0 (node name, output index 0)
  • shape(x1, x2, x3, ...): the dimensions of the tensor; the number of entries xn is the rank, and the value of each xi is the length of the corresponding dimension
  • dtype: the data type
  6. Session: executes the node computations in the computation graph
    • with tf.Session() as sess:
        print sess.run(nodes_to_evaluate)

  • An example

Create the file tf3_2.py

import tensorflow as tf
x = tf.constant([[1.0, 2.0]])
w = tf.constant([[3.0],[4.0]])
y = tf.matmul(x, w)
print y 
with tf.Session() as sess:
  print sess.run(y)

Output

Tensor("Matmul:0", shape=(1, 1), dtype=float32)
[[11.]]
Note: warnings like the following may appear:
Tensor("MatMul:0", shape=(1, 1), dtype=float32)
2019-07-13 17:57:09.100298: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-07-13 17:57:09.100369: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-07-13 17:57:09.100393: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-07-13 17:57:09.100412: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-07-13 17:57:09.100443: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[[11.]]

This happens because the CPU supports some acceleration instructions that the installed TensorFlow build does not use. The messages can be silenced as follows:
(1) xxx@aaa:~/tf$ vim ~/.bashrc to open the bashrc file in the home directory
(2) add export TF_CPP_MIN_LOG_LEVEL=2 as the last line, lowering TensorFlow's logging level, then save and quit
(3) source ~/.bashrc to make the new configuration take effect
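A minimal sketch of an alternative: the same environment variable can also be set from inside the script, as long as it happens before TensorFlow is imported:

#coding:utf-8
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'   # suppress the instruction-set warnings
import tensorflow as tf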

(2) Forward propagation

  1. Parameters: the weights $w_i$, represented by variables and usually given random initial values
w = tf.Variable(tf.random_normal([2,3], stddev=2, mean=0, seed=1))
  • tf.random_normal() generates normally distributed random numbers
  • tf.truncated_normal() generates normally distributed random numbers with points too far from the mean removed
  • tf.random_uniform() generates uniformly distributed random numbers
  • tf.random_normal([2,3]) produces a 2x3 matrix
  • stddev = 2: standard deviation 2
  • mean = 0: mean 0
  • seed = 1: random seed, so the same numbers are generated on every run
  • standard deviation, mean and seed may be omitted if there is no special requirement
  • tf.zeros generates an all-zero array, e.g. tf.zeros([3,2], tf.int32)
  • tf.ones generates an all-one array, e.g. tf.ones([3,2], tf.int32)
  • tf.fill fills an array with a given value, e.g. tf.fill([3,2], 6)
  • tf.constant gives the values directly, e.g. tf.constant([3,2,1]) generates [3,2,1]
  (a short sketch of these generators follows below)
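A minimal sketch that evaluates these generators in a session (the shapes and values are arbitrary):

#coding:utf-8
import tensorflow as tf

w = tf.Variable(tf.random_normal([2,3], stddev=2, mean=0, seed=1))  # random normal weights
z = tf.zeros([3,2], tf.int32)      # all zeros
o = tf.ones([3,2], tf.int32)       # all ones
f = tf.fill([3,2], 6)              # filled with 6
c = tf.constant([3,2,1])           # the given values

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print sess.run(w)
    print sess.run([z, o, f, c])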
  2. How a neural network is implemented:

Training process:
  1. Prepare the data set, extract features, and feed them to the neural network as input
  2. Forward propagation: build the NN structure from input to output (build the computation graph first, then execute it with a session); the NN forward-propagation algorithm computes the output
  3. Backpropagation: feed large amounts of feature data to the NN and iteratively optimize the NN parameters; the NN backpropagation algorithm optimizes the parameters and trains the model

Using the model:
  4. Use the trained model for prediction and classification

  3. Forward propagation: builds the model and performs inference
  • An example with a fully connected network

A batch of parts is produced; the volume $x_1$ and the weight $x_2$ are fed to the NN as input features, and after passing through the NN a single value is output

[Diagram: a 2-3-1 fully connected network; the inputs x1 (volume) and x2 (weight) connect to the hidden nodes a11, a12, a13 through the weights w11...w23, which connect to the output y through the weights w11', w21', w31'.]
  • Describing the computation in TensorFlow:
    (1) X is the 1x2 input matrix; $W^{(\text{layer})}_{\text{from-node},\text{to-node}}$ are the parameters to be optimized
    (2) $W^{(1)} = \begin{bmatrix} w^{(1)}_{1,1} & w^{(1)}_{1,2} & w^{(1)}_{1,3} \\ w^{(1)}_{2,1} & w^{(1)}_{2,2} & w^{(1)}_{2,3} \end{bmatrix}$ is a 2x3 matrix
[Diagram: the first layer — inputs x1 (volume) and x2 (weight) connect to a11, a12, a13 through the weights w11...w23.]
  • (3) $a^{(1)} = [a_{11}, a_{12}, a_{13}] = XW^{(1)}$ is a 1x3 matrix

    (4) $W^{(2)} = \begin{bmatrix} w^{(2)}_{1,1} \\ w^{(2)}_{2,1} \\ w^{(2)}_{3,1} \end{bmatrix}$ is a 3x1 matrix

[Diagram: the second layer — a11, a12, a13 connect to the output y through the weights w11', w21', w31'.]
  • (5) $y = a^{(1)} W^{(2)}$

Expressed as two statements:

a = tf.matmul(X, W1)
y = tf.matmul(a, W2)
a: the first computation layer (the first layer of the network)
$W^{(1)}$: the parameters of the first layer

  4. Variable initialization and graph-node computation are both carried out with a session:
with tf.Session() as sess:
    sess.run()
  5. Variable initialization: run tf.global_variables_initializer() inside sess.run()
init_op=tf.global_variables_initializer()
sess.run(init_op)
  6. Graph-node computation: pass the nodes to be evaluated to sess.run()
sess.run(y)
  7. tf.placeholder reserves a place for the input; data is fed through feed_dict in sess.run()
# feed one group of data:
x = tf.placeholder(tf.float32, shape = (1,2))
sess.run(y,feed_dict={x:[[0.5,0.6]]})
# feed several groups of data:
x = tf.placeholder(tf.float32, shape = (None,2))
sess.run(y,feed_dict={x:[[0.1,0.2],[0.3,0.4],[0.4,0.5]]})

Note: in shape = (x,y), x is the number of data groups fed to the network (None means the number is unknown) and y is the number of features per group

  • An example
#coding:utf-8
# a simple two-layer (fully connected) neural network
import tensorflow as tf

# define the input and the parameters
# use placeholder to define the input
x = tf.placeholder(tf.float32, shape=(None, 2))
w1= tf.Variable(tf.random_normal([2,3], stddev=1, seed=1))
w2= tf.Variable(tf.random_normal([3,1], stddev=1, seed=1))

# define the forward-propagation process
a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

# run the computation in a session
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    print "the result of y is:\n", sess.run(y, feed_dict={x: [[0.7,0.5],[0.2,0.3],[0.3,0.4],[0.4,0.5]]})
    print "w1:\n", sess.run(w1)
    print "w2:\n", sess.run(w2)

Output:

the result of y is:
[[3.0904665]
 [1.2236414]
 [1.7270732]
 [2.2305048]]
w1:
[[-0.8113182   1.4845988   0.06532937]
 [-2.4427042   0.0992484   0.5912243 ]]
w2:
[[-0.8113182 ]
 [ 1.4845988 ]
 [ 0.06532937]]

(3) Backpropagation

  1. Backpropagation: trains the model parameters by running gradient descent on all parameters, so that the loss of the NN model on the training data is minimized.
  2. Loss function (loss): the gap between the prediction (y) and the known answer (y_)
  3. Mean squared error (MSE): $MSE(y_\_, y) = \frac{1}{n}\sum_{i=1}^n (y - y_\_)^2$    loss = tf.reduce_mean(tf.square(y_ - y))
  4. Backpropagation training methods: all take minimizing loss as the optimization objective
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss) # gradient descent
train_step = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss) # Momentum optimizer
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss) # Adam optimizer
  5. Learning rate: decides how much the parameters are updated at each step (usually a small value such as 0.001; it depends on the problem)
  • An example:
    Description: a batch of parts has two features, volume and weight, and a label: qualified or not. Use a neural network to predict and classify the parts.
#coding:utf-8
# 0. import modules and generate a simulated data set
import tensorflow as tf
import numpy as np # scientific-computing module
BATCH_SIZE = 8 # feed 8 groups of data at a time
SEED = 23455

# generate random numbers based on the seed
rdm = np.random.RandomState(SEED)
# the random numbers form a 32x2 matrix: 32 groups of (volume, weight) as the input data set
X = rdm.rand(32,2)
# take each row of X; if the sum of its two entries is less than 1, label Y_ as 1 (qualified), otherwise 0
# this serves as the labels (correct answers) of the input data set; since there is no real data set, samples and labels are fabricated
Y_ = [[int(x0 + x1 < 1)] for (x0, x1) in X]
print "X:\n",X
print "Y_:\n",Y_

# 1. define the NN's input, parameters and output, and the forward-propagation process
x = tf.placeholder(tf.float32, shape=(None, 2))
y_= tf.placeholder(tf.float32, shape=(None, 1))

w1= tf.Variable(tf.random_normal([2, 3], stddev=1, seed=1))
w2= tf.Variable(tf.random_normal([3, 1], stddev=1, seed=1))

a = tf.matmul(x, w1)
y = tf.matmul(a, w2)

# 2. define the loss function and the backpropagation method
loss_mse = tf.reduce_mean(tf.square(y-y_))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)
#train_step = tf.train.MomentumOptimizer(0.001,0.9).minimize(loss_mse)
#train_step = tf.train.AdamOptimizer(0.001).minimize(loss_mse)

# 3. create a session and train for STEPS rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    # print the current (untrained) parameter values
    print "w1:\n", sess.run(w1)
    print "w2:\n", sess.run(w2)
    print "\n"

    # train the model
    STEPS = 3000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 32
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 500 == 0:
            total_loss = sess.run(loss_mse, feed_dict={x: X, y_: Y_})
            print("After %d training step(s), loss_mse on all data is %g" % (i, total_loss))

    # print the trained parameter values
    print "\n"
    print "w1:\n", sess.run(w1)
    print "w2:\n", sess.run(w2)

    • Output:
X:
[[0.83494319 0.11482951]
 [0.66899751 0.46594987]
 [0.60181666 0.58838408]
 [0.31836656 0.20502072]
 [0.87043944 0.02679395]
 [0.41539811 0.43938369]
 [0.68635684 0.24833404]
 [0.97315228 0.68541849]
 [0.03081617 0.89479913]
 [0.24665715 0.28584862]
 [0.31375667 0.47718349]
 [0.56689254 0.77079148]
 [0.7321604  0.35828963]
 [0.15724842 0.94294584]
 [0.34933722 0.84634483]
 [0.50304053 0.81299619]
 [0.23869886 0.9895604 ]
 [0.4636501  0.32531094]
 [0.36510487 0.97365522]
 [0.73350238 0.83833013]
 [0.61810158 0.12580353]
 [0.59274817 0.18779828]
 [0.87150299 0.34679501]
 [0.25883219 0.50002932]
 [0.75690948 0.83429824]
 [0.29316649 0.05646578]
 [0.10409134 0.88235166]
 [0.06727785 0.57784761]
 [0.38492705 0.48384792]
 [0.69234428 0.19687348]
 [0.42783492 0.73416985]
 [0.09696069 0.04883936]]
Y_:
[[1], [0], [0], [1], [1], [1], [1], [0], [1], [1], [1], [0], [0], [0], [0], [0], [0], [1], [0], [0], [1], [1], [0], [1], [0], [1], [1], [1], [1], [1], [0], [1]]
w1:
[[-0.8113182   1.4845988   0.06532937]
 [-2.4427042   0.0992484   0.5912243 ]]
w2:
[[-0.8113182 ]
 [ 1.4845988 ]
 [ 0.06532937]]


After 0 training step(s), loss_mse on all data is 5.13118
After 500 training step(s), loss_mse on all data is 0.429111
After 1000 training step(s), loss_mse on all data is 0.409789
After 1500 training step(s), loss_mse on all data is 0.399923
After 2000 training step(s), loss_mse on all data is 0.394146
After 2500 training step(s), loss_mse on all data is 0.390597


w1:
[[-0.7000663   0.9136318   0.08953571]
 [-2.3402493  -0.14641267  0.58823055]]
w2:
[[-0.06024267]
 [ 0.91956186]
 [-0.0682071 ]]

(4) The standard recipe for building a neural network: prepare, forward propagation, backpropagation, iterate

  1. Prepare:
import
constant definitions
generate the data set
  2. Forward propagation:
x  =
y_ = 
w1 = 
w2 = 
a  =
y  =
  3. Backpropagation: define the loss function and the backpropagation method
loss = 
train_step =
  4. Create a session and train for STEPS rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    STEPS = 3000 # number of iterations
    for i in range(STEPS):
        start = 
        end = 
        sess.run(train_step, feed_dict)
  • Note: since real problems involve large amounts of data, print is commonly used to show how the parameters change during the iterations

V. Optimizing neural networks

(1) Loss functions

  1. The McCulloch-Pitts neuron model (1943)
[Diagram: inputs x1, x2, x3 with weights w1, w2, w3 and a bias b are summed and passed through f, giving f(XW+b).]
  • $f(\sum_i x_i w_i + b)$
  • $f$ is the activation function
  • $b$ is the bias
  2. Activation functions
  • relu: $f(x)=\max(x,0)=\begin{cases} 0 & x \le 0\\ x & x > 0 \end{cases}$  tf.nn.relu()
  • sigmoid: $f(x)=\frac{1}{1+e^{-x}}$  tf.nn.sigmoid()
  • tanh: $f(x)=\frac{1-e^{-2x}}{1+e^{-2x}}$  tf.nn.tanh()
  3. NN complexity: usually measured by the number of NN layers and the number of NN parameters
    • number of layers = number of hidden layers + 1 output layer
    • total parameters = all W + all b
  4. Loss function (loss), learning rate (learning_rate), exponential moving average (ema), regularization
    • Loss function (loss): the gap between the prediction (y) and the known answer (y_)
      • The NN's optimization objective is to minimize loss; common choices are mean squared error (MSE), a custom loss, or cross entropy (CE)
  • Mean squared error: $MSE(y_\_, y)=\frac{\sum_{i=1}^n (y-y_\_)^2}{n}$  loss_mse=tf.reduce_mean(tf.square(y_-y))
    An example: predict the daily sales volume y of yogurt. x1 and x2 are the factors that influence daily sales. (Before modeling, the data to collect would be each day's x1, x2 and sales y_, i.e. the known answer; the ideal case is production = sales. Since there is no real data set, one is fabricated as y_ = x1 + x2 plus noise in -0.05~0.05.) Fit a function that can predict sales.
    The code is as follows:
#coding:utf-8
# over-prediction and under-prediction cost the same here
# 0. import modules and generate the data set
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8
SEED = 23455

rdm = np.random.RandomState(SEED)
X = rdm.rand(32,2)
Y_ = [[x1 + x2 + rdm.rand()/10.0-0.05] for (x1, x2) in X]

# 1. define the NN's input, parameters and output, and the forward-propagation process
x = tf.placeholder(tf.float32, shape=(None, 2))
y_= tf.placeholder(tf.float32, shape=(None, 1))
w1= tf.Variable(tf.random_normal([2,1], stddev = 1, seed = 1))
y = tf.matmul(x, w1)

# 2. define the loss function and the backpropagation method
# the loss function is MSE and the backpropagation method is gradient descent
loss_mse = tf.reduce_mean(tf.square(y_ - y))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss_mse)

# 3. create a session and train for STEPS rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 32
        end = (i*BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 500 == 0:
            print "After %d traning steps, w1 is: " % (i)
            print sess.run(w1), "\n"
    print "Final w1 is :\n", sess.run(w1)

Output:

After 0 traning steps, w1 is: 
[[-0.80974597]
 [ 1.4852903 ]] 

After 500 traning steps, w1 is: 
[[-0.46074435]
 [ 1.641878  ]] 

After 1000 traning steps, w1 is: 
[[-0.21939856]
 [ 1.6984766 ]] 

After 1500 traning steps, w1 is: 
[[-0.04415595]
 [ 1.7003176 ]] 

After 2000 traning steps, w1 is: 
[[0.08942621]
 [1.673328  ]] 

After 2500 traning steps, w1 is: 
[[0.19583555]
 [1.6322677 ]] 

After 3000 traning steps, w1 is: 
[[0.28375748]
 [1.5854434 ]] 

After 3500 traning steps, w1 is: 
[[0.35848638]
 [1.5374472 ]] 

After 4000 traning steps, w1 is: 
[[0.42332518]
 [1.4907393 ]] 

After 4500 traning steps, w1 is: 
[[0.48040026]
 [1.4465574 ]] 

After 5000 traning steps, w1 is: 
[[0.53113604]
 [1.4054536 ]] 

After 5500 traning steps, w1 is: 
[[0.5765325]
 [1.3675941]] 

...

After 16000 traning steps, w1 is: 
[[0.95107025]
 [1.0415728 ]] 

After 16500 traning steps, w1 is: 
[[0.9560928]
 [1.037164 ]] 

After 17000 traning steps, w1 is: 
[[0.96064115]
 [1.0331714 ]] 

After 17500 traning steps, w1 is: 
[[0.96476096]
 [1.0295546 ]] 

After 18000 traning steps, w1 is: 
[[0.9684917]
 [1.0262802]] 

After 18500 traning steps, w1 is: 
[[0.9718707]
 [1.0233142]] 

After 19000 traning steps, w1 is: 
[[0.974931 ]
 [1.0206276]] 

After 19500 traning steps, w1 is: 
[[0.9777026]
 [1.0181949]] 

Final w1 is :
[[0.98019385]
 [1.0159807 ]]

As the number of iterations grows, the two parameters approach 1.
Fitted result: $y = 0.98x_1 + 1.02x_2$
(In reality, however, over-predicting sales wastes cost while under-predicting loses profit, so a custom loss function is used next.)

  • Custom loss function
    If profit $\neq$ cost, the loss produced by MSE cannot maximize profit.
    Custom loss: $loss(y_\_, y)=\sum_n f(y_\_, y)$
    where y_ is the standard answer from the data set and y is the computed prediction
    $f(y_\_,y)=\begin{cases} PROFIT*(y_\_-y) & y < y_\_ & \text{y is under-predicted: profit is lost}\\ COST*(y-y_\_) & y \ge y_\_ & \text{y is over-predicted: cost is lost} \end{cases}$
    loss = tf.reduce_sum(tf.where(tf.greater(y, y_), COST*(y - y_), PROFIT*(y_ - y)))
    For example: predicting yogurt sales with cost (COST) 1 yuan and profit (PROFIT) 9 yuan per unit.
    Under-predicting loses 9 yuan of profit; over-predicting loses 1 yuan of cost.
    Under-predicting loses more, so we want the function to predict on the high side.
    The code is as follows:
    代码如下:
#coding:utf-8
import tensorflow as tf
import numpy as np
BATCH_SIZE = 8
SEED = 23455
COST = 1
PROFIT = 9

rdm = np.random.RandomState(SEED)
X = rdm.rand(32,2)
Y_= [[x1 + x2 + rdm.rand()/10.0-0.05] for (x1,x2) in X]

x = tf.placeholder(tf.float32, shape = (None,2))
y_= tf.placeholder(tf.float32, shape = (None,1))
w1= tf.Variable(tf.random_normal([2,1],stddev = 1, seed = 1))
y = tf.matmul(x, w1)

loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_)*COST, (y_ - y)*PROFIT))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 20000
    for i in range (STEPS):
        start = (i*BATCH_SIZE) % 32
        end = (i*BATCH_SIZE) % 32 + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_: Y_[start:end]})
        if i % 500 == 0:
            print "After %d training steps, w1 is: " % (i)
            print sess.run(w1), "\n"
    print "Final w1 is :\n", sess.run(w1)

The fitted result:

Final w1 is :
[[1.020171 ]
 [1.0425103]]

Both parameters are greater than 1, so the model predicts on the high side

  • Cross entropy (CE): measures the distance between two probability distributions: $H(y_\_, y)=-\sum y_\_ * \log y$
    • ce = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-12, 1.0)))
    • values of y below $10^{-12}$ are clipped to $10^{-12}$, and values above 1.0 are clipped to 1.0
    • softmax():
      1. When the n outputs $(y_1, y_2, ..., y_n)$ of an n-class classifier are passed through softmax(), they satisfy the requirements of a probability distribution: $\forall x, P(X=x)\in [0,1]$ and $\sum_x P(X=x) = 1$
      2. $softmax(y_i) = \frac{e^{y_i}}{\sum_{j=1}^n e^{y_j}}$
      3. ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
        cem = tf.reduce_mean(ce)
      (a short sketch follows below)
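A minimal sketch of computing the softmax cross-entropy loss with these two lines (the logits and labels are made-up values for illustration):

#coding:utf-8
import tensorflow as tf

# two samples, three classes; rows are raw network outputs (logits)
y  = tf.constant([[2.0, 1.0, 0.1],
                  [0.5, 2.5, 0.3]])
# one-hot labels: sample 0 is class 0, sample 1 is class 1
y_ = tf.constant([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])

ce  = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
cem = tf.reduce_mean(ce)   # average cross entropy over the batch

with tf.Session() as sess:
    print sess.run(cem)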

(2) Learning rate

  • Learning rate (learning_rate): how much the parameters are updated each step
    $w_{n+1} = w_n - learning\_rate \cdot \nabla$
    • $w_{n+1}$: the updated parameter
    • $w_n$: the current parameter
    • learning_rate: the learning rate
    • $\nabla$: the gradient (derivative) of the loss function

An example

#coding:utf-8
# let the loss function be loss=(w+1)^2 with the initial value of w set to the constant 5.
# backpropagation means finding the optimal w, i.e. the w with the smallest loss
import tensorflow as tf
# define the parameter w to be optimized, with initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))
# define the loss function
loss = tf.square(w+1)
# define the backpropagation method
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
# create a session and train for 40 rounds
with tf.Session() as sess:
    init_op=tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print "After %s steps: w is %f,  loss is %f.\n" %(i, w_val, loss_val)

Result:

After 0 steps: w is 2.600000,  loss is 12.959999.
After 1 steps: w is 1.160000,  loss is 4.665599.
After 2 steps: w is 0.296000,  loss is 1.679616.
After 3 steps: w is -0.222400,  loss is 0.604662.
After 4 steps: w is -0.533440,  loss is 0.217678.
After 5 steps: w is -0.720064,  loss is 0.078364.
After 6 steps: w is -0.832038,  loss is 0.028211.
After 7 steps: w is -0.899223,  loss is 0.010156.
After 8 steps: w is -0.939534,  loss is 0.003656.
After 9 steps: w is -0.963720,  loss is 0.001316.
After 10 steps: w is -0.978232,  loss is 0.000474.
After 11 steps: w is -0.986939,  loss is 0.000171.
After 12 steps: w is -0.992164,  loss is 0.000061.
After 13 steps: w is -0.995298,  loss is 0.000022.
After 14 steps: w is -0.997179,  loss is 0.000008.
After 15 steps: w is -0.998307,  loss is 0.000003.
After 16 steps: w is -0.998984,  loss is 0.000001.
After 17 steps: w is -0.999391,  loss is 0.000000.
After 18 steps: w is -0.999634,  loss is 0.000000.
After 19 steps: w is -0.999781,  loss is 0.000000.
After 20 steps: w is -0.999868,  loss is 0.000000.
After 21 steps: w is -0.999921,  loss is 0.000000.
After 22 steps: w is -0.999953,  loss is 0.000000.
After 23 steps: w is -0.999972,  loss is 0.000000.
After 24 steps: w is -0.999983,  loss is 0.000000.
After 25 steps: w is -0.999990,  loss is 0.000000.
After 26 steps: w is -0.999994,  loss is 0.000000.
After 27 steps: w is -0.999996,  loss is 0.000000.
After 28 steps: w is -0.999998,  loss is 0.000000.
After 29 steps: w is -0.999999,  loss is 0.000000.
After 30 steps: w is -0.999999,  loss is 0.000000.
After 31 steps: w is -1.000000,  loss is 0.000000.
After 32 steps: w is -1.000000,  loss is 0.000000.
After 33 steps: w is -1.000000,  loss is 0.000000.
After 34 steps: w is -1.000000,  loss is 0.000000.
After 35 steps: w is -1.000000,  loss is 0.000000.
After 36 steps: w is -1.000000,  loss is 0.000000.
After 37 steps: w is -1.000000,  loss is 0.000000.
After 38 steps: w is -1.000000,  loss is 0.000000.
After 39 steps: w is -1.000000,  loss is 0.000000.

Running again with different learning rates shows that a learning rate that is too large oscillates and fails to converge, while one that is too small converges slowly.

  • Exponentially decaying learning rate:
    • the learning rate is updated dynamically according to how many rounds of BATCH_SIZE have been run

    • $learning\_rate = LEARNING\_RATE\_BASE \times LEARNING\_RATE\_DECAY^{\frac{global\_step}{LEARNING\_RATE\_STEP}}$
      where learning_rate is the current learning rate; LEARNING_RATE_BASE is the initial learning rate; LEARNING_RATE_DECAY is the decay rate, in (0,1); global_step counts how many rounds of BATCH_SIZE have been run; and the learning rate is updated every LEARNING_RATE_STEP rounds, usually $\frac{\text{total number of samples}}{BATCH\_SIZE}$

    global_step = tf.Variable(0, trainable=False) # counter of the current training round
    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True) # staircase=True makes the learning rate decay in steps; otherwise it decays as a smooth curve


An example

#coding:utf-8
# let the loss function be loss=(w+1)^2 with the initial value of w set to the constant 5
# an exponentially decaying learning rate gives a fast descent early on, so the model converges faster within a small number of training rounds
import tensorflow as tf

LEARNING_RATE_BASE = 0.1 # initial learning rate
LEARNING_RATE_DECAY = 0.99 # learning-rate decay rate
LEARNING_RATE_STEP = 1 # update the learning rate after this many rounds of BATCH_SIZE, usually total samples / BATCH_SIZE

# counter of how many rounds of BATCH_SIZE have been run; initial value 0, not trainable
global_step = tf.Variable(0, trainable=False)
# define the exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP, LEARNING_RATE_DECAY, staircase=True)
# define the parameter to be optimized, with initial value 5
w = tf.Variable(tf.constant(5, dtype=tf.float32))
# define the loss function
loss = tf.square(w+1)
# define the backpropagation method
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
# create a session and train for 40 rounds
with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    for i in range(40):
        sess.run(train_step)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print "\nAfter %s steps: global_step is %f, w is %f\n learning_rate is %f\n loss is %f" % (i, global_step_val, w_val, learning_rate_val, loss_val)
  • Output
After 0 steps: global_step is 1.000000, w is 3.800000
 learning_rate is 0.099000
 loss is 23.040001

After 1 steps: global_step is 2.000000, w is 2.849600
 learning_rate is 0.098010
 loss is 14.819419

After 2 steps: global_step is 3.000000, w is 2.095001
 learning_rate is 0.097030
 loss is 9.579033

After 3 steps: global_step is 4.000000, w is 1.494386
 learning_rate is 0.096060
 loss is 6.221961

After 4 steps: global_step is 5.000000, w is 1.015167
 learning_rate is 0.095099
 loss is 4.060896

After 5 steps: global_step is 6.000000, w is 0.631886
 learning_rate is 0.094148
 loss is 2.663051

After 6 steps: global_step is 7.000000, w is 0.324608
 learning_rate is 0.093207
 loss is 1.754587

After 7 steps: global_step is 8.000000, w is 0.077684
 learning_rate is 0.092274
 loss is 1.161403
...

After 33 steps: global_step is 34.000000, w is -0.989550
 learning_rate is 0.071055
 loss is 0.000109

After 34 steps: global_step is 35.000000, w is -0.991035
 learning_rate is 0.070345
 loss is 0.000080

After 35 steps: global_step is 36.000000, w is -0.992297
 learning_rate is 0.069641
 loss is 0.000059

After 36 steps: global_step is 37.000000, w is -0.993369
 learning_rate is 0.068945
 loss is 0.000044

After 37 steps: global_step is 38.000000, w is -0.994284
 learning_rate is 0.068255
 loss is 0.000033

After 38 steps: global_step is 39.000000, w is -0.995064
 learning_rate is 0.067573
 loss is 0.000024

After 39 steps: global_step is 40.000000, w is -0.995731
 learning_rate is 0.066897
 loss is 0.000018

(3) Exponential moving average (ema)

  1. Moving average (the "shadow" value): records the average of every parameter (both w and b) over a period of time, which improves the model's generalization
    • when a parameter changes, its shadow follows it slowly
    • shadow = decay_rate x shadow + (1 - decay_rate) x parameter
      • initial shadow value = initial parameter value
      • decay_rate = min{MOVING_AVERAGE_DECAY, (1 + rounds) / (10 + rounds)}
    ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step) # create the moving-average object
    ema_op = ema.apply([]) # list the parameters whose moving average is tracked inside []; tf.trainable_variables() can be used instead to list all trainable parameters automatically
    with tf.control_dependencies([train_step, ema_op]):
        train_op = tf.no_op(name='train')

    The moving average of a parameter can also be returned with:
    ema.average(parameter_name)


An example

#coding:utf-8
import tensorflow as tf

w1 = tf.Variable(0, dtype=tf.float32)
global_step = tf.Variable(0, trainable=False)
MOVING_AVERAGE_DECAY = 0.99
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)

ema_op = ema.apply(tf.trainable_variables())

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)

    print sess.run([w1,ema.average(w1)])

    sess.run(tf.assign(w1,1))
    sess.run(ema_op)
    print sess.run([w1, ema.average(w1)])

    sess.run(tf.assign(global_step, 100))
    sess.run(tf.assign(w1, 10))
    sess.run(ema_op)
    print sess.run([w1, ema.average(w1)])

    for i in range(5):
        sess.run(ema_op)
        print sess.run([w1, ema.average(w1)])

Output

[0.0, 0.0]
[1.0, 0.9]
[10.0, 1.6445453]
[10.0, 2.3281732]
[10.0, 2.955868]
[10.0, 3.532206]
[10.0, 4.061389]
[10.0, 4.547275]    
  • At first the parameter w1 is 0 and its moving average is 0; after w1 is set to 1 the moving average becomes 0.9; once the iteration count is set to 100 and the parameter to 10, the moving average keeps approaching 10

(4) Regularization

  1. Overfitting: the model achieves very high accuracy on the training data but cannot respond correctly to new, unseen data
  2. Regularization alleviates overfitting: a model-complexity term is added to the loss function, weighting W so that noise in the training data is weakened (b is usually not regularized): $loss = loss(y, y_\_) + REGULARIZER * loss(w)$
    • loss(y, y_): the loss function over all parameters of the model
    • REGULARIZER: a hyperparameter giving the weight of loss(w) in the total loss, i.e. the regularization weight
    • loss(w): the loss of the parameters to be regularized
loss(w)=tf.contrib.layers.l1_regularizer(REGULARIZER)(w) # sum of the absolute values of w
loss(w)=tf.contrib.layers.l2_regularizer(REGULARIZER)(w) # sum of the squares of w
# choose one of the two regularizers above
tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w)) # add the term to the 'losses' collection so it can be summed later
loss = cem + tf.add_n(tf.get_collection('losses'))

An example:
Randomly generate some points in the plane; a point is red if the sum of the squares of its x and y coordinates is less than 2, otherwise blue. Fit a curve that separates the red and blue points.

#coding:utf-8
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
BATCH_SIZE = 30
seed = 2

rdm = np.random.RandomState(seed)

X = rdm.randn(300,2)
Y_ = [int(x0*x0 + x1*x1 < 2) for (x0,x1) in X]
Y_c = [['red' if y else 'blue'] for y in Y_]

X = np.vstack(X).reshape(-1,2) # n rows, 2 columns
Y_ = np.vstack(Y_).reshape(-1,1)
print X
print Y_
print Y_c

plt.scatter(X[:,0], X[:,1], c = np.squeeze(Y_c)) # X[:,0] takes the first column of X
plt.show()

def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b

x = tf.placeholder(tf.float32, shape=(None, 2))
y_ = tf.placeholder(tf.float32, shape=(None, 1))

w1 = get_weight([2,11], 0.01)
b1 = get_bias([11])
y1 = tf.nn.relu(tf.matmul(x, w1)+b1)

w2 = get_weight([11,1], 0.01)
b2 = get_bias([1])
y = tf.matmul(y1, w2)+b2 # the output layer is not activated

loss_mse = tf.reduce_mean(tf.square(y-y_))
loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))

# define the backpropagation method: without regularization
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_mse)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 40000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x:X[start:end], y_:Y_[start:end]})
        if i % 2000 == 0:
            loss_mse_v = sess.run(loss_mse, feed_dict={x:X, y_:Y_})
            print("After %d steps, loss is:%f" %(i, loss_mse_v))

    xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x:grid})
    probs = probs.reshape(xx.shape)

    print "w1:\n",sess.run(w1)
    print "b1:\n",sess.run(b1)
    print "w2:\n",sess.run(w2)
    print "b2:\n",sess.run(b2)

plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

# define the backpropagation method: with regularization
train_step = tf.train.AdamOptimizer(0.0001).minimize(loss_total)

with tf.Session() as sess:
    init_op = tf.global_variables_initializer()
    sess.run(init_op)
    STEPS = 40000
    for i in range(STEPS):
        start = (i*BATCH_SIZE) % 300
        end = start + BATCH_SIZE
        sess.run(train_step, feed_dict={x: X[start:end], y_:Y_[start:end]})
        if i % 2000 == 0:
            loss_v = sess.run(loss_total, feed_dict={x:X, y_:Y_})
            print("After %d steps, loss is:%f" %(i, loss_v))

    xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
    grid = np.c_[xx.ravel(), yy.ravel()]
    probs = sess.run(y, feed_dict={x:grid})
    probs = probs.reshape(xx.shape)
    print "w1:\n",sess.run(w1)
    print "b1:\n",sess.run(b1)
    print "w2:\n",sess.run(w2)
    print "b2:\n",sess.run(b2)

plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

  • The model with regularization generalizes better

(5) The recipe for building a neural network in modules

  1. forward.py
    Forward propagation: builds the network and defines its structure
def forward(x, regularizer):
    w=
    b=
    y=
    return y

def get_weight(shape, regularizer):
    w=tf.Variable()
    tf.add_to_collection('losses',tf.contrib.layers.l2_regularizer(regularizer)(w)) # add each w's regularization loss to the total loss
    return w

def get_bias(shape): # argument: the number of b's in this layer
    b=tf.Variable()
    return b
  2. backward.py
    Backpropagation: trains the network and optimizes its parameters
def backward():
    x=tf.placeholder()
    y_=tf.placeholder()
    y=forward.forward(x,REGULARIZER)
    global_step=tf.Variable(0,trainable=False)
    loss=

# regularization
loss_mse = tf.reduce_mean(tf.square(y-y_)) # mean squared error as the gap between y and y_
# or
ce = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1)) # cross entropy
cem = tf.reduce_mean(ce) # the gap between y and y_

# after adding regularization
loss = (gap between y and y_) + tf.add_n(tf.get_collection('losses'))

# exponentially decaying learning rate
learning_rate = tf.train.exponential_decay(
    LEARNING_RATE_BASE,
    global_step,
    total_samples/BATCH_SIZE,
    LEARNING_RATE_DECAY,
    staircase=True)

# moving average
# see the code in the previous section

An example:

  • generateds.py
#coding:utf-8
import numpy as np
import matplotlib.pyplot as plt

seed = 2

def generateds():
    rdm = np.random.RandomState(seed)
    X = rdm.randn(300,2)
    Y_ = [int(x0*x0 + x1*x1 < 2) for (x0, x1) in X]
    Y_c = [['red' if y else 'blue'] for y in Y_]
    X = np.vstack(X).reshape(-1,2)
    Y_ = np.vstack(Y_).reshape(-1,1)

    return X, Y_, Y_c
  • forward.py
#coding:utf-8
import tensorflow as tf

def get_weight(shape, regularizer):
    w = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
    tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(regularizer)(w))
    return w

def get_bias(shape):
    b = tf.Variable(tf.constant(0.01, shape=shape))
    return b

def forward(x, regularizer):
    w1 = get_weight([2,11], regularizer)
    b1 = get_bias([11])
    y1 = tf.nn.relu(tf.matmul(x, w1) + b1)

    w2 = get_weight([11,1], regularizer)
    b2 = get_bias([1])
    y = tf.matmul(y1, w2) + b2

    return y
  • backward.py
#coding:utf-8
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import opt4_8_generateds
import opt4_8_forward

STEPS = 40000
BATCH_SIZE = 30
LEARNING_RATE_BASE = 0.001
LEARNING_RATE_DECAY = 0.999
REGULARIZER = 0.01

def backward():
    x = tf.placeholder(tf.float32, shape=(None, 2))
    y_ = tf.placeholder(tf.float32, shape=(None, 1))

    X, Y_, Y_c = opt4_8_generateds.generateds()

    y = opt4_8_forward.forward(x, REGULARIZER)

    global_step = tf.Variable(0, trainable=False)

    learning_rate = tf.train.exponential_decay(
            LEARNING_RATE_BASE,
            global_step,
            300/BATCH_SIZE,
            LEARNING_RATE_DECAY,
            staircase=True)

    loss_mse = tf.reduce_mean(tf.square(y-y_))
    loss_total = loss_mse + tf.add_n(tf.get_collection('losses'))

    train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss_total)

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)
        for i in range(STEPS):
            start = (i*BATCH_SIZE) % 300
            end = start + BATCH_SIZE
            sess.run(train_step, feed_dict={x: X[start:end], y_:Y_[start:end]})
            if i % 2000 == 0:
                loss_v = sess.run(loss_total, feed_dict={x:X, y_:Y_})
                print("After %d steps, loss is: %f" % (i, loss_v))

        xx, yy = np.mgrid[-3:3:.01, -3:3:.01]
        grid = np.c_[xx.ravel(), yy.ravel()]
        probs = sess.run(y, feed_dict={x:grid})
        probs = probs.reshape(xx.shape)

    plt.scatter(X[:,0], X[:,1], c=np.squeeze(Y_c))
    plt.contour(xx, yy, probs, levels=[.5])
    plt.show()

if __name__ == '__main__':
    backward()

VI. Fully connected network basics

(1) The MNIST data set

  • The MNIST data set contains 70k images: 60k for training and 10k for testing. Each image is 28*28 pixels (= 784, i.e. a one-dimensional array of length 784)
  • In the array, every value lies between 0 and 1: black background is 0, white is 1, and the closer to 1, the whiter the pixel
  • Each image has a label (a one-dimensional array of length 10) whose entries give the probability of each digit 0~9
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('(save path)./data/',one_hot=True) # stored in one-hot form

# return the number of samples in each subset
print "train data size:", mnist.train.num_examples
print "validation data size:", mnist.validation.num_examples
print "test data size:", mnist.test.num_examples

# return labels and data
mnist.train.labels[0] # returns the label (or image) with the given index in the training set
mnist.train.images[0]

# take a small batch of data, ready to feed to the network for training
BATCH_SIZE = 200 # feed 200 images at a time
xs, ys = mnist.train.next_batch(BATCH_SIZE) # randomly draw BATCH_SIZE images and labels from the training set and assign them to xs and ys
print "xs shape:", xs.shape # (200,784): 200 rows, 784 pixels per row
print "ys shape:", ys.shape # (200,10): 200 rows, 10 label entries per row
A few functions (a short sketch follows this list)
  1. tf.get_collection("") takes all variables in a collection and returns them as a list
  2. tf.add_n([]) adds the corresponding elements of the list
  3. tf.cast(x,dtype) converts x to type dtype
  4. tf.argmax(x,axis) returns the index of the maximum value, e.g. tf.argmax([[1,0,0]],1) looks for the maximum along dimension 1 and returns 0
  5. os.path.join("home","name") is an os-module function that joins paths (home/name)
  6. string.split() slices a string at the given separator and returns the list of parts
  7. with tf.Graph().as_default() as g: the nodes defined inside are placed in computation graph g
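A minimal sketch of a few of these helpers (the checkpoint-style file name is made up for illustration; slicing the step number out of it with split() is the same trick used in test.py below):

#coding:utf-8
import os
import tensorflow as tf

print os.path.join("home", "name")                # home/name
print "model-3000".split('-')[-1]                 # '3000': split the string and take the last part

v  = tf.constant([1, 0, 0])
v2 = tf.constant([[1, 0, 0]])
with tf.Session() as sess:
    print sess.run(tf.argmax(v2, 1))              # [0]: index of the maximum along dimension 1
    print sess.run(tf.cast(v, tf.float32))        # [1. 0. 0.]
    print sess.run(tf.add_n([v, v]))              # [2 0 0]: element-wise sum over the list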
Saving and loading a model (a complete save/restore sketch follows these snippets)
# save
saver=tf.train.Saver() # instantiate a saver object
with tf.Session() as sess:
    for i in range(STEPS):
        if i % rounds == 0:
            saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)

# load
with tf.Session() as sess:
    ckpt=tf.train.get_checkpoint_state(save_path)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)

# instantiate a saver that can restore the moving averages
ema=tf.train.ExponentialMovingAverage(moving_average_decay)
ema_restore=ema.variables_to_restore()
saver=tf.train.Saver(ema_restore)

# compute the accuracy
correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
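A minimal end-to-end sketch of the save-then-load pattern with tf.train.Saver (the variable, the ./model directory and the step number 100 are made up for illustration):

#coding:utf-8
import os
import tensorflow as tf

if not os.path.exists("./model"):
    os.makedirs("./model")                        # make sure the save directory exists

w = tf.Variable(3.0, name="w")
saver = tf.train.Saver()                          # instantiate the saver object

with tf.Session() as sess:                        # save
    sess.run(tf.global_variables_initializer())
    saver.save(sess, os.path.join("./model", "demo"), global_step=100)

with tf.Session() as sess:                        # load
    ckpt = tf.train.get_checkpoint_state("./model")
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print sess.run(w)                         # 3.0, restored from the checkpoint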

(2) Building a fully connected neural network in modules

A problem encountered
  1. TensorFlow IOError: [Errno socket error] [Errno 104] Connection reset by peer
    Fix: this is a network problem; check whether http://yann.lecun.com/exdb/mnist/ is reachable, adjust the network settings or get past the firewall, and the error disappears once the site can be reached. (The original note linked to another writer's solution here.)
backward.py
def backward(mnist):
    x=
    y_=
    y=
    global_step=
    loss=
    # <regularization, exponentially decaying learning rate, moving average>
    train_step=
    instantiate the saver
    with tf.Session() as sess:
        initialize
        for i in range(STEPS):
            sess.run(train_step,feed_dict={x: ,y_: })
            if i % rounds ==0:
                print
                saver.save( )

# loss function with regularization
# add to backward.py:
ce=tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.argmax(y_,1))
cem=tf.reduce_mean(ce)
loss=cem+tf.add_n(tf.get_collection('losses'))
# add to forward.py:
if regularizer != None: tf.add_to_collection('losses',tf.contrib.layers.l2_regularizer(regularizer)(w))

# learning rate learning_rate
# add to backward.py:
learning_rate=tf.train.exponential_decay(
    LEARNING_RATE_BASE,
    global_step,
    LEARNING_RATE_STEP,
    LEARNING_RATE_DECAY,
    staircase=True)

# moving average ema
# add to backward.py:
ema=tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,global_step)
ema_op=ema.apply(tf.trainable_variables())
with tf.control_dependencies([train_step,ema_op]):
    train_op=tf.no_op(name='train')
test.py
def test(mnist):
    with tf.Graph().as_default() as g:
        define x, y_, y
        instantiate a saver that can restore the moving averages
        compute the accuracy
        while True:
            with tf.Session() as sess:
                ckpt=tf.train.get_checkpoint_state(save_path) # load the ckpt model
                if ckpt and ckpt.model_checkpoint_path: # if a ckpt model exists, restore it
                    saver.restore(sess,ckpt.model_checkpoint_path) # restore the session
                    global_step=ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1] # recover the round number
                    accuracy_score=sess.run(accuracy,feed_dict={x:mnist.test.images,y_:mnist.test.labels}) # compute the accuracy
                    print the result
                else: # if there is no model
                    print a message
                    return

def main():
    mnist=input_data.read_data_sets("./data/",one_hot=True)
    test(mnist)
if __name__ == '__main__':
    main()
