Deep Learning 06 (PaddlePaddle architecture and basic concepts [Tensor, Layer, Program, Variable, Executor, Place]; linear regression; Boston housing price prediction)

Article link: https://blog.csdn.net/yegeli/article/details/107480890

Deep Learning 06 (PaddlePaddle Basics)

PaddlePaddle Overview

Introduction to PaddlePaddle

What is PaddlePaddle

Why learn PaddlePaddle

Open source and developed in China
Helps solve real-world engineering problems better and faster

Advantages of PaddlePaddle

Disadvantages of PaddlePaddle

Few textbooks and tutorials
Hard to learn, with a steep learning curve

Awards in international competitions

Industry applications

Course preview

Learning resources

Architecture

Overall architecture

Compile time vs. run time

Three important terms

Fluid: describes the program's execution flow
Program: what the user sees as one complete program
Executor: runs the program
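
The split between defining a Program and running it with an Executor can be illustrated without Paddle. The sketch below is a hypothetical toy: `ToyProgram`, `ToyExecutor`, and their methods are invented for illustration and are not Paddle APIs. Building the program only records operations; values exist only once the executor runs it, which is why `z` in Case 1 has no value until `exe.run` is called.

```python
import operator


class ToyProgram:
    """Hypothetical toy: records operations at definition time, computes nothing."""

    def __init__(self):
        self.constants = {}  # variable name -> constant value
        self.ops = []  # list of (output_name, function, input_names)

    def fill_constant(self, name, value):
        self.constants[name] = value
        return name  # only the variable's name is returned, not a value

    def add(self, out, a, b):
        self.ops.append((out, operator.add, (a, b)))
        return out


class ToyExecutor:
    """Hypothetical toy: runs a ToyProgram in order and returns fetched values."""

    def run(self, program, fetch_list):
        scope = dict(program.constants)  # runtime storage: name -> value
        for out, fn, inputs in program.ops:
            scope[out] = fn(*(scope[name] for name in inputs))
        return [scope[name] for name in fetch_list]


prog = ToyProgram()
x = prog.fill_constant("x", 5)
y = prog.fill_constant("y", 1)
z = prog.add("z", x, y)  # "z" is just a name at this point
result = ToyExecutor().run(prog, fetch_list=[z])
print(result)  # [6]
```

The design mirrors the compile-time/run-time split: the "program" is pure description, and the "executor" owns the storage where values actually live.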

Case 1: Quick start

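# Hello-world example
# Written for the fluid 1.x static-graph API; Paddle 2.x runs in dynamic-graph mode
# by default, so there call paddle.enable_static() first
import paddle.fluid as fluid

# Create two int64 tensors of shape [1] (one element each)
x = fluid.layers.fill_constant(shape=[1], dtype="int64", value=5)
y = fluid.layers.fill_constant(shape=[1], dtype="int64", value=1)
z = x + y  # z is only a node in the program; it has no value until the program runs

# Create the executor
place = fluid.CPUPlace()  # run on the CPU
exe = fluid.Executor(place)  # create the executor
result = exe.run(fluid.default_main_program(),
                 fetch_list=[z])  # which results to return
print(result)  # a list of numpy arrays, one per fetched variable; here it holds 6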

Basic concepts and operations

Basic concepts

Tensor

Layer

Variable

Program

Executor

Place

Optimizer

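The optimizer trains the network's parameters: it typically runs gradient descent on the loss function, driving the loss toward its minimum.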
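
Gradient descent itself is a short loop. The sketch below uses a made-up one-parameter loss, L(w) = (w - 3)^2, purely for illustration. `fluid.optimizer.SGD` applies the same kind of update (parameter <- parameter - learning_rate * gradient) to every trainable parameter, using the gradients from the backward pass that `minimize()` appends to the program.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Plain gradient descent: repeatedly step against the gradient."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Toy loss L(w) = (w - 3)^2, minimized at w = 3; its gradient is 2 * (w - 3).
def loss(w):
    return (w - 3.0) ** 2


def grad(w):
    return 2.0 * (w - 3.0)


w_opt = gradient_descent(grad, w0=0.0, learning_rate=0.1, steps=100)
print(w_opt, loss(w_opt))  # w_opt ends very close to 3, the loss close to 0
```

With learning_rate=0.1, each step shrinks the distance to the minimum by a factor of 0.8; a learning rate that is too large (here, above 1.0) would make the iterates diverge instead.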

Case 2: Computing with two tensors


import paddle.fluid as fluid
import numpy

# Create two float32 input variables x and y, each a 2x3 tensor
x = fluid.layers.data(name="x", shape=[2, 3], dtype="float32")
y = fluid.layers.data(name="y", shape=[2, 3], dtype="float32")

x_add_y = fluid.layers.elementwise_add(x, y)  # element-wise addition
x_mul_y = fluid.layers.elementwise_mul(x, y)  # element-wise multiplication

place = fluid.CPUPlace()  # run on the CPU
exe = fluid.Executor(place)  # create the executor
exe.run(fluid.default_startup_program())  # run the startup (initialization) program

# Input data for x and y, cast to float32 to match the declared dtype
a = numpy.array([[1, 2, 3],
                 [4, 5, 6]]).astype("float32")
b = numpy.array([[1, 1, 1],
                 [2, 2, 2]]).astype("float32")

params = {"x": a, "y": b}
outs = exe.run(fluid.default_main_program(),  # run the default main program
               feed=params,  # feed the input data
               fetch_list=[x_add_y, x_mul_y])  # results to fetch
for i in outs:
    print(i)
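
`elementwise_add` and `elementwise_mul` combine entries position by position; neither one is a matrix product. The same arithmetic in plain Python shows which values the two fetched arrays should contain (Paddle returns them as float32 numpy arrays, so they print as floats).

```python
a = [[1, 2, 3],
     [4, 5, 6]]
b = [[1, 1, 1],
     [2, 2, 2]]

# Element-wise: combine the entries at the same (row, column) position.
x_add_y = [[p + q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
x_mul_y = [[p * q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

print(x_add_y)  # [[2, 3, 4], [6, 7, 8]]
print(x_mul_y)  # [[1, 2, 3], [8, 10, 12]]
```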

Program execution steps

Case 3: Writing a simple linear regression
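
Case 3 fits a straight line with a single fully connected layer (`size=1`, `act=None`). The model, the loss built from `square_error_cost` followed by `mean`, and the SGD update it drives can be written as follows, where η is the learning rate (0.01 in the code) and N = 5 is the number of training samples. Because all five samples are fed at once, each SGD step here is in effect a full-batch gradient-descent step.

$$
\hat{y}_i = w x_i + b, \qquad
L(w, b) = \frac{1}{N}\sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2
$$

$$
w \leftarrow w - \eta \, \frac{2}{N}\sum_{i=1}^{N} \left(\hat{y}_i - y_i\right) x_i,
\qquad
b \leftarrow b - \eta \, \frac{2}{N}\sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)
$$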

Implementation


# Simple linear regression
import paddle
import paddle.fluid as fluid
import numpy as np
import matplotlib.pyplot as plt

train_data = np.array([[0.5], [0.6], [0.8], [1.1], [1.4]]).astype('float32')
y_true = np.array([[5.0], [5.5], [6.0], [6.8], [6.8]]).astype('float32')

# Define the input variables and their data types
x = fluid.layers.data(name="x", shape=[1], dtype="float32")
y = fluid.layers.data(name="y", shape=[1], dtype="float32")
# Predict with a fully connected layer (one output, no activation)
y_predict = fluid.layers.fc(input=x, size=1, act=None)
# Clone a prediction-only program before the loss and optimizer ops are added
infer_program = fluid.default_main_program().clone(for_test=True)
# Add the loss function
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_cost = fluid.layers.mean(cost)  # mean squared error
# Define the optimization method
optimizer = fluid.optimizer.SGD(learning_rate=0.01)
optimizer.minimize(avg_cost)  # minimize the mean squared error

# Create the executor
place = fluid.CPUPlace()  # run on the CPU
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())  # initialize the parameters

# Train for 200 iterations
costs = []
iters = []
params = {"x": train_data, "y": y_true}
for i in range(200):
    outs = exe.run(feed=params, fetch_list=[y_predict.name, avg_cost.name])
    iters.append(i)  # iteration number
    costs.append(outs[1][0])  # loss value
    print("i:", i, " cost:", outs[1][0])

# Generate test inputs and predict
tmp = np.random.rand(10, 1)  # 10x1 array of uniform random numbers in [0, 1)
tmp = tmp * 2  # scale to the range [0, 2)
tmp.sort(axis=0)  # sort so the line is drawn left to right
x_test = np.array(tmp).astype("float32")
# infer_program has no loss or optimizer ops: only x is fed, and running it
# does not update the trained parameters
y_out = exe.run(infer_program, feed={"x": x_test}, fetch_list=[y_predict.name])  # predict
y_test = y_out[0]

# Visualize the loss
plt.figure("Training")
plt.title("Training Cost", fontsize=24)
plt.xlabel("Iter", fontsize=14)
plt.ylabel("Cost", fontsize=14)
plt.plot(iters, costs, color="red", label="Training Cost")  # plot the loss curve
plt.grid()  # draw grid lines
plt.savefig("train.png")  # save the figure

# Visualize the linear model
plt.figure("Inference")
plt.title("Linear Regression", fontsize=24)
plt.plot(x_test, y_test, color="red", label="inference")  # plot the fitted line
plt.scatter(train_data, y_true)  # scatter plot of the training samples

plt.legend()
plt.grid()  # draw grid lines
plt.savefig("infer.png")  # save the figure
plt.show()  # show the figures
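
The data set here is tiny, so the best-fitting line can also be computed exactly with ordinary least squares. That gives a reference for the values the trained fc layer should approach. This closed-form method is a swapped-in technique and not part of the original case. After only 200 SGD steps from a random initialization, the weights Paddle learns may still be some distance from these values.

```python
train_x = [0.5, 0.6, 0.8, 1.1, 1.4]
train_y = [5.0, 5.5, 6.0, 6.8, 6.8]


def least_squares_line(xs, ys):
    """Closed-form slope and intercept that minimize the mean squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


w, b = least_squares_line(train_x, train_y)
print("w = %.4f, b = %.4f" % (w, b))  # w = 2.0292, b = 4.2343
```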

fluid API structure

Data preparation

What is data preparation

Why data preparation is needed

Case 4: Using a reader
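
In the fluid 1.x API, a reader is a function that returns a generator of samples. Decorators such as `paddle.reader.shuffle` and `paddle.batch` (both used in Case 5) each wrap one reader to produce a new one. The pure-Python sketch below imitates that composition. `toy_reader`, `shuffle`, and `batch` are simplified stand-ins written for illustration, not Paddle's own source code.

```python
import random


def toy_reader():
    """Hypothetical reader: each call yields (features, label) samples."""
    for i in range(7):
        yield [float(i)], float(2 * i)


def shuffle(reader, buf_size):
    """Wrap a reader: fill a buffer of buf_size samples, shuffle it, emit it."""
    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                random.shuffle(buf)
                yield from buf
                buf = []
        random.shuffle(buf)  # shuffle and emit whatever is left over
        yield from buf
    return shuffled_reader


def batch(reader, batch_size):
    """Wrap a reader: group consecutive samples into lists of batch_size."""
    def batched_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:  # keep the final, smaller batch
            yield current
    return batched_reader


train_reader = batch(shuffle(toy_reader, buf_size=5), batch_size=3)
batches = list(train_reader())
print([len(b) for b in batches])  # [3, 3, 1]
```

Because every wrapper returns a new zero-argument function, readers compose freely, and calling `train_reader()` again starts a fresh, reshuffled pass over the data, which is what each epoch in Case 5 relies on.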

Implementing multiple regression

Dataset and task

Approach

Results

Case 5: Boston housing price prediction

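# Multiple regression example: Boston housing price prediction
''' About the dataset:
 1) 506 rows of 14 columns each; the first 13 columns describe the houses and
    the last column is the median home value (in $1000s)
 2) Features: per-capita crime rate (CRIM), proportion of residential land zoned for large lots (ZN),
    proportion of non-retail business acres (INDUS), Charles River dummy variable (CHAS),
    nitric oxide concentration (NOX), average number of rooms per dwelling (RM),
    proportion of owner-occupied units built before 1940 (AGE),
    weighted distance to five Boston employment centres (DIS),
    index of accessibility to radial highways (RAD), property-tax rate per $10,000 (TAX),
    pupil-teacher ratio (PTRATIO), a statistic based on the proportion of Black residents (B),
    percentage of lower-status population (LSTAT)
'''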
import paddle
import paddle.fluid as fluid
import numpy as np
import os
import matplotlib.pyplot as plt

# step1: data preparation
# paddle ships the uci_housing training and test sets; the readers below load them directly
BUF_SIZE = 500
BATCH_SIZE = 20

# Training data readers
random_reader = paddle.reader.shuffle(paddle.dataset.uci_housing.train(),
                                      buf_size=BUF_SIZE)  # shuffles samples within a buffer
train_reader = paddle.batch(random_reader, batch_size=BATCH_SIZE)  # groups samples into batches

# Print the raw samples (uncomment to inspect)
#train_data = paddle.dataset.uci_housing.train()
#for sample_data in train_data():
#    print(sample_data)

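# step2: configure the network
# Define the input and the label; both are tensors
x = fluid.layers.data(name="x", shape=[13], dtype="float32")
y = fluid.layers.data(name="y", shape=[1], dtype="float32")
# A simple linear network connecting the input layer to the output layer
y_predict = fluid.layers.fc(input=x,  # input data
                            size=1,  # number of outputs
                            act=None)  # activation function (none)
# Define the loss function
cost = fluid.layers.square_error_cost(input=y_predict,  # predicted values, tensor
                                      label=y)  # expected values, tensor
avg_cost = fluid.layers.mean(cost)  # mean of the squared errors

# Clone a program for evaluation; clone before minimize() so it contains
# no backward or optimizer ops
test_program = fluid.default_main_program().clone(for_test=True)

optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.001)  # stochastic gradient descent
opts = optimizer.minimize(avg_cost)  # have the optimizer minimize the loss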

# step3: model training and evaluation
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

feeder = fluid.DataFeeder(place=place, feed_list=[x, y])  # turns reader batches into feed data

iter_count = 0  # number of training samples processed so far
iters = []
train_costs = []

EPOCH_NUM = 120
model_save_dir = "../model/uci_housing"  # where the model is saved
for pass_id in range(EPOCH_NUM):
    train_cost = 0
    i = 0
    for data in train_reader():
        i += 1
        train_cost = exe.run(program=fluid.default_main_program(),
                             feed=feeder.feed(data),
                             fetch_list=[avg_cost])
        if i % 20 == 0:  # print the loss every 20 batches
            print("PassID: %d, Cost: %0.5f" % (pass_id, train_cost[0][0]))
        iter_count = iter_count + BATCH_SIZE  # add the batch size
        iters.append(iter_count)  # record the sample count
        train_costs.append(train_cost[0][0])  # record the loss

# Save the model
if not os.path.exists(model_save_dir):  # create the directory if it does not exist
    os.makedirs(model_save_dir)
fluid.io.save_inference_model(model_save_dir,  # where to save the model
                              ["x"],  # names of the variables fed at inference time
                              [y_predict],  # variables holding the predictions
                              exe)  # executor used to save the parameters
# Visualize the training process
plt.figure("Training Cost")
plt.title("Training Cost", fontsize=24)
plt.xlabel("iter", fontsize=14)
plt.ylabel("cost", fontsize=14)
plt.plot(iters, train_costs, color="red", label="Training Cost")
plt.grid()
plt.savefig("train.png")

# step4: model prediction
infer_exe = fluid.Executor(place)  # executor for inference
infer_scope = fluid.core.Scope()  # a new scope: variables created inside scope_guard live here, not in the global scope
infer_result = []  # predicted values
ground_truths = []  # true values

with fluid.scope_guard(infer_scope):
    # Load the model; this returns three values:
    # infer_program: the inference program (inputs and computation rules)
    # feed_target_names: names of the variables to feed
    # fetch_targets: variables holding the prediction results
    [infer_program, feed_target_names, fetch_targets] = \
        fluid.io.load_inference_model(model_save_dir,  # where the model was saved
                                      infer_exe)  # executor that runs the model
    # Get the test data
    infer_reader = paddle.batch(paddle.dataset.uci_housing.test(),
                                batch_size=200)  # test data reader
    test_data = next(infer_reader())  # one batch of up to 200 samples (the whole test set here)
    test_x = np.array([data[0] for data in test_data]).astype("float32")
    test_y = np.array([data[1] for data in test_data]).astype("float32")

    x_name = feed_target_names[0]  # name of the input variable stored in the model
    results = infer_exe.run(infer_program,  # inference program
                            feed={x_name: test_x},  # feed the test features
                            fetch_list=fetch_targets)  # prediction results
# Predicted values
for idx, val in enumerate(results[0]):
    print("%d: %.2f" % (idx, val[0]))
    infer_result.append(val[0])

# Ground-truth values
for idx, val in enumerate(test_y):
    print("%d: %.2f" % (idx, val[0]))
    ground_truths.append(val[0])

# Visualization
plt.figure('scatter')
plt.title("TestFigure", fontsize=24)
plt.xlabel("ground truth", fontsize=14)
plt.ylabel("infer result", fontsize=14)
ref = np.arange(1, 30)  # reference line y = x (where perfect predictions would lie)
plt.plot(ref, ref)
plt.scatter(ground_truths, infer_result, color="green", label="Test")
plt.grid()
plt.legend()
plt.savefig("predict.png")
plt.show()

Training output (one line per pass):

PassID: 0, Cost: 740.35278
PassID: 1, Cost: 606.08551
PassID: 2, Cost: 372.98065
PassID: 3, Cost: 449.45566
PassID: 4, Cost: 662.28650
PassID: 5, Cost: 366.99777
PassID: 6, Cost: 442.59750
PassID: 7, Cost: 220.53012
PassID: 8, Cost: 336.41757
PassID: 9, Cost: 346.11435
PassID: 10, Cost: 372.14886
PassID: 11, Cost: 222.66690
PassID: 12, Cost: 130.25555
PassID: 13, Cost: 233.00571
PassID: 14, Cost: 140.46913
PassID: 15, Cost: 297.38672
PassID: 16, Cost: 323.91489
PassID: 17, Cost: 97.08193
PassID: 18, Cost: 120.99207
PassID: 19, Cost: 116.71027
PassID: 20, Cost: 168.13571
PassID: 21, Cost: 74.17239
PassID: 22, Cost: 213.31438
PassID: 23, Cost: 130.29099
PassID: 24, Cost: 61.30040
PassID: 25, Cost: 371.26035
PassID: 26, Cost: 71.74551
PassID: 27, Cost: 110.92019
PassID: 28, Cost: 108.98529
PassID: 29, Cost: 83.41230
PassID: 30, Cost: 111.70056
PassID: 31, Cost: 104.67867
PassID: 32, Cost: 153.84940
PassID: 33, Cost: 89.61612
PassID: 34, Cost: 101.31937
PassID: 35, Cost: 118.40926
PassID: 36, Cost: 56.72471
PassID: 37, Cost: 16.30172
PassID: 38, Cost: 58.26217
PassID: 39, Cost: 40.01674
PassID: 40, Cost: 8.47746
PassID: 41, Cost: 102.85758
PassID: 42, Cost: 90.31106
PassID: 43, Cost: 67.09414
PassID: 44, Cost: 51.76865
PassID: 45, Cost: 174.99576
PassID: 46, Cost: 36.16768
PassID: 47, Cost: 68.24715
PassID: 48, Cost: 80.25554
PassID: 49, Cost: 73.06264
PassID: 50, Cost: 48.89883
PassID: 51, Cost: 59.08198
PassID: 52, Cost: 42.32336
PassID: 53, Cost: 84.36057
PassID: 54, Cost: 45.26277
PassID: 55, Cost: 119.80692
PassID: 56, Cost: 88.87520
PassID: 57, Cost: 58.20774
PassID: 58, Cost: 68.46684
PassID: 59, Cost: 17.85213
PassID: 60, Cost: 47.71296
PassID: 61, Cost: 21.33267
PassID: 62, Cost: 32.88322
PassID: 63, Cost: 43.57720
PassID: 64, Cost: 46.99419
PassID: 65, Cost: 12.02126
PassID: 66, Cost: 148.95886
PassID: 67, Cost: 52.31165
PassID: 68, Cost: 95.54218
PassID: 69, Cost: 51.04523
PassID: 70, Cost: 62.38363
PassID: 71, Cost: 37.53402
PassID: 72, Cost: 104.72713
PassID: 73, Cost: 26.85268
PassID: 74, Cost: 27.40690
PassID: 75, Cost: 40.29744
PassID: 76, Cost: 43.85099
PassID: 77, Cost: 87.08317
PassID: 78, Cost: 48.94913
PassID: 79, Cost: 123.66785
PassID: 80, Cost: 76.81953
PassID: 81, Cost: 62.81258
PassID: 82, Cost: 85.38822
PassID: 83, Cost: 85.57560
PassID: 84, Cost: 33.62983
PassID: 85, Cost: 29.80763
PassID: 86, Cost: 80.94147
PassID: 87, Cost: 64.78574
PassID: 88, Cost: 107.18308
PassID: 89, Cost: 16.42400
PassID: 90, Cost: 18.96049
PassID: 91, Cost: 56.48349
PassID: 92, Cost: 60.37949
PassID: 93, Cost: 54.47504
PassID: 94, Cost: 89.07017
PassID: 95, Cost: 59.49055
PassID: 96, Cost: 75.61760
PassID: 97, Cost: 58.82702
PassID: 98, Cost: 54.15410
PassID: 99, Cost: 62.33812
PassID: 100, Cost: 32.79350
PassID: 101, Cost: 35.62849
PassID: 102, Cost: 85.08487
PassID: 103, Cost: 100.62189
PassID: 104, Cost: 53.61509
PassID: 105, Cost: 69.98814
PassID: 106, Cost: 16.88175
PassID: 107, Cost: 69.71999
PassID: 108, Cost: 18.15492
PassID: 109, Cost: 46.36757
PassID: 110, Cost: 21.52023
PassID: 111, Cost: 45.19756
PassID: 112, Cost: 48.24986
PassID: 113, Cost: 61.38441
PassID: 114, Cost: 32.66204
PassID: 115, Cost: 54.74180
PassID: 116, Cost: 66.52021
PassID: 117, Cost: 113.15836
PassID: 118, Cost: 24.62640
PassID: 119, Cost: 91.92756

Predicted values (test set):
0: 14.17
1: 14.68
2: 13.92
3: 16.36
4: 14.49
5: 15.67
6: 15.62
7: 14.86
8: 11.66
9: 14.61
10: 11.03
11: 13.31
12: 14.19
13: 13.48
14: 13.94
15: 14.93
16: 16.22
17: 15.98
18: 16.32
19: 14.37
20: 15.30
21: 13.68
22: 15.96
23: 15.57
24: 14.97
25: 14.23
26: 15.67
27: 15.49
28: 16.81
29: 15.59
30: 15.42
31: 14.50
32: 14.88
33: 13.28
34: 12.52
35: 14.94
36: 15.12
37: 15.79
38: 16.09
39: 15.88
40: 14.34
41: 14.07
42: 15.80
43: 16.15
44: 15.93
45: 15.61
46: 15.17
47: 16.16
48: 16.24
49: 16.80
50: 14.95
51: 15.23
52: 14.59
53: 14.98
54: 16.15
55: 16.65
56: 16.20
57: 16.76
58: 16.93
59: 17.43
60: 17.48
61: 17.23
62: 15.11
63: 15.60
64: 16.57
65: 17.32
66: 16.86
67: 17.38
68: 17.48
69: 18.21
70: 15.74
71: 15.14
72: 16.44
73: 14.53
74: 16.20
75: 17.12
76: 18.21
77: 18.74
78: 19.00
79: 18.43
80: 17.85
81: 18.36
82: 17.05
83: 17.97
84: 16.42
85: 15.26
86: 14.16
87: 16.61
88: 17.61
89: 22.63
90: 22.77
91: 22.30
92: 20.72
93: 22.03
94: 22.46
95: 21.70
96: 21.97
97: 23.41
98: 23.06
99: 23.80
100: 23.58
101: 23.09

Ground-truth values (test set):
0: 8.50
1: 5.00
2: 11.90
3: 27.90
4: 17.20
5: 27.50
6: 15.00
7: 17.20
8: 17.90
9: 16.30
10: 7.00
11: 7.20
12: 7.50
13: 10.40
14: 8.80
15: 8.40
16: 16.70
17: 14.20
18: 20.80
19: 13.40
20: 11.70
21: 8.30
22: 10.20
23: 10.90
24: 11.00
25: 9.50
26: 14.50
27: 14.10
28: 16.10
29: 14.30
30: 11.70
31: 13.40
32: 9.60
33: 8.70
34: 8.40
35: 12.80
36: 10.50
37: 17.10
38: 18.40
39: 15.40
40: 10.80
41: 11.80
42: 14.90
43: 12.60
44: 14.10
45: 13.00
46: 13.40
47: 15.20
48: 16.10
49: 17.80
50: 14.90
51: 14.10
52: 12.70
53: 13.50
54: 14.90
55: 20.00
56: 16.40
57: 17.70
58: 19.50
59: 20.20
60: 21.40
61: 19.90
62: 19.00
63: 19.10
64: 19.10
65: 20.10
66: 19.90
67: 19.60
68: 23.20
69: 29.80
70: 13.80
71: 13.30
72: 16.70
73: 12.00
74: 14.60
75: 21.40
76: 23.00
77: 23.70
78: 25.00
79: 21.80
80: 20.60
81: 21.20
82: 19.10
83: 20.60
84: 15.20
85: 7.00
86: 8.10
87: 13.60
88: 20.10
89: 21.80
90: 24.50
91: 23.10
92: 19.70
93: 18.30
94: 21.20
95: 17.50
96: 16.80
97: 22.40
98: 20.60
99: 23.90
100: 22.00
101: 11.90
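
The scatter plot gives a visual impression of test quality, but a single number is easier to compare across runs. The sketch below assumes the `infer_result` and `ground_truths` lists from Case 5. Its helpers compute the test mean squared error (the same quantity the training loss averages per batch) and the coefficient of determination R². `mean_squared_error` and `r_squared` are written here for illustration; they are not Paddle functions.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def r_squared(predictions, targets):
    """1 - SSE/SST: 1.0 is a perfect fit, 0.0 is no better than predicting the mean."""
    mean_t = sum(targets) / len(targets)
    sse = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    sst = sum((t - mean_t) ** 2 for t in targets)
    if sst == 0:
        raise ValueError("targets are constant; R^2 is undefined")
    return 1.0 - sse / sst


# Small made-up example (not the Boston results above).
preds = [14.0, 16.0, 20.0]
truth = [12.0, 16.0, 23.0]
print(mean_squared_error(preds, truth))  # (4 + 0 + 9) / 3, about 4.333
print(r_squared(preds, truth))  # 1 - 13/62, about 0.790
```

For Case 5 the call would be `mean_squared_error(infer_result, ground_truths)`; a large gap between this value and the final training cost would suggest the model generalizes poorly.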