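Linear Regression with PaddlePaddle
In this experiment we use PaddlePaddle to build a simple linear regression model and use it to predict the nitrogen content of soil in a given region from its organic matter content. Along the way we will meet several important machine learning concepts and learn the basic workflow of a machine learning prediction task.
**Basic Concepts of Linear Regression**
Linear regression is one of the simplest and most important models in machine learning. Building it follows the usual workflow: obtain the data, preprocess the data, train the model, and apply the model.
A regression model can be understood as the process of fitting a curve to a set of points. If the fitted curve is a straight line, it is called linear regression; if it is a quadratic curve, it is called quadratic regression. Linear regression is the simplest kind of regression model.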
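The idea of fitting a line to a point set can be illustrated with the closed-form ordinary least-squares solution for a single feature. This is not the method the notebook uses, since it trains with gradient descent. It is shown only as a reference point. The function name fit_line_least_squares is hypothetical, and the code is plain Python so that it runs without PaddlePaddle.

```python
from __future__ import print_function


def fit_line_least_squares(xs, ys):
    """Closed-form ordinary least squares for y ~ a * x + b with one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Sums of centred cross-products and squares (the 1/n factors cancel in a = sxy / sxx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Points that lie exactly on y = 2x + 1
a, b = fit_line_least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)  # 2.0 1.0
```

On real, noisy data the two approaches should give similar lines, which makes this a handy sanity check for a trained model.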
There are a few basic concepts in linear regression that you need to know:
- Hypothesis Function
- Loss Function
- Optimization Algorithm
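Hypothesis function:
The hypothesis function describes mathematically the relationship between the independent variable and the dependent variable; the relationship can be linear or nonlinear. In this linear regression model the hypothesis function is $\hat{Y} = aX + b$, where $X$ is the organic matter content and $\hat{Y}$ is the model's prediction (the predicted nitrogen content), written with a hat to distinguish it from the true value $Y$. The parameters the model has to learn are $a$ and $b$.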
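Loss function:
The loss function measures mathematically the error between the predictions of the hypothesis function and the true values. The smaller this error, the more accurate the predictions, and the job of the training algorithm is to keep making it smaller.
Once the model is defined, we need to give it an optimization objective so that the learned parameters bring the prediction $\hat{Y}$ as close as possible to the true value $Y$. Given the target value $Y_i$ of any sample and the model's prediction $\hat{Y_i}$, the loss function outputs a non-negative real number, which is usually taken as a measure of how large the model's error is.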
For linear models, the most commonly used loss function is the mean squared error (MSE):
$$MSE=\frac{1}{n}\sum_{i=1}^{n}(\hat{Y_i}-Y_i)^2$$
That is, for a dataset of $n$ samples, the MSE is the mean of the $n$ squared prediction errors.
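The hypothesis function and the MSE formula translate directly into a few lines of Python. This is a minimal sketch. The names predict and mse are illustrative and are not part of PaddlePaddle.

```python
def predict(x, a, b):
    """Hypothesis function: Y_hat = a * X + b."""
    return a * x + b


def mse(y_pred, y_true):
    """Mean squared error: (1/n) * sum((Y_hat_i - Y_i) ** 2)."""
    if not y_true or len(y_pred) != len(y_true):
        raise ValueError("inputs must be non-empty and of the same length")
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / float(len(y_true))


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 7.0]
preds = [predict(x, a=2.0, b=0.0) for x in xs]   # [2.0, 4.0, 6.0]
print(mse(preds, ys))  # errors are 0, 0 and -1, so MSE = 1/3
```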
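Optimization algorithm:
The optimization algorithm is also crucial in model training: it largely determines how accurate the model becomes and how fast training runs. The linear regression example in this chapter optimizes the parameters with (mini-batch stochastic) gradient descent, fluid.optimizer.SGD.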
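The sketch below makes gradient descent concrete. It runs mini-batch gradient descent on the MSE of $\hat{Y} = aX + b$. For a batch of $m$ samples it uses the gradients $\frac{\partial MSE}{\partial a} = \frac{2}{m}\sum_i(\hat{Y_i}-Y_i)X_i$ and $\frac{\partial MSE}{\partial b} = \frac{2}{m}\sum_i(\hat{Y_i}-Y_i)$. It is a plain-Python illustration of the technique, not PaddlePaddle's SGD implementation. The function name and the synthetic data are assumptions.

```python
from __future__ import division, print_function
import random


def sgd_linear_regression(xs, ys, learning_rate=0.01, epochs=300, batch_size=30, seed=0):
    """Mini-batch gradient descent on the MSE of y_hat = a * x + b."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0                      # start from zero parameters
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)             # new sample order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            m = len(batch)
            # residual r_i = y_hat_i - y_i for every sample in the batch
            residuals = [(a * xs[i] + b) - ys[i] for i in batch]
            # dMSE/da = (2/m) * sum(r_i * x_i),  dMSE/db = (2/m) * sum(r_i)
            grad_a = 2.0 / m * sum(r * xs[i] for r, i in zip(residuals, batch))
            grad_b = 2.0 / m * sum(residuals)
            # step against the gradient
            a -= learning_rate * grad_a
            b -= learning_rate * grad_b
    return a, b


# Noise-free points on y = 0.5 * x + 0.2, with x in a normalized-looking range
xs = [i / 10.0 - 0.5 for i in range(11)]
ys = [0.5 * x + 0.2 for x in xs]
a, b = sgd_linear_regression(xs, ys, learning_rate=0.1, epochs=500, batch_size=4)
print(round(a, 3), round(b, 3))
```

The defaults mirror the notebook's learning_rate=0.01, num_epochs=300 and BATCH_SIZE=30. The example uses a larger learning rate so that the tiny synthetic dataset converges quickly.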
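Now let's start the experiment!
1 - Import libraries
First, load the libraries we need:
- numpy: a fundamental Python library for scientific computing
- matplotlib.pyplot: for plotting; used to check the model's fit visually and to show how the cost changes
- paddle.fluid: the Fluid programming interface of the PaddlePaddle deep learning framework
- pandas: a NumPy-based library for efficient data handling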
First, run !ls /home/aistudio/data/ to see where your dataset is mounted. In my case it is data2054, which contains data_soil.txt.
In[2]
!ls /home/aistudio/data/
data2054
In[3]
!ls /home/aistudio/data/data2054
data_soil.txt
Install pandas (it is usually already installed)
In[4]
!pip install pandas
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/ Requirement already satisfied: pandas in /opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages (0.24.2) Requirement already satisfied: pytz>=2011k in /opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages (from pandas) (2018.9) Requirement already satisfied: numpy>=1.12.0 in /opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages (from pandas) (1.16.2) Requirement already satisfied: python-dateutil>=2.5.0 in /opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages (from pandas) (2.8.0) Requirement already satisfied: six>=1.5 in /opt/conda/envs/python27-paddle120-env/lib/python2.7/site-packages (from python-dateutil>=2.5.0->pandas) (1.12.0)
Import the required libraries
In[5]
from __future__ import print_function  # a __future__ import must be the first statement in the cell
import sys
import paddle
import paddle.fluid as fluid
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from paddle.utils.plot import Ploter
from paddle.fluid.contrib.trainer import *
from paddle.fluid.contrib.inferencer import *
Check the Python and Paddle versions in the current environment
In[6]
print(sys.version)
print(paddle.__version__)
2.7.15 | packaged by conda-forge | (default, Feb 28 2019, 04:00:11) [GCC 7.3.0] 1.5.0
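2 - Data preprocessing
This experiment uses data on soil organic matter content and nitrogen content from a certain region. Nitrogen content is approximately a linear function of organic matter content. The dataset has only two columns and is stored as a comma-separated TXT file. The goal of this prediction task is to find the linear relationship between the two. Real data usually cannot be used directly after collection and must be preprocessed first. Let's start by printing the first five rows as a table.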
In[7]
colnames = ['organic_matter', 'nitrogen']
print_data = pd.read_csv('/home/aistudio/data/data2054/data_soil.txt', names=colnames)
print_data.head()
|   | organic_matter | nitrogen |
|---|---|---|
| 0 | 3.75 | 0.40 |
| 1 | 7.95 | 0.41 |
| 2 | 5.44 | 0.42 |
| 3 | 7.57 | 0.43 |
| 4 | 9.83 | 0.44 |
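Tip
If you are not sure which methods or attributes an object has, type a . after it and press Tab. Tab completion can also show the parameters a method or function expects.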
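Normalization
Look at how the data is distributed. In general, when samples have several attributes, the value ranges of the attributes can differ greatly, which calls for a common operation: normalization. Its goal is to scale every attribute into a similar range, for example roughly [-0.5, 0.5]. Here we use a very common method: subtract the mean, then divide by the original value range (maximum minus minimum). Only the feature column (organic matter content) is normalized; the label (nitrogen content) keeps its original units.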
In[8]
# Load the raw data into data; it is split into train_data and test_data below
data = np.loadtxt('/home/aistudio/data/data2054/data_soil.txt', delimiter=',')
x_raw = data.T[0].copy()  # keep a copy of the raw organic matter values
maximums, minimums, avgs = data.max(axis=0), data.min(axis=0), data.sum(axis=0) / data.shape[0]
print("raw max of organic matter:", data[:, 0].max(axis=0))
# Normalize: (value - mean) / (max - min)
feature_num = 2
for i in range(feature_num - 1):  # only column 0, the feature; the label column is left as-is
    # Exercise: write the normalization code yourself
    data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])
print('max after normalization:', data[:, 0].max(axis=0))
raw max of organic matter: 85.67 max after normalization: 0.5386874418397725
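The normalization step and its inverse can be sketched in plain Python. The inverse is used later to turn normalized inputs back into raw organic matter contents. The helper names are hypothetical. The statistics here come only from the five rows shown in the table above, not from the full dataset, so they differ from the notebook's values.

```python
def fit_mean_range(values):
    """Return the mean and the range (max - min) used by mean normalization."""
    mean = sum(values) / float(len(values))
    value_range = max(values) - min(values)
    if value_range == 0:
        raise ValueError("constant column: the range is zero")
    return mean, value_range


def normalize(x, mean, value_range):
    # (x - mean) / (max - min): centres the column at 0, with max - min equal to 1
    return (x - mean) / value_range


def denormalize(z, mean, value_range):
    # Inverse mapping, the same formula the notebook later uses to compute raw_x
    return z * value_range + mean


raw = [3.75, 7.95, 5.44, 7.57, 9.83]   # organic matter content of the first five rows
mean, value_range = fit_mean_range(raw)
scaled = [normalize(v, mean, value_range) for v in raw]
restored = [denormalize(z, mean, value_range) for z in scaled]
print(max(scaled) - min(scaled))   # 1.0 up to floating-point rounding
```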
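Splitting the dataset
After the raw data has been processed into usable data, we split it into two parts so that the model can be evaluated: a training set and a test set. The training set is used to adjust the model parameters, i.e. to train the model; the model's error on it is called the training error. The test set is used for testing; the model's error on it is called the test error. We train the model to find patterns in the training data that let us predict new, unseen data, so the test error is the better indicator of how the model really performs. Choosing the split ratio involves two considerations: more training data lowers the variance of the parameter estimates and gives a more trustworthy model, while more test data lowers the variance of the test error and gives a more trustworthy test error. In this example the split ratio is 8:2; the code takes the first 80% of the rows in file order, without shuffling.
Defining the reader
We write a read_data() function that reads either the training set (train_data) or the test set (test_data). Inside read_data() it defines a reader() function and uses the yield keyword to make reader() a generator. yield is used much like return; the difference is that a function containing yield becomes a generator. We could simply build a list holding all of the data, but memory limits make an unbounded or huge list impossible, and often, after building a list with millions of items, we only need the first few or few dozen of them, which wastes a great deal of memory. A generator instead computes the next value on each iteration of the loop, producing the elements one at a time without ever building the full list, which saves memory.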
In[9]
ratio = 0.8
offset = int(data.shape[0] * ratio)
train_data = data[:offset].copy()
test_data = data[offset:].copy()
# Print the split sizes to check the result
print(len(data))
print(len(train_data))

def read_data(data_set):
    """Return a reader that yields (features, label) for each row of data_set."""
    def reader():
        for row in data_set:
            # All columns but the last are features; the last column (index -1) is the label
            yield row[:-1], row[-1:]
    return reader

def train():
    """Return a reader over the training set and its labels."""
    return read_data(train_data)

def test():
    """Return a reader over the test set and its labels."""
    return read_data(test_data)
524 419
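For readers less familiar with generators, the same reader pattern can be written with plain lists. Calling the reader returns a generator, and nothing is read until that generator is iterated. The split_rows helper and its shuffle option are hypothetical additions. The notebook itself splits in file order without shuffling.

```python
from __future__ import print_function
import random


def split_rows(rows, ratio=0.8, shuffle=False, seed=0):
    """Split rows into (train, test); with shuffle=False this matches data[:offset] / data[offset:]."""
    rows = list(rows)
    if shuffle:
        # Hypothetical variant: shuffle before splitting (the notebook does not do this)
        random.Random(seed).shuffle(rows)
    offset = int(len(rows) * ratio)
    return rows[:offset], rows[offset:]


def read_data(data_set):
    """Return a reader; calling it gives a generator of (features, label) pairs."""
    def reader():
        for row in data_set:
            yield row[:-1], row[-1:]   # every column but the last, then the last column
    return reader


rows = [[float(i), 0.01 * i] for i in range(524)]   # made-up rows, 524 like the real file
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))   # 419 105
gen = read_data(train_rows)()            # nothing has been read yet
print(next(gen))                          # ([0.0], [0.0])
```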
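Setting the training parameters
Try changing the parameter values and see what happens. The parameters mean the following:
- paddle.reader.shuffle(train(), buf_size=400): reads samples from the train() reader into a buffer of buf_size=400 samples and shuffles them before passing them on.
- paddle.batch(reader, batch_size=BATCH_SIZE): groups the shuffled samples into batches of BATCH_SIZE=30; each batch is used for one training iteration.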
In[10]
BATCH_SIZE = 30
train_reader = paddle.batch(
paddle.reader.shuffle(
train(),
buf_size=400),
batch_size=BATCH_SIZE)
test_reader = paddle.batch(
paddle.reader.shuffle(
test(),
buf_size=400),
batch_size=BATCH_SIZE)
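The effect of paddle.reader.shuffle and paddle.batch can be imitated with two small generator wrappers. This is a sketch of the behaviour described above, assuming the last partial batch is kept (no drop_last). It is not PaddlePaddle's source code, and the names buffered_shuffle and batched are hypothetical.

```python
from __future__ import print_function
import random


def buffered_shuffle(reader, buf_size, seed=0):
    """Imitates paddle.reader.shuffle: fill a buffer, shuffle it, emit it, repeat."""
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for s in buf:
                    yield s
                buf = []
        if buf:                      # shuffle and emit the leftover, partly filled buffer
            rng.shuffle(buf)
            for s in buf:
                yield s
    return shuffled_reader


def batched(reader, batch_size):
    """Imitates paddle.batch without drop_last: group samples into lists of batch_size."""
    def batch_reader():
        batch = []
        for sample in reader():
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:                    # keep the final, smaller batch
            yield batch
    return batch_reader


train_reader = batched(buffered_shuffle(lambda: iter(range(419)), buf_size=400), batch_size=30)
sizes = [len(batch) for batch in train_reader()]
print(len(sizes), sizes[-1])   # 14 29
```

With 419 training samples and BATCH_SIZE = 30 this gives 14 batches per epoch: 13 full batches plus one of 29. So 300 epochs correspond to 4200 steps, which agrees with the last logged step (4190) in the training output below.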
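Put the network configuration into a train_program() function so that the trainer can call it. The network is a single fully connected layer (fc) with size=1 and no activation, which computes y_predict = x·w + bias, a hypothesis of the form $\hat{Y} = aX + b$ applied to the normalized feature; square_error_cost followed by mean gives the MSE loss. For the definitions of these functions, see the PaddlePaddle documentation at http://www.paddlepaddle.org/documentation/api/zh/0.14.0/layers.html#permalink-47-fc (this page is for version 0.14.0, while the environment here runs 1.5.0).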
In[11]
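def train_program():
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    # Feature vector of length 1 (the normalized organic matter content)
    x = fluid.layers.data(name='x', shape=[1], dtype='float32')
    # One fully connected unit without activation: y_predict = x * w + bias
    y_predict = fluid.layers.fc(input=x, size=1, act=None)
    # Squared error per sample, averaged over the batch, i.e. the MSE
    loss = fluid.layers.square_error_cost(input=y_predict, label=y)
    avg_loss = fluid.layers.mean(loss)
    return avg_loss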
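With the loss function defined, we next define how the parameters are optimized. Here this is stochastic gradient descent (fluid.optimizer.SGD) with a learning rate of 0.01.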
In[12]
def optimizer_program():
    return fluid.optimizer.SGD(learning_rate=0.01)
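Defining the place of execution
First define the most basic thing: where the computation runs. In fluid this is initialized with place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace():
- place is the device on which the fluid program is executed; common choices are fluid.CUDAPlace(0) and fluid.CPUPlace().
- use_cuda = False means the GPU is not used to accelerate training.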
In[13]
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
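Creating the trainer
Creating the trainer requires three main pieces of information:
- a configured network topology (train_func)
- the hardware place where training runs (place)
- the specific optimization method (optimizer_func)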
In[14]
trainer = Trainer(
train_func=train_program,
place=place,
optimizer_func=optimizer_program)
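Other settings
- feed_order=['x', 'y'] maps each data layer name to the position of the corresponding element in every sample produced by the reader, which defines the order in which data is fed: x receives element 0 (the features) and y receives element 1 (the label).
- params_dirname defines the directory where the model parameters are saved.
- Finally, the event handler event_handler(event) prints the training progress: the training cost every 10 steps and the test cost every 100 steps. It stops training early if the test cost falls below 0.01, and it saves the parameters every 10 epochs.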
In[15]
feed_order = ['x', 'y']
# Specify the directory to save the parameters
import os
import shutil
params_dirname = "/home/aistudio/inference_model"
# Remove parameters left over from a previous run; trainer.save_params writes the directory again later
if os.path.exists(params_dirname):
    shutil.rmtree(params_dirname)  # delete the folder recursively

train_title = "Train cost"
test_title = "Test cost"
plot_cost = Ploter(train_title, test_title)  # cost plotter (not used by the handler below)
step = 0

# event_handler prints training and testing info
def event_handler(event):
    global step
    if isinstance(event, EndStepEvent):  # fired after every batch (step)
        if step % 10 == 0:  # print the training cost every 10 batches
            print("%s, Step %d, Cost %f" % (train_title, step, event.metrics[0]))
        if step % 100 == 0:  # evaluate on the test set every 100 batches
            test_metrics = trainer.test(
                reader=test_reader, feed_order=feed_order)
            print("%s, Step %d, Cost %f" % (test_title, step, test_metrics[0]))
            if test_metrics[0] < 0.01:
                # If the test loss is low enough, stop training early
                print('loss is less than 0.01, stop')
                trainer.stop()
        step += 1
    if isinstance(event, EndEpochEvent):  # fired after every epoch
        if event.epoch % 10 == 0:
            # Save the trained parameters every 10 epochs for later inference
            if params_dirname is not None:
                trainer.save_params(params_dirname)
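The handler above relies on the trainer firing an EndStepEvent after every batch and an EndEpochEvent after every epoch. The toy loop below replays that schedule for 300 epochs of 14 steps. It shows when the test set is evaluated and when the parameters are saved. The event classes are hypothetical stand-ins, and the loop is not Paddle's Trainer.

```python
from __future__ import print_function


class EndStepEvent(object):
    """Hypothetical stand-in: fired after every batch."""
    def __init__(self, epoch, step):
        self.epoch = epoch
        self.step = step


class EndEpochEvent(object):
    """Hypothetical stand-in: fired after every epoch."""
    def __init__(self, epoch):
        self.epoch = epoch


def simulate_training(num_epochs, steps_per_epoch, handler):
    """Toy event loop: no real training, only the order in which events are fired."""
    for epoch in range(num_epochs):
        for step in range(steps_per_epoch):
            handler(EndStepEvent(epoch, step))
        handler(EndEpochEvent(epoch))


test_steps, save_epochs = [], []
global_step = [0]   # plays the role of the notebook's global step counter


def handler(event):
    if isinstance(event, EndStepEvent):
        if global_step[0] % 100 == 0:
            test_steps.append(global_step[0])    # trainer.test(...) would run here
        global_step[0] += 1
    elif isinstance(event, EndEpochEvent) and event.epoch % 10 == 0:
        save_epochs.append(event.epoch)          # trainer.save_params(...) would run here


simulate_training(num_epochs=300, steps_per_epoch=14, handler=handler)
print(len(test_steps), save_epochs[-1])   # 42 290
```

Under this schedule the test set is evaluated 42 times (steps 0 to 4100) and the parameters are saved 30 times, the last time after epoch 290. Unless the trainer saves again on its own, the parameters loaded for prediction therefore come from epoch 290 rather than the final epoch. In the run below the test cost never drops below 0.01 (its lowest printed value is about 0.023), so the early-stop branch is never taken.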
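Starting training
We can now start training by calling trainer.train(). The parameters mean the following:
- reader=train_reader: the source of the training data.
- feed_order: uses the feed_order defined earlier to feed the data layers x and y into the trainer in that order.
- event_handler: the event-handling callback; you can write your own event_handler and act on the information carried by each event.
- num_epochs=300: training stops after 300 passes (epochs) over the training set.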
In[16]
%matplotlib inline
# The training could take up to a few minutes.
trainer.train(
reader=train_reader,
num_epochs=300,
event_handler=event_handler,
feed_order=feed_order)
Train cost, Step 0, Cost 4.768987 Test cost, Step 0, Cost 8.029277 Train cost, Step 10, Cost 3.512895 Train cost, Step 20, Cost 4.517720 Train cost, Step 30, Cost 2.638228 Train cost, Step 40, Cost 2.876105 Train cost, Step 50, Cost 2.152664 Train cost, Step 60, Cost 1.750877 Train cost, Step 70, Cost 2.495707 Train cost, Step 80, Cost 1.044865 Train cost, Step 90, Cost 1.266695 Train cost, Step 100, Cost 1.617073 Test cost, Step 100, Cost 1.710509 Train cost, Step 110, Cost 0.870742 Train cost, Step 120, Cost 1.326341 Train cost, Step 130, Cost 1.310452 Train cost, Step 140, Cost 1.335390 Train cost, Step 150, Cost 1.182201 Train cost, Step 160, Cost 0.992906 Train cost, Step 170, Cost 0.919412 Train cost, Step 180, Cost 1.154889 Train cost, Step 190, Cost 0.921385 Train cost, Step 200, Cost 1.428739 Test cost, Step 200, Cost 1.025548 Train cost, Step 210, Cost 0.892617 Train cost, Step 220, Cost 0.624673 Train cost, Step 230, Cost 1.201186 Train cost, Step 240, Cost 0.744943 Train cost, Step 250, Cost 0.881837 Train cost, Step 260, Cost 0.671617 Train cost, Step 270, Cost 0.846477 Train cost, Step 280, Cost 1.087123 Train cost, Step 290, Cost 0.979044 Train cost, Step 300, Cost 0.848427 Test cost, Step 300, Cost 0.790917 Train cost, Step 310, Cost 0.929944 Train cost, Step 320, Cost 0.647987 Train cost, Step 330, Cost 0.558589 Train cost, Step 340, Cost 0.673265 Train cost, Step 350, Cost 0.748646 Train cost, Step 360, Cost 0.664201 Train cost, Step 370, Cost 0.738837 Train cost, Step 380, Cost 0.680803 Train cost, Step 390, Cost 0.592835 Train cost, Step 400, Cost 0.461612 Test cost, Step 400, Cost 0.606999 Train cost, Step 410, Cost 0.588587 Train cost, Step 420, Cost 0.708379 Train cost, Step 430, Cost 0.652785 Train cost, Step 440, Cost 0.739320 Train cost, Step 450, Cost 0.703653 Train cost, Step 460, Cost 0.772891 Train cost, Step 470, Cost 0.569921 Train cost, Step 480, Cost 0.605819 Train cost, Step 490, Cost 0.629869 Train cost, Step 500, Cost 0.564279 
Test cost, Step 500, Cost 0.483584 Train cost, Step 510, Cost 0.529205 Train cost, Step 520, Cost 0.453852 Train cost, Step 530, Cost 0.500234 Train cost, Step 540, Cost 0.387460 Train cost, Step 550, Cost 0.469682 Train cost, Step 560, Cost 0.549738 Train cost, Step 570, Cost 0.449353 Train cost, Step 580, Cost 0.432828 Train cost, Step 590, Cost 0.504905 Train cost, Step 600, Cost 0.466302 Test cost, Step 600, Cost 0.378664 Train cost, Step 610, Cost 0.461219 Train cost, Step 620, Cost 0.319102 Train cost, Step 630, Cost 0.476844 Train cost, Step 640, Cost 0.337013 Train cost, Step 650, Cost 0.290477 Train cost, Step 660, Cost 0.396287 Train cost, Step 670, Cost 0.379541 Train cost, Step 680, Cost 0.396960 Train cost, Step 690, Cost 0.426767 Train cost, Step 700, Cost 0.321290 Test cost, Step 700, Cost 0.317931 Train cost, Step 710, Cost 0.467874 Train cost, Step 720, Cost 0.346082 Train cost, Step 730, Cost 0.307868 Train cost, Step 740, Cost 0.394967 Train cost, Step 750, Cost 0.305963 Train cost, Step 760, Cost 0.292577 Train cost, Step 770, Cost 0.321144 Train cost, Step 780, Cost 0.328115 Train cost, Step 790, Cost 0.336515 Train cost, Step 800, Cost 0.263018 Test cost, Step 800, Cost 0.242078 Train cost, Step 810, Cost 0.240048 Train cost, Step 820, Cost 0.246798 Train cost, Step 830, Cost 0.286270 Train cost, Step 840, Cost 0.339424 Train cost, Step 850, Cost 0.252880 Train cost, Step 860, Cost 0.295079 Train cost, Step 870, Cost 0.241332 Train cost, Step 880, Cost 0.212187 Train cost, Step 890, Cost 0.218729 Train cost, Step 900, Cost 0.242188 Test cost, Step 900, Cost 0.198788 Train cost, Step 910, Cost 0.241578 Train cost, Step 920, Cost 0.222606 Train cost, Step 930, Cost 0.189436 Train cost, Step 940, Cost 0.191209 Train cost, Step 950, Cost 0.224156 Train cost, Step 960, Cost 0.209254 Train cost, Step 970, Cost 0.197532 Train cost, Step 980, Cost 0.198264 Train cost, Step 990, Cost 0.206929 Train cost, Step 1000, Cost 0.198488 Test cost, Step 1000, 
Cost 0.153360 Train cost, Step 1010, Cost 0.135618 Train cost, Step 1020, Cost 0.137078 Train cost, Step 1030, Cost 0.159243 Train cost, Step 1040, Cost 0.175211 Train cost, Step 1050, Cost 0.144821 Train cost, Step 1060, Cost 0.162203 Train cost, Step 1070, Cost 0.197415 Train cost, Step 1080, Cost 0.143427 Train cost, Step 1090, Cost 0.147314 Train cost, Step 1100, Cost 0.142196 Test cost, Step 1100, Cost 0.123154 Train cost, Step 1110, Cost 0.174198 Train cost, Step 1120, Cost 0.201148 Train cost, Step 1130, Cost 0.100205 Train cost, Step 1140, Cost 0.111531 Train cost, Step 1150, Cost 0.096112 Train cost, Step 1160, Cost 0.121104 Train cost, Step 1170, Cost 0.111347 Train cost, Step 1180, Cost 0.148391 Train cost, Step 1190, Cost 0.131636 Train cost, Step 1200, Cost 0.152686 Test cost, Step 1200, Cost 0.098063 Train cost, Step 1210, Cost 0.097262 Train cost, Step 1220, Cost 0.111449 Train cost, Step 1230, Cost 0.154582 Train cost, Step 1240, Cost 0.111647 Train cost, Step 1250, Cost 0.122552 Train cost, Step 1260, Cost 0.113828 Train cost, Step 1270, Cost 0.094590 Train cost, Step 1280, Cost 0.085252 Train cost, Step 1290, Cost 0.095512 Train cost, Step 1300, Cost 0.100617 Test cost, Step 1300, Cost 0.072462 Train cost, Step 1310, Cost 0.106737 Train cost, Step 1320, Cost 0.102805 Train cost, Step 1330, Cost 0.098214 Train cost, Step 1340, Cost 0.078186 Train cost, Step 1350, Cost 0.096731 Train cost, Step 1360, Cost 0.092324 Train cost, Step 1370, Cost 0.096081 Train cost, Step 1380, Cost 0.080706 Train cost, Step 1390, Cost 0.085458 Train cost, Step 1400, Cost 0.066053 Test cost, Step 1400, Cost 0.062014 Train cost, Step 1410, Cost 0.158391 Train cost, Step 1420, Cost 0.096896 Train cost, Step 1430, Cost 0.076681 Train cost, Step 1440, Cost 0.095945 Train cost, Step 1450, Cost 0.061191 Train cost, Step 1460, Cost 0.089816 Train cost, Step 1470, Cost 0.060841 Train cost, Step 1480, Cost 0.075491 Train cost, Step 1490, Cost 0.093773 Train cost, Step 1500, Cost 
0.082257 Test cost, Step 1500, Cost 0.054648 Train cost, Step 1510, Cost 0.062478 Train cost, Step 1520, Cost 0.088672 Train cost, Step 1530, Cost 0.098577 Train cost, Step 1540, Cost 0.068808 Train cost, Step 1550, Cost 0.063292 Train cost, Step 1560, Cost 0.046482 Train cost, Step 1570, Cost 0.072079 Train cost, Step 1580, Cost 0.066417 Train cost, Step 1590, Cost 0.056396 Train cost, Step 1600, Cost 0.066187 Test cost, Step 1600, Cost 0.043654 Train cost, Step 1610, Cost 0.064506 Train cost, Step 1620, Cost 0.057062 Train cost, Step 1630, Cost 0.058497 Train cost, Step 1640, Cost 0.049261 Train cost, Step 1650, Cost 0.045824 Train cost, Step 1660, Cost 0.070338 Train cost, Step 1670, Cost 0.068806 Train cost, Step 1680, Cost 0.053915 Train cost, Step 1690, Cost 0.051097 Train cost, Step 1700, Cost 0.068100 Test cost, Step 1700, Cost 0.039543 Train cost, Step 1710, Cost 0.049441 Train cost, Step 1720, Cost 0.051990 Train cost, Step 1730, Cost 0.050742 Train cost, Step 1740, Cost 0.069195 Train cost, Step 1750, Cost 0.058446 Train cost, Step 1760, Cost 0.057564 Train cost, Step 1770, Cost 0.071641 Train cost, Step 1780, Cost 0.052876 Train cost, Step 1790, Cost 0.052155 Train cost, Step 1800, Cost 0.082488 Test cost, Step 1800, Cost 0.034690 Train cost, Step 1810, Cost 0.043171 Train cost, Step 1820, Cost 0.051249 Train cost, Step 1830, Cost 0.042519 Train cost, Step 1840, Cost 0.054430 Train cost, Step 1850, Cost 0.060285 Train cost, Step 1860, Cost 0.054826 Train cost, Step 1870, Cost 0.057194 Train cost, Step 1880, Cost 0.055687 Train cost, Step 1890, Cost 0.071762 Train cost, Step 1900, Cost 0.058110 Test cost, Step 1900, Cost 0.031468 Train cost, Step 1910, Cost 0.044074 Train cost, Step 1920, Cost 0.040255 Train cost, Step 1930, Cost 0.046137 Train cost, Step 1940, Cost 0.050012 Train cost, Step 1950, Cost 0.033987 Train cost, Step 1960, Cost 0.045765 Train cost, Step 1970, Cost 0.048595 Train cost, Step 1980, Cost 0.038905 Train cost, Step 1990, Cost 
0.047263 Train cost, Step 2000, Cost 0.033901 Test cost, Step 2000, Cost 0.029656 Train cost, Step 2010, Cost 0.030197 Train cost, Step 2020, Cost 0.036784 Train cost, Step 2030, Cost 0.041115 Train cost, Step 2040, Cost 0.053284 Train cost, Step 2050, Cost 0.040142 Train cost, Step 2060, Cost 0.037007 Train cost, Step 2070, Cost 0.045154 Train cost, Step 2080, Cost 0.027582 Train cost, Step 2090, Cost 0.035088 Train cost, Step 2100, Cost 0.034761 Test cost, Step 2100, Cost 0.027043 Train cost, Step 2110, Cost 0.030808 Train cost, Step 2120, Cost 0.027776 Train cost, Step 2130, Cost 0.037169 Train cost, Step 2140, Cost 0.066838 Train cost, Step 2150, Cost 0.032508 Train cost, Step 2160, Cost 0.033583 Train cost, Step 2170, Cost 0.042142 Train cost, Step 2180, Cost 0.038644 Train cost, Step 2190, Cost 0.037633 Train cost, Step 2200, Cost 0.033084 Test cost, Step 2200, Cost 0.025096 Train cost, Step 2210, Cost 0.039980 Train cost, Step 2220, Cost 0.029732 Train cost, Step 2230, Cost 0.050501 Train cost, Step 2240, Cost 0.041677 Train cost, Step 2250, Cost 0.023526 Train cost, Step 2260, Cost 0.024553 Train cost, Step 2270, Cost 0.036005 Train cost, Step 2280, Cost 0.032678 Train cost, Step 2290, Cost 0.032366 Train cost, Step 2300, Cost 0.023447 Test cost, Step 2300, Cost 0.023911 Train cost, Step 2310, Cost 0.030922 Train cost, Step 2320, Cost 0.028721 Train cost, Step 2330, Cost 0.034513 Train cost, Step 2340, Cost 0.026559 Train cost, Step 2350, Cost 0.043622 Train cost, Step 2360, Cost 0.040104 Train cost, Step 2370, Cost 0.050138 Train cost, Step 2380, Cost 0.029433 Train cost, Step 2390, Cost 0.023240 Train cost, Step 2400, Cost 0.038487 Test cost, Step 2400, Cost 0.025467 Train cost, Step 2410, Cost 0.015854 Train cost, Step 2420, Cost 0.039864 Train cost, Step 2430, Cost 0.033333 Train cost, Step 2440, Cost 0.039830 Train cost, Step 2450, Cost 0.025698 Train cost, Step 2460, Cost 0.023239 Train cost, Step 2470, Cost 0.017733 Train cost, Step 2480, Cost 
0.029231 Train cost, Step 2490, Cost 0.036119 Train cost, Step 2500, Cost 0.024184 Test cost, Step 2500, Cost 0.023387 Train cost, Step 2510, Cost 0.033727 Train cost, Step 2520, Cost 0.036388 Train cost, Step 2530, Cost 0.025946 Train cost, Step 2540, Cost 0.039988 Train cost, Step 2550, Cost 0.039029 Train cost, Step 2560, Cost 0.034118 Train cost, Step 2570, Cost 0.037878 Train cost, Step 2580, Cost 0.019043 Train cost, Step 2590, Cost 0.014606 Train cost, Step 2600, Cost 0.017237 Test cost, Step 2600, Cost 0.024894 Train cost, Step 2610, Cost 0.032909 Train cost, Step 2620, Cost 0.046552 Train cost, Step 2630, Cost 0.019126 Train cost, Step 2640, Cost 0.027269 Train cost, Step 2650, Cost 0.023830 Train cost, Step 2660, Cost 0.038029 Train cost, Step 2670, Cost 0.029304 Train cost, Step 2680, Cost 0.023559 Train cost, Step 2690, Cost 0.037625 Train cost, Step 2700, Cost 0.028565 Test cost, Step 2700, Cost 0.024941 Train cost, Step 2710, Cost 0.031061 Train cost, Step 2720, Cost 0.025516 Train cost, Step 2730, Cost 0.034017 Train cost, Step 2740, Cost 0.045984 Train cost, Step 2750, Cost 0.034554 Train cost, Step 2760, Cost 0.023234 Train cost, Step 2770, Cost 0.021406 Train cost, Step 2780, Cost 0.064184 Train cost, Step 2790, Cost 0.043807 Train cost, Step 2800, Cost 0.037964 Test cost, Step 2800, Cost 0.024270 Train cost, Step 2810, Cost 0.025262 Train cost, Step 2820, Cost 0.020918 Train cost, Step 2830, Cost 0.025071 Train cost, Step 2840, Cost 0.032951 Train cost, Step 2850, Cost 0.034100 Train cost, Step 2860, Cost 0.035140 Train cost, Step 2870, Cost 0.026192 Train cost, Step 2880, Cost 0.014990 Train cost, Step 2890, Cost 0.049053 Train cost, Step 2900, Cost 0.026420 Test cost, Step 2900, Cost 0.026627 Train cost, Step 2910, Cost 0.034137 Train cost, Step 2920, Cost 0.032673 Train cost, Step 2930, Cost 0.024193 Train cost, Step 2940, Cost 0.031714 Train cost, Step 2950, Cost 0.024828 Train cost, Step 2960, Cost 0.025747 Train cost, Step 2970, Cost 
0.021495 Train cost, Step 2980, Cost 0.017658 Train cost, Step 2990, Cost 0.025132 Train cost, Step 3000, Cost 0.017609 Test cost, Step 3000, Cost 0.024128 Train cost, Step 3010, Cost 0.028900 Train cost, Step 3020, Cost 0.016700 Train cost, Step 3030, Cost 0.026613 Train cost, Step 3040, Cost 0.019744 Train cost, Step 3050, Cost 0.045507 Train cost, Step 3060, Cost 0.026242 Train cost, Step 3070, Cost 0.022545 Train cost, Step 3080, Cost 0.035223 Train cost, Step 3090, Cost 0.022581 Train cost, Step 3100, Cost 0.024111 Test cost, Step 3100, Cost 0.025594 Train cost, Step 3110, Cost 0.034491 Train cost, Step 3120, Cost 0.014520 Train cost, Step 3130, Cost 0.015593 Train cost, Step 3140, Cost 0.023285 Train cost, Step 3150, Cost 0.012638 Train cost, Step 3160, Cost 0.022198 Train cost, Step 3170, Cost 0.019563 Train cost, Step 3180, Cost 0.041903 Train cost, Step 3190, Cost 0.023916 Train cost, Step 3200, Cost 0.021095 Test cost, Step 3200, Cost 0.028093 Train cost, Step 3210, Cost 0.040792 Train cost, Step 3220, Cost 0.029057 Train cost, Step 3230, Cost 0.020520 Train cost, Step 3240, Cost 0.022582 Train cost, Step 3250, Cost 0.017649 Train cost, Step 3260, Cost 0.061936 Train cost, Step 3270, Cost 0.020003 Train cost, Step 3280, Cost 0.020403 Train cost, Step 3290, Cost 0.024036 Train cost, Step 3300, Cost 0.037860 Test cost, Step 3300, Cost 0.026783 Train cost, Step 3310, Cost 0.029722 Train cost, Step 3320, Cost 0.022559 Train cost, Step 3330, Cost 0.032497 Train cost, Step 3340, Cost 0.027590 Train cost, Step 3350, Cost 0.030427 Train cost, Step 3360, Cost 0.037586 Train cost, Step 3370, Cost 0.027731 Train cost, Step 3380, Cost 0.018294 Train cost, Step 3390, Cost 0.032997 Train cost, Step 3400, Cost 0.025673 Test cost, Step 3400, Cost 0.026371 Train cost, Step 3410, Cost 0.020853 Train cost, Step 3420, Cost 0.015484 Train cost, Step 3430, Cost 0.025746 Train cost, Step 3440, Cost 0.030819 Train cost, Step 3450, Cost 0.027891 Train cost, Step 3460, Cost 
0.031681 Train cost, Step 3470, Cost 0.023304 Train cost, Step 3480, Cost 0.021340 Train cost, Step 3490, Cost 0.015192 Train cost, Step 3500, Cost 0.030663 Test cost, Step 3500, Cost 0.025029 Train cost, Step 3510, Cost 0.032962 Train cost, Step 3520, Cost 0.020400 Train cost, Step 3530, Cost 0.029337 Train cost, Step 3540, Cost 0.014134 Train cost, Step 3550, Cost 0.040396 Train cost, Step 3560, Cost 0.021215 Train cost, Step 3570, Cost 0.017495 Train cost, Step 3580, Cost 0.024108 Train cost, Step 3590, Cost 0.010949 Train cost, Step 3600, Cost 0.012924 Test cost, Step 3600, Cost 0.028458 Train cost, Step 3610, Cost 0.026737 Train cost, Step 3620, Cost 0.019719 Train cost, Step 3630, Cost 0.015369 Train cost, Step 3640, Cost 0.028902 Train cost, Step 3650, Cost 0.039268 Train cost, Step 3660, Cost 0.017939 Train cost, Step 3670, Cost 0.036227 Train cost, Step 3680, Cost 0.012675 Train cost, Step 3690, Cost 0.018152 Train cost, Step 3700, Cost 0.024145 Test cost, Step 3700, Cost 0.026889 Train cost, Step 3710, Cost 0.025453 Train cost, Step 3720, Cost 0.032178 Train cost, Step 3730, Cost 0.023277 Train cost, Step 3740, Cost 0.038356 Train cost, Step 3750, Cost 0.018018 Train cost, Step 3760, Cost 0.011010 Train cost, Step 3770, Cost 0.023033 Train cost, Step 3780, Cost 0.045767 Train cost, Step 3790, Cost 0.012840 Train cost, Step 3800, Cost 0.019810 Test cost, Step 3800, Cost 0.025651 Train cost, Step 3810, Cost 0.034081 Train cost, Step 3820, Cost 0.041910 Train cost, Step 3830, Cost 0.037905 Train cost, Step 3840, Cost 0.025698 Train cost, Step 3850, Cost 0.039071 Train cost, Step 3860, Cost 0.019030 Train cost, Step 3870, Cost 0.034477 Train cost, Step 3880, Cost 0.024240 Train cost, Step 3890, Cost 0.015157 Train cost, Step 3900, Cost 0.040621 Test cost, Step 3900, Cost 0.025927 Train cost, Step 3910, Cost 0.021204 Train cost, Step 3920, Cost 0.015538 Train cost, Step 3930, Cost 0.023660 Train cost, Step 3940, Cost 0.014193 Train cost, Step 3950, Cost 
0.013150 Train cost, Step 3960, Cost 0.017309 Train cost, Step 3970, Cost 0.016231 Train cost, Step 3980, Cost 0.037976 Train cost, Step 3990, Cost 0.035511 Train cost, Step 4000, Cost 0.043626 Test cost, Step 4000, Cost 0.027540 Train cost, Step 4010, Cost 0.028400 Train cost, Step 4020, Cost 0.031141 Train cost, Step 4030, Cost 0.044042 Train cost, Step 4040, Cost 0.045470 Train cost, Step 4050, Cost 0.035228 Train cost, Step 4060, Cost 0.015033 Train cost, Step 4070, Cost 0.014766 Train cost, Step 4080, Cost 0.042850 Train cost, Step 4090, Cost 0.034066 Train cost, Step 4100, Cost 0.016928 Test cost, Step 4100, Cost 0.026744 Train cost, Step 4110, Cost 0.023494 Train cost, Step 4120, Cost 0.045882 Train cost, Step 4130, Cost 0.026959 Train cost, Step 4140, Cost 0.015221 Train cost, Step 4150, Cost 0.030598 Train cost, Step 4160, Cost 0.028655 Train cost, Step 4170, Cost 0.016936 Train cost, Step 4180, Cost 0.026485 Train cost, Step 4190, Cost 0.017895
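List the files saved in inference_model: the weight (fc_0.w_0) and bias (fc_0.b_0) of the fc layer, and the learning rate (learning_rate_0)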
In[17]
!ls /home/aistudio/inference_model
fc_0.b_0 fc_0.w_0 learning_rate_0
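Setting up the inference program
Like trainer.train, the inferencer needs a program to make predictions. We slightly modify the training program so that it returns the prediction instead of the loss. It must build the same fc layer so that the layer's parameter names (fc_0.w_0, fc_0.b_0) match the saved files.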
In[18]
def inference_program():
    x = fluid.layers.data(name='x', shape=[1], dtype='float32')
    # Same single fc layer as in train_program, so the saved parameters can be loaded
    y_predict = fluid.layers.fc(input=x, size=1, act=None)
    return y_predict
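Prediction
The inferencer reads the trained model from params_dirname and makes predictions on data it has never seen.
- tensor_x: batch_size random numbers drawn from [0, 1), stored as a float32 array. They live in the normalized feature space. The largest normalized value in the data is about 0.54, so draws above that correspond to organic matter contents beyond the observed maximum of 85.67.
- results: the predicted nitrogen content for the organic matter contents in tensor_x.
- raw_x: because the feature was normalized during preprocessing, the random numbers are mapped back to the original scale (de-normalized) so that the predictions are easier to judge.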
In[19]
inferencer = Inferencer(
    infer_func=inference_program, param_path=params_dirname, place=place)
batch_size = 2
tensor_x = np.random.uniform(0, 1, [batch_size, 1]).astype("float32")
results = inferencer.infer({'x': tensor_x})
# De-normalize with the statistics of column 0 (the organic matter feature)
raw_x = tensor_x * (maximums[0] - minimums[0]) + avgs[0]
print("Organic matter content:", raw_x)
print("Nitrogen content: ", results[0])
Organic matter content: [[79.4205 ] [47.303215]] Nitrogen content: [[3.8053818] [2.2940538]]
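Compute a and b from the linear model
- Math: the equation of the line through two known points. With predicted points $(x_1, y_1)$ and $(x_2, y_2)$, $a = (y_1 - y_2)/(x_1 - x_2)$ and $b = y_1 - a x_1$.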
In[20]
a = (results[0][0][0] - results[0][1][0]) / (raw_x[0][0]-raw_x[1][0])
b = (results[0][0][0] - a * raw_x[0][0])
print(a,b)
0.047056526 0.068128824
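Instead of evaluating two predictions, a and b can also be derived from the trained parameters directly. The network learned $\hat{Y} = wZ + c$ on the normalized feature $Z = (X - \mu)/R$, where $\mu$ is the mean (avgs[0]) and $R$ is max minus min (maximums[0] - minimums[0]). Substituting gives $\hat{Y} = \frac{w}{R}X + \left(c - \frac{w\mu}{R}\right)$, so $a = w/R$ and $b = c - w\mu/R$. The sketch below encodes that algebra. Reading w and c from fc_0.w_0 and fc_0.b_0 is not shown, and the numbers in the example are made up purely to check the algebra.

```python
from __future__ import print_function


def to_raw_space(w, c, mean, value_range):
    """Turn y = w * z + c with z = (x - mean) / value_range into y = a * x + b."""
    a = w / float(value_range)
    b = c - w * mean / float(value_range)
    return a, b


w, c, mean, value_range = 2.0, 1.0, 10.0, 4.0   # made-up values, only to check the algebra
a, b = to_raw_space(w, c, mean, value_range)
x = 14.0
z = (x - mean) / value_range
print(a, b, w * z + c, a * x + b)   # 0.5 -4.0 3.0 3.0
```

Up to float32 rounding, the two-point method above should give the same a and b, because the model is exactly linear in the raw input.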
Plotting the fit
Training has produced a fitted straight line. To judge the model's quality visually, plot the fitted line together with the data.
In[21]
import numpy as np
import matplotlib.pyplot as plt
import sys  # Python 2 only: comment out these three lines to run under Python 3
reload(sys)  # Python 2
sys.setdefaultencoding('utf-8')  # Python 2
from matplotlib import rc
rc('font', **{'family': 'sans-serif', 'sans-serif': ['AR PL KaitiM GB']})  # a font that can render Chinese
# Reload the raw (un-normalized) data, because a and b are in raw units
data = np.loadtxt('/home/aistudio/data/data2054/data_soil.txt', delimiter=',')

def plot_data(data):
    x = data[:, 0]
    y = data[:, 1]
    y_predict = x * a + b  # fitted line in raw units
    plt.scatter(x, y, marker='.', c='r', label='True')
    plt.title('Organic matter nitrogen content')
    plt.xlabel('Organic matter content')
    plt.ylabel('Nitrogen content')
    plt.xlim(0, 90)
    plt.ylim(0, 4)
    plt.plot(x, y_predict, label='Predict')
    plt.legend(loc='upper left')
    plt.savefig('result1.png')
    plt.show()

plot_data(data)
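Summary
From this exercise you should remember:
- The typical machine learning workflow: obtain the data, preprocess the data, train the model, apply the model.
- The basic steps of training a model with fluid: configure the network structure and define the cost function (avg_loss here); define the optimizer; obtain the training data; define the place of execution and an executor; feed the data (feeder); run training; then predict with infer() and plot the fitted line. In this notebook the executor, data feeding and training loop are wrapped by the contrib Trainer (trainer.train), and prediction by Inferencer.
- Many parameters in the exercise can be tuned; for example, changing the learning rate can have a large effect on the result. Try this in this exercise or in later ones.
This completes the training of the linear regression model. After this lesson you should be able to use the provided code to build a simple model that predicts soil nitrogen content from organic matter content, and through this process get a first look at PaddlePaddle, an easy-to-use distributed deep learning platform. As the quick-start chapter for PaddlePaddle, this lesson is meant to open the door to your next steps in deep learning.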