Getting Started with Jupyter Notebook, Part 2: Creating an .ipynb File for Linear Prediction with scikit-learn

Author: 菜鸟知识搬运工

Original post: https://blog.csdn.net/qq_30815237/article/details/87893278

For how to install Jupyter Notebook, see this post: https://blog.csdn.net/qq_30815237/article/details/86577559

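1. Next, create a new .ipynb file: click "New" in the top-right corner and choose "Python [conda env:anaconda3]" (the exact kernel names listed depend on your installation).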

You should see a view like the one below:

[Figure: the newly created notebook]

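The toolbar contains a drop-down that currently reads "Code". Click the small triangle to its right to see the available cell types, choose "Markdown", and type the following into the In [ ] cell:

```markdown
**Chapter 1 – The Machine Learning landscape**

_This is the code used to generate some of the figures in chapter 1._

# setup
```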

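Press Shift+Enter to run the current cell. Jupyter renders the Markdown and moves to a new input cell below it, as shown in the figure:

[Figure: the rendered Markdown cell with a new input cell below]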

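Explanation: this example illustrates three pieces of Markdown syntax:

1. Wrapping text in two asterisks on each side, `**text**`, makes it bold (put no spaces between the asterisks and the text);

2. Wrapping text in a single underscore on each side, `_text_`, makes it italic (two underscores on each side, `__text__`, give bold instead);

3. Starting a line with `#` followed by a space, `# text`, turns it into a heading: a single `#` gives the largest heading, and each additional `#` (up to six) gives a smaller heading level.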
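To make the three rules concrete, here is a minimal sketch that builds the same Markdown cell text with small helper functions and wraps it in a notebook structure. The helper names (`bold`, `italic`, `heading`, `markdown_cell`, `code_cell`, `notebook`) are hypothetical and only for illustration; the dictionary layout assumes the nbformat 4 JSON format that .ipynb files use, reduced to its required fields.

```python
import json


def bold(text: str) -> str:
    """Two asterisks on each side -> bold."""
    return f"**{text}**"


def italic(text: str) -> str:
    """One underscore on each side -> italic."""
    return f"_{text}_"


def heading(text: str, level: int = 1) -> str:
    """'#' repeated `level` times plus a space; level 1 is the largest."""
    if not 1 <= level <= 6:
        raise ValueError("Markdown headings range from level 1 to 6")
    return "#" * level + " " + text


def markdown_cell(source: str) -> dict:
    # Minimal nbformat-4 Markdown cell
    return {"cell_type": "markdown", "metadata": {}, "source": source}


def code_cell(source: str) -> dict:
    # Minimal nbformat-4 code cell that has not been run yet
    return {"cell_type": "code", "execution_count": None,
            "metadata": {}, "outputs": [], "source": source}


def notebook(cells: list) -> dict:
    # nbformat_minor 4 does not require per-cell "id" fields
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


intro = "\n\n".join([
    bold("Chapter 1 – The Machine Learning landscape"),
    italic("This is the code used to generate some of the figures in chapter 1."),
    heading("setup", 1),
])
nb = notebook([markdown_cell(intro), code_cell("import pandas as pd")])
print(json.dumps(nb, indent=1, ensure_ascii=False))
```

Writing `json.dumps(nb)` to a file with the .ipynb extension should give a notebook Jupyter can open, though in practice you would normally let Jupyter create the file as in step 1.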

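2. Add Python code: switch the cell type from "Markdown" back to "Code", then enter code that loads the data file oecd_bli_2015.csv:

Make sure the file path is correct! A relative path such as datasets/lifesat/ is resolved against the notebook's working directory, which is normally the folder containing the notebook.

A note on `oecd_bli.head(2)`: the argument 2 means only the first two rows are displayed. The default is 5, so `oecd_bli.head()` displays five rows.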
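The sketch below uses a made-up three-country table shaped like oecd_bli_2015.csv (columns Country, Indicator, INEQUALITY, Value; every value is hypothetical) to show what the filter and `pivot` in the full code do and how `head(n)` limits the displayed rows. It also suggests why the `INEQUALITY == "TOT"` filter is needed: without it, a country can have several rows for the same indicator, and `pivot` refuses to reshape.

```python
import pandas as pd

# Hypothetical long-format data in the layout of oecd_bli_2015.csv (values made up)
long_df = pd.DataFrame({
    "Country": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma", "Gamma"],
    "Indicator": ["Life satisfaction", "Life satisfaction", "Air pollution",
                  "Life satisfaction", "Air pollution",
                  "Life satisfaction", "Air pollution"],
    "INEQUALITY": ["TOT", "MN", "TOT", "TOT", "TOT", "TOT", "TOT"],
    "Value": [7.3, 7.1, 10.0, 6.5, 14.0, 5.9, 20.0],
})

# Without the filter, Alpha has two "Life satisfaction" rows, so pivot cannot reshape
try:
    long_df.pivot(index="Country", columns="Indicator", values="Value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Keep one row per (Country, Indicator), then go from long to wide format
tot = long_df[long_df["INEQUALITY"] == "TOT"]
wide = tot.pivot(index="Country", columns="Indicator", values="Value")

print(wide.head(2))       # only the first 2 rows: Alpha and Beta
print(len(wide.head()))   # default n=5, but only 3 rows exist -> 3
```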

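3. Load gdp_per_capita.csv and merge the two datasets:

Note: `left_index=True` and `right_index=True` tell `pd.merge` to join on the two DataFrames' indexes (the country names) rather than on columns. With the default `how="inner"`, only countries present in both tables are kept.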
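Here is a small sketch of an index-on-index merge, using two hypothetical country-indexed tables whose country lists only partly overlap (all names and numbers are made up). It shows that the default inner join drops unmatched countries, while `how="outer"` keeps them and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical frames indexed by country name (values made up)
bli = pd.DataFrame({"Life satisfaction": [7.3, 6.5, 5.9]},
                   index=pd.Index(["Alpha", "Beta", "Gamma"], name="Country"))
gdp = pd.DataFrame({"GDP per capita": [50000.0, 30000.0, 12000.0]},
                   index=pd.Index(["Alpha", "Beta", "Delta"], name="Country"))

# Join on the indexes; the default how="inner" keeps only countries present in both
merged = pd.merge(left=bli, right=gdp, left_index=True, right_index=True)
merged.sort_values(by="GDP per capita", inplace=True)
print(merged)   # Beta, then Alpha; Gamma and Delta are dropped

# how="outer" keeps every country and fills missing values with NaN
outer = pd.merge(left=bli, right=gdp, left_index=True, right_index=True, how="outer")
print(outer)
```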

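4. Visualize the data

In `plot(kind='scatter', x="GDP per capita", y='Life satisfaction', figsize=(10,5))`, the arguments mean, in order: draw a scatter plot; use the "GDP per capita" column for the x axis; use the "Life satisfaction" column for the y axis; and make the figure 10 × 5 inches (width × height).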
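A self-contained version of this call on three made-up points (hypothetical values) is sketched below. It assumes matplotlib's non-interactive "Agg" backend so it also runs outside Jupyter, and it checks that pandas used the column names as axis labels and that `figsize` is (width, height) in inches.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data points (not the OECD data)
df = pd.DataFrame({"GDP per capita": [12000.0, 30000.0, 50000.0],
                   "Life satisfaction": [5.9, 6.5, 7.3]})

# Same call as in the tutorial; pandas returns the matplotlib Axes it drew on
ax = df.plot(kind="scatter", x="GDP per capita", y="Life satisfaction", figsize=(10, 5))
fig = ax.get_figure()

print(ax.get_xlabel(), "|", ax.get_ylabel())
print(fig.get_size_inches())  # width and height in inches
plt.close(fig)  # free the figure when done
```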

5. Fit a linear model and make a prediction
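scikit-learn's `LinearRegression` fits an ordinary least-squares line, life satisfaction ≈ θ0 + θ1 × GDP per capita. As a rough sketch of what `lin.fit` and `lin.predict` compute, the same least-squares problem can be solved with NumPy's `np.linalg.lstsq`. The points below are hypothetical and lie exactly on y = 2 + 0.0001·x, so the coefficients the fit should recover are known in advance. This swaps in NumPy for illustration and is not scikit-learn's own code.

```python
import numpy as np

# Hypothetical data lying exactly on y = 2 + 0.0001 * x (not the OECD data)
x = np.array([10000.0, 20000.0, 40000.0])
y = 2.0 + 0.0001 * x

# Design matrix: a column of ones (intercept θ0) next to the feature column (slope θ1)
X_b = np.c_[np.ones_like(x), x]
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
theta0, theta1 = theta

# Predict for one new input, analogous to lin.predict([[22587]])
y_new = theta0 + theta1 * 22587
print(theta0, theta1, y_new)
```

On the real data, `lin.intercept_` and `lin.coef_` hold the corresponding θ0 and θ1 after `lin.fit(x, y)`.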

Full code:

```python
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import linear_model

# Load the data
# Download CSV from http://stats.oecd.org/index.aspx?DataSetCode=BLI
datapath = "datasets/lifesat/"
oecd_bli = pd.read_csv(datapath + "oecd_bli_2015.csv", thousands=',')
# Keep only the "TOT" (total) rows so each (Country, Indicator) pair is unique
oecd_bli = oecd_bli[oecd_bli["INEQUALITY"] == "TOT"]
# Reshape from long to wide: one row per country, one column per indicator
oecd_bli = oecd_bli.pivot(index="Country", columns="Indicator", values="Value")
oecd_bli.head(2)  # Jupyter only displays the last expression of a cell

# Load the GDP data (tab-separated, Latin-1 encoded, "n/a" marks missing values)
gdp_per_capita = pd.read_csv(datapath + "gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")
gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True)
gdp_per_capita.set_index("Country", inplace=True)
gdp_per_capita.head(2)

# left_index / right_index: join the two DataFrames on their indexes (the country names)
full_country_stats = pd.merge(left=oecd_bli, right=gdp_per_capita, left_index=True, right_index=True)
full_country_stats.sort_values(by="GDP per capita", inplace=True)
full_country_stats

# np.c_ stacks its inputs side by side as columns (row counts must match);
# a single 1-D column becomes an (n, 1) array, the 2-D shape scikit-learn expects
x = np.c_[full_country_stats["GDP per capita"]]
y = np.c_[full_country_stats["Life satisfaction"]]

# Visualize the data
full_country_stats.plot(kind='scatter', x="GDP per capita", y='Life satisfaction', figsize=(10, 5))
plt.show()

# Choose a linear model, fit it to the data, and make a prediction
lin = linear_model.LinearRegression()
lin.fit(x, y)
x_new = [[22587]]  # one sample with one feature: GDP per capita
print(lin.predict(x_new))  # printed as a 2-D array because y is 2-D
```
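Two shape details in the full code are easy to trip over: `np.c_` turns a 1-D Series into an (n, 1) column, and `predict` expects 2-D input, which is why `x_new` is written `[[22587]]` (one sample, one feature). A quick check on a made-up Series (hypothetical values):

```python
import numpy as np
import pandas as pd

gdp = pd.Series([12000.0, 30000.0, 50000.0], name="GDP per capita")  # hypothetical values

col = np.c_[gdp]                # one 1-D input -> one column
pair = np.c_[gdp, gdp / 1000]   # two 1-D inputs side by side -> two columns
x_new = np.array([[22587]])     # 1 sample x 1 feature

print(col.shape)    # (3, 1)
print(pair.shape)   # (3, 2)
print(x_new.shape)  # (1, 1)
```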