Regression Plots with Pandas and Numpy

Data Visualization

I like the plotting facilities that come with Pandas. Yes, there are many other plotting libraries such as Seaborn, Bokeh and Plotly but for most purposes, I am very happy with the simplicity of Pandas plotting.

But there is one thing missing that I would like and that is the ability to plot a regression line over a complex line or scatter plot.

But, as I have discovered, this is very easily solved. With the Numpy library you can generate regression data in a couple of lines of code and plot it in the same figure as your original line or scatter plot.

So that is what we are going to do in this article.

First, let’s get some data. If you’ve read any of my previous articles on data visualization, you know what’s coming next. I’m going to use a set of weather data that you can download from my Github account. It records the temperatures, sunshine levels and rainfall over several decades for London in the UK and is stored as a CSV file. This file has been created from public domain data recorded by the UK Met Office.

Are London summers getting hotter?

We are going to check whether the temperatures in London are rising over time. It’s not obvious from the raw data but by plotting a regression line over that data we will be better able to see the trend.

So to begin we need to import the libraries that we will need.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Nothing very unusual there: Pandas helps with data analysis and visualization, Numpy gives us the routines we need to create the regression data, and Matplotlib is used by Pandas to create the plots.

Next, we download the data.

weather = pd.read_csv('https://raw.githubusercontent.com/alanjones2/dataviz/master/londonweather.csv')

(As you probably guessed, that’s all supposed to be on one line.)

We have read the CSV file into a Pandas DataFrame and this is what it looks like — a table containing monthly data that records the maximum and minimum temperatures, the rainfall and the number of hours of sunshine, starting in 1957 and ending part way through 2019.

I posed the question about whether summers were getting hotter, so I’m going to filter the data to give me only the data for the month of July when the hottest temperatures are normally recorded. And, for convenience, I’m going to add a column that numbers the years starting at year 0 (you’ll see how this is used later).

july = weather.query('Month == 7').copy()
july.insert(0, 'Yr', range(0, len(july)))

The code above applies a query to the weather dataframe which returns only the rows where the Month is equal to 7 (i.e. July) and creates a new dataframe called july from the result.

Next, we insert a new column called Yr which numbers the rows from 0 up to one less than the length of the table.
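
To see the filter and the new column in isolation, here is a minimal sketch on a tiny made-up table. The values are invented for illustration and are not the London data; only the column names Year, Month and Tmax are borrowed from the description of the real file. The .copy() call is my own precaution so that the insert works on an independent dataframe.

```python
import pandas as pd

# Hypothetical miniature of the weather table (values invented for illustration)
weather = pd.DataFrame({
    "Year":  [1957, 1957, 1958, 1958, 1959, 1959],
    "Month": [6, 7, 6, 7, 6, 7],
    "Tmax":  [22.0, 22.5, 21.0, 21.7, 23.0, 24.7],
})

# Keep only the July rows; .copy() gives an independent frame to modify
july = weather.query("Month == 7").copy()

# Number the rows 0, 1, 2, ... and put the result in the first column
july.insert(0, "Yr", range(len(july)))

print(july)
```

Each July row now carries a simple counter in Yr, which is easier to fit against than the calendar year.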

july looks like this:

Now we can plot the maximum temperatures for July since 1957.

july.plot(y='Tmax', x='Yr')

There is a lot of variation there and high temperatures are not limited to recent years. But there does seem to be a trend: temperatures do seem to be rising a little over time.

We can try and make this a bit more obvious by doing a linear regression where we attempt to find a straight line graph that represents the trend in the rise in temperature. To do this we use the polyfit function from Numpy. Polyfit does a least squares polynomial fit over the data that it is given. We want a linear regression over the data in columns Yr and Tmax so we pass these as parameters. The final parameter is the degree of the polynomial. For linear regression the degree is 1.

We then use the convenience function poly1d to turn the fitted coefficients into a function that evaluates the regression line for any value of Yr.

d = np.polyfit(july['Yr'], july['Tmax'], 1)
f = np.poly1d(d)
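
If it is not obvious what polyfit and poly1d return, this small sketch may help. It uses made-up points that lie exactly on the line y = 2x + 1 (not the weather data), so the fitted coefficients are easy to check by eye. The names x, y, coeffs, line and fitted are just illustrative.

```python
import numpy as np

x = np.arange(5)          # 0, 1, 2, 3, 4
y = 2.0 * x + 1.0         # points lying exactly on y = 2x + 1

# Degree 1 fit: coefficients come back highest power first, i.e. [slope, intercept]
coeffs = np.polyfit(x, y, 1)

# poly1d wraps the coefficients in a callable that evaluates slope * x + intercept
line = np.poly1d(coeffs)

# Regression values at each x; here they coincide with y because the data is exact
fitted = line(x)

print(coeffs)
print(fitted)
```

With real, noisy data the fitted values will not match the observations point for point; they trace the straight line that minimizes the squared error.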

We now use the function f to produce our linear regression data and insert it into a new column called Treg.

july.insert(6, 'Treg', f(july['Yr']))

Next, we create a line plot of Yr against Tmax (the wiggly plot we saw above) and another of Yr against Treg which will be our straight line regression plot. We combine the two plots by assigning the first plot to the variable ax and then passing that to the second plot through its ax parameter, so both are drawn on the same axes.

ax = july.plot(x='Yr', y='Tmax')
july.plot(x='Yr', y='Treg', color='red', ax=ax)
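
For anyone who wants to try the overlay without downloading the CSV, here is a self-contained sketch of the same idea. The data is randomly generated (a gentle upward trend plus noise, not the London figures) and the dataframe name df is my own. The Agg backend is selected only so the code also runs where no display is available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: a noisy upward trend (not the London figures)
rng = np.random.default_rng(0)
df = pd.DataFrame({"Yr": np.arange(30)})
df["Tmax"] = 22 + 0.05 * df["Yr"] + rng.normal(0, 1, len(df))

# Fit a straight line and store its values alongside the raw data
df["Treg"] = np.poly1d(np.polyfit(df["Yr"], df["Tmax"], 1))(df["Yr"])

# The first plot creates the Axes; the second draws onto it via ax=ax
ax = df.plot(x="Yr", y="Tmax")
ax2 = df.plot(x="Yr", y="Treg", color="red", ax=ax)
```

Because the second call receives ax, Pandas adds the red line to the existing figure rather than opening a new one.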

That’s it, done!

We can now see much more clearly the upward trend of temperature over the years.
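
If you want a number to go with the picture, the first coefficient returned by polyfit is the slope of the line, which here would be the change in temperature per year. The sketch below uses an invented series that warms by exactly 0.03 degrees a year, so it says nothing about the real London trend; the names years, tmax, slope and per_decade are illustrative.

```python
import numpy as np

# Hypothetical series: exactly 0.03 degrees of warming per year, starting at 21.0
years = np.arange(60)
tmax = 21.0 + 0.03 * years

# For a degree 1 fit the coefficients unpack as slope, then intercept
slope, intercept = np.polyfit(years, tmax, 1)

# Scale the yearly slope to a per-decade figure
per_decade = slope * 10

print(f"slope per year: {slope:.3f}, per decade: {per_decade:.2f}")
```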

And here is the same thing done with a scatter chart.

ax = july.plot.scatter(x='Yr', y='Tmax')
july.plot(x='Yr', y='Treg', color='red', legend=False, ax=ax)

That was fairly straightforward, I think, and I hope you found it useful.

For an introduction to plotting with Pandas see this:

Translated from: https://towardsdatascience.com/regression-plots-with-pandas-and-numpy-faf2edbfad4f