Python functions for computing correlation: calculating Pearson correlation in Python

Latest recommended article published on 2024-04-23 20:56:56

成既成矣

I have a DataFrame with 4 columns: "Country", "year", "GDP" and "CO2 emissions".

I want to measure the Pearson correlation between GDP and CO2 emissions for each country.

The country column contains all the countries in the world, and the year column has the values 1990, 1991, ..., 2018.

Solution

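Group the rows by country with groupby and call corr() on the two numeric columns. corr() uses the Pearson method by default, so this returns a 2x2 correlation matrix for each country: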

import pandas as pd

# Small sample: five years of data for two countries
country = ['India','India','India','India','India','China','China','China','China','China']
Year = [2018,2017,2016,2015,2014,2018,2017,2016,2015,2014]
GDP = [100,98,94,64,66,200,189,165,134,130]
CO2 = [94,96,90,76,64,180,172,150,121,117]

df = pd.DataFrame({'country':country,'Year':Year,'GDP':GDP,'CO2':CO2})

# One 2x2 correlation matrix (GDP vs CO2) per country
print(df.groupby('country')[['GDP','CO2']].corr())
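Each off-diagonal entry of a per-country matrix is Pearson's r for that country's GDP and CO2 values. As a sanity check, here is a minimal sketch that computes r directly from its definition with NumPy, using the India rows from the sample data. The helper name `pearson_r` is hypothetical and only for illustration; it is not part of pandas.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# India rows from the sample data above
india_gdp = [100, 98, 94, 64, 66]
india_co2 = [94, 96, 90, 76, 64]

r_india = pearson_r(india_gdp, india_co2)
print(round(r_india, 6))
```

The result should agree with the GDP/CO2 entry that corr() reports for India, which is a quick way to confirm that the grouped call is measuring what you expect.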

With a little reshaping, this output can be reduced to a single correlation value per country:

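# Select columns with a list ([['GDP','CO2']]); tuple-style selection fails in recent pandas versions.
# Keep only the GDP row / CO2 column of each matrix, which holds the GDP-CO2 correlation.
df_corr = (df.groupby('country')[['GDP','CO2']].corr()).drop(columns='GDP').drop('CO2',level=1).rename(columns={'CO2':'Correlation'})

# Flatten the MultiIndex so that country is the only index level
df_corr = df_corr.reset_index().drop(columns='level_1').set_index('country',drop=True)

print(df_corr)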

Output:

         Correlation
country
China       0.999581
India       0.932202
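If only the single GDP/CO2 value per country is needed, a shorter route is to compute it per group with Series.corr instead of building and trimming the full matrices. This is a sketch of an alternative, not the method used in the answer above; it rebuilds the same sample data so that it runs on its own, and the variable name `corr_by_country` is just illustrative.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'country': ['India'] * 5 + ['China'] * 5,
    'Year': [2018, 2017, 2016, 2015, 2014] * 2,
    'GDP': [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    'CO2': [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

# For each country, correlate that group's GDP column with its CO2 column.
# Series.corr defaults to the Pearson method.
corr_by_country = (
    df.groupby('country')[['GDP', 'CO2']]
      .apply(lambda g: g['GDP'].corr(g['CO2']))
      .rename('Correlation')
)

print(corr_by_country)
```

The result is a Series indexed by country rather than a one-column DataFrame; calling `.to_frame()` on it gives the same shape as `df_corr`.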
