Covariance and correlation coefficients in pandas:
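Covariance and the correlation coefficient describe how two data sets vary together. In pandas, the cov method computes the covariance and the corr method computes the correlation coefficient. The (Pearson) correlation coefficient is the covariance divided by the product of the two standard deviations: corr(X, Y) = cov(X, Y) / (std(X) * std(Y)).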
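The Pearson correlation coefficient equals cov(X, Y) / (std(X) * std(Y)). The sketch below checks this on two small made-up Series (not the article's data): it computes the covariance with Series.cov, divides by the product of the standard deviations, and compares the result with Series.corr. Both cov and std use the sample normalization (ddof=1) by default, so that factor cancels in the ratio.

```python
import pandas as pd

# Two small illustrative Series (made-up numbers, not the data from this article)
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.9])

cov_xy = x.cov(y)                        # sample covariance, ddof=1 by default
r_manual = cov_xy / (x.std() * y.std())  # std also uses ddof=1 by default
r_pandas = x.corr(y)                     # Pearson correlation is the default method

print(cov_xy, r_manual, r_pandas)
```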
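import numpy as np
import pandas as pd
# Convert to a matrix (NumPy array); new_data is the 5 x 5 DataFrame from the previous step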
matrix_newdata = np.array(new_data)
print(matrix_newdata)
'''[[1.11397015e+03 4.26950000e+03 2.88167043e+03 1.92434336e+04
  3.15348674e+01]
 [1.26969300e+03 4.30440000e+03 3.42342447e+03 2.10167620e+04
  3.38999825e+01]
 [1.32771597e+03 4.33480000e+03 4.17657786e+03 2.55242657e+04
  4.06121790e+01]
 [1.60106904e+03 4.36370000e+03 5.01189343e+03 3.18399677e+04
  5.06839994e+01]
 [1.93002764e+03 4.38900000e+03 5.86391531e+03 3.72373131e+04
  5.90975434e+01]]'''
# Transpose: each column of df_matrix_newdata now holds one row (observation) of new_data
df_matrix_newdata = pd.DataFrame(matrix_newdata.T)
print(df_matrix_newdata)
'''
              0             1             2             3             4
0   1113.970150   1269.693003   1327.715967   1601.069038   1930.027639
1   4269.500000   4304.400000   4334.800000   4363.700000   4389.000000
2   2881.670431   3423.424472   4176.577856   5011.893427   5863.915310
3  19243.433578  21016.761992  25524.265651  31839.967740  37237.313050
4     31.534867     33.899983     40.612179     50.683999     59.097543'''
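# Covariance and correlation between columns 0 and 1, i.e. between the first two rows of new_data
print(df_matrix_newdata[0].cov(df_matrix_newdata[1]))
print(df_matrix_newdata[0].corr(df_matrix_newdata[1]))
'''Covariance: 67146730.45383753'''
'''Correlation coefficient: 0.9996476727699803'''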
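Because of the transpose, the two Series compared above are the first two rows of new_data, not two of its variables. The sketch below uses a small hypothetical array to contrast the two orientations. It also shows DataFrame.cov() and DataFrame.corr(), which compute every pairwise statistic between columns at once; called on new_data directly, they would relate its variables.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 2 array: rows are observations, columns are variables
a = np.array([[1.0, 10.0],
              [2.0, 19.0],
              [4.0, 41.0]])

df = pd.DataFrame(a)      # columns 0 and 1 are the two variables
df_t = pd.DataFrame(a.T)  # after transposing, columns 0..2 are the original rows

# Covariance between the two variables (original columns)
print(df[0].cov(df[1]))
# Covariance between the first two observations (original rows)
print(df_t[0].cov(df_t[1]))

# Full pairwise matrices between all columns of df
print(df.cov())
print(df.corr())
```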
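Besides pandas' cov and corr methods, NumPy can compute the full covariance matrix and correlation coefficient matrix directly. np.cov and np.corrcoef treat each row as a variable by default (rowvar=True), so the matrices below relate the rows of matrix_newdata; entry [0, 1] matches the pandas results above.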
np.cov(matrix_newdata)
'''
Covariance matrix
array([[6.15981984e+07, 6.71467305e+07, 8.20479822e+07, 1.02863697e+08,
        1.20457241e+08],
       [6.71467305e+07, 7.32466577e+07, 8.95791295e+07, 1.12382902e+08,
        1.31668519e+08],
       [8.20479822e+07, 8.95791295e+07, 1.09699667e+08, 1.37783328e+08,
        1.61545852e+08],
       [1.02863697e+08, 1.12382902e+08, 1.37783328e+08, 1.73247291e+08,
        2.03265037e+08],
       [1.20457241e+08, 1.31668519e+08, 1.61545852e+08, 2.03265037e+08,
        2.38588380e+08]])
'''
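np.cov follows the rowvar convention. With the default rowvar=True each row is a variable, and with rowvar=False each column is. The sketch below uses a made-up 4 x 3 array to show both result shapes. It also checks that np.cov(a, rowvar=False) agrees with pandas' DataFrame.cov(), which treats columns as variables. Both divide by N - 1 by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 4 observations (rows) of 3 variables (columns)
a = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.5, 5.0],
              [3.0, 5.5, 8.0],
              [4.0, 8.0, 9.5]])

# Default rowvar=True: each ROW is a variable -> 4 x 4 matrix
cov_rows = np.cov(a)
# rowvar=False: each COLUMN is a variable -> 3 x 3 matrix
cov_cols = np.cov(a, rowvar=False)

# DataFrame.cov treats columns as variables and, like np.cov, normalizes by N - 1
print(np.allclose(cov_cols, pd.DataFrame(a).cov().to_numpy()))
print(cov_rows.shape, cov_cols.shape)
```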
np.corrcoef(matrix_newdata)
'''
Correlation coefficient matrix
array([[1.        , 0.99964767, 0.99811652, 0.99573758, 0.99362901],
       [0.99964767, 1.        , 0.99933311, 0.99763829, 0.99600931],
       [0.99811652, 0.99933311, 1.        , 0.99944935, 0.998547  ],
       [0.99573758, 0.99763829, 0.99944935, 1.        , 0.99978085],
       [0.99362901, 0.99600931, 0.998547  , 0.99978085, 1.        ]])
'''
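A correlation matrix is the covariance matrix with each entry (i, j) divided by the standard deviations of variables i and j. The sketch below rebuilds np.corrcoef from np.cov on hypothetical data to illustrate that relationship. The diagonal comes out as 1, and the N - 1 normalization cancels.

```python
import numpy as np

# Hypothetical data: 3 variables (rows) x 5 observations (columns),
# matching the default rowvar=True layout of np.cov / np.corrcoef
a = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [2.0, 4.1, 5.9, 8.2, 9.9],
              [5.0, 3.0, 4.0, 1.0, 2.0]])

c = np.cov(a)                 # covariance matrix (3 x 3)
std = np.sqrt(np.diag(c))     # standard deviation of each variable
r = c / np.outer(std, std)    # r[i, j] = c[i, j] / (std[i] * std[j])

print(np.allclose(r, np.corrcoef(a)))
```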
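Aggregate statistics (the columns of new_data are 社会消费品零售总额(亿元), total retail sales of consumer goods in units of 100 million yuan; 常驻人口, resident population; 地区生产总值, gross regional product; 人均可支配收入(元), per capita disposable income in yuan; and 居民消费价格指数, consumer price index):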
print(new_data.aggregate(np.mean))
'''
社会消费品零售总额(亿元) 1448.495159
常驻人口 4332.280000
地区生产总值 4271.496299
人均可支配收入(元) 26972.348402
居民消费价格指数 43.165714
dtype: float64
'''
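aggregate also accepts the name of an aggregation as a string. The sketch below uses a hypothetical two-column DataFrame and calls aggregate("mean"), which gives the same per-column means as DataFrame.mean(). As far as we know, recent pandas releases (2.1 onward) emit a FutureWarning when a NumPy function such as np.mean is passed to aggregate and suggest the string form instead. Treat that version detail as an assumption to check against your installed pandas.

```python
import pandas as pd

# Hypothetical two-column DataFrame standing in for new_data
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

means = df.aggregate("mean")  # aggregation given by its string name
print(means)                  # one mean per column, indexed by column name
print(df.mean())              # the same values via DataFrame.mean
```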
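Besides aggregating the whole DataFrame, pandas' aggregate method also works on a single column, on several columns, with several aggregation functions on one column, and with several functions across several columns.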
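The variants listed above can be sketched as follows. The DataFrame and its English column names (sales, population, income) are hypothetical stand-ins for new_data. Passing a list of functions produces one result entry per function, and a dict maps each column to its own functions. Missing combinations in the dict case come back as NaN.

```python
import pandas as pd

# Hypothetical DataFrame standing in for new_data; column names are made up
df = pd.DataFrame({
    "sales": [10.0, 12.0, 15.0, 19.0],
    "population": [100.0, 101.0, 103.0, 104.0],
    "income": [50.0, 55.0, 61.0, 70.0],
})

# 1. A single column, one function -> scalar
one_col = df["sales"].aggregate("mean")

# 2. Several columns, one function -> Series indexed by column name
some_cols = df[["sales", "income"]].aggregate("mean")

# 3. One column, several functions -> Series indexed by function name
one_col_many = df["sales"].aggregate(["min", "max", "mean"])

# 4. Several columns, several functions -> DataFrame (functions x columns)
many_many = df[["sales", "income"]].aggregate(["min", "max"])

# 5. A dict gives each column its own list of functions
per_col = df.aggregate({"sales": ["sum", "mean"], "population": ["max"]})

print(one_col, some_cols, one_col_many, many_many, per_col, sep="\n\n")
```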