[python] Computing correlation coefficients with pandas pd.DataFrame.corr and numpy np.corrcoef

- I. First, generate the data:

1. df data:

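import numpy as np  # np.random is used below, so numpy must be imported first
import pandas as pd
from pandas import DataFrame,Series
# 20 standard-normal samples reshaped into 4 rows x 5 columns
df = pd.DataFrame(np.random.randn(20).reshape(4,5),index = [1,2,3,4],columns=['a','b','c','d','e'])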
df
Out[2]: 
          a         b         c         d         e
1 -0.224433 -2.589454  0.107443  0.298472 -0.755109
2  0.062371 -0.621080 -0.098141  0.572026 -0.619506
3  0.135875 -1.189219 -0.879782  0.008277  1.500348
4 -1.306778  1.523779  0.583809 -0.354621  0.800768

2. np data:

import numpy as np
np_a = df['a'].values
np_b = df['b'].values
np_a
Out[5]: array([-0.22443331,  0.06237092,  0.13587457, -1.30677821])
np_b
Out[6]: array([-2.58945416, -0.62107981, -1.18921924,  1.5237788 ])

- II. Compute the correlation coefficients

  • 2.1 pd.DataFrame.corr
df.corr?
Signature: df.corr(method='pearson', min_periods=1)
Docstring:
Compute pairwise correlation of columns, excluding NA/null values

Parameters
----------
method : {'pearson', 'kendall', 'spearman'}
    * pearson : standard correlation coefficient (Pearson correlation coefficient)
    * kendall : Kendall Tau correlation coefficient
    * spearman : Spearman rank correlation
min_periods : int, optional
    Minimum number of observations required per pair of columns
    to have a valid result. Currently only available for pearson
    and spearman correlation

Returns
-------
y : DataFrame

Example:

df.corr()
Out[10]: 
          a         b         c         d         e
a  1.000000 **-0.750861** -0.825824  0.764298 -0.182938
b **-0.750861**  1.000000  0.483860 -0.639109  0.433002
c -0.825824  0.483860  1.000000 -0.276338 -0.400660
d  0.764298 -0.639109 -0.276338  1.000000 -0.741822
e -0.182938  0.433002 -0.400660 -0.741822  1.000000
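
The `method` and `min_periods` parameters from the docstring can be tried out in a small sketch. The seed, the 6 x 3 shape, and the injected NaN are assumptions made only for illustration and are not the data from the session above.

```python
import numpy as np
import pandas as pd

# Illustrative data only: a seeded generator so the run is repeatable.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.standard_normal((6, 3)), columns=['a', 'b', 'c'])
demo.loc[0, 'a'] = np.nan  # one missing value in column 'a'

# Default: Pearson, computed pairwise on rows where both columns are present.
pearson = demo.corr()

# Spearman: correlation of the ranks instead of the raw values.
spearman = demo.corr(method='spearman')

# Require 6 valid observations per pair; pairs involving 'a' only have 5.
strict = demo.corr(min_periods=6)

print(pearson)
print(spearman)
print(strict)
```

In `strict`, the pairs that involve column `a` come out as NaN because they fall short of `min_periods`, while the `b`/`c` pair is still computed.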
  • 2.2 np.corrcoef
np.corrcoef?
Signature: np.corrcoef(x, y=None, rowvar=True, bias=<class 'numpy._globals._NoValue'>, ddof=<class 'numpy._globals._NoValue'>)
Docstring:
Return Pearson product-moment correlation coefficients.

Parameters
----------
x : array_like
    A 1-D or 2-D array containing multiple variables and observations.
    Each row of `x` represents a variable, and each column a single
    observation of all those variables. Also see `rowvar` below.
y : array_like, optional
    An additional set of variables and observations. `y` has the same
    shape as `x`.
rowvar : bool, optional
    If `rowvar` is True (default), then each row represents a
    variable, with observations in the columns. Otherwise, the relationship
    is transposed: each column represents a variable, while the rows
    contain observations.
bias : _NoValue, optional
    Has no effect, do not use.

    .. deprecated:: 1.10.0
ddof : _NoValue, optional
    Has no effect, do not use.

    .. deprecated:: 1.10.0

Returns
-------
R : ndarray
    The correlation coefficient matrix of the variables.

Example:

np.corrcoef(np_a,np_b)
Out[65]: 
array([[ 1.        , **-0.75086081**],
       [**-0.75086081**,  1.        ]])
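
The `rowvar` parameter matters when a whole 2-D array is passed in: a DataFrame stores variables in columns, while `np.corrcoef` treats rows as variables by default. The following is a minimal sketch with assumed, seeded data (not the values from the session above) showing two equivalent ways to get the column-wise matrix.

```python
import numpy as np
import pandas as pd

# Illustrative data only: same 4 x 5 layout as the example, but seeded.
rng = np.random.default_rng(42)
demo = pd.DataFrame(rng.standard_normal((4, 5)),
                    index=[1, 2, 3, 4], columns=list('abcde'))

# rowvar=False: treat each column as a variable, as DataFrame.corr does.
full = np.corrcoef(demo.values, rowvar=False)

# Equivalent: transpose so that each row is a variable (the default layout).
same = np.corrcoef(demo.values.T)

# Passing two 1-D arrays yields the 2 x 2 matrix for that single pair.
pair = np.corrcoef(demo['a'].values, demo['b'].values)

print(full.shape)                             # (5, 5)
print(np.allclose(full, demo.corr().values))  # True
```

Without `rowvar=False` (or the transpose), the call would correlate the 4 rows with each other and return a 4 x 4 matrix instead.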

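- Summary

The correlation matrix returned by df.corr() is, in essence, the Pearson correlation coefficient computed for every pair of columns. Each entry is the same value you get by extracting the two columns and passing them to np.corrcoef, as the values marked with * in the two outputs show.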
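
As a cross-check, the Pearson coefficient can also be computed directly from its definition, the sum of products of deviations divided by the square root of the product of the summed squared deviations. This sketch reuses the `np_a` and `np_b` values printed earlier; `pearson_r` is a hypothetical helper written for this illustration, not a function from numpy or pandas.

```python
import numpy as np

# Values copied from the np_a and np_b outputs shown earlier (8 decimals).
np_a = np.array([-0.22443331, 0.06237092, 0.13587457, -1.30677821])
np_b = np.array([-2.58945416, -0.62107981, -1.18921924, 1.5237788])

def pearson_r(x, y):
    """Hypothetical helper: Pearson r computed from the definition."""
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson_r(np_a, np_b)
r_numpy = np.corrcoef(np_a, np_b)[0, 1]
print(r_manual, r_numpy)
```

Both numbers should agree with the starred value -0.750861, up to the rounding of the copied inputs.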
