[python] Computing correlation coefficients with pandas pd.DataFrame.corr and numpy np.corrcoef

Latest recommended article published on 2025-04-22 16:55:29

brucewong0516

Category: python. Tags: python, np.corrcoef, pd.DataFrame.corr, correlation coefficient, Pearson test

Article link: https://blog.csdn.net/brucewong0516/article/details/79933247

- I. Generate the data first:

1. df data:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(20).reshape(4,5),index = [1,2,3,4],columns=['a','b','c','d','e'])

df
Out[2]: 
          a         b         c         d         e
1 -0.224433 -2.589454  0.107443  0.298472 -0.755109
2  0.062371 -0.621080 -0.098141  0.572026 -0.619506
3  0.135875 -1.189219 -0.879782  0.008277  1.500348
4 -1.306778  1.523779  0.583809 -0.354621  0.800768

2. np data:

import numpy as np
np_a = df['a'].values
np_b = df['b'].values

np_a
Out[5]: array([-0.22443331,  0.06237092,  0.13587457, -1.30677821])
np_b
Out[6]: array([-2.58945416, -0.62107981, -1.18921924,  1.5237788 ])
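The data above come from an unseeded random draw, so rerunning the snippet produces different numbers. As a minimal sketch, assuming the 6-decimal values printed above are precise enough for illustration, the same table can be rebuilt by hand. The names `df_fixed`, `np_a_fixed` and `np_b_fixed` are hypothetical and do not appear in the original post.

```python
import pandas as pd

# Values copied from the printed df above (rounded to 6 decimals),
# so results will agree with the article only to about 5 digits.
df_fixed = pd.DataFrame(
    [
        [-0.224433, -2.589454,  0.107443,  0.298472, -0.755109],
        [ 0.062371, -0.621080, -0.098141,  0.572026, -0.619506],
        [ 0.135875, -1.189219, -0.879782,  0.008277,  1.500348],
        [-1.306778,  1.523779,  0.583809, -0.354621,  0.800768],
    ],
    index=[1, 2, 3, 4],
    columns=["a", "b", "c", "d", "e"],
)

# Plain numpy arrays for columns a and b, mirroring np_a and np_b above.
np_a_fixed = df_fixed["a"].to_numpy()
np_b_fixed = df_fixed["b"].to_numpy()

print(df_fixed)
```

With this fixed table, the correlation between columns a and b should come out close to the value of about -0.7509 reported in the following sections.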

- II. Computing the correlation coefficient

1. pd.DataFrame.corr

df.corr?
Signature: df.corr(method='pearson', min_periods=1)
Docstring:
Compute pairwise correlation of columns, excluding NA/null values

Parameters
----------
method : {'pearson', 'kendall', 'spearman'}
    * pearson : standard correlation coefficient (Pearson correlation coefficient)
    * kendall : Kendall Tau correlation coefficient
    * spearman : Spearman rank correlation
min_periods : int, optional
    Minimum number of observations required per pair of columns
    to have a valid result. Currently only available for pearson
    and spearman correlation

Returns
-------
y : DataFrame

Example:

df.corr()
Out[10]: 
          a         b         c         d         e
a  1.000000 **-0.750861** -0.825824  0.764298 -0.182938
b **-0.750861**  1.000000  0.483860 -0.639109  0.433002
c -0.825824  0.483860  1.000000 -0.276338 -0.400660
d  0.764298 -0.639109 -0.276338  1.000000 -0.741822
e -0.182938  0.433002 -0.400660 -0.741822  1.000000
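The docstring says that NA/null values are excluded and that `min_periods` sets the minimum number of observations per column pair. The sketch below illustrates both points on made-up data; `df_nan` and its values are hypothetical and chosen only so the result is easy to check by eye. It also shows that `np.corrcoef` does not skip missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: b = 2 * a, with one value of b missing.
df_nan = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, np.nan, 8.0, 10.0],
})

# Default min_periods=1: the (a, b) pair uses the 4 rows where both are present.
r_default = df_nan.corr()

# Requiring 5 complete observations leaves the (a, b) entry as NaN.
r_strict = df_nan.corr(min_periods=5)

# np.corrcoef does not drop missing values, so the NaN propagates.
r_numpy = np.corrcoef(df_nan["a"].to_numpy(), df_nan["b"].to_numpy())[0, 1]

print(r_default)
print(r_strict)
print(r_numpy)
```

When a column contains missing values, one option is to drop the incomplete rows (for example with `dropna()`) before passing the arrays to `np.corrcoef`.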

2. np.corrcoef

np.corrcoef?
Signature: np.corrcoef(x, y=None, rowvar=True, bias=<class 'numpy._globals._NoValue'>, ddof=<class 'numpy._globals._NoValue'>)
Docstring:
Return Pearson product-moment correlation coefficients.

Parameters
----------
x : array_like
    A 1-D or 2-D array containing multiple variables and observations.
    Each row of `x` represents a variable, and each column a single
    observation of all those variables. Also see `rowvar` below.
y : array_like, optional
    An additional set of variables and observations. `y` has the same
    shape as `x`.
rowvar : bool, optional
    If `rowvar` is True (default), then each row represents a
    variable, with observations in the columns. Otherwise, the relationship
    is transposed: each column represents a variable, while the rows
    contain observations.
bias : _NoValue, optional
    Has no effect, do not use.

    .. deprecated:: 1.10.0
ddof : _NoValue, optional
    Has no effect, do not use.

    .. deprecated:: 1.10.0

Returns
-------
R : ndarray
    The correlation coefficient matrix of the variables.

Example:

np.corrcoef(np_a,np_b)
Out[65]: 
array([[ 1.        , **-0.75086081**],
       [**-0.75086081**,  1.        ]])
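The `rowvar` parameter described in the docstring matters when a whole table is passed at once. In a DataFrame the columns are the variables, so `rowvar=False` is needed to get the same column-by-column matrix as `df.corr()`. The following is a minimal sketch on freshly generated data; `df_demo` and the seed are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data with the same shape as df: 4 observations, 5 variables.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.standard_normal((4, 5)), columns=list("abcde"))

# Columns are variables and rows are observations, hence rowvar=False.
m_cols = np.corrcoef(df_demo.to_numpy(), rowvar=False)

# With the default rowvar=True, each of the 4 rows is treated as a variable.
m_rows = np.corrcoef(df_demo.to_numpy())

same = np.allclose(m_cols, df_demo.corr().to_numpy())
print(m_cols.shape, m_rows.shape, same)
```

Forgetting `rowvar=False` does not raise an error. It returns a matrix of a different shape that correlates rows instead of columns.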

- Summary

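Calling df.corr() on a DataFrame essentially computes the correlation coefficient between every pair of columns. With the default method='pearson' and no missing values, each entry matches what np.corrcoef returns when the two columns are extracted and passed in directly, as the values marked with ** in the two results above show (-0.750861 for columns a and b).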
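Both functions compute the Pearson coefficient, which for two samples x and y is the sum of the products of their deviations from the mean, divided by the square root of the product of their summed squared deviations. As a rough check, the sketch below evaluates that definition directly on the np_a and np_b values printed earlier. The helper `pearson_r` is a hypothetical name introduced here and is not part of either library.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r from its definition (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Values copied from the np_a and np_b output shown earlier.
np_a = np.array([-0.22443331, 0.06237092, 0.13587457, -1.30677821])
np_b = np.array([-2.58945416, -0.62107981, -1.18921924, 1.5237788])

r_manual = pearson_r(np_a, np_b)
r_numpy = np.corrcoef(np_a, np_b)[0, 1]
print(r_manual, r_numpy)  # both approximately -0.7509
```

The two numbers agree up to floating-point rounding, which is consistent with the observation that df.corr() and np.corrcoef give the same value for columns a and b.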