- 1. First, generate the data:
1) DataFrame data:
import numpy as np  # needed for np.random below
import pandas as pd
df = pd.DataFrame(np.random.randn(20).reshape(4, 5), index=[1, 2, 3, 4], columns=['a', 'b', 'c', 'd', 'e'])
df
Out[2]:
a b c d e
1 -0.224433 -2.589454 0.107443 0.298472 -0.755109
2 0.062371 -0.621080 -0.098141 0.572026 -0.619506
3 0.135875 -1.189219 -0.879782 0.008277 1.500348
4 -1.306778 1.523779 0.583809 -0.354621 0.800768
2) NumPy data:
import numpy as np
np_a = df['a'].values
np_b = df['b'].values
np_a
Out[5]: array([-0.22443331, 0.06237092, 0.13587457, -1.30677821])
np_b
Out[6]: array([-2.58945416, -0.62107981, -1.18921924, 1.5237788 ])
- 2. Computing the correlation coefficients
- 2.1 pd.DataFrame.corr
df.corr?
Signature: df.corr(method='pearson', min_periods=1)
Docstring:
Compute pairwise correlation of columns, excluding NA/null values
Parameters
----------
method : {'pearson', 'kendall', 'spearman'}
* pearson : standard correlation coefficient (the Pearson correlation coefficient)
* kendall : Kendall Tau correlation coefficient
* spearman : Spearman rank correlation
min_periods : int, optional
Minimum number of observations required per pair of columns
to have a valid result. Currently only available for pearson
and spearman correlation
Returns
-------
y : DataFrame
Example:
df.corr()
Out[10]:
a b c d e
a 1.000000 **-0.750861** -0.825824 0.764298 -0.182938
b **-0.750861** 1.000000 0.483860 -0.639109 0.433002
c -0.825824 0.483860 1.000000 -0.276338 -0.400660
d 0.764298 -0.639109 -0.276338 1.000000 -0.741822
e -0.182938 0.433002 -0.400660 -0.741822 1.000000
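To make clear what each entry of this matrix represents, here is a minimal sketch that computes the Pearson coefficient for one pair of columns directly from its definition and compares it with `df.corr()`. The helper name `pearson` and the fixed random seed are assumptions made for illustration; they are not part of the original session, so the numbers will differ from the output above.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, used only so the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((4, 5)),
                  index=[1, 2, 3, 4], columns=list("abcde"))

def pearson(x, y):
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson(df["a"], df["b"])   # hand-computed coefficient
r_pandas = df.corr().loc["a", "b"]     # same entry taken from df.corr()
print(np.isclose(r_manual, r_pandas))
```

The normalisation factors cancel in the ratio, which is why no `ddof` choice appears in the formula.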
- 2.2 np.corrcoef
np.corrcoef?
Signature: np.corrcoef(x, y=None, rowvar=True, bias=<class 'numpy._globals._NoValue'>, ddof=<class 'numpy._globals._NoValue'>)
Docstring:
Return Pearson product-moment correlation coefficients.
Parameters
----------
x : array_like
A 1-D or 2-D array containing multiple variables and observations.
Each row of `x` represents a variable, and each column a single
observation of all those variables. Also see `rowvar` below.
y : array_like, optional
An additional set of variables and observations. `y` has the same
shape as `x`.
rowvar : bool, optional
If `rowvar` is True (default), then each row represents a
variable, with observations in the columns. Otherwise, the relationship
is transposed: each column represents a variable, while the rows
contain observations.
bias : _NoValue, optional
Has no effect, do not use.
.. deprecated:: 1.10.0
ddof : _NoValue, optional
Has no effect, do not use.
.. deprecated:: 1.10.0
Returns
-------
R : ndarray
The correlation coefficient matrix of the variables.
Example:
np.corrcoef(np_a,np_b)
Out[65]:
array([[ 1. , **-0.75086081**],
[**-0.75086081**, 1. ]])
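The call above passes two columns at a time. As the docstring notes, `rowvar=False` tells `np.corrcoef` to treat each column as a variable, so the whole matrix can be obtained in one call. The following sketch assumes a freshly generated DataFrame with a fixed seed (not the data shown above) and checks the result against `df.corr()`.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, used only so the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((4, 5)),
                  index=[1, 2, 3, 4], columns=list("abcde"))

# rowvar=False: columns are variables, rows are observations,
# which matches the layout pandas assumes for df.corr().
corr_np = np.corrcoef(df.to_numpy(), rowvar=False)
corr_pd = df.corr().to_numpy()

matches = np.allclose(corr_np, corr_pd)
print(corr_np.shape, matches)
```

Without `rowvar=False`, `np.corrcoef(df.to_numpy())` would correlate the four rows with each other and return a 4x4 matrix instead.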
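- Summary
Computing the correlation coefficients of a DataFrame with df.corr() (default method='pearson') amounts to computing the Pearson correlation between every pair of columns. The result is the same as extracting each pair of columns and passing them to np.corrcoef, as the values marked with ** in the two outputs show: -0.750861 for columns a and b in both.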
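One caveat follows from the docstrings quoted earlier: `df.corr` excludes NA/null values, whereas `np.corrcoef` has no such handling, so the equivalence holds only for data without missing values. The sketch below uses a small made-up frame `df_nan` (an assumption for illustration, not data from this post) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in column "a".
df_nan = pd.DataFrame({
    "a": [np.nan, 0.06, 0.14, -1.31],
    "b": [-2.59, -0.62, -1.19, 1.52],
})

# pandas drops the incomplete row for this pair before correlating.
r_pd = df_nan.corr().loc["a", "b"]

# NumPy propagates the NaN, so the coefficient itself is NaN.
r_np = np.corrcoef(df_nan["a"].to_numpy(), df_nan["b"].to_numpy())[0, 1]

# Dropping incomplete rows first makes NumPy agree with pandas again.
clean = df_nan.dropna()
r_np_clean = np.corrcoef(clean["a"].to_numpy(), clean["b"].to_numpy())[0, 1]

print(np.isnan(r_np), np.isclose(r_pd, r_np_clean))
```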