A quick note, to avoid redoing this work later.
Task:
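- Compute the correlation between every column of one 2-D array and every column of another (both given as DataFrames);
- Plot the result as a heatmap;
- Mark the significance level (p-values below 0.05 are marked with *, below 0.01 with **).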
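The marking rule in the last item can be sketched as a small helper. This is only an illustration: the name `pval_to_stars` and the example values are made up for this note, and the thresholds are the ones listed above.

```python
import numpy as np
import pandas as pd


def pval_to_stars(pval):
    """Map a table of p-values to '', '*' or '**' labels (hypothetical helper)."""
    pval = pd.DataFrame(pval)
    # Check the stricter threshold first so that p < 0.01 gets '**', not '*'
    stars = np.where(pval.values < 0.01, "**",
                     np.where(pval.values < 0.05, "*", ""))
    return pd.DataFrame(stars, index=pval.index, columns=pval.columns)


# Made-up p-values, for illustration only
example = pd.DataFrame([[0.2, 0.03], [0.004, 0.05]],
                       index=["a", "b"], columns=["x", "y"])
labels = pval_to_stars(example)
print(labels)
```

The comparisons are strict, so a p-value of exactly 0.05 gets no mark. A label table like this could be passed as the annotation text when drawing the heatmap.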
Reference:
Functions: computing correlations (Spearman & Pearson)
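import numpy as np
import pandas as pd
# compute correlations
from scipy.stats import spearmanr, pearsonr
from scipy.spatial.distance import cdist

def calc_spearman(df1, df2):
    """Spearman correlation and p-value between each column of df1 and each column of df2."""
    df1 = pd.DataFrame(df1)
    df2 = pd.DataFrame(df2)
    n1 = df1.shape[1]
    n2 = df2.shape[1]
    # spearmanr stacks the columns of both inputs and returns two
    # (n1 + n2) x (n1 + n2) matrices: correlations and p-values
    corr0, pval0 = spearmanr(df1.values, df2.values)
    if n1 + n2 == 2:
        # with only two columns in total, spearmanr returns scalars instead of matrices
        corr0 = np.array([[1.0, corr0], [corr0, 1.0]])
        pval0 = np.array([[0.0, pval0], [pval0, 0.0]])
    # keep only the block with df1 columns as rows and df2 columns as columns
    corr = pd.DataFrame(corr0[:n1, -n2:], index=df1.columns, columns=df2.columns)
    pval = pd.DataFrame(pval0[:n1, -n2:], index=df1.columns, columns=df2.columns)
    return corr, pval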
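As a sanity check on the block slicing `[:n1, -n2:]`, the sketch below compares it with calling `spearmanr` on each pair of columns separately. The random data, seed, and column names are arbitrary and only for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(0)  # arbitrary seed, illustrative data only
df1 = pd.DataFrame(rng.normal(size=(30, 3)), columns=["a", "b", "c"])
df2 = pd.DataFrame(rng.normal(size=(30, 2)), columns=["x", "y"])
n1, n2 = df1.shape[1], df2.shape[1]

# One call on the stacked columns, then take the df1-by-df2 block
corr_full, pval_full = spearmanr(df1.values, df2.values)
corr_block = corr_full[:n1, -n2:]
pval_block = pval_full[:n1, -n2:]

# Reference: one spearmanr call per pair of columns
corr_pair = np.empty((n1, n2))
pval_pair = np.empty((n1, n2))
for i, c1 in enumerate(df1.columns):
    for j, c2 in enumerate(df2.columns):
        r, p = spearmanr(df1[c1], df2[c2])
        corr_pair[i, j] = r
        pval_pair[i, j] = p

print(np.allclose(corr_block, corr_pair), np.allclose(pval_block, pval_pair))
```

Both approaches should agree. The single call is more convenient when there are many columns, since it avoids the explicit double loop.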
# A simpler approach, but it cannot return p-values
# from scipy.spatial.distance import cdist
# def cal