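Hi everyone, today let's talk about correlation analysis in data analysis. Correlation analysis examines two or more variables that may be related in order to measure how closely they are associated. It only makes sense when the variables have some real connection or statistical dependence to begin with.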
This post focuses on one technique within correlation analysis: canonical correlation analysis (CCA).
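The correlation between two random variables X and Y can be measured by their correlation coefficient $\rho_{XY} = \dfrac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$. In many practical problems, however, we need to study the correlation among many variables. For the variable groups $(X_1, X_2, \ldots, X_p)$ and $(Y_1, Y_2, \ldots, Y_q)$, the correlation between each $X_i$ and each $Y_j$ does reflect how the individual pairs are related, but it does not capture the correlation between the two groups as a whole, and describing that relationship with all $p \times q$ correlation coefficients is cumbersome.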
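To make the pairwise view concrete, the sketch below computes every $\operatorname{corr}(X_i, Y_j)$ for the iris split used later in this post (first two columns as one group, last two as the other). It assumes NumPy and scikit-learn are installed; `np.corrcoef` with `rowvar=False` treats columns as variables. Even with p = q = 2 there are already four coefficients to read, and none of them summarizes the two groups as a whole.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris().data
X = data[:, 0:2]  # group 1: sepal length, sepal width
Y = data[:, 2:4]  # group 2: petal length, petal width

# Pearson correlation matrix of all four columns (columns = variables)
R = np.corrcoef(np.hstack([X, Y]), rowvar=False)

# The off-diagonal p x q block holds corr(X_i, Y_j) for every pair
p = X.shape[1]
R_xy = R[:p, p:]
print(np.round(R_xy, 3))
```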
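Canonical correlation analysis was proposed by Hotelling, and its basic idea is very similar to principal component analysis. First, in each group find a linear combination of the variables, $U_1 = a_1^\top X$ and $V_1 = b_1^\top Y$, such that the two linear combinations have the largest possible correlation coefficient. Next, among the linear combinations that are uncorrelated with this first pair, pick the pair with the largest correlation coefficient, and continue in this way until the correlation between the two groups has been fully extracted (at most $\min(p, q)$ pairs). The linear combinations selected this way are called canonical variables, and their correlation coefficients are called canonical correlation coefficients. The canonical correlation coefficients measure the strength of the relationship between the two groups of variables.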
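The procedure above has a closed form: with sample covariance blocks $S_{xx}$, $S_{yy}$ and $S_{xy}$, the squared canonical correlations are the eigenvalues of $S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$. The sketch below (NumPy only; `canonical_correlations` is a hypothetical helper, not a library function) whitens each block with a Cholesky factor and reads the canonical correlations off as singular values. It assumes both covariance matrices are positive definite, i.e. no column is a linear combination of the others. This is a textbook formulation, not the iterative power method scikit-learn's CCA uses internally.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return the canonical correlations of X (n x p) and Y (n x q), largest first."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each column
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)  # within-group covariances
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)  # between-group covariance
    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # K = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    s = np.linalg.svd(K, compute_uv=False)  # min(p, q) values, descending
    return np.clip(s, 0.0, 1.0)  # guard against round-off just above 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    # Y shares one noisy direction with X, plus an unrelated column
    Y = np.column_stack([X[:, 0] + 0.5 * rng.normal(size=200), rng.normal(size=200)])
    print(canonical_correlations(X, Y))  # two values: one large, one near 0
```

Because the first canonical pair maximizes correlation over all linear combinations, the first canonical correlation is never smaller than the largest absolute pairwise $\operatorname{corr}(X_i, Y_j)$.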
Canonical correlation analysis can be performed with the CCA class in scikit-learn:
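sklearn.cross_decomposition.CCA(n_components=2, *, scale=True,
                                max_iter=500, tol=1e-06, copy=True)
# n_components: int, number of components to keep. Default 2
# scale: bool, whether to scale (standardize) the data. Default True
# max_iter: int, maximum number of iterations of the power method. Default 500
# tol: float, convergence tolerance of the power method. Default 1e-06
# copy: bool, whether to copy X and Y in fit before centering and scaling. Default True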
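Example: split the iris dataset bundled with scikit-learn into two datasets of two variables each (the sepal measurements and the petal measurements), and run canonical correlation analysis on them.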
from sklearn import datasets
from sklearn.cross_decomposition import CCA
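iris = datasets.load_iris()  # load the iris dataset
iris_x = iris.data[:, 0:2]   # first two features (sepal length, sepal width)
iris_y = iris.data[:, 2:4]   # last two features (petal length, petal width)
cca = CCA()                  # create a CCA object (n_components=2 by default)
# fit the model; it takes the two datasets as its two arguments
cca.fit(iris_x, iris_y)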
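# transform returns a tuple (x_scores, y_scores): the canonical variables of each dataset
print('Dimensionality-reduction result:', cca.transform(iris_x, iris_y))
Output (long; skim it):
Dimensionality-reduction result: (array([[-1.22140847e+00, 4.95152798e-01],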
[-9.98912611e-01, -6.38966306e-01],
[-1.39955712e+00, -3.40140243e-01],
[-1.42183927e+00, -6.00140117e-01],
[-1.42173073e+00, 6.44565829e-01],
[-1.24358208e+00, 1.47985887e+00],
[-1.68889943e+00, 1.39792400e-02],
[-1.24369062e+00, 2.35152925e-01],
[-1.46640357e+00, -1.12013986e+00],
[-1.08793266e+00, -4.34259853e-01],
[-1.06554197e+00, 1.07044597e+00],
[-1.46629503e+00, 1.24566082e-01],
[-1.11021481e+00, -6.94259727e-01],
[-1.66672583e+00, -9.70726833e-01],
[-8.87393324e-01, 1.90573901e+00],
[-1.35477574e+00, 2.66927140e+00],
[-1.24358208e+00, 1.47985887e+00],
[-1.22140847e+00, 4.95152798e-01],
[-8.20655420e-01, 1.44103268e+00],
[-1.48846863e+00, 1.10927215e+00],
[-7.98481813e-01, 4.56326609e-01],
[-1.39944858e+00, 9.04565703e-01],
[-1.86693954e+00, 4.23392145e-01],
[-1.04336837e+00, 8.57398935e-02],
[-1.46629503e+00, 1.24566082e-01],
[-8.87610409e-01, -5.83672885e-01],
[-1.24369062e+00, 2.35152925e-01],
[-1.11010627e+00, 5.50446219e-01],
[-1.02108622e+00, 3.45739767e-01],
[-1.39955712e+00, -3.40140243e-01],
[-1.19923487e+00, -4.89553275e-01],
[-7.98481813e-01, 4.56326609e-01],
[-1.64422659e+00, 1.77868493e+00],
[-1.39934004e+00, 2.14927165e+00],
[-1.08793266e+00, -4.34259853e-01],
[-1.06565052e+00, -1.74259980e-01],
[-7.76199664e-01, 7.16326483e-01],
[-1.53303293e+00, 5.89272408e-01],
[-1.55542362e+00, -9.15433412e-01],
[-1.13238842e+00, 2.90446346e-01],
[-1.33271068e+00, 4.39859377e-01],
[-8.20981047e-01, -2.29308516e+00],
[-1.73346373e+00, -5.06020507e-01],
[-1.33271068e+00, 4.39859377e-01],
[-1.48846863e+00, 1.10927215e+00],
[-1.11021481e+00, -6.94259727e-01],
[-1.48846863e+00, 1.10927215e+00],
[-1.51085932e+00, -3.95433665e-01],
[-1.17684418e+00, 1.01515255e+00],
[-1.15467057e+00, 3.04464723e-02],
[ 1.16039353e+00, 9.31608443e-01],
[ 4.92580317e-01, 5.99847916e-01],
[ 1.13811138e+00, 6.71608570e-01],
[ 2.92040976e-01, -1.74015094e+00],
[ 9.59962733e-01, -1.63684472e-01],
[ 6.95451141e-02, -6.06031841e-01],
[ 2.92258061e-01, 7.49260948e-01],
[-4.64792291e-01, -1.86720502e+00],
[ 9.82244881e-01, 9.63154019e-02],
[-3.97945844e-01, -1.08720540e+00],
[ 2.59012502e-03, -2.63073741e+00],
[ 1.14109412e-01, -8.60320940e-02],
[ 9.37572041e-01, -1.66839029e+00],
[ 4.25733870e-01, -1.80151704e-01],
[-1.30777142e-01, -4.56618810e-01],
[ 9.15506977e-01, 5.61021728e-01],
[-2.19797195e-01, -2.51912357e-01],
[ 2.69867370e-01, -7.55444872e-01],
[ 1.16017645e+00, -1.55780345e+00],
[ 2.25303072e-01, -1.27544462e+00],
[-6.39306949e-02, 3.23380811e-01],
[ 5.14753923e-01, -3.84858156e-01],
[ 1.00441849e+00, -8.88390671e-01],
[ 5.14753923e-01, -3.84858156e-01],
[ 7.59640477e-01, -1.42714404e-02],
[ 8.93224828e-01, 3.01021854e-01],
[ 1.29386934e+00, 2.19579198e-03],
[ 1.00452703e+00, 3.56315275e-01],
[ 3.14431668e-01, -2.35445125e-01],
[ 2.47585221e-01, -1.01544475e+00],
[ 2.03020923e-01, -1.53544449e+00],
[ 2.03020923e-01, -1.53544449e+00],
[ 2.69867370e-01, -7.55444872e-01],
[ 4.92471774e-01, -6.44858030e-01],
[-4.42401600e-01, -3.62499200e-01],
[-1.30668599e-01, 7.88087136e-01],
[ 9.15506977e-01, 5.61021728e-01],
[ 1.18245859e+00, -1.29780358e+00],
[-2.19797195e-01, -2.51912357e-01],
[ 1.14000870e-01, -1.33073804e+00],
[ 2.49808162e-02, -1.12603159e+00],
[ 3.36713817e-01, 2.45547484e-02],
[ 3.58887423e-01, -9.60151324e-01],
[-2.64470035e-01, -2.01661805e+00],
[ 4.72629652e-02, -8.66031714e-01],
[-1.08494993e-01, -1.96618936e-01],
[-1.94749393e-02, -4.01325389e-01],
[ 5.37036072e-01, -1.24858283e-01],
[-3.31207940e-01, -1.55191172e+00],
[ 6.95451141e-02, -6.06031841e-01],
[ 2.92258061e-01, 7.49260948e-01],
[ 2.69867370e-01, -7.55444872e-01],
[ 1.44973584e+00, 5.77488960e-01],
[ 6.48338275e-01, -6.95648616e-02],
[ 7.81922626e-01, 2.45728433e-01],
[ 2.00624685e+00, 8.53956066e-01],
[-5.53812344e-01, -1.66249857e+00],
[ 1.76136030e+00, 4.83369350e-01],
[ 1.44962730e+00, -6.67216986e-01],
[ 1.02691772e+00, 1.86102109e+00],
[ 6.03882519e-01, 6.55141338e-01],
[ 9.37680584e-01, -4.23684345e-01],
[ 1.11582923e+00, 4.11608697e-01],
[ 3.36605274e-01, -1.22015120e+00],
[ 1.80847316e-01, -5.50738420e-01],
[ 4.92580317e-01, 5.99847916e-01],
[ 7.81922626e-01, 2.45728433e-01],
[ 1.40538863e+00, 2.54690111e+00],
[ 2.47362927e+00, 9.04236779e-02],
[ 9.37572041e-01, -1.66839029e+00],
[ 1.04909133e+00, 8.76315022e-01],
[-4.17570882e-02, -6.61325262e-01],
[ 2.29558916e+00, 4.99836583e-01],
[ 8.26378381e-01, -4.78977766e-01],
[ 7.37466870e-01, 9.70434632e-01],
[ 1.38299794e+00, 1.04219529e+00],
[ 6.26056126e-01, -3.29564735e-01],
[ 3.36713817e-01, 2.45547484e-02],
[ 8.48660530e-01, -2.18977893e-01],
[ 1.56103804e+00, 6.32782381e-01],
[ 1.96168255e+00, 3.33956319e-01],
[ 1.62799303e+00, 2.65748795e+00],
[ 8.48660530e-01, -2.18977893e-01],
[ 7.37358328e-01, -2.74271314e-01],
[ 6.92794030e-01, -7.94271061e-01],
[ 2.11754905e+00, 9.09249487e-01],
[ 2.03238008e-01, 9.53967400e-01],
[ 5.81600370e-01, 3.95141464e-01],
[ 2.25411614e-01, -3.07386728e-02],
[ 1.13811138e+00, 6.71608570e-01],
[ 9.15506977e-01, 5.61021728e-01],
[ 1.13811138e+00, 6.71608570e-01],
[ 2.69867370e-01, -7.55444872e-01],
[ 9.37789126e-01, 8.21021601e-01],
[ 7.37466870e-01, 9.70434632e-01],
[ 1.00452703e+00, 3.56315275e-01],
[ 1.00441849e+00, -8.88390671e-01],
[ 7.81922626e-01, 2.45728433e-01],
[ 9.19358053e-02, 8.98673979e-01],
[ 1.14109412e-01, -8.60320940e-02]]),
array([[-0.82345966, -0.12102114],
[-0.82345966, -0.12102114],
[-0.87688013, -0.03868193],
[-0.7700392 , -0.20336036],
[-0.82345966, -0.12102114],
[-0.75049545, 0.04475237],
[-0.86710826, 0.08537443],
[-0.7700392 , -0.20336036],
[-0.82345966, -0.12102114],
[-0.7263906 , -0.40975593],
[-0.7700392 , -0.20336036],
[-0.71661873, -0.28569957],
[-0.77981107, -0.32741672],
[-0.94007247, -0.08039908],
[-0.9303006 , 0.04365728],
[-0.85733639, 0.20943079],
[-0.96417733, 0.37410921],
[-0.86710826, 0.08537443],
[-0.70684686, -0.16164321],
[-0.81368779, 0.00303522],
[-0.66319826, -0.36803878],
[-0.85733639, 0.20943079],
[-1.03714154, 0.2083357 ],
[-0.79414405, 0.25114794],
[-0.55635732, -0.5327172 ],
[-0.71661873, -0.28569957],
[-0.80391592, 0.12709158],
[-0.7700392 , -0.20336036],
[-0.82345966, -0.12102114],
[-0.71661873, -0.28569957],
[-0.71661873, -0.28569957],
[-0.85733639, 0.20943079],
[-0.7263906 , -0.40975593],
[-0.82345966, -0.12102114],
[-0.7700392 , -0.20336036],
[-0.9303006 , 0.04365728],
[-0.87688013, -0.03868193],
[-0.77981107, -0.32741672],
[-0.87688013, -0.03868193],
[-0.7700392 , -0.20336036],
[-0.92052873, 0.16771364],
[-0.92052873, 0.16771364],
[-0.87688013, -0.03868193],
[-0.89121312, 0.53988272],
[-0.64365452, -0.11992606],
[-0.86710826, 0.08537443],
[-0.71661873, -0.28569957],
[-0.82345966, -0.12102114],
[-0.7700392 , -0.20336036],
[-0.82345966, -0.12102114],
[ 0.41563263, -0.36146826],
[ 0.26514309, 0.00960574],
[ 0.47882497, -0.31975111],
[ 0.08533795, 0.00851065],
[ 0.31856356, -0.07273347],
[ 0.35244029, -0.40318541],
[ 0.32833543, 0.05132289],
[-0.15765954, -0.03430159],
[ 0.40586076, -0.48552462],
[-0.01173112, 0.29724544],
[-0.0508186 , -0.19898001],
[ 0.10488169, 0.25662337],
[ 0.21628374, -0.61067607],
[ 0.41563263, -0.36146826],
[-0.12834393, 0.3378675 ],
[ 0.25537122, -0.11445062],
[ 0.26514309, 0.00960574],
[ 0.26970421, -0.69301528],
[ 0.26514309, 0.00960574],
[ 0.11921467, -0.32194128],
[ 0.29445871, 0.38177482],
[ 0.08533795, 0.00851065],
[ 0.47882497, -0.31975111],
[ 0.50292982, -0.7742594 ],
[ 0.24559935, -0.23850698],
[ 0.25537122, -0.11445062],
[ 0.4690531 , -0.44380747],
[ 0.44494824, 0.01070082],
[ 0.26514309, 0.00960574],
[-0.0508186 , -0.19898001],
[ 0.0657942 , -0.23960207],
[ 0.05602233, -0.36365843],
[ 0.07556607, -0.11554571],
[ 0.54201731, -0.27803396],
[ 0.26514309, 0.00960574],
[ 0.2214945 , 0.21600131],
[ 0.37198403, -0.15507269],
[ 0.29901982, -0.3208462 ],
[ 0.13875841, -0.07382856],
[ 0.08533795, 0.00851065],
[ 0.34266842, -0.52724177],
[ 0.36221216, -0.27912905],
[ 0.12898654, -0.19788492],
[-0.15765954, -0.03430159],
[ 0.19217888, -0.15616777],
[ 0.23582748, -0.36256335],
[ 0.19217888, -0.15616777],
[ 0.24559935, -0.23850698],
[-0.36156954, 0.41911162],
[ 0.13875841, -0.07382856],
[ 0.62996415, 0.83847329],
[ 0.41107152, 0.34115276],
[ 0.75113807, 0.09523021],
[ 0.72182245, -0.27693887],
[ 0.654069 , 0.38396499],
[ 1.12508135, -0.48114427],
[ 0.1778459 , 0.42239688],
[ 1.09576573, -0.85331336],
[ 0.82866339, -0.4416173 ],
[ 0.68338462, 0.75613408],
[ 0.36742292, 0.54754833],
[ 0.51791245, 0.17647433],
[ 0.53745619, 0.42458706],
[ 0.31400245, 0.62988754],
[ 0.19282853, 1.37313062],
[ 0.34331806, 1.00205663],
[ 0.66840199, -0.19459966],
[ 1.13485322, -0.35708791],
[ 1.19804556, -0.31537076],
[ 0.53224544, -0.40209032],
[ 0.55699994, 0.67269978],
[ 0.26058198, 0.71222675],
[ 1.22215041, -0.76987906],
[ 0.34787918, 0.29943561],
[ 0.64429713, 0.25990863],
[ 0.93550433, -0.60629572],
[ 0.29445871, 0.38177482],
[ 0.34787918, 0.29943561],
[ 0.59087666, 0.34224784],
[ 0.91596059, -0.85440844],
[ 0.9452762 , -0.48223936],
[ 1.06188901, -0.52286142],
[ 0.54722807, 0.54864342],
[ 0.5856659 , -0.48442953],
[ 0.89641684, -1.10252117],
[ 0.77068181, 0.34334293],
[ 0.45993087, 0.96143456],
[ 0.66840199, -0.19459966],
[ 0.29445871, 0.38177482],
[ 0.48403573, 0.50692627],
[ 0.45993087, 0.96143456],
[ 0.23647713, 1.16673505],
[ 0.41107152, 0.34115276],
[ 0.66384087, 0.50802135],
[ 0.46970274, 1.08549092],
[ 0.28989759, 1.08439584],
[ 0.35765105, 0.42349197],
[ 0.42084339, 0.46520912],
[ 0.39673853, 0.91971741],
[ 0.45472011, 0.13475719]]))