Data Analysis and Forecasting (4): Correlation Analysis

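This article outlines several ways to analyze correlation between variables, including autocorrelation, partial autocorrelation and simple correlation analysis, and visualizes correlations with a correlation matrix plot and a correlation dendrogram. It then computes a cross-correlation between the cost column of the Airline dataset and the LakeHuron water-level time series. Finally, it applies canonical correlation analysis to the iris dataset to extract highly correlated combinations of features, which can make further analysis more efficient.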

0 Introduction

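Correlation analysis between variables mainly covers:

  1. Patterns within a single variable
    • Autocorrelation analysis
    • Partial autocorrelation analysis
  2. Correlation between any two sequences of equal length
    • Simple correlation analysis
  3. Simple correlation computed at a given offset (lag) between two series
    • Cross-correlation analysis
  4. Correlation between two groups of variables
    • Canonical correlation analysis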
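Items 1 and 2 rest on the same formula. The simple (Pearson) correlation of two equal-length sequences is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), and the lag-k autocorrelation applies the same idea to a series and a copy of itself shifted by k steps. A minimal NumPy sketch; the helper names `pearson` and `acf` are illustrative and not taken from any library, and `acf` assumes the common estimator that divides by the lag-0 sum of squares:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def acf(x, k):
    """Sample autocorrelation of x at lag k >= 0.

    Pairs x[t] with x[t + k] and divides by the lag-0 sum of squares,
    so acf(x, 0) is always 1.
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d[:len(x) - k] * d[k:]).sum() / (d ** 2).sum())

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = 2 * a + rng.normal(size=100)
print(pearson(a, b), np.corrcoef(a, b)[0, 1])  # the two values agree
print(acf(a, 0), acf(a, 1))
```

Checking `pearson` against `np.corrcoef` is a quick way to confirm the formula; in practice `DataFrame.corr()`, as used below, computes the same coefficient for every pair of columns.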

1 Plotting Correlation Charts

1.1 Correlation Matrix Plot

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from sklearn import datasets

    # corrplot() is defined in the next snippet; run that definition first
    iris = datasets.load_iris()
    iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)
    iris_data['species'] = iris.target_names[iris.target]

    # keep only the four numeric features
    df = iris_data.drop(columns='species')

    # Pearson correlation matrix
    corr = df.corr()

    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_rows', None)
    print('corr: \n', corr)

    # circle plot of the matrix: colour and circle size encode r
    corrplot(corr, cmap='Spectral', s=2000)

The corrplot function:

    import matplotlib.pyplot as plt
    import numpy as np

    def corrplot(corr, cmap, s):
        """Draw a correlation matrix as a grid of circles.

        Colour encodes r in [-1, 1]; circle area is proportional to |r|.
        """
        x, y, z = [], [], []
        N = corr.shape[0]
        for row in range(N):
            for column in range(N):
                x.append(row)
                y.append(N - 1 - column)  # flip so the first variable is at the top
                z.append(round(corr.iloc[row, column], 2))
        sc = plt.scatter(x, y, c=z, vmin=-1, vmax=1, s=s * np.absolute(z), cmap=cmap)
        plt.colorbar(sc)
        plt.xlim((-0.5, N - 0.5))
        plt.ylim((-0.5, N - 0.5))
        plt.xticks(range(N), corr.columns, rotation=90)
        plt.yticks(range(N)[::-1], corr.columns)
        plt.grid(False)

        # show the x tick labels above the plot
        ax = plt.gca()
        ax.xaxis.set_ticks_position('top')

        # light grey separators between the cells
        for m in [0.5 + k for k in range(N - 1)]:
            plt.plot([m, m], [-0.5, N - 0.5], c='lightgray')
            plt.plot([-0.5, N - 0.5], [m, m], c='lightgray')
        plt.show()

The iris dataset:

     sepal length (cm)  sepal width (cm)  ...  petal width (cm)    species
0                  5.1               3.5  ...               0.2     setosa
1                  4.9               3.0  ...               0.2     setosa
2                  4.7               3.2  ...               0.2     setosa
3                  4.6               3.1  ...               0.2     setosa
4                  5.0               3.6  ...               0.2     setosa
..                 ...               ...  ...               ...        ...
145                6.7               3.0  ...               2.3  virginica
146                6.3               2.5  ...               1.9  virginica
147                6.5               3.0  ...               2.0  virginica
148                6.2               3.4  ...               2.3  virginica
149                5.9               3.0  ...               1.8  virginica

The computed correlation matrix:

                    sepal length (cm)  sepal width (cm)  petal length (cm)   petal width (cm)  
sepal length (cm)           1.000000         -0.117570           0.871754           0.817941     
sepal width (cm)           -0.117570          1.000000          -0.428440           -0.366126   
petal length (cm)           0.871754         -0.428440           1.000000           0.962865   
petal width (cm)            0.817941         -0.366126           0.962865           1.000000  
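To read the strongest relationships off a matrix like this one programmatically, one option is to keep the strict upper triangle (each pair once, no diagonal) and sort by |r|. A minimal pandas sketch; `top_pairs` is a hypothetical helper, and the matrix is re-typed from the values printed above:

```python
import numpy as np
import pandas as pd

def top_pairs(corr, n=3):
    """Return the n variable pairs with the largest |r|, each pair listed once."""
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # strict upper triangle
    pairs = corr.where(mask).stack().dropna()              # drop the masked cells
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

cols = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
corr = pd.DataFrame([[1.000000, -0.117570, 0.871754, 0.817941],
                     [-0.117570, 1.000000, -0.428440, -0.366126],
                     [0.871754, -0.428440, 1.000000, 0.962865],
                     [0.817941, -0.366126, 0.962865, 1.000000]],
                    index=cols, columns=cols)
print(top_pairs(corr))
```

Sorting by absolute value keeps strong negative correlations in view; for iris the top three pairs are all positive, led by petal length and petal width.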

[Figure: correlation matrix plot of the four iris features]

1.2 Correlation Dendrogram

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage, dendrogram

    mtcars = pd.read_csv('data/mtcars.csv', index_col=0)
    print(mtcars)

    # distance between variables: d = sqrt(1 - r^2), 0 when |r| = 1, 1 when r = 0
    r = mtcars.corr()
    d = np.sqrt(1 - r * r)
    # rounding can make 1 - r^2 slightly negative on the diagonal, which gives NaN
    d.fillna(0, inplace=True)
    print(d)

    # Ward linkage on Euclidean distances between the rows of d
    row_cluster = linkage(pdist(d, metric='euclidean'), method='ward')
    row_dendr = dendrogram(row_cluster, labels=d.index)
    plt.ylabel('Euclidean distance')
    # dashed line marking a cut height of 1.5
    plt.axhline(1.5, c='gray', linestyle='--')
    plt.tight_layout()
    plt.show()

mtcars.csv

"","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
"Valiant",18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
"Duster 360",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
"Merc 240D",24.4,4,146.7,62,3.69,3.19,20,1,0,4,2
"Merc 230",22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
"Merc 280",19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
"Merc 280C",17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
"Merc 450SE",16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
"Merc 450SL",17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3
"Merc 450SLC",15.2,8,275.8,180,3.07,3.78,18,0,0,3,3
"Cadillac Fleetwood",10.4,8,472,205,2.93,5.25,17.98,0,0,3,4
"Lincoln Continental",10.4,8,460,215,3,5.424,17.82,0,0,3,4
"Chrysler Imperial",14.7,8,440,230,3.23,5.345,17.42,0,0,3,4
"Fiat 128",32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
"Honda Civic",30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
"Toyota Corolla",33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
"Toyota Corona",21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
"Dodge Challenger",15.5,8,318,150,2.76,3.52,16.87,0,0,3,2
"AMC Javelin",15.2,8,304,150,3.15,3.435,17.3,0,0,3,2
"Camaro Z28",13.3,8,350,245,3.73,3.84,15.41,0,0,3,4
"Pontiac Firebird",19.2,8,400,175,3.08,3.845,17.05,0,0,3,2
"Fiat X1-9",27.3,4,79,66,4.08,1.935,18.9,1,1,4,1
"Porsche 914-2",26,4,120.3,91,4.43,2.14,16.7,0,1,5,2
"Lotus Europa",30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2
"Ford Pantera L",15.8,8,351,264,4.22,3.17,14.5,0,1,5,4
"Ferrari Dino",19.7,6,145,175,3.62,2.77,15.5,0,1,5,6
"Maserati Bora",15,8,301,335,3.54,3.57,14.6,0,1,5,8
"Volvo 142E",21.4,4,121,109,4.11,2.78,18.6,1,1,4,2

Result of reading the mtcars dataset:

                    mpg  cyl   disp   hp  drat  ...   qsec  vs  am  gear  carb
Mazda RX4            21.0    6  160.0  110  3.90  ...  16.46   0   1     4     4
Mazda RX4 Wag        21.0    6  160.0  110  3.90  ...  17.02   0   1     4     4
Datsun 710           22.8    4  108.0   93  3.85  ...  18.61   1   1     4     1
Hornet 4 Drive       21.4    6  258.0  110  3.08  ...  19.44   1   0     3     1
Hornet Sportabout    18.7    8  360.0  175  3.15  ...  17.02   0   0     3     2
Valiant              18.1    6  225.0  105  2.76  ...  20.22   1   0     3     1
Duster 360           14.3    8  360.0  245  3.21  ...  15.84   0   0     3     4
Merc 240D            24.4    4  146.7   62  3.69  ...  20.00   1   0     4     2
Merc 230             22.8    4  140.8   95  3.92  ...  22.90   1   0     4     2
Merc 280             19.2    6  167.6  123  3.92  ...  18.30   1   0     4     4
Merc 280C            17.8    6  167.6  123  3.92  ...  18.90   1   0     4     4
Merc 450SE           16.4    8  275.8  180  3.07  ...  17.40   0   0     3     3
Merc 450SL           17.3    8  275.8  180  3.07  ...  17.60   0   0     3     3
Merc 450SLC          15.2    8  275.8  180  3.07  ...  18.00   0   0     3     3
Cadillac Fleetwood   10.4    8  472.0  205  2.93  ...  17.98   0   0     3     4
Lincoln Continental  10.4    8  460.0  215  3.00  ...  17.82   0   0     3     4
Chrysler Imperial    14.7    8  440.0  230  3.23  ...  17.42   0   0     3     4
Fiat 128             32.4    4   78.7   66  4.08  ...  19.47   1   1     4     1
Honda Civic          30.4    4   75.7   52  4.93  ...  18.52   1   1     4     2
Toyota Corolla       33.9    4   71.1   65  4.22  ...  19.90   1   1     4     1
Toyota Corona        21.5    4  120.1   97  3.70  ...  20.01   1   0     3     1
Dodge Challenger     15.5    8  318.0  150  2.76  ...  16.87   0   0     3     2
AMC Javelin          15.2    8  304.0  150  3.15  ...  17.30   0   0     3     2
Camaro Z28           13.3    8  350.0  245  3.73  ...  15.41   0   0     3     4
Pontiac Firebird     19.2    8  400.0  175  3.08  ...  17.05   0   0     3     2
Fiat X1-9            27.3    4   79.0   66  4.08  ...  18.90   1   1     4     1
Porsche 914-2        26.0    4  120.3   91  4.43  ...  16.70   0   1     5     2
Lotus Europa         30.4    4   95.1  113  3.77  ...  16.90   1   1     5     2
Ford Pantera L       15.8    8  351.0  264  4.22  ...  14.50   0   1     5     4
Ferrari Dino         19.7    6  145.0  175  3.62  ...  15.50   0   1     5     6
Maserati Bora        15.0    8  301.0  335  3.54  ...  14.60   0   1     5     8
Volvo 142E           21.4    4  121.0  109  4.11  ...  18.60   1   1     4     2

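The resulting distance matrix d. Note that this is not the correlation matrix itself but sqrt(1 - r^2) computed from it: 0 means perfectly correlated or anti-correlated, 1 means uncorrelated.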

           mpg       cyl          disp  ...        am      gear      carb
mpg   0.000000  0.523278  5.307133e-01  ...  0.800126  0.877113  0.834555
cyl   0.523278  0.000000  4.316673e-01  ...  0.852574  0.870207  0.849873
disp  0.530713  0.431667  2.107342e-08  ...  0.806505  0.831470  0.918691
hp    0.630526  0.554104  6.118826e-01  ...  0.969975  0.992068  0.661650
drat  0.732124  0.714203  7.039859e-01  ...  0.701458  0.714525  0.995870
wt    0.497159  0.622656  4.598822e-01  ...  0.721422  0.812266  0.903965
qsec  0.908132  0.806494  9.010583e-01  ...  0.973224  0.977121  0.754544
vs    0.747698  0.585307  7.037821e-01  ...  0.985728  0.978547  0.821917
am    0.800126  0.852574  8.065052e-01  ...  0.000000  0.607841  0.998344
gear  0.877113  0.870207  8.314703e-01  ...  0.607841  0.000000  0.961709
carb  0.834555  0.849873  9.186911e-01  ...  0.998344  0.961709  0.000000

[Figure: correlation dendrogram of the mtcars variables]
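The code above clusters the rows of d with Euclidean distance and Ward linkage, that is, it groups variables whose distance profiles look alike. Another common option is to treat d itself as the distance matrix: convert it to condensed form with `scipy.spatial.distance.squareform` and use a linkage such as `'average'`, since SciPy documents Ward linkage as valid only for Euclidean distances. A sketch under that assumption, on synthetic data (`corr_distance` is a hypothetical helper); it also shows that 1 - r² treats strong negative correlation as closeness:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def corr_distance(df):
    """Distance sqrt(1 - r^2) between the columns of df."""
    r = df.corr().to_numpy()
    d = np.sqrt(np.clip(1.0 - r ** 2, 0.0, None))  # clip guards against tiny negative values
    np.fill_diagonal(d, 0.0)
    return d

rng = np.random.default_rng(1)
base1, base2 = rng.normal(size=200), rng.normal(size=200)
df = pd.DataFrame({
    'a1': base1,
    'a2': base1 + 0.1 * rng.normal(size=200),
    'b1': base2,
    'b2': -base2 + 0.1 * rng.normal(size=200),  # strongly negatively correlated with b1
})
d = corr_distance(df)

# use d directly as a precomputed distance matrix
Z = linkage(squareform(d, checks=False), method='average')
labels = fcluster(Z, t=2, criterion='maxclust')
print(dict(zip(df.columns, labels)))
```

With the mtcars data, `corr_distance(mtcars)` would take the place of the synthetic frame; the resulting tree can differ from the Ward version because the two approaches measure distance differently.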

2 Cross-Correlation Analysis

The Airline.csv dataset:

"","airline","year","cost","output","pf","lf"
"1",1,1,1140640,0.952757,106650,0.534487
"2",1,2,1215690,0.986757,110307,0.532328
"3",1,3,1309570,1.09198,110574,0.547736
"4",1,4,1511530,1.17578,121974,0.540846
"5",1,5,1676730,1.16017,196606,0.591167
"6",1,6,1823740,1.17376,265609,0.575417
"7",1,7,2022890,1.29051,263451,0.594495
"8",1,8,2314760,1.39067,316411,0.597409
"9",1,9,2639160,1.61273,384110,0.638522
"10",1,10,3247620,1.82544,569251,0.676287
"11",1,11,3787750,1.54604,871636,0.605735
"12",1,12,3867750,1.5279,997239,0.61436
"13",1,13,3996020,1.6602,938002,0.633366
"14",1,14,4282880,1.82231,859572,0.650117
"15",1,15,4748320,1.93646,823411,0.625603
"16",2,1,569292,0.520635,103795,0.490851
"17",2,2,640614,0.534627,111477,0.473449
"18",2,3,777655,0.655192,118664,0.503013
"19",2,4,999294,0.791575,114797,0.512501
"20",2,5,1203970,0.842945,215322,0.566782
"21",2,6,1358100,0.852892,281704,0.558133
"22",2,7,1501350,0.922843,304818,0.558799
"23",2,8,1709270,1,348609,0.57207
"24",2,9,2025400,1.19845,374579,0.624763
"25",2,10,2548370,1.34067,544109,0.628706
"26",2,11,3137740,1.32624,853356,0.58915
"27",2,12,3557700,1.24852,1003200,0.532612
"28",2,13,3717740,1.25432,941977,0.526652
"29",2,14,3962370,1.37177,856533,0.540163
"30",2,15,4209390,1.38974,821361,0.528775
"31",3,1,286298,0.262424,118788,0.524334
"32",3,2,309290,0.266433,123798,0.537185
"33",3,3,342056,0.306043,122882,0.582119
"34",3,4,374595,0.325586,131274,0.579489
"35",3,5,450037,0.345706,222037,0.606592
"36",3,6,510412,0.367517,278721,0.60727
"37",3,7,575347,0.409937,306564,0.582425
"38",3,8,669331,0.448023,356073,0.573972
"39",3,9,783799,0.539595,378311,0.654256
"40",3,10,913883,0.539382,555267,0.631055
"41",3,11,1041520,0.467967,850322,0.56924
"42",3,12,1125800,0.450544,1015610,0.589682
"43",3,13,1096070,0.468793,954508,0.587953
"44",3,14,1198930,0.494397,886999,0.565388
"45",3,15,1170470,0.493317,844079,0.577078
"46",4,1,145167,0.086393,114987,0.432066
"47",4,2,170192,0.09674,120501,0.439669
"48",4,3,247506,0.1415,121908,0.488932
"49",4,4,309391,0.169715,127220,0.484181
"50",4,5,354338,0.173805,209405,0.529925
"51",4,6,373941,0.164272,263148,0.532723
"52",4,7,420915,0.170906,316724,0.549067
"53",4,8,474017,0.17784,363598,0.55714
"54",4,9,532590,0.192248,389436,0.611377
"55",4,10,676771,0.242469,547376,0.645319
"56",4,11,880438,0.256505,850418,0.611734
"57",4,12,1052020,0.249657,1011170,0.580884
"58",4,13,1193680,0.273923,951934,0.572047
"59",4,14,1303390,0.371131,881323,0.59457
"60",4,15,1436970,0.421411,831374,0.585525
"61",5,1,91361,0.051028,118222,0.442875
"62",5,2,95428,0.052646,116223,0.462473
"63",5,3,98187,0.056348,115853,0.519118
"64",5,4,115967,0.066953,129372,0.529331
"65",5,5,138382,0.070308,243266,0.557797
"66",5,6,156228,0.073961,277930,0.556181
"67",5,7,183169,0.084946,317273,0.569327
"68",5,8,210212,0.095474,358794,0.583465
"69",5,9,274024,0.119814,397667,0.631818
"70",5,10,356915,0.150046,566672,0.604723
"71",5,11,432344,0.144014,848393,0.587921
"72",5,12,524294,0.1693,1005740,0.616159
"73",5,13,530924,0.172761,958231,0.605868
"74",5,14,581447,0.18667,872924,0.594688
"75",5,15,610257,0.213279,844622,0.635545
"76",6,1,68978,0.037682,117112,0.448539
"77",6,2,74904,0.039784,119420,0.475889
"78",6,3,83829,0.044331,116087,0.500562
"79",6,4,98148,0.050245,122997,0.500344
"80",6,5,118449,0.055046,194309,0.528897
"81",6,6,133161,0.052462,307923,0.495361
"82",6,7,145062,0.056977,323595,0.510342
"83",6,8,170711,0.06149,363081,0.518296
"84",6,9,199775,0.069027,386422,0.546723
"85",6,10,276797,0.092749,564867,0.554276
"86",6,11,381478,0.11264,874818,0.517766
"87",6,12,506969,0.154154,1013170,0.580049
"88",6,13,633388,0.186461,930477,0.556024
"89",6,14,804388,0.246847,851676,0.537791
"90",6,15,1009500,0.304013,819476,0.525775

The LakeHuron.csv dataset:

"","time","value"
"1",1875,580.38
"2",1876,581.86
"3",1877,580.97
"4",1878,580.8
"5",1879,579.79
"6",1880,580.39
"7",1881,580.42
"8",1882,580.82
"9",1883,581.4
"10",1884,581.32
"11",1885,581.44
"12",1886,581.68
"13",1887,581.17
"14",1888,580.53
"15",1889,580.01
"16",1890,579.91
"17",1891,579.14
"18",1892,579.16
"19",1893,579.55
"20",1894,579.67
"21",1895,578.44
"22",1896,578.24
"23",1897,579.1
"24",1898,579.09
"25",1899,579.35
"26",1900,578.82
"27",1901,579.32
"28",1902,579.01
"29",1903,579
"30",1904,579.8
"31",1905,579.83
"32",1906,579.72
"33",1907,579.89
"34",1908,580.01
"35",1909,579.37
"36",1910,578.69
"37",1911,578.19
"38",1912,578.67
"39",1913,579.55
"40",1914,578.92
"41",1915,578.09
"42",1916,579.37
"43",1917,580.13
"44",1918,580.14
"45",1919,579.51
"46",1920,579.24
"47",1921,578.66
"48",1922,578.86
"49",1923,578.05
"50",1924,577.79
"51",1925,576.75
"52",1926,576.75
"53",1927,577.82
"54",1928,578.64
"55",1929,580.58
"56",1930,579.48
"57",1931,577.38
"58",1932,576.9
"59",1933,576.94
"60",1934,576.24
"61",1935,576.84
"62",1936,576.85
"63",1937,576.9
"64",1938,577.79
"65",1939,578.18
"66",1940,577.51
"67",1941,577.23
"68",1942,578.42
"69",1943,579.61
"70",1944,579.05
"71",1945,579.26
"72",1946,579.22
"73",1947,579.38
"74",1948,579.1
"75",1949,577.95
"76",1950,578.12
"77",1951,579.75
"78",1952,580.85
"79",1953,580.41
"80",1954,579.96
"81",1955,579.61
"82",1956,578.76
"83",1957,578.18
"84",1958,577.21
"85",1959,577.13
"86",1960,579.1
"87",1961,578.25
"88",1962,577.91
"89",1963,576.89
"90",1964,575.96
"91",1965,576.8
"92",1966,577.68
"93",1967,578.38
"94",1968,578.52
"95",1969,579.74
"96",1970,579.31
"97",1971,579.89
"98",1972,579.96

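Defining the ccf function

    import numpy as np
    import scipy.signal as sg

    def ccf(x, y, lag_max=100):
        """Cross-correlation for lags -lag_max..lag_max; lag h pairs y[t] with x[t - h]."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        result = sg.correlate(y - y.mean(), x - x.mean(), method='direct') / (y.std() * x.std() * len(x))
        print(result)
        zero = len(x) - 1  # index of lag 0 in 'full' mode
        low, high = max(zero - lag_max, 0), min(zero + lag_max + 1, len(result))
        # return the lags together with the values; lags outside the data are dropped
        return np.arange(low - zero, high - zero), result[low:high]

Main program. The two series differ in length (x has 90 rows, six airlines pooled over 15 years; y has 24 annual lake levels for 1937-1960) and share no common time index, so this example only demonstrates the mechanics of ccf. A meaningful cross-correlation needs two equal-length series aligned in time.

    import matplotlib.pyplot as plt
    import pandas as pd

    airmiles = pd.read_csv('data/Airline.csv', index_col=0)
    lakehuron = pd.read_csv('data/LakeHuron.csv', index_col=0)
    print(airmiles, lakehuron)
    lhdata = lakehuron.query("1937 <= time <= 1960")
    print('lhdata: \n', lhdata)

    x, y = airmiles['cost'], lhdata['value']

    # stem plot of the cross-correlation for lags -20..20
    lags, out = ccf(x, y, lag_max=20)
    plt.vlines(lags, 0, out, colors='k')
    plt.plot(lags, out, 'ko')
    plt.xlabel('lag', fontsize=14)
    plt.ylabel('ccf', fontsize=14)
    plt.show()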

Values printed inside ccf (the full cross-correlation, 90 + 24 - 1 = 113 lags):

[ 1.78514145e-03  5.92478798e-03  1.07711431e-02  1.62375213e-02
  2.35968067e-02  3.15228859e-02  3.72504754e-02  4.01446767e-02
  4.15659730e-02  4.14522639e-02  3.95607208e-02  3.65886233e-02
  3.43924642e-02  3.40786661e-02  3.34771847e-02  2.13545194e-02
  1.08781073e-02  1.37924353e-03 -1.23951224e-02 -2.60706526e-02
 -3.15678025e-02 -2.76053541e-02 -2.22309106e-02 -1.42409766e-02
 -6.88516151e-03 -8.77237939e-05  4.41299368e-03  1.61837117e-03
 -1.44623257e-03  2.93227138e-03 -9.09069824e-03 -1.07912110e-02
 -9.17867659e-03 -1.65786592e-02 -3.06394910e-02 -3.21292275e-02
 -2.46986825e-02 -2.58258455e-02 -1.39540595e-02 -5.12455438e-03
  3.61885614e-03  8.18961198e-03 -1.36093966e-03 -7.34720817e-03
  5.40984674e-03  1.37862342e-02  2.36918259e-02  3.23408573e-02
  2.67739290e-02  5.80319092e-03 -1.04151900e-02 -2.46682806e-02
 -4.15288769e-02 -2.92077906e-02 -1.88467386e-02 -5.74407471e-03
  4.22121761e-03  8.05875056e-05 -6.07171743e-03  2.21379437e-03
 -4.15210624e-02 -5.52956772e-02 -5.70528738e-02 -8.40669073e-02
 -1.24176191e-01 -1.23609410e-01 -9.15715344e-02 -8.36522415e-02
 -4.85569423e-02 -1.80509939e-02  1.16717406e-02  2.84501489e-02
  2.75126738e-03 -1.69804285e-02  1.95958583e-02  2.46379553e-02
  5.13412159e-02  7.74494465e-02  6.13194003e-02 -1.95774331e-03
 -3.82584311e-02 -6.36872665e-02 -1.15718388e-01 -8.11512076e-02
 -5.29467432e-02 -1.55029292e-02  8.13337443e-03 -1.47351742e-02
 -3.35766058e-02  4.80996133e-03  7.72490808e-02  1.26265288e-01
  1.62023469e-01  1.90127394e-01  1.82039714e-01  1.43368414e-01
  6.60608938e-02 -9.82960999e-03 -7.53935963e-03 -2.02840960e-02
 -2.83214760e-02 -2.89375092e-02 -2.24663341e-02 -1.90200474e-02
 -1.86750669e-02 -1.67850052e-02 -1.43932769e-02 -1.08643424e-02
 -6.23774854e-03 -2.48892458e-03 -8.53106509e-04  8.25399081e-05
  6.45593989e-05]

[Figure: cross-correlation plot]
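For a cross-correlation that is meaningful in time, both series should cover the same periods. As a sketch, the block below takes airline 1's `cost` and `output` for years 1-15 (copied from Airline.csv above) and computes their cross-correlation with a hypothetical helper `ccf_equal`, which uses the same scaling as `ccf` and the same lag convention (lag h pairs y[t] with x[t - h]). Differencing both series with `np.diff` first is not something the original does; it is a common choice for trending data, because otherwise the shared upward trend tends to dominate every lag:

```python
import numpy as np
import scipy.signal as sg

def ccf_equal(x, y, lag_max):
    """Cross-correlation of two equal-length series for lags -lag_max..lag_max.

    Lag h pairs y[t] with x[t - h]. Scaling is n * std(x) * std(y), so the
    lag-0 value equals the Pearson correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    lag_max = min(lag_max, n - 1)
    full = sg.correlate(y - y.mean(), x - x.mean(), method='direct') / (n * x.std() * y.std())
    zero = n - 1  # index of lag 0 in 'full' mode
    lags = np.arange(-lag_max, lag_max + 1)
    return lags, full[zero + lags]

# airline 1, years 1-15, from Airline.csv
cost = [1140640, 1215690, 1309570, 1511530, 1676730, 1823740, 2022890, 2314760,
        2639160, 3247620, 3787750, 3867750, 3996020, 4282880, 4748320]
output = [0.952757, 0.986757, 1.09198, 1.17578, 1.16017, 1.17376, 1.29051, 1.39067,
          1.61273, 1.82544, 1.54604, 1.5279, 1.6602, 1.82231, 1.93646]

# year-over-year changes remove the shared trend before correlating
lags, values = ccf_equal(np.diff(cost), np.diff(output), lag_max=5)
for h, v in zip(lags, values):
    print(f"lag {h:+d}: {v:.3f}")
```

With only 14 differenced values, individual lags are noisy, so this is best read as a demonstration of aligned inputs rather than a finding about the airline.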

3 Canonical Correlation Analysis

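    import numpy as np
    import pandas as pd
    from sklearn import datasets
    from sklearn.cross_decomposition import CCA

    iris = datasets.load_iris()
    iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)
    print(iris_data)

    # correlation matrix R of the four features
    iris_corr = iris_data.corr()
    print(iris_corr)

    # partition R into blocks: group 1 = sepal features (columns 0-1), group 2 = petal features (columns 2-3)
    iris_corr_11 = iris_corr.iloc[0:2, 0:2]
    iris_corr_12 = iris_corr.iloc[0:2, 2:4]
    iris_corr_21 = iris_corr.iloc[2:4, 0:2]
    iris_corr_22 = iris_corr.iloc[2:4, 2:4]

    # A = R11^-1 R12 R22^-1 R21 and B = R22^-1 R21 R11^-1 R12;
    # their eigenvalues are the squared canonical correlations
    A = np.matmul(np.matmul(np.matmul(np.linalg.inv(iris_corr_11), iris_corr_12), np.linalg.inv(iris_corr_22)),
                  iris_corr_21)
    B = np.matmul(np.matmul(np.matmul(np.linalg.inv(iris_corr_22), iris_corr_21), np.linalg.inv(iris_corr_11)),
                  iris_corr_12)

    A_eig_values, A_eig_vectors = np.linalg.eig(A)
    B_eig_values, B_eig_vectors = np.linalg.eig(B)

    # np.linalg.eig returns eigenvalues in no guaranteed order; sort both in
    # descending order so the i-th eigenvectors of A and B belong to the same pair
    order = np.argsort(A_eig_values)[::-1]
    A_eig_values, A_eig_vectors = A_eig_values[order], A_eig_vectors[:, order]
    order = np.argsort(B_eig_values)[::-1]
    B_eig_values, B_eig_vectors = B_eig_values[order], B_eig_vectors[:, order]

    # canonical correlations
    result = np.sqrt(A_eig_values)
    print("result: \t", result)

    # check the eigendecomposition: A - V diag(lambda) V^-1 should be 0
    a = round(A - np.matmul(np.matmul(A_eig_vectors, np.diag(A_eig_values)), np.linalg.inv(A_eig_vectors)), 5)
    b = round(B - np.matmul(np.matmul(B_eig_vectors, np.diag(B_eig_values)), np.linalg.inv(B_eig_vectors)), 5)
    print(a)
    print(b)

    # check whether the canonical variates have unit standard deviation;
    # first standardize both groups (mean 0, std 1)
    iris_g1 = iris_data.iloc[:, 0:2]
    iris_g1 = iris_g1.apply(lambda x: (x - np.mean(x)) / np.std(x))
    iris_g2 = iris_data.iloc[:, 2:4]
    iris_g2 = iris_g2.apply(lambda x: (x - np.mean(x)) / np.std(x))

    # canonical variates C1 of group 1 from A's eigenvectors
    C1 = np.matmul(iris_g1, A_eig_vectors)
    print(C1.apply(np.std))
    print(C1.apply(np.mean))
    # the mean is 0 but the std is not 1, so rescale the eigenvectors
    eA = np.matmul(A_eig_vectors, np.diag(1 / C1.apply(np.std)))
    C1 = np.matmul(iris_g1, eA)
    print(C1.apply(np.std))
    print(C1.apply(np.mean))

    # canonical variates C2 of group 2 from B's eigenvectors
    C2 = np.matmul(iris_g2, B_eig_vectors)
    print(C2.apply(np.std))
    print(C2.apply(np.mean))
    # same rescaling for B
    eB = np.matmul(B_eig_vectors, np.diag(1 / C2.apply(np.std)))
    C2 = np.matmul(iris_g2, eB)
    print(C2.apply(np.std))
    print(C2.apply(np.mean))

    # correlations between C1 and C2: only matching pairs should be non-zero
    print(round(pd.concat([C1, C2], axis=1).corr(), 5))

    # cross-check with scikit-learn's CCA
    cca = CCA(n_components=2)
    cca.fit(iris_g1, iris_g2)
    X_c, Y_c = cca.transform(iris_g1, iris_g2)
    # the feature names are reused only as column labels; each column holds a canonical variate
    result = round(pd.concat([pd.DataFrame(X_c, columns=iris_g1.columns),
                              pd.DataFrame(Y_c, columns=iris_g2.columns)], axis=1).corr(), 5)
    print(result)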

Output:

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                  5.1               3.5                1.4               0.2
1                  4.9               3.0                1.4               0.2
2                  4.7               3.2                1.3               0.2
3                  4.6               3.1                1.5               0.2
4                  5.0               3.6                1.4               0.2
..                 ...               ...                ...               ...
145                6.7               3.0                5.2               2.3
146                6.3               2.5                5.0               1.9
147                6.5               3.0                5.2               2.0
148                6.2               3.4                5.4               2.3
149                5.9               3.0                5.1               1.8

[150 rows x 4 columns]
                   sepal length (cm)  ...  petal width (cm)
sepal length (cm)           1.000000  ...          0.817941
sepal width (cm)           -0.117570  ...         -0.366126
petal length (cm)           0.871754  ...          0.962865
petal width (cm)            0.817941  ...          1.000000

[4 rows x 4 columns]
result: 	 [0.940969   0.12393688]

     0    1
0  0.0 -0.0
1 -0.0  0.0
     0    1
0 -0.0 -0.0
1 -0.0  0.0

0    1.041196
1    0.951045
dtype: float64

0   -1.421085e-16
1   -9.118632e-16
dtype: float64

0    1.0
1    1.0
dtype: float64
0   -9.473903e-17
1   -9.592327e-16
dtype: float64

0    0.629124
1    0.200353
dtype: float64
0   -1.894781e-16
1   -7.993606e-17
dtype: float64

0    1.0
1    1.0
dtype: float64
0   -2.368476e-16
1   -3.552714e-16
dtype: float64

         0        1        0        1
0  1.00000  0.00000  0.94097  0.00000
1  0.00000  1.00000  0.00000  0.12394
0  0.94097  0.00000  1.00000  0.00000
1  0.00000  0.12394  0.00000  1.00000

                   sepal length (cm)  ...  petal width (cm)
sepal length (cm)            1.00000  ...          -0.00000
sepal width (cm)             0.00000  ...           0.12394
petal length (cm)            0.94097  ...          -0.00000
petal width (cm)            -0.00000  ...           1.00000

[4 rows x 4 columns]


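The analysis gives two canonical correlations, 0.94097 and 0.12394: the first pair of canonical variates is strongly correlated and the second only weakly, so the first canonical correlation is usually the one carried forward into further analysis.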
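The eigenvalue route in the code above pairs eigenvectors of A and B by position, which is why their order matters. An equivalent formulation computes the canonical correlations as the singular values of R11^(-1/2) R12 R22^(-1/2); the SVD returns them already sorted in descending order. A NumPy sketch comparing the two on synthetic data (all function names are hypothetical); fed the sepal and petal columns of iris, both should reproduce the 0.94097 and 0.12394 found above:

```python
import numpy as np

def _inv_sqrt(m):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

def _blocks(X, Y):
    """Blocks R11, R12, R21, R22 of the joint correlation matrix of [X, Y]."""
    p = X.shape[1]
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    return R[:p, :p], R[:p, p:], R[p:, :p], R[p:, p:]

def cc_eig(X, Y):
    """Canonical correlations as sqrt of the eigenvalues of R11^-1 R12 R22^-1 R21."""
    R11, R12, R21, R22 = _blocks(X, Y)
    A = np.linalg.inv(R11) @ R12 @ np.linalg.inv(R22) @ R21
    # eig has no fixed order; keep the min(p, q) largest eigenvalues
    ev = np.sort(np.linalg.eigvals(A).real)[::-1][:min(R12.shape)]
    return np.sqrt(np.clip(ev, 0.0, None))

def cc_svd(X, Y):
    """Canonical correlations as singular values of R11^-1/2 R12 R22^-1/2."""
    R11, R12, _, R22 = _blocks(X, Y)
    return np.linalg.svd(_inv_sqrt(R11) @ R12 @ _inv_sqrt(R22), compute_uv=False)

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
Y = np.column_stack([X[:, 0] + 0.3 * rng.normal(size=300), rng.normal(size=300)])
print(cc_eig(X, Y), cc_svd(X, Y))
```

The SVD form works on a symmetric-normalised matrix, so it avoids inverting a non-symmetric product and needs no separate sorting step.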
