Covariance Matrices and Data Distributions


juliosun


Correlations between variables in a $K$-dimensional dataset are often summarized by a $K \times K$ covariance matrix. To get a better understanding of how covariance matrices characterize correlations between the dimensions of a dataset, we plot data points drawn from three different 2-dimensional Gaussian distributions, each defined by a different covariance matrix.

The left plots below display the $2 \times 2$ covariance matrix for each Gaussian distribution. The values along the diagonal represent the variance of the data along each dimension, and the off-diagonal values represent the covariances between pairs of dimensions. Thus the $(i,j)$-th entry of each matrix represents the covariance between the $i$-th and $j$-th dimensions. The right plots show data drawn from the corresponding 2D Gaussian.
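To make the difference between covariance and correlation concrete, here is a minimal sketch that estimates a $2 \times 2$ covariance matrix from samples using the definition $\Sigma_{ij} = \frac{1}{N-1}\sum_n (x_{in}-\bar{x}_i)(x_{jn}-\bar{x}_j)$, checks it against MATLAB's built-in `cov`, and normalizes it to a correlation matrix $\rho_{ij} = \Sigma_{ij}/\sqrt{\Sigma_{ii}\Sigma_{jj}}$. It assumes the $2 \times N$ layout (one sample per column) used in the plotting code; the variable names, seed and sample size are illustrative choices, and the samples are generated with a Cholesky factor so that no toolbox is needed.

```matlab
% Illustrative data: 2 x N, one sample per column (same layout as the plotting code)
rng(0);                                   % fix the seed for repeatability
N = 5000;
Strue = [1 .9; .9 3];                     % covariance used in the bottom-row example
X = chol(Strue, 'lower') * randn(2, N);   % correlated zero-mean Gaussian samples

% Covariance from the definition: average outer product of the centered samples
Xc = X - mean(X, 2);                      % subtract per-dimension mean (implicit expansion, R2016b+)
Sigma = (Xc * Xc') / (N - 1);             % unbiased K x K estimate

% The built-in cov expects one observation per ROW, so transpose
SigmaBuiltin = cov(X');

% Normalize covariance to correlation: rho_ij = Sigma_ij / sqrt(Sigma_ii * Sigma_jj)
d = sqrt(diag(Sigma));
Rho = Sigma ./ (d * d');

disp(Sigma), disp(Rho)
```

The diagonal of `Rho` is exactly 1, and its off-diagonal entry should land near $0.9/\sqrt{3} \approx 0.52$: the covariance of 0.9 describes a fairly strong correlation once the unequal variances are accounted for.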

The top row displays a covariance matrix equal to the identity matrix, along with the points drawn from the corresponding Gaussian distribution. The diagonal values are 1, indicating that the data have a variance of 1 along both dimensions. The off-diagonal elements are zero, meaning that the two dimensions are uncorrelated. We can see this in the data drawn from the distribution as well: the points form a circular cloud centered on the origin. For such a distribution, no regression line can predict the second dimension from the first (or vice versa) any better than simply guessing the mean; the best-fitting line has, in expectation, zero slope. Because zero correlation implies independence for a Gaussian, an identity covariance matrix is equivalent to having independent dimensions, each of which has unit (i.e. 1) variance. Such a dataset is often called “white” (this naming convention comes from the notion that white noise signals, which can be sampled from independent Gaussian distributions, have equal power at all frequencies in the Fourier domain).
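A quick numerical check of the identity-covariance case, written as a sketch with an arbitrary seed and sample size: drawing each dimension independently from a standard normal with `randn` should give a sample covariance close to the identity, and the least-squares slope for predicting the second dimension from the first, $\hat{\beta} = \Sigma_{12}/\Sigma_{11}$, should be close to zero.

```matlab
% White data: each dimension is an independent standard normal (identity covariance)
rng(1);
N = 5000;
Z = randn(2, N);          % 2 x N, zero mean, unit variance, uncorrelated
Cz = cov(Z');             % sample covariance; should be close to eye(2)

% Least-squares slope for predicting dimension 2 from dimension 1:
% slope = cov(x1, x2) / var(x1), which is ~0 when the dimensions are uncorrelated
slope = Cz(1,2) / Cz(1,1);

disp(Cz), disp(slope)
```

The slope is not exactly zero for a finite sample, but it shrinks toward zero as `N` grows, which is the numerical counterpart of "no useful regression line".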

The middle row shows points drawn from a diagonal, but non-identity, covariance matrix. The off-diagonal elements are still zero, indicating that the dimensions are uncorrelated. However, the variances along the two dimensions are no longer equal: the first dimension has variance 1, while the second has variance 3. This is visible as the elongated distribution in red. The elongation is along the second dimension, as indicated by the larger value in the bottom-right entry ($(i,j) = (2,2)$) of the covariance matrix.
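One way to see where a diagonal covariance comes from, sketched under the assumption that we start from white samples: scaling each dimension by its own standard deviation stretches the circular cloud along one axis without introducing any correlation. The variances `[1 3]` are taken from `SDiag` in the plotting code; the other names are illustrative.

```matlab
% Diagonal (non-identity) covariance: scale independent unit-variance samples
rng(2);
N = 5000;
variances = [1 3];                         % diagonal of SDiag in the plotting code
Z = randn(2, N);                           % white samples, 2 x N
Xdiag = diag(sqrt(variances)) * Z;         % scale each dimension by its standard deviation

Cdiag = cov(Xdiag');                       % approximately [1 0; 0 3]
% Larger spread (std) along dimension 2, so the cloud is elongated along that axis
stds = sqrt(diag(Cdiag))';

disp(Cdiag), disp(stds)
```

Scaling only multiplies each row of `Z` by a constant, so the off-diagonal covariance stays near zero while the diagonal picks up the squared scale factors.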

The bottom row shows points drawn from a non-diagonal covariance matrix. Here the off-diagonal elements of the covariance matrix are non-zero (0.9), indicating a correlation between the dimensions. This correlation is reflected in the distribution of the drawn data points (in blue). The primary axis along which the points are spread is not aligned with either dimension; instead, it lies along a linear combination of the two dimensions.
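The primary axis of a Gaussian cloud is the eigenvector of its covariance matrix with the largest eigenvalue, so it can be computed directly with `eig`. The sketch below does this for the non-diagonal matrix `S` from the plotting code; the sign flip is only there to make the reported direction deterministic, since an eigenvector is defined up to sign.

```matlab
% Principal axis of the correlated distribution from the eigendecomposition of S
S = [1 .9; .9 3];                 % non-diagonal covariance from the plotting code
[V, D] = eig(S);                  % columns of V are orthonormal eigenvectors
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);                  % reorder eigenvectors to match sorted eigenvalues

principalAxis = V(:, 1);          % direction of greatest variance
if principalAxis(1) < 0           % eigenvectors are defined up to sign; fix one
    principalAxis = -principalAxis;
end

% Both components are non-zero, so the axis is a linear combination of the
% two dimensions rather than either coordinate axis
angleDeg = atan2(principalAxis(2), principalAxis(1)) * 180 / pi;

disp(lambda'), disp(principalAxis'), disp(angleDeg)
```

For this `S` the eigenvalues are $2 \pm \sqrt{1.81}$ (about 3.35 and 0.65), and the principal axis makes an angle of roughly 69° with the first dimension, tilted toward the second dimension, which has the larger variance.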

The MATLAB code used to create the above plots is below (`mvnrnd` requires the Statistics and Machine Learning Toolbox):

 
        % INITIALIZE SOME CONSTANTS
        mu = [0 0];          % ZERO MEAN
        S = [1 .9; .9 3];    % NON-DIAGONAL COV.
        SDiag = [1 0; 0 3];  % DIAGONAL COV.
        SId = eye(2);        % IDENTITY COV.

        % SAMPLE SOME DATAPOINTS
        % mvnrnd returns nSamples x 2; transpose so each column is one 2-D point
        nSamples = 1000;
        samples = mvnrnd(mu,S,nSamples)';
        samplesId = mvnrnd(mu,SId,nSamples)';
        samplesDiag = mvnrnd(mu,SDiag,nSamples)';

        % DISPLAY: left column = covariance matrix, right column = samples
        subplot(321);
        imagesc(SId); axis image,
        caxis([0 1]), colormap hot, colorbar
        title('Identity Covariance')

        subplot(322)
        plot(samplesId(1,:),samplesId(2,:),'ko'); axis square
        xlim([-5 5]), ylim([-5 5])
        grid
        title('White Data')

        subplot(323);
        imagesc(SDiag); axis image,
        caxis([0 3]), colormap hot, colorbar
        title('Diagonal Covariance')

        subplot(324)
        plot(samplesDiag(1,:),samplesDiag(2,:),'r.'); axis square
        xlim([-5 5]), ylim([-5 5])
        grid
        title('Uncorrelated Data')

        subplot(325);
        imagesc(S); axis image,
        caxis([0 3]), colormap hot, colorbar
        title('Non-diagonal Covariance')

        subplot(326)
        plot(samples(1,:),samples(2,:),'b.'); axis square
        xlim([-5 5]), ylim([-5 5])
        grid
        title('Correlated Data')
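If the Statistics and Machine Learning Toolbox is not available, one possible substitute for `mvnrnd` (a swapped-in technique, not the author's code) is to transform standard normal samples with a Cholesky factor: if $S = LL^\top$ and $z \sim \mathcal{N}(0, I)$, then $\mu + Lz \sim \mathcal{N}(\mu, S)$. This sketch produces the same $2 \times N$ layout as `mvnrnd(mu,S,nSamples)'`; the name `samplesChol` and the seed are illustrative.

```matlab
% Toolbox-free alternative to mvnrnd (a sketch, not the original code):
% if S = L*L' and z ~ N(0, I), then mu' + L*z ~ N(mu, S)
rng(3);
mu = [0 0];
S = [1 .9; .9 3];
nSamples = 1000;

L = chol(S, 'lower');                        % lower-triangular Cholesky factor, L*L' == S
samplesChol = mu' + L * randn(2, nSamples);  % 2 x nSamples, like mvnrnd(mu,S,nSamples)'

disp(cov(samplesChol'))                      % should be close to S
```

Because `chol` requires a positive definite matrix, this substitute works for the three covariance matrices used here; a singular covariance would need a different factorization.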
