Covariance Matrices and Data Distributions

Correlation between variables in a K-dimensional dataset are often summarized by a K \times K covariance matrix. To get a better understanding of how correlation matrices characterize correlations between data points, we plot data points drawn from 3 different 2-dimensional Gaussian distributions, each of which is defined by a different covariance matrix.

The left plots below display the 2 \times 2 covariance matrix for each Gaussian distribution. The values along the diagonal represent the variance of the data along each dimension, and the off-diagonal values represent the covariances between the dimensions. Thus the i,j-th entry of each matrix represents the correlation between the i-th and j-th dimensions. The right plots show data drawn from the corresponding 2D Gaussian.

whitening-Covariances

The top row plot display a covariance matrix equal to the identity matrix, and the points drawn from the corresponding Gaussian distribution. The diagonal values are 1, indicating the data have variance of 1 along both of the dimensions. Additionally, the off-diagonal elements are zero, meaning that the two dimensions are uncorrelated.  We can see this in the data drawn from the distribution as well. The data are distributed in a sphere about origin. For such a distribution of points, it is difficult (impossible) to draw any single regression line that can predict the second dimension from the first, and vice versa. Thus an identity covariance matrix is equivalent to having independent dimensions, each of which has unit (i.e. 1) variance. Such a dataset is often called “white” (this naming convention comes from the notion that white noise signals–which can be sampled from independent Gaussian distributions–have equal power at all frequencies in the Fourier domain).

The middle row plots the points that result from a diagonal, but not identity covariance matrix. The off-diagonal elements are still zero, indicating that the dimensions are uncorrelated. However, the variances along each dimension are not equal to one, and are not equal. This is demonstrated by the elongated distribution in red. The elongation is along the second dimension, as indicated by the larger value in the bottom-right (point (i,j) = (2,2)) of the covariance matrix.

The bottom row plots points that result from a non-diagonal covariance matrix. Here the off-diagonal elements of covariance matrix have non-zero values, indicating a correlation between the dimensions. This correlation is reflected in the distribution of drawn datapoints (in blue). We can see that the primary axis along which the points are distributed is not along either of the dimensions, but a linear combination of the dimensions.

The MATLAB code to create the above plots is here

?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
% INITIALIZE SOME CONSTANTS
mu = [0 0];         % ZERO MEAN
S = [1 .9; .9 3];   % NON-DIAGONAL COV.
SDiag = [1 0; 0 3]; % DIAGONAL COV.
SId = eye(2);       % IDENTITY COV.
 
% SAMPLE SOME DATAPOINTS
nSamples = 1000;
samples = mvnrnd(mu,S,nSamples)';
samplesId = mvnrnd(mu,SId,nSamples)';
samplesDiag = mvnrnd(mu,SDiag,nSamples)';
 
% DISPLAY
subplot(321);
imagesc(SId); axis image,
caxis([0 1]), colormap hot, colorbar
title( 'Identity Covariance' )
 
subplot(322)
plot(samplesId(1,:),samplesId(2,:), 'ko' ); axis square
xlim([-5 5]), ylim([-5 5])
grid
title( 'White Data' )
 
subplot(323);
imagesc(SDiag); axis image,
caxis([0 3]), colormap hot, colorbar
title( 'Diagonal Covariance' )
 
subplot(324)
plot(samplesDiag(1,:),samplesDiag(2,:), 'r.' ); axis square
xlim([-5 5]), ylim([-5 5])
grid
title( 'Uncorrelated Data' )
 
subplot(325);
imagesc(S); axis image,
caxis([0 3]), colormap hot, colorbar
title( 'Non-diagonal Covariance' )
 
subplot(326)
plot(samples(1,:),samples(2,:), 'b.' ); axis square
xlim([-5 5]), ylim([-5 5])
grid
title( 'Correlated Data' )
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本方法。编译原理不仅是计算机科学理论的重要组成部分,也是实现高效、可靠的计算机程序设计的关键。本文将对编译原理的基本概念、发展历程、主要内容和实际应用进行详细介绍编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本方法。编译原理不仅是计算机科学理论的重要组成部分,也是实现高效、可靠的计算机程序设计的关键。本文将对编译原理的基本概念、发展历程、主要内容和实际应用进行详细介绍编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本方法。编译原理不仅是计算机科学理论的重要组成部分,也是实现高效、可靠的计算机程序设计的关键。本文将对编译原理的基本概念、发展历程、主要内容和实际应用进行详细介绍编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本方法。编译原理不仅是计算机科学理论的重要组成部分,也是实现高效、可靠的计算机程序设计的关键。本文将对编译原理的基本概念、发展历程、主要内容和实际应用进行详细介绍编译原理是计算机专业的一门核心课程,旨在介绍编译程序构造的一般原理和基本

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值