In multivariate statistics and probability theory, the scatter matrix is a statistic used to estimate the covariance matrix, for instance that of the multivariate normal distribution.
Definition
Given n samples of m-dimensional data, represented as the m-by-n matrix

$$X = [x_1, x_2, \ldots, x_n],$$

the sample mean is
$$\overline{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$$

where $x_j$ is the $j$-th column of $X$.
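Here is a minimal NumPy sketch of the sample mean. It assumes the data are stored as an m-by-n array whose columns are the samples, which is the layout used above. The function name `sample_mean` and the toy matrix are illustrative only.

```python
import numpy as np

def sample_mean(X: np.ndarray) -> np.ndarray:
    """Mean of the n columns of the m-by-n data matrix X, as a length-m vector."""
    n = X.shape[1]
    # (1/n) * sum_j x_j, where x_j = X[:, j] is the j-th sample
    return X.sum(axis=1) / n

# 3 samples (columns) of 2-dimensional data (rows); toy values
X = np.array([[1.0, 2.0, 6.0],
              [4.0, 0.0, 2.0]])
x_bar = sample_mean(X)  # [3.0, 2.0]
```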
The scatter matrix is the m-by-m positive semi-definite matrix (the covariance matrix is also positive semi-definite)
$$S = \sum_{j=1}^{n}(x_j - \overline{x})(x_j - \overline{x})^T = \sum_{j=1}^{n}(x_j - \overline{x}) \otimes (x_j - \overline{x}) = \left(\sum_{j=1}^{n} x_j x_j^T\right) - n\,\overline{x}\,\overline{x}^T$$
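where ${}^T$ denotes the matrix transpose. Each product $(x_j - \overline{x})(x_j - \overline{x})^T$ of a column vector with a row vector is an outer product, written $\otimes$ in the middle expression. The scatter matrix can be expressed more succinctly as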
$$S = X C_n X^T$$

where $C_n$ is the n-by-n centering matrix.
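As a quick numerical sanity check, the sketch below computes S for a small toy matrix in two of the ways given above. The first is a sum of outer products of the centered columns. The second is the expanded form $\sum_j x_j x_j^T - n\,\overline{x}\,\overline{x}^T$, using the fact that $\sum_j x_j x_j^T = XX^T$. The helper names and data are hypothetical, and NumPy is assumed.

```python
import numpy as np

def scatter_sum_of_outer(X):
    """S = sum_j (x_j - x_bar)(x_j - x_bar)^T, built one outer product at a time."""
    m, n = X.shape
    x_bar = X.mean(axis=1)
    S = np.zeros((m, m))
    for j in range(n):
        d = X[:, j] - x_bar          # centered j-th sample
        S += np.outer(d, d)          # (x_j - x_bar) ⊗ (x_j - x_bar)
    return S

def scatter_expanded(X):
    """S = (sum_j x_j x_j^T) - n * x_bar x_bar^T, with sum_j x_j x_j^T = X X^T."""
    n = X.shape[1]
    x_bar = X.mean(axis=1)
    return X @ X.T - n * np.outer(x_bar, x_bar)

X = np.array([[1.0, 2.0, 6.0],
              [4.0, 0.0, 2.0]])
S = scatter_sum_of_outer(X)          # [[14, -2], [-2, 8]]

# Positive semi-definite: all eigenvalues of the symmetric matrix S are >= 0
eigenvalues = np.linalg.eigvalsh(S)
```

The expanded form is convenient for streaming data, because only $\sum_j x_j$ and $\sum_j x_j x_j^T$ need to be accumulated. It does, however, subtract two potentially large quantities, so it can lose floating-point precision when the mean is large relative to the spread of the data.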
Outer product: $u \otimes v = u v^T$. For more detail, see Outer product - Wikipedia.
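Here is a short NumPy illustration, with arbitrarily chosen toy vectors, showing that `np.outer` agrees with the column-vector-times-row-vector form $uv^T$.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0])

# u ⊗ v as a 3-by-2 matrix: entry (i, k) is u[i] * v[k]
outer = np.outer(u, v)

# The same matrix written as (column vector) @ (row vector), i.e. u v^T
as_matmul = u.reshape(-1, 1) @ v.reshape(1, -1)
```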
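The centering matrix is $C_n = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, where $\mathbf{1}$ is the length-n vector of ones. Multiplying a vector by the centering matrix has the same effect as subtracting the mean of the vector's components from every component. See Centering matrix - Wikipedia.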
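The sketch below assumes NumPy and uses a hypothetical helper, `centering_matrix`. It builds $C_n = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, checks the mean-subtraction property on a vector, and confirms that $XC_nX^T$ matches the sum-of-outer-products definition. In $XC_n$ the centering matrix acts on each row of X, which amounts to subtracting $\overline{x}$ from every column.

```python
import numpy as np

def centering_matrix(n):
    """C_n = I_n - (1/n) * 1 1^T, the n-by-n centering matrix."""
    return np.eye(n) - np.ones((n, n)) / n

C3 = centering_matrix(3)

# Multiplying a vector by C_n subtracts the vector's mean from every component
v = np.array([1.0, 2.0, 6.0])
centered_v = C3 @ v                      # [-2., -1., 3.]

# Scatter matrix in the compact form S = X C_n X^T (toy data)
X = np.array([[1.0, 2.0, 6.0],
              [4.0, 0.0, 2.0]])
S_compact = X @ C3 @ X.T

# Reference: center each column explicitly, then D D^T = sum_j d_j d_j^T
D = X - X.mean(axis=1, keepdims=True)
S_direct = D @ D.T
```

$C_n$ is symmetric and idempotent ($C_nC_n = C_n$), so $XC_nX^T = (XC_n)(XC_n)^T$, which is exactly the `D @ D.T` line. Forming $C_n$ explicitly costs O(n²) memory. In practice, subtracting the mean directly is the cheaper route, and the explicit matrix here is only for illustration.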
Application
Given n samples, the maximum likelihood estimate of the covariance matrix of a multivariate normal distribution can be expressed as the normalized scatter matrix

$$C_{ML} = \frac{1}{n} S.$$

When the columns of $X$ are drawn independently from a multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$, $S$ has a Wishart distribution with scale matrix $\Sigma$ and $n - 1$ degrees of freedom.
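Here is a NumPy sketch under illustrative assumptions: toy data, a made-up Σ, 5 samples per data set, 20 000 simulated data sets, and seed 0. It first checks $C_{ML} = S/n$ against `np.cov(X, ddof=0)`. It then estimates the mean of S by simulation. A Wishart distribution with scale Σ and n − 1 degrees of freedom has mean $(n-1)\Sigma$, so the averaged S divided by n − 1 should land close to Σ.

```python
import numpy as np

def ml_covariance(X):
    """C_ML = S / n for an m-by-n data matrix X whose columns are samples."""
    n = X.shape[1]
    D = X - X.mean(axis=1, keepdims=True)  # subtract x_bar from every column
    return (D @ D.T) / n

X = np.array([[1.0, 2.0, 6.0],
              [4.0, 0.0, 2.0]])
C_ml = ml_covariance(X)
# np.cov treats rows as variables and columns as observations by default;
# ddof=0 selects the 1/n normalization instead of the default 1/(n-1)
matches_np_cov = np.allclose(C_ml, np.cov(X, ddof=0))

# Monte Carlo check of the Wishart mean E[S] = (n - 1) * Sigma (illustrative values)
rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, trials = 5, 20_000
# shape (trials, n, m): `trials` independent data sets of n samples each
samples = rng.multivariate_normal(np.zeros(2), Sigma, size=(trials, n))
centered = samples - samples.mean(axis=1, keepdims=True)
# per-trial scatter matrix: sum over samples j of the outer products
S_all = np.einsum("tja,tjb->tab", centered, centered)
mean_S = S_all.mean(axis=0)              # should be close to (n - 1) * Sigma
```

Because $E[S] = (n-1)\Sigma$, the maximum likelihood estimate $S/n$ is biased downward by a factor of $(n-1)/n$. `np.cov` therefore uses the unbiased $S/(n-1)$ unless `ddof=0` (or `bias=True`) is passed.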