Multivariate Analysis ch1 (Overview)


These are notes on the book “Applied Multivariate Methods for Data Analysts” [1].

Outline:

  • Data: response variables vs. experimental units;
  • Types of methods: variable-directed techniques vs. individual-directed techniques;
  • Notations and definitions;
  • Estimator statistics;
  • About multivariate computing: outliers, missing values, standardization.

1. Data

Multivariate data are common but complex, so a major goal of multivariate analysis is to simplify them.

Two aspects of data:

  1. Response variables;
  2. The experimental units;

Multivariate methods study the relationships among the response variables, the relationships among the experimental units, and the relationships between the response variables and the experimental units.

2. Types of Methods

2.1 Variable-directed techniques

  • PCA (Principal component analysis)
  • FA (Factor analysis)
  • Regression (e.g., logistic regression)
  • CCA (Canonical correlation analysis)

These methods mainly operate on the correlation matrix and focus on the columns of the data matrix, i.e., the response variables.

2.2 Individual-directed techniques

  • DA (Discriminant analysis)
  • CA (Cluster analysis)
  • MANOVA (Multivariate analysis of variance)

Remarks:

  1. These methods focus on the rows of the data matrix, i.e., the observations or experimental units;
  2. Many multivariate methods require the experimental units to be independent.
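To make the column/row distinction concrete, here is a small sketch in Python with NumPy (the tooling is an assumption; the book does not prescribe software). It computes a p×p correlation matrix among the variables, the kind of summary that variable-directed methods such as PCA and FA start from, and an N×N matrix of Euclidean distances among the units. Euclidean distance is only one assumed choice of input for individual-directed methods such as cluster analysis. The function name `variable_vs_unit_views` and the toy data are hypothetical.

```python
import numpy as np

def variable_vs_unit_views(X):
    """Hypothetical helper: p x p correlations among columns (variables)
    and N x N Euclidean distances among rows (experimental units)."""
    X = np.asarray(X, dtype=float)
    # Variable-directed view: correlate the columns (rowvar=False).
    R = np.corrcoef(X, rowvar=False)
    # Individual-directed view: pairwise Euclidean distances between rows.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return R, D

# Toy data: N = 4 units (rows), p = 3 variables (columns).
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 4.5, 1.0],
              [3.0, 5.5, 0.0],
              [4.0, 8.0, 2.5]])
R, D = variable_vs_unit_views(X)
print(R.shape, D.shape)  # (3, 3) (4, 4)
```

The two outputs have different sizes because one summarizes the p columns and the other the N rows of the same data matrix.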

3. Notations and definitions

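| Notation | Explanation |
| --- | --- |
| $p$ | number of response variables |
| $N$ | sample size (number of experimental units) |
| $X = (x_{rj})_{N\times p}$, $r = 1,\dots,N$; $j = 1,\dots,p$ | data matrix |
| $\mathbf{x}_r = (x_{r1},\dots,x_{rp})'$ | the $r$th observation (the $r$th row of $X$) |
| $r, s, t$ | subscripts for experimental units |
| $i, j, k$ | subscripts for response variables |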

3.1 Multivariate normal distribution

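(Def): $\mathbf{x} = (x_1,\dots,x_p)'$ follows a multivariate normal distribution if, for every vector $\mathbf{a} = (a_1,\dots,a_p)'$,

$$\mathbf{a}'\mathbf{x} = \sum_{i=1}^{p} a_i x_i$$

follows a univariate normal distribution.
Denote it by $\mathbf{x} \sim N_p(\boldsymbol{\mu}, \Sigma)$. When $\Sigma$ is nonsingular, the p.d.f. is:
$$f_{\mathbf{x}}(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}.$$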
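The density formula can be transcribed directly into code. The sketch below uses Python with NumPy (assumed tooling); `mvn_pdf` is a hypothetical helper, and it assumes Σ is positive definite. In practice a library routine such as SciPy's `scipy.stats.multivariate_normal` would normally be preferred.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Hypothetical helper: density of N_p(mu, Sigma) at x (Sigma assumed positive definite)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    p = x.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu); solve() avoids forming the inverse.
    quad = diff @ np.linalg.solve(Sigma, diff)
    # Normalizing constant (2*pi)^{p/2} |Sigma|^{1/2}.
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * quad) / norm_const)

# Standard bivariate normal at the origin: the density is 1 / (2*pi), about 0.159.
print(mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```

With p = 1 the function reduces to the familiar univariate normal density, which is a quick sanity check on the exponent p/2 and the minus sign in the exponential.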

3.2 Population parameters:

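  • Mean vector: $\boldsymbol{\mu} = E(\mathbf{x}) = (E(x_1),\dots,E(x_p))' = (\mu_1,\dots,\mu_p)'$;
  • Variance-covariance matrix:
    $$\Sigma = \operatorname{Cov}(\mathbf{x}) = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})'] = (\sigma_{ij})_{p\times p} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$
    • $\sigma_{ii} = E[(x_i-\mu_i)^2] = \operatorname{Var}(x_i)$;
    • $\sigma_{ij} = \operatorname{Cov}(x_i, x_j) = E[(x_i-\mu_i)(x_j-\mu_j)]$;
  • Correlation matrix, with $\rho_{ij} = \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}}$:
    $$P = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}$$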
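The correlation matrix is obtained from the covariance matrix by dividing each entry σ_ij by the product of the standard deviations √σ_ii · √σ_jj. A minimal sketch in Python with NumPy (assumed tooling; `cov_to_corr` is a hypothetical name):

```python
import numpy as np

def cov_to_corr(Sigma):
    """Hypothetical helper: rho_ij = sigma_ij / sqrt(sigma_ii * sigma_jj)."""
    Sigma = np.asarray(Sigma, dtype=float)
    sd = np.sqrt(np.diag(Sigma))      # standard deviations sqrt(sigma_ii)
    return Sigma / np.outer(sd, sd)   # divide entry (i, j) by sd_i * sd_j

Sigma = np.array([[4.0, 2.0],
                  [2.0, 9.0]])
P = cov_to_corr(Sigma)
# rho_12 = 2 / sqrt(4 * 9) = 1/3, and the diagonal is all ones.
```

In matrix form this is P = D^{-1/2} Σ D^{-1/2}, where D is the diagonal matrix of the σ_ii; the outer-product division computes the same thing without building D.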

4. Estimator Statistics

4.1. Unbiased estimators:

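$$\hat{\boldsymbol{\mu}} = \frac{1}{N}\sum_{r=1}^{N}\mathbf{x}_r = \text{colMeans}(X),$$ i.e., the vector of column (variable) means;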

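$$\hat{\Sigma} = \frac{1}{N-1}\left[\sum_{r=1}^{N}(\mathbf{x}_r-\hat{\boldsymbol{\mu}})(\mathbf{x}_r-\hat{\boldsymbol{\mu}})'\right] = (\hat{\sigma}_{ij})_{p\times p};$$

$$\hat{\sigma}_{ij} = \frac{1}{N-1}\sum_{r=1}^{N}(x_{ri}-\hat{\mu}_i)(x_{rj}-\hat{\mu}_j);$$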
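The two unbiased estimators translate into a few lines of code. This is a sketch in Python with NumPy (assumed tooling; `sample_mean_and_cov` and the toy data are hypothetical). Note that the mean vector averages over the rows, giving one mean per column; in R the equivalents are `colMeans(X)` and `cov(X)`.

```python
import numpy as np

def sample_mean_and_cov(X):
    """Hypothetical helper: mean vector (column means) and covariance with divisor N - 1."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    mu_hat = X.mean(axis=0)                       # average over rows -> one mean per variable
    centered = X - mu_hat                         # row r is (x_r - mu_hat)'
    Sigma_hat = centered.T @ centered / (N - 1)   # sum of outer products, divided by N - 1
    return mu_hat, Sigma_hat

X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 2.0]])
mu_hat, Sigma_hat = sample_mean_and_cov(X)
# mu_hat = [4, 2]; Sigma_hat = [[4, 1], [1, 1]]
```

`centered.T @ centered` equals the sum over r of the outer products (x_r − μ̂)(x_r − μ̂)′, so the result should agree with NumPy's `np.cov(X, rowvar=False)`, whose default divisor is also N − 1.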

4.2. Biased but commonly used estimators:

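$$r_{ij} = \hat{\rho}_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii}\,\hat{\sigma}_{jj}}};$$

$$R = \hat{P} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix};$$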
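A sketch of the sample correlation matrix in Python with NumPy (assumed tooling; `sample_corr` is a hypothetical name). It builds R from the unbiased covariance estimate and should match NumPy's own `np.corrcoef`.

```python
import numpy as np

def sample_corr(X):
    """Hypothetical helper: r_ij = s_ij / sqrt(s_ii * s_jj) from the sample covariance."""
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)      # unbiased covariance, divisor N - 1
    sd = np.sqrt(np.diag(S))
    return S / np.outer(sd, sd)

X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 2.0]])
R = sample_corr(X)
# S = [[4, 1], [1, 1]], so r_12 = 1 / sqrt(4 * 1) = 0.5
```

Because the divisor (N or N − 1) appears in both the numerator and the denominator, it cancels, so r_ij is the same under either convention.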

5. About multivariate computing: outliers, missing values, standardization

5.1 Outliers

Detection: use plots or PCA.
Handling: analyze the impact of the outliers on the results by comparing the analysis with and without them.
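The with/without comparison can be sketched as follows, in Python with NumPy (assumed tooling). The sketch assumes the suspected rows have already been identified (detection by plots or PCA is not shown) and uses the sample correlation matrix as a stand-in for "the results"; in a real analysis one would compare whatever output the method produces. `corr_with_and_without` and the data are hypothetical.

```python
import numpy as np

def corr_with_and_without(X, suspect_rows):
    """Hypothetical helper: correlation matrix with all rows vs. with suspect rows removed."""
    X = np.asarray(X, dtype=float)
    keep = np.setdiff1d(np.arange(X.shape[0]), suspect_rows)  # row indices to retain
    R_all = np.corrcoef(X, rowvar=False)
    R_trim = np.corrcoef(X[keep], rowvar=False)
    return R_all, R_trim, np.abs(R_all - R_trim).max()

# y = 2x exactly, except for the last (suspected) unit.
X = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10], [6, 12], [7, -20]], dtype=float)
R_all, R_trim, max_change = corr_with_and_without(X, suspect_rows=[6])
```

Here a single unit flips the sign of the correlation: without it r_12 = 1, with it r_12 is negative. A large `max_change` signals that conclusions depend heavily on the suspected units.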

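5.2 Missing values

Imputation: replace each missing value with the mean of its variable (the column mean), or use KNN imputation;
Deletion: remove the rows (experimental units) that contain missing values.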
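Both simple strategies can be sketched in Python with NumPy (assumed tooling), with missing values encoded as `NaN`. The function names are hypothetical, and KNN imputation is not shown.

```python
import numpy as np

def impute_column_means(X):
    """Hypothetical helper: replace each NaN with the mean of the observed values in its column."""
    X = np.array(X, dtype=float)           # copy, so the caller's data are not modified
    col_means = np.nanmean(X, axis=0)      # per-variable means, ignoring NaN
    rows, cols = np.where(np.isnan(X))     # positions of the missing entries
    X[rows, cols] = col_means[cols]
    return X

def drop_incomplete_rows(X):
    """Hypothetical helper: keep only the units (rows) with no missing values."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

X = [[1.0, np.nan],
     [3.0, 4.0],
     [5.0, 8.0]]
X_imputed = impute_column_means(X)   # [[1, 6], [3, 4], [5, 8]]
X_complete = drop_incomplete_rows(X) # [[3, 4], [5, 8]]
```

Mean imputation keeps every unit but pulls the filled-in values to the center, which tends to understate variances and covariances; row deletion avoids that at the cost of a smaller N.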

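5.3 Standardization

$$z_{rj} = \frac{x_{rj}-\hat{\mu}_j}{\sqrt{\hat{\sigma}_{jj}}}, \quad r = 1,\dots,N;\ j = 1,\dots,p.$$

Many statistical programs standardize the variables by default. The sample covariance matrix of the standardized data equals the sample correlation matrix $R$ of the original data.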
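A sketch of the standardization formula in Python with NumPy (assumed tooling; `standardize` is a hypothetical name), using the N − 1 divisor for σ̂_jj, similar to what R's `scale(X)` does. It also checks numerically that the covariance of Z equals the correlation matrix of X.

```python
import numpy as np

def standardize(X):
    """Hypothetical helper: z_rj = (x_rj - mean_j) / sqrt(s_jj), with divisor N - 1 for s_jj."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 2.0]])
Z = standardize(X)
# Column 1: (x - 4) / 2; column 2: (y - 2) / 1  ->  Z = [[-1, -1], [0, 1], [1, 0]]
```

Each column of Z has mean 0 and standard deviation 1, so `np.cov(Z, rowvar=False)` and `np.corrcoef(X, rowvar=False)` coincide.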


References


  1. Johnson, D. E. Applied Multivariate Methods for Data Analysts. Pacific Grove, CA: Duxbury Press, 1998.