reduced rank regression model
Readers interested only in the reduced rank regression model itself can skip directly to the Multivariate Reduced-Rank Regression section.
Multivariate linear regression is a natural extension of multiple linear regression in that both techniques try to interpret possible linear relationships between certain input and output variables. Multiple regression is concerned with studying to what extent the behavior of a single output variable Y is influenced by a set of r input variables X = (X_1, ···, X_r)^T.
Multivariate regression has s output variables Y = (Y_1, ···, Y_s)^T, each of whose behavior may be influenced by exactly the same set of inputs X = (X_1, ···, X_r)^T.
So, not only are the components of X correlated with each other, but in multivariate regression, the components of Y are also correlated with each other (and with the components of X). In this chapter, we are interested in estimating the regression relationship between Y and X, taking into account the various dependencies between the r-vector X and the s-vector Y and the dependencies within X and within Y.
We describe the multivariate reduced-rank regression model (RRR) (Izenman, 1975), which is an enhancement of the classical multivariate regression model and has recently received research attention in the statistics and econometrics literature. The following reasons explain the popularity of this model: RRR provides a unified approach to many of the diverse classical multivariate statistical techniques; it lends itself quite naturally to analyzing a wide variety of statistical problems involving reduction of dimensionality and the search for structure in multivariate data; and it is relatively simple to program, because the regression estimates depend only upon the sample covariance matrices of X and Y and the eigendecomposition of a certain symmetric matrix that generalizes the multiple squared correlation coefficient R^2 from multiple regression.
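The simplicity noted above can be sketched in a few lines of NumPy. The sketch below assumes the identity weighting matrix (other weightings are possible in the general RRR formulation): the full-rank least-squares coefficient is computed from the sample covariance matrices, and its rank-t version is obtained by projecting onto the top t eigenvectors of the symmetric matrix mentioned in the text. The function name and the data used in the example are hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Rank-constrained least-squares coefficient matrix (identity weighting).

    X : (n, r) array of inputs, Y : (n, s) array of outputs.
    Returns an (s, r) coefficient matrix of rank at most `rank`.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)              # center the inputs
    Yc = Y - Y.mean(axis=0)              # center the outputs
    Sxx = Xc.T @ Xc / n                  # sample covariance of X
    Syx = Yc.T @ Xc / n                  # sample cross-covariance of Y with X
    C_ols = Syx @ np.linalg.inv(Sxx)     # full-rank OLS coefficient (s x r)
    M = C_ols @ Syx.T                    # symmetric (s x s) matrix Syx Sxx^{-1} Sxy
    eigvals, eigvecs = np.linalg.eigh(M) # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :rank]       # eigenvectors of the `rank` largest eigenvalues
    return V @ (V.T @ C_ols)             # rank-constrained coefficient matrix
```

When `rank` equals s, the projection V V^T is the identity and the estimate reduces to the ordinary full-rank multivariate least-squares coefficient.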
We need to consider two cases: X fixed (nonstochastic) and X random.
The Fixed-X Case
Let Y = (Y_1, ···, Y_s)^T be a random s-vector-valued output variate with mean vector μ_Y and covariance matrix Σ_YY, and let X = (X_1, ···, X_r)^T be a fixed (nonstochastic) r-vector-valued input variate. The components
of the output vector Y will typically be continuous responses, and the
components of the input vector X may be indicator or “dummy” variables
that are set up by the researcher to identify known groupings of the data
associated with distinct subpopulations or experimental conditions.
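As a minimal illustration of such indicator inputs, the snippet below encodes a grouping factor as 0/1 "dummy" variables; the group labels and the number of groups are hypothetical.

```python
import numpy as np

# Hypothetical subpopulation label for each of n = 5 cases (3 groups)
groups = np.array([0, 1, 2, 1, 0])

# Row j of X holds the indicator ("dummy") variables identifying the
# group of case j: exactly one entry is 1, the rest are 0.
X = np.eye(3)[groups]
```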
Suppose we observe n replications,
(X_j^T, Y_j^T)^T, j = 1, 2, ..., n,
on the (r + s)-vector (X^T, Y^T)^T. We define an (r × n)-matrix 𝒳 and an (s × n)-matrix 𝒴 by 𝒳 = (X_1, ···, X_n), 𝒴 = (Y_1, ···, Y_n).
Form the mean vectors,
X̄ = n^{-1} ∑_{j=1}^{n} X_j,  Ȳ = n^{-1} ∑_{j=1}^{n} Y_j.
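The data-matrix and mean-vector constructions above can be written directly in NumPy. The dimensions and the randomly generated observations below are hypothetical; the point is only the column-wise stacking convention and the averaging.

```python
import numpy as np

# Hypothetical dimensions: r = 3 inputs, s = 2 outputs, n = 5 replications
rng = np.random.default_rng(1)
n, r, s = 5, 3, 2
X_obs = rng.standard_normal((n, r))   # row j holds X_j^T
Y_obs = rng.standard_normal((n, s))   # row j holds Y_j^T

# The (r x n) matrix (X_1, ..., X_n) and the (s x n) matrix (Y_1, ..., Y_n)
# stack the n replications as columns.
Xmat = X_obs.T
Ymat = Y_obs.T

# Mean vectors: Xbar = n^{-1} sum_j X_j, Ybar = n^{-1} sum_j Y_j
Xbar = Xmat.mean(axis=1)
Ybar = Ymat.mean(axis=1)
```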