The Elements of Statistical Learning (ESL): Exercise 2.4

秋海棠的歌声

Article link: https://blog.csdn.net/zejianli/article/details/53886917

Problem:

The edge effect problem discussed on page 23 is not peculiar to uniform sampling from bounded domains. Consider inputs drawn from a spherical multinormal distribution $X \sim N(0, I_p)$. The squared distance from any sample point to the origin has a $\chi^2_p$ distribution with mean $p$. Consider a prediction point $x_0$ drawn from this distribution, and let $a=x_0/\|x_0\|$ be an associated unit vector. Let $z_i=a^T x_i$ be the projection of each of the training points on this direction.
Show that the $z_i$ are distributed $N(0,1)$ with expected squared distance from the origin $1$, while the target point has expected squared distance $p$ from the origin. Hence for $p = 10$, a randomly drawn test point is about $3.1$ standard deviations from the origin, while all the training points are on average one standard deviation along direction $a$. So most prediction points see themselves as lying on the edge of the training set.

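In other words, the edge effect discussed on page 23 is not specific to uniform sampling from a bounded domain. Suppose the inputs are drawn from a spherical multivariate normal distribution $X \sim N(0, I_p)$. Then the squared distance from any sample point to the origin follows a chi-squared distribution with $p$ degrees of freedom, whose mean is $p$. Let $x_0$ be a prediction point drawn from this distribution, let $a=x_0/\|x_0\|$ be the unit vector in the direction of $x_0$, and let $z_i=a^T x_i$ be the projection of each training point onto the direction $a$.

The task is to show that $z_i$ follows the standard normal distribution $N(0,1)$, so its expected squared distance from the origin is $1$, whereas the expected squared distance of $x_0$ from the origin is $p$. For $p=10$, a randomly drawn test point is therefore about $3.1$ standard deviations from the origin, while the training points are on average only one standard deviation from the origin along the direction $a$. So from the point of view of most test points, they lie on the edge of the training set.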
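The claim can be checked numerically before proving it. The following is a minimal simulation sketch, not part of the original exercise: the function name `edge_effect_demo`, the sample size, and the seed are illustrative assumptions. It draws training points from $N(0, I_p)$, projects them onto the direction of one test point, and compares the second moment of the projections with the mean squared norm of the training points.

```python
import numpy as np


def edge_effect_demo(p=10, n_train=100_000, seed=0):
    """Simulate the setup of the exercise (hypothetical helper, for illustration)."""
    rng = np.random.default_rng(seed)

    # Training inputs x_i ~ N(0, I_p), one row per point.
    x_train = rng.standard_normal((n_train, p))

    # One prediction point x_0 from the same distribution, and its unit vector a.
    x0 = rng.standard_normal(p)
    a = x0 / np.linalg.norm(x0)

    # Projections z_i = a^T x_i of every training point onto direction a.
    z = x_train @ a

    return {
        "z_mean": float(z.mean()),                      # should be close to 0
        "z_second_moment": float(np.mean(z ** 2)),      # should be close to 1
        "mean_sq_norm": float(np.mean(np.sum(x_train ** 2, axis=1))),  # close to p
        "sqrt_p": float(np.sqrt(p)),                    # sqrt(10) is about 3.16
    }


stats = edge_effect_demo()
print(stats)
```

With these settings the projected training points behave like draws from $N(0,1)$, while the mean squared distance of a full $p$-dimensional point from the origin is close to $p$, which is the gap the exercise asks about.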

Approach:

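First, a note on one point in the first part of the problem. For any random vector $x_i$ drawn from $N(0, I_p)$, its squared distance to the origin is $\|x_i-0\|^2=\sum_{j=1}^p x_{ij}^2$. Because the covariance matrix is $I_p$, any two components of the vector are uncorrelated, and each component has mean $0$ and variance $1$. For a multivariate normal distribution, uncorrelated components are also independent. The squared distance is therefore the sum of squares of $p$ independent standard normal random variables, which by definition follows a chi-squared distribution with $p$ degrees of freedom.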
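The stated mean $p$ can also be obtained directly, without using properties of the chi-squared distribution. This is a short worked step that uses only $E[x_{ij}] = 0$ and $\mathrm{Var}(x_{ij}) = 1$:

$$
E\|x_i\|^2 = \sum_{j=1}^p E\left[x_{ij}^2\right] = \sum_{j=1}^p \left(\mathrm{Var}(x_{ij}) + \left(E[x_{ij}]\right)^2\right) = \sum_{j=1}^p (1 + 0) = p .
$$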