Solutions to The Elements of Statistical Learning


Preface

If you find any errata or have a good idea, please contact me via tongust@163.com.

Ex. 8

Ex 8.1

First of all, we need the Kullback–Leibler divergence; here I give a brief derivation.
To prove that the KL divergence is nonnegative, we use Jensen's inequality: for a function $f$ that is convex on a convex set,

$$\int p(x)\,f(x)\,dx \ge f\!\left(\int p(x)\,x\,dx\right) \qquad (1)$$

In this case, we construct a simple convex function $f(x) = -\ln(x)$, which gives the following property:

$$-\int p(x)\ln(x)\,dx \ge -\ln\!\left(\int p(x)\,x\,dx\right) \qquad (2)$$

Substitute $x = \frac{q(x)}{p(x)}$ into (2):

$$-\int p(x)\ln\!\left(\frac{q(x)}{p(x)}\right)dx \ge -\ln\!\left(\int p(x)\,\frac{q(x)}{p(x)}\,dx\right) \qquad (3)$$

$$-\int p(x)\ln\!\left(\frac{q(x)}{p(x)}\right)dx \ge -\ln\!\left(\int q(x)\,dx\right) \qquad (4)$$

Since $q(x)$ is a distribution function, it is obvious that $\int q(x)\,dx = 1$, so the right-hand side of (4) is $-\ln 1 = 0$.
Therefore, we get the nonnegativity of the KL divergence:

$$D_{KL}(p\,\|\,q) = \int p(x)\ln\!\left(\frac{p(x)}{q(x)}\right)dx \ge 0 \qquad (5)$$

We have shown that (8.61) is maximized as a function of $r(y)$ when $r(y) = q(y)$; this is exactly the KL inequality (5). Now $R(\theta^*, \theta) = E\big[\ln \Pr(Z^m \mid Z, \theta^*) \,\big|\, Z, \theta\big]$, so taking $p(y) = \Pr(Z^m \mid Z, \theta)$ and $q(y) = \Pr(Z^m \mid Z, \theta^*)$ in (5) gives

$$R(\theta, \theta) - R(\theta^*, \theta) = D_{KL}\big(\Pr(Z^m \mid Z, \theta)\,\big\|\,\Pr(Z^m \mid Z, \theta^*)\big) \ge 0,$$

i.e. $R(\theta^*, \theta) \le R(\theta, \theta)$: as a function of its first argument, $R$ is maximized at $\theta^* = \theta$.
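The nonnegativity in (5) is easy to sanity-check numerically. Below is a minimal sketch for discrete distributions (NumPy assumed available; the helper name `kl` is my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Discrete form of D_KL(p || q) from eq. (5)."""
    return float(np.sum(p * np.log(p / q)))

# Random strictly positive distributions on 10 points.
for _ in range(1000):
    p = rng.random(10) + 1e-3; p /= p.sum()
    q = rng.random(10) + 1e-3; q /= q.sum()
    assert kl(p, q) >= 0.0      # nonnegativity, eq. (5)

# Equality holds iff p == q (up to floating point).
p = rng.random(10); p /= p.sum()
assert abs(kl(p, p)) < 1e-12
```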

Ex. 8.2

Off topic

Since this exercise is based on a paper [1], I would say that our lovely authors of ESL overestimated poor readers like me, expecting us to be excellent mathematicians. ;)

Proof

I will use the notation of [1] instead of ESL's, which I find a bit confusing. (Denote $Z^m$ by $y$.)
We want to prove: for a fixed value $\theta$, there is a unique distribution $\tilde P_\theta$, given by $\tilde P_\theta(y) = P(y \mid z, \theta)$, which maximizes the log-likelihood (8.48).
From the hint, we use a Lagrange multiplier and rewrite the objective:

$$L(P(y), \lambda) = \sum_{i=1}^{n} P(y_i)\ln P(z, y_i \mid \theta) \;-\; \sum_{i=1}^{n} P(y_i)\ln P(y_i) \;+\; \lambda\Big(1 - \sum_{i=1}^{n} P(y_i)\Big) \qquad (1)$$

To find the stationary points, we set the gradient of $L(P(y), \lambda)$ with respect to (W.R.T.) each $P(y_i)$, $i = 1, 2, \dots, n$, to zero:

$$\frac{\partial L}{\partial P(y_i)} = \ln P(z, y_i \mid \theta) - \ln P(y_i) - 1 - \lambda = 0 \qquad (2)$$

To simplify:

$$\ln P(y_i) = \ln P(z, y_i \mid \theta) - 1 - \lambda \qquad (3)$$

$$P(y_i) = \exp(-1 - \lambda)\,P(z, y_i \mid \theta), \qquad i = 1, 2, \dots, n \qquad (4)$$

From (4), it follows that $P(y)$ must be proportional to $P(z, y \mid \theta)$. We also know that $\sum_y P(y) = 1$.
Summing (4) over $y$, we see:

$$1 = \sum_y P(y) = \exp(-1 - \lambda) \sum_y P(z, y \mid \theta) \qquad (5)$$

$$\sum_y P(z, y \mid \theta) = P(z \mid \theta) = \frac{1}{\exp(-1 - \lambda)} \qquad (6)$$

$$\exp(-1 - \lambda) = \frac{1}{P(z \mid \theta)} \qquad (7)$$

Substitute (7) into (4):

$$P(y_i) = \frac{P(z, y_i \mid \theta)}{P(z \mid \theta)} = P(y_i \mid z, \theta) \qquad (8)$$
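The conclusion (8) can be checked numerically: for a toy joint $P(z, y_i \mid \theta)$, the posterior should beat every other distribution under the objective (8.48), and attain the value $\ln P(z \mid \theta)$. A minimal sketch (the names `joint`, `posterior`, and `F` are my own):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy joint P(z, y_i | theta) over n hidden values y_i for one observed z.
n = 6
joint = rng.random(n) + 1e-3          # P(z, y_i | theta); need not sum to 1
posterior = joint / joint.sum()       # P(y_i | z, theta), eq. (8)

def F(P):
    """Objective (8.48): E_P[ln P(z, y | theta)] - E_P[ln P(y)]."""
    return float(np.sum(P * np.log(joint)) - np.sum(P * np.log(P)))

# F at the posterior equals ln P(z|theta)...
assert np.isclose(F(posterior), np.log(joint.sum()))
# ...and no other distribution does better.
for _ in range(1000):
    P = rng.random(n) + 1e-3; P /= P.sum()
    assert F(P) <= F(posterior) + 1e-12
```

This works because $F(P) = \ln P(z \mid \theta) - D_{KL}(P \,\|\, P(\cdot \mid z, \theta))$, so the KL result of Ex. 8.1 makes the posterior the unique maximizer.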

Ex. 8.3

Ex. 8.4

Ex. 8.5

Ex. 8.6

Ex. 8.7

Proof that f(x) is non-decreasing under update (8.63)

From (8.62), $g(x, x^s) \le f(x)$ with equality at $x = x^s$, and (8.63) chooses $x^{s+1}$ to maximize $g(\cdot, x^s)$. Hence

$$f(x^{s+1}) \ge g(x^{s+1}, x^s) \ge g(x^s, x^s) = f(x^s) \qquad (1)$$

where the first inequality holds because $g$ minorizes $f$, and the second because $x^{s+1}$ maximizes $g(\cdot, x^s)$.

Proof that the EM algorithm (Sec. 8.5.2) is an example of an MM algorithm

This exercise requires us to show the following:

$$Q(\theta', \theta) + \log \Pr(Z \mid \theta) - Q(\theta, \theta) \;\le\; \log \Pr(Z \mid \theta') \qquad (2)$$

i.e. the left-hand side minorizes the observed-data log-likelihood, with equality at $\theta' = \theta$.

On one hand, from (8.46) we have:

$$\log \Pr(Z \mid \theta) = Q(\theta, \theta) - R(\theta, \theta) \qquad (3)$$

Hence, the left-hand side (l.h.s.) of (2) can be simplified as:

$$Q(\theta', \theta) + Q(\theta, \theta) - R(\theta, \theta) - Q(\theta, \theta) = Q(\theta', \theta) - R(\theta, \theta) \qquad (4)$$

On the other hand, also from (8.46), the r.h.s. of (2) can be written as:

$$\log \Pr(Z \mid \theta') = Q(\theta', \theta) - R(\theta', \theta) \qquad (5)$$

From Ex. 8.1, we see:

$$R(\theta', \theta) \le R(\theta, \theta) \qquad (6)$$

$$-R(\theta, \theta) \le -R(\theta', \theta) \qquad (7)$$

$$Q(\theta', \theta) - R(\theta, \theta) \le Q(\theta', \theta) - R(\theta', \theta) \qquad (8)$$

Combining (4), (5) and (8) gives (2), which completes the proof.
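The monotonicity that this minorization guarantees can be observed on a toy EM run. Below is a sketch for a two-component Gaussian mixture with unit variances (the data, initialization, and helper name `loglik` are my own choices, not from ESL), asserting that the log-likelihood never decreases across iterations:

```python
import numpy as np

rng = np.random.default_rng(2)

# Data from a two-component Gaussian mixture (unit variances).
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

def loglik(pi, m1, m2):
    """Observed-data log-likelihood log Pr(Z | theta)."""
    d1 = pi * np.exp(-0.5 * (x - m1) ** 2)
    d2 = (1 - pi) * np.exp(-0.5 * (x - m2) ** 2)
    return float(np.sum(np.log((d1 + d2) / np.sqrt(2 * np.pi))))

pi, m1, m2 = 0.5, -1.0, 1.0
ll = [loglik(pi, m1, m2)]
for _ in range(50):
    # E-step: responsibilities gamma_i = Pr(component 1 | x_i).
    d1 = pi * np.exp(-0.5 * (x - m1) ** 2)
    d2 = (1 - pi) * np.exp(-0.5 * (x - m2) ** 2)
    g = d1 / (d1 + d2)
    # M-step: maximize Q(theta', theta).
    pi = g.mean()
    m1 = np.sum(g * x) / np.sum(g)
    m2 = np.sum((1 - g) * x) / np.sum(1 - g)
    ll.append(loglik(pi, m1, m2))

# MM property: the log-likelihood is non-decreasing, as in (1).
assert all(b >= a - 1e-9 for a, b in zip(ll, ll[1:]))
```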

Reference

[1] Neal, Radford M., and G. E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. Learning in Graphical Models. Springer Netherlands, 2000:355-368.

Ex. 15 Random Forest

Ex 15.1

For bagging we have $B$ trees, and these trees are identically distributed but not i.i.d. (independent and identically distributed): the trees are correlated, with pairwise correlation $\rho = \frac{\mathrm{Cov}(x_i, x_j)}{\sigma^2}$. Since the $x_i$ are identically distributed, they share the same variance $\sigma^2$.

$$y = \frac{1}{B}\sum_{i=1}^{B} x_i, \qquad E[y] = E[x]$$

$$\mathrm{Var}[y] = E[y^2] - E^2[x] = \frac{1}{B^2}E\Big[\sum_{i=1}^{B} x_i^2 + \sum_{i \ne j} x_i x_j\Big] - E^2[x] = \frac{B}{B^2}E[x^2] + \frac{B^2 - B}{B^2}E_{i \ne j}[x_i x_j] - E^2[x] \qquad (1)$$

1. ρ=0

$$\rho = \frac{E\big[(x_i - E[x])(x_j - E[x])\big]}{\sigma^2} = \frac{E[x_i x_j] - E^2[x]}{\sigma^2} = 0 \;\Longrightarrow\; E_{i \ne j}[x_i x_j] = E^2[x] \qquad (2)$$

Substituting (2) into (1):

$$\mathrm{Var}[y] = \frac{1}{B^2}\Big(B\,E[x^2] + (B^2 - B)\,E^2[x] - B^2 E^2[x]\Big) = \frac{1}{B}\sigma^2$$

2. ρ>0

$$E[x_i x_j] = \rho\sigma^2 + E^2[x]$$

$$\mathrm{Var}[y] = \frac{B}{B^2}E[x^2] + \frac{B^2 - B}{B^2}\big(\rho\sigma^2 + E^2[x]\big) - E^2[x] = \frac{1}{B}\sigma^2 + \rho\sigma^2 - \frac{1}{B}\rho\sigma^2 = \rho\sigma^2 + \frac{1 - \rho}{B}\sigma^2$$
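The result $\mathrm{Var}[y] = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$ can be verified by Monte Carlo. A minimal sketch using equicorrelated Gaussians (the shared-factor construction and parameter values are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)

B, rho, sigma2, n_trials = 25, 0.4, 1.0, 200_000

# Equicorrelated variables: x_i = sqrt(rho)*z0 + sqrt(1-rho)*z_i
# gives Var[x_i] = 1 and Corr(x_i, x_j) = rho for i != j.
z0 = rng.standard_normal((n_trials, 1))
zi = rng.standard_normal((n_trials, B))
x = np.sqrt(rho) * z0 + np.sqrt(1 - rho) * zi

y = x.mean(axis=1)                    # the bagged average
empirical = y.var()
theoretical = rho * sigma2 + (1 - rho) / B * sigma2

assert abs(empirical - theoretical) < 0.01
```

Note that as $B \to \infty$ the variance approaches $\rho\sigma^2$, which is why random forests work to *decorrelate* the trees.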
