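1. [Decision tree] Using information gain, build a decision tree for the dataset below and describe the process.
The data come from a classification task on fitting contact lenses. The dataset has 4 attributes: age, astigmatism and tear-prod-rate are the input features, and contact-lenses is the decision attribute (classes soft, hard and none).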
First feature
Consider the information gain of attribute $a$ on dataset $D$:
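$$G(D,a)=H(D)-\sum_{v=1}^{V}\frac{|D^v|}{|D|}H(D^v)$$
where $H(D)=-\sum_{k}p_k\log_2 p_k$ is the entropy of the class distribution of $D$ ($p_k$ is the fraction of samples in class $k$), and $D^v$ is the subset of $D$ whose value of attribute $a$ is the $v$-th of its $V$ values.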
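$H(D)$ is fixed once the dataset is given, so only the second term $\sum_{v=1}^{V}\frac{|D^v|}{|D|}H(D^v)$ needs to be compared across attributes: the attribute with the smallest value of this term has the largest information gain.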
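The formula can be evaluated directly from per-value class counts. Below is a minimal Python sketch, assuming each attribute is summarized as a mapping from attribute value to a (soft, hard, none) count tuple; the helper names `entropy`, `weighted_entropy` and `info_gain` are hypothetical and only for illustration.

```python
import math

def entropy(counts):
    """H(D) in bits for a tuple of class counts; empty classes contribute 0."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(branches):
    """Second term of G(D, a): sum over v of |D^v| / |D| * H(D^v).

    `branches` maps each value v of attribute a to the class counts of D^v.
    """
    n = sum(sum(counts) for counts in branches.values())
    return sum(sum(counts) / n * entropy(counts) for counts in branches.values())

def info_gain(branches):
    """G(D, a) = H(D) - weighted_entropy(branches)."""
    # Class counts of the whole set D are the column sums over all branches.
    pooled = tuple(sum(col) for col in zip(*branches.values()))
    return entropy(pooled) - weighted_entropy(branches)

# Counts ordered (soft, hard, none), copied from the age table.
age = {"young": (1, 1, 1), "pre-presbyopic": (1, 1, 3), "presbyopic": (0, 1, 3)}
print(f"weighted entropy: {weighted_entropy(age):.3f}")  # weighted entropy: 1.238
print(f"information gain: {info_gain(age):.3f}")
```

Because `info_gain` subtracts the same $H(D)$ for every attribute, ranking attributes by the smallest `weighted_entropy` selects the same attribute as ranking by the largest gain.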
First, compute this term for each of the three features.
- Age
age | soft | hard | none | sum |
---|---|---|---|---|
young | 1 | 1 | 1 | 3 |
pre-presbyopic | 1 | 1 | 3 | 5 |
presbyopic | 0 | 1 | 3 | 4 |

Applying the formula gives:
$$
\begin{aligned}
\text{age} = &-\Big[\frac{3}{12}\Big(\frac{1}{3}\log_2\frac{1}{3}+\frac{1}{3}\log_2\frac{1}{3}+\frac{1}{3}\log_2\frac{1}{3}\Big)\\
&+\frac{5}{12}\Big(\frac{1}{5}\log_2\frac{1}{5}+\frac{1}{5}\log_2\frac{1}{5}+\frac{3}{5}\log_2\frac{3}{5}\Big)\\
&+\frac{4}{12}\Big(\frac{1}{4}\log_2\frac{1}{4}+\frac{3}{4}\log_2\frac{3}{4}\Big)\Big] \approx 1.238
\end{aligned}
$$
- Astigmatism
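astigmatism | soft | hard | none | sum |
---|---|---|---|---|
yes | 0 | 3 | 4 | 7 |
no | 2 | 0 | 3 | 5 |

Substituting into the formula:
$$\text{astigmatism} = -\Big[\frac{7}{12}\Big(\frac{3}{7}\log_2\frac{3}{7}+\frac{4}{7}\log_2\frac{4}{7}\Big)+\frac{5}{12}\Big(\frac{2}{5}\log_2\frac{2}{5}+\frac{3}{5}\log_2\frac{3}{5}\Big)\Big] \approx 0.979$$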
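- Tear production rate
tear-prod-rate | soft | hard | none | sum |
---|---|---|---|---|
reduced | 0 | 0 | 4 | 4 |
normal | 2 | 3 | 3 | 8 |

Substituting into the formula (the reduced branch contains only one class, so its entropy is 0):
$$\text{tear\_production\_rate} = \frac{4}{12}\cdot 0-\frac{8}{12}\Big(\frac{2}{8}\log_2\frac{2}{8}+\frac{3}{8}\log_2\frac{3}{8}+\frac{3}{8}\log_2\frac{3}{8}\Big) \approx 1.041$$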
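Astigmatism gives the smallest weighted entropy (0.979 < 1.041 < 1.238), i.e. the largest information gain, so it is chosen as the first split.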
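A quick numerical check of the three root-level values, as a self-contained Python sketch (helper names are hypothetical). It assumes the counts from the tables above, with the astigmatism = no branch counted as 2 soft, 0 hard and 3 none, which are the column totals of the astigmatism = no tables in the second step.

```python
import math

def entropy(counts):
    """Entropy in bits of a tuple of class counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(branches):
    """sum_v |D^v| / |D| * H(D^v) for one attribute's value -> counts mapping."""
    n = sum(sum(c) for c in branches.values())
    return sum(sum(c) / n * entropy(c) for c in branches.values())

# Root-level counts per attribute value, ordered (soft, hard, none).
root = {
    "age": {"young": (1, 1, 1), "pre-presbyopic": (1, 1, 3), "presbyopic": (0, 1, 3)},
    "astigmatism": {"yes": (0, 3, 4), "no": (2, 0, 3)},
    "tear-prod-rate": {"reduced": (0, 0, 4), "normal": (2, 3, 3)},
}

scores = {attr: weighted_entropy(b) for attr, b in root.items()}
for attr, score in scores.items():
    print(f"{attr}: {score:.3f}")
# prints age: 1.238, astigmatism: 0.979, tear-prod-rate: 1.041 (one per line)

# Smallest weighted entropy == largest information gain, since H(D) is shared.
best = min(scores, key=scores.get)
print("root split:", best)  # root split: astigmatism
```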
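Second feature
Next, consider the remaining features within each astigmatism branch.
First, the input features for astigmatism = yes (7 samples):
age | soft | hard | none | sum |
---|---|---|---|---|
young | 0 | 1 | 1 | 2 |
pre-presbyopic | 0 | 1 | 2 | 3 |
presbyopic | 0 | 1 | 1 | 2 |

tear-prod-rate | soft | hard | none | sum |
---|---|---|---|---|
reduced | 0 | 0 | 2 | 2 |
normal | 0 | 3 | 2 | 5 |

The weighted entropies are: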
$$
\begin{aligned}
\text{age} = &-\Big[\frac{2}{7}\Big(\frac{1}{2}\log_2\frac{1}{2}+\frac{1}{2}\log_2\frac{1}{2}\Big)\\
&+\frac{3}{7}\Big(\frac{1}{3}\log_2\frac{1}{3}+\frac{2}{3}\log_2\frac{2}{3}\Big)\\
&+\frac{2}{7}\Big(\frac{1}{2}\log_2\frac{1}{2}+\frac{1}{2}\log_2\frac{1}{2}\Big)\Big] \approx 0.965
\end{aligned}
$$
$$\text{tear\_production\_rate} = \frac{2}{7}\cdot 0-\frac{5}{7}\Big(\frac{3}{5}\log_2\frac{3}{5}+\frac{2}{5}\log_2\frac{2}{5}\Big) \approx 0.694$$
**Under astigmatism = yes, choose tear-prod-rate (0.694 < 0.965).**
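Next, the astigmatism = no branch (5 samples):
age | soft | hard | none | sum |
---|---|---|---|---|
young | 1 | 0 | 0 | 1 |
pre-presbyopic | 1 | 0 | 1 | 2 |
presbyopic | 0 | 0 | 2 | 2 |

tear-prod-rate | soft | hard | none | sum |
---|---|---|---|---|
reduced | 0 | 0 | 2 | 2 |
normal | 2 | 0 | 1 | 3 |

The weighted entropies are: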
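$$\text{age} = \frac{1}{5}\cdot 0-\frac{2}{5}\Big(\frac{1}{2}\log_2\frac{1}{2}+\frac{1}{2}\log_2\frac{1}{2}\Big)+\frac{2}{5}\cdot 0 = 0.4$$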
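$$\text{tear\_production\_rate} = \frac{2}{5}\cdot 0-\frac{3}{5}\Big(\frac{2}{3}\log_2\frac{2}{3}+\frac{1}{3}\log_2\frac{1}{3}\Big) \approx 0.551$$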
Under astigmatism = no, choose age (0.4 < 0.551).
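The second-level choices can also be checked numerically. A self-contained Python sketch (helper names hypothetical) that takes the counts from the four branch tables and picks, inside each astigmatism branch, the attribute with the smallest weighted entropy:

```python
import math

def entropy(counts):
    """Entropy in bits of a tuple of class counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(branches):
    """sum_v |D^v| / |D| * H(D^v) for one attribute's value -> counts mapping."""
    n = sum(sum(c) for c in branches.values())
    return sum(sum(c) / n * entropy(c) for c in branches.values())

# Second-level counts (soft, hard, none) inside each astigmatism branch.
subsets = {
    "astigmatism = yes": {
        "age": {"young": (0, 1, 1), "pre-presbyopic": (0, 1, 2), "presbyopic": (0, 1, 1)},
        "tear-prod-rate": {"reduced": (0, 0, 2), "normal": (0, 3, 2)},
    },
    "astigmatism = no": {
        "age": {"young": (1, 0, 0), "pre-presbyopic": (1, 0, 1), "presbyopic": (0, 0, 2)},
        "tear-prod-rate": {"reduced": (0, 0, 2), "normal": (2, 0, 1)},
    },
}

choice = {}
for branch, features in subsets.items():
    scores = {attr: weighted_entropy(b) for attr, b in features.items()}
    choice[branch] = min(scores, key=scores.get)  # smallest term == largest gain
    print(branch, {a: round(s, 3) for a, s in scores.items()}, "->", choice[branch])
```

The yes branch should report age 0.965 and tear-prod-rate 0.694, and the no branch age 0.4 and tear-prod-rate 0.551, matching the hand calculations.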
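This yields a decision tree with astigmatism at the root, tear-prod-rate under the yes branch, and age under the no branch.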
2. [Linear classification] Show that the following logit function and logistic function are equivalent:
$$p(X)=\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}\qquad \frac{p(X)}{1-p(X)}=e^{\beta_0+\beta_1X}$$
Substitute: let $f(X)=\frac{p(X)}{1-p(X)}$. From $f(X)\big(1-p(X)\big)=p(X)$ it follows that $p(X)=\frac{f(X)}{1+f(X)}$.
Substituting the logistic form into $f(X)$:
$$
\begin{aligned}
\frac{p(X)}{1-p(X)}=f(X) &= \frac{\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}}{1-\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}}\\
&= \frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}-e^{\beta_0+\beta_1X}}\\
&= e^{\beta_0+\beta_1X}
\end{aligned}
$$
Conversely, substituting $f(X)=e^{\beta_0+\beta_1X}$ back:
$$p(X)=\frac{f(X)}{1+f(X)}=\frac{e^{\beta_0+\beta_1X}}{1+e^{\beta_0+\beta_1X}}$$
Hence the two forms are equivalent.
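A numerical spot check of both directions in Python. The coefficients `b0`, `b1` and the sample points are arbitrary values assumed only for illustration, and the helper names `logistic`, `odds` and `prob_from_odds` are hypothetical.

```python
import math

def logistic(x, b0, b1):
    """Logistic form: p(X) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

def odds(p):
    """f = p / (1 - p); the logit form says this equals e^(b0 + b1*X)."""
    return p / (1 - p)

def prob_from_odds(f):
    """Inverse of odds(): p = f / (1 + f)."""
    return f / (1 + f)

# Arbitrary coefficients and inputs, chosen only for this check.
b0, b1 = -1.5, 0.8
for x in (-2.0, 0.0, 1.0, 3.5):
    p = logistic(x, b0, b1)
    ez = math.exp(b0 + b1 * x)
    assert math.isclose(odds(p), ez)            # logistic form implies logit form
    assert math.isclose(prob_from_odds(ez), p)  # logit form implies logistic form
print("both directions agree on all sample points")
```

A floating-point check like this only illustrates the identity at a few points; the algebra above is what proves it for all $X$.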