# [HITML] HIT Fall 2020 Machine Learning Lab 2 Report

GitHub repository

2020 Spring Semester

Lab 2 Report

Name · Student ID · Class No. · Email · Phone

# 2 Experimental Requirements and Environment

## 2.1 Requirements

1. Manually generate two classes of data (e.g., using Gaussian distributions) and use them to verify your algorithm. Examine what results you get when the class-conditional distributions do not satisfy the naive Bayes assumption.

2. Logistic regression is widely used, for example in advertisement prediction. Find a real dataset on the UCI website and test your implementation on it.

## 2.2 Environment

Windows 10, Python 3.8.5, Jupyter notebook

# 3 Theory

Let $\pi = P(Y=1)$, so that $1-\pi = P(Y=0)$. Under the naive Bayes assumption with Gaussian class-conditional densities $P(X_i\mid Y=k)\sim N(\mu_{ik},\sigma_i)$, Bayes' rule gives

$$P(Y=1\mid X)=\frac{1}{1+\exp\left(\ln\frac{1-\pi}{\pi}+\sum_i\ln\frac{P(X_i\mid Y=0)}{P(X_i\mid Y=1)}\right)}$$

Let

$$w_0=\ln\frac{1-\pi}{\pi}+\sum_i\frac{\mu_{i1}^2-\mu_{i0}^2}{2\sigma_i^2},\qquad w_i=\frac{\mu_{i0}-\mu_{i1}}{\sigma_i^2},$$

then we have:

$$P(Y=1\mid X)=\frac{1}{1+\exp\left(w_0+\sum_{i=1}^n w_iX_i\right)}$$
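To make the correspondence concrete, here is a small numerical check (the parameter values are arbitrary) that the weights $w_0, w_i$ defined above reproduce the Bayes posterior for a single Gaussian feature:

```python
import numpy as np

# Hypothetical parameters for one feature: class-conditional Gaussians with a shared variance.
pi, mu0, mu1, sigma = 0.5, -1.0, 1.0, 1.0

# Weights implied by the derivation above.
w0 = np.log((1 - pi) / pi) + (mu1**2 - mu0**2) / (2 * sigma**2)
w1 = (mu0 - mu1) / sigma**2

def gaussian(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

x = 0.3
# Direct application of Bayes' rule ...
p_bayes = pi * gaussian(x, mu1, sigma) / (
    pi * gaussian(x, mu1, sigma) + (1 - pi) * gaussian(x, mu0, sigma))
# ... agrees with the logistic form 1 / (1 + exp(w0 + w1 * x)).
p_logistic = 1 / (1 + np.exp(w0 + w1 * x))
assert abs(p_bayes - p_logistic) < 1e-12
```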

Logistic regression adopts this functional form directly as its model. Following the usual parameterization (the sign of $W$ can be absorbed, which merely swaps which class the exponent favors):

$$P(Y=0\mid X,W)=\frac{1}{1+\exp\left(w_0+\sum_{i=1}^n w_iX_i\right)},\qquad P(Y=1\mid X,W)=\frac{\exp\left(w_0+\sum_{i=1}^n w_iX_i\right)}{1+\exp\left(w_0+\sum_{i=1}^n w_iX_i\right)}$$

Maximizing the conditional likelihood of the training labels gives the objective

$$\begin{aligned}l(W)&=\sum_l Y^l\ln P(Y^l=1\mid X^l,W)+(1-Y^l)\ln P(Y^l=0\mid X^l,W)\\&=\sum_l Y^l\ln\frac{P(Y^l=1\mid X^l,W)}{P(Y^l=0\mid X^l,W)}+\ln P(Y^l=0\mid X^l,W)\\&=\sum_l Y^l\left(w_0+\sum_{i=1}^n w_iX_i^l\right)-\ln\left(1+\exp\left(w_0+\sum_{i=1}^n w_iX_i^l\right)\right)\end{aligned}\tag{1}$$

Differentiating (1) and ascending the gradient:

$$\frac{\partial\,l(W)}{\partial\,w_i}=\sum_l X_i^l\left(Y^l-\frac{1}{1+\exp\left(-\left(w_0+\sum_{i=1}^n w_iX_i^l\right)\right)}\right)\tag{2}$$

$$w_i = w_i + \eta\sum_l X_i^l\left(Y^l-\frac{1}{1+\exp\left(-\left(w_0+\sum_{i=1}^n w_iX_i^l\right)\right)}\right)$$

In vector form:

$$\pmb W=\pmb W+\eta\sum_l \pmb X^l\left(Y^l-\frac{1}{1+\exp\left(-\pmb W\pmb X^l\right)}\right)$$

Adding an $L_2$ penalty term (equivalently, placing a Gaussian prior on $W$) changes the update to

$$w_i = w_i - \eta\lambda w_i + \eta\sum_l X_i^l\left(Y^l-\frac{1}{1+\exp\left(-\left(w_0+\sum_{i=1}^n w_iX_i^l\right)\right)}\right)$$

$$\pmb W=\pmb W-\eta\lambda\pmb W+\eta\sum_l \pmb X^l\left(Y^l-\frac{1}{1+\exp\left(-\pmb W\pmb X^l\right)}\right)$$

Normalizing by the sample count $l$ keeps the step size insensitive to the dataset size:

$$l(W)=\frac{1}{l}\sum_l\left[Y^l\left(w_0+\sum_{i=1}^n w_iX_i^l\right)-\ln\left(1+\exp\left(w_0+\sum_{i=1}^n w_iX_i^l\right)\right)\right]$$

$$\frac{\partial\,l(W)}{\partial\,w_i}=\frac{1}{l}\sum_l X_i^l\left(Y^l-\frac{1}{1+\exp\left(-\left(w_0+\sum_{i=1}^n w_iX_i^l\right)\right)}\right)$$

$$w_i = w_i + \frac{\eta}{l}\sum_l X_i^l\left(Y^l-\frac{1}{1+\exp\left(-\left(w_0+\sum_{i=1}^n w_iX_i^l\right)\right)}\right),\qquad \pmb W=\pmb W+\frac{\eta}{l}\sum_l \pmb X^l\left(Y^l-\frac{1}{1+\exp\left(-\pmb W\pmb X^l\right)}\right)$$

With the penalty term, the normalized updates become

$$w_i = w_i - \eta\lambda w_i + \frac{\eta}{l}\sum_l X_i^l\left(Y^l-\frac{1}{1+\exp\left(-\left(w_0+\sum_{i=1}^n w_iX_i^l\right)\right)}\right),\qquad \pmb W=\pmb W-\eta\lambda\pmb W+\frac{\eta}{l}\sum_l \pmb X^l\left(Y^l-\frac{1}{1+\exp\left(-\pmb W\pmb X^l\right)}\right)$$
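The gradient formula can be sanity-checked numerically against a finite-difference approximation of the log-likelihood. A minimal sketch on synthetic data (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                  # 20 samples, 3 features (bias column omitted for brevity)
y = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)

def log_likelihood(w):
    """l(W) = sum_l [ Y^l z^l - ln(1 + exp(z^l)) ] with z = X w."""
    z = X @ w
    return np.sum(y * z - np.log(1 + np.exp(z)))

# Analytic gradient: sum_l X^l (Y^l - 1 / (1 + exp(-W X^l)))
grad = X.T @ (y - 1 / (1 + np.exp(-(X @ w))))

# Central finite differences on each coordinate.
eps = 1e-6
fd = np.zeros(3)
for i in range(3):
    e = np.zeros(3); e[i] = eps
    fd[i] = (log_likelihood(w + e) - log_likelihood(w - e)) / (2 * eps)

assert np.allclose(grad, fd, atol=1e-4)
```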

```python
import numpy as np

def loss(train_x, train_y, w, lamda):
    """
    Normalized negative conditional log-likelihood (MCLE) loss.
    Note that the penalty term lamda is not included in the reported loss.
    """
    size = train_x.shape[0]
    W_dot_X = np.zeros((size, 1))
    ln_part = 0
    for i in range(size):
        W_dot_X[i] = w @ train_x[i].T
        ln_part += np.log(1 + np.exp(W_dot_X[i]))
    loss_mcle = train_y @ W_dot_X - ln_part
    return -loss_mcle / size
```
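One practical caveat: `np.log(1 + np.exp(z))` overflows once `z` exceeds roughly 709. Should that become an issue on other data, `np.logaddexp(0, z)` computes the same quantity stably; a quick check:

```python
import numpy as np

z = np.array([-1000.0, 0.0, 1000.0])
stable = np.logaddexp(0, z)   # log(exp(0) + exp(z)) = log(1 + exp(z)), no overflow
# stable stays finite even at z = 1000, where np.exp(z) alone would overflow.
```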

```python
def sigmoid(x):
    """Element-wise logistic function."""
    return 1 / (1 + np.exp(-x))

def gradient_descent(train_x, train_y, lamda, eta, epsilon, times=100000):
    """
    Gradient descent with backtracking on the step size.
    :param eta: step size
    :param epsilon: precision; iteration stops once the loss decrease falls below it
    :param times: maximum number of iterations
    :return: coefficient array of the decision boundary, and w
    """
    size = train_x.shape[0]
    dimension = train_x.shape[1]
    X = np.ones((size, dimension + 1))  # build X with a leading column of ones so that w absorbs the bias
    X[:, 1:dimension + 1] = train_x
    w = np.ones((1, X.shape[1]))
    new_loss = loss(X, train_y, w, lamda)
    for i in range(times):
        old_loss = new_loss
        t = np.zeros((size, 1))
        for j in range(size):
            t[j] = w @ X[j].T
        gradient_w = -(train_y - sigmoid(t.T)) @ X / size
        old_w = w
        w = w - eta * lamda * w - eta * gradient_w
        new_loss = loss(X, train_y, w, lamda)
        if old_loss < new_loss:  # loss stopped decreasing: the step size is too large, so backtrack and halve it
            w = old_w
            eta /= 2
            continue
        if old_loss - new_loss < epsilon:
            break
    print(i)  # number of iterations actually used
    w = w.reshape(dimension + 1)  # w comes out as a 1x(d+1) matrix; flatten it to a vector
    coefficient = -(w / w[dimension])[0:dimension]  # normalize w to obtain the boundary equation coefficients
    return coefficient, w
```
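For reference, an end-to-end sketch of the experiment in 2.1: two Gaussian classes satisfying the naive Bayes assumption, trained with a vectorized form of the normalized, regularized update derived above. It is standalone, so it re-implements the update rather than calling `gradient_descent`, and the class means, scales, and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Two Gaussian classes with diagonal covariance (naive Bayes assumption holds).
x0 = rng.normal(loc=[-1.0, -1.0], scale=0.8, size=(n, 2))
x1 = rng.normal(loc=[1.0, 1.0], scale=0.8, size=(n, 2))
X = np.hstack([np.ones((2 * n, 1)), np.vstack([x0, x1])])  # prepend bias column
y = np.hstack([np.zeros(n), np.ones(n)])

w = np.zeros(3)
eta, lamda = 0.5, 1e-4
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))                           # P(Y=1 | X, W)
    w = w - eta * lamda * w + eta * X.T @ (y - p) / len(y)   # regularized, normalized ascent step

accuracy = np.mean(((X @ w) > 0) == y)
```

With well-separated means like these, the training accuracy should land in the mid-to-high nineties, in line with the results reported in section 4.1.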


# 4 Analysis of Experimental Results

## 4.1 Generated data

### 4.1.1 With penalty, naive Bayes assumption satisfied

Discriminant function: $y = -1.668x - 0.02619$

• Train data accuracy: 1.0

• Test data accuracy: 0.9675

### 4.1.2 With penalty, naive Bayes assumption violated

Discriminant function: $y = -2.299x - 0.0379$

• Train data accuracy: 0.97

• Test data accuracy: 0.9475

### 4.1.3 Without penalty, naive Bayes assumption satisfied

Discriminant function: $y = -2.128x + 0.04361$

• Train data accuracy: 0.96

• Test data accuracy: 0.9625

### 4.1.4 Without penalty, naive Bayes assumption violated

Discriminant function: $y = -1.67x - 0.1875$

• Train data accuracy: 0.97

• Test data accuracy: 0.9525

## 4.2 UCI datasets

### 4.2.1 Skin Segmentation Data Set

• Train data accuracy: 0.946

• Test data accuracy: 0.9348

### 4.2.2 Haberman’s Survival Data Set

• Train data accuracy: 0.7582

• Test data accuracy: 0.7581

# 5 Conclusions

1. Logistic regression does not model the distribution of the data; that is, it knows nothing about the data's specific distribution and instead solves for the separating hyperplane directly from the available samples.
2. Logistic regression handles linearly separable classification problems well and converges quickly; on the real datasets here it typically reached a solution within a few hundred iterations.
3. When the dataset is large, the regularization term has little effect on the result; when the dataset is small, it effectively mitigates overfitting.
4. The results show that classification performs slightly better when the class-conditional distributions satisfy the naive Bayes assumption than when they do not.