Linear Regression
- Hypothesis (fitted) function
- Error term: assumed to follow a Gaussian (normal) distribution
- True value = predicted value + error
Likelihood function
1. Used to derive the parameters from the samples
2. A function measuring how plausible the parameters are given the samples
3. The larger its value, the better (a small numeric sketch follows below)
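To make the likelihood idea concrete: under Gaussian errors, maximizing the log-likelihood of a linear model is the same as minimizing the squared error. A minimal sketch with toy data (all names hypothetical):

import numpy as np
# toy data: y = 2x + 1 + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(0, 1, size=50)
def gaussian_log_likelihood(w, b, sigma=1.0):
    # log p(y | x; w, b) when y = w*x + b + eps, eps ~ N(0, sigma^2)
    resid = y - (w * x + b)
    n = len(y)
    return -n/2 * np.log(2 * np.pi * sigma**2) - np.sum(resid**2) / (2 * sigma**2)
# the first term does not depend on (w, b), so a larger likelihood
# means exactly a smaller sum of squared residuals, i.e. least squares
print(gaussian_log_likelihood(2.0, 1.0))  # near the true parameters: larger
print(gaussian_log_likelihood(0.0, 0.0))  # far from them: much smaller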
Logistic Regression via Gradient Descent
Gradient descent variants:
1. Batch gradient descent
2. Stochastic gradient descent
3. Mini-batch gradient descent (most commonly used)
Learning rate (step size):
the hyperparameter tuned most often; a toy update example follows below
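A purely illustrative example of the update rule: minimizing f(w) = w² by repeatedly stepping against the gradient, with the learning rate alpha scaling each step.

# minimize f(w) = w**2, whose gradient is f'(w) = 2*w
w = 10.0
alpha = 0.1  # learning rate (step size)
for _ in range(5):
    w = w - alpha * (2 * w)  # one gradient descent update
    print(w)  # 8.0, 6.4, 5.12, 4.096, 3.2768 -> heading toward the minimum at 0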
Logistic Regression
- A binary classification algorithm
- The dataset may be non-linear
- Can also be extended to multi-class problems
- For classification tasks it is common to try logistic regression first as a baseline, then move to other models
Sigmoid function
- maps a raw value to a probability
$$g(z) = \frac{1}{1+e^{-z}}$$
Softmax (for multi-class classification; a sketch follows below)
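Softmax is not implemented later in this post, so as a reference, here is a minimal numerically stable sketch (names are illustrative):

import numpy as np
def softmax(z):
    # subtracting the max leaves the result unchanged but avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()
print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities that sum to 1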
Code example:
Logistic Regression
The data
We will build a logistic regression model to predict whether a student gets admitted to a university. Suppose you are the administrator of a university department and you want to determine each applicant's chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set. For each training example, you have the applicant's scores on the two exams and the admission decision. To do this, we will build a classification model that estimates the probability of admission based on the exam scores.
# the three essentials: numpy, pandas, matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import os
print(os.sep)
path = 'data' + os.sep +'data48267/' +'LogiReg_data.txt'
pdData = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
pdData.head()
/
  | Exam 1 | Exam 2 | Admitted |
---|---|---|---|
0 | 34.623660 | 78.024693 | 0 |
1 | 30.286711 | 43.894998 | 0 |
2 | 35.847409 | 72.902198 | 0 |
3 | 60.182599 | 86.308552 | 1 |
4 | 79.032736 | 75.344376 | 1 |
pdData.shape
(100, 3)
positive = pdData[pdData['Admitted'] == 1] # the subset of rows such that Admitted == 1, i.e. the set of *positive* examples
negative = pdData[pdData['Admitted'] == 0] # the subset of rows such that Admitted == 0, i.e. the set of *negative* examples
positive
  | Exam 1 | Exam 2 | Admitted |
---|---|---|---|
3 | 60.182599 | 86.308552 | 1 |
4 | 79.032736 | 75.344376 | 1 |
6 | 61.106665 | 96.511426 | 1 |
7 | 75.024746 | 46.554014 | 1 |
8 | 76.098787 | 87.420570 | 1 |
9 | 84.432820 | 43.533393 | 1 |
12 | 82.307053 | 76.481963 | 1 |
13 | 69.364589 | 97.718692 | 1 |
15 | 53.971052 | 89.207350 | 1 |
16 | 69.070144 | 52.740470 | 1 |
18 | 70.661510 | 92.927138 | 1 |
19 | 76.978784 | 47.575964 | 1 |
21 | 89.676776 | 65.799366 | 1 |
24 | 77.924091 | 68.972360 | 1 |
25 | 62.271014 | 69.954458 | 1 |
26 | 80.190181 | 44.821629 | 1 |
30 | 61.379289 | 72.807887 | 1 |
31 | 85.404519 | 57.051984 | 1 |
33 | 52.045405 | 69.432860 | 1 |
37 | 64.176989 | 80.908061 | 1 |
40 | 83.902394 | 56.308046 | 1 |
42 | 94.443368 | 65.568922 | 1 |
46 | 77.193035 | 70.458200 | 1 |
47 | 97.771599 | 86.727822 | 1 |
48 | 62.073064 | 96.768824 | 1 |
49 | 91.564974 | 88.696293 | 1 |
50 | 79.944818 | 74.163119 | 1 |
51 | 99.272527 | 60.999031 | 1 |
52 | 90.546714 | 43.390602 | 1 |
56 | 97.645634 | 68.861573 | 1 |
58 | 74.248691 | 69.824571 | 1 |
59 | 71.796462 | 78.453562 | 1 |
60 | 75.395611 | 85.759937 | 1 |
66 | 40.457551 | 97.535185 | 1 |
68 | 80.279574 | 92.116061 | 1 |
69 | 66.746719 | 60.991394 | 1 |
71 | 64.039320 | 78.031688 | 1 |
72 | 72.346494 | 96.227593 | 1 |
73 | 60.457886 | 73.094998 | 1 |
74 | 58.840956 | 75.858448 | 1 |
75 | 99.827858 | 72.369252 | 1 |
76 | 47.264269 | 88.475865 | 1 |
77 | 50.458160 | 75.809860 | 1 |
80 | 88.913896 | 69.803789 | 1 |
81 | 94.834507 | 45.694307 | 1 |
82 | 67.319257 | 66.589353 | 1 |
83 | 57.238706 | 59.514282 | 1 |
84 | 80.366756 | 90.960148 | 1 |
85 | 68.468522 | 85.594307 | 1 |
87 | 75.477702 | 90.424539 | 1 |
88 | 78.635424 | 96.647427 | 1 |
90 | 94.094331 | 77.159105 | 1 |
91 | 90.448551 | 87.508792 | 1 |
93 | 74.492692 | 84.845137 | 1 |
94 | 89.845807 | 45.358284 | 1 |
95 | 83.489163 | 48.380286 | 1 |
96 | 42.261701 | 87.103851 | 1 |
97 | 99.315009 | 68.775409 | 1 |
98 | 55.340018 | 64.931938 | 1 |
99 | 74.775893 | 89.529813 | 1 |
negative
  | Exam 1 | Exam 2 | Admitted |
---|---|---|---|
0 | 34.623660 | 78.024693 | 0 |
1 | 30.286711 | 43.894998 | 0 |
2 | 35.847409 | 72.902198 | 0 |
5 | 45.083277 | 56.316372 | 0 |
10 | 95.861555 | 38.225278 | 0 |
11 | 75.013658 | 30.603263 | 0 |
14 | 39.538339 | 76.036811 | 0 |
17 | 67.946855 | 46.678574 | 0 |
20 | 67.372028 | 42.838438 | 0 |
22 | 50.534788 | 48.855812 | 0 |
23 | 34.212061 | 44.209529 | 0 |
27 | 93.114389 | 38.800670 | 0 |
28 | 61.830206 | 50.256108 | 0 |
29 | 38.785804 | 64.995681 | 0 |
32 | 52.107980 | 63.127624 | 0 |
34 | 40.236894 | 71.167748 | 0 |
35 | 54.635106 | 52.213886 | 0 |
36 | 33.915500 | 98.869436 | 0 |
38 | 74.789253 | 41.573415 | 0 |
39 | 34.183640 | 75.237720 | 0 |
41 | 51.547720 | 46.856290 | 0 |
43 | 82.368754 | 40.618255 | 0 |
44 | 51.047752 | 45.822701 | 0 |
45 | 62.222676 | 52.060992 | 0 |
53 | 34.524514 | 60.396342 | 0 |
54 | 50.286496 | 49.804539 | 0 |
55 | 49.586677 | 59.808951 | 0 |
57 | 32.577200 | 95.598548 | 0 |
61 | 35.286113 | 47.020514 | 0 |
62 | 56.253817 | 39.261473 | 0 |
63 | 30.058822 | 49.592974 | 0 |
64 | 44.668262 | 66.450086 | 0 |
65 | 66.560894 | 41.092098 | 0 |
67 | 49.072563 | 51.883212 | 0 |
70 | 32.722833 | 43.307173 | 0 |
78 | 60.455556 | 42.508409 | 0 |
79 | 82.226662 | 42.719879 | 0 |
86 | 42.075455 | 78.844786 | 0 |
89 | 52.348004 | 60.769505 | 0 |
92 | 55.482161 | 35.570703 | 0 |
fig, ax = plt.subplots(figsize=(10,5))
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=30, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=30, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
Text(0,0.5,'Exam 2 Score')
The logistic regression
Goal: build a classifier (solve for the three parameters $\theta_0, \theta_1, \theta_2$)
Then set a threshold and use it to decide the admission result
Modules to implement:
- `sigmoid`: maps values to probabilities
- `model`: returns the predicted value
- `cost`: computes the loss for the given parameters
- `gradient`: computes the gradient direction for each parameter
- `descent`: performs the parameter updates
- `accuracy`: computes the accuracy

The `sigmoid` function
$$g(z) = \frac{1}{1+e^{-z}}$$
def sigmoid(z):
return 1 / (1 + np.exp(-z))
nums = np.arange(-10, 10, step=1) # a vector of 20 integer values from -10 to 9
fig, ax = plt.subplots(figsize=(12,4))
ax.plot(nums, sigmoid(nums), 'r')
[<matplotlib.lines.Line2D at 0x7f1064254c10>]
Sigmoid properties (a quick numerical check follows below):
- $g:\mathbb{R} \to [0,1]$
- $g(0)=0.5$
- $g(-\infty)=0$
- $g(+\infty)=1$
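Verifying these with the sigmoid defined above:

print(sigmoid(0))    # 0.5
print(sigmoid(-50))  # ~0 (about 2e-22)
print(sigmoid(50))   # ~1.0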
def model(X, theta):
return sigmoid(np.dot(X, theta.T))
$$\begin{pmatrix}\theta_{0} & \theta_{1} & \theta_{2}\end{pmatrix} \times \begin{pmatrix}1\\ x_{1}\\ x_{2}\end{pmatrix} = \theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}$$
print(pdData.head())
Exam 1 Exam 2 Admitted
0 34.623660 78.024693 0
1 30.286711 43.894998 0
2 35.847409 72.902198 0
3 60.182599 86.308552 1
4 79.032736 75.344376 1
pdData.insert(0, 'Ones', 1) # add a column of ones for the intercept term; re-running this cell raises an error since the column already exists
print(pdData.head())
Ones Exam 1 Exam 2 Admitted
0 1 34.623660 78.024693 0
1 1 30.286711 43.894998 0
2 1 35.847409 72.902198 0
3 1 60.182599 86.308552 1
4 1 79.032736 75.344376 1
# set X (training data) and y (target variable)
# convert the DataFrame to a NumPy array
orig_data = pdData.values # .values converts the pandas representation to a plain array (the older .as_matrix() is deprecated)
orig_data
array([[ 1. , 34.62365962, 78.02469282, 0. ],
[ 1. , 30.28671077, 43.89499752, 0. ],
[ 1. , 35.84740877, 72.90219803, 0. ],
[ 1. , 60.18259939, 86.3085521 , 1. ],
[ 1. , 79.03273605, 75.34437644, 1. ],
[ 1. , 45.08327748, 56.31637178, 0. ],
[ 1. , 61.10666454, 96.51142588, 1. ],
[ 1. , 75.02474557, 46.55401354, 1. ],
[ 1. , 76.0987867 , 87.42056972, 1. ],
[ 1. , 84.43281996, 43.53339331, 1. ],
[ 1. , 95.86155507, 38.22527806, 0. ],
[ 1. , 75.01365839, 30.60326323, 0. ],
[ 1. , 82.30705337, 76.4819633 , 1. ],
[ 1. , 69.36458876, 97.71869196, 1. ],
[ 1. , 39.53833914, 76.03681085, 0. ],
[ 1. , 53.97105215, 89.20735014, 1. ],
[ 1. , 69.07014406, 52.74046973, 1. ],
[ 1. , 67.94685548, 46.67857411, 0. ],
[ 1. , 70.66150955, 92.92713789, 1. ],
[ 1. , 76.97878373, 47.57596365, 1. ],
[ 1. , 67.37202755, 42.83843832, 0. ],
[ 1. , 89.67677575, 65.79936593, 1. ],
[ 1. , 50.53478829, 48.85581153, 0. ],
[ 1. , 34.21206098, 44.2095286 , 0. ],
[ 1. , 77.92409145, 68.97235999, 1. ],
[ 1. , 62.27101367, 69.95445795, 1. ],
[ 1. , 80.19018075, 44.82162893, 1. ],
[ 1. , 93.1143888 , 38.80067034, 0. ],
[ 1. , 61.83020602, 50.25610789, 0. ],
[ 1. , 38.7858038 , 64.99568096, 0. ],
[ 1. , 61.37928945, 72.80788731, 1. ],
[ 1. , 85.40451939, 57.05198398, 1. ],
[ 1. , 52.10797973, 63.12762377, 0. ],
[ 1. , 52.04540477, 69.43286012, 1. ],
[ 1. , 40.23689374, 71.16774802, 0. ],
[ 1. , 54.63510555, 52.21388588, 0. ],
[ 1. , 33.91550011, 98.86943574, 0. ],
[ 1. , 64.17698887, 80.90806059, 1. ],
[ 1. , 74.78925296, 41.57341523, 0. ],
[ 1. , 34.18364003, 75.23772034, 0. ],
[ 1. , 83.90239366, 56.30804622, 1. ],
[ 1. , 51.54772027, 46.85629026, 0. ],
[ 1. , 94.44336777, 65.56892161, 1. ],
[ 1. , 82.36875376, 40.61825516, 0. ],
[ 1. , 51.04775177, 45.82270146, 0. ],
[ 1. , 62.22267576, 52.06099195, 0. ],
[ 1. , 77.19303493, 70.4582 , 1. ],
[ 1. , 97.77159928, 86.72782233, 1. ],
[ 1. , 62.0730638 , 96.76882412, 1. ],
[ 1. , 91.5649745 , 88.69629255, 1. ],
[ 1. , 79.94481794, 74.16311935, 1. ],
[ 1. , 99.27252693, 60.999031 , 1. ],
[ 1. , 90.54671411, 43.39060181, 1. ],
[ 1. , 34.52451385, 60.39634246, 0. ],
[ 1. , 50.28649612, 49.80453881, 0. ],
[ 1. , 49.58667722, 59.80895099, 0. ],
[ 1. , 97.64563396, 68.86157272, 1. ],
[ 1. , 32.57720017, 95.59854761, 0. ],
[ 1. , 74.24869137, 69.82457123, 1. ],
[ 1. , 71.79646206, 78.45356225, 1. ],
[ 1. , 75.39561147, 85.75993667, 1. ],
[ 1. , 35.28611282, 47.02051395, 0. ],
[ 1. , 56.2538175 , 39.26147251, 0. ],
[ 1. , 30.05882245, 49.59297387, 0. ],
[ 1. , 44.66826172, 66.45008615, 0. ],
[ 1. , 66.56089447, 41.09209808, 0. ],
[ 1. , 40.45755098, 97.53518549, 1. ],
[ 1. , 49.07256322, 51.88321182, 0. ],
[ 1. , 80.27957401, 92.11606081, 1. ],
[ 1. , 66.74671857, 60.99139403, 1. ],
[ 1. , 32.72283304, 43.30717306, 0. ],
[ 1. , 64.03932042, 78.03168802, 1. ],
[ 1. , 72.34649423, 96.22759297, 1. ],
[ 1. , 60.45788574, 73.0949981 , 1. ],
[ 1. , 58.84095622, 75.85844831, 1. ],
[ 1. , 99.8278578 , 72.36925193, 1. ],
[ 1. , 47.26426911, 88.475865 , 1. ],
[ 1. , 50.4581598 , 75.80985953, 1. ],
[ 1. , 60.45555629, 42.50840944, 0. ],
[ 1. , 82.22666158, 42.71987854, 0. ],
[ 1. , 88.91389642, 69.8037889 , 1. ],
[ 1. , 94.83450672, 45.6943068 , 1. ],
[ 1. , 67.31925747, 66.58935318, 1. ],
[ 1. , 57.23870632, 59.51428198, 1. ],
[ 1. , 80.366756 , 90.9601479 , 1. ],
[ 1. , 68.46852179, 85.5943071 , 1. ],
[ 1. , 42.07545454, 78.844786 , 0. ],
[ 1. , 75.47770201, 90.424539 , 1. ],
[ 1. , 78.63542435, 96.64742717, 1. ],
[ 1. , 52.34800399, 60.76950526, 0. ],
[ 1. , 94.09433113, 77.15910509, 1. ],
[ 1. , 90.44855097, 87.50879176, 1. ],
[ 1. , 55.48216114, 35.57070347, 0. ],
[ 1. , 74.49269242, 84.84513685, 1. ],
[ 1. , 89.84580671, 45.35828361, 1. ],
[ 1. , 83.48916274, 48.3802858 , 1. ],
[ 1. , 42.26170081, 87.10385094, 1. ],
[ 1. , 99.31500881, 68.77540947, 1. ],
[ 1. , 55.34001756, 64.93193801, 1. ],
[ 1. , 74.775893 , 89.5298129 , 1. ]])
cols = orig_data.shape[1]
print(cols)
X = orig_data[:,0:cols-1]
y = orig_data[:,cols-1:cols]
print(X)
print('*'*15)
print(y)
# convert to numpy arrays and initalize the parameter array theta
#X = np.matrix(X.values)
#y = np.matrix(data.iloc[:,3:4].values) #np.array(y.values)
theta = np.zeros([1, 3])
theta
4
[[ 1. 34.62365962 78.02469282]
[ 1. 30.28671077 43.89499752]
[ 1. 35.84740877 72.90219803]
[ 1. 60.18259939 86.3085521 ]
[ 1. 79.03273605 75.34437644]
[ 1. 45.08327748 56.31637178]
[ 1. 61.10666454 96.51142588]
[ 1. 75.02474557 46.55401354]
[ 1. 76.0987867 87.42056972]
[ 1. 84.43281996 43.53339331]
[ 1. 95.86155507 38.22527806]
[ 1. 75.01365839 30.60326323]
[ 1. 82.30705337 76.4819633 ]
[ 1. 69.36458876 97.71869196]
[ 1. 39.53833914 76.03681085]
[ 1. 53.97105215 89.20735014]
[ 1. 69.07014406 52.74046973]
[ 1. 67.94685548 46.67857411]
[ 1. 70.66150955 92.92713789]
[ 1. 76.97878373 47.57596365]
[ 1. 67.37202755 42.83843832]
[ 1. 89.67677575 65.79936593]
[ 1. 50.53478829 48.85581153]
[ 1. 34.21206098 44.2095286 ]
[ 1. 77.92409145 68.97235999]
[ 1. 62.27101367 69.95445795]
[ 1. 80.19018075 44.82162893]
[ 1. 93.1143888 38.80067034]
[ 1. 61.83020602 50.25610789]
[ 1. 38.7858038 64.99568096]
[ 1. 61.37928945 72.80788731]
[ 1. 85.40451939 57.05198398]
[ 1. 52.10797973 63.12762377]
[ 1. 52.04540477 69.43286012]
[ 1. 40.23689374 71.16774802]
[ 1. 54.63510555 52.21388588]
[ 1. 33.91550011 98.86943574]
[ 1. 64.17698887 80.90806059]
[ 1. 74.78925296 41.57341523]
[ 1. 34.18364003 75.23772034]
[ 1. 83.90239366 56.30804622]
[ 1. 51.54772027 46.85629026]
[ 1. 94.44336777 65.56892161]
[ 1. 82.36875376 40.61825516]
[ 1. 51.04775177 45.82270146]
[ 1. 62.22267576 52.06099195]
[ 1. 77.19303493 70.4582 ]
[ 1. 97.77159928 86.72782233]
[ 1. 62.0730638 96.76882412]
[ 1. 91.5649745 88.69629255]
[ 1. 79.94481794 74.16311935]
[ 1. 99.27252693 60.999031 ]
[ 1. 90.54671411 43.39060181]
[ 1. 34.52451385 60.39634246]
[ 1. 50.28649612 49.80453881]
[ 1. 49.58667722 59.80895099]
[ 1. 97.64563396 68.86157272]
[ 1. 32.57720017 95.59854761]
[ 1. 74.24869137 69.82457123]
[ 1. 71.79646206 78.45356225]
[ 1. 75.39561147 85.75993667]
[ 1. 35.28611282 47.02051395]
[ 1. 56.2538175 39.26147251]
[ 1. 30.05882245 49.59297387]
[ 1. 44.66826172 66.45008615]
[ 1. 66.56089447 41.09209808]
[ 1. 40.45755098 97.53518549]
[ 1. 49.07256322 51.88321182]
[ 1. 80.27957401 92.11606081]
[ 1. 66.74671857 60.99139403]
[ 1. 32.72283304 43.30717306]
[ 1. 64.03932042 78.03168802]
[ 1. 72.34649423 96.22759297]
[ 1. 60.45788574 73.0949981 ]
[ 1. 58.84095622 75.85844831]
[ 1. 99.8278578 72.36925193]
[ 1. 47.26426911 88.475865 ]
[ 1. 50.4581598 75.80985953]
[ 1. 60.45555629 42.50840944]
[ 1. 82.22666158 42.71987854]
[ 1. 88.91389642 69.8037889 ]
[ 1. 94.83450672 45.6943068 ]
[ 1. 67.31925747 66.58935318]
[ 1. 57.23870632 59.51428198]
[ 1. 80.366756 90.9601479 ]
[ 1. 68.46852179 85.5943071 ]
[ 1. 42.07545454 78.844786 ]
[ 1. 75.47770201 90.424539 ]
[ 1. 78.63542435 96.64742717]
[ 1. 52.34800399 60.76950526]
[ 1. 94.09433113 77.15910509]
[ 1. 90.44855097 87.50879176]
[ 1. 55.48216114 35.57070347]
[ 1. 74.49269242 84.84513685]
[ 1. 89.84580671 45.35828361]
[ 1. 83.48916274 48.3802858 ]
[ 1. 42.26170081 87.10385094]
[ 1. 99.31500881 68.77540947]
[ 1. 55.34001756 64.93193801]
[ 1. 74.775893 89.5298129 ]]
***************
[[0.]
[0.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[1.]
[1.]
[0.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[1.]
[0.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[0.]
[1.]
[1.]
[0.]
[1.]
[0.]
[0.]
[0.]
[1.]
[0.]
[0.]
[1.]
[0.]
[1.]
[0.]
[0.]
[0.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[0.]
[0.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]]
array([[0., 0., 0.]])
X[:5]
array([[ 1. , 34.62365962, 78.02469282],
[ 1. , 30.28671077, 43.89499752],
[ 1. , 35.84740877, 72.90219803],
[ 1. , 60.18259939, 86.3085521 ],
[ 1. , 79.03273605, 75.34437644]])
y[:5]
array([[0.],
[0.],
[0.],
[1.],
[1.]])
theta
array([[0., 0., 0.]])
X.shape, y.shape, theta.shape
((100, 3), (100, 1), (1, 3))
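With theta initialized to all zeros, z = Xθᵀ is 0 for every sample, so the model should predict sigmoid(0) = 0.5 everywhere; a quick check:

model(X, theta)[:5] # an array of 0.5s, since sigmoid(0) = 0.5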
The cost function
Take the negative of the log-likelihood:
$$D(h_\theta(x), y) = -y\log(h_\theta(x)) - (1-y)\log(1-h_\theta(x))$$
Average the loss over all samples:
$$J(\theta)=\frac{1}{n}\sum_{i=1}^{n} D(h_\theta(x_i), y_i)$$
def cost(X, y, theta):
    # np.multiply performs element-wise multiplication
    left = np.multiply(-y, np.log(model(X, theta)))
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))
    return np.sum(left - right) / (len(X))
cost(X, y, theta)
0.6931471805599453
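The initial cost is exactly ln 2 ≈ 0.6931, as expected when every prediction is 0.5. One caveat: np.log(model(X, theta)) returns -inf once the sigmoid saturates at 0 or 1. A common remedy (a sketch, not what this notebook does) is to compute the loss directly from z with np.logaddexp:

def stable_cost(X, y, theta):
    # per-sample loss: (1 - y) * z + log(1 + exp(-z)), using log(sigmoid(z)) = -logaddexp(0, -z)
    z = np.dot(X, theta.T)
    return np.sum(np.multiply(1 - y, z) + np.logaddexp(0, -z)) / len(X)
stable_cost(X, y, theta) # 0.6931..., matching cost(X, y, theta)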
Computing the gradient
$$\frac{\partial J}{\partial \theta_j}=-\frac{1}{n}\sum_{i=1}^{n} (y_i - h_\theta (x_i))x_{ij}$$
def gradient(X, y, theta):
    grad = np.zeros(theta.shape) # one entry per parameter (here w0, w1, w2)
    # ravel() flattens the array, so error has shape (len(X),)
    error = (model(X, theta) - y).ravel()
    for j in range(len(theta.ravel())): # for each parameter
        term = np.multiply(error, X[:,j])
        grad[0, j] = np.sum(term) / len(X)
    return grad
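The loop over parameters can also be written as one vectorized expression; a sketch that should agree with the function above:

def gradient_vectorized(X, y, theta):
    # (model - y) has shape (n, 1); transposing and multiplying by X gives (1, 3), matching theta
    return (model(X, theta) - y).T.dot(X) / len(X)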
Gradient descent
Comparing three different gradient descent strategies
STOP_ITER = 0
STOP_COST = 1
STOP_GRAD = 2
def stopCriterion(type, value, threshold):
    # three different stopping criteria
    if type == STOP_ITER: return value > threshold # iteration count exceeded
    elif type == STOP_COST: return abs(value[-1]-value[-2]) < threshold # change in cost is tiny
    elif type == STOP_GRAD: return np.linalg.norm(value) < threshold # gradient norm is tiny
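For example (hypothetical values):

print(stopCriterion(STOP_ITER, 5001, 5000)) # True: past the iteration limit
print(stopCriterion(STOP_COST, [0.50, 0.49], 1e-6)) # False: the cost still changed by 0.01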
import numpy.random
# shuffle the rows, then split into features X and labels y
def shuffleData(data):
np.random.shuffle(data)
cols = data.shape[1]
X = data[:, 0:cols-1]
y = data[:, cols-1:]
return X, y
import time
def descent(data, theta, batchSize, stopType, thresh, alpha):
    # solve via gradient descent
    init_time = time.time()
    i = 0 # iteration counter
    k = 0 # batch start index
    X, y = shuffleData(data)
    grad = np.zeros(theta.shape) # the computed gradient
    costs = [cost(X, y, theta)] # loss history
    while True:
        grad = gradient(X[k:k+batchSize], y[k:k+batchSize], theta)
        k += batchSize # advance by batchSize samples
        if k >= n: # n is the total sample count, set globally before the experiments below
            k = 0
            X, y = shuffleData(data) # reshuffle
        theta = theta - alpha*grad # parameter update
        costs.append(cost(X, y, theta)) # record the new loss
        i += 1
        if stopType == STOP_ITER: value = i
        elif stopType == STOP_COST: value = costs
        elif stopType == STOP_GRAD: value = grad
        if stopCriterion(stopType, value, thresh): break
    return theta, i-1, costs, grad, time.time() - init_time
def runExpe(data, theta, batchSize, stopType, thresh, alpha):
#import pdb; pdb.set_trace();
theta, iter, costs, grad, dur = descent(data, theta, batchSize, stopType, thresh, alpha)
name = "Original" if (data[:,1]>2).sum() > 1 else "Scaled"
name += " data - learning rate: {} - ".format(alpha)
if batchSize==n: strDescType = "Gradient"
elif batchSize==1: strDescType = "Stochastic"
else: strDescType = "Mini-batch ({})".format(batchSize)
name += strDescType + " descent - Stop: "
if stopType == STOP_ITER: strStop = "{} iterations".format(thresh)
elif stopType == STOP_COST: strStop = "costs change < {}".format(thresh)
else: strStop = "gradient norm < {}".format(thresh)
name += strStop
print ("***{}\nTheta: {} - Iter: {} - Last cost: {:03.2f} - Duration: {:03.2f}s".format(
name, theta, iter, costs[-1], dur))
fig, ax = plt.subplots(figsize=(12,4))
ax.plot(np.arange(len(costs)), costs, 'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title(name.upper() + ' - Error vs. Iteration')
return theta
Different stopping criteria
Stopping after a fixed number of iterations
#this run uses batch gradient descent over all samples
n=100
runExpe(orig_data, theta, n, STOP_ITER, thresh=5000, alpha=0.000001)
[Plot: cost vs. iteration for batch gradient descent over 5000 iterations (output_40_2.png)]
Stopping when the cost stops changing
With a threshold of 1e-6, this takes roughly 110,000 iterations
runExpe(orig_data, theta, n, STOP_COST, thresh=0.000001, alpha=0.001)
***Original data - learning rate: 0.001 - Gradient descent - Stop: costs change < 1e-06
Theta: [[-5.13364014 0.04771429 0.04072397]] - Iter: 109901 - Last cost: 0.38 - Duration: 21.67s
array([[-5.13364014, 0.04771429, 0.04072397]])
Stopping when the gradient becomes small
With a threshold of 0.05, this takes roughly 40,000 iterations
runExpe(orig_data, theta, n, STOP_GRAD, thresh=0.05, alpha=0.001)
***Original data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.05
Theta: [[-2.37033409 0.02721692 0.01899456]] - Iter: 40045 - Last cost: 0.49 - Duration: 8.06s
array([[-2.37033409, 0.02721692, 0.01899456]])
Comparing the gradient descent variants
Stochastic descent
runExpe(orig_data, theta, 1, STOP_ITER, thresh=5000, alpha=0.001)
***Original data - learning rate: 0.001 - Stochastic descent - Stop: 5000 iterations
Theta: [[-0.38651143 0.06743607 -0.07215581]] - Iter: 5000 - Last cost: 1.13 - Duration: 0.34s
array([[-0.38651143, 0.06743607, -0.07215581]])
Quite erratic... very unstable. Let's try again with a smaller learning rate.
runExpe(orig_data, theta, 1, STOP_ITER, thresh=15000, alpha=0.000002)
***Original data - learning rate: 2e-06 - Stochastic descent - Stop: 15000 iterations
Theta: [[-0.0020209 0.01004422 0.00097837]] - Iter: 15000 - Last cost: 0.63 - Duration: 1.00s
array([[-0.0020209 , 0.01004422, 0.00097837]])
Fast, but unstable; it needs a very small learning rate
Mini-batch descent
runExpe(orig_data, theta, 16, STOP_ITER, thresh=15000, alpha=0.001)
***Original data - learning rate: 0.001 - Mini-batch (16) descent - Stop: 15000 iterations
Theta: [[-1.03594432 0.02756836 0.00581629]] - Iter: 15000 - Last cost: 0.59 - Duration: 1.30s
array([[-1.03594432, 0.02756836, 0.00581629]])
The cost still fluctuates quite a bit; let's try standardizing the data.
Standardization subtracts each column's mean and divides by its standard deviation, so that every feature ends up centered around 0 with unit variance (a manual sketch of the same transform follows below).
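pp.scale, used in the next cell, is equivalent to doing the transform by hand with NumPy (a sketch, assuming orig_data as built above):

exam_cols = orig_data[:, 1:3]
manual = (exam_cols - exam_cols.mean(axis=0)) / exam_cols.std(axis=0)
# manual matches pp.scale(orig_data[:, 1:3]) up to floating-point error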
from sklearn import preprocessing as pp
scaled_data = orig_data.copy()
scaled_data[:, 1:3] = pp.scale(orig_data[:, 1:3])
runExpe(scaled_data, theta, n, STOP_ITER, thresh=5000, alpha=0.001)
***Scaled data - learning rate: 0.001 - Gradient descent - Stop: 5000 iterations
Theta: [[0.3080807 0.86494967 0.77367651]] - Iter: 5000 - Last cost: 0.38 - Duration: 0.83s
array([[0.3080807 , 0.86494967, 0.77367651]])
Much better! On the original data the cost only got down to 0.61, while here it reaches 0.38!
Preprocessing the data really matters.
runExpe(scaled_data, theta, n, STOP_GRAD, thresh=0.02, alpha=0.001)
***Scaled data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.02
Theta: [[1.0707921 2.63030842 2.41079787]] - Iter: 59422 - Last cost: 0.22 - Duration: 10.50s
array([[1.0707921 , 2.63030842, 2.41079787]])
More iterations drive the loss down even further!
theta = runExpe(scaled_data, theta, 1, STOP_GRAD, thresh=0.002/5, alpha=0.001)
***Scaled data - learning rate: 0.001 - Stochastic descent - Stop: gradient norm < 0.0004
Theta: [[1.14794829 2.79256769 2.56686015]] - Iter: 72622 - Last cost: 0.22 - Duration: 4.87s
Stochastic gradient descent is faster per update, but it also needs many more iterations, so mini-batch remains the better compromise!
runExpe(scaled_data, theta, 16, STOP_GRAD, thresh=0.002*2, alpha=0.001)
***Scaled data - learning rate: 0.001 - Mini-batch (16) descent - Stop: gradient norm < 0.004
Theta: [[1.1731359 2.84100721 2.60671868]] - Iter: 4185 - Last cost: 0.21 - Duration: 0.35s
array([[1.1731359 , 2.84100721, 2.60671868]])
Accuracy
#set the classification threshold at 0.5
def predict(X, theta):
return [1 if x >= 0.5 else 0 for x in model(X, theta)]
scaled_X = scaled_data[:, :3]
y = scaled_data[:, 3]
predictions = predict(scaled_X, theta)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y)]
accuracy = 100 * sum(map(int, correct)) // len(correct) # percentage of correct predictions
print ('accuracy = {0}%'.format(accuracy))
accuracy = 89%
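As a cross-check (not part of the original notebook), sklearn's built-in logistic regression can be fit on the same standardized features; its training accuracy should land in the same ballpark as the 89% above:

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(scaled_data[:, 1:3], scaled_data[:, 3]) # features: scaled exam scores; labels: Admitted
print(clf.score(scaled_data[:, 1:3], scaled_data[:, 3])) # mean training accuracy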