Machine Learning Notes (2): Solving Logistic Regression with Gradient Descent, a Python Implementation

Logistic Regression

These are my own notes, taken while following a video course and kept here for personal use.

Also referenced: http://www.ai-start.com/ml2014/

[Course slides introducing the logistic regression model (images not reproduced here).]

Solving Logistic Regression with Gradient Descent

The data

We will build a logistic regression model to predict whether a student gets admitted to a university. Suppose you are the administrator of a university department and want to determine each applicant's chance of admission based on the results of two exams. You have historical data from previous applicants that can serve as a training set for logistic regression. For each training example, you have the applicant's scores on the two exams and the admission decision. To do this, we will build a classification model that estimates the probability of admission from the exam scores.

In [7]:

# The three essentials
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [8]:

import os

path = 'data' + os.sep + 'LogiReg_data.txt'
pdData = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
pdData.head()

Out[8]:

      Exam 1     Exam 2  Admitted
0  34.623660  78.024693         0
1  30.286711  43.894998         0
2  35.847409  72.902198         0
3  60.182599  86.308552         1
4  79.032736  75.344376         1

In [10]:

positive = pdData[pdData['Admitted'] == 1]  # the subset of rows where Admitted == 1, i.e. the *positive* examples
negative = pdData[pdData['Admitted'] == 0]  # the subset of rows where Admitted == 0, i.e. the *negative* examples

fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=30, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=30, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')

Out[10]:

Text(0,0.5,'Exam 2 Score')

[Scatter plot: Exam 1 score vs. Exam 2 score, admitted (blue circles) vs. not admitted (red crosses).]

The logistic regression

Goal: build a classifier, i.e. solve for the three parameters θ0, θ1, θ2.

Then set a threshold and use it to decide the admission result.

Modules to implement (a rough sketch of how they fit together follows this list):

sigmoid : maps a value to a probability

model : returns the predicted value

cost : computes the loss for the current parameters

gradient : computes the gradient of each parameter

descent : performs the parameter updates

accuracy : computes the accuracy
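Putting these pieces together: the model passes a linear combination of the two exam scores through the sigmoid, and the threshold turns that probability into a yes/no decision. A minimal standalone sketch with made-up parameter values (these are not fitted values from this notebook):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# hypothetical parameters and one applicant's (scaled) scores, for illustration only
theta = np.array([1.0, 2.5, 2.3])     # [theta0 (intercept), theta1, theta2]
x = np.array([1.0, 0.4, -0.2])        # [1, exam 1 score, exam 2 score]
prob = sigmoid(np.dot(theta, x))      # predicted admission probability
print(prob, 1 if prob >= 0.5 else 0)  # predict "admitted" when prob >= 0.5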

The sigmoid function

g(z) = 1 / (1 + e^(-z))

In [11]:

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

In [12]:

nums = np.arange(-10, 10, step=1)  # a vector of 20 integer values from -10 to 9
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(nums, sigmoid(nums), 'r')

Out[12]:

[Plot: the sigmoid curve rising from 0 to 1, crossing 0.5 at z = 0.]

Sigmoid

g:ℝ→[0,1]

g(0)=0.5

g(−∞)=0

g(+∞)=1
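These properties are easy to check numerically with the sigmoid defined above; a quick verification (not a cell from the original notebook):

print(sigmoid(0))                  # 0.5
print(sigmoid(-50), sigmoid(50))   # ~0 and ~1, approaching the limits at -inf / +inf
vals = sigmoid(np.linspace(-10, 10, 101))
print(np.all(np.diff(vals) > 0))   # True: the sigmoid is strictly increasing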

In [13]:

def model(X, theta):
    return sigmoid(np.dot(X, theta.T))

h_θ(x) = g(θ0·x0 + θ1·x1 + θ2·x2) = 1 / (1 + e^(-θᵀx)), where x0 = 1
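The next cell inserts a column of ones into the data so that np.dot(X, theta.T) computes θ0·1 + θ1·x1 + θ2·x2 for every row in one shot. A small sketch with hypothetical numbers (not taken from the notebook):

X_demo = np.array([[1.0, 34.6, 78.0],
                   [1.0, 30.3, 43.9]])       # [ones, exam 1, exam 2]
theta_demo = np.array([[-1.0, 0.05, 0.04]])  # made-up parameters, shape (1, 3)
print(model(X_demo, theta_demo))             # same as sigmoid(-1 + 0.05*exam1 + 0.04*exam2) per row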

In [14]:

pdData.insert(0, 'Ones', 1)  # add a column of ones (intercept term); wrap in try/except if this cell
                             # may run more than once, since insert raises an error on a duplicate column

# set X (training data) and y (target variable)
orig_data = pdData.as_matrix()  # convert the pandas DataFrame to a plain array for further computations
                                # (as_matrix() is deprecated in newer pandas; .values gives the same result)
cols = orig_data.shape[1]
X = orig_data[:, 0:cols-1]
y = orig_data[:, cols-1:cols]

# convert to numpy arrays and initialize the parameter array theta
#X = np.matrix(X.values)
#y = np.matrix(data.iloc[:,3:4].values) #np.array(y.values)
theta = np.zeros([1, 3])

In [15]:

X[:5]

Out[15]:

array([[  1.        ,  34.62365962,  78.02469282],
       [  1.        ,  30.28671077,  43.89499752],
       [  1.        ,  35.84740877,  72.90219803],
       [  1.        ,  60.18259939,  86.3085521 ],
       [  1.        ,  79.03273605,  75.34437644]])

In [16]:

y[:5]

Out[16]:

array([[ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 1.]])

In [17]:

theta

Out[17]:

array([[ 0., 0., 0.]])

In [18]:

X.shape, y.shape, theta.shape

Out[18]:

((100, 3), (100, 1), (1, 3))

The cost function

J(θ) = -(1/n) Σ_{i=1..n} [ y_i·log(h_θ(x_i)) + (1 − y_i)·log(1 − h_θ(x_i)) ]

In [19]:

def cost(X, y, theta):
    left = np.multiply(-y, np.log(model(X, theta)))
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))
    return np.sum(left - right) / (len(X))

In [20]:

cost(X, y, theta)

Out[20]:

0.69314718055994529
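A quick sanity check (not in the original notebook): with θ = 0 the model outputs 0.5 for every sample, so each term of the loss is -log(0.5) and the average cost is exactly ln 2 ≈ 0.6931, matching the value above.

print(np.log(2))   # 0.6931471805599453, the cost of the all-zeros theta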

Computing the gradient

∂J/∂θ_j = (1/n) Σ_{i=1..n} (h_θ(x_i) − y_i) · x_ij

In [21]:

def gradient(X, y, theta):
    grad = np.zeros(theta.shape)  # (1, 3)
    error = (model(X, theta) - y).ravel()
    for j in range(len(theta.ravel())):  # for each parameter
        term = np.multiply(error, X[:, j])
        grad[0, j] = np.sum(term) / len(X)
    return grad
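The loop over j can also be written as a single matrix product; a vectorized equivalent, shown here only as a sketch (the notebook keeps the looped version):

def gradient_vec(X, y, theta):
    # X.T @ (h - y) sums error * x_j over all samples for every j at once
    return (X.T.dot(model(X, theta) - y)).T / len(X)

print(np.allclose(gradient(X, y, theta), gradient_vec(X, y, theta)))   # True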

Gradient descent

Compare three different gradient descent strategies.

In [22]:

STOP_ITER = 0
STOP_COST = 1
STOP_GRAD = 2

def stopCriterion(type, value, threshold):
    # three different stopping strategies
    if   type == STOP_ITER:  return value > threshold
    elif type == STOP_COST:  return abs(value[-1] - value[-2]) < threshold
    elif type == STOP_GRAD:  return np.linalg.norm(value) < threshold
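A quick illustration of the three criteria with made-up values (not from the notebook):

print(stopCriterion(STOP_ITER, 5001, 5000))          # True: enough iterations have run
print(stopCriterion(STOP_COST, [0.40, 0.39], 1e-6))  # False: the cost is still changing by 0.01
print(stopCriterion(STOP_GRAD, np.array([[1e-3, 1e-3, 1e-3]]), 0.05))  # True: gradient norm ~0.0017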

In [23]:

import numpy.random

# shuffle the data (note: np.random.shuffle reorders the array in place)
def shuffleData(data):
    np.random.shuffle(data)
    cols = data.shape[1]
    X = data[:, 0:cols-1]
    y = data[:, cols-1:]
    return X, y
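Because np.random.shuffle works in place, calling shuffleData(data) also reorders the array that was passed in. A quick check on a copy (not a cell from the notebook):

X_s, y_s = shuffleData(orig_data.copy())
print(X_s.shape, y_s.shape)   # (100, 3) (100, 1): the features keep the leading column of ones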

In [24]:

import time

def descent(data, theta, batchSize, stopType, thresh, alpha):
    # gradient descent solver
    init_time = time.time()
    i = 0  # number of iterations
    k = 0  # batch index
    X, y = shuffleData(data)
    grad = np.zeros(theta.shape)  # the computed gradient
    costs = [cost(X, y, theta)]   # the loss values

    while True:
        grad = gradient(X[k:k+batchSize], y[k:k+batchSize], theta)
        k += batchSize  # take the next batchSize samples
        if k >= n:      # n is the number of samples, a global set below
            k = 0
            X, y = shuffleData(data)  # reshuffle
        theta = theta - alpha * grad     # parameter update
        costs.append(cost(X, y, theta))  # compute the new loss
        i += 1

        if   stopType == STOP_ITER:  value = i
        elif stopType == STOP_COST:  value = costs
        elif stopType == STOP_GRAD:  value = grad
        if stopCriterion(stopType, value, thresh):
            break

    return theta, i-1, costs, grad, time.time() - init_time
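descent can also be called on its own; a sketch of a direct mini-batch run with hypothetical settings (not a cell from the notebook; note that descent reads the global n, which the notebook only sets to 100 further down):

n = orig_data.shape[0]   # number of samples (100); the notebook assigns n = 100 explicitly below
theta_out, n_iter, cost_hist, last_grad, seconds = descent(
    orig_data, np.zeros([1, 3]), 16, STOP_ITER, 1000, 0.001)
print(n_iter, cost_hist[-1])   # iterations actually run and the final loss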

In [25]:

def runExpe(data, theta, batchSize, stopType, thresh, alpha):
    #import pdb; pdb.set_trace();
    theta, iter, costs, grad, dur = descent(data, theta, batchSize, stopType, thresh, alpha)
    name = "Original" if (data[:, 1] > 2).sum() > 1 else "Scaled"
    name += " data - learning rate: {} - ".format(alpha)
    if   batchSize == n: strDescType = "Gradient"
    elif batchSize == 1: strDescType = "Stochastic"
    else:                strDescType = "Mini-batch ({})".format(batchSize)
    name += strDescType + " descent - Stop: "
    if   stopType == STOP_ITER: strStop = "{} iterations".format(thresh)
    elif stopType == STOP_COST: strStop = "costs change < {}".format(thresh)
    else:                       strStop = "gradient norm < {}".format(thresh)
    name += strStop
    print("***{}\nTheta: {} - Iter: {} - Last cost: {:03.2f} - Duration: {:03.2f}s".format(
        name, theta, iter, costs[-1], dur))
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(np.arange(len(costs)), costs, 'r')
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Cost')
    ax.set_title(name.upper() + ' - Error vs. Iteration')
    return theta

Different stopping strategies

Stopping after a fixed number of iterations

In [26]:

# this run uses all samples per update (batch gradient descent)
n = 100
runExpe(orig_data, theta, n, STOP_ITER, thresh=5000, alpha=0.000001)

***Original data - learning rate: 1e-06 - Gradient descent - Stop: 5000 iterations

Theta: [[-0.00027127 0.00705232 0.00376711]] - Iter: 5000 - Last cost: 0.63 - Duration: 1.52s

Out[26]:

array([[-0.00027127, 0.00705232, 0.00376711]])


Stopping when the cost stops changing

With a threshold of 1e-6, this takes roughly 110,000 iterations.

In [27]:

runExpe(orig_data, theta, n, STOP_COST, thresh=0.000001, alpha=0.001)

***Original data - learning rate: 0.001 - Gradient descent - Stop: costs change < 1e-06

Theta: [[-5.13364014 0.04771429 0.04072397]] - Iter: 109901 - Last cost: 0.38 - Duration: 39.86s

Out[27]:

array([[-5.13364014, 0.04771429, 0.04072397]])


Stopping when the gradient becomes small

With a threshold of 0.05, this takes roughly 40,000 iterations.

In [28]:

runExpe(orig_data, theta, n, STOP_GRAD, thresh=0.05, alpha=0.001)

***Original data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.05

Theta: [[-2.37033409 0.02721692 0.01899456]] - Iter: 40045 - Last cost: 0.49 - Duration: 13.44s

Out[28]:

array([[-2.37033409, 0.02721692, 0.01899456]])


Comparing the different gradient descent methods

Stochastic descent

In [29]:

runExpe(orig_data, theta, 1, STOP_ITER, thresh=5000, alpha=0.001)

***Original data - learning rate: 0.001 - Stochastic descent - Stop: 5000 iterations

Theta: [[-0.38585397 0.09042018 -0.01044445]] - Iter: 5000 - Last cost: 1.53 - Duration: 0.48s

Out[29]:

array([[-0.38585397, 0.09042018, -0.01044445]])


The cost pretty much explodes... it is very unstable. Let's try again with a smaller learning rate.

In [30]:

runExpe(orig_data, theta, 1, STOP_ITER, thresh=15000, alpha=0.000002)

***Original data - learning rate: 2e-06 - Stochastic descent - Stop: 15000 iterations

Theta: [[-0.00201963 0.01014321 0.00107125]] - Iter: 15000 - Last cost: 0.63 - Duration: 1.70s

Out[30]:

array([[-0.00201963, 0.01014321, 0.00107125]])


Fast, but unstable; it needs a very small learning rate.

Mini-batch descent

In [31]:

runExpe(orig_data, theta, 16, STOP_ITER, thresh=15000, alpha=0.001)

***Original data - learning rate: 0.001 - Mini-batch (16) descent - Stop: 15000 iterations

Theta: [[-1.032863 0.03624659 0.02571257]] - Iter: 15000 - Last cost: 0.97 - Duration: 2.11s

Out[31]:

array([[-1.032863 , 0.03624659, 0.02571257]])


The cost still fluctuates quite a bit, so let's try standardizing the data: for each feature (column), subtract the column mean and divide by the column standard deviation. The result is that every feature is centered around 0 with unit variance. A hand-rolled version of this scaling is sketched just below; the cell after it uses sklearn.
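A manual version of that standardization, as a sketch (it should match what sklearn's preprocessing.scale does in the next cell, which divides by the population standard deviation):

exam_cols = orig_data[:, 1:3]
standardized = (exam_cols - exam_cols.mean(axis=0)) / exam_cols.std(axis=0)
print(standardized.mean(axis=0).round(6), standardized.std(axis=0))   # ~[0, 0] and [1, 1]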

In [32]:

from sklearn import preprocessing as pp

scaled_data = orig_data.copy()
scaled_data[:, 1:3] = pp.scale(orig_data[:, 1:3])

runExpe(scaled_data, theta, n, STOP_ITER, thresh=5000, alpha=0.001)

***Scaled data - learning rate: 0.001 - Gradient descent - Stop: 5000 iterations

Theta: [[ 0.3080807 0.86494967 0.77367651]] - Iter: 5000 - Last cost: 0.38 - Duration: 1.92s

Out[32]:

array([[ 0.3080807 , 0.86494967, 0.77367651]])


Much better! On the original data the cost only got down to about 0.61, while here it reaches 0.38. Preprocessing the data really matters.

In [33]:

runExpe(scaled_data, theta, n, STOP_GRAD, thresh=0.02, alpha=0.001)

***Scaled data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.02

Theta: [[ 1.0707921 2.63030842 2.41079787]] - Iter: 59422 - Last cost: 0.22 - Duration: 21.58s

Out[33]:

array([[ 1.0707921 , 2.63030842, 2.41079787]])

FIvSRJkiRJHcpQL0mSJElShzLUS5Ik%0ASZLUoQz1kiRJkiR1KEO9JEmSJEkdylAvSZIkSVKHMtRLkiRJktShDPWSJEmSJHUoQ70kSZIkSR3K%0AUC9JkiRJUocy1EuSJEmS1KH+P3dmZs6UbbVvAAAAAElFTkSuQmCC

More iterations bring the cost down even further!

In [34]:

theta = runExpe(scaled_data, theta, 1, STOP_GRAD, thresh=0.002/5, alpha=0.001)

***Scaled data - learning rate: 0.001 - Stochastic descent - Stop: gradient norm < 0.0004

Theta: [[ 1.14904527 2.7920262 2.56725991]] - Iter: 72624 - Last cost: 0.22 - Duration: 9.14s


Stochastic gradient descent is faster per update, but it also needs many more iterations, so mini-batch descent is the better trade-off!

In [35]:

runExpe(scaled_data, theta, 16, STOP_GRAD, thresh=0.002*2, alpha=0.001)

***Scaled data - learning rate: 0.001 - Mini-batch (16) descent - Stop: gradient norm < 0.004

Theta: [[ 1.16033549 2.81496841 2.59589695]] - Iter: 2393 - Last cost: 0.22 - Duration: 0.40s

Out[35]:

array([[ 1.16033549, 2.81496841, 2.59589695]])


Accuracy

In [36]:

# apply a threshold of 0.5 to turn the predicted probabilities into class labels
def predict(X, theta):
    return [1 if x >= 0.5 else 0 for x in model(X, theta)]

In [37]:

scaled_X = scaled_data[:, :3]
y = scaled_data[:, 3]
predictions = predict(scaled_X, theta)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y)]
accuracy = sum(map(int, correct)) * 100 // len(correct)  # percentage of correct predictions
print('accuracy = {0}%'.format(accuracy))

accuracy = 89%
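The same number can be obtained more directly by comparing the prediction vector with y; a one-line alternative (not part of the original notebook):

print('accuracy = {:.0f}%'.format(100 * np.mean(np.array(predictions) == y)))   # accuracy = 89%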
