Logistic Regression


What it's for

Logistic regression solves binary classification problems. Its advantages: it makes no assumptions about the data distribution, and the model is simple.

Approach

The idea: the linear model's prediction $z = w^T x + b$ is passed through the monotone, differentiable logistic (sigmoid) function to get $\hat{y}$, and the class label $y$ is then decided by thresholding $\hat{y}$.
The logistic (sigmoid) function is $\hat{y} = 1/(1 + e^{-z})$.
Treat $\hat{y}$ as the probability that sample $x$ is positive, $p(y=1|x)$; then $1-\hat{y}$ is the probability of the negative class, $p(y=0|x)$. The key step: $p(y|x)$ can be written compactly as $p(y|x) = p(y=0|x)^{1-y} \cdot p(y=1|x)^{y} = (1-\hat{y})^{1-y} \cdot \hat{y}^{y}$.
The log-likelihood is therefore $\ln p(y|x) = (1-y)\ln(1-\hat{y}) + y\ln\hat{y}$.
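A minimal NumPy sketch of this mapping for one sample (the score value, threshold of 0.5, and label are assumptions for illustration):

import numpy as np

def sigmoid(z):
    # logistic function: maps a real-valued score to (0, 1)
    return 1 / (1 + np.exp(-z))

z = 1.2                             # linear score w^T x + b for one sample
y_hat = sigmoid(z)                  # p(y=1|x)
print(y_hat, 1 - y_hat)             # ~0.768 and ~0.232
print(1 if y_hat > 0.5 else 0)      # thresholded class label

y = 1                               # true label of this sample
p = (1 - y_hat)**(1 - y) * y_hat**y # p(y|x)
print(np.log(p))                    # (1-y)*ln(1-y_hat) + y*ln(y_hat)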

For a given dataset $\{(x_i, y_i)\}_{i=1}^m$, finding the optimal parameters amounts to maximizing the log-likelihood
$\max_{w,b} \sum_{i=1}^m \ln p(y_i|x_i)$, i.e., making each sample as likely as possible under its true label.
Maximizing the log-likelihood is equivalent to minimizing a loss function. Dividing the total loss by $m$ gives the average loss, whose scale does not depend on the sample size (so, for example, the learning rate does not have to be rescaled with $m$):
$J(w, b) = \frac{1}{m}\sum_{i=1}^m \big(-\ln p(y_i|x_i)\big) = \frac{1}{m}\sum_{i=1}^m \big(-(1-y_i)\ln(1-\hat{y}_i) - y_i\ln\hat{y}_i\big)$, and we seek $(w, b) = \arg\min J(w, b)$.
Special case of a single sample $(x, y)$, i.e. $m = 1$: $Loss(w, b) = -(1-y)\ln(1-\hat{y}) - y\ln\hat{y}$.
This is a continuous convex function of $w$ and $b$ with higher-order derivatives, so it can be minimized with gradient descent and similar methods.
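A quick numeric check of the single-sample loss (the probability values are chosen purely for illustration): the loss is small when the prediction agrees with the label and blows up when it does not.

import numpy as np

def loss(y_hat, y):
    # single-sample cross-entropy loss: -(1-y)ln(1-y_hat) - y*ln(y_hat)
    return -(1 - y) * np.log(1 - y_hat) - y * np.log(y_hat)

print(loss(0.9, 1))   # positive sample, confident and correct: ~0.105
print(loss(0.1, 1))   # positive sample, confident and wrong:   ~2.303
print(loss(0.1, 0))   # negative sample, confident and correct: ~0.105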

Minimizing the loss function with gradient descent

For a single sample $(x, y)$ with two features, write $z = w_1 x_1 + w_2 x_2 + b$ and $a = \hat{y} = \mathrm{sigmoid}(z)$. By the chain rule:
$\partial Loss/\partial w_1 = \partial Loss/\partial a \cdot \partial a/\partial z \cdot \partial z/\partial w_1 = \big[(a-y)/(a(1-a))\big]\cdot\big[a(1-a)\big]\cdot x_1 = (a-y)\,x_1$
$\partial Loss/\partial w_2 = \partial Loss/\partial a \cdot \partial a/\partial z \cdot \partial z/\partial w_2 = \big[(a-y)/(a(1-a))\big]\cdot\big[a(1-a)\big]\cdot x_2 = (a-y)\,x_2$
$\partial Loss/\partial b = \partial Loss/\partial a \cdot \partial a/\partial z \cdot \partial z/\partial b = \big[(a-y)/(a(1-a))\big]\cdot\big[a(1-a)\big]\cdot 1 = (a-y)$
The gradient-descent updates with learning rate $\alpha$ are:
$w_1 := w_1 - \alpha\,\partial Loss/\partial w_1$
$w_2 := w_2 - \alpha\,\partial Loss/\partial w_2$
$b := b - \alpha\,\partial Loss/\partial b$
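A small sketch of one single-sample gradient step, with a finite-difference check that the chain-rule gradient above is correct (the feature values, initial weights, and learning rate are assumptions for illustration):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# one toy sample with two features
x1, x2, y = 0.5, -1.0, 1
w1, w2, b, alpha = 0.1, 0.2, 0.0, 0.1

z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# analytic gradients from the chain rule above
dw1, dw2, db = (a - y) * x1, (a - y) * x2, (a - y)

# finite-difference check on dw1, as a sanity check of the derivation
def loss(w1_, w2_, b_):
    a_ = sigmoid(w1_ * x1 + w2_ * x2 + b_)
    return -(1 - y) * np.log(1 - a_) - y * np.log(a_)

eps = 1e-6
numeric_dw1 = (loss(w1 + eps, w2, b) - loss(w1 - eps, w2, b)) / (2 * eps)
print(dw1, numeric_dw1)   # the two values should agree closely

# one gradient-descent update
w1, w2, b = w1 - alpha * dw1, w2 - alpha * dw2, b - alpha * db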

For a dataset $\{(x_i, y_i)\}_{i=1}^m$ with $m$ samples, stack the features as the columns of $X$ and the labels as the row vector $Y$, and let $A = \mathrm{sigmoid}(w^T X + b)$. The total loss is
$J(w, b) = \frac{1}{m}\sum_{i=1}^m Loss_i = -\frac{1}{m}\big(Y \log(A)^T + (1-Y)\log(1-A)^T\big)$
Vectorized gradient descent:

for i = 1 to iterations:
    $z = w^T X + b$
    $a = \mathrm{sigmoid}(z)$
    $dz = a - Y$
    $dw = \frac{1}{m}\, X\, dz^T$
    $db = \frac{1}{m} \sum dz$
    $w := w - \alpha\, dw$
    $b := b - \alpha\, db$

Coding

Reference implementation: fanghao6666/neural-networks-and-deep-learning
1. Pay close attention to the dimensions of each parameter and to whether a transpose is needed before a dot product.
2. With a small dataset, plain (batch) gradient descent is enough; with a large dataset you would switch to stochastic/mini-batch gradient descent and need to set a batch size, as sketched below.
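A minimal mini-batch gradient-descent sketch for the large-data case (the function name, default batch size, and hyperparameters are illustrative assumptions, not part of the reference implementation); the full batch-gradient-descent implementation follows after it:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def minibatch_gd(X, Y, batch_size=64, epochs=100, lr=0.1):
    # X: (n_features, m), Y: (1, m)
    n, m = X.shape
    w, b = np.zeros((n, 1)), 0.0
    for epoch in range(epochs):
        idx = np.random.permutation(m)            # reshuffle every epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            Xb, Yb = X[:, batch], Y[:, batch]
            A = sigmoid(np.dot(w.T, Xb) + b)      # forward pass on the batch
            dZ = A - Yb
            w -= lr * np.dot(Xb, dZ.T) / Xb.shape[1]
            b -= lr * np.sum(dZ) / Xb.shape[1]
    return w, b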

import numpy as np
import matplotlib.pyplot as plt
'''Define the sigmoid function and initialize the parameters'''
def sigmoid(z):
    return 1/(1 + np.exp(-z))

def param_initialize(dim):
    # weights as a (dim, 1) column vector, bias as a scalar
    w = np.zeros((dim, 1))
    b = 0
    return w, b
	
'''Forward propagation: compute the cost and the gradients'''
def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)               # predicted probabilities, shape (1, m)
    cost = -1/m * (np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T))
    cost = np.squeeze(cost)                       # (1, 1) array -> scalar
    dw = 1/m * np.dot(X, (A - Y).T)               # same shape as w
    db = 1/m * np.sum(A - Y)
    grads = {"dw": dw,
             "db": db}
    return grads, cost
    
'''Optimize the parameters with gradient descent'''
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        dw = grads['dw']
        db = grads['db']
        w = w - learning_rate * dw
        b = b - learning_rate * db

        if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}
    return params, grads, costs
    
'''Predict labels from the learned parameters'''
def predict(w, b, X):
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape((-1, 1))
    A = sigmoid(np.dot(w.T, X) + b)               # predicted probabilities, shape (1, m)

    for i in range(A.shape[1]):
        if A[0][i] <= 0.5:
            Y_prediction[0][i] = 0
        else:
            Y_prediction[0][i] = 1
    return Y_prediction

'''Performance evaluation: accuracy and a cost-vs-iteration plot'''
def accuracy(Y_hat, Y):
    acc = 100 - np.mean(np.abs(Y_hat - Y)) * 100
    print("accuracy: {} %".format(acc))
    return acc

def costplot(costs):
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
	
'''Put the model together'''
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000,
          learning_rate=0.5, print_cost=False):
    w, b = param_initialize(X_train.shape[0])
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations,
                                    learning_rate, print_cost=print_cost)
    w = params["w"]
    b = params["b"]
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)
    accuracy(Y_prediction_train, Y_train)
    accuracy(Y_prediction_test, Y_test)

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    return d
    
'''Hyperparameter selection: learning rate'''
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print("learning rate is: " + str(i))
    models[str(i)] = model(X_train, Y_train, X_test, Y_test, num_iterations=1500,
                           learning_rate=i, print_cost=False)
    print('\n' + "-------------------------------------------------------" + '\n')

plt.figure()
for i in learning_rates:
    costplot(models[str(i)]["costs"])     # one cost curve per learning rate

legend = plt.legend(learning_rates, loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()
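The code above assumes X_train, Y_train, X_test, Y_test have already been loaded. A minimal synthetic-data sketch to exercise model() end to end (all data values and hyperparameters here are assumptions for illustration):

# toy data: two Gaussian blobs, features stacked as columns (n_features, m)
np.random.seed(0)
m = 200
X_pos = np.random.randn(2, m // 2) + 2.0     # positive class centered at (2, 2)
X_neg = np.random.randn(2, m // 2) - 2.0     # negative class centered at (-2, -2)
X_all = np.hstack([X_pos, X_neg])
Y_all = np.hstack([np.ones((1, m // 2)), np.zeros((1, m // 2))])

# simple shuffled split into train / test
perm = np.random.permutation(m)
train_idx, test_idx = perm[:150], perm[150:]
X_train, Y_train = X_all[:, train_idx], Y_all[:, train_idx]
X_test, Y_test = X_all[:, test_idx], Y_all[:, test_idx]

d = model(X_train, Y_train, X_test, Y_test, num_iterations=1000,
          learning_rate=0.1, print_cost=True)
costplot(d["costs"])
plt.show()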

References

Zhou Zhihua, Machine Learning (《机器学习》)
Andrew Ng, Neural Networks and Deep Learning, Week 2: Neural Network Basics
fanghao6666's GitHub repository neural-networks-and-deep-learning
