Model prototype
class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)
Parameters
- penalty: specifies the regularization strategy
  - 'l1': the objective function is ||w||_1 + C·L(w), where C > 0 and L(w) is the maximum-likelihood function
  - 'l2': the objective function is (1/2)||w||_2^2 + C·L(w), where C > 0 and L(w) is the maximum-likelihood function
- dual:
  - True: solve the dual formulation (only implemented for solver='liblinear' with the l2 penalty)
  - False: solve the primal formulation
- tol: tolerance for the stopping criterion
- C: the inverse of the regularization coefficient; the smaller the value, the stronger the regularization
- fit_intercept: whether to fit the intercept term b
- intercept_scaling: reduces the influence of the synthetic intercept feature (meaningful only when solver='liblinear')
- class_weight:
  - dict: specifies a weight for each class
  - 'balanced': each class is weighted inversely proportional to its frequency in the samples
  - unspecified: every class gets weight 1
- random_state: the seed of the random number generator (used e.g. when the solver shuffles the data)
- solver: specifies the optimization algorithm; use 'liblinear' for small datasets and 'sag' for large ones; 'newton-cg', 'lbfgs', and 'sag' only handle penalty='l2'
  - 'newton-cg': Newton's method
  - 'lbfgs': the L-BFGS quasi-Newton method
  - 'liblinear': the liblinear library
  - 'sag': the Stochastic Average Gradient descent algorithm
- max_iter: the maximum number of iterations
- multi_class: specifies the strategy for multi-class problems
  - 'ovr': the one-vs-rest strategy
  - 'multinomial': multinomial (softmax) logistic regression directly (not supported by solver='liblinear')
- verbose: turns logging of intermediate iteration output on or off
- warm_start: whether to reuse the solution of the previous fit as initialization
- n_jobs: the number of CPU cores to use in parallel
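As a minimal sketch of how penalty interacts with C (this example is illustrative and not part of the original notes; the dataset and C=0.1 are arbitrary choices), the l1 penalty tends to produce a sparse weight vector while l2 only shrinks the weights:

```python
import numpy as np
from sklearn import datasets, linear_model

# Illustrative comparison: count exactly-zero weights under each penalty.
# liblinear supports both 'l1' and 'l2'; C=0.1 is a fairly strong penalty.
X, y = datasets.load_iris(return_X_y=True)
for penalty in ('l1', 'l2'):
    clf = linear_model.LogisticRegression(penalty=penalty, solver='liblinear', C=0.1)
    clf.fit(X, y)
    # l1 drives some weights exactly to zero; l2 merely shrinks them
    print(penalty, 'zero weights:', int((clf.coef_ == 0).sum()))
```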
Attributes
- coef_: the weight vector
- intercept_: the intercept b
Methods
- fit(X,y[,sample_weight]): trains the model
- predict(X): predicts the class of each sample in X
- predict_log_proba(X): returns an array whose elements are, in order, the log-probabilities that X belongs to each class
- predict_proba(X): returns an array whose elements are, in order, the probabilities that X belongs to each class
- score(X,y[,sample_weight]): returns the mean accuracy on the given test data
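The relations among these prediction methods can be checked numerically. A small sketch (the iris data and liblinear solver are illustrative choices, not from the original notes):

```python
import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.load_iris(return_X_y=True)
clf = linear_model.LogisticRegression(solver='liblinear').fit(X, y)

proba = clf.predict_proba(X[:5])          # shape (5, 3): one column per class
log_proba = clf.predict_log_proba(X[:5])  # elementwise log of predict_proba
assert np.allclose(log_proba, np.log(proba))
assert np.allclose(proba.sum(axis=1), 1.0)   # each row is a probability distribution
# predict() picks the class with the largest probability
assert (clf.predict(X[:5]) == clf.classes_[proba.argmax(axis=1)]).all()
```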
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model, model_selection
Load the data
def load_data():
    iris = datasets.load_iris()
    X_train = iris.data
    y_train = iris.target
    return model_selection.train_test_split(X_train, y_train, test_size=0.25, random_state=0, stratify=y_train)
Using LogisticRegression
def test_LogisticRegression(*data):
    X_train, X_test, y_train, y_test = data
    regr = linear_model.LogisticRegression()
    regr.fit(X_train, y_train)
    print('Coefficients:%s,\nintercept %s' % (regr.coef_, regr.intercept_))
    print('Score:%.2f' % regr.score(X_test, y_test))

X_train, X_test, y_train, y_test = load_data()
test_LogisticRegression(X_train, X_test, y_train, y_test)
Effect of the multi_class parameter
def test_LogisticRegression_multinomial(*data):
    X_train, X_test, y_train, y_test = data
    # 'multinomial' requires a solver other than liblinear, e.g. lbfgs
    regr = linear_model.LogisticRegression(multi_class='multinomial', solver='lbfgs')
    regr.fit(X_train, y_train)
    print('Coefficients:%s,\nintercept %s' % (regr.coef_, regr.intercept_))
    print('Score:%.2f' % regr.score(X_test, y_test))

X_train, X_test, y_train, y_test = load_data()
test_LogisticRegression_multinomial(X_train, X_test, y_train, y_test)
Effect of the parameter C (the inverse of the regularization coefficient)
def test_LogisticRegression_C(*data):
    X_train, X_test, y_train, y_test = data
    Cs = np.logspace(-2, 4, num=100)
    scores = []
    for C in Cs:
        regr = linear_model.LogisticRegression(C=C)
        regr.fit(X_train, y_train)
        scores.append(regr.score(X_test, y_test))
    # plot
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(Cs, scores)
    ax.set_xlabel(r"C")
    ax.set_ylabel(r"score")
    ax.set_xscale('log')
    ax.set_title("LogisticRegression")
    plt.show()
test_LogisticRegression_C(X_train,X_test,y_train,y_test)
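Effect of the class_weight parameter
As a supplementary sketch (the synthetic imbalanced dataset below is a hypothetical example, not from the original notes), class_weight='balanced' reweights each class inversely to its frequency, so the minority class is typically predicted more often:

```python
from sklearn import datasets, linear_model

# Hypothetical imbalanced problem: about 90% of samples belong to class 0.
X, y = datasets.make_classification(n_samples=1000, weights=[0.9], random_state=0)
for cw in (None, 'balanced'):
    clf = linear_model.LogisticRegression(class_weight=cw, solver='liblinear')
    clf.fit(X, y)
    # 'balanced' boosts the weight of the rare class 1, so more samples
    # tend to be assigned to it than with uniform class weights
    print(cw, 'predicted minority samples:', int((clf.predict(X) == 1).sum()))
```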