Logistic Regression (LR)
References:
- Li Hang, Statistical Learning Methods (统计学习方法)
- Nick McClure (translated by Zeng Yiqiang), TensorFlow Machine Learning Cookbook (TensorFlow 机器学习实战指南)
- Zhou Zhihua, Machine Learning (机器学习)
- sklearn.linear_model.LogisticRegression
- statsmodels.api
Model
Consider a binary classification task whose output label is $y\in \{0,1\}$. A linear regression model produces the real-valued prediction $z = w \cdot x + b$, so we need to convert the real value $z$ into a 0/1 value.
The log-odds function
$$y=\frac{1}{1+e^{-z}}$$
is a sigmoid function: it maps $z$ to a value $y$ close to 0 or 1. Substituting $z = w\cdot x + b$ gives
$$y=\frac{1}{1+e^{-(w\cdot x+b)}},$$
which is equivalent to
$$\ln \frac{y}{1-y} = w\cdot x+b.$$
If we regard $y$ as the probability that sample $x$ is a positive example, then $1-y$ is the probability that it is a negative example. Their ratio $\frac{y}{1-y}$ is called the odds and reflects the relative likelihood that $x$ is positive. Taking the logarithm of the odds gives the log odds (also called the logit).
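For example, if $y=0.8$ then the odds are $\frac{0.8}{0.2}=4$ and the log odds are $\ln 4 \approx 1.39$: the sample is four times as likely to be a positive example as a negative one.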
Writing $x = (1,x)$ and $w = (b,w)$ (absorbing the bias into the weight vector), and viewing $y$ as the posterior probability estimate $P(Y=1\mid x)$, we have
$$\ln \frac{P(Y=1\mid x)}{P(Y=0\mid x)} = w\cdot x.$$
Hence
$$P(Y=1\mid x)= \frac{\exp(w\cdot x)}{1+\exp(w\cdot x)}, \qquad P(Y=0\mid x)=\frac{1}{1+\exp(w\cdot x)}.$$
As the value of the linear function approaches positive infinity, the probability approaches 1; as it approaches negative infinity, the probability approaches 0. This model is the Logistic Regression (LR) model.
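A minimal NumPy sketch of this mapping (purely illustrative; the function and values below are not part of any library API):
import numpy as np

def sigmoid(z):
    # Map a real-valued score z = w·x + b to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# The larger z is, the closer the probability is to 1; the smaller, the closer to 0.
print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# ≈ [4.5e-05, 0.269, 0.5, 0.731, 0.99995]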
To learn an LR model from a given training set $T=\{(x_1,y_1), (x_2,y_2),\dots,(x_N,y_N)\}$, where $x_i\in\mathbb{R}^n$ and $y_i\in \{0,1\}$, we can estimate the model parameters by maximum likelihood estimation.
Let $P(Y=1\mid x)= \pi(x)$ and $P(Y=0\mid x)=1-\pi(x)$. The likelihood function is
$$\prod^N_{i=1}[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i},$$
and the log-likelihood is
$$L(w) = \sum^N_{i=1}\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right] = \sum^N_{i=1}\left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right] = \sum^N_{i=1} \left[y_i(w \cdot x_i)-\log(1+\exp(w\cdot x_i))\right].$$
Maximizing it is equivalent to minimizing
$$L(w) = \sum^N_{i=1}\left[\log(1+\exp(w\cdot x_i)) - y_i(w \cdot x_i)\right].$$
The learning problem thus becomes an optimization problem with the log-likelihood as the objective function; gradient descent and quasi-Newton methods are commonly used to solve it.
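As a minimal illustration (a sketch, not the implementation used by any particular library): the gradient of the objective above is $\sum^N_{i=1}(\pi(x_i)-y_i)x_i$, so batch gradient descent can be written as follows, assuming X is a NumPy feature matrix whose first column is all ones (the augmented $x=(1,x)$ above) and y is a 0/1 label vector.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iter=1000):
    # Minimize sum[log(1 + exp(w·x_i)) - y_i (w·x_i)] by batch gradient descent.
    # X is assumed to already contain a leading column of ones for the intercept.
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)        # pi(x_i) for every sample
        grad = X.T @ (p - y)      # gradient of the negative log-likelihood
        w -= lr * grad / len(y)   # averaged gradient step
    return w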
sklearn.linear_model.LogisticRegression
sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)
Parameter descriptions:
- penalty : str, 'l1', 'l2', 'elasticnet' or 'none', optional (default='l2')
Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only L2 penalties. 'elasticnet' is only supported by the 'saga' solver. If 'none' (not supported by the liblinear solver), no regularization is applied.
- solver : str, {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, optional (default='liblinear')
Algorithm to use in the optimization problem.
For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty; 'liblinear' and 'saga' also handle the L1 penalty; 'saga' also supports the 'elasticnet' penalty; 'liblinear' does not handle no penalty.
Note that fast convergence of 'sag' and 'saga' is only guaranteed on features with approximately the same scale; you can preprocess the data with a scaler from sklearn.preprocessing.
New in version 0.17: Stochastic Average Gradient descent solver. New in version 0.19: SAGA solver. Changed in version 0.20: the default will change from 'liblinear' to 'lbfgs' in 0.22.
- class_weight : dict or 'balanced', optional (default=None)
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.
The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
New in version 0.17: class_weight='balanced'
- C : float, optional (default=1.0)
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
from sklearn.linear_model import LogisticRegression as LR

lr = LR()
# Fit the model according to the given training data.
lr.fit(x_train, y_train)
# Coefficient of the features in the decision function.
lr.coef_
# Intercept (a.k.a. bias) added to the decision function.
lr.intercept_
# Predict confidence scores for samples.
lr.decision_function(x_train)
# Predict class labels for samples in x_train.
lr.predict(x_train)
# Probability estimates.
lr.predict_proba(x_train)
# Log of probability estimates.
lr.predict_log_proba(x_train)
# Return the mean accuracy on the given test data and labels.
lr.score(x_train, y_train)
# Get parameters for this estimator.
lr.get_params()
# Set the parameters of this estimator.
lr.set_params(C=0.5)
# Convert coefficient matrix to sparse format.
lr.sparsify()
# Convert coefficient matrix to dense array format.
lr.densify()
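A minimal end-to-end sketch (the synthetic dataset from sklearn.datasets is used here purely for illustration):
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=500, n_features=7, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

lr = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=1000)
lr.fit(x_train, y_train)
print(lr.coef_, lr.intercept_)
print(lr.predict_proba(x_test)[:5])   # P(Y=0|x), P(Y=1|x) for the first five samples
print(lr.score(x_test, y_test))       # mean accuracy on the test set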
sklearn.linear_model.LogisticRegressionCV
sklearn.linear_model.LogisticRegressionCV(Cs=10, fit_intercept=True, cv='warn', dual=False, penalty='l2', scoring=None, solver='lbfgs', tol=0.0001, max_iter=100, class_weight=None, n_jobs=None, verbose=0, refit=True, intercept_scaling=1.0, multi_class='warn', random_state=None, l1_ratios=None)
Usage is the same as sklearn.linear_model.LogisticRegression, except that the regularization strength C is chosen by cross-validation.
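A short sketch of cross-validated selection of C (assuming the x_train/x_test split from the example above):
from sklearn.linear_model import LogisticRegressionCV

# Try 10 values of C on a log scale, with 5-fold cross-validation
lr_cv = LogisticRegressionCV(Cs=10, cv=5, penalty='l2', solver='lbfgs', max_iter=1000)
lr_cv.fit(x_train, y_train)
print(lr_cv.C_)                      # selected regularization strength (per class)
print(lr_cv.score(x_test, y_test))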
statsmodels
import statsmodels.api as sm
# Add a constant (intercept) term
x1 = sm.add_constant(x_train)
logit = sm.Logit(y_train,x1)
result = logit.fit()
print(result.summary())
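The fitted result can also be used for prediction; a brief sketch, assuming a held-out x_test with the same columns (result.predict returns the estimated probabilities P(Y=1|x)):
x2 = sm.add_constant(x_test)
pred_prob = result.predict(x2)               # predicted probabilities P(Y=1|x)
pred_label = (pred_prob > 0.5).astype(int)   # threshold at 0.5 to obtain 0/1 labels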
Stepwise logistic regression written with statsmodels.api:
# Forward selection
def forward_selection(x_train, y_train, namein, nameset, sle):
    x1 = x_train[nameset]
    xmodel = sm.add_constant(x1)
    logit = sm.Logit(y_train, xmodel)
    result = logit.fit()
    print(result.summary())
    t_value = abs(result.tvalues)
    p_value = result.pvalues
    # Candidate variables whose p-value is below the entry threshold sle
    k1 = p_value[p_value < sle].keys()
    if 'const' in k1:
        k1 = k1[1:]
    if len(k1) != 0:
        # Enter the candidate with the largest absolute t-value
        in_name = t_value[k1].idxmax()
        print(in_name, p_value[in_name], t_value[in_name])
        namein.append(in_name)
        nameset.pop(nameset.index(in_name))
    else:
        in_name = ''
        print('no new variable to insert')
    return namein, nameset, in_name
# Backward elimination
def backward_selection(x_train, y_train, nameout, nameset, sls):
    x1 = x_train[nameset]
    xmodel = sm.add_constant(x1)
    logit = sm.Logit(y_train, xmodel)
    result = logit.fit()
    print(result.summary())
    t_value = abs(result.tvalues)
    p_value = result.pvalues
    # Variables whose p-value exceeds the stay threshold sls (the intercept is never removed)
    k1 = p_value[p_value > sls].keys()
    if 'const' in k1:
        k1 = k1[1:]
    if len(k1) != 0:
        # Remove the variable with the largest p-value
        out_name = p_value[k1].idxmax()
        print(out_name, p_value[out_name], t_value[out_name])
        nameout.append(out_name)
        nameset.pop(nameset.index(out_name))
    else:
        out_name = ''
        print('no new variable to remove')
    return nameout, nameset, out_name
# Stepwise selection
def model_selection(x_train, y_train, method, sle, sls):
    namein, nameout, nameset = [], [], list(x_train.keys())
    in_name, out_name = 'const', 'const'
    if method == 'forward':
        while in_name != '':
            namein, nameset, in_name = forward_selection(x_train, y_train, namein, nameset, sle)
        train_set = namein
    elif method == 'backward':
        while out_name != '':
            nameout, nameset, out_name = backward_selection(x_train, y_train, nameout, nameset, sls)
        train_set = nameset
    elif method == 'all':
        train_set = nameset
    elif method == 'stepwise':
        while in_name != '' or out_name != '':
            namein, nameset, in_name = forward_selection(x_train, y_train, namein, nameset, sle)
            nameout, namein, out_name = backward_selection(x_train, y_train, nameout, namein, sls)
        train_set = namein
    else:
        print('Please use correct methods')
        return None, None
    # Refit the model on the selected variables
    x1 = x_train[train_set]
    xmodel = sm.add_constant(x1)
    logit = sm.Logit(y_train, xmodel)
    result = logit.fit()
    return result, train_set
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(xdata, ydata, test_size=0.4, random_state=0)
sle, sls = 0.05, 0.05
method = 'stepwise'
result, train_set = model_selection(x_train, y_train, method, sle, sls)
Implementing LR with TensorFlow
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops

ops.reset_default_graph()
sess = tf.Session()
Split the dataset into training and test sets.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(xdata, ydata, test_size = 0.4, random_state = 0)
# .values converts the pandas objects to NumPy arrays
x_vals_train = x_train.values
x_vals_test = x_test.values
y_vals_train = y_train.values
y_vals_test = y_test.values
Scale all features to the interval [0, 1] (min-max scaling); logistic regression converges better on features of comparable scale.
def normalize_cols(m):
    col_max = m.max(axis=0)
    col_min = m.min(axis=0)
    return (m - col_min) / (col_max - col_min)

x_vals_train = np.nan_to_num(normalize_cols(x_vals_train))
x_vals_test = np.nan_to_num(normalize_cols(x_vals_test))
'''
Note: we scale after splitting the dataset so that the training and test sets do not influence each other.
'''
Declare the batch size, placeholders, variables, and the logistic model. There is no need to wrap the output in a sigmoid here, because the sigmoid is built into the loss function used below.
batch_size = 25
x_data = tf.placeholder(shape = [None,7], dtype = tf.float32)
y_target = tf.placeholder(shape = [None,1], dtype = tf.float32)
A = tf.Variable(tf.random_normal(shape = [7,1]))
b = tf.Variable(tf.random_normal(shape = [1,1]))
model_output = tf.add(tf.matmul(x_data, A),b)
Declare the loss function, which includes the sigmoid, then initialize the variables and declare the optimizer.
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=model_output, labels=y_target))
init = tf.global_variables_initializer()
sess.run(init)
my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)
Besides recording the loss, we also track the classifier's accuracy on the training and test sets, so we create prediction and accuracy ops.
prediction = tf.round(tf.sigmoid(model_output))
predictions_correct = tf.cast(tf.equal(prediction, y_target), tf.float32)
accuracy = tf.reduce_mean(predictions_correct)
Run the training loop, recording the loss and accuracy values.
loss_vec = []
train_acc = []
test_acc = []
for i in range(1500):
    rand_index = np.random.choice(len(x_vals_train), size=batch_size)
    rand_x = x_vals_train[rand_index]
    rand_y = np.transpose([y_vals_train[rand_index]])
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec.append(temp_loss)
    temp_acc_train = sess.run(accuracy, feed_dict={x_data: x_vals_train, y_target: np.transpose([y_vals_train])})
    train_acc.append(temp_acc_train)
    temp_acc_test = sess.run(accuracy, feed_dict={x_data: x_vals_test, y_target: np.transpose([y_vals_test])})
    test_acc.append(temp_acc_test)
Plot the loss and the accuracies.
plt.plot(loss_vec,'k-')
plt.title('Cross Entropy Loss per Generation')
plt.show()
plt.plot(train_acc,'k-',label = 'Train Set Accuracy')
plt.plot(test_acc,'r--',label = 'Test Set Accuracy')
plt.title('Train and test accuracy')
plt.legend(loc = 'lower right')
plt.show()