Logistic Regression

  • Logistic regression can be viewed either as a regression algorithm or as a classification algorithm
  • It is usually used as a classification algorithm, and by itself it can only solve binary classification problems
The logistic function (sigmoid function)

$$\sigma(t) = \frac{1}{1+e^{-t}}$$
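A quick numerical check (a minimal NumPy sketch) confirms that the sigmoid maps any real input into the open interval (0, 1), with σ(0) = 0.5:

import numpy as np

def sigmoid(t):
    # Logistic (sigmoid) function: maps R into (0, 1)
    return 1 / (1 + np.exp(-t))

print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# [4.53978687e-05 2.68941421e-01 5.00000000e-01 7.31058579e-01 9.99954602e-01]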

Binary classification

$$p(y=1\mid x,w) = \frac{1}{1+e^{-(w^Tx+b)}}$$

$$p(y=0\mid x,w) = \frac{e^{-(w^Tx+b)}}{1+e^{-(w^Tx+b)}} = 1-p(y=1\mid x,w)$$

The two expressions can be combined into one:
$$p(y\mid x,w) = p(y=1\mid x,w)^y\,[1-p(y=1\mid x,w)]^{1-y}$$
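Taking the logarithm of this combined form shows where the loss function in the next section comes from:

$$\log p(y\mid x,w) = y\log p(y=1\mid x,w) + (1-y)\log(1-p(y=1\mid x,w))$$

Maximizing this log-likelihood over the training set is equivalent to minimizing its negative, which is exactly the cost below.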

Loss function

$$cost = \begin{cases}-\log(\hat{p}) & \text{if } y=1 \\ -\log(1-\hat{p}) & \text{if } y=0\end{cases}$$

It can also be written as:
$$cost = -y\log(\hat{p}) - (1-y)\log(1-\hat{p})$$

Gradient descent
  • Cost function

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} cost^{(i)} = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(\sigma(X_b^{(i)}\theta))+(1-y^{(i)})\log(1-\sigma(X_b^{(i)}\theta))\right)$$

  • Gradient

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(\sigma(X_b^{(i)}\theta)-y^{(i)}\right)X_j^{(i)}$$

$$\frac{\partial J(\theta)}{\partial \theta}=\frac{1}{m}X_b^T\left(\sigma(X_b\theta)-y\right)$$
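Before relying on the vectorized gradient, it is worth verifying it against a finite-difference approximation of J(θ). This is a minimal sketch (the helper names J, dJ, and dJ_numeric are mine, mirroring the formulas above):

import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

def J(theta, X_b, y):
    # Cross-entropy cost from the formula above
    p = sigmoid(X_b.dot(theta))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def dJ(theta, X_b, y):
    # Vectorized gradient: X_b^T (sigma(X_b theta) - y) / m
    return X_b.T.dot(sigmoid(X_b.dot(theta)) - y) / len(y)

def dJ_numeric(theta, X_b, y, eps=1e-6):
    # Centered finite differences, one coordinate at a time
    grad = np.empty_like(theta)
    for j in range(len(theta)):
        t1, t2 = theta.copy(), theta.copy()
        t1[j] += eps
        t2[j] -= eps
        grad[j] = (J(t1, X_b, y) - J(t2, X_b, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X_b = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=4)
print(np.allclose(dJ(theta, X_b, y), dJ_numeric(theta, X_b, y)))  # True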

Code implementation
  • Custom implementation

Using batch gradient descent:

# coding=utf-8

import numpy as np

from sklearn.metrics import accuracy_score


class LogisticRegression:

    def __init__(self):
        self.coef_ = None        # feature weights
        self.intercept_ = None   # bias term
        self.theta_ = None       # full parameter vector [intercept, coef]

    def _sigma(self, t):
        """Sigmoid function."""
        return 1 / (1 + np.exp(-t))

    def _J(self, theta, X_b, y_train):
        """Cross-entropy cost J(theta)."""
        y_hat = self._sigma(X_b.dot(theta))
        return -np.sum(y_train*np.log(y_hat)+(1-y_train)*np.log(1-y_hat))/len(y_train)

    def _dJ(self, theta, X_b, y_train):
        """Vectorized gradient: X_b^T (sigma(X_b theta) - y) / m."""
        return X_b.T.dot(self._sigma(X_b.dot(theta))-y_train)/len(y_train)

    def fit(self, X_train, y_train, alpha=0.01, cycle_index=1e4, interv=1e-8):
        """Batch gradient descent: alpha is the learning rate, cycle_index the
        maximum number of iterations, interv the convergence threshold on J."""
        start_index = 0
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])  # prepend bias column
        theta = np.zeros(X_b.shape[1])
        while start_index < cycle_index:
            last_theta = theta
            theta = theta - alpha * self._dJ(theta, X_b, y_train)
            # Stop once the cost no longer changes meaningfully
            if abs(self._J(theta, X_b, y_train) - self._J(last_theta, X_b, y_train)) < interv:
                break
            start_index += 1
        self.theta_ = theta
        self.coef_ = theta[1:]
        self.intercept_ = theta[0]
        return self

    def _predict(self, X_test):
        """Predicted probabilities p(y=1|x)."""
        X_b = np.hstack([np.ones((len(X_test), 1)), X_test])
        return self._sigma(X_b.dot(self.theta_))

    def predict(self, X_test):
        """Hard labels: 1 if p >= 0.5 else 0."""
        proba = self._predict(X_test)
        return np.array(proba >= 0.5, dtype=int)

    def score(self, X_test, y_test):
        return accuracy_score(y_test, self.predict(X_test))

    def __repr__(self):
        return 'This Is LogisticRegression'
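A quick smoke test of the class above, using two classes of the iris dataset (a sketch; any binary dataset works, and the split parameters here are arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]  # keep classes 0 and 1 only, for binary classification
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

log_reg = LogisticRegression()  # the custom class defined above
log_reg.fit(X_train, y_train)
print(log_reg.score(X_test, y_test))  # these two classes are linearly separable, so expect ~1.0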

  • scikit-learn implementation
from sklearn.linear_model import LogisticRegression
"""
	multi_class options:
	ovr: one-vs-rest (OvR) multiclass
	multinomial: a true multinomial (softmax) loss over all classes
	auto: selects 'ovr' for binary problems or the liblinear solver,
	      and 'multinomial' otherwise
"""
log_reg = LogisticRegression(solver='lbfgs', multi_class='auto')
log_reg.fit(X_train, y_train)
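After fitting, the usual estimator API applies (assuming X_test and y_test come from the same split used for training above); predict_proba is often the most useful output of logistic regression:

print(log_reg.predict(X_test[:5]))        # hard class labels
print(log_reg.predict_proba(X_test[:5]))  # per-class probabilities, each row sums to 1
print(log_reg.score(X_test, y_test))      # mean accuracy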
Handling multiclass problems
OvR

One vs Rest: each round separates one class from all the rest, so n classes require training n binary classifiers.

from sklearn.multiclass import OneVsRestClassifier

ovr = OneVsRestClassifier(log_reg)  # wraps any binary classifier, e.g. log_reg above
ovr.fit(X_train, y_train)
ovr.score(X_test, y_test)
OvO

One vs One: train a classifier for every pair of classes (n(n-1)/2 classifiers for n classes), then let them vote and pick the class with the most votes.

from sklearn.multiclass import OneVsOneClassifier

ovo = OneVsOneClassifier(log_reg)  # again, any binary classifier works
ovo.fit(X_train, y_train)
ovo.score(X_test, y_test)
Model regularization in logistic regression
  • L1 regularization

$$C \cdot J(\theta) + L_1$$

  • L2 regularization

$$C \cdot J(\theta) + L_2$$

where $L_1 = \sum_j \lvert\theta_j\rvert$ and $L_2 = \tfrac{1}{2}\sum_j \theta_j^2$.

"""
	sk-leran逻辑回归的默认参数如下,penalty代表正则项方式,默认是l2正则化
"""
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)
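Note that scikit-learn puts C on the data term J(θ) rather than on the penalty, so a smaller C means stronger regularization. For example, to use the L1 penalty you must also pick a solver that supports it, such as liblinear (a sketch; C=0.1 is an arbitrary choice):

from sklearn.linear_model import LogisticRegression

l1_reg = LogisticRegression(penalty='l1', C=0.1, solver='liblinear')
l1_reg.fit(X_train, y_train)
print(l1_reg.coef_)  # L1 tends to drive some coefficients exactly to zero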