I've recently been studying the ideas behind machine learning, and after about two weeks of preparation I finally worked through the principles of logistic regression.
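For reference, the quantities the code computes are the standard logistic-regression pieces: the sigmoid hypothesis, the mean cross-entropy cost, and the parameter update (the code does gradient ascent on the log-likelihood, which is equivalent to descent on this cost):

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]

\theta_j \leftarrow \theta_j - \alpha \frac{\partial J}{\partial \theta_j} = \theta_j + \frac{\alpha}{m} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}
```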
I implemented a crude version in Python. The code is below:
#!/usr/bin/env python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

def sigmoid_func(X, theta):
    # hypothesis h_theta(x) = 1 / (1 + exp(-X.theta))
    return 1.0 / (1.0 + np.exp(-np.dot(X, theta)))

def grad_func(theta, X, y):
    # gradient of the log-likelihood; the original mistakenly read the
    # global X_train here instead of the argument X
    m = X.shape[0]
    return (1.0/m) * np.dot(X.T, (y - sigmoid_func(X, theta)))

def cost_func(X, theta, y):
    # mean cross-entropy cost; with np.mat inputs the products below are
    # matrix products, so v already sums over the samples
    sig = sigmoid_func(X, theta)
    m = X.shape[0]
    v = -y.T * np.log(sig) - (1 - y.T) * np.log(1 - sig)
    return v / m
if __name__ == '__main__':
    # X_train and y_train are assumed to be loaded and split earlier
    # (that code is omitted here); theta starts at zero
    theta = np.zeros((X_train.shape[1], 1))
    err = None
    pre_err = None
    num_err = None
    for i in range(100000):
        ret = grad_func(theta, X_train, np.mat(y_train).T)
        theta += 0.01 * ret        # gradient ascent, learning rate 0.01
        err = cost_func(np.mat(X_train), theta, np.mat(y_train).T)
        if i % 1000 == 0:
            print(err)
            print(num_err)
        if pre_err is not None:
            num_err = (pre_err - err).max()
            if abs(num_err) < 1e-5:  # stop once the cost stops improving
                break
        pre_err = err
    print(theta)
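Since the data-loading step is omitted above, here is a self-contained sketch of the same gradient-ascent loop on synthetic data (the dataset and names here are illustrative, not the original data):

```python
import numpy as np

def sigmoid(X, theta):
    # same hypothesis as sigmoid_func above
    return 1.0 / (1.0 + np.exp(-X @ theta))

# synthetic data: 300 samples, an intercept column plus two features
rng = np.random.RandomState(42)
X = np.hstack([np.ones((300, 1)), rng.randn(300, 2)])
true_theta = np.array([0.5, 2.0, -1.0])
y = (rng.rand(300) < sigmoid(X, true_theta)).astype(float)

# same update rule as above: theta += alpha * (1/m) * X.T (y - h)
theta = np.zeros(3)
for _ in range(20000):
    theta += 0.1 * (X.T @ (y - sigmoid(X, theta))) / X.shape[0]

print(theta)  # roughly recovers the signs and magnitudes of true_theta
```

With enough samples and iterations the fitted vector lands near the generating parameters, which is a quick sanity check that the update rule is correct.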
The code is crude; it was mostly a learning exercise. I ran it on some data and compared the result against sklearn's logistic regression, and the gap was fairly large.
My fitted parameters (intercept plus three coefficients):
[[-4.77225259] [-0.84604906] [ 0.19146712] [ 1.15061542]]
sklearn's fit (apparently the intercept followed by the three coefficients):
[[-5.52189559]] [[-0.17054912 0.11048042 2.0367388 ]]
The difference between the two is substantial.
So implementing a machine-learning algorithm is not the hard part; the hard part is tuning it to get a good fit.
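One likely contributor to the gap (an assumption on my part, since the original data and the exact sklearn call are not shown): sklearn's LogisticRegression applies L2 regularization by default (C=1.0), while the hand-rolled gradient ascent above maximizes the unregularized likelihood. Weakening the penalty with a large C should bring the two results closer. A sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# synthetic, non-separable data (the original dataset is not shown)
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
true_w = np.array([1.0, -2.0, 0.5])
y = (X @ true_w + 0.3 + rng.randn(200) > 0).astype(int)

default_model = LogisticRegression(max_iter=5000).fit(X, y)        # C=1.0, L2 penalty
weak_penalty = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)  # effectively no penalty

print("default C=1.0 :", default_model.coef_)
print("C=1e6         :", weak_penalty.coef_)
```

The regularized coefficients are shrunk toward zero relative to the near-unpenalized fit, which is the kind of systematic difference visible in the comparison above.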