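I've been away from home lately and don't have Li Hang's book with me, so the implementations of that book's algorithms will probably be delayed.
Over the past few days I've been watching the videos for Andrew Ng's machine learning course and just reached the part on the Softmax classifier, which made me realize that my earlier understanding of the perceptron and logistic regression was wrong. The core difference between the two algorithms is their classification function: the perceptron uses a piecewise (step) function as its classifier, while logistic regression uses the sigmoid function. That is the real difference between the two.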
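To make the perceptron versus logistic regression distinction concrete, here is a minimal sketch. The function names `perceptron_decision` and `logistic_decision` are hypothetical and only for illustration; both take the same linear score w·x + b and differ only in the function applied to it.

```python
import math


def perceptron_decision(score):
    """Step function: a hard +1 / -1 decision from the linear score."""
    return 1 if score >= 0 else -1


def logistic_decision(score):
    """Sigmoid: maps the same linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


# Compare the two on a few example scores.
for score in (-2.0, 0.0, 2.0):
    print(score, perceptron_decision(score), logistic_decision(score))
```

The step function throws away how far the score is from zero, while the sigmoid keeps it as a confidence; softmax extends the sigmoid idea to more than two classes.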
Enough preamble: today I'm going to implement a softmax classifier.
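Algorithm
The algorithm follows Andrew Ng's lecture notes and this article.
During implementation I found that adding weight decay gives better results.
So that my program is easier to follow, I'll give a few definitions here.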
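The code in this post refers to "formula 1", "formula 2" and "formula 3". Judging only from what the corresponding methods compute, they appear to be the following; treat this as a reconstruction from the code rather than the original definitions. The symbols are assumptions chosen to match the code: \theta_l is row l of the weight matrix `self.w`, k is the number of classes, \lambda is `weight_lambda`, and x is the feature vector with a constant 1 appended.

```latex
% Formula 3 (cal_e): unnormalized score of class l
e^{\theta_l^{T} x}

% Formula 2 (cal_probability): probability of class j
P(y = j \mid x; \theta) = \frac{e^{\theta_j^{T} x}}{\sum_{l=0}^{k-1} e^{\theta_l^{T} x}}

% Formula 1 (cal_partial_derivative): per-sample gradient with weight decay
\nabla_{\theta_j} J(\theta) = -x \left( 1\{y = j\} - P(y = j \mid x; \theta) \right) + \lambda \theta_j
```

Each stochastic gradient step then updates \theta_j by subtracting the learning rate times formula 1.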
Dataset
The dataset is the same one used in the KNN post.
Data location: https://github.com/WenDesi/lihang_book_algorithm/blob/master/data/train.csv
Features
The whole image is used as the feature vector, i.e. the raw pixel values.
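As a small illustration of "the whole image as the feature", the sketch below flattens an image into a vector and appends a constant 1 so that the last weight can act as a bias, which is what the training code does with `x.append(1.0)`. The 28x28 image size and the made-up pixel values are assumptions for the example; the CSV file already stores each image as one flattened row.

```python
import numpy as np

# Hypothetical 28x28 grayscale image (assumed size, made-up pixel values).
image = np.zeros((28, 28))
image[10:18, 12:16] = 255.0

features = image.reshape(-1)    # every pixel becomes one feature
x = np.append(features, 1.0)    # constant 1 so the last weight acts as a bias

print(features.shape, x.shape)
```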
Code
The code has been uploaded to GitHub.
This time the code is written for Python 3, so you may need to tweak it slightly. Sorry, I'm abandoning Python 2.
import math
import random
import time

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


class Softmax(object):

    def __init__(self):
        self.learning_step = 0.000001   # learning rate
        self.max_iteration = 100000     # number of stochastic gradient updates
        self.weight_lambda = 0.01       # weight-decay coefficient

    def cal_e(self, x, l):
        '''
        Compute formula 3 from the post: exp(w[l] . x)
        '''
        theta_l = self.w[l]
        product = np.dot(theta_l, x)
        return math.exp(product)

    def cal_probability(self, x, j):
        '''
        Compute formula 2 from the post: probability that x belongs to class j
        '''
        numerator = self.cal_e(x, j)
        denominator = sum([self.cal_e(x, i) for i in range(self.k)])
        return numerator / denominator

    def cal_partial_derivative(self, x, y, j):
        '''
        Compute formula 1 from the post: gradient with respect to w[j] for one sample
        '''
        first = int(y == j)                     # indicator 1{y == j}
        second = self.cal_probability(x, j)
        return -x * (first - second) + self.weight_lambda * self.w[j]

    def predict_(self, x):
        # x is a column vector, so result has shape (k, 1)
        result = np.dot(self.w, x)
        row, column = result.shape

        # argmax returns a flat index; convert it back to a row (class) index
        position = np.argmax(result)
        m, n = divmod(position, column)

        return m

    def train(self, features, labels):
        self.k = len(set(labels))

        # One weight row per class; the extra column is the bias
        self.w = np.zeros((self.k, len(features[0]) + 1))
        iteration = 0   # renamed from `time` to avoid shadowing the time module

        while iteration < self.max_iteration:
            print('loop %d' % iteration)
            iteration += 1

            # Pick one random sample (stochastic gradient descent)
            index = random.randint(0, len(labels) - 1)
            x = features[index]
            y = labels[index]

            # Append a constant 1 for the bias term
            x = list(x)
            x.append(1.0)
            x = np.array(x)

            # Compute all gradients first, then update every class's weights
            derivatives = [self.cal_partial_derivative(x, y, j) for j in range(self.k)]

            for j in range(self.k):
                self.w[j] -= self.learning_step * derivatives[j]

    def predict(self, features):
        labels = []
        for feature in features:
            x = list(feature)
            x.append(1)

            # Turn the sample into a column vector
            x = np.matrix(x)
            x = np.transpose(x)

            labels.append(self.predict_(x))
        return labels


if __name__ == '__main__':

    print('Start read data')

    time_1 = time.time()

    raw_data = pd.read_csv('../data/train.csv', header=0)
    data = raw_data.values

    # First column is the label, the remaining columns are pixel values
    imgs = data[0::, 1::]
    labels = data[::, 0]

    # Hold out 33% of the data as the test set
    train_features, test_features, train_labels, test_labels = train_test_split(
        imgs, labels, test_size=0.33, random_state=23323)

    time_2 = time.time()
    print('read data cost ' + str(time_2 - time_1) + ' seconds')

    print('Start training')
    p = Softmax()
    p.train(train_features, train_labels)

    time_3 = time.time()
    print('training cost ' + str(time_3 - time_2) + ' seconds')

    print('Start predicting')
    test_predict = p.predict(test_features)
    time_4 = time.time()
    print('predicting cost ' + str(time_4 - time_3) + ' seconds')

    score = accuracy_score(test_labels, test_predict)
    print("The accuracy score is " + str(score))
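One thing to watch with the code above is that `math.exp` raises `OverflowError` when the score w·x gets large. A common workaround, which is not part of the original code, is to subtract the largest score before exponentiating; this leaves the probabilities unchanged. The sketch below also computes all class gradients at once with NumPy. The names `softmax_probabilities` and `sgd_step` are hypothetical, and the toy values at the bottom are made up to force large scores.

```python
import numpy as np


def softmax_probabilities(w, x):
    """Class probabilities for one sample; w has shape (k, n), x has shape (n,)."""
    scores = w @ x
    scores = scores - scores.max()      # shift so the largest exponent is 0
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()


def sgd_step(w, x, y, learning_step=1e-6, weight_lambda=0.01):
    """One stochastic gradient step with weight decay; returns an updated copy of w."""
    probs = softmax_probabilities(w, x)
    one_hot = np.zeros(w.shape[0])
    one_hot[y] = 1.0
    # Row j of the gradient is -x * (1{y == j} - p_j) + lambda * w[j]
    grad = -np.outer(one_hot - probs, x) + weight_lambda * w
    return w - learning_step * grad


# Toy example with 3 classes and 4 features (last feature is the bias input).
w = np.zeros((3, 4))
x = np.array([1000.0, 2.0, 3.0, 1.0])
w = sgd_step(w, x, y=2, learning_step=0.1)
probs = softmax_probabilities(w, x)
print(probs)
```

After this single large step the raw scores are far beyond what `math.exp` can represent, yet the shifted version still returns finite probabilities that sum to 1.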
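Results
It runs fairly fast. The accuracy is only so-so, though it is higher than what decision trees and similar methods gave.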