Statistical Learning Methods (统计学习方法), Chapter 2: The Perceptron, Python Implementation

菜小白—NLP

Article link: https://blog.csdn.net/ACM_hades/article/details/89501144

References

Theoretical derivation of the perceptron: https://blog.csdn.net/ACM_hades/article/details/89496175
Data: https://github.com/WenDesi/lihang_book_algorithm/blob/master/data

Data and features

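Dataset: we run the experiment on MNIST, which contains images of handwritten digits (0-9), each 28×28 pixels. MNIST has 10 classes, so to turn it into a binary classification problem we keep label 0 as 0 and change every label greater than 0 to 1. This converts the ten-class data into two-class data.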
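The relabeling described above can be sketched as follows. This is only an illustration on a made-up label array, not the script that produced train_binary.csv, and the function name binarize_labels is hypothetical.

```python
import numpy as np

def binarize_labels(labels):
    """Map digit labels 0-9 to two classes: 0 stays 0, anything greater becomes 1."""
    labels = np.asarray(labels)
    return (labels > 0).astype(np.int64)

digits = np.array([0, 3, 9, 0, 1])  # made-up example labels
print(binarize_labels(digits))      # [0 1 1 0 1]
```

Note that the resulting classes are unbalanced: only about one digit in ten is a 0.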
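Feature selection: there are many possible features, including:
- hand-crafted features
- the whole image flattened into a feature vector
- HOG features
We try two of them: HOG features (324 dimensions) and the whole image as the feature vector (784 = 28×28 dimensions).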
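To make the two feature sizes concrete, here is a small sketch of the arithmetic. The article does not show the contents of hog.xml, so the HOG parameters below are an assumption: they are one configuration that happens to give 324, and the helper hog_descriptor_length is hypothetical.

```python
def hog_descriptor_length(win, block, stride, cell, nbins):
    """Length of a HOG descriptor for a square window (all sizes in pixels)."""
    blocks_per_side = (win - block) // stride + 1  # block positions along one axis
    cells_per_block = (block // cell) ** 2         # cells inside one block
    return blocks_per_side ** 2 * cells_per_block * nbins

# Assumed parameters (not confirmed by the article): 28x28 window,
# 14x14 blocks, block stride 7, 7x7 cells, 9 orientation bins.
print(hog_descriptor_length(win=28, block=14, stride=7, cell=7, nbins=9))  # 324

# Raw pixels: one feature per pixel of a 28x28 image.
print(28 * 28)  # 784
```

With these assumed values there are 3×3 block positions, each holding 4 cells of 9 bins, which gives 9 × 4 × 9 = 324.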

Code

import pandas as pd
import numpy as np
import cv2  # OpenCV; only needed by get_hog_features
import random
import time
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Extract the HOG features of each image with OpenCV
def get_hog_features(trainset):
    features = []
    hog = cv2.HOGDescriptor('../hog.xml')
    for img in trainset:
        img = np.reshape(img,(28,28))
        cv_img = img.astype(np.uint8)
        hog_feature = hog.compute(cv_img)
        features.append(hog_feature)
    features = np.array(features)
    features = np.reshape(features,(-1,324))
    return features

# Perceptron model
class Perceptron(object):
    def __init__(self):
        self.learning_step = 0.00001
        self.max_iteration = 5000

    def model_function(self, x):
        wx = x.dot(self.w)
        return np.sign(wx)

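    def train(self, features, labels):
        # Fold the bias b into w as an extra last component
        self.w = np.zeros(len(features[0]) + 1, dtype=np.float32)
        correct_count = 0
        update_count = 0  # renamed from `time`, which shadowed the time module
        while update_count < self.max_iteration:
            # Pick one random sample for a stochastic gradient step
            index = random.randint(0, len(labels) - 1)
            x = features[index]
            x = np.append(x, 1.0)  # the constant 1 is the coefficient of b
            y = labels[index]
            pred = self.model_function(x)

            if y * pred > 0:  # sample is classified correctly
                correct_count += 1
                if correct_count > self.max_iteration:
                    break
                continue
            # Misclassified: update w and count the update so the loop can end
            self.w += self.learning_step * y * x
            update_count += 1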

    def predict(self,features):
        labels = []
        for feature in features:
            x = np.append(feature, 1.0)
            labels.append(self.model_function(x))
        return labels


if __name__ == '__main__':

    print ('Start read data')
    S = time.time()
    raw_data = pd.read_csv('../data/train_binary.csv')  # load the data set
    data = raw_data.values  # convert to a NumPy array
    print("data shape:",data.shape)
    imgs = data[0:, 1:]
    labels = data[:, 0]
    
    # imgs = get_hog_features(imgs)  # uncomment to use HOG features instead of raw pixels
    print("imgs shape:", imgs.shape)
    print("labels shape:", labels.shape)

    # Use about 2/3 of the data for training and 1/3 for testing
    train_features, test_features, train_labels, test_labels = train_test_split(
        imgs, labels, test_size=0.33, random_state=23323)
    train_labels = 2 * train_labels - 1  # map labels 0/1 to -1/+1
    test_labels = 2 * test_labels - 1
    print("train data count ：%d"%len(train_labels))
    print("test data count ：%d"%len(test_labels))
    print ('read data cost ', time.time() - S, ' second')

    print ('Start training')
    S = time.time()
    p = Perceptron()
    p.train(train_features, train_labels)
    print( 'training cost ', time.time() - S, ' second')

    print('Start predicting')
    S = time.time()
    test_predict = p.predict(test_features)
    print('predicting cost ', time.time() - S, ' second')

    score = accuracy_score(test_labels, test_predict)
    print("The accuracy score is ", score)

Output:
    With HOG features:
        Start read data
        data shape: (42000, 785)
        imgs shape: (42000, 324)
        labels shape: (42000,)
        train data count ：28140
        test data count ：13860
        read data cost  5.35866117477417  second
        Start training
        training cost  0.07878541946411133  second
        Start predicting
        predicting cost  0.12164664268493652  second
        The accuracy score is  0.9935786435786436
    With raw pixel features:
        Start read data
        data shape: (42000, 785)
        imgs shape: (42000, 784)
        labels shape: (42000,)
        train data count ：28140
        test data count ：13860
        read data cost  3.7569241523742676  second
        Start training
        training cost  0.08876228332519531  second
        Start predicting
        predicting cost  0.12666058540344238  second
        The accuracy score is  0.9242424242424242
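For readers who want to check the update rule w ← w + η·y·x without downloading the data, the following is a minimal sketch on a three-point toy set. It is not the script above: the data values, the function names train_perceptron and predict, and the learning rate are all chosen for illustration, and it sweeps the samples in order instead of sampling them at random.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, max_epochs=100):
    """Perceptron with the bias folded into w; labels y must be -1 or +1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a constant 1 for the bias
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * xi.dot(w) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi     # perceptron update
                mistakes += 1
        if mistakes == 0:             # a full pass with no mistakes: converged
            break
    return w

def predict(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb.dot(w))

# Toy, linearly separable data (illustrative values)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

w = train_perceptron(X, y)
print("weights:", w[:-1], "bias:", w[-1])
print("predictions:", predict(X, w))
```

On separable data like this the loop stops as soon as one full pass makes no mistakes. On MNIST, which is not guaranteed to be separable, a cap on the number of iterations such as max_iteration in the script above is what ends training.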