Statistical Learning Methods (《统计学习方法》), Chapter 2: The Perceptron, a Python Implementation

References

Perceptron theory and derivation: https://blog.csdn.net/ACM_hades/article/details/89496175
Data: https://github.com/WenDesi/lihang_book_algorithm/blob/master/data

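Dataset and features

  • Dataset: we run the experiment on the MNIST dataset, which contains images of handwritten digits (0-9), each 28×28 pixels. MNIST has 10 classes, so to turn it into a binary classification problem we relabel it: samples with label 0 keep label 0, and samples with a label greater than 0 are relabeled 1. This reduces the ten-class data to two classes.
  • Feature selection: there are many possible features, including:
    • hand-crafted features
    • the whole image flattened into a feature vector
    • HOG features
  • We try two of them: HOG features (324 dimensions) and the whole image as the feature vector (784 = 28×28 dimensions).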
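
The label mapping and the two feature sizes above can be sketched as follows. This is a minimal illustration, not the code that produced `train_binary.csv`. The helper names `binarize_labels` and `hog_length` are hypothetical, and the HOG parameters (28×28 window, 14×14 blocks, 7×7 block stride, 7×7 cells, 9 orientation bins) are an assumption chosen because they give 324 values; the real settings are in `hog.xml`, which is not shown in this post.

```python
import numpy as np

def binarize_labels(labels):
    """Map digit labels 0-9 to two classes: 0 stays 0, every other digit becomes 1."""
    labels = np.asarray(labels)
    return (labels > 0).astype(int)

def hog_length(win=28, block=14, stride=7, cell=7, nbins=9):
    """Length of a HOG descriptor for a square window (assumed parameters)."""
    blocks_per_side = (win - block) // stride + 1   # block positions along one axis
    cells_per_block = (block // cell) ** 2          # cells inside one block
    return blocks_per_side ** 2 * cells_per_block * nbins

print(binarize_labels([0, 3, 7, 0, 9]))  # [0 1 1 0 1]
print(hog_length())                       # 324
print(28 * 28)                            # 784
```

With these assumed settings there are 3×3 block positions, 4 cells per block and 9 bins per cell, which is where 324 comes from.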

Code

import pandas as pd
import numpy as np
import random
import time
import cv2  # OpenCV, needed only for the HOG features
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Extract HOG features from the images with OpenCV
def get_hog_features(trainset):
    features = []
    hog = cv2.HOGDescriptor('../hog.xml')
    for img in trainset:
        img = np.reshape(img,(28,28))
        cv_img = img.astype(np.uint8)
        hog_feature = hog.compute(cv_img)
        features.append(hog_feature)
    features = np.array(features)
    features = np.reshape(features,(-1,324))
    return features

# Perceptron model
class Perceptron(object):
    def __init__(self):
        self.learning_step = 0.00001
        self.max_iteration = 5000

    def model_function(self, x):
        wx = x.dot(self.w)
        return np.sign(wx)

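    def train(self, features, labels):
        # Fold the bias b into w as its last component
        self.w = np.zeros(len(features[0]) + 1, dtype=np.float32)
        correct_count = 0
        # Keep sampling until more than max_iteration correct classifications
        # have been seen in total (they need not be consecutive)
        while correct_count <= self.max_iteration:
            # Pick one random sample for a stochastic gradient descent step
            index = random.randint(0, len(labels) - 1)
            x = np.append(features[index], 1.0)  # the constant 1 is the coefficient of b
            y = labels[index]
            pred = self.model_function(x)

            if y * pred > 0:  # sample classified correctly
                correct_count += 1
                continue
            # Misclassified: update w (and b)
            self.w += self.learning_step * y * x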

    def predict(self,features):
        labels = []
        for feature in features:
            x = np.append(feature, 1.0)
            labels.append(self.model_function(x))
        return labels


if __name__ == '__main__':

    print('Start read data')
    S = time.time()
    raw_data = pd.read_csv('../data/train_binary.csv')  # load the data
    data = raw_data.values  # convert to a NumPy array
    print("data shape:", data.shape)
    imgs = data[:, 1:]   # remaining columns are the pixels
    labels = data[:, 0]  # first column is the label

    # imgs = get_hog_features(imgs)  # uncomment to use HOG features instead of raw pixels
    print("imgs shape:", imgs.shape)
    print("labels shape:", labels.shape)

    # Use about 2/3 of the data for training and 1/3 for testing
    train_features, test_features, train_labels, test_labels = train_test_split(
        imgs, labels, test_size=0.33, random_state=23323)
    train_labels = 2 * train_labels - 1  # map labels 0/1 to -1/+1
    test_labels = 2 * test_labels - 1
    print("train data count :%d" % len(train_labels))
    print("test data count :%d" % len(test_labels))
    print('read data cost ', time.time() - S, ' second')

    print('Start training')
    S = time.time()
    p = Perceptron()
    p.train(train_features, train_labels)
    print('training cost ', time.time() - S, ' second')

    print('Start predicting')
    S = time.time()
    test_predict = p.predict(test_features)
    print('predicting cost ', time.time() - S, ' second')

    score = accuracy_score(test_labels, test_predict)
    print("The accuracy score is ", score)
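
To see the update rule w ← w + η·y·x in isolation, here is a small self-contained sketch on a hand-made, linearly separable 2-D dataset. It is not the MNIST experiment: the data points, the function name `train_toy_perceptron`, the learning rate and the epoch cap are all illustrative assumptions, and it sweeps the samples in order instead of sampling at random so that the run is reproducible.

```python
import numpy as np

def train_toy_perceptron(X, y, learning_step=1.0, max_epochs=100):
    """Train on labels in {-1, +1}; the bias is folded into w as the last component."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a constant 1 for the bias
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * xi.dot(w) <= 0:            # misclassified (or on the boundary)
                w += learning_step * yi * xi   # perceptron update
                mistakes += 1
        if mistakes == 0:                      # a full pass without errors: converged
            break
    return w

# Hypothetical toy data: two points per class, separable by a line
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = train_toy_perceptron(X, y)
pred = np.sign(np.hstack([X, np.ones((len(X), 1))]).dot(w))
print(w, pred)
```

On this toy data a single update is enough: the first sample is misclassified by the all-zero weights, and after adding it to w every point falls on the correct side.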

Output:

With HOG features:

    Start read data
    data shape: (42000, 785)
    imgs shape: (42000, 324)
    labels shape: (42000,)
    train data count :28140
    test data count :13860
    read data cost  5.35866117477417  second
    Start training
    training cost  0.07878541946411133  second
    Start predicting
    predicting cost  0.12164664268493652  second
    The accuracy score is  0.9935786435786436

With raw pixel features:

    Start read data
    data shape: (42000, 785)
    imgs shape: (42000, 784)
    labels shape: (42000,)
    train data count :28140
    test data count :13860
    read data cost  3.7569241523742676  second
    Start training
    training cost  0.08876228332519531  second
    Start predicting
    predicting cost  0.12666058540344238  second
    The accuracy score is  0.9242424242424242
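
When reading these scores, it may help to compare them with a trivial baseline. Because only the digit 0 maps to the negative class, the two classes are probably unbalanced (roughly one sample in ten, assuming the digits are about evenly represented), so always predicting the majority class already scores well. A sketch of that check follows; `majority_baseline_accuracy` is a hypothetical helper and the labels below are made up, not the MNIST data.

```python
import numpy as np

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    labels = np.asarray(labels)
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / counts.sum()

# Made-up labels in {-1, +1}: one negative for every nine positives
toy_labels = np.array([-1] + [1] * 9)
print(majority_baseline_accuracy(toy_labels))  # 0.9
```

Calling the same helper on `test_labels` from the script above would give the baseline for the real split.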
