[Machine Learning Algorithms] Implementing the KNN Algorithm

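To deepen my understanding of machine learning algorithms and to get more familiar with Python, pandas, and scikit-learn, I am implementing the main machine learning algorithms myself. The code is recorded below: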

Implementation of the knn class:

import numpy as np  
import pandas as pd  
import random as rd  
import csv
from sklearn import preprocessing

class knn:
    def __init__(self, name):
        self.train_file = name
        self.feature=[]
        self.label=[]
    def train(self):
        self.feature,self.label=gen_model(self.train_file)       
    def test(self,x,k):
        result=knn_classify(self.feature,self.label,k,x)
        return result


def knn_classify(train_set,train_label,k,x):
    # Squared Euclidean distance from x to every training sample
    data_mat=np.asarray(train_set)
    data_len = len(data_mat)
    diff_mat = (np.tile(x,(data_len,1))-data_mat)**2
    dist=diff_mat.sum(axis=1)
    # Indices of the training samples, nearest first
    sorted_idx = dist.argsort()
    label_count={}
    curr_max_vote= 0;curr_max_label=0
    # Majority vote among the k nearest neighbours
    for i in range(k):
        curr_label=train_label[sorted_idx[i]]
        label_count[curr_label]=label_count.get(curr_label,0)+1
        if(label_count[curr_label] > curr_max_vote):
            curr_max_vote = label_count[curr_label]
            curr_max_label = curr_label
    return curr_max_label

def gen_model(name):
    # Convert the tab-separated <name>.txt into <name>.csv, then load it with pandas
    txt_name = name +".txt"
    csv_name = name + ".csv"
    # Text mode with newline='' is what the csv module expects in Python 3
    with open(txt_name) as fr, open(csv_name,'w',newline='') as file:
        my_save=csv.writer(file)
        for line in fr:
            line_list = line.strip().split('\t')
            my_save.writerow(line_list)
    feature,label=pre_data(csv_name)
    return feature,label


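def pre_data(csv_name):
    # The first three columns are features, the fourth is the class label
    df=pd.read_csv(csv_name,names=['f1','f2','f3','label'])
    # Encode the string labels as integer ids
    df['label_id']=pd.factorize(df['label'])[0]
    # drop() returns a new DataFrame, so the result must be assigned back
    df=df.drop(['label'],axis=1)

    data_mat = df.values
    train_data = data_mat[:,0:3]
    label = data_mat[:,-1]
    # Scale every feature to the range [0, 1]
    min_max_scaler = preprocessing.MinMaxScaler()
    feature = min_max_scaler.fit_transform(train_data)
    return feature,label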
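
One detail worth keeping in mind: pre_data scales the training features to [0, 1], so a query point has to be scaled with the same per-column minimum and range before distances are computed, otherwise the column with the largest raw values dominates. Below is a minimal, self-contained sketch of that idea using only NumPy. The helper names (min_max_fit, min_max_apply, knn_predict) and the toy data are hypothetical and only for illustration; they are not part of the code above.

```python
import numpy as np

def min_max_fit(train):
    """Return the per-column minimum and range of the training features."""
    train = np.asarray(train, dtype=float)
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid dividing by zero for constant columns
    return col_min, col_range

def min_max_apply(x, col_min, col_range):
    """Scale x with statistics computed on the training set."""
    return (np.asarray(x, dtype=float) - col_min) / col_range

def knn_predict(features, labels, x, k):
    """Majority vote among the k nearest neighbours (squared Euclidean distance)."""
    dist = ((features - x) ** 2).sum(axis=1)  # broadcasting replaces np.tile
    nearest = dist.argsort()[:k]
    votes = {}
    for idx in nearest:
        votes[labels[idx]] = votes.get(labels[idx], 0) + 1
    return max(votes, key=votes.get)

# Toy data (made up): two features on very different scales, two classes
raw_train = np.array([[1000.0, 1.0], [1200.0, 1.2], [9000.0, 9.0], [9500.0, 8.5]])
labels = np.array([0, 0, 1, 1])

col_min, col_range = min_max_fit(raw_train)
features = min_max_apply(raw_train, col_min, col_range)

# The query is given in raw units and scaled with the training statistics
query = min_max_apply([8800.0, 8.0], col_min, col_range)
prediction = knn_predict(features, labels, query, k=3)
```

With this toy data the three nearest neighbours of the query are two samples of class 1 and one of class 0, so the vote goes to class 1.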

Test program:

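from knn import knn  # the class above is assumed to be saved as knn.py
model = knn('datingTestSet')
model.train()
# x is compared against min-max scaled features, so it should already be in [0, 1]
x=[1,1,1];k=3
res=model.test(x,k)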

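The dataset is the one from the book Machine Learning in Action (《机器学习实战》). For an introduction to the principles of the algorithm, see: http://blog.csdn.net/messiran10/article/details/49333641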
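
Since part of the goal here is to get familiar with scikit-learn, one way to sanity-check a hand-written KNN is to compare its prediction with scikit-learn's own KNeighborsClassifier. The sketch below does this on a small made-up dataset rather than the dating data; the variable names are illustrative only. It assumes the default settings of KNeighborsClassifier (Euclidean distance, uniform votes), which rank neighbours the same way as the squared distance used above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Toy data (made up): two features on very different scales, two classes
raw_train = np.array([[1000.0, 1.0], [1200.0, 1.2], [9000.0, 9.0], [9500.0, 8.5]])
labels = np.array([0, 0, 1, 1])

# Fit the scaler on the training features only
scaler = MinMaxScaler()
features = scaler.fit_transform(raw_train)

# k = 3, default Euclidean distance and uniform voting
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(features, labels)

# Reuse the fitted scaler for the query instead of fitting a new one
query = scaler.transform([[8800.0, 8.0]])
sk_prediction = clf.predict(query)[0]
```

If the hand-written classifier and scikit-learn disagree on the same scaled inputs, the usual suspects are the label lookup and the scaling of the query.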
