Andrew Ng Machine Learning Exercise ex6: Support Vector Machines (Python Implementation)

Support Vector Machines

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat
data1=loadmat('./data/ex6data1.mat')
data=pd.DataFrame(data1['X'],columns=['x1','x2'])
data['y']=data1['y']

positive=data[data['y']==1]
negative=data[data['y']==0]
fig,ax=plt.subplots(figsize=(12,8))
ax.scatter(positive['x1'],positive['x2'],c='r',marker='o',label='positive')
ax.scatter(negative['x1'],negative['x2'],c='y',marker='x',label='negative')
ax.legend()
ax.set_xlabel('x1')
ax.set_ylabel('x2')
plt.show()


from sklearn.svm import SVC
svc1=SVC(C=1,kernel='linear')
svc1.fit(data1['X'],data1['y'].ravel()) # fit the model
# accuracy of the current classifier on the training data
svc1.score(data1['X'],data1['y'].ravel())

The accuracy is 0.9803921568627451

def plot_decisionboundary(model):
  x=np.linspace(-0.5,4.5,500)
  y=np.linspace(1.3,5,500)
  xx,yy=np.meshgrid(x,y)
  # ravel() flattens each grid into a 1-D array; np.c_[] then places xx.ravel() and yy.ravel() side by side as two columns,
  # so each row of the result is one grid point, e.g. [[x1,y1],[x2,y2],...]
  z=model.predict(np.c_[xx.ravel(),yy.ravel()])
  z=z.reshape(xx.shape)
  plt.contour(xx,yy,z) # draw the boundary as a contour of the predicted labels
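
The grid-flattening step inside plot_decisionboundary is easier to follow on a tiny grid. The sketch below is only an illustration: the 3 x 2 grid and the variable name `points` are made up here, and no model is involved.

```python
import numpy as np

# A tiny grid instead of the 500 x 500 grid used for plotting
x = np.linspace(0.0, 2.0, 3)    # [0., 1., 2.]
y = np.linspace(10.0, 11.0, 2)  # [10., 11.]
xx, yy = np.meshgrid(x, y)      # both arrays have shape (2, 3)

# Flatten each grid and pair the values column-wise: one (x, y) point per row
points = np.c_[xx.ravel(), yy.ravel()]
print(points.shape)  # (6, 2)
print(points)

# Predictions made on `points` can be reshaped back to the grid shape,
# which is what z.reshape(xx.shape) does before calling plt.contour
restored = points[:, 0].reshape(xx.shape)
```

Each row of `points` is one grid location, so a classifier's predict method receives the input in the usual (n_samples, 2) layout.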

positive=data[data['y']==1]
negative=data[data['y']==0]
fig,ax=plt.subplots(figsize=(12,8))
ax.scatter(positive['x1'],positive['x2'],c='r',marker='o',label='positive')
ax.scatter(negative['x1'],negative['x2'],c='y',marker='x',label='negative')
ax.legend()
ax.set_xlabel('x1')
ax.set_ylabel('x2')
plot_decisionboundary(svc1)
plt.show()


Spam Classification

with open('data/emailSample1.txt','r') as f:
  email=f.read()
  print(email)

Preprocessing the email

• Lower-casing: The entire email is converted to lower case, so that capitalization is ignored (e.g., IndIcaTE is treated the same as Indicate).
• Stripping HTML: All HTML tags are removed from the email. Many emails come with HTML formatting; removing the tags leaves only the content.
• Normalizing URLs: All URLs are replaced with the text "httpaddr".
• Normalizing Email Addresses: All email addresses are replaced with the text "emailaddr".
• Normalizing Numbers: All numbers are replaced with the text "number".
• Normalizing Dollars: All dollar signs ($) are replaced with the text "dollar".
• Word Stemming: Words are reduced to their stemmed form. For example, "discount", "discounts", "discounted" and "discounting" are all replaced with "discount". Sometimes the stemmer strips additional characters from the end, so "include", "includes", "included", and "including" are all replaced with "includ".
• Removal of non-words: Non-words and punctuation are removed. All white space (tabs, newlines, spaces) is trimmed to a single space character.
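
As a quick check of the regex-based rules in this list, here is a minimal sketch that uses only the standard `re` module. The helper name `normalize_email` and the sample string are invented for illustration, and stemming is left out because it needs a separate stemmer.

```python
import re


def normalize_email(text):
    """Apply the regex-based normalization steps to a raw email string.

    Hypothetical helper for illustration; word stemming is intentionally omitted.
    """
    text = text.lower()                                         # lower-casing
    text = re.sub(r'<[^<>]+>', ' ', text)                       # strip HTML tags
    text = re.sub(r'(http|https)://[^\s]*', 'httpaddr', text)   # normalize URLs
    text = re.sub(r'[^\s]+@[^\s]+', 'emailaddr', text)          # normalize email addresses
    text = re.sub(r'[0-9]+', 'number', text)                    # normalize numbers
    text = re.sub(r'[$]+', 'dollar', text)                      # normalize dollar signs
    text = re.sub(r'\s+', ' ', text).strip()                    # collapse white space
    return text


sample = "<b>Buy NOW</b> for $100 at http://example.com or mail sales@example.com"
print(normalize_email(sample))
# buy now for dollarnumber at httpaddr or mail emailaddr
```

URLs are replaced before numbers so that digits inside a URL are not rewritten first.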

import re
import nltk.stem.porter

def prepocess(email):  # email is a string
  email=email.lower() # convert to lower case
  email=re.sub(r'<[^<>]+>',' ',email) # strip HTML tags
  email=re.sub(r'(http|https)://[^\s]*','httpaddr',email) # normalize URLs
  email=re.sub(r'[^\s]+@[^\s]+','emailaddr',email) # normalize email addresses
  email=re.sub(r'[\d]+','number',email) # normalize numbers
  email=re.sub(r'[\$]+','dollar',email) # normalize dollar signs
  return email
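# Complete the last two preprocessing steps (stemming and removal of non-words)
def email2list(email):
    stemmer = nltk.stem.porter.PorterStemmer()
    # apply the earlier preprocessing first
    email=prepocess(email)
    # split on white space and punctuation
    tokens = re.split(r'[\s\@\$\/\#\.\-\:\&\*\+\=\[\]\?\!\(\)\{\}\,\'\"\>\_\<\;\%]', email)
    tokenlist=[]
    for token in tokens:
        token=re.sub(r'[^a-zA-Z0-9]','',token) # drop remaining non-alphanumeric characters
        if not len(token): continue # skip empty tokens
        stemmed=stemmer.stem(token) # reduce the word to its stem
        tokenlist.append(stemmed)
    return tokenlist # list of strings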
# Vocabulary list: map the email's tokens to their indices in the vocabulary
def list2indics(email,vocab):
  tokenlist=email2list(email)
  index=[i for i in range(len(vocab)) if vocab[i] in tokenlist]
  return index
# Extract features: convert the email into a 0/1 vector
def extractfeature(email):
  df=pd.read_table('data/vocab.txt',names=['words'])
  vocab=df['words'].values # 1-D array of vocabulary words
  vector=np.zeros(len(vocab))
  index=list2indics(email,vocab)
  for i in index:
    vector[i]=1
  return vector
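
To make the mapping from tokens to a 0/1 feature vector concrete, here is a small self-contained sketch. The five-word vocabulary, the token list, and the helper name `tokens_to_vector` are all invented for illustration; the real vocabulary comes from data/vocab.txt.

```python
def tokens_to_vector(tokens, vocab):
    """Return a 0/1 list: entry i is 1 when vocab[i] occurs in tokens."""
    present = set(tokens)  # set lookup avoids rescanning the token list
    return [1 if word in present else 0 for word in vocab]


# Toy vocabulary (hypothetical; the exercise uses the words in data/vocab.txt)
toy_vocab = ['anyon', 'buy', 'click', 'dollar', 'know']
toy_tokens = ['anyon', 'know', 'how', 'much', 'dollar', 'dollar']

vector = tokens_to_vector(toy_tokens, toy_vocab)
print(vector)  # [1, 0, 0, 1, 1]
```

Tokens that are not in the vocabulary ('how', 'much') are ignored, and a repeated token ('dollar') still sets its entry to 1 only once, since the vector records presence rather than counts.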

The provided spamTrain.mat and spamTest.mat datasets have already been preprocessed into 0/1 feature vectors, so they can be passed directly to SVC.

train_data=loadmat('data/spamTrain.mat')
train_x=train_data['X']
train_y=train_data['y']
test_data=loadmat('data/spamTest.mat')
test_x=test_data['Xtest']
test_y=test_data['ytest']
model=SVC(kernel='linear')
model.fit(train_x,train_y.ravel())
predtrain=model.score(train_x,train_y.ravel()) # accuracy on the training set
predtest=model.score(test_x,test_y.ravel()) # accuracy on the test set
predtrain,predtest

(0.99975, 0.978)
