(2) SVM Project: Customer Purchase Prediction

高级生信攻橙诗~

Column: 机器学习项目 (Machine Learning Projects) | Tags: machine learning, python, svm

Article link: https://blog.csdn.net/ccc_bioinfo/article/details/108979680

【Problem Description】

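Given a customer's gender, age, and salary, we want to predict whether that customer will buy our product.

The prediction lets us handle likely buyers and unlikely buyers differently, which reduces the company's marketing costs. (The model below actually uses only age and estimated salary as features.)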

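Data description
The dataset is a CSV file (Social_Network_Ads.csv) with the following columns:
User ID -- user ID
Gender -- user's gender
Age -- user's age
EstimatedSalary -- estimated salary
Purchased -- whether the user made a purchase (binary, 1/0)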

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

path='D:/data_analysis/jupyter_notebook/machine_learning/dataset/Social_Network_Ads.csv'  # local path; point this at your own copy of the CSV
dataset=pd.read_csv(path)
dataset

400 rows × 5 columns

x=dataset.iloc[:,[2,3]].values  # features: Age, EstimatedSalary
y=dataset.iloc[:,4].values      # label: Purchased (1/0)

# Split into training and test sets (75% / 25%)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25,random_state=0)
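A minimal sketch of the same split, assuming a small synthetic DataFrame that only mimics the column layout of Social_Network_Ads.csv (the values are randomly generated, not the real data). Selecting features by column name instead of position is less fragile if the column order ever changes; `stratify=y` is an optional extra, not used in the code above, that keeps the purchase rate similar in both splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same columns as the real CSV (random values, illustration only)
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "User ID": np.arange(n),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 61, size=n),
    "EstimatedSalary": rng.integers(15_000, 150_001, size=n),
    "Purchased": rng.integers(0, 2, size=n),
})

# Select by name: same result as iloc[:, [2, 3]] and iloc[:, 4] for this column order
x = df[["Age", "EstimatedSalary"]].values
y = df["Purchased"].values

# test_size=0.25 on 400 rows -> 100 test rows and 300 training rows
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)
print(x_train.shape, x_test.shape)  # (300, 2) (100, 2)
```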

Data preprocessing
Feature scaling for SVM
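For reference, StandardScaler applies a per-feature z-score standardization. Written out, with $n$ training samples, $x_{ij}$ the value of feature $j$ for sample $i$, and $\mu_j$, $\sigma_j$ the mean and population standard deviation of feature $j$ estimated on the training set only (the test set is then transformed with these same $\mu_j$, $\sigma_j$):

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
```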

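# Preprocessing - standardization (z-score): each feature is rescaled to mean 0 and variance 1
# SVMs are sensitive to feature scale: scaling helps the solver converge, and because the default
# RBF kernel is distance-based, it keeps EstimatedSalary (tens of thousands) from dominating Age (tens)
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
x_train=sc.fit_transform(x_train)  # learn mean/std from the training set only
x_test=sc.transform(x_test)        # reuse the training statistics; do not refit on the test set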
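Why the test set should go through `transform` rather than `fit_transform`: a small sketch on hypothetical toy numbers (not from the dataset) showing that `transform` applies the training mean and standard deviation, matching the z-score formula computed by hand, while refitting on the test set silently uses different statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: columns are age and salary, on very different scales
x_train = np.array([[25, 30000], [35, 60000], [45, 90000], [55, 120000]], dtype=float)
x_test = np.array([[30, 45000], [50, 150000]], dtype=float)

sc = StandardScaler()
x_train_s = sc.fit_transform(x_train)  # learns mean_ and scale_ from the training set
x_test_s = sc.transform(x_test)        # reuses those training statistics

# Same result by hand: StandardScaler uses the population std (ddof=0)
mu = x_train.mean(axis=0)
sigma = x_train.std(axis=0)
manual = (x_test - mu) / sigma
print(np.allclose(x_test_s, manual))  # True

# Calling fit_transform on the test set refits the scaler with the test set's own statistics
refit = StandardScaler().fit_transform(x_test)
print(np.allclose(x_test_s, refit))   # False
```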

Trying different SVM kernels for prediction

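# Train the model
from sklearn.svm import SVC
classifier=SVC(random_state=0)  # default kernel is 'rbf'
# Alternatives: uncomment one of these lines to try a different kernel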
#classifier = SVC(random_state = 0, kernel="linear")
#classifier = SVC(random_state = 0, kernel="sigmoid")
#classifier = SVC(random_state = 0, kernel="poly", degree=6)
classifier.fit(x_train,y_train)
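Rather than swapping kernels by hand and checking a single test split, the kernels listed above can be compared with cross-validation. This is a sketch under stated assumptions: it uses synthetic two-feature data from `make_classification` as a stand-in for (Age, EstimatedSalary), so its scores say nothing about the real dataset, and it puts the scaler inside a pipeline so each fold is scaled using only its own training part.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 2-feature data standing in for (Age, EstimatedSalary); not the real dataset
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# The same kernel settings as the commented-out alternatives above
candidates = {
    "rbf": SVC(random_state=0),
    "linear": SVC(random_state=0, kernel="linear"),
    "sigmoid": SVC(random_state=0, kernel="sigmoid"),
    "poly (degree=6)": SVC(random_state=0, kernel="poly", degree=6),
}

scores = {}
for name, svc in candidates.items():
    # Scaler inside the pipeline: each CV fold fits the scaler on its own training part
    pipe = make_pipeline(StandardScaler(), svc)
    scores[name] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name:>16}: mean 5-fold accuracy = {scores[name]:.3f}")
```

The kernel with the highest mean accuracy is a reasonable default choice, though on a dataset this small the differences between folds can be large.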

# Predict on the test set
y_pred=classifier.predict(x_test)
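To act on the prediction for a single new customer, their raw age and salary must pass through the same fitted scaler before `predict`. A minimal sketch, assuming a handful of hypothetical training rows (invented for illustration, not taken from the dataset) and a hypothetical new customer aged 40 earning 87000:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical training rows: [age, estimated salary] -> purchased (1) or not (0)
x_train = np.array([[22, 20000], [25, 35000], [30, 40000], [35, 50000],
                    [45, 90000], [50, 120000], [55, 130000], [60, 140000]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

sc = StandardScaler()
classifier = SVC(random_state=0).fit(sc.fit_transform(x_train), y_train)

# The new customer is scaled with the training scaler (transform, not fit_transform)
new_customer = np.array([[40, 87000]], dtype=float)
pred = classifier.predict(sc.transform(new_customer))
print(pred)  # a 1-element array holding 0 or 1
```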

Evaluating the results

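# Build the confusion matrix to evaluate the classification results
from sklearn.metrics import confusion_matrix
cm=confusion_matrix(y_test,y_pred)
cm
# Rows = actual class (0, 1), columns = predicted class (0, 1); with Purchased=1 as positive:
# [[TN, FP],
#  [FN, TP]]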

array([[64, 4], [ 3, 29]], dtype=int64)
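The matrix above can be turned into the usual summary metrics. A short sketch that hard-codes the reported counts; under scikit-learn's row/column convention, with Purchased=1 as the positive class, they read TN=64, FP=4, FN=3, TP=29 on the 100 test samples:

```python
import numpy as np

# Confusion matrix reported above: rows = actual (0, 1), columns = predicted (0, 1)
cm = np.array([[64, 4],
               [3, 29]])
tn, fp, fn, tp = cm.ravel()  # unpacking order for a binary 2x2 matrix

accuracy = (tp + tn) / cm.sum()  # (29 + 64) / 100
precision = tp / (tp + fp)       # 29 / 33
recall = tp / (tp + fn)          # 29 / 32
print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.93 precision=0.879 recall=0.906
```

In business terms, recall is the share of actual buyers the model finds, and precision is the share of customers flagged as buyers who really buy.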

Visualizing the results

# Visualize the classification results on the training set
from matplotlib.colors import ListedColormap
x_set,y_set=x_train,y_train
x1,x2=np.meshgrid(np.arange(start=x_set[:,0].min()-1,stop=x_set[:,0].max()+1,step=0.01),
                 np.arange(start=x_set[:,1].min()-1,stop=x_set[:,1].max()+1, step=0.01))
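# contourf: draw filled contours (here, the predicted decision regions)
# ravel() flattens: after meshgrid, x1 and x2 are 2-D grids, so they must be flattened to 1-D
# before classifier.predict, and the predictions reshaped back to the grid shape for plotting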
plt.contourf(x1,x2,classifier.predict(np.array([x1.ravel(),x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75,cmap=ListedColormap(('red','green')))
# The code above paints the red and green decision regions; the code below plots the data points
plt.xlim(x1.min(),x1.max())
plt.ylim(x2.min(),x2.max())
# unique: deduplicate and sort the labels
# Plot class 0 (red) first, then class 1 (green)
for i,j in enumerate(np.unique(y_set)):
    plt.scatter(x_set[y_set==j,0],x_set[y_set==j,1],color=ListedColormap(('darkred','darkgreen'))(i),label=j)
plt.title('SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
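The meshgrid / ravel / reshape round trip used for the decision-region plot can be seen on a tiny grid. In this sketch a hypothetical rule (label 1 when x1 + x2 >= 2) stands in for `classifier.predict`, so only the array shapes and layout matter:

```python
import numpy as np

# Tiny grid instead of the 0.01-step grid used above
x1, x2 = np.meshgrid(np.arange(0, 3), np.arange(0, 2))  # both have shape (2, 3)

# Flatten to one (x1, x2) point per row: shape (6, 2), the layout predict() expects
points = np.array([x1.ravel(), x2.ravel()]).T

# Hypothetical stand-in for classifier.predict: label 1 when x1 + x2 >= 2
labels = (points[:, 0] + points[:, 1] >= 2).astype(int)

# Reshape back to the grid so contourf gets one label per grid point
grid_labels = labels.reshape(x1.shape)
print(points.shape, grid_labels.shape)  # (6, 2) (2, 3)
print(grid_labels)
# [[0 0 1]
#  [0 1 1]]
```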