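Task 3: Credit prediction classification with an SVM model

Background

Tasks 1 and 2 covered the steps that come before model building: data cleaning, normalization, and feature selection. This post builds on those two posts and uses the SVM algorithm to build a classification model for the credit data.

Code implementation

Here is the full code: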

import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, r2_score
from sklearn.svm import SVC

data = pd.read_csv("D://project//金融数据分析//data.csv", encoding='gbk')
# Separate the target label from the features
y = data['status']
x = data.drop('status', axis=1)
# Drop identifier columns (the result must be assigned back, otherwise nothing is dropped)
x = x.drop(['custid', 'trade_no', 'bank_card_no', 'id_name'], axis=1)
# Split each timestamp column into year / month / day features
time_cols = ['first_transaction_time', 'latest_query_time', 'loans_latest_time']
for col in time_cols:
    parsed = pd.to_datetime(data[col])
    x[col + '_year'] = parsed.dt.year
    x[col + '_month'] = parsed.dt.month
    x[col + '_day'] = parsed.dt.day
# Fill missing values in numeric columns with the column median
x.fillna(x.median(numeric_only=True), inplace=True)
# Drop the original timestamp columns
x.drop(time_cols, axis=1, inplace=True)

# Drop columns that are constant or (almost) unique per row, such as IDs
for cl in x.columns:
    count = x[cl].count()
    if x[cl].nunique(dropna=False) in [1, count, count - 1]:
        x.drop(cl, axis=1, inplace=True)

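# Encode the city column as integer codes
n = set(x['reg_preference_for_trad'])
dic = {city: code for code, city in enumerate(n)}
x['reg_preference_for_trad'] = x['reg_preference_for_trad'].map(dic)
# Fill missing student_feature with 0 (if the column is numeric, the earlier
# median fill has already replaced its NaNs, so this line changes nothing)
x['student_feature'] = x['student_feature'].fillna(0)

x.fillna(x.median(), inplace=True)  # Fill any remaining gaps with the column median
# Standardize (z-score). Note: data_zs is only inspected here; the model
# below is still trained on the unscaled x
data_zs = 1.0 * (x - x.mean()) / x.std()
data_zs.info()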

# Random forest: rank features by importance
feat_labels = x.columns
forest = RandomForestClassifier(n_estimators=10000, random_state=0, n_jobs=1)
forest.fit(x, y)
importance = forest.feature_importances_
imp_result = np.argsort(importance)[::-1]

# Print features from most to least important (name and score use the same sorted index)
for i in range(x.shape[1]):
    print("%2d. %-*s %f" % (i + 1, 30, feat_labels[imp_result[i]], importance[imp_result[i]]))

# Importance threshold
threshold = 0.01
# Drop columns whose importance is below the threshold
drop_data_index = list(x.columns[importance < threshold])
x.drop(drop_data_index, axis=1, inplace=True)
print('Columns remaining after cleaning:')
x.info()

# Train the SVM on the first 3000 rows
fit_data = x[0:3000]
fit_tag = y[0:3000]
clf = SVC()
model = clf.fit(fit_data, fit_tag)

# Evaluate on the remaining rows
test_data = x[3000:4755]
test_tag = y[3000:4755]

predict_rst = model.predict(test_data)

print('SVM classifier accuracy:', model.score(test_data, test_tag))
print('f1_score:', f1_score(test_tag, predict_rst))
# Note: r2_score is a regression metric, so it says little about a classifier
print('r2_score:', r2_score(test_tag, predict_rst))

Output:
SVM classifier accuracy: 0.7354618015963512
f1_score: 0.0
r2_score: -0.3596899224806205
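
These three numbers fit one simple pattern: a classifier that predicts class 0 for every test row. The sketch below checks that arithmetic. The class counts (1290 zeros and 464 ones, 1754 rows in total) are an assumption inferred from the reported accuracy, not something stated in this post, and `zero_division=0` is only there to silence the warning scikit-learn raises when no positive is predicted.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, r2_score

# Assumed test-set composition, inferred from the reported accuracy:
# 1290 rows of class 0 and 464 rows of class 1 (1754 rows in total).
y_true = np.array([0] * 1290 + [1] * 464)

# A "classifier" that always predicts the majority class (0).
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)             # share of class 0 in the test set
f1 = f1_score(y_true, y_pred, zero_division=0)   # no true positives, so F1 is 0
r2 = r2_score(y_true, y_pred)                    # worse than predicting the mean, so negative

print('accuracy:', acc)
print('f1_score:', f1)
print('r2_score:', r2)
```

Under those assumed counts the printed values agree with the three reported numbers up to floating-point rounding, which suggests (but does not prove) that the SVM never predicted class 1 on the test rows. Printing `predict_rst.sum()` or a confusion matrix in the original script would confirm it.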

Conclusion

The prediction accuracy is 0.735…
The f1_score is 0 and the r2_score is negative. What is going on here? I'll dig into it next.

To be continued…
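
One place to start: the script computes a standardized copy (data_zs) but fits SVC on the unscaled x, and SVC with its default RBF kernel is sensitive to feature scale. Here is a minimal sketch of keeping the scaling and the classifier together in one pipeline. It runs on synthetic data from make_classification rather than the credit data, and the class ratio, the inflated feature scales, the stratified split, and `class_weight='balanced'` are all assumptions made for illustration, not settings taken from this post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the credit data (assumption: roughly 26% positives).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.74, 0.26], random_state=0)
# Put the columns on very different scales, like raw amounts next to counts.
X = X * np.array([10.0 ** (i % 5) for i in range(X.shape[1])])

# Stratify so train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# The scaler is fitted on the training rows only and reused on the test rows.
model = make_pipeline(StandardScaler(), SVC(class_weight='balanced'))
model.fit(X_train, y_train)
pred = model.predict(X_test)

print('accuracy:', accuracy_score(y_test, pred))
print('f1_score:', f1_score(y_test, pred))
print('predicted positives:', int(pred.sum()))
```

Whether this helps on the real data is something I still need to test. Checking the number of predicted positives next to the scores makes it obvious when the model has collapsed to a single class.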
