Calling datashap from PHP: applying interpretable machine learning with SHAP values (shap value usage)

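This article describes how to call the Datashap library from PHP for interpretable machine learning. It preprocesses the data and builds a model with a LightGBM classifier. During training, a JSON loading error is resolved by handling special characters. The model is then explained with SHAP values, which show feature importance, and several plot types (bar plots, scatter plots, and dependence plots) are used to examine which factors drive the model's predictions.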
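The special-character fix mentioned in the summary is not shown in this excerpt, so the following is only a minimal sketch of one common approach, not necessarily the author's. LightGBM can raise "Do not support special JSON characters in feature name" when column names contain JSON-significant characters, and one-hot encoding category values (for example with `pd.get_dummies`) often produces such names. The helper name `sanitize_feature_names` and the whitelist `[A-Za-z0-9_]` are assumptions chosen for illustration.

```python
import re


def sanitize_feature_names(names):
    """Return names restricted to [A-Za-z0-9_], kept unique by numeric suffixes.

    Hypothetical helper for illustration; the whitelist is an assumption.
    """
    used = set()
    cleaned = []
    for name in names:
        # Replace every character outside the whitelist with an underscore.
        safe = re.sub(r'[^A-Za-z0-9_]', '_', str(name))
        # Two different raw names can collapse to the same safe name,
        # so append _1, _2, ... until the candidate is unused.
        candidate, k = safe, 1
        while candidate in used:
            candidate = f'{safe}_{k}'
            k += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


raw = ['ty__Loan for business development', 'NAME_TYPE_SUITE_Spouse, partner', 'a:b', 'a b']
print(sanitize_feature_names(raw))
```

With a DataFrame `df` of model inputs, this could be applied as `df.columns = sanitize_feature_names(df.columns)` before fitting the classifier.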

1 Data preprocessing and modeling

1.1 Loading libraries and preprocessing the data

import gc  # gc.collect() forces a garbage-collection pass; calling it after del on large DataFrames helps release memory promptly so the script does not run out of memory midway

import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
import shap
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score, precision_recall_curve, roc_curve, average_precision_score
from sklearn.model_selection import KFold, train_test_split

file_path = 'D:\\jupyter files\\shap_value_practice_data\\home-credit-default-risk\\'

def build_model_input():
    # Monthly bureau balances: one-hot encode STATUS, then average per bureau loan
    buro_bal = pd.read_csv(file_path + 'bureau_balance.csv')
    print('Buro bal shape : ', buro_bal.shape)

    print('transform to dummies')
    buro_bal = pd.concat([buro_bal, pd.get_dummies(buro_bal.STATUS, prefix='buro_bal_status')], axis=1).drop('STATUS', axis=1)

    print('Counting buros')
    # Number of monthly records per bureau loan
    buro_counts = buro_bal[['SK_ID_BUREAU', 'MONTHS_BALANCE']].groupby('SK_ID_BUREAU').count()
    buro_bal['buro_count'] = buro_bal['SK_ID_BUREAU'].map(buro_counts['MONTHS_BALANCE'])

    print('averaging buro bal')
    avg_buro_bal = buro_bal.groupby('SK_ID_BUREAU').mean()
    avg_buro_bal.columns = ['avg_buro_' + f_ for f_ in avg_buro_bal.columns]

    del buro_bal
    gc.collect()

    print('Read Bureau')
    buro = pd.read_csv(file_path + 'bureau.csv')

    print('Go to dummies')
    buro_credit_active_dum = pd.get_dummies(buro.CREDIT_ACTIVE, prefix='ca_')
    buro_credit_currency_dum = pd.get_dummies(buro.CREDIT_CURRENCY, prefix='cu_')
    buro_credit_type_dum = pd.get_dummies(buro.CREDIT_TYPE, prefix='ty_')

    buro_full = pd.concat([buro, buro_credit_active_dum, buro_credit_currency_dum, buro_credit_type_dum], axis=1)
    # buro_full.columns = ['buro_' + f_ for f_ in buro_full.columns]

    del buro_credit_active_dum, buro_credit_currency_dum, buro_credit_type_dum
    gc.collect()

    print('Merge with buro avg')
    buro_full = buro_full.merge(right=avg_buro_bal.reset_index(), how='left', on='SK_ID_BUREAU', suffixes=('', '_bur_bal'))

    print('Counting buro per SK_ID_CURR')
    # Overwrite SK_ID_BUREAU with the number of bureau loans held by each applicant
    nb_bureau_per_curr = buro_full[['SK_ID_CURR', 'SK_ID_BUREAU']].groupby('SK_ID_CURR').count()
    buro_full['SK_ID_BUREAU'] = buro_full['SK_ID_CURR'].map(nb_bureau_per_curr['SK_ID_BUREAU'])

    print('Averaging bureau')
    # numeric_only=True skips the remaining string columns, which raise a TypeError in pandas >= 2.0
    avg_buro = buro_full.groupby('SK_ID_CURR').mean(numeric_only=True)
    print(avg_buro.head())

    del buro, buro_full
    gc.collect()

    print('Read prev')
    prev = pd.read_csv(file_path + 'previous_application.csv')

    # Categorical (string-typed) columns of previous_application.csv
    prev_cat_features = [
        f_ for f_ in prev.columns if prev[f_].dtype == 'object'
    ]

    print('Go to dummies')
    prev_dum = pd.DataFrame()
    for f_ in prev
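The recurring pattern in `build_model_input` (one-hot encode a categorical column, then average per ID) can be sketched on a toy table. The data below is invented purely for illustration and only mimics the column names of `bureau_balance.csv`; `dtype=float` is an added assumption so that the dummy columns are plainly numeric. Because each dummy column is a 0/1 indicator, its per-group mean is the fraction of rows in that group with the given status.

```python
import pandas as pd

# Toy stand-in for bureau_balance.csv: one row per (loan, month) with a status code.
toy = pd.DataFrame({
    'SK_ID_BUREAU': [1, 1, 1, 2],
    'MONTHS_BALANCE': [0, -1, -2, 0],
    'STATUS': ['C', 'C', '0', 'X'],
})

# One indicator column per status value, named buro_bal_status_<value>.
dummies = pd.get_dummies(toy['STATUS'], prefix='buro_bal_status', dtype=float)
wide = pd.concat([toy.drop(columns='STATUS'), dummies], axis=1)

# Averaging per loan turns each indicator into a share of months.
avg = wide.groupby('SK_ID_BUREAU').mean()
avg.columns = ['avg_buro_' + c for c in avg.columns]
print(avg)
```

Here loan 1 spends two of its three months in status `C`, so `avg_buro_buro_bal_status_C` is 2/3 for that loan. The resulting frame has one row per `SK_ID_BUREAU`, which is what makes the later left merge onto the bureau table possible.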
