(Paper notes) Interpretability of machine learning models in crash frequency modeling

Title: Xiao Wen, Yuanchang Xie, Liming Jiang, et al. On the interpretability of machine learning methods in crash frequency modeling and crash modification factor development [J]. Accident Analysis and Prevention, 2022, 168: 106617.

Abstract: Machine learning (ML) model interpretability has attracted much attention recently given the promising performance of ML methods in crash frequency studies. Extracting accurate relationship between risk factors and crash frequency is important for understanding the causal effects of risk factors and developing safety countermeasures. However, there is no study that comprehensively summarizes ML model interpretation methods and provides guidance for safety researchers and practitioners. This research aims to fill this gap. Model-based and post-hoc ML interpretation methods are critically evaluated and compared to study their suitability in crash frequency modeling. These methods include classification and regression tree (CART), multivariate adaptive regression splines (MARS), Local Interpretable Model-agnostic Explanations (LIME), Local Sensitivity Analysis (LSA), Partial Dependence Plots (PDP), Global Sensitivity Analysis (GSA), and SHapley Additive exPlanations (SHAP). Model-based interpretation methods cannot reveal the detailed interaction relationships among risk factors. LIME can only be used to analyze the effects of a risk factor at the prediction level. LSA and PDP assume that different risk factors are independently distributed. Both GSA and SHAP can account for the potential correlation among risk factors. However, only SHAP can visualize the detailed relationships between crash outcomes and risk factors. This study also demonstrates the potential and benefits of using ML and SHAP to derive Crash Modification Factors (CMF). Finally, it is emphasized that statistical and ML models may not directly differentiate causation from correlation. Understanding the differences between them is critical for developing reliable safety countermeasures.

Main contributions

  1. Model-based and post-hoc interpretation methods are critically evaluated and compared to assess their suitability for crash frequency modeling.
  2. A review of ML research in traffic safety: existing studies mostly compare different ML models for predicting crash frequency and severity, and identify the most accurate model and the important risk factors.

  3. A cross-validation approach for mitigating overfitting in LightGBM.

  4. Post-hoc interpretation methods fall into two groups: the prediction level and the dataset level. Prediction-level methods focus on the effect of risk factors on an individual prediction, while dataset-level methods act on all predictions (i.e., the whole dataset). LIME is a prediction-level method; LSA, PDP, GSA, and SHAP are dataset-level methods.
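The independence assumption behind LSA and PDP (noted in the abstract) can be made concrete with a minimal sketch of how a partial dependence curve is computed: the target feature is fixed at each grid value and predictions are averaged over the observed values of the remaining features, even at feature combinations that never co-occur in the data. The two-factor model and the tiny dataset below are invented for illustration and are not from the paper.

```python
# Minimal partial-dependence sketch for a hypothetical two-factor crash
# model. The model, coefficients, and data are purely illustrative.
def model(aadt, speed):
    # Toy "expected crash frequency" as a function of traffic volume
    # (AADT) and speed limit.
    return 0.001 * aadt + 0.04 * speed

# Tiny synthetic dataset of observed (AADT, speed limit) pairs.
data = [(2000, 30), (5000, 40), (9000, 60), (12000, 70)]

def partial_dependence(f, feature_index, grid, data):
    """PDP: for each grid value, fix the target feature and average
    predictions over the observed values of the other features.
    Averaging over the marginal distribution like this implicitly
    assumes the features are independent."""
    pdp = []
    for g in grid:
        preds = []
        for row in data:
            row = list(row)
            row[feature_index] = g   # override the target feature
            preds.append(f(*row))
        pdp.append(sum(preds) / len(preds))
    return pdp

grid = [1000, 5000, 10000]
print(partial_dependence(model, 0, grid, data))
```

Note that the grid point AADT = 1000 is paired with every observed speed limit, including combinations (e.g., very low volume with a 70 mph limit) that may be unrealistic when the two factors are correlated; this is exactly the limitation the paper attributes to LSA and PDP.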

Findings: Methods with simple structures such as CART and MARS can clearly reveal how each factor affects crash frequency, but their predictive accuracy is limited. LightGBM fits the data better than the other machine learning algorithms, but it requires post-hoc interpretation methods. SHAP can additionally provide detailed information about risk factors, such as importance ranking, total effects, main effects, and interaction effects.
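To make the attribution idea behind SHAP concrete, the sketch below computes exact Shapley values for a toy three-factor crash model by averaging each factor's marginal contribution over all orderings in which factors switch from a baseline value to the explained value. The model, the factor names, and the values are hypothetical, not from the paper; real SHAP implementations approximate this computation efficiently for tree ensembles such as LightGBM.

```python
from itertools import permutations

# Hypothetical toy "crash frequency model" of three risk factors
# (AADT, horizontal curve density, speed limit), including one
# interaction term. Purely illustrative.
def model(aadt, curves, speed):
    return 0.001 * aadt + 2.0 * curves + 0.05 * speed + 0.0005 * aadt * curves

baseline = (1000, 0.0, 30)   # reference ("average") road segment
x = (8000, 1.5, 50)          # segment whose prediction is explained

def shapley_values(f, x, baseline):
    """Exact Shapley values: for every ordering of the features,
    switch them one at a time from the baseline value to the
    explained value and record the change in the prediction; the
    Shapley value of a feature is its average marginal change."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = f(*current)
        for i in order:
            current[i] = x[i]
            new = f(*current)
            phi[i] += new - prev
            prev = new
    return [p / len(orders) for p in phi]

phi = shapley_values(model, x, baseline)
# Additivity (efficiency): the attributions sum exactly to
# f(x) - f(baseline), which is what makes SHAP decompositions
# of a prediction into per-factor effects possible.
print(phi, sum(phi), model(*x) - model(*baseline))
```

Because the interaction term involves only AADT and curve density, its contribution is shared between those two factors, while the speed-limit attribution stays purely additive; this is the mechanism behind the main-effect and interaction-effect breakdowns mentioned in the findings.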
