GARP Translation: Credit Risk Assessment: The Benefits of Text-Based Analysis

English original:

Credit Risk Assessment: The Benefits of Text-Based Analysis

Can credit risk models and loan appraisals be improved through text mining? For financial institutions that know which words, phrases and clauses to search for, the answer is a qualified yes. Judging from this article, would you give me a loan?


A few years ago, I was lucky to hire an excellent summer intern from a leading economics PhD program in Europe. At the time, Lending Club made their historical performance data public and they included in the file a brief written request (likely penned by the prospective borrower), urging investors to fund their loan. I asked my intern to explore whether a quantitative treatment of the text would be useful in assessing the subsequent credit risk of the observed consumers.

Cutting a long story short, it was.

We explored several different words that you would expect people to use when requesting loans through a social lending platform. We found that words, phrases and clauses suggesting human capital increasing activities (e.g., “my daughter is getting married”) tended to clearly reduce credit risk. The use of words indicating “desperation” or “panic,” meanwhile, increased observed default rates sharply, even after controlling for credit scores and other commonly reported borrower attributes.
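The article does not say how the words and phrases were turned into model inputs. One simple way to do it, sketched here in Python, is to flag whether a loan request contains any phrase from a themed list and to use those 0/1 flags as extra variables alongside the credit score. The theme names, the phrase lists and the function `theme_flags` are all hypothetical and chosen only for illustration; they are not the terms used in the study.

```python
import re

# Hypothetical phrase lists for illustration only; the article does not
# publish the terms that were actually tested.
THEMES = {
    "life_investment": ["getting married", "wedding", "tuition", "new job"],
    "desperation": ["desperate", "urgent", "please help", "last resort"],
    "health": ["medical", "hospital", "surgery"],
}


def theme_flags(text):
    """Return a 0/1 indicator per theme: 1 if any listed phrase occurs."""
    # Lower-case and collapse whitespace so multi-word phrases match reliably.
    normalized = re.sub(r"\s+", " ", text.lower())
    flags = {}
    for theme, phrases in THEMES.items():
        flags[theme] = int(any(
            re.search(r"\b" + re.escape(phrase) + r"\b", normalized)
            for phrase in phrases
        ))
    return flags


request = "My daughter is getting married in June and I need help with the wedding."
print(theme_flags(request))
# {'life_investment': 1, 'desperation': 0, 'health': 0}
```

Flags like these could then enter a default model next to the usual borrower attributes, which is one way to check whether the text adds information "after controlling for credit scores."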

Some results were rather sad – health problems increase the likelihood of default – and some were uplifting. We found, for example, that people who were newly unemployed, but borrowing to do something about it, were more likely to remain current on their obligations than those in the baseline group.

We also found that people who write at a higher level, as measured by the Flesch-Kincaid scale, defaulted at a lower rate than those with poorer writing skills.
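For readers unfamiliar with the measure, two related formulas usually go by this name: the Flesch Reading Ease score (higher means easier to read) and the Flesch-Kincaid Grade Level (higher means more advanced writing). Both depend only on average sentence length and average syllables per word. The Python sketch below uses the standard published coefficients, but the syllable counter is a rough heuristic of our own, so its scores will differ somewhat from those of dedicated tools. The function names are illustrative.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text):
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


simple = "I need a loan. I will pay it back."
formal = "Consolidating outstanding obligations substantially improves household liquidity."
print(readability(simple))
print(readability(formal))
```

On the Reading Ease scale, scores between roughly 30 and 50 are conventionally read as college-level text.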

Text-Mining Do’s and Don’ts

The upshot of all this is that text-based analysis holds a lot of potential to improve the quality of credit risk models in common use.

In our analysis, one key point to note is that the text could be directly associated with the motivations of the borrower in seeking a loan. (We assumed, of course, that the requests were made honestly.) Had the text been on an irrelevant topic – like the historical significance of Hannibal’s campaigns on the Italian peninsula – the text mining would likely have proved ineffective. We might have been able to deduce the applicant’s level of education from such a screed, but little else of any possible relevance for credit assessment.

The most available source of a potential borrower’s writing, of course, is social media, but we would argue that this is not the route that lenders should pursue when making underwriting decisions. While it may be possible to scrape these communication channels for clues as to a person’s core creditworthiness – whether they have strong societal bonds or family connections, for example – these are likely to be less useful than the mission-critical paragraphs we were able to access in the Lending Club data.

Tying credit availability to social media activity would also change the nature of online society, triggering what statisticians call a “Hawthorne Effect.” Who, after all, would be willing to share the cute cat video with their friends if doing so would increase their mortgage payment by $50 a month?

Impact on Commercial Credit, Privately-Held Businesses and B2Cs

One area where text mining will be more useful, and its use less potentially corrosive, is in commercial credit. One can reasonably suppose that any document or electronic communication made public by a company would be relevant to its business success and pertinent in a full consideration of the institution’s credit risk. This is certainly true of official public filings, like those made to the SEC, but it is also true of every communication that defines a company’s public face.

For large public companies, the relevance of published documents to performance is likely, in most cases, to affect stock and bond prices. Existing credit models based on observed financial market data will quickly reflect the changed circumstances caused by the damaging (or helpful) text.

Where text mining is likely to be more effective is for smaller, privately-held businesses whose market value is opaque and whose financial statements are not always available. It is in more rarefied data environments, like this one, that non-traditional sources are most highly prized. The smaller the enterprise, the more valuable text-based data are ultimately likely to be.

In addition, text-mining assessments of B2C businesses are likely to be more fruitful than those performed for B2Bs. Consumers may react to a scandal in an emotional or political manner, punishing the business even if they previously derived value from buying the product on offer. In the business world, on the other hand, one suspects that corporations will take a purely pragmatic approach and continue to use a supplier – unless doing so directly harms their bottom line or reputation.

Parting Thoughts

When pondering the overall impact of text mining, we need to recognize that text-based AI algorithms may have a limited shelf life in the lending business, especially if they are built using chance-based, data-mined correlations. If, for example, more commas on a webpage is one day associated with higher credit risk, you can bet the house that companies will instantly drop them.

When using any AI-based tool, it is critical to ensure that the metrics used as inputs actually do pertain to the underlying creditworthiness of the target institution.

By the way, this article scores a 45.1 on the Flesch-Kincaid scale, meaning it is aimed at a college-educated readership. Can I have my loan now?

Tony Hughes is a managing director of economic research and credit analytics at Moody’s Analytics. His work over the past 15 years has spanned the world of financial risk modeling, from corporate and retail exposures to deposits and revenues. He has also engaged in forecasting of asset prices and general macroeconomic analysis.


Translation: Credit Risk Assessment: The Benefits of Text-Based Analysis

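  • Summary: Can text mining improve credit risk models and loan appraisals? For financial institutions that know which words, phrases and clauses to look for, the answer is a qualified yes. Judging from this article, would you give me a loan?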

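  • A few years ago, I was lucky enough to hire an outstanding summer intern from a leading European economics PhD program. At the time, Lending Club published its historical performance data, and the file included a brief written request (probably written by the prospective borrower) urging investors to fund the loan. I asked my intern to explore whether a quantitative analysis of that text would help in assessing the borrowers' subsequent credit risk.
  • In short, it did.
  • We explored a range of words that one would expect people to use when requesting loans on a social lending platform. Words, phrases and clauses suggesting activities that build human capital (for example, “my daughter is getting married”) tended to clearly reduce credit risk. Words indicating desperation or panic, by contrast, sharply increased observed default rates, even after controlling for credit scores and other commonly reported borrower attributes.
  • Some results were rather sad (health problems increase the likelihood of default) and others were uplifting. For example, people who were newly unemployed but borrowing to do something about it were more likely to stay current on their obligations than the baseline group.
  • We also found that people who write at a higher level, as measured by the Flesch-Kincaid scale, defaulted at a lower rate than those with weaker writing skills.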

Text-Mining Do’s and Don’ts

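  • The upshot is that text-based analysis has considerable potential to improve the quality of the credit risk models in common use.
  • A key point in our analysis is that the text could be tied directly to the borrower's motivation for seeking a loan. (We assumed the requests were written honestly.) Had the text been about an irrelevant topic, such as the historical significance of a military campaign, text mining would likely have proved ineffective. We might have been able to infer the applicant's level of education from such a lengthy piece, but little else of relevance to credit assessment.
  • The most readily available source of a prospective borrower's writing is, of course, social media, but we would argue that lenders should not take this route when making underwriting decisions. It may be possible to scrape these channels for clues about a person's core creditworthiness, such as whether they have strong social or family ties, but such clues are likely to be less useful than the key paragraphs we were able to access in the Lending Club data.
  • Tying credit assessment to social media activity would also change the nature of online social life, triggering what statisticians call a “Hawthorne effect.”

The “Hawthorne effect” is the tendency of people to deliberately change their behavior or what they say once they realize they are being watched or observed.

  • After all, nobody would want to pay an extra $50 a month on their mortgage just for sharing a cute cat video.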

Impact on Commercial Credit, Privately-Held Businesses and B2Cs

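  • One area where text mining will be more useful, and its use less potentially corrosive, is commercial credit. It is reasonable to suppose that any document or electronic communication a company makes public is relevant to its business success and to a full assessment of its credit risk. This is certainly true of official public filings, such as those made to the SEC, but it is also true of every communication that shapes the company's public face.
  • For large public companies, published documents that bear on performance will, in most cases, affect stock and bond prices. Existing credit models built on observed financial market data will therefore quickly reflect the changed circumstances caused by damaging (or helpful) text.
  • Text mining is likely to be more effective for smaller, privately held businesses, whose market value is opaque and whose financial statements are not always available. In data-scarce environments like this, non-traditional sources matter most. The smaller the enterprise, the more valuable text-based data are likely to be.
  • In addition, text-mining assessments of B2C businesses are likely to be more fruitful than those of B2B businesses. Consumers may react to a scandal emotionally or politically, punishing the business even if they previously got value from its products. In the business world, by contrast, companies can be expected to take a pragmatic approach and keep using a supplier unless doing so directly harms their bottom line or reputation.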

Parting Thoughts

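  • When weighing the overall impact of text mining, we need to recognize that text-based AI algorithms may have a limited shelf life in the lending business, especially if they are built on chance-based, data-mined correlations. If, for example, more commas on a webpage were one day found to be associated with higher credit risk, you can be sure that companies would immediately drop the commas.
  • When using any AI-based tool, it is critical that the metrics used as inputs genuinely relate to the underlying creditworthiness of the target institution.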
  • Flesch-Kincaid scale

Measures of readability; they are used by the United States military to evaluate the readability of its manuals.
