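By making full use of machine learning, natural language processing, and related techniques to quantify text and other unstructured data in new ways, researchers can build finance and accounting sentiment dictionaries, fit LDA topic models, and compute text similarity, text readability, and the content of specific information (corporate culture, digital transformation, corporate social responsibility, risk, innovation, knowledge, etc.). This pushes classic empirical questions into new territory.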
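The sketch below roughly illustrates two of the building blocks named above:

- a dictionary-based word share, which is the core of a sentiment or topic-specific word-list measure;
- the cosine similarity between two texts' term-frequency vectors.

It is a minimal sketch that assumes simple lowercase alphabetic tokenization. The function names and the toy word list are hypothetical. Published studies typically add stop-word removal, stemming, or weighting schemes such as tf-idf.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters as tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_share(text: str, word_list: set[str]) -> float:
    """Fraction of tokens that belong to a word list (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(tok in word_list for tok in tokens) / len(tokens)


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy word list, for illustration only.
NEGATIVE_WORDS = {"loss", "decline", "impairment"}
```

For example, `dictionary_share("Revenue rose but impairment caused a loss", NEGATIVE_WORDS)` returns 2/7, since two of the seven tokens are in the toy list.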
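In recent years, top international journals in economics and management, such as JFE, RFS, MS, and SMJ, have also published representative applications of text, image, and audio data, including the following: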
1. Measuring corporate culture (innovation, integrity, teamwork, quality, respect) from earnings call transcripts
Li K, Mai F, Shen R, et al. Measuring Corporate Culture Using Machine Learning[J]. The Review of Financial Studies, 2021.
Abstract
We create a culture dictionary using one of the latest machine learning techniques—the word embedding model—and 209,480 earnings call transcripts. We score the five corporate cultural values of innovation, integrity, quality, respect, and teamwork for 62,664 firm-year observations over the period 2001–2018. We show that an innovative culture is broader than the usual measures of corporate innovation – R&D expenses and the number of patents. Moreover, we show that corporate culture correlates with business outcomes, including operational efficiency, risk-taking, earnings management, executive compensation design, firm value, and deal making, and that the culture-performance link is more pronounced in bad times. Finally, we present suggestive evidence that corporate culture is shaped by major corporate events, such as mergers and acquisitions.
Keywords
Corporate culture, semisupervised machine learning, earnings calls
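The abstract describes two steps: expanding a culture dictionary with a word embedding model, then scoring transcripts against it. The sketch below shows the general idea under heavy simplifying assumptions:

- The vectors are hypothetical three-dimensional toy embeddings. In practice they would come from a model such as word2vec trained on the transcripts.
- The dictionary is expanded with the words nearest to the mean seed vector.
- Each document is scored as dictionary hits per 1,000 tokens, not with whatever weighting the authors actually use.

```python
import math

# Hypothetical 3-dimensional toy embeddings; a real application would use
# vectors learned from the transcripts themselves.
TOY_EMBEDDINGS = {
    "innovation": [0.9, 0.1, 0.0],
    "creative": [0.8, 0.2, 0.1],
    "disrupt": [0.85, 0.05, 0.1],
    "revenue": [0.1, 0.9, 0.2],
    "margin": [0.0, 0.8, 0.3],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def expand_dictionary(seeds: set[str], embeddings: dict[str, list[float]], top_n: int = 2) -> set[str]:
    """Add the top_n non-seed words whose vectors are closest to the mean seed vector."""
    dim = len(next(iter(embeddings.values())))
    centroid = [sum(embeddings[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    candidates = [w for w in embeddings if w not in seeds]
    ranked = sorted(candidates, key=lambda w: cosine(embeddings[w], centroid), reverse=True)
    return set(seeds) | set(ranked[:top_n])


def culture_score(tokens: list[str], dictionary: set[str]) -> float:
    """Dictionary hits per 1,000 tokens: a plain count, with no tf-idf or other weighting."""
    if not tokens:
        return 0.0
    return 1000 * sum(t in dictionary for t in tokens) / len(tokens)
```

With these toy vectors, `expand_dictionary({"innovation"}, TOY_EMBEDDINGS)` adds "disrupt" and "creative" while leaving the finance words out.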
2. Measuring corporate innovation from analyst reports
Bellstam G, Bhagat S, Cookson J A. A Text-Based Analysis of Corporate Innovation[J]. Management Science, 2021, 67(7).
Abstract
We develop a new measure of innovation using the text of analyst reports of S&P 500 firms. Our text-based measure gives a useful description of innovation by firms with and without patenting and R&D (research and development). For nonpatenting firms, the measure identifies innovative firms that adopt novel technologies and innovative business practices (e.g., Walmart’s cross-geography logistics). For patenting firms, the text-based measure strongly correlates with valuable patents, which likely capture true innovation. The text-based measure robustly forecasts greater firm performance and growth opportunities for up to four years, and these value implications hold just as strongly for innovative nonpatenting firms.
Keywords
Innovation; textual analysis; machine learning; natural language processing; latent Dirichlet allocation
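The keywords include latent Dirichlet allocation (LDA). The sketch below is a generic collapsed Gibbs sampler for LDA, written with only the standard library for readability. It is not the authors' code, and it does not reproduce how they map fitted topics to an innovation score.

Assumptions:

- Documents are pre-tokenized into integer word ids.
- `alpha`, `beta`, and the iteration count are illustrative defaults only.

On a real corpus of analyst reports, a library implementation (for example in gensim or scikit-learn) would be the practical choice.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (a generic sketch, not the paper's code).

    docs: list of documents, each a list of integer word ids in [0, V).
    Returns (theta, phi): document-topic and topic-word probability rows.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    n_dk = [[0] * n_topics for _ in docs]                 # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                                 # total tokens per topic
    z = []                                               # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi
```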
3. Measuring corporate diversification from annual report text
Choi J, Menon A, Tabakovic H. Using machine learning to revisit the diversification–performance relationship[J]. Strategic Management Journal, 2021, 42(9): 1632-1661.
Research Summary
In this article, we examine the relationship between corporate diversification and firm performance using a machine learning technique called natural language processing (NLP). By applying a widely used NLP technique called topic modeling to unstructured text from annual reports, we create a new, multidimensional measure that captures the degree of diversification of both multisegment and single-segment firms. Additionally, we introduce a novel method to incorporate human judgments into the interpretation of machine-learned patterns, which allows us to measure diversification across multiple dimensions, such as products and geographies. Finally, we illustrate how these new measures can generate novel insights into the relationship between the degree and type of diversification and firm performance, furthering our understanding of the diversification–performance relationship.
Managerial Summary
At some point, most firms face dilemmas about whether to diversify their business activities across industries or geographic markets—an important decision that invariably affects firm performance. Albeit very important, the direction of a relationship between diversification and firm performance is not always clear. Inconsistent results of previous studies are partially driven by inherent difficulties in reliably measuring diversification. This study introduces a novel methodology to address that problem: a machine learning-based technique to quantify diversification from unstructured corporate annual report texts. An analysis of firm performance based on these novel diversification measures suggests that diversification, in contrast to earlier studies that find a diversification discount, is associated with higher firm value—a premium particularly pronounced for firms diversifying within a single industry.
Keywords
Annual reports, diversification, natural language processing, performance, topic modeling
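Choi et al. derive diversification measures from topic-model output, but their exact construction is not reproduced here. As an assumption, the sketch below treats one annual report's topic proportions like segment sales shares. It then applies two standard dispersion indices: one minus the Herfindahl index, and Shannon entropy. Higher values of either index mean the report's text is spread across more topics.

```python
import math


def topic_diversification(topic_shares):
    """Return (1 - Herfindahl index, Shannon entropy) for one report's topic shares.

    Assumption for illustration: topic proportions are treated like segment
    sales shares. They are renormalized to sum to 1 before computing the indices.
    """
    total = sum(topic_shares)
    if total <= 0:
        raise ValueError("topic shares must have a positive sum")
    p = [s / total for s in topic_shares]
    one_minus_hhi = 1 - sum(x * x for x in p)
    entropy = -sum(x * math.log(x) for x in p if x > 0)
    return one_minus_hhi, entropy
```

A report dominated by one topic, such as `[0.9, 0.05, 0.05]`, scores lower on both indices than a more even one such as `[0.4, 0.3, 0.3]`.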
4. Disaggregating segment data by underlying industry using business descriptions
Song S. The Informational Value of Segment Data Disaggregated by Underlying Industry: Evidence from the Textual Features of Business Descriptions[J]. The Accounting Review, 2021, 96(6).
Abstract
I examine a fundamental determinant of disclosure quality: how underlying data are disaggregated. For this, I create a measure of industry disaggregation, which is the extent to which segment disclosures are disaggregated based on underlying industries. To identify underlying industries, I apply a deep learning algorithm that extracts textual features from Item 1 business descriptions, in which firms are required to accurately describe their products and services. Industry disaggregation captures the disclosure of underlying industries and the adherence to industry-based disaggregation criteria. Consistent with capital markets being informationally segmented by industry, I find that industry disaggregation is negatively associated with analyst forecast error and dispersion, and positively associated with analyst following and information transfers among analysts and investors. These findings indicate that financial information is more informative, and thus of higher quality, when disaggregated by standardized criteria that achieve comparability and match the information-processing strategies of capital market participants.
Keywords
disclosure quality, segment reporting, disaggregation, comparability, information-processing costs, textual analysis, machine learning, industries, analysts, information transfers
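Song identifies underlying industries by applying a deep learning model to Item 1 text. The sketch below uses a much simpler stand-in, a swapped-in technique rather than the paper's method. It assigns each product or service paragraph to the industry whose pooled example descriptions are most similar under bag-of-words cosine similarity, then returns the set of distinct industries found.

The labeled examples and function names are hypothetical. The sketch does not construct the paper's industry disaggregation measure itself.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words term counts (a crude stand-in for learned textual features)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def industry_centroids(labeled_texts):
    """Pool term counts of hypothetical labeled example descriptions per industry.

    Summing instead of averaging is fine here because cosine similarity is scale-invariant.
    """
    centroids = {}
    for industry, text in labeled_texts:
        centroids.setdefault(industry, Counter()).update(bow(text))
    return centroids


def underlying_industries(product_paragraphs, centroids):
    """Assign each product/service paragraph to its most similar industry; return the distinct set."""
    return {
        max(centroids, key=lambda ind: cosine(bow(p), centroids[ind]))
        for p in product_paragraphs
    }
```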
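5. Measuring a daily market-level investor pessimism index (Photo Pessimism) by applying machine learning to the sentiment-based classification of news photos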
Obaid K, Pukthuanthong K. A picture is worth a thousand words: Measuring investor sentiment by combining machine learning and photos from news[J]. Journal of Financial Economics, 2021(4).
Abstract
By applying machine learning to the accurate and cost-effective classification of photos based on sentiment, we introduce a daily market-level investor sentiment index (Photo Pessimism) obtained from a large sample of news photos. Consistent with behavioral models, Photo Pessimism predicts market return reversals and trading volume. The relation is strongest among stocks with high limits to arbitrage and during periods of elevated fear. We examine whether Photo Pessimism and pessimism embedded in news text act as complements or substitutes for each other in predicting stock returns and find evidence that the two are substitutes.
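Once each photo has a predicted sentiment label, aggregating the labels into a daily index is straightforward. The sketch below assumes, as a simplification, that the index is the daily share of photos the classifier labels negative. The image classifier itself is not shown, and the label strings are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_pessimism(photo_labels):
    """Daily share of news photos labeled negative (a simplified index, assumed here).

    photo_labels: iterable of (date, label) pairs. Each label is a
    hypothetical classifier output for one photo, e.g. "negative" or "positive".
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for day, label in photo_labels:
        totals[day] += 1
        negatives[day] += int(label == "negative")
    # Return days in chronological order.
    return {day: negatives[day] / totals[day] for day in sorted(totals)}
```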
6. Constructing a real-time sentiment indicator from the mood of music
Edmans A, Fernandez-Perez A, Garel A, et al. Music sentiment and stock returns around the world[J]. Journal of Financial Economics, 2022, 145.
Abstract
This paper introduces a real-time, continuous measure of national sentiment that is language-free and thus comparable globally: the positivity of songs that individuals choose to listen to. This is a direct measure of mood that does not pre-specify certain mood-affecting events nor assume the extent of their impact on investors. We validate our music-based sentiment measure by correlating it with mood swings induced by seasonal factors, weather conditions, and COVID-related restrictions. We find that music sentiment is positively correlated with same-week equity market returns and negatively correlated with next-week returns, consistent with sentiment-induced temporary mispricing. Results also hold under a daily analysis and are stronger when trading restrictions limit arbitrage. Music sentiment also predicts increases in net mutual fund flows, and absolute sentiment precedes a rise in stock market volatility. It is negatively associated with government bond returns, consistent with a flight to safety.
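The sketch below illustrates one plausible way to turn song-level data into a weekly music sentiment series:

1. Compute each week's stream-weighted average valence, where valence is a song's positivity score, assumed to lie in [0, 1].
2. Take week-over-week changes.

The input format is an assumption. Adjustments a study would typically apply, such as removing seasonal patterns, are omitted.

```python
def stream_weighted_valence(chart):
    """Stream-weighted average valence for one week's chart.

    chart: list of (streams, valence) pairs (assumed input format).
    """
    total_streams = sum(s for s, _ in chart)
    if total_streams == 0:
        raise ValueError("chart has no streams")
    return sum(s * v for s, v in chart) / total_streams


def sentiment_changes(weekly_charts):
    """Week-over-week change in stream-weighted valence (no deseasonalization)."""
    levels = [stream_weighted_valence(c) for c in weekly_charts]
    return [curr - prev for prev, curr in zip(levels, levels[1:])]
```

For a chart of `[(100, 0.5), (300, 0.7)]` the weighted valence is 0.65. A following week of `[(200, 0.4), (200, 0.6)]` has a weighted valence of 0.5, so the measured change is -0.15.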