Loughran & McDonald Financial Text Sentiment Analysis Library

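Today I came across a stock price prediction project that uses the pysentiment library to compute sentiment scores for financial text. According to the library's official documentation, it provides two sentiment analyzers:

  • Harvard IV-4: general-purpose English sentiment analysis

  • Loughran & McDonald: English financial sentiment analysis

pysentiment GitHub repository: https://github.com/hanzhichao2000/pysentiment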

Installing pysentiment

!pip3 install pysentiment

I ran into a problem installing pysentiment this way, and you will probably hit it too. My workaround was:

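  1. Download the archive of this article's project folder (see the end of the article for how to get it).

  2. Extract the archive to the desktop.

  3. Open a command prompt with cmd (search online if you are not sure how).

  4. Enter cd desktop and press Enter.

  5. Enter cd 20191015pysentiment and press Enter.

  6. Enter cd pysentiment and press Enter.

  7. Enter python3 setup.py install and press Enter. If this step fails, try replacing python3 with python.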
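After the install step, it can help to confirm that Python can actually find the package before running any analysis. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for this illustration and is not part of pysentiment.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Check for pysentiment without importing it (avoids side effects on failure).
installed = is_installed("pysentiment")
print("pysentiment installed:", installed)
```

If this prints False, the setup.py install step most likely ran under a different Python interpreter than the one you are using now.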

pysentiment interface

  • HIV4: general-purpose English sentiment analysis

  • LM: English sentiment analysis for the financial domain

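General-purpose English sentiment analysis

General-purpose sentiment analysis uses the Harvard IV-4 dictionary. For details on the dictionary, see http://www.wjh.harvard.edu/~inquirer/

How the scores are calculated:

  • Positive: number of occurrences of positive words

  • Negative: number of occurrences of negative words

  • Polarity = (Pos - Neg) / (Pos + Neg)

  • Subjectivity = (Pos + Neg) / count(*), where count(*) is the total number of tokens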
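To make the four quantities above concrete, here is a minimal sketch that computes them from a token list. It is not pysentiment's implementation: the function name `sentiment_score` and the tiny word lists are hypothetical, chosen only to illustrate the formulas.

```python
from collections import Counter


def sentiment_score(tokens, positive_words, negative_words):
    """Apply the Positive/Negative/Polarity/Subjectivity formulas to a token list."""
    counts = Counter(tokens)
    pos = sum(counts[w] for w in positive_words)
    neg = sum(counts[w] for w in negative_words)
    total = len(tokens)
    # Guard against division by zero when there are no sentiment words or no tokens.
    polarity = (pos - neg) / (pos + neg) if (pos + neg) else 0.0
    subjectivity = (pos + neg) / total if total else 0.0
    return {
        "Positive": pos,
        "Negative": neg,
        "Polarity": polarity,
        "Subjectivity": subjectivity,
    }


# Hypothetical mini word lists, NOT the real Harvard IV-4 dictionary.
POSITIVE = {"growth", "strong"}
NEGATIVE = {"decline"}

tokens = ["strong", "growth", "offsets", "decline", "in", "output"]
result = sentiment_score(tokens, POSITIVE, NEGATIVE)
print(result)
```

Here two positive words and one negative word out of six tokens give a Polarity of 1/3 and a Subjectivity of 0.5.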

import pysentiment as ps
# Initialize the HIV4 analyzer
hiv4 = ps.HIV4()
# Text to analyze
test_text = """Lately, the Indonesian government has unleashed an array of policies that are keeping mining and oil executives awake at night across this vast and geologically rich archipelago. The unpopular new regulations, aimed at reforming the mining and oil industries, are promoted in the name of "national interest." Yet left uncorrected, they will inevitably lead to a dramatic decline of output in Indonesia's extractive industries, damaging foreign investment and economic growth.Particularly hard-hit will be some of Indonesia's less-developed regions such as Kalimantan and Papua, where oil and mining play major economic roles.Equating the government to the Emperor Nero and the local mining industry to ancient Rome," said Bill Sullivan, leading legal consultant for the mining industry in Indonesia, "It is as if Nero is choosing to complacently fiddle while Rome burns.Why exactly this fiddling persists—especially since large investors have alrea"""	
# Tokenize the text into a list of words
words = hiv4.tokenize(test_text)
# Pass the word list to hiv4.get_score to obtain the score
score = hiv4.get_score(words)
# Inspect the score
score

Run

{'Positive': 14,	
 'Negative': 10,	
 'Polarity': 0.1666666597222225,	
 'Subjectivity': 0.3287671187840121}
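Note that the printed Polarity is not exactly (14 - 10) / (14 + 10) = 0.1666...67. The values are consistent with a small constant being added to each denominator. The sketch below reproduces them under two assumptions inferred from the output alone, not taken from the library's documentation: a constant of 1e-6, and a token count of 73.

```python
EPS = 1e-6  # assumption: inferred from the printed output, not from the docs


def ratios(pos, neg, n_tokens, eps=EPS):
    """Polarity and Subjectivity with a small constant added to each denominator."""
    polarity = (pos - neg) / (pos + neg + eps)
    subjectivity = (pos + neg) / (n_tokens + eps)
    return polarity, subjectivity


# 14 positive and 10 negative words; 73 tokens is an inferred value.
polarity, subjectivity = ratios(14, 10, 73)
print(polarity, subjectivity)
```

Such a constant is a common way to avoid division by zero when a text contains no sentiment words at all.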

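English financial sentiment analysis

Financial sentiment analysis uses the Loughran and McDonald dictionary. For details on the dictionary, see https://www3.nd.edu/~mcdonald/Word_Lists.html

How the scores are calculated:

  • Positive: number of occurrences of positive words

  • Negative: number of occurrences of negative words

  • Polarity = (Pos - Neg) / (Pos + Neg)

  • Subjectivity = (Pos + Neg) / count(*), where count(*) is the total number of tokens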

import pysentiment as ps
# Initialize the LM analyzer
lm = ps.LM()
# Text to analyze
test_text = "Cisco Posts Another Record Quarter With Growth Across All Segments; Raising FVE to $46Cisco's first-quarter results modestly beat our top line and net income expectations while the $0.77 earnings per share exceeded our expected result due to an increased quantity of shares repurchased. The narrow-moat firm posted 8% year-over-year revenue growth, with strength across all the business segments and provided strong guidance for the next quarter. After updating our Cisco forecast to consider stronger growth driven by expected cross selling of multi-cloud environment products, security solutions, and infrastructure hardware, we are raising our fair value estimate to $46 per share from $43. With shares trading around our fair value estimate after hours, we recommend for investors to sustain their Cisco positions.The company guided the second quarter to have a 5%-7% growth over the previous year with 30.5%-31.5% non-GAAP operating margins. Cisco is benefitting from a strong IT spending environment, and we believe that the company's product roadmap has made the it a one-stop-shop for networking environments. Two major recent announcements by Cisco were its integration of security into SD-WAN products and its offering of production grade Kubernetes to be run on premises and then offloaded to Amazon AWS. We like that Cisco is intertwining previously siloed offerings into combined solutions that contain unique selling features. Additionally, having support with all three major hyperscale public cloud providers allows Cisco to be a commonality for IT teams balancing on-premises, private, and public cloud environments. We like that Cisco has completely embraced the cloud as a path to growth instead of a business threat. In our view, Cisco's innovative product portfolio should keep it on the shortlist for enterprise customers debating networking infrastructure providers for hardware, software, and services in cloud environments or on premises."	
# Tokenize the text into a list of words
words = lm.tokenize(test_text)
# Pass the word list to lm.get_score to obtain the score
score = lm.get_score(words)
# Inspect the score
score

Run

{'Positive': 6,	
 'Negative': 2,	
 'Polarity': 0.4999999375000079,	
 'Subjectivity': 0.055172413412604045}
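Both analyzers follow the same two-step pattern (tokenize, then get_score), so scoring many documents can be wrapped in one small helper. This is a sketch only: `score_texts` and `StubAnalyzer` are hypothetical names for this illustration, and the stub exists just so the example runs without pysentiment installed.

```python
def score_texts(analyzer, texts):
    """Return one score dict per text using the analyzer's tokenize/get_score pair."""
    results = []
    for text in texts:
        tokens = analyzer.tokenize(text)
        results.append(analyzer.get_score(tokens))
    return results


class StubAnalyzer:
    """Hypothetical stand-in with the same two methods as ps.HIV4() and ps.LM()."""

    def tokenize(self, text):
        return text.lower().split()

    def get_score(self, tokens):
        # The real analyzers return Positive/Negative/Polarity/Subjectivity.
        return {"Tokens": len(tokens)}


scores = score_texts(StubAnalyzer(), ["Revenue growth was strong", "Output declined"])
print(scores)
```

With the library installed, passing ps.LM() or ps.HIV4() in place of the stub should work the same way, since the helper relies only on the two methods used in the examples above.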


To get the course materials, reply with the keyword "LM情感分析" in the official account's message box.
