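Automatically fetching each company's historical stock data and plotting it is a necessary part of a financial text sentiment analysis project. Admittedly, this data is easy to view on finance.yahoo.com, but fetching it programmatically and displaying it in real time is another matter. I had been planning to write a crawler to scrape the data, but that is clearly laborious and inefficient, whereas Python's matplotlib ships a finance module that does the job very conveniently.
finance.py is a collection of modules for collecting, analyzing and plotting financial data. Let's first look at an example that uses matplotlib to fetch historical data from finance.yahoo.com and plot it. Code first: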
from pylab import figure, show
from matplotlib.finance import quotes_historical_yahoo
from matplotlib.dates import DateFormatter
import datetime
date1 = datetime.date( 2012, 1, 1 )
date2 = datetime.date( 2012, 11, 11 )
daysFmt = DateFormatter('%m-%d-%Y')
quotes = quotes_historical_yahoo('MSFT', date1, date2)
# stop here if Yahoo returned no rows for this range
if len(quotes) == 0:
    raise SystemExit
dates = [q[0] for q in quotes]
opens = [q[1] for q in quotes]
fig = figure()
ax = fig.add_subplot(111)
ax.plot_date(dates, opens, '-')
# format the ticks
ax.xaxis.set_major_formatter(daysFmt)
ax.autoscale_view()
# format the coords message box
def price(x): return '$%1.2f'%x
ax.fmt_xdata = DateFormatter('%Y-%m-%d')
ax.fmt_ydata = price
ax.grid(True)
fig.autofmt_xdate()
show()
date1 and date2 are the start and end dates of the query. In this example we are asking for Microsoft's historical stock prices from 2012-01-01 through 2012-11-11.
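To make the date range and the q[0] / q[1] indexing concrete without hitting the network, here is a minimal offline sketch. Everything in it is an assumption for illustration: SAMPLE_QUOTES is made-up data (not real MSFT prices), select_range is a hypothetical helper rather than anything from matplotlib, the row layout (date, open, close, high, low, volume) is inferred from the indexing in the example above, and the range is treated as inclusive on both ends. As far as I understand, the real function returns matplotlib date numbers in the first field; plain datetime.date objects are used here only to keep the sketch dependency-free.

```python
import datetime

# Hypothetical sample rows, made up for illustration (not real MSFT prices).
# Assumed layout: (date, open, close, high, low, volume), matching the
# q[0] / q[1] indexing used in the example above.
SAMPLE_QUOTES = [
    (datetime.date(2011, 12, 30), 26.00, 25.96, 26.12, 25.91, 27000000),
    (datetime.date(2012, 1, 3), 26.55, 26.77, 26.96, 26.39, 64000000),
    (datetime.date(2012, 6, 15), 29.59, 30.02, 30.08, 29.49, 62000000),
    (datetime.date(2012, 11, 12), 28.94, 28.22, 28.96, 28.21, 60000000),
]


def select_range(quotes, start, end):
    """Hypothetical helper: keep rows whose date lies in [start, end]."""
    return [q for q in quotes if start <= q[0] <= end]


def price(x):
    """Format a number as a dollar amount, like the coords formatter above."""
    return '$%1.2f' % x


date1 = datetime.date(2012, 1, 1)
date2 = datetime.date(2012, 11, 11)

quotes = select_range(SAMPLE_QUOTES, date1, date2)
dates = [q[0] for q in quotes]   # first field: the trading date
opens = [q[1] for q in quotes]   # second field: the opening price

# Print each retained row using the same date and price formats as the plot.
for d, o in zip(dates, opens):
    print(d.strftime('%m-%d-%Y'), price(o))
```

Only the two rows dated inside the range survive the filter; the dates and opens lists are then in the same shape that the plotting code above passes to plot_date.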
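quotes_historical_yahoo is the function that fetches historical data from Yahoo. It takes the company's ticker symbol and the start and end dates of the query. The downloaded data is kept in a cache file, and the parsed quotes are what the call returns (which is why the example above can index into them). Its source begins as follows: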
def quotes_historical_yahoo(ticker, date1, date2, asobject=False,
                            adjusted=True, cachename=None):
    """
    Get historical data for ticker between date1 and date2. date1 and
    date2 are datetime instances or (year, month, day) sequences.
    See :func:`parse_yahoo_historical` for explanation of output formats
    and the *asobject* and *a