How I get options data for free

by Harry Sauers

An introduction to web scraping for finance
Ever wished you could access historical options data, but got blocked by a paywall? What if you just want it for research, fun, or to develop a personal trading strategy?
In this tutorial, you’ll learn how to use Python and BeautifulSoup to scrape financial data from the Web and build your own dataset.
Getting Started
You should have at least a working knowledge of Python and web technologies before beginning this tutorial. To build these up, I highly recommend checking out a site like Codecademy to learn new skills or brush up on old ones.
First, let's spin up your favorite IDE. Normally I use PyCharm, but for a quick script like this, Repl.it will do the job too. Add a quick print("Hello world") to ensure your environment is set up correctly.
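The sanity check above is a one-line script; if it runs without errors and prints the greeting, your interpreter and editor are wired up correctly:

```python
# If this prints, the environment is set up correctly.
print("Hello world")
```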
Now we need to figure out a data source.
Unfortunately, Cboe’s awesome options chain data is pretty locked down, even for current delayed quotes. Luckily, Yahoo Finance has solid enough options data here. We’ll use it for this tutorial, as web scrapers often need some content awareness, but it is easily adaptable for any data source you want.
Dependencies
We don’t need many external dependencies. We just need the Requests and BeautifulSoup modules in Python. Add these at the top of your program:
from bs4 import BeautifulSoup
import requests
Create a main method:
def main():
    print("Hello World!")

if __name__ == "__main__":
    main()
Scraping HTML
Now you're ready to start scraping! Inside main(), add these lines to fetch the page's full HTML:
data_url = "https://finance.yahoo.com/quote/SPY/options"
data_html = requests.get(data_url).content
print(data_html)
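Once the raw HTML is in hand, the next step is to parse it with BeautifulSoup. Here is a minimal sketch of the pattern using a made-up table snippet; Yahoo Finance's actual markup is different (and changes often), so treat the tag names and structure below as placeholders, not the real page layout:

```python
from bs4 import BeautifulSoup

# Stand-in HTML for illustration only; the real page's markup differs.
sample_html = """
<table>
  <tr><th>Strike</th><th>Last Price</th></tr>
  <tr><td>300.00</td><td>1.25</td></tr>
  <tr><td>305.00</td><td>0.80</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for tr in soup.find_all("tr")[1:]:  # skip the header row
    # Collect the text of each cell in the row.
    rows.append([td.get_text() for td in tr.find_all("td")])

print(rows)  # [['300.00', '1.25'], ['305.00', '0.80']]
```

The same find_all / get_text pattern applies to the real fetched page once you inspect its structure in your browser's developer tools and find the table you want.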