How to Use Python, AWS, and IEX Cloud to Create an Excel Spreadsheet of Stock Market Data That Updates Automatically

Many Python developers in the financial world are tasked with creating Excel documents for analysis by non-technical users.

This is actually a lot harder than it sounds. From sourcing the data to formatting the spreadsheet to deploying the final doc in a central location, there are plenty of steps involved in the process.

In this tutorial, I'm going to show you how to create Excel spreadsheets using Python that:

  • Use stock market data from IEX Cloud
  • Are deployed in a centralized S3 bucket so that anyone with the right URL can access them
  • Automatically update daily using the cron command line utility

Step 1: Create an Account with IEX Cloud

IEX Cloud is the data provider subsidiary of the IEX stock exchange.

In case you're unfamiliar with IEX, it is an acronym for "The Investor's Exchange". IEX was founded by Brad Katsuyama to build a better stock exchange that avoids investor-unfriendly behavior like front-running and high-frequency trading. Katsuyama's exploits were famously chronicled in Michael Lewis' best-selling book Flash Boys.

I have investigated many financial data providers and IEX Cloud has the best combination of:

  • High-quality data
  • Affordable price

Their prices are below:

The $9/month Launch plan is plenty for many use cases.

A warning on using IEX Cloud (and any other pay-per-use data provider): it is very important that you set usage budgets from the beginning. These budgets lock you out of your account once you hit a certain dollar cost for the month.

When I first started using IEX Cloud, I accidentally created an infinite loop on a Friday afternoon that contained an API call to IEX Cloud. These API calls are priced on a cost-per-call basis...which resulted in a terrifying email from IEX:

It is a testament to IEX's customer-centricity that they agreed to reset my usage as long as I set usage budgets moving forward. Go IEX!

As with most API subscriptions, the main benefit of creating an IEX Cloud account is having an API key.

For obvious reasons, I will not share an API key in this article.

However, you can still work through this tutorial with your own API key as long as you assign it to the following variable name:

IEX_API_Key

You will see the blank IEX_API_Key variable in my code blocks throughout the rest of this tutorial.

Step 2: Write Your Python Script

Now that you have access to the API key that you'll need to gather financial data, it's time to write your Python script.

This will be the longest section of this tutorial. It is also the most flexible - we are going to create a Python script that satisfies certain pre-specified criteria, but you could modify this section to really create any spreadsheet you want!

To start, let's lay out our goal posts. We are going to write a Python script that generates an Excel file of stock market data with the following characteristics:

  • It will include the 10 largest stocks in the United States
  • It will contain four columns: stock ticker, company name, share price, and dividend yield.
  • It will be formatted such that the header's background color is #135485 and text is white, while the spreadsheet body's background is #DADADA and the font color is black (the default).

Let's start by importing our first package.

Since spreadsheets are essentially just data structures with rows and columns, the pandas library - including its built-in DataFrame object - is a perfect candidate for manipulating data in this tutorial.

We'll start by importing pandas under the alias pd like this:

import pandas as pd

Next, we'll specify our IEX Cloud API key. As I mentioned before, I'm not going to really include my API key, so you'll have to grab your own API key from your IEX account and include it here:

IEX_API_Key = ''
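As an aside, if you'd rather not hardcode a secret into the script, you can read the key from an environment variable instead. The environment variable name IEX_API_KEY below is my own convention, not something IEX requires:

```python
import os

# Read the API key from the environment; fall back to an empty string if unset.
# The variable name IEX_API_KEY is an assumption, not an IEX requirement.
IEX_API_Key = os.environ.get('IEX_API_KEY', '')
```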

Our next step is to determine the ten largest companies in the United States.

You can answer this question with a quick Google search.

For brevity, I have included the companies (or rather, their stock tickers) in the following Python list:
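That list looks like this (it appears again in the full script recap later in this tutorial):

```python
#Specify the stock tickers that will be included in our spreadsheet
tickers = [
            'MSFT',
            'AAPL',
            'AMZN',
            'GOOG',
            'FB',
            'BRK.B',
            'JNJ',
            'WMT',
            'V',
            'PG'
            ]
```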

Next, it is time to figure out how to ping the IEX Cloud API to pull in the metrics we need for each company.

The IEX Cloud API returns JSON objects in response to HTTP requests. Since we are working with more than 1 ticker in this tutorial, we will use IEX Cloud's batch API call functionality, which allows you to request data on more than one ticker at a time. Using batch API calls has two benefits:

  • It reduces the number of HTTP requests you need to make, which will make your code more performant.
  • The pricing for batch API calls is slightly better with most data providers.

Here is an example of what the HTTP request might look like, with a few placeholder words where we'll need to customize the request:

https://cloud.iexapis.com/stable/stock/market/batch?symbols=TICKERS&types=ENDPOINTS&range=RANGE&token=IEX_API_Key

In this URL, we'll replace these variables with the following values:

  • TICKERS will be replaced by a string that contains each of our tickers separated by a comma.

  • ENDPOINTS will be replaced by a string that contains each of the IEX Cloud endpoints we want to hit, separated by a comma.

  • RANGE will be replaced by 1y. These endpoints each contain point-in-time data and not time series data, so this range can really be whatever you want.

Let's put this URL into a variable called HTTP_request for us to modify later:

HTTP_request = 'https://cloud.iexapis.com/stable/stock/market/batch?symbols=TICKERS&types=ENDPOINTS&range=RANGE&token=IEX_API_Key'

Let's work through each of these variables one-by-one to determine the exact URL that we need to hit.

For the TICKERS variable, we can generate a real Python variable (and not just a placeholder word) with a simple for loop:

#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','
    
#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]
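As an aside, the loop-and-trim pattern above can also be written in one line with Python's built-in str.join, which inserts a comma between elements and never leaves a trailing one:

```python
tickers = ['MSFT', 'AAPL', 'AMZN']

#Join the tickers into a single comma-separated string
ticker_string = ','.join(tickers)
# ticker_string is now 'MSFT,AAPL,AMZN'
```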

Now we can interpolate our ticker_string variable into the HTTP_request variable that we created earlier using an f-string:

HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types=ENDPOINTS&range=RANGE&token=IEX_API_Key'

Next, we need to determine which IEX Cloud endpoints we need to ping.

Some quick investigation into the IEX Cloud documentation reveals that we only need the price and stats endpoints to create our spreadsheet.

Thus, we can replace the placeholder ENDPOINTS word from our original HTTP request with the following variable:

endpoints = 'price,stats'

Like we did with our ticker_string variable, let's substitute the endpoints variable into the HTTP_request string:

HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=RANGE&token=IEX_API_Key'

The last placeholder we need to replace is RANGE. We will not replace this with a variable. Instead, we can hardcode a 1y directly into the URL path like this:

https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token=IEX_API_Key

We've done a lot so far, so let's recap our code base:

import pandas as pd

IEX_API_Key = ''

#Specify the stock tickers that will be included in our spreadsheet
tickers = [
            'MSFT',
            'AAPL',
            'AMZN',
            'GOOG',
            'FB',
            'BRK.B',
            'JNJ',
            'WMT',
            'V',
            'PG'
            ]


#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','
    
#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]

#Create the endpoint strings
endpoints = 'price,stats'

#Interpolate the endpoint strings into the HTTP_request string
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token={IEX_API_Key}'

It is now time to ping the API and save its data into a data structure within our Python application.

We can read JSON objects with pandas' read_json method. In our case, we'll save the JSON data to a pandas DataFrame called raw_data, like this:

raw_data = pd.read_json(HTTP_request)

Let's take a moment now to make sure that the data has been imported in a nice format for our application.

If you're working through this tutorial in a Jupyter Notebook, you can simply type the name of the pandas DataFrame variable on the last line of a code cell, and Jupyter will nicely render an image of the data, like this:

As you can see, the pandas DataFrame contains a column for each stock ticker and two rows: one for the stats endpoint and one for the price endpoint. We will need to parse this DataFrame to get the four metrics we want. Let's work through the metrics one-by-one in the steps below.
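To make the indexing in the next few sections concrete, here is a tiny mock of raw_data built from hardcoded sample values rather than a live API call (the prices and yields are made up for illustration):

```python
import pandas as pd

# A miniature stand-in for raw_data: one column per ticker,
# one row for the 'stats' endpoint and one for the 'price' endpoint.
# All numbers here are made up for illustration.
mock_raw_data = pd.DataFrame({
    'MSFT': {
        'stats': {'companyName': 'Microsoft Corporation', 'dividendYield': 0.01},
        'price': 200.0,
    },
    'AAPL': {
        'stats': {'companyName': 'Apple, Inc.', 'dividendYield': 0.007},
        'price': 120.0,
    },
})

# The same chained lookups used in the sections that follow:
company_name = mock_raw_data['MSFT']['stats']['companyName']
stock_price = mock_raw_data['MSFT']['price']
```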

Metric 1: Stock Ticker

This step is very straightforward since the stock tickers are contained in the columns of the pandas DataFrame. We can access them through the columns attribute of the pandas DataFrame like this:

raw_data.columns

To access the other metrics in raw_data, we will create a for loop that loops through each ticker in raw_data.columns. In each iteration of the loop we will add the data to a new pandas DataFrame object called output_data.

First we'll need to create output_data, which should be an empty pandas DataFrame with four columns. Here's how to do this:

output_data = pd.DataFrame(columns=range(4))

This creates an empty pandas DataFrame with 0 rows and 4 columns.

Now that this object has been created, here's how we can structure this for loop:

for ticker in raw_data.columns:
	
    #Parse the company's name - not completed yet
    company_name = ''
    
    #Parse the company's stock price - not completed yet
    stock_price = 0
    
    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0
    
    
    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)
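One caveat about output_data.append: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0. If you are on a newer pandas version, the same accumulate-rows pattern can be written with pd.concat instead (the single sample record below is made up for illustration):

```python
import pandas as pd

output_data = pd.DataFrame(columns=range(4))

# One made-up parsed record standing in for the API data
for ticker, company_name, stock_price, dividend_yield in [
    ('MSFT', 'Microsoft Corporation', 200.0, 0.01)
]:
    new_row = pd.DataFrame([[ticker, company_name, stock_price, dividend_yield]])
    # pd.concat is the replacement for the removed DataFrame.append method
    output_data = pd.concat([output_data, new_row], ignore_index=True)
```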

Next, let's determine how to parse the company_name variable from the raw_data object.

Metric 2: Company Name

The company_name variable is the first variable that needs to be parsed from the raw_data object. As a quick recap, here's what raw_data looks like:

The company_name variable is held within the stats endpoint under the dictionary key companyName. To parse this data point out of raw_data, we can use these indexes:

raw_data[ticker]['stats']['companyName']

Including this in our for loop from before gives this:

output_data = pd.DataFrame(columns=range(4))

for ticker in raw_data.columns:

    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price - not completed yet
    stock_price = 0

    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0


    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)

Let's move on to parsing stock_price.

Metric 3: Stock Price

The stock_price variable is contained within the price endpoint, which returns only a single value. This means we do not need to chain together indexes like we did with company_name.

Here's how we could parse stock_price from raw_data:

raw_data[ticker]['price']

Including this in our for loop gives us:

output_data = pd.DataFrame(columns=range(4))

for ticker in raw_data.columns:

    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']

    #Parse the company's dividend yield - not completed yet
    dividend_yield = 0


    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)

The last metric we need to parse is dividend_yield.

Metric 4: Dividend Yield

Like company_name, dividend_yield is contained in the stats endpoint. It is held under the dividendYield dictionary key.

Here is how we could parse it out of raw_data:

raw_data[ticker]['stats']['dividendYield']

Adding this to our for loop gives us:

output_data = pd.DataFrame(columns=range(4))

for ticker in raw_data.columns:

    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']

    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']

    #Parse the company's dividend yield
    dividend_yield = raw_data[ticker]['stats']['dividendYield']


    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)

Let's print out our output_data object to see what the data looks like:

So far so good! The next two steps are to name the columns of the pandas DataFrame and to change its index.

How to Name the Columns of a Pandas DataFrame

We can update the column names of our output_data object by creating a list of column names and assigning it to the output_data.columns attribute, like this:

output_data.columns = ['Ticker', 'Company Name', 'Stock Price', 'Dividend Yield']

Let's print out our output_data object to see what the data looks like:

Much better! Let's change the index of output_data next.

How to Change the Index of a Pandas DataFrame

The index of a pandas DataFrame is a special column that is somewhat similar to the primary key of a SQL database table. In our output_data object, we want to set the Ticker column as the DataFrame's index.

Here's how we can do this using the set_index method:

output_data.set_index('Ticker', inplace=True)

Let's print out our output_data object to see what the data looks like:

Another incremental improvement!

Next, let's deal with the missing data in output_data.

How to Handle Missing Data in Pandas DataFrames

If you take a close look at output_data, you will notice that there are several None values in the Dividend Yield column:

These None values simply indicate that the company for that row does not currently pay a dividend. While None is one way of representing a non-dividend stock, it is more common to show a Dividend Yield of 0.

Fortunately, the fix for this is quite straightforward. The pandas library includes an excellent fillna method that allows us to replace missing values in a pandas DataFrame.

Here's how we can use the fillna method to replace our Dividend Yield column's None values with 0:

output_data['Dividend Yield'].fillna(0,inplace=True)
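To see exactly what fillna does here, consider a small standalone example (the yield values are made up):

```python
import pandas as pd

# A Series with a missing value, like a non-dividend stock
dividend_yield = pd.Series([0.01, None, 0.007])

# Replace the missing value with 0, modifying the Series in place
dividend_yield.fillna(0, inplace=True)
```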

The output_data object looks much cleaner now:

We are now ready to export our DataFrame to an Excel document! As a quick recap, here is our Python script to date:

import pandas as pd

IEX_API_Key = ''

#Specify the stock tickers that will be included in our spreadsheet
tickers = [
            'MSFT',
            'AAPL',
            'AMZN',
            'GOOG',
            'FB',
            'BRK.B',
            'JNJ',
            'WMT',
            'V',
            'PG'
            ]


#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''

#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
    ticker_string += ticker
    ticker_string += ','
    
#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]

#Create the endpoint strings
endpoints = 'price,stats'

#Interpolate the endpoint strings into the HTTP_request string
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token={IEX_API_Key}'

#Ping the API and save its response to a pandas DataFrame
raw_data = pd.read_json(HTTP_request)

#Create an empty pandas DataFrame to append our parsed values into during our for loop
output_data = pd.DataFrame(columns=range(4))

for ticker in raw_data.columns:
	
    #Parse the company's name
    company_name = raw_data[ticker]['stats']['companyName']
    
    #Parse the company's stock price
    stock_price = raw_data[ticker]['price']
    
    #Parse the company's dividend yield
    dividend_yield = raw_data[ticker]['stats']['dividendYield']
    
    
    new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
    output_data = output_data.append(new_column, ignore_index = True)

#Change the column names of output_data
output_data.columns = ['Ticker', 'Company Name', 'Stock Price', 'Dividend Yield']

#Change the index of output_data
output_data.set_index('Ticker', inplace=True)

#Replace the missing values of the 'Dividend Yield' column with 0
output_data['Dividend Yield'].fillna(0,inplace=True)

#Print the DataFrame
output_data

How to Export a Styled Excel Document From a Pandas DataFrame Using XlsxWriter

There are multiple ways to export an xlsx file from a pandas DataFrame.

The easiest way is to use the built-in function to_excel. As an example, here's how we could export output_data to an Excel file:

output_data.to_excel('my_excel_document.xlsx')

The problem with this approach is that the Excel file has no format whatsoever. The output looks like this:

The lack of formatting in this document makes it hard to interpret.

What is the solution?

We can use the Python package XlsxWriter to generate nicely-formatted Excel files. To start, we'll want to add the following import to the beginning of our Python script:

import xlsxwriter

Next, we need to create our actual Excel file. The XlsxWriter package actually has a dedicated documentation page for how to work with pandas DataFrames, which is available here.

Our first step is to call the pd.ExcelWriter function and pass in the desired name of our xlsx file as the first argument and engine='xlsxwriter' as the second argument. We will assign this to a variable called writer:

writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')

From there, we need to call the to_excel method on our pandas DataFrame. This time, instead of passing in the name of the file that we're trying to export, we'll pass in the writer object that we just created:

output_data.to_excel(writer, sheet_name='Sheet1')

Lastly, we will call the save method on our writer object, which saves the xlsx file to our current working directory. When all this is done, here is the section of our Python script that saves output_data to an Excel file.

writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')

output_data.to_excel(writer, sheet_name='Sheet1')

writer.save()

All of the formatting code that we will include in our xlsx file needs to be contained between the creation of the ExcelWriter object and the writer.save() statement.
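As an aside, pd.ExcelWriter also works as a context manager, which saves and closes the file automatically when the with block ends. This form is worth knowing because newer pandas versions (2.0+) removed writer.save() in favor of writer.close():

```python
import pandas as pd

# A one-row stand-in for output_data (values made up for illustration)
output_data = pd.DataFrame({'Stock Price': [200.0]}, index=['MSFT'])

# The file is saved and closed automatically when the with block exits
with pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter') as writer:
    output_data.to_excel(writer, sheet_name='Stock Market Data')
```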

How to Style an xlsx File Created with Python

It is actually harder than you might think to style an Excel file using Python.

This is partially because of some of the limitations of the XlsxWriter package. Its documentation states:

'XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.

If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas. However, some formatting options are available.'

In my experience, the most flexible way to style cells in an xlsx file created by XlsxWriter is to use conditional formatting that only applies styling when a cell is not equal to None.

This has three advantages:

  1. It provides more styling flexibility than the normal formatting options available in XlsxWriter.
  2. You do not need to manually loop through each data point and import them into the writer object one-by-one.
  3. It allows you to easily see when None values have made their way into your finalized xlsx files, since they'll be missing the required formatting.

To apply styling using conditional formatting, we first need to create a few style templates. Specifically, we will need four templates:

  • One header_template that will be applied to the column names at the top of the spreadsheet

  • One string_template that will be applied to the Ticker and Company Name columns

  • One dollar_template that will be applied to the Stock Price column

  • One percent_template that will be applied to the Dividend Yield column

Each of these format templates needs to be added to the writer object in dictionaries that resemble CSS syntax. Here's what I mean:

header_template = writer.book.add_format(
        {
            'font_color': '#ffffff',
            'bg_color': '#135485',
            'border': 1
        }
    )

string_template = writer.book.add_format(
        {
            'bg_color': '#DADADA',
            'border': 1
        }
    )

dollar_template = writer.book.add_format(
        {
            'num_format':'$0.00',
            'bg_color': '#DADADA',
            'border': 1
        }
    )

percent_template = writer.book.add_format(
        {
            'num_format':'0.0%',
            'bg_color': '#DADADA',
            'border': 1
        }
    )

To apply these formats to specific cells in our xlsx file, we need to call the package's conditional_format method on writer.sheets['Stock Market Data']. Here is an example:

writer.sheets['Stock Market Data'].conditional_format('A2:B11', 
                             {
                                'type':     'cell',
                                'criteria': '<>',
                                'value':    '"None"',
                                'format':   string_template
                                }
                            )

If we generalize this formatting to the other three formats we're applying, here's what the formatting section of our Python script becomes:

writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')

output_data.to_excel(writer, sheet_name='Stock Market Data')

header_template = writer.book.add_format(
        {
            'font_color': '#ffffff',
            'bg_color': '#135485',
            'border': 1
        }
    )

string_template = writer.book.add_format(
        {
            'bg_color': '#DADADA',
            'border': 1
        }
    )

dollar_template = writer.book.add_format(
        {
            'num_format':'$0.00',
            'bg_color': '#DADADA',
            'border': 1
        }
    )

percent_template = writer.book.add_format(
        {
            'num_format':'0.0%',
            'bg_color': '#DADADA',
            'border': 1
        }
    )


#Format the header of the spreadsheet
writer.sheets['Stock Market Data'].conditional_format('A1:D1', 
                             {
                                'type':     'cell',
                                'criteria': '<>',
                                'value':    '"None"',
                                'format':   header_template
                                }
                            )

#Format the 'Ticker' and 'Company Name' columns
writer.sheets['Stock Market Data'].conditional_format('A2:B11', 
                             {
                                'type':     'cell',
                                'criteria': '<>',
                                'value':    '"None"',
                                'format':   string_template
                                }
                            )

#Format the 'Stock Price' column
writer.sheets['Stock Market Data'].conditional_format('C2:C11', 
                             {
                                'type':     'cell',
                                'criteria': '<>',
                                'value':    '"None"',
                                'format':   dollar_template
                                }
                            )

#Format the 'Dividend Yield' column
writer.sheets['Stock Market Data'].conditional_format('D2:D11', 
                             {
                                'type':     'cell',
                                'criteria': '<>',
                                'value':    '"None"',
                                'format':   percent_template
                                }
                            )

writer.save()

Let's take a look at our Excel document to see how its looking:

So far so good! The last incremental improvement that we can make to this document is to make its columns a bit wider.

We can specify column widths by calling the set_column method on writer.sheets['Stock Market Data'].

Here's what we'll add to our Python script to do this:

#Specify all column widths
writer.sheets['Stock Market Data'].set_column('B:B', 32)
writer.sheets['Stock Market Data'].set_column('C:C', 18)
writer.sheets['Stock Market Data'].set_column('D:D', 20)
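As an aside, the widths above are hardcoded guesses. If you would rather derive a width from the data itself, one simple heuristic is the length of the longest value in the column plus a little padding (the sample names below are illustrative):

```python
# Width heuristic: longest string in the column plus two characters of padding
company_names = ['Microsoft Corporation', 'Apple, Inc.', 'Johnson & Johnson']
column_width = max(len(name) for name in company_names) + 2
```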

Here's the final version of the spreadsheet:

Voila! We are good to go! You can access the final version of this Python script on GitHub here. The file is named stock_market_data.py.

Step 3: Set Up an AWS EC2 Virtual Machine to Run Your Python Script

Your Python script is finalized and ready to run.

However, we do not want to simply run this on our local machine on an ad hoc basis.

Instead, we are going to set up a virtual machine using Amazon Web Services' Elastic Compute Cloud (EC2) service.

You'll need to create an AWS account first if you do not already have one. To do this, navigate to this URL and click the "Create an AWS Account" in the top-right corner:

AWS' web application will guide you through the steps to create an account.

Once your account is created,  you'll need to create an EC2 instance. This is simply a virtual server for running code on AWS infrastructure.

EC2 instances come in various operating systems and sizes, ranging from very small servers that qualify for AWS' free tier to very large servers capable of running complex applications.

We will use AWS' smallest server to run the Python script that we wrote in this article. To get started, navigate to EC2 within the AWS management console. Once you've arrived within EC2, click Launch Instance:

This will bring you to a screen that contains all of the available instance types within AWS EC2. Any machine that qualifies for AWS' free tier will be sufficient.

I chose the Amazon Linux 2 AMI (HVM):

Click Select to proceed.

On the next page, AWS will ask you to select the specifications for your machine. The fields you can select include:

  • Family
  • Type
  • vCPUs
  • Memory
  • Instance Storage (GB)
  • EBS-Optimized
  • Network Performance
  • IPv6 Support

For the purpose of this tutorial, we simply want to select the single machine that is free tier eligible. It is characterized by a small green label that looks like this:

Once you have selected a free tier eligible machine, click Review and Launch at the bottom of the screen to proceed. The next screen will present the details of your new instance for you to review. Quickly review the machine's specifications, then click Launch in the bottom right-hand corner.

Clicking the Launch button will trigger a popup that asks you to Select an existing key pair or create a new key pair. A key pair consists of a public key that AWS holds and a private key that you must download and store within a .pem file. You must have access to that .pem file in order to access your EC2 instance (typically via SSH). You also have the option to proceed without a key pair, but this is not recommended for security reasons.

Once you have selected or created a key pair for this EC2 instance, check the acknowledgment box for I acknowledge that I have access to the selected private key file (data-feeds.pem), and that without this file, I won't be able to log into my instance, then click Launch Instances to proceed.

Your instance will now begin to launch. It can take some time for these instances to boot up, but once it's ready, its Instance State will show as running in your EC2 dashboard.

Next, you will need to push your Python script into your EC2 instance. Here is a generic scp statement that allows you to move a file into an EC2 instance:

scp -i path/to/.pem_file path/to/file username@host_address.amazonaws.com:/path_to_copy

Run this statement with the necessary replacements to move stock_market_data.py into the EC2 instance.

Trying to run stock_market_data.py at this point will actually result in an error because the EC2 instance does not come with the necessary Python packages.

To fix this, you can either export a requirements.txt file and import the proper packages using pip, or you can simply run the following:

sudo yum install python3-pip
pip3 install pandas
pip3 install xlsxwriter
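
Alternatively, if you go the requirements.txt route mentioned above, that file is simply a list of the packages this script imports, one per line:

```text
pandas
xlsxwriter
```

You would then install everything at once on the instance with pip3 install -r requirements.txt.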

Once this is done, you can SSH into the EC2 instance and run the Python script from the command line with the following statement:

python3 stock_market_data.py

Step 4: Create an AWS S3 Bucket to Hold the Finished Python Script

With the work that we have completed so far, our Python script can be executed inside of our EC2 instance.

The problem with this is that the xlsx file will be saved to the AWS virtual server.

It is not accessible to anyone but us on that server, which limits its usefulness.

To fix this, we are going to create a public bucket on AWS S3 where we can save the xlsx file. Anyone who has the right URL will be able to download this file once this change is made.

To start, navigate to AWS S3 from within the AWS Management Console. Click Create bucket in the top right:

On the next screen, you will need to pick a name for your bucket and an AWS region for the bucket to be hosted in. The bucket name must be globally unique and cannot contain spaces or uppercase letters. The region does not matter much for the purpose of this tutorial, so I will be using the default region of US East (Ohio) (us-east-2).

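
You can check those naming rules up front before clicking through the console. Here is a simplified validation sketch in Python (it covers the rules mentioned above plus basic length limits, not AWS's full rule set, and the function name is our own):

```python
import re

def looks_like_valid_bucket_name(name):
    # Simplified check: 3-63 characters; lowercase letters, digits,
    # hyphens, and dots only; must start and end with a letter or digit.
    # This is not the complete AWS specification.
    return bool(re.fullmatch(r'[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]', name))

looks_like_valid_bucket_name('my-s3-bucket')  # True
looks_like_valid_bucket_name('My Bucket')     # False: uppercase and a space
```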

You will need to change the Public Access settings in the next section to match this configuration:

Click Create bucket to create your bucket and conclude this step of this tutorial!

Step 5: Modify Your Python Script to Push the xlsx File to AWS S3

Our AWS S3 bucket is now ready to hold our finalized xlsx document. We will now make a small change to our stock_market_data.py file to push the finalized document to our S3 bucket.

We will need to use the boto3 package to do this. boto3 is the AWS Software Development Kit (SDK) for Python, allowing Python developers to write software that connects to AWS services. To start, you'll need to install boto3 on your EC2 virtual machine. Run the following command line statement to do this:

pip3 install boto3

You will also need to import the library into stock_market_data.py by adding the following statement to the top of the Python script.

import boto3

We will need to add a few lines of code to the end of stock_market_data.py to push the final document to AWS S3.

s3 = boto3.resource('s3')
s3.meta.client.upload_file('stock_market_data.xlsx', 'my-s3-bucket', 'stock_market_data.xlsx', ExtraArgs={'ACL':'public-read'})

The first line of this code, s3 = boto3.resource('s3'), allows our Python script to connect to Amazon Web Services.

The second line of code calls a method from boto3 that actually uploads our file to S3. It takes four arguments:

  • stock_market_data.xlsx - the name of the file on our local machine.
  • my-s3-bucket - the name of the S3 bucket that we're uploading our file to.
  • stock_market_data.xlsx - the desired name of the file within the S3 bucket. In most cases, this will have the same value as the first argument passed into this method.
  • ExtraArgs={'ACL':'public-read'} - this is an optional argument that tells AWS to make the uploaded file publicly readable.
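
Because the file is uploaded with a public-read ACL, anyone can fetch it from a predictable virtual-hosted-style URL. A sketch of how that URL is formed (the bucket name and region here are examples from this tutorial, not fixed values):

```python
def public_s3_url(bucket, key, region='us-east-2'):
    # Virtual-hosted-style URL for a publicly readable S3 object
    return f'https://{bucket}.s3.{region}.amazonaws.com/{key}'

public_s3_url('my-s3-bucket', 'stock_market_data.xlsx')
# 'https://my-s3-bucket.s3.us-east-2.amazonaws.com/stock_market_data.xlsx'
```

This is the URL you would share with the spreadsheet's end users.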

Step 6: Schedule Your Python Script to Run Periodically Using Cron

So far, we have completed the following:

  • Built our Python script
  • Created an EC2 instance and deployed our code there
  • Created an S3 bucket where we can push the final xlsx document
  • Modified the original Python script to upload the finalized stock_market_data.xlsx file to an AWS S3 bucket

The only step that is left is to schedule the Python script to run periodically.

We can do this using a command-line utility called cron. To start, we will need to create a cron expression that tells the utility when to run the code. The crontab guru website is an excellent resource for this.

Here's how you can use crontab guru to get a cron expression that means every day at noon:

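As a quick sanity check on what each of the five fields in that expression means, you can map them out in Python (the field names follow the standard crontab layout):

```python
# The five space-separated fields of a cron expression, left to right
FIELDS = ['minute', 'hour', 'day of month', 'month', 'day of week']

expr = '00 12 * * *'  # every day at noon
schedule = dict(zip(FIELDS, expr.split()))
# {'minute': '00', 'hour': '12', 'day of month': '*', 'month': '*', 'day of week': '*'}
```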

Now we need to instruct our EC2 instance's cron daemon to run stock_market_data.py at this time each day.

To do this, we will first create a new file in our EC2 instance called stock_market_data.cron.

Open up this file and type in our cron expression followed by the statement that should be executed at the command line at that specified time.

Our command line statement is python3 stock_market_data.py, so here is what should be contained in stock_market_data.cron:

00 12 * * * python3 stock_market_data.py

If you run an ls command in your EC2 instance, you should now see two files:

stock_market_data.py	stock_market_data.cron

The last step of this tutorial is to load stock_market_data.cron into the crontab. You can think of the crontab as a file that contains commands and instructions for the cron daemon to execute. In other words, the crontab contains batches of cron jobs.

First, let's see what's in our crontab. It should be empty since we have not put anything in it! You can view the contents of your crontab with the following command:

crontab -l

To load stock_market_data.cron into the crontab, run the following statement on the command line:

crontab stock_market_data.cron

Now when you run crontab -l, you should see:

00 12 * * * python3 stock_market_data.py

Our stock_market_data.py script will now run at noon every day on our AWS EC2 virtual machine!

Final Thoughts

In this article, you learned how to create automatically-updating Excel spreadsheets of financial data using Python, IEX Cloud, and Amazon Web Services.

Here are the specific steps we covered in this tutorial:

  • How to create an account with IEX Cloud
  • How to write a Python script that generates beautiful Excel documents using pandas and XlsxWriter
  • How to launch an AWS EC2 instance and deploy code on it
  • How to create an AWS S3 bucket
  • How to push files to an AWS S3 bucket from within a Python script
  • How to schedule code to run using the cron software utility

This article was published by Nick McCullum, who teaches people how to code on his website.

Translated from: https://www.freecodecamp.org/news/auto-updating-excel-python-aws/
