A Holiday Date with Pandas

While working on a project recently, I had to work with time series data spread over a year. I wanted to add a column indicating whether a specific date was a holiday, and to count the number of days since the previous holiday and the number of days to the next one. As someone relatively new to Python, I found it a bit of a challenge. However, after some research online, I found that it is rather simple and decided to share. Here are the goals of this article:

  1. Check whether a day is a holiday or not.
  2. Calculate days from the previous holiday.
  3. Calculate days to the next holiday.

The Setup

We will be working with a pandas data frame, so we start by importing pandas:

import pandas as pd

Now let’s create a data frame that contains the dates over a period of a year, say from the 1st of January 2019 to the 31st of December 2019. We will call our data frame “dates”. Pandas makes it very easy to generate a date range.

dates = pd.DataFrame({'date':pd.date_range('2019-01-01', '2019-12-31')})

However, we need to add the first day of 2020 to this range, because one of our goals is to calculate the days to the next holiday. This gets tricky in late December: after the 25th, the next holiday is the 1st of January, which would not be included in our date range. Therefore we add it to our range of dates.

dates = pd.DataFrame({'date':pd.date_range('2019-01-01', '2020-01-01')})

Let’s take a sneak peek at the data frame we created. It has a single “date” column with one row per day, from 2019-01-01 through 2020-01-01.
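A quick way to take that peek in a script or notebook is to print the first and last rows. This is only a minimal sketch; it assumes nothing beyond the “dates” frame built above.

```python
import pandas as pd

# Build the same frame as in the text: every day of 2019 plus 2020-01-01
dates = pd.DataFrame({'date': pd.date_range('2019-01-01', '2020-01-01')})

print(dates.head(3))  # first rows, starting at 2019-01-01
print(dates.tail(3))  # last rows, ending at 2020-01-01
print(len(dates))     # 365 days of 2019 plus the one extra day
```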

All The Holidays I Can Have

Pandas comes with a built-in module that contains the US federal holidays. Let’s import that:

from pandas.tseries.holiday import USFederalHolidayCalendar as calendar

We can pass a date range to this calendar and it will return all the holidays in that range.

cal = calendar()
holidays = cal.holidays(start=dates['date'].min(), end=dates['date'].max())

In the above code, the calendar needs a start date and an end date. Rather than typing the dates in manually, we read them from the data frame, which is helpful when your data file keeps changing, with data being added on an ongoing basis.

“holidays” is now a pandas DatetimeIndex containing all the dates in our range that are US federal holidays.
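To see which holiday each date corresponds to, one option is the return_name flag of the same holidays() method. The sketch below hard-codes the start and end dates for brevity, which is a simplification relative to the dynamic version above.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()

# return_name=True gives a Series indexed by date whose values are holiday names
named = cal.holidays(start='2019-01-01', end='2020-01-01', return_name=True)
print(named)

# The default call (as used in the text) returns only the dates
holidays = cal.holidays(start='2019-01-01', end='2020-01-01')
print(type(holidays).__name__)
```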

Holiday or No Holiday

Let’s say we want a column titled “holiday” in our data frame that contains True if the date is a US federal holiday and False if it is not.

dates['holiday'] = dates['date'].isin(holidays)

Let’s have a look at our data frame. It now has a boolean “holiday” column next to “date”.
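A simple sanity check on the new column is to list only the flagged rows and spot-check a couple of known dates. The sketch below repeats the setup so that it runs on its own; the helper name is_holiday is introduced here purely for illustration.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

dates = pd.DataFrame({'date': pd.date_range('2019-01-01', '2020-01-01')})
holidays = USFederalHolidayCalendar().holidays(
    start=dates['date'].min(), end=dates['date'].max()
)
dates['holiday'] = dates['date'].isin(holidays)

# Show only the rows flagged as holidays
print(dates[dates['holiday']])

# Spot-check two known dates (is_holiday is an illustrative helper, not from the article)
is_holiday = dates.set_index('date')['holiday']
print(is_holiday.loc[pd.Timestamp('2019-12-25')])  # Christmas Day
print(is_holiday.loc[pd.Timestamp('2019-12-26')])  # an ordinary Thursday
```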

Days From The Previous Holiday & To The Next One

We also wanted a column that displayed the number of days from the previous holiday as well as another one displaying days to the next one. We will write a function for each.

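def days_prev_holiday(date, holidays):
    # Signed day offsets from date to each holiday (negative = holiday already passed)
    difference = []
    for item in holidays:
        difference.append((item - date).days)
    # The closest past (or same-day) holiday has the largest non-positive offset
    return abs(max([x for x in difference if x <= 0]))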

Let’s dissect the code. We create a list of numbers, each representing the number of days between the current date and a certain holiday. For a holiday that has already passed, subtracting the current date from the holiday’s date gives a negative number. To determine the days since the closest past holiday, we return the maximum among all the numbers less than or equal to 0 (a date that is itself a holiday yields 0). Since we are counting days, we do not need the “-” sign, so we use abs(number) to get rid of it.

This function can be rewritten in fewer lines using a list comprehension. However, for readability, we will keep our for loop.
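For reference, a list-comprehension version might look like the sketch below. The function name days_prev_holiday_compact and the hand-written holiday list are hypothetical and used only for illustration; the logic is intended to match the loop version.

```python
import pandas as pd

def days_prev_holiday_compact(date, holidays):
    # Hypothetical name; same idea as days_prev_holiday, without the explicit loop.
    # Only holidays on or before `date` are considered, so every offset is <= 0.
    return abs(max([(item - date).days for item in holidays if item <= date]))

# Hand-written holiday list, for illustration only
sample_holidays = pd.to_datetime(['2019-11-28', '2019-12-25', '2020-01-01'])

# The day after Christmas is one day past the previous holiday
print(days_prev_holiday_compact(pd.Timestamp('2019-12-26'), sample_holidays))  # 1
```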

To calculate the number of days to the next holiday, we use pretty much the same logic. However, this time we return the minimum among all the numbers greater than or equal to 0. The function then becomes:

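def days_next_holiday(date, holidays):
    # Signed day offsets from date to each holiday (positive = holiday still ahead)
    difference = []
    for item in holidays:
        difference.append((item - date).days)
    # The closest upcoming (or same-day) holiday has the smallest non-negative offset
    return min([x for x in difference if x >= 0])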

Now that we have our functions written, we can apply them to all the rows in the corresponding column in our data frame.

dates['days_previous_holiday'] = dates.apply(lambda row: days_prev_holiday(row['date'], holidays), axis=1)
dates['days_next_holiday'] = dates.apply(lambda row: days_next_holiday(row['date'], holidays), axis=1)

Let’s have a final look at the data frame we created. It now has four columns: “date”, “holiday”, “days_previous_holiday” and “days_next_holiday”.
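Applying a Python function row by row is fine for 366 rows, but it may become slow on much larger frames. One possible alternative, which is not part of the approach described above, is to look up the neighbouring holidays with NumPy's searchsorted. The sketch assumes the holiday dates are sorted and that both the first and the last date in the frame are holidays, which holds for this particular range.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

dates = pd.DataFrame({'date': pd.date_range('2019-01-01', '2020-01-01')})
holidays = USFederalHolidayCalendar().holidays(
    start=dates['date'].min(), end=dates['date'].max()
)

day_values = dates['date'].to_numpy()
holiday_values = holidays.sort_values().to_numpy()

# Position of the first holiday on or after each date
next_idx = np.searchsorted(holiday_values, day_values, side='left')
# Position of the last holiday on or before each date
prev_idx = np.searchsorted(holiday_values, day_values, side='right') - 1

# Convert the timedelta arrays to whole days
dates['days_next_holiday'] = pd.Series(holiday_values[next_idx] - day_values).dt.days.to_numpy()
dates['days_previous_holiday'] = pd.Series(day_values - holiday_values[prev_idx]).dt.days.to_numpy()

print(dates[dates['date'] >= '2019-12-24'])
```

If the frame could start before the first holiday or end after the last one, the index arrays would need bounds handling, just as the loop-based functions would need to handle an empty list.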

Translated from: https://medium.com/python-in-plain-english/a-holiday-date-with-pandas-ef089f83a24
