I have a need to fetch list of holidays in a given range, i.e., if start date is 20/12/2016 & end date is 10/1/2017, then I should get 25/12/2017, 1/1/2017. I can do this using Pandas, but in my case, I have limitation that I need to AWS Glue service & Pandas are not supported in AWS Glue.
I am trying to use native python library holidays, but I couldn't see API document to fetch holidays between from & to date?
Here is what I have tried:
import holidays
import datetime
from datetime import date, timedelta
import dateutil
from dateutil.relativedelta import relativedelta
us_holidays = holidays.UnitedStates()
for date2,name in sorted(holidays.US(state='CA', years=2013).items()):
print (date2,name)
days_from_closest_holiday = [(abs(fdate - hdate)).days for hdate in holidays.date.tolist()]
print days_from_closest_holiday
Output of us_holidays:
(datetime.date(2013, 1, 1), "New Year's Day")
(datetime.date(2013, 1, 21), 'Martin Luther King, Jr. Day')
(datetime.date(2013, 2, 18), "Washington's Birthday")
(datetime.date(2013, 3, 31), 'C\xc3\xa9sar Ch\xc3\xa1vez Day')
(datetime.date(2013, 4, 1), 'C\xc3\xa9sar Ch\xc3\xa1vez Day (Observed)')
(datetime.date(2013, 5, 27), 'Memorial Day')
(datetime.date(2013, 7, 4), 'Independence Day')
(datetime.date(2013, 9, 2), 'Labor Day')
(datetime.date(2013, 10, 14), 'Columbus Day')
(datetime.date(2013, 11, 11), 'Veterans Day')
(datetime.date(2013, 11, 28), 'Thanksgiving')
(datetime.date(2013, 12, 25), 'Christmas Day')
I need to pass a date range, fromdate, todate to the us_holidays object, and I expect list of holidays with in the range, something like the following:
DatetimeIndex(['2013-12-25', '2014-01-01'], dtype='datetime64[ns]', freq=None)
[10, 17]
In pandas, I can get it using the following:
cal = USFederalHolidayCalendar()
holidays = pd.to_datetime(cal.holidays(start_date, end_date))
print holidays
As mentioned above, I need to use AWS Glue, and currently Glue doesn't support Pandas.
Appreciate any help please.
Thanks
解决方案
After a while of exploring, I have the got the solution myself, sharing here for reference as a solution:
%pyspark
import holidays
import datetime
from datetime import date, timedelta
import dateutil
from dateutil.relativedelta import relativedelta
us_holidays = holidays.UnitedStates()
custom_holidays = holidays.HolidayBase()
holidays_within_range=[]
fmt = '%Y-%m-%d'
holidays2013=[]
for date2,name in sorted(holidays.US(state='CA', years=2013).items()):
holidays2013.append(date2.strftime(fmt))
print holidays2013
fdate = date(2013, 1, 1)
s_date = fdate - dateutil.relativedelta.relativedelta(days=7)
e_date = fdate + relativedelta(months=1)
start_date = s_date.strftime(fmt)
end_date = e_date.strftime(fmt)
print "Range : "
print start_date, end_date
dd = [s_date + timedelta(days=x) for x in range((e_date-s_date).days + 1)]
for d in dd:
if(d in us_holidays):
custom_holidays.append(d)
holidays_within_range.append(d.strftime(fmt))
print holidays_within_range
days_from_closest_holiday = [(abs(fdate - datetime.datetime.strptime(hdate, fmt).date())).days for hdate in holidays_within_range]
print days_from_closest_holiday
The outputs for the above would be :
['2013-01-01', '2013-01-21', '2013-02-18', '2013-03-31', '2013-04-01', '2013-05-27', '2013-07-04', '2013-09-02', '2013-10-14', '2013-11-11', '2013-11-28', '2013-12-25']
Range :
2012-12-25 2013-02-01
['2012-12-25', '2013-01-01', '2013-01-21']
[7, 0, 20]
This doesn't require Pandas, and I hope this works with AWS Glue. Haven't updated the variable names, please change as necessary.
Thanks