I'm using Python 2.7, PyCharm and Anaconda.
I have a list of dates and I'd like to retrieve the last date of each month present in the list.
Are there any functions or libraries that could help me to do this?
I read the dates from a CSV file and stored them as datetime.
I have the following code:
import datetime
import xlrd

Dates = []
Dates1 = []
# Convert each Excel serial date into a datetime
for date in dates:
    temp = xlrd.xldate_as_tuple(int(date), 0)
    Dates1.append(datetime.datetime(temp[0], temp[1], temp[2]))
for date in Dates1:
    if not (dateendDate):
        Dates.append(date)
To make it clear, suppose I have:
Dates = [2015-01-20, 2015-01-15, 2015-01-17, 2015-02-21, 2015-02-06]
(Consider it being in datetime format.)
The list I'd like to retrieve is:
[2015-01-20, 2015-02-21]
So far I've searched around, especially on Stack Overflow, but I could only find answers on how to get the last calendar date of each month, not the last date per month from a user-specified list.
Solution
Pandas handles this task well. Load your CSV into a DataFrame, group the rows by month, and find the max date in each group using the aggregate function:
import pandas as pd

# Load the file, parsing the date column as datetime so it can be grouped by time
df = pd.read_csv('/path/to/file/', parse_dates=['my_date_field'])
df.index = df['my_date_field']  # set the dataframe index to your date column
# Group by month (alternatively use 'MS' for month start);
# pd.Grouper replaces pd.TimeGrouper, which newer pandas versions no longer provide
dfg = df.groupby(pd.Grouper(freq='M'))
# Finally, find the max date in each month
dfg.agg({'my_date_field': 'max'})
# To specifically coerce the results of the groupby to a list
# (months with no dates come back as NaT, so drop them first):
dfg.agg({'my_date_field': 'max'})['my_date_field'].dropna().tolist()
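If the dates are already in a Python list, as in the question, the same result can probably be had without pandas. The following is only a minimal sketch using the standard library: it keeps the largest date seen for each (year, month) pair. The function name `last_date_per_month` is made up for illustration, and the sample list is the one from the question.

```python
import datetime


def last_date_per_month(dates):
    """Return the latest date found in each (year, month), in chronological order."""
    latest = {}
    for d in dates:
        key = (d.year, d.month)
        # Keep only the largest date encountered so far for this month
        if key not in latest or d > latest[key]:
            latest[key] = d
    # Sorting the (year, month) keys yields chronological order
    return [latest[key] for key in sorted(latest)]


# Sample data taken from the question
dates = [
    datetime.datetime(2015, 1, 20),
    datetime.datetime(2015, 1, 15),
    datetime.datetime(2015, 1, 17),
    datetime.datetime(2015, 2, 21),
    datetime.datetime(2015, 2, 6),
]

result = last_date_per_month(dates)
# result holds 2015-01-20 and 2015-02-21
```

Keying on (year, month) rather than the month alone keeps January 2015 and January 2016 apart if the list spans several years.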