I'm trying to pull two of the same files into python in different dataframes, with the end goal of comparing what was added in the new file and removed from the old. So far, I've got code that looks like this:
In[1] path = r'\\Documents\FileList'
files = os.listdir(path)
In[2] files_txt = [f for f in files if f[-3:] == 'txt']
In[3] for f in files_txt:
data = pd.read_excel(path + r'\\' + f)
df = df.append(data)
I've also set a variable to equal the current date minus a certain number of days, which I want to use to pull the file that has a date equal to that variable:
d7 = dt.datetime.today() - timedelta(7)
As of now, I'm unsure of how to do this, as the first part of the filename always remains the same but they add numbers at the end (eg. file_03232016 then file_03302016). I want to parse through the directory for the beginning part of the filename and add it to a dataframe if it matches the date parameter I set.
EDIT: I forgot to add that sometimes I also need to look at the system date created timestamp, as the text date in the file name isn't always there.
解决方案
Here are some modifications to your original code to get a list of files containing your target date. You need to use strftime.
import os
from datetime import timedelta
d7 = dt.datetime.today() - timedelta(7)
target_date_str = d7.strftime('_%m%d%Y')
files_txt = [f for f in files if f[-13:] == target_date_str + '.txt']
>>> target_date_str + '.txt'
'_03232016.txt'
data = []
for f in files_txt:
data.append(pd.read_excel(os.path.join(path, f))
df = pd.concat(data, ignore_index=True)