I have a bunch of files whose names follow this pattern:
companyname-date_somenumber.txt
I have to sort the files according to company name, then according to date, and copy their content in this sorted order to another text file.
Here's the approach I'm trying:
From each file-name, extract company name and then date, put these two fields in a dictionary, append this dictionary to a list and then sort this list according to the two columns of companyname and then date.
Once I have the sorted order, I think I could look up the files in the folder in that order and copy each file's content into a txt file, which would give me my final txt file.
Here's the code I have so far:
from datetime import date
from os import listdir
from os.path import isfile, join

# path is the folder that holds the files
myfiles = [f for f in listdir(path) if isfile(join(path, f))]
file_list = []
for file1 in myfiles:
    # find indices of companyname and date in the file-name
    idx1 = file1.index('-', 0)
    idx2 = file1.index('_', idx1)
    company = file1[0:idx1]  # extract companyname
    thisdate = file1[idx1+1:idx2]  # extract date, which is in format MMDDYY
    entry = {}  # named entry so the built-in dict is not shadowed
    # extract month, day and year from thisdate
    m = thisdate[0:2]
    d = thisdate[2:4]
    y = '20' + thisdate[4:6]
    # convert into date object
    mydate = date(int(y), int(m), int(d))
    entry['date'] = mydate
    entry['company'] = company
    file_list.append(entry)
I checked the output of file_list at the end of this block of code and I think I have my list of dicts. Now, how do I sort by companyname and then by date? I looked up sorting by multiple keys online but how would I get the increasing order by date?
Is there any other way that I could sort a list by a string and then a date field?
Solution
import os
from datetime import datetime
MY_DIR = 'somedirectory'
# my_files = [ f for f in os.listdir(MY_DIR) if os.path.isfile(os.path.join(MY_DIR,f)) ]
my_files = [
    'ABC-031814_01.txt',
    'ABC-031214_02.txt',
    'DEF-010114_03.txt'
]
file_list = []
for file_name in my_files:
    # split 'company-MMDDYY_number.txt' into company and date string
    company, _, rhs = file_name.partition('-')
    datestr, _, rhs = rhs.partition('_')
    file_date = datetime.strptime(datestr, '%m%d%y')
    file_list.append(dict(file_date=file_date, file_name=file_name, company=company))
# sort by company first, then by date within each company
for row in sorted(file_list, key=lambda x: (x.get('company'), x.get('file_date'))):
    print(row)
The function sorted takes a keyword argument key that is a function applied to each item in the sequence you're sorting. If this function returns a tuple, the sequence will be sorted by the items in the tuple in turn.
Here lambda x: (x.get('company'),x.get('file_date')) makes sorted order the rows by company name first and, within each company, by date. Both are in increasing order, which is the default for sorted.
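The question also asks for the file contents to be copied into one text file in the sorted order. A minimal sketch of that last step could look like the following. The helper names sort_key and concatenate_sorted are hypothetical, and the sketch assumes every file in the folder follows the company-MMDDYY_number.txt pattern and that the output file lives outside the source folder.

```python
import os
from datetime import datetime


def sort_key(file_name):
    """Return (company, date) parsed from 'company-MMDDYY_number.txt'."""
    company, _, rest = file_name.partition('-')
    date_str, _, _ = rest.partition('_')
    return company, datetime.strptime(date_str, '%m%d%y')


def concatenate_sorted(src_dir, out_path):
    """Write the content of every file in src_dir to out_path, in sorted order.

    Assumes out_path is not inside src_dir and that all file names
    match the expected pattern (strptime raises ValueError otherwise).
    """
    names = [f for f in os.listdir(src_dir)
             if os.path.isfile(os.path.join(src_dir, f))]
    # tuples compare element by element: company first, then date
    ordered = sorted(names, key=sort_key)
    with open(out_path, 'w') as out:
        for name in ordered:
            with open(os.path.join(src_dir, name)) as src:
                out.write(src.read())
    return ordered
```

Sorting the file names directly with a key function avoids building the intermediate list of dicts, though the dict version works just as well if the parsed fields are needed later.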