I have downloaded several years of data stored in files with the naming convention year_day.dat. For example, the file named 2014_1.dat has the data for January 1, 2014. I need to read these data files ordered by day: 2014_1.dat, 2014_2.dat, 2014_3.dat, and so on until the end of the year. In the folder they are listed in that order, BUT when I create a list of the files in the directory they are reordered: 2014_1.dat, 2014_10.dat, 2014_100.dat, 2014_101.dat...2014_199.dat, 2014_2.dat.
I think I need to use a sort function but how do I force it to sort the listed files by day so I can continue processing them?
Here's the code so far:
import sys, os, gzip, fileinput, collections

# Set the input/output directories
wrkDir = "C:/LJBTemp"
inDir = wrkDir + "/Input"
outDir = wrkDir + "/Output"

# here we go
inList = os.listdir(inDir)  # List all the files in the 'Input' directory
print inList  # print to screen reveals 2014_1.dat.gz followed by 2014_10.dat.gz NOT 2014_2.dat.gz HELP
d = {}
for fileName in inList:  # Step through each input file
    readFileName = inDir + "/" + fileName
    with gzip.open(readFileName, 'r') as f:  # call built in utility to unzip file for reading
        for line in f:
            city, long, lat, elev, temp = line.split()  # create dictionary
            d.setdefault(city, []).append(temp)  # populate dictionary with city and associated temp data from each input file
collections.OrderedDict(sorted(d.items(), key=lambda d: d[0]))  # QUESTION? why doesn't this work

# now collect and write to output file
outFileName = outDir + "/" + "1981_maxT.dat"  # create output file in output directory with .dat extension
with open(outFileName, 'w') as f:
    for city, values in d.items():
        f.write('{} {}\n'.format(city, ' '.join(values)))
print "All done!!"
raw_input("Press <enter>")  # this keeps the window open until you press "enter"
Solution
If you don't mind using third party libraries, you can use the natsort library, which was designed for exactly this situation.
import natsort
inList = natsort.natsorted(os.listdir(inDir))
This should take care of all the numerical sorting without having to worry about the details.
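To see what that ordering amounts to for these particular file names, here is a minimal standard-library sketch. It is not how natsort works internally; it only assumes every name starts with year_day as described in the question. The helper name day_key and the sample list are made up for illustration.

```python
import re


def day_key(file_name):
    """Return (year, day) as integers for a name such as '2014_10.dat.gz'.

    Assumption: every file name starts with '<year>_<day>'.
    """
    match = re.match(r"(\d+)_(\d+)", file_name)
    if match is None:
        raise ValueError("unexpected file name: {}".format(file_name))
    return int(match.group(1)), int(match.group(2))


# Made-up sample standing in for os.listdir(inDir)
names = ["2014_10.dat.gz", "2014_2.dat.gz", "2014_100.dat.gz", "2014_1.dat.gz"]

# Plain sorting compares the names as strings, character by character,
# which is why 2014_10 ends up ahead of 2014_2.
lexicographic = sorted(names)

# Sorting with the key compares the numbers instead.
by_day = sorted(names, key=day_key)
```

With this key, 2014_2.dat.gz comes before 2014_10.dat.gz, and files from different years stay grouped by year.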
You can also use the ns.PATH option to make the sorting algorithm path-aware:
from natsort import natsorted, ns
inList = natsorted(os.listdir(inDir), alg=ns.PATH)
Full disclosure, I am the natsort author.
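On the side question about the OrderedDict line: as written in the question, the expression builds a sorted copy but never assigns it to anything, so d is left unchanged and the output loop still iterates over the original dictionary. A minimal sketch of one way to keep the result follows; the city names and values are made-up sample data, and the name ordered is hypothetical.

```python
import collections

# Made-up sample standing in for the dictionary built from the input files
d = {"Oslo": ["3.1", "2.8"], "Cairo": ["30.2", "31.0"], "Lima": ["19.5", "20.1"]}

# sorted() returns a new list and OrderedDict() a new mapping; neither
# modifies d, so the result has to be bound to a name and used afterwards.
ordered = collections.OrderedDict(sorted(d.items(), key=lambda item: item[0]))

# Build the output lines from the sorted mapping rather than from d
lines = ["{} {}".format(city, " ".join(values)) for city, values in ordered.items()]
```

In the original script this would mean iterating over ordered.items() (or simply sorted(d.items())) in the loop that writes the output file.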