I'm parsing two files that contain data like the following:
File1:
UID A B C D
------ ---------- ---------- ---------- ----------
456 536 1 148 304
1071 908 1 128 243
1118 4 8 52 162
249 4 8 68 154
1072 296 416 68 114
118 180 528 68 67
File2:
UID X Y A Z B
------ ---------- ---------- ---------- ---------- ---------
456 536 1 148 304 234
1071 908 1 128 243 12
1118 4 8 52 162 123
249 4 8 68 154 987
1072 296 416 68 114 45
118 180 528 68 67 6
I will be comparing two such files; however, both the number of columns and the column names may vary. For every unique UID, I need to match the columns by name, compare the values, and find the differences.
Questions
1. Is there a way to access columns by column names instead of index?
2. Can I assign column names dynamically, based on the file data?
I'm able to load the files into lists and compare them using indexes, but that's not a proper solution.
Thanks in advance.
Solution
You might consider using csv.DictReader. It lets you address columns by name, and it adapts to whatever set of columns each file has, because the field names are read from the file's header row. Consider removing the ------ line that separates the header from the data; otherwise it will be read as a data row.
Example:
import csv

with open('File1', 'r', newline='') as f:
    # If you don't pass fieldnames,
    # they are taken from the first row.
    # The default delimiter is a comma; pass e.g. delimiter='\t'
    # if your columns are tab-separated.
    reader = csv.DictReader(f)
    for line in reader:
        # `line` is a dict {'UID': val, 'A': val, ...}
        print(line)
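To make the two questions concrete, here is a small self-contained sketch: `reader.fieldnames` holds the column names discovered from the header at run time, and each row is then addressed by those names rather than by index. It reads from an in-memory string via `io.StringIO` instead of a real file, and it assumes tab-separated columns; both the sample text and the tab delimiter are illustrative assumptions, not the question's actual format.

```python
import csv
import io

# Hypothetical tab-separated sample standing in for a real file.
sample = "UID\tA\tB\n456\t536\t1\n1071\t908\t1\n"

reader = csv.DictReader(io.StringIO(sample), delimiter='\t')
# fieldnames is read from the header row on first access.
print(reader.fieldnames)  # ['UID', 'A', 'B']

rows = list(reader)
# Columns are addressed by name, not by position.
print(rows[0]['A'])  # 536
```

Because the names come from each file's own header, the same code works unchanged on files with different columns.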
If your input has no single clear delimiter (columns are separated by runs of spaces), you can wrap the file in a generator that collapses consecutive whitespace into a single delimiter, e.g. a comma:
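import csv
import re

# Runs of one or more whitespace characters between columns.
whitespace = re.compile(r'\s+')
# A line made only of dashes and whitespace, e.g. "------ ----------".
separator = re.compile(r'[-\s]*')

def trim_whitespaces(f):
    for line in f:
        line = line.strip()  # drop leading/trailing spaces and the newline
        if separator.fullmatch(line):  # skips blank lines and ------ rows
            continue
        yield whitespace.sub(',', line)

with open('test.txt', 'r', newline='') as f:
    reader = csv.DictReader(trim_whitespaces(f))
    for line in reader:
        print(line)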
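The question also asks how to compare the two files per UID. A minimal sketch follows. It assumes whitespace-aligned input like the sample data and that each UID appears only once per file. It reuses the whitespace-collapsing idea, keys each file's rows by UID, keeps only the columns present in both headers, and reports every value that differs. The helper names `normalize`, `load_by_uid`, and `compare`, plus the trimmed sample strings, are hypothetical and used only for illustration.

```python
import csv
import io
import re

WHITESPACE = re.compile(r'\s+')
SEPARATOR = re.compile(r'[-\s]*')  # blank lines and "------" rows

def normalize(lines):
    """Turn whitespace-aligned lines into comma-separated ones."""
    for line in lines:
        line = line.strip()
        if SEPARATOR.fullmatch(line):
            continue
        yield WHITESPACE.sub(',', line)

def load_by_uid(lines, key='UID'):
    """Return (column names, {uid: row dict}) for one file."""
    reader = csv.DictReader(normalize(lines))
    rows = {row[key]: row for row in reader}
    return reader.fieldnames, rows

def compare(lines1, lines2, key='UID'):
    """Yield (uid, column, value1, value2) for each shared column that differs."""
    cols1, rows1 = load_by_uid(lines1, key)
    cols2, rows2 = load_by_uid(lines2, key)
    # Only columns present in both headers (other than the key) are compared.
    common = [c for c in cols1 if c in cols2 and c != key]
    # UIDs present in both files, sorted (as strings) for stable output.
    for uid in sorted(rows1.keys() & rows2.keys()):
        for col in common:
            v1, v2 = rows1[uid][col], rows2[uid][col]
            if v1 != v2:
                yield uid, col, v1, v2

# Trimmed copies of the sample data from the question.
FILE1 = """\
UID A B C D
------ ---------- ---------- ---------- ----------
456 536 1 148 304
1071 908 1 128 243
249 4 8 68 154
"""

FILE2 = """\
UID X Y A Z B
------ ---------- ---------- ---------- ---------- ---------
456 536 1 148 304 234
1071 908 1 128 243 12
"""

for diff in compare(io.StringIO(FILE1), io.StringIO(FILE2)):
    print(diff)
```

With real files you would pass the open file objects instead of `io.StringIO(...)`. Values are compared as strings, so `1` and `1.0` count as different; convert them (e.g. with `float()`) first if that matters. UIDs that appear in only one file are skipped; `rows1.keys() - rows2.keys()` would list them.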