I have a list of lists of lists.
The outermost list is the entire collection of members, each list within it is an individual member's record, and each list within a record is one line of the raw text file, split into its individual elements.
Each member's record has a Name line, indicated by the "NM1" label.
But not every member has an "End Date" field, indicated by the 'DTP' and '349' labels.
Likewise, not every member has a "Prior ID" field, indicated by the 'REF' and '0F' labels.
I want to go through each record and, if the field I need is there, extract the element I need and append it to a new list. If it isn't there, append None as a placeholder. I need every list to have the same number of values so that when I put them into a DataFrame as pandas Series, each Series has the same length.
I parsed the data into the format I want, like this simple example:
Groups = [[['NM1', 'IL', '1', 'SMITH', 'JOHN', 'PAUL', 'MR', 'JR', ''],
['REF', '1L', '690553677', ''],
['DTP', '348', 'D8', '20200601', ''],
['DTP', '349', 'D8', '20200630', '']],
[['NM1', 'IL', '1', 'IMA', 'MEAN', 'TURD', 'MR', 'SR', ''],
['REF', '1L', '690545645', ''],
['REF', '0F', '001938383',''],
['DTP', '348', 'D8', '20200601', '']]]
I tried using a for loop to go through each record and, if the combination of those special labels exists in the group, append just the element I want (the date or the ID number) to a new list.
When I use multiple if/else conditions, one per field, I only get None values.
current_id = []
prior_id = []
start_date = []
end_date = []
for group in Groups:
    if ((line[0] == 'REF') and (line[1] == 'IL')) in (line for line in group):
        current_id.append(line[2])
    else:
        current_id.append(None)
    if ((line[0] == 'REF') and (line[1] == '0F')) in (line for line in group):
        prior_id.append(line[2])
    else:
        prior_id.append(None)
    if ((line[0] == 'DTP') and (line[1] == '348')) in (line for line in group):
        start_date.append(line[2])
    else:
        start_date.append(None)
    if ((line[0] == 'DTP') and (line[1] == '349')) in (line for line in group):
        end_date.append(line[2])
    else:
        end_date.append(None)
print(current_id)
print(prior_id)
print(start_date)
print(end_date)
[None, None]
[None, None]
[None, None]
[None, None]
It should be:
['690553677','690545645']
[None, '001938383']
['20200601', '20200601']
['20200630', None]
I know my logic must be off, but what is the best way to do this?
Solution
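As a side note on why the loop in the question produces only None, here is a minimal sketch. It assumes a variable named line was already defined by earlier code (otherwise the loop would raise a NameError); the names group, result and match are illustrative only. The expression to the left of in is evaluated once to a plain True or False, and that boolean is then compared with each sub-list, so the membership test can never succeed.

```python
group = [['REF', '1L', '690553677', ''],
         ['DTP', '348', 'D8', '20200601', '']]

# Assumption: stand-in for a variable left over from earlier code.
line = group[0]

# The left-hand side collapses to a single bool (here True), and
# "True in <generator>" asks whether any sub-list equals True.
result = ((line[0] == 'REF') and (line[1] == '1L')) in (row for row in group)
print(result)

# Testing each row individually finds the matching line instead.
match = next((row for row in group if row[:2] == ['REF', '1L']), None)
print(match[2] if match is not None else None)
```

The first print shows False even though a matching REF line exists, which is why every branch in the question falls through to the else clause.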
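You can use a for loop with an else clause. The else branch of a for loop runs only if the loop finishes without hitting break, which makes it a convenient place to append None when no line matches. I defined a function called ids that retrieves the values: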
Groups = [[['NM1', 'IL', '1', 'SMITH', 'JOHN', 'PAUL', 'MR', 'JR', ''],
['REF', '1L', '690553677', ''],
['DTP', '348', 'D8', '20200601', ''],
['DTP', '349', 'D8', '20200630', '']],
[['NM1', 'IL', '1', 'IMA', 'MEAN', 'TURD', 'MR', 'SR', ''],
['REF', '1L', '690545645', ''],
['REF', '0F', '001938383',''],
['DTP', '348', 'D8', '20200601', '']]]
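def ids(a, b):
    l = []
    for group in Groups:
        for lst in group:
            # Match on the first two labels, e.g. ['REF', '0F']
            if lst[:2] == [a, b]:
                if lst[2] == 'D8':
                    # Date lines: the value follows the 'D8' element
                    l.append(lst[3])
                else:
                    l.append(lst[2])
                break
        else:
            # No break happened, so this group has no matching line
            l.append(None)
    return l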
current_id = ids('REF', '1L')
prior_id = ids('REF', '0F')
start_date = ids('DTP', '348')
end_date = ids('DTP', '349')
print(current_id)
print(prior_id)
print(start_date)
print(end_date)
Output:
['690553677', '690545645']
[None, '001938383']
['20200601', '20200601']
['20200630', None]
Note the check if lst[2] == 'D8':. I added it because the value is not always at index 2. In the DTP lines of the sample data, index 2 holds 'D8' and the date itself is at index 3.
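If it helps, the same idea can be sketched as a single pass that builds one dictionary per member, so every member ends up with the same keys and missing fields stay None. This is only an alternative sketch, not the approach above: the FIELDS mapping, the extract_records name and the column names are hypothetical, and it assumes that only DTP lines carry the extra element at index 2.

```python
# Hypothetical mapping from (label, qualifier) to a column name.
FIELDS = {
    ('REF', '1L'): 'current_id',
    ('REF', '0F'): 'prior_id',
    ('DTP', '348'): 'start_date',
    ('DTP', '349'): 'end_date',
}

def extract_records(groups):
    """Return one dict per member; fields that are absent stay None."""
    records = []
    for group in groups:
        # Every column starts as None, so all records share the same keys.
        record = dict.fromkeys(FIELDS.values())
        for line in group:
            column = FIELDS.get(tuple(line[:2]))
            # Keep only the first match per column, like the break above.
            if column is not None and record[column] is None:
                # Assumption: DTP lines hold the value at index 3.
                record[column] = line[3] if line[0] == 'DTP' else line[2]
        records.append(record)
    return records

groups = [[['NM1', 'IL', '1', 'SMITH', 'JOHN', 'PAUL', 'MR', 'JR', ''],
           ['REF', '1L', '690553677', ''],
           ['DTP', '348', 'D8', '20200601', ''],
           ['DTP', '349', 'D8', '20200630', '']],
          [['NM1', 'IL', '1', 'IMA', 'MEAN', 'TURD', 'MR', 'SR', ''],
           ['REF', '1L', '690545645', ''],
           ['REF', '0F', '001938383', ''],
           ['DTP', '348', 'D8', '20200601', '']]]

records = extract_records(groups)
for record in records:
    print(record)
```

A list of dictionaries like this can be passed directly to pandas.DataFrame, which produces one column per key, so the equal-length requirement is met without keeping four separate lists in sync.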