**Edit:** I just realized that this does not produce the format you wanted. I'm leaving it up in case someone else finds it useful.
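The data you are looking at appears to be in a custom format built from key, value pairs. I'm not sure you want to use the csv module to read these files (although it is very useful for writing the output csv file).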
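The format appears to be as follows: each line starts with a timestamp, followed by alternating parameter names and values, all separated by commas.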
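To make the format concrete, here is a minimal sketch of how one such line could be split into its timestamp and its key, value pairs. The helper name `parse_line` is hypothetical and not part of the original script; the slicing mirrors what the script further down does.

```python
def parse_line(line):
    """Split 'time,key,value,key,value,...' into (timestamp, {key: value})."""
    # split on commas and drop stray whitespace around each entry
    params = [p.strip() for p in line.strip().split(",")]
    timestamp = params[0]
    # keys sit at the odd indices, each value at the even index right after it
    pairs = dict(zip(params[1::2], params[2::2]))
    return timestamp, pairs


timestamp, pairs = parse_line("00:00, RecordID,5,Age,73")
print(timestamp)  # 00:00
print(pairs)      # {'RecordID': '5', 'Age': '73'}
```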
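It may not be possible to tell a small data segment apart from a small data segment's parameters. It looks like `Time,Parameter,Value` was prepended to the file without a line break, which is why we see the odd `Value00:00` entry. I think you meant to add a newline after `Value`.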
I made a dummy file with some data:

    00:00, RecordID,5,Age,73
    00:42,PaCO2,3400:42,PaO2,34401:11
    01:11,SysABP,10501:11,Temp,35.201:11

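Here, the unique column names we expect in the output csv file are:

    Age, PaCO2, PaO2, RecordID, SysABP, Temp

plus a `timestamp` column for the first field of each line.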
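As a quick check of which column names the dummy file yields, the names can be collected into a set in a few lines. This is only a sketch: `dummy_lines` and `collect_columns` are hypothetical names introduced for illustration, and the lines are held in a list instead of being read from disk.

```python
dummy_lines = [
    "00:00, RecordID,5,Age,73",
    "00:42,PaCO2,3400:42,PaO2,34401:11",
    "01:11,SysABP,10501:11,Temp,35.201:11",
]


def collect_columns(lines):
    """Return the sorted, unique parameter names found on any line."""
    columns = set()
    for line in lines:
        params = [p.strip() for p in line.strip().split(",")]
        # index 0 is the timestamp, so the names start at index 1
        columns.update(params[1::2])
    return sorted(columns)


print(collect_columns(dummy_lines))
# ['Age', 'PaCO2', 'PaO2', 'RecordID', 'SysABP', 'Temp']
```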
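We need to loop over the file once to discover all of these. Once we have found them, we can create a `csv.DictWriter` with the appropriate columns. Then we loop over the input file a second time, putting everything we see on each line into a dict and writing that dict out as a row.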
I tested this script successfully on the dummy file I created above. Hopefully the comments in the script make it clear what is going on.

    import csv
    import os


    def txt_to_csv(input_filenames):
        for input_filename in input_filenames:
            column_names = set()
            # swap the input extension (e.g. .txt) for .csv
            output_filename = os.path.splitext(input_filename)[0] + '.csv'
            # open in text mode: the Python 3 csv module works with str, not bytes
            with open(input_filename, 'r', newline='') as in_txt:
                # first pass: figure out which column names are in the file on at least one line
                for line in in_txt:
                    # get a list of parameters that were split by comma in the input txt file,
                    # stripping each entry of any extra whitespace
                    params = [x.strip() for x in line.strip().split(',')]
                    # params[1::2] slices out every other entry starting with the first column name;
                    # starting at 1 always skips the first column with the timestamp.
                    # We "or" the entries into the set to keep our memory footprint small
                    # by storing only one copy of each unique column name.
                    column_names |= set(params[1::2])
                # add in the missing timestamp column to the column names
                column_names.add('timestamp')
                # sort column names so the output column order is predictable
                sorted_column_names = sorted(column_names)
                # bring the pointer back to the beginning of the file
                in_txt.seek(0, 0)
                # open a csv file and start writing the output
                with open(output_filename, 'w', newline='') as out_csv:
                    writer = csv.DictWriter(out_csv, sorted_column_names, dialect='excel')
                    # write column names
                    writer.writeheader()
                    # second pass: write one output row per input line
                    for line in in_txt:
                        # create a list of values for this line
                        params = [x.strip() for x in line.strip().split(',')]
                        # turn key value pairs into a dictionary
                        row_dict = dict(zip(params[1::2], params[2::2]))
                        # write timestamp entry to the dictionary
                        row_dict['timestamp'] = params[0]
                        # write row to file; columns missing from row_dict are left empty
                        writer.writerow(row_dict)


    if __name__ == '__main__':
        input_filenames = [r'C:\Users\cruse\Desktop\dummy_data.txt']
        txt_to_csv(input_filenames)

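The result I got was:

| Age | PaCO2 | PaO2 | RecordID | SysABP | Temp | timestamp |
|---|---|---|---|---|---|---|
| 73 | | | 5 | | | 0:00 |
| | 3400:42:00 | 34401:11 | | | | 0:42 |
| | | | | 10501:11 | 35.201:11 | 1:11 |
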
That is correct for this dataset. You can then use a tool like Pandas to propagate values through time (that is, assign the same RecordID to the later rows with `fillna`).
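A minimal sketch of that propagation step, under a few assumptions: the CSV text below is assumed to match what the script writes for the dummy file, it is read from an in-memory string instead of from disk, and the forward fill uses `Series.ffill`, pandas' forward-fill method, in place of `fillna` itself.

```python
import io

import pandas as pd

# assumed contents of the output csv for the dummy file
csv_text = """Age,PaCO2,PaO2,RecordID,SysABP,Temp,timestamp
73,,,5,,,00:00
,3400:42,34401:11,,,,00:42
,,,,10501:11,35.201:11,01:11
"""

# read everything as strings so values such as 00:42 are not reinterpreted
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# carry the last seen RecordID forward into the rows where it is missing
df["RecordID"] = df["RecordID"].ffill()

print(df[["timestamp", "RecordID"]])
```

After the forward fill, all three rows carry RecordID 5. Rows that appear before the first RecordID would stay empty, since a forward fill only looks backwards in the file.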
If you want it to process more files, just add more paths to the `input_filenames` list at the bottom.
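If the files all live in one folder, the list of paths could also be built with `glob` instead of being typed by hand. This is a sketch only: `find_txt_files` is a hypothetical helper, and the demo uses a throwaway temporary folder so that it runs anywhere.

```python
import glob
import os
import tempfile


def find_txt_files(folder):
    """Return the sorted paths of all .txt files directly inside folder."""
    return sorted(glob.glob(os.path.join(folder, "*.txt")))


# demo in a temporary folder holding two .txt files and one unrelated file
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.txt", "a.txt", "notes.md"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(p) for p in find_txt_files(folder)]

print(found)  # ['a.txt', 'b.txt']
```

The returned list could then be passed to `txt_to_csv` in place of the hard-coded `input_filenames`.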