Python: Parsing structured text into CSV format

Latest recommended article published on 2024-01-08 09:02:55

weixin_39637661


Article link: https://blog.csdn.net/weixin_39637661/article/details/111453768


I want to convert plain structured text files to the CSV format using Python.

The input looks like this:

[-------- 1 -------]
Version: 2
Stream: 5
Account: A
[...]
[------- 2 --------]
Version: 3
Stream: 6
Account: B
[...]

The output is supposed to look like this:

Version; Stream; Account; [...]
2; 5; A; [...]
3; 6; B; [...]

That is, the input consists of structured text records delimited by [--------] lines and containing key: value pairs, and the output should be CSV with one record per line.

I am able to retrieve the key: value pairs into CSV format via

import re

# raw strings keep backslash escapes such as \d intact for the regex engine
colonseperated = re.compile(r' *(.+) *: *(.+) *')
fixedfields = re.compile(r'(\d{3} \w{7}) +(.*)')

but I have trouble recognizing the beginning and end of the structured text records, and with rewriting them as CSV line records. Furthermore, I would like to be able to separate different types of records, i.e. distinguish between, say, Version: 2 and Version: 3 records.

Solution

Reading the list is not that hard:

def read_records(iterable):
    record = {}
    for line in iterable:
        if line.startswith('[------'):
            # new record, yield previous
            if record:
                yield record
            record = {}
            continue
        # skip blank lines and anything else that is not a key: value pair
        if ':' not in line:
            continue
        key, value = line.strip().split(':', 1)
        record[key.strip()] = value.strip()
    # file done, yield last record
    if record:
        yield record

This produces dictionaries from your input file.
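As a quick check, the generator can be run against the sample input from the question. The sketch below repeats read_records so it runs on its own; the guard that skips lines without a colon and the io.StringIO stand-in for a real file are assumptions added for illustration.

```python
import io


def read_records(iterable):
    """Yield one dict per [------ n ------] delimited record."""
    record = {}
    for line in iterable:
        if line.startswith('[------'):
            # a delimiter line closes the previous record, if there was one
            if record:
                yield record
            record = {}
            continue
        # assumption: lines without a colon (blank lines, "[...]") are skipped
        if ':' not in line:
            continue
        key, value = line.strip().split(':', 1)
        record[key.strip()] = value.strip()
    # end of input: emit the last record
    if record:
        yield record


# io.StringIO stands in for an open text file here
sample = io.StringIO(
    "[-------- 1 -------]\n"
    "Version: 2\n"
    "Stream: 5\n"
    "Account: A\n"
    "[------- 2 --------]\n"
    "Version: 3\n"
    "Stream: 6\n"
    "Account: B\n"
)

records = list(read_records(sample))
for record in records:
    print(record)
```

All values come back as strings, so a record with Version: 2 has the value '2', not the integer 2.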

From this you can produce CSV output using the csv module, specifically the csv.DictWriter() class:

import csv

# List *all* possible keys, in the order the output file should list them
headers = ('Version', 'Stream', 'Account', ...)

# Python 3: open the output in text mode with newline='' (Python 2 used 'wb')
with open(inputfile) as infile, open(outputfile, 'w', newline='') as outfile:
    records = read_records(infile)
    writer = csv.DictWriter(outfile, headers, delimiter=';')
    writer.writeheader()
    # and write
    writer.writerows(records)

Any header keys missing from a record will leave that column empty for that record. Any keys in a record that are not listed in headers will raise a ValueError; either add those to the headers tuple, or set the extrasaction keyword of the DictWriter() constructor to 'ignore'.
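The question also asked about separating records by type, which the code above does not cover. One possible approach, sketched here and not part of the original answer, is to group the dictionaries by their Version value and write each group on its own. The sample records, the write_csv helper, and the 'Note' key are hypothetical; io.StringIO is used in place of real output files so that the handling of missing and extra keys is visible.

```python
import csv
import io
from collections import defaultdict

headers = ('Version', 'Stream', 'Account')

# hypothetical records, shaped like the dicts read_records() yields
records = [
    {'Version': '2', 'Stream': '5', 'Account': 'A'},
    {'Version': '3', 'Stream': '6'},  # 'Account' is missing
    {'Version': '3', 'Stream': '7', 'Account': 'C', 'Note': 'x'},  # extra key
]


def write_csv(rows, fieldnames):
    """Hypothetical helper: return the rows as ';'-delimited CSV text."""
    out = io.StringIO(newline='')
    # extrasaction='ignore' drops keys that are not in fieldnames
    writer = csv.DictWriter(out, fieldnames, delimiter=';',
                            extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# group the records by their Version value
by_version = defaultdict(list)
for record in records:
    by_version[record.get('Version')].append(record)

# one CSV text per version
outputs = {version: write_csv(rows, headers)
           for version, rows in by_version.items()}

for version, text in outputs.items():
    print('Version', version)
    print(text)
```

With real files, each group could be written to its own output path instead. The missing 'Account' value ends up as an empty last column, and without extrasaction='ignore' the 'Note' key would raise a ValueError.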
