Python: Parsing structured text into CSV format

I want to convert plain structured text files to the CSV format using Python.

The input looks like this:

    [-------- 1 -------]
    Version: 2
    Stream: 5
    Account: A
    [...]
    [------- 2 --------]
    Version: 3
    Stream: 6
    Account: B
    [...]

The output is supposed to look like this:

    Version; Stream; Account; [...]
    2; 5; A; [...]
    3; 6; B; [...]

That is, the input consists of structured text records delimited by [--------] lines and containing key: value pairs, and the output should be CSV with one record per line.

I am able to retrieve the key: value pairs via

    colonseperated = re.compile(r' *(.+) *: *(.+) *')
    fixedfields = re.compile(r'(\d{3} \w{7}) +(.*)')

but I have trouble recognizing the beginning and end of each structured text record, and rewriting the records as CSV lines. Furthermore, I would like to be able to separate different types of records, i.e. distinguish between, say, Version: 2 and Version: 3 records.

Solution

Reading the records is not that hard:

    def read_records(iterable):
        record = {}
        for line in iterable:
            if line.startswith('[------'):
                # new record, yield previous
                if record:
                    yield record
                record = {}
                continue
            if ':' not in line:
                # skip blank lines and anything that is not a key: value pair
                continue
            key, value = line.strip().split(':', 1)
            record[key.strip()] = value.strip()
        # file done, yield last record
        if record:
            yield record

This produces dictionaries from your input file.
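As a quick check, the generator can be run against the sample input from the question. The sketch below is only illustrative: it repeats the function, including a guard for lines without a colon (an assumption, since real files may contain blank or unrelated lines), and feeds it an in-memory list of lines named `sample` instead of an open file.

```python
def read_records(iterable):
    record = {}
    for line in iterable:
        if line.startswith('[------'):
            # a delimiter line starts a new record; emit the previous one
            if record:
                yield record
            record = {}
            continue
        if ':' not in line:
            # assumption: lines without a colon carry no field and are skipped
            continue
        key, value = line.strip().split(':', 1)
        record[key.strip()] = value.strip()
    if record:
        yield record


# Hypothetical sample data, modelled on the input shown in the question
sample = """\
[-------- 1 -------]
Version: 2
Stream: 5
Account: A
[------- 2 --------]
Version: 3
Stream: 6
Account: B
""".splitlines()

records = list(read_records(sample))
print(records)
```

Any iterable of strings works here, so the same function accepts a file object, a list, or another generator.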

From this you can produce CSV output using the csv module, specifically the csv.DictWriter() class:

    import csv

    # List *all* possible keys, in the order the output file should list them
    headers = ('Version', 'Stream', 'Account', ...)

    # Python 3: open the output in text mode with newline='' (Python 2 used 'wb')
    with open(inputfile) as infile, open(outputfile, 'w', newline='') as outfile:
        records = read_records(infile)
        writer = csv.DictWriter(outfile, headers, delimiter=';')
        writer.writeheader()
        # and write
        writer.writerows(records)

Any header keys missing from a record will leave that column empty for that record. Any keys in a record that are not listed in headers will raise a ValueError; either add those keys to the headers tuple, or pass extrasaction='ignore' to the DictWriter() constructor.
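The behaviour for missing and extra keys can be tried without touching the file system. The following is a minimal sketch that writes to an `io.StringIO` buffer; the three records and the `Note` key are made up for illustration, and `lineterminator='\n'` is set only to keep the output easy to read.

```python
import csv
import io

headers = ('Version', 'Stream', 'Account')

# Hypothetical records: one complete, one missing a key, one with an extra key
records = [
    {'Version': '2', 'Stream': '5', 'Account': 'A'},
    {'Version': '3', 'Stream': '6'},
    {'Version': '3', 'Stream': '7', 'Account': 'C', 'Note': 'x'},
]

# Lenient writer: missing keys become restval, unknown keys are dropped
buffer = io.StringIO()
writer = csv.DictWriter(buffer, headers, delimiter=';',
                        restval='', extrasaction='ignore',
                        lineterminator='\n')
writer.writeheader()
writer.writerows(records)
output = buffer.getvalue()
print(output)

# Default writer (extrasaction='raise'): an unknown key raises ValueError
error = None
strict = csv.DictWriter(io.StringIO(), headers, delimiter=';')
try:
    strict.writerow(records[2])
except ValueError as exc:
    error = str(exc)
print(error)
```

Whether to ignore or reject unknown keys is a design choice: raising surfaces fields that were overlooked in the headers tuple, while ignoring keeps a long conversion running.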

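The question also asks how to keep different record types apart, for example Version: 2 and Version: 3 records. One possible approach, sketched here under the assumption that the `Version` field identifies the record type, is to group the dictionaries by that field and produce one CSV per group. The helper names `group_by_version` and `to_csv` are hypothetical, and the output goes to strings rather than files to keep the example self-contained.

```python
import csv
import io
from collections import defaultdict


def group_by_version(records):
    """Collect records into lists keyed by their 'Version' value."""
    groups = defaultdict(list)
    for record in records:
        # assumption: records without a Version go into an 'unknown' group
        groups[record.get('Version', 'unknown')].append(record)
    return groups


def to_csv(records):
    """Render records as semicolon-delimited CSV text."""
    # gather the headers of this group in first-seen order
    headers = []
    for record in records:
        for key in record:
            if key not in headers:
                headers.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, headers, delimiter=';',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records, as produced by a parser such as read_records()
records = [
    {'Version': '2', 'Stream': '5', 'Account': 'A'},
    {'Version': '3', 'Stream': '6', 'Account': 'B'},
    {'Version': '2', 'Stream': '8', 'Account': 'C'},
]

csv_by_version = {version: to_csv(group)
                  for version, group in group_by_version(records).items()}
for version, text in csv_by_version.items():
    print('Version', version)
    print(text)
```

Grouping needs all records in memory before writing; for very large inputs, keeping one open writer per version would avoid that at the cost of fixing the headers up front.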