Python: parsing multiple .txt files into a single .csv file?

I'm not very experienced with complicated, large-scale parsing in Python. Do you have any tips or guides on how to easily parse multiple text files with different formats, combine them into a single .csv file, and ultimately load them into a database?

An example of the text files is as follows:

general.txt (Name -- Department (DEPT) Room # [Age])

John Doe -- Management (MANG) 205 [Age: 40]

Equipment: Laptop, Desktop, Printer, Stapler

Experience: Python, Java, HTML

Description: Hardworking, awesome

Mary Smith -- Public Relations (PR) 605 [Age: 24]

Equipment: Mac, PC

Experience: Social Skills

Description: fun to be around

Scott Lee -- Programmer (PG) 403 [Age: 25]

Equipment: Personal Computer

Experience: HTML, CSS, JS

Description: super-hacker

Susan Kim -- Programmer (PG) 504 [Age: 21]

Equipment: Desktop

Experience: Social Skills

Descriptions: fun to be around

Bob Simon -- Programmer (PG) 101 [Age: 29]

Equipment: Pure Brain Power

Experience: C++, C, Java

Description: never comes out of his room

cars.txt (a list of people who own cars by their department/room #)

Programmer: PG 403, PG 101

Management: MANG 205

house.txt

Programmer: PG 504

The final csv should preferably tabulate to something like:

Name | Division | Division Abbreviation | Equipment | Room | Age | Car? | House? |

Scott Lee | Programming | PG | PC | 403 | 25 | YES | NO |

Mary Smith | Public Rel. | PR | Mac, PC | 605 | 24 | NO | NO |

The ultimate goal is to have a database where searching "PR" would return every row where a person's department is "PR", etc. There are maybe 30 text files in total, each representing one or more columns in the database. Some columns are short paragraphs, which include commas. There are around 10,000 rows in total. I know Python has a built-in csv module, but I'm not sure where to start or how to end up with just one CSV. Any help?

Solution

It looks like you're looking for someone who will solve the whole problem for you. Here I am :)

The general idea is to parse the general info into a dict (using regular expressions), then append the additional fields to it, and finally write everything to CSV. Here's a Python 3.x solution (I think Python 2.7+ should suffice):

import csv
import re
from pprint import pprint


def read_general(fname):
    # Read general info into a dict with 'PR 123'-like keys
    # Regexp that splits a name line into a ready-to-use dict;
    # the group names become the dict keys (and later the CSV columns)
    re_name = re.compile(r'''
        (?P<Name>.+)
        \ --\                           # Separator + space
        (?P<Division>.+)
        \                               # Space
        \(
        (?P<Division_Abbreviation>.*)
        \)
        \                               # Space
        (?P<Id>\d+)
        \                               # Space
        \[Age:\                         # Space at the end
        (?P<Age>\d+)
        \]
        ''', re.X)
    general = {}
    with open(fname, 'rt') as f:
        for line in f:
            line = line.strip()
            m = re_name.match(line)
            if m:
                # Name line, start a new person
                man = m.groupdict()
                key = '%s %s' % (m.group('Division_Abbreviation'), m.group('Id'))
                general[key] = man
            elif line:
                # Non-empty detail line: add its 'Key: value' pair to the current person
                key, value = line.split(': ', 1)
                man[key] = value
    return general


def add_bool_criteria(fname, field, general):
    # Append a field with a YES/NO value
    with open(fname, 'rt') as f:
        yes_keys = set()
        # Phase one, gather all keys
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, which would break the split below
            _, keys = line.split(': ', 1)
            yes_keys.update(keys.split(', '))
    # Fill data
    for key, man in general.items():  # iteritems() will be faster in Python 2.x
        man[field] = 'YES' if key in yes_keys else 'NO'


def save_csv(fname, general):
    # newline='' is what the csv module expects in Python 3 (use mode 'wb' in Python 2)
    with open(fname, 'w', newline='') as f:
        # Gather field names
        all_fields = set()
        for value in general.values():
            all_fields.update(value.keys())
        # Write to csv, with columns in sorted order so the output is reproducible
        w = csv.DictWriter(f, sorted(all_fields))
        w.writeheader()
        w.writerows(general.values())


def main():
    general = read_general('general.txt')
    add_bool_criteria('cars.txt', 'Car?', general)
    add_bool_criteria('house.txt', 'House?', general)
    pprint(general)
    save_csv('result.csv', general)


if __name__ == '__main__':
    main()
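The name-line pattern can also be checked in isolation. The sketch below uses a compact, single-line equivalent of the verbose regex and applies it to one sample row from general.txt. The group names Name, Division and Age are assumptions chosen to match the requested CSV header (Division_Abbreviation and Id are the names the script looks up with m.group), and parse_name_line is a hypothetical helper, not part of the script above.

```python
import re

# Compact equivalent of the verbose name-line pattern.
# Group names Name, Division and Age are assumed; see the note above.
NAME_LINE = re.compile(
    r'(?P<Name>.+) -- (?P<Division>.+) \((?P<Division_Abbreviation>.*)\) '
    r'(?P<Id>\d+) \[Age: (?P<Age>\d+)\]'
)


def parse_name_line(line):
    """Return the named groups as a dict, or None if this is not a name line."""
    m = NAME_LINE.match(line.strip())
    return m.groupdict() if m else None


sample = parse_name_line('Mary Smith -- Public Relations (PR) 605 [Age: 24]')
print(sample)
```

Detail lines such as "Equipment: Mac, PC" contain no " -- " separator, so they return None and fall through to the key/value branch.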

I wish you a lot of $$$ for this ;)
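For the stated end goal (a database where searching "PR" returns every matching row), one option is the standard-library sqlite3 module. This is only a minimal sketch, assuming a CSV with a header row like the one written by save_csv. The function names load_csv_to_sqlite and find_by_abbreviation and the table name people are hypothetical, and every column is stored as TEXT. Paragraph fields that contain commas need no special handling, because the csv module quotes them on writing and unquotes them on reading.

```python
import csv
import sqlite3


def load_csv_to_sqlite(csv_path, conn, table='people'):
    """Create `table` from the CSV header and insert every row as text."""
    with open(csv_path, newline='') as f:
        reader = csv.DictReader(f)
        fields = reader.fieldnames
        # Quote identifiers so column names such as "Car?" are accepted
        cols = ', '.join('"%s" TEXT' % name.replace('"', '""') for name in fields)
        conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
        placeholders = ', '.join('?' for _ in fields)
        conn.executemany(
            'INSERT INTO "%s" VALUES (%s)' % (table, placeholders),
            ([row[name] for name in fields] for row in reader),
        )
    conn.commit()


def find_by_abbreviation(conn, abbreviation, table='people'):
    """Return the names of everyone whose Division_Abbreviation matches."""
    cur = conn.execute(
        'SELECT "Name" FROM "%s" WHERE "Division_Abbreviation" = ? '
        'ORDER BY "Name"' % table,
        (abbreviation,),
    )
    return [name for (name,) in cur]
```

With result.csv in place, something like conn = sqlite3.connect('people.db'), then load_csv_to_sqlite('result.csv', conn) and find_by_abbreviation(conn, 'PR') would list everyone in the PR department. The file name people.db is likewise just an example.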

Side note

CSV is history; you could use JSON for storage and further processing, because it's simpler to work with, more flexible, and human-readable.
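If JSON is preferred, the nested dict built by the script can be dumped as-is with the standard-library json module. A minimal sketch follows; the helper names save_json and load_json are illustrative and not part of the answer's script.

```python
import json


def save_json(fname, general):
    """Write the {'PR 605': {...}, ...} dict to a JSON file."""
    with open(fname, 'w') as f:
        # indent and sort_keys keep the file readable and diff-friendly
        json.dump(general, f, indent=2, sort_keys=True)


def load_json(fname):
    """Read the dict back from the JSON file."""
    with open(fname) as f:
        return json.load(f)
```

Unlike the CSV route, this needs no pass over the records to collect field names, since each person keeps only the keys they actually have.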
