The requirement is this:
There are a large number of text files, named by sequence number, whose contents look like the following:
姓名:aaaa
性别:bbb
年龄:ccc
籍贯:ddd
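A varying number of spaces sits between the colon and the value. How can these files be batch-imported into a database such as Access,
in a form like this: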
姓名 性别 年龄 籍贯
aaaa1 bbb1 ccc1 ddd1
aaaa2 bbb2 ccc2 ddd2
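When a field is missing, for example when one text file has no age, how do we make sure the columns of the table still line up correctly?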
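The missing-field case can be pictured with a minimal Python 3 sketch. It is only an illustration under stated assumptions: `parse_record` and `FIELDS` are hypothetical names, the field labels are taken from the sample above (the scripts further down match 名字 rather than 姓名), and each record is assumed to arrive as a list of lines. The idea is to look every column up by label and fall back to an empty string, so a missing field leaves an empty cell instead of shifting the later columns.

```python
# Hypothetical helper for illustration; not part of the original scripts.
FIELDS = ["姓名", "性别", "年龄", "籍贯"]  # column order, taken from the sample


def parse_record(lines, fields=FIELDS):
    """Return one row in fixed column order; missing fields become ''."""
    found = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, so this is not a "label:value" line
        key = key.strip()
        if key in fields:
            found[key] = value.strip()  # drop the padding after the colon
    return [found.get(name, "") for name in fields]


complete = ["姓名:aaaa1", "性别:   bbb1", "年龄: ccc1", "籍贯:ddd1"]
no_age = ["姓名:aaaa2", "性别:bbb2", "籍贯:     ddd2"]  # 年龄 is missing

rows = [parse_record(complete), parse_record(no_age)]
for row in rows:
    print(",".join(row))
```

The second row comes out with an empty third cell, so 籍贯 stays in the fourth column.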
Here is the data file used for testing: dhxy.info/upload/CS/1134036281_1.rar
The approach is to generate a CSV file and then import it directly into Access. Here is the Python implementation:
# -*- coding: cp936 -*-
# Convert txt data files into one csv file (Python 2)
import sys
import os.path

# Field labels, in output column order
KEYS = ["名字","性别","年龄","籍贯"]

def output(cur):
    # Build one csv line. A missing field still gets its comma,
    # so the remaining columns stay aligned.
    s = ""
    for key in KEYS:
        if key in cur:
            s += cur[key]
        s += ","
    s += "\n"
    return s

def convertFile(filename):
    try:
        f = open(filename, 'r')
        fout = open("result.csv", 'a')
        cur = {}
        s = f.readline()
        while len(s) > 0:
            # Split on the first colon only, so a value may contain colons
            lv = s.split(":", 1)
            if len(lv) != 2:
                s = f.readline()
                continue
            token = lv[0].lower()
            value = lv[1]
            # A repeated label or a complete record starts a new row
            if token in cur or len(cur) == len(KEYS):
                fout.write(output(cur))
                cur = {}
            if token in KEYS:
                cur[token] = value.lstrip().rstrip("\n")
            else:
                print "error line"
            s = f.readline()
        # Flush the last record of the file
        if len(cur) > 0:
            fout.write(output(cur))
        fout.close()
        f.close()
    except IOError:
        print "data file not found!"

# main program
if len(sys.argv) != 2:
    print "Usage:python convert.py directory"
    sys.exit()
dirname = sys.argv[1]
try:
    entryList = os.listdir(dirname)
    # remove the result file first and write the header
    fout = open("result.csv", 'w')
    fout.write("名字,性别,年龄,籍贯\n")
    fout.close()
    for entry in entryList:
        path = os.path.join(dirname, entry)
        # skip subdirectories, convert everything else
        if not os.path.isdir(path):
            convertFile(path)
except OSError:
    print "Dir not found!" + dirname
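One caveat of joining fields with commas by hand is that a value which itself contains a comma or a quote would shift the columns on import. The following Python 3 sketch shows how the standard `csv` module handles that. It is an illustration rather than the script used for the timings here: the sample rows are made up, and an in-memory `io.StringIO` buffer stands in for result.csv.

```python
import csv
import io

rows = [
    ["aaaa1", "bbb1", "ccc1", "ddd1"],
    ["aaaa2", "bbb2", "", "ddd,2"],  # empty age; made-up value with a comma
]

buf = io.StringIO()  # stands in for result.csv in this sketch
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["名字", "性别", "年龄", "籍贯"])  # header row
writer.writerows(rows)
text = buf.getvalue()
print(text)

# Reading the text back yields the same four columns for every row.
parsed = list(csv.reader(io.StringIO(text)))
```

With the default quoting mode, only the field that contains the delimiter is wrapped in quotes, and the empty age is written as an empty cell.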
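Testing convert.py showed that Python is reasonably efficient at this job: 6 data files of 5 MB each take about 30 s on a 2.8 GHz P4. Below is the C code written by my classmate (the same person as in the previous example :))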
#include <stdio.h>  /* FILE, fwrite */
/* three further #include lines follow here; their header names are lost */

#define OUT_FILE "result.csv"
#define RECORD_MAX_LEN 256

struct _record {
    char name[RECORD_MAX_LEN];
    char sex[RECORD_MAX_LEN];
    char age[RECORD_MAX_LEN];
    char birth[RECORD_MAX_LEN];
    int name_len;
    int sex_len;
    int age_len;
    int birth_len;
};

struct _record record;

static char *token[] = {"名字:", "性别:", "年龄:", "籍贯:"};

static inline int out_record(FILE *out)
{
    //fprintf(stdout, "%s,%s,%s,%s\n", record.name, record.sex, record.age, record.birth);
    fwrite(record.name, 1, record.name_len, out);
    fwrite(",", 1, 1, out);
    fwrite(record.sex, 1, record.sex_len, out);
    fwrite(",", 1, 1, out);
    fwrite(record.age, 1, record.age_len, out);
    fwrite(",", 1, 1, out);
    fwrite(record.birth, 1, record.birth_len, out);
    fwrite("\n", 1, 1, out);
    record.name[0] = '\0';
    /* ... the listing breaks off here ... */
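Test result: 11 times faster than Python....
So Python really is much slower than C here, and most of that overhead comes from its dynamic typing. In the previous example this barely showed,
so the two performed about the same.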