This is a Python 2 version; adjusting the \s in the regular expression is all it takes to handle the | separator as well.
$ cat data.txt
A|1|22
A|1|44
A|2|33
B|1|22
B|3|44
B|4|55
C|1|11
C|4|22
$ python
Python 2.7.2+ (default, Jul 20 2012, 22:12:53)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
$ python reparser.py
A -- 1, count:2, summary:66
A -- 2, count:1, summary:33
B -- 1, count:1, summary:22
B -- 3, count:1, summary:44
B -- 4, count:1, summary:55
C -- 1, count:1, summary:11
C -- 4, count:1, summary:22

#!/usr/bin/env python
# coding: utf-8
import re
patt = re.compile(r"""
(?P<id>\S+)
[\s\|]+
(?P<category>\S+)
[\s\|]+
(?P<number>\d+)
""", re.I|re.U|re.X)
collects = {}
with open("data.txt", 'rt') as handle:
    for m in filter(None, map(patt.match, handle)):
        d = m.groupdict()
        collects.setdefault((d["id"], d["category"]), []).append(int(d["number"]))
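The listing stops before the step that prints the report. What follows is only a minimal sketch of how the grouped numbers could be turned into lines like the ones shown in the output above. The function name summarize, the inline sample list, and the choice to sort the keys are assumptions for illustration, not part of the original script. It uses print with a single parenthesized argument so that it behaves the same under Python 2 and Python 3.

```python
import re

# Same pattern as in the script; the group names are the keys it reads later.
patt = re.compile(r"""
    (?P<id>\S+)
    [\s\|]+
    (?P<category>\S+)
    [\s\|]+
    (?P<number>\d+)
""", re.I | re.U | re.X)


def summarize(lines):
    """Group numbers by (id, category) and format one report line per group."""
    collects = {}
    # Lines that do not match the pattern yield None and are filtered out.
    for m in filter(None, map(patt.match, lines)):
        d = m.groupdict()
        collects.setdefault((d["id"], d["category"]), []).append(int(d["number"]))
    # Sorting the keys is an assumption made to get a stable, readable order.
    return ["%s -- %s, count:%d, summary:%d" % (key[0], key[1], len(nums), sum(nums))
            for key, nums in sorted(collects.items())]


# Hypothetical sample mixing both separators (| and whitespace).
sample = ["A|1|22\n", "A|1|44\n", "A|2|33\n", "B 1 22\n"]
for line in summarize(sample):
    print(line)
```

To run it against the real file, passing the open file handle in place of the sample list should work, since the function only iterates over lines.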