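I have a situation where I want to split a long piece of text into sentences. I have code that splits the string the way I want, but it removes the delimiters (which I know it is supposed to do). Now I would like to keep those delimiters as part of the output strings (reassigned appropriately).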
My example:
import re
strings = ['UT Arlington 1st - Berthiaume reached on a fielding error by ss (0-0). O. Salinas fouled out to 1b (2-1 KBB). Q. Rohrbaugh flied out to cf (2-0 BB). B. Cox fouled out to lf (2-2 KBBKF)',
'Southeast Mo. State 1st - EZELL, T. lined out to 2b (2-2 FBBKFFF). HOLST, D. flied out to lf (0-2 FK). GAGAN, T. struck out swinging (1-2 BKKS).',
'UT Arlington 3rd - J. Minjarez hit by pitch (0-0); RJ Williams advanced to second. Berthiaume popped up to 1b (0-2 KF). O. Salinas flied out to cf to right center (2-1 KBB); RJ Williams advanced to third.']
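for s in strings:
    # Separate the inning header from the play-by-play text.
    header = re.split(r'[ ][-][ ]', s)
    print(header[0])
    # Split where ". " follows a lowercase letter or ")" and precedes a capital letter.
    # The capturing group makes re.split return each matched delimiter as its own element.
    text = re.split(r'([a-z][.][ ][A-Z]|[)][.][ ][A-Z])', header[-1])
    print(text)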
Current output:
^{pr2}$
Desired output:
UT Arlington 1st
['Berthiaume reached on a fielding error by ss (0-0)', 'O. Salinas fouled out to 1b (2-1 KBB)', 'Q. Rohrbaugh flied out to cf (2-0 BB)', 'B. Cox fouled out to lf (2-2 KBBKF)']
Southeast Mo. State 1st
['EZELL, T. lined out to 2b (2-2 FBBKFFF)', 'HOLST, D. flied out to lf (0-2 FK)', 'GAGAN, T. struck out swinging (1-2 BKKS).']
UT Arlington 3rd
['J. Minjarez hit by pitch (0-0); RJ Williams advanced to second', 'Berthiaume popped up to 1b (0-2 KF)', 'O. Salinas flied out to cf to right center (2-1 KBB); RJ Williams advanced to third.']
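One possible way to get the desired output is to split on the period itself and only look at the neighbouring characters with zero-width assertions, so the letter before the period and the capital after it stay in their sentences. The following is a minimal sketch, not a definitive solution: the names `SENTENCE_BREAK` and `split_plays` are hypothetical, and it assumes a sentence ends with ". " preceded by a lowercase letter or ")" and followed by a capital letter, as in the sample data.

```python
import re

# Lookbehind and lookahead inspect the surrounding characters without consuming them,
# so only the ". " itself is removed by the split.
SENTENCE_BREAK = re.compile(r'(?<=[a-z)])\.\s(?=[A-Z])')

def split_plays(line):
    """Hypothetical helper: return (header, list of plays) for one line."""
    # Everything before the first " - " is treated as the inning header.
    header, _, body = line.partition(' - ')
    return header, SENTENCE_BREAK.split(body)

sample = ('UT Arlington 3rd - J. Minjarez hit by pitch (0-0); '
          'RJ Williams advanced to second. Berthiaume popped up to 1b (0-2 KF).')
header, plays = split_plays(sample)
print(header)
print(plays)
```

Initials such as "O. Salinas" are left alone because the character before the period is uppercase. An abbreviation like "Mo. State" inside the play text would still be split, so the pattern may need adjusting for other data.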