I have a dataset containing many sentences. Taking the sentence below as an example, I want to split it into 2 sub-sentences:
Both whole plasma and the d < 1.006 g/ml density fraction of plasma
from 2/2 mice show this broad beta-migration pattern (Fig. 1 B)
|T:**1SP3E3| ; |I:**1SP3E3| |L:**1SP3E3| in contrast, 3/3 plasma shows
virtually no lipid staining at the beta-position. |T:**1SN3E3|
|I:**1SN3E3| |L:**1SN3E3|
I want to split it into:
Both whole plasma and the d < 1.006 g/ml density fraction of plasma
from 2/2 mice show this broad beta-migration pattern (Fig. 1 B)
and
in contrast, 3/3 plasma shows virtually no lipid staining at the
beta-position.
My code is:
import re

newData = []
for item in Data:
    # Split on each run of |...| markers
    test2 = re.split(r" (?:\|.*?\| ?)+", item[0])
    test2 = test2[:-1]  # drop the trailing empty string
    for tx in test2:
        newData.append(tx)
print(len(newData))
print(newData)
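However, I get 3 items in the result, one of which is a lone ";". I checked the original sentence and found that the ";" sits between |T:**1SP3E3| and |I:**1SP3E3|, so I need to remove that ";" from the result. I changed my code to: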
test2= re.split(r" (?:\|.*?\| ?;?)+", item[0])
But I still cannot get the correct result. Can anyone help? Thank you very much.
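One likely reason the second pattern still fails: `;?` consumes the semicolon, but the space that follows it (before `|I:`) is not allowed by the group, so the marker run breaks there and an empty string is left between the two matches. A minimal sketch of one possible fix is shown below. It assumes every marker has the form `|...|` with no `|` inside it, and that any `;` or whitespace between markers can be discarded. The names `MARKER_RUN`, `split_sentences`, and `sample` are made up for this illustration.

```python
import re

# A run of |...| markers, with any whitespace or ';' between or after them.
# Assumption: the sentence text itself never contains a '|' character.
MARKER_RUN = re.compile(r"\s*(?:\|[^|]*\|[\s;]*)+")

def split_sentences(text):
    """Split text on marker runs and drop empty pieces."""
    parts = MARKER_RUN.split(text)
    return [p.strip() for p in parts if p.strip()]

# The example sentence from the question, joined into a single string.
sample = (
    "Both whole plasma and the d < 1.006 g/ml density fraction of plasma "
    "from 2/2 mice show this broad beta-migration pattern (Fig. 1 B) "
    "|T:**1SP3E3| ; |I:**1SP3E3| |L:**1SP3E3| in contrast, 3/3 plasma shows "
    "virtually no lipid staining at the beta-position. |T:**1SN3E3| "
    "|I:**1SN3E3| |L:**1SN3E3|"
)

result = split_sentences(sample)
print(len(result))
print(result)
```

Filtering out empty pieces replaces the `test2[:-1]` slice, so the function should also behave sensibly for a string that does not end with a marker run. In the original loop, this would be called as `newData.extend(split_sentences(item[0]))`.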