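Given the text below, replace the first semicolon (;) that follows each square-bracketed group [] with a newline. The semicolons inside the brackets must be left as they are.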
[Souma, Kousaku;Kanda, Fumie;Masuko, Takayoshi] Tokyo Univ Agr, Fac Bioind, Abashiri, Hokkaido 0992493, Japan;[Wang, Peng] Jgfdsa Univ, Coll Ansfdm Sci & Vet Med, Chanfdsn 130023, Jfdsn, Peoples R China;[Igarashi, Hiroaki] Hokuren Federat Agr Cooperat Assoc, Obihiro Branch Off, Obihiro, Hokkaido, Japan
Approach: use a regular expression to locate those semicolons, then replace them with subn.
import re
a="[Souma, Kousaku;Kanda, Fumie;Masuko, Takayoshi] Tokyo Univ Agr, Fac Bioind, Abashiri, Hokkaido 0992493, Japan;[Wang, Peng] Jgfdsa Univ, Coll Ansfdm Sci & Vet Med, Chanfdsn 130023, Jfdsn, Peoples R China;[Igarashi, Hiroaki] Hokuren Federat Agr Cooperat Assoc, Obihiro Branch Off, Obihiro, Hokkaido, Japan"
# group 1: the bracketed names plus the affiliation text; group 2: the semicolon that separates records
pattern=re.compile(r"(\[.*?\].*?)(;)(?=\[)")
pattern.subn(r'\1\n',a)[0]
Output:
'[Souma, Kousaku;Kanda, Fumie;Masuko, Takayoshi] Tokyo Univ Agr, Fac Bioind, Abashiri, Hokkaido 0992493, Japan\n[Wang, Peng] Jgfdsa Univ, Coll Ansfdm Sci & Vet Med, Chanfdsn 130023, Jfdsn, Peoples R China\n[Igarashi, Hiroaki] Hokuren Federat Agr Cooperat Assoc, Obihiro Branch Off, Obihiro, Hokkaido, Japan'
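To see which semicolons the pattern actually picks out, the sketch below lists every semicolon in the text and compares it with the positions captured by group 2. It is a minimal illustration assuming Python 3; the names all_semicolons and matched are just for this example.

```python
import re

a = "[Souma, Kousaku;Kanda, Fumie;Masuko, Takayoshi] Tokyo Univ Agr, Fac Bioind, Abashiri, Hokkaido 0992493, Japan;[Wang, Peng] Jgfdsa Univ, Coll Ansfdm Sci & Vet Med, Chanfdsn 130023, Jfdsn, Peoples R China;[Igarashi, Hiroaki] Hokuren Federat Agr Cooperat Assoc, Obihiro Branch Off, Obihiro, Hokkaido, Japan"
pattern = re.compile(r"(\[.*?\].*?)(;)(?=\[)")

# Every semicolon in the text, including the ones inside the first bracket
all_semicolons = [i for i, ch in enumerate(a) if ch == ";"]

# Only the semicolons captured by group 2, i.e. the ones subn replaces
matched = [m.start(2) for m in pattern.finditer(a)]

print("all semicolons:    ", all_semicolons)
print("matched semicolons:", matched)
for pos in matched:
    # Show a few characters on each side of the matched semicolon
    print(repr(a[pos - 6:pos + 6]))
```

The two semicolons inside "[Souma, ...]" appear in the first list but not in the second, because \[.*?\] consumes the whole bracket before the pattern starts looking for a separator.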
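Notes:
1. The ? after each .* makes it non-greedy (lazy). Without it, .* is greedy and the pattern can only match the last semicolon that is followed by [.
2. subn replaces each match with the template r'\1\n'. \1 refers to the first group in the pattern (the bracketed names plus the affiliation text), so the second group, the semicolon, is dropped and \n takes its place. subn returns a tuple (new string, number of substitutions), which is why [0] is taken; re.sub would return the string directly.
3. (?=\[) is a lookahead: it requires the semicolon to be followed by [ without consuming the bracket, so the next match can start at that [.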
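Notes 1 and 2 can be checked side by side. The sketch below, assuming Python 3, runs the original lazy pattern and the same pattern with the ? removed, and unpacks the (string, count) tuple that subn returns. The variable names lazy and greedy are illustrative only.

```python
import re

a = "[Souma, Kousaku;Kanda, Fumie;Masuko, Takayoshi] Tokyo Univ Agr, Fac Bioind, Abashiri, Hokkaido 0992493, Japan;[Wang, Peng] Jgfdsa Univ, Coll Ansfdm Sci & Vet Med, Chanfdsn 130023, Jfdsn, Peoples R China;[Igarashi, Hiroaki] Hokuren Federat Agr Cooperat Assoc, Obihiro Branch Off, Obihiro, Hokkaido, Japan"

# Original pattern: non-greedy .*? stops at the first qualifying semicolon
lazy = re.compile(r"(\[.*?\].*?)(;)(?=\[)")
# Same pattern without the ?: greedy .* backtracks only as far as the last ";["
greedy = re.compile(r"(\[.*\].*)(;)(?=\[)")

# subn returns a tuple: (new string, number of substitutions)
lazy_text, lazy_count = lazy.subn(r"\1\n", a)
greedy_text, greedy_count = greedy.subn(r"\1\n", a)

# re.sub gives the same string without the count
assert lazy.sub(r"\1\n", a) == lazy_text

print(lazy_count, greedy_count)
print(greedy_text)
```

With the greedy pattern the first match swallows everything up to the last ";[", so only the semicolon before "[Igarashi" becomes a newline and the one before "[Wang" survives.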
Python version: 2.7.3 (the code also runs unchanged on Python 3).