NCBI sequence names: batch-converting them to species names

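Version 1: process the name file only; the sequences themselves are not touched.
For example, given the sequence names below, only the species name is needed: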

>MBZ0548397.1 glycosylating toxin TcdA [Clostridioides difficile]
>WP_238562585.1 YopT-type cysteine protease domain-containing protein, partial [Providencia rettgeri]
>HBG8546675.1 glycosylating toxin TcdA [Clostridioides difficile]HBG8645274.1 glycosylating toxin TcdA [Clostridioides difficile]
>WP_218726216.1 T3SS effector HopA1 family protein [Pseudomonas sp. D1HM]MBW0235923.1 hypothetical protein [Pseudomonas sp. D1HM]
>WP_021426113.1 glycosylating toxin TcdA [Clostridioides difficile]EQK09456.1 toxin B [Clostridioides difficile P59]
>WP_013184208.1 T3SS effector HopA1 family protein [Xenorhabdus nematophila]CBJ90088.1 Mcf2 [Xenorhabdus nematophila ATCC 19061]CEK22955.1 Mcf2 [Xenorhabdus nematophila AN6/1]
>MBH7539202.1 glycosylating toxin TcdB [Clostridioides difficile]MBZ1146783.1 glycosylating toxin TcdB [Clostridioides difficile]HBG0802012.1 glycosylating toxin TcdB [Clostridioides difficile]HBG1566612.1 glycosylating toxin TcdB [Clostridioides difficile]
>WP_107595625.1 glycosylating toxin TcdB [Clostridioides difficile]
>XP_046086339.1 TcdA/TcdB pore forming domain-containing protein [Lentinula edodes]KAH7875245.1 TcdA/TcdB pore forming domain-containing protein [Lentinula edodes]
>WP_038220508.1 T3SS effector HopA1 family protein [Xenorhabdus nematophila]CEF30096.1 Mcf2 [Xenorhabdus nematophila str. Websteri]AYA40386.1 hypothetical protein D3790_07925 [Xenorhabdus nematophila]KHD27510.1 hypothetical protein LH67_17565 [Xenorhabdus nematophila]MBA0019061.1 hypothetical protein [Xenorhabdus nematophila]MCB4424435.1 hypothetical protein [Xenorhabdus nematophila]
>WP_230086495.1 hypothetical protein [Providencia alcalifaciens]
>WP_095890640.1 glycosylating toxin TcdA [Clostridioides difficile]PBF18588.1 peptidase C80 [Clostridioides difficile]
>WP_095890783.1 glycosylating toxin TcdA, partial [Clostridioides difficile]PBF46244.1 peptidase C80, partial [Clostridioides difficile]
>WP_093403197.1 cytotoxin [Pseudomonas sp. NFPP19]SEP70755.1 TcdA/TcdB pore forming domain-containing protein [Pseudomonas sp. NFPP19]


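This is handled with a Python script and a regular expression.
Idea: use a regex to match the text between "[" and "]", which gives a list of the bracketed names on each line, then take the element at index -1 (the last one).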

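import re
import sys

# The header file is passed as the first command-line argument
with open(sys.argv[1], 'r') as handle:
    for line in handle:
        my = re.findall(r'\[([\s\S]+?)\]', line)  # every name enclosed in [ ]
        if my:  # skip lines without brackets; my[-1] would raise IndexError
            print(my[-1])  # last bracketed name on the line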


Usage: python script.py names.txt > new_name.txt
Result:

Clostridioides difficile
Providencia rettgeri
Clostridioides difficile
Pseudomonas sp. D1HM
Clostridioides difficile P59
Xenorhabdus nematophila AN6/1
Clostridioides difficile
Clostridioides difficile
Lentinula edodes
Xenorhabdus nematophila
Providencia alcalifaciens
Clostridioides difficile
Clostridioides difficile
Pseudomonas sp. NFPP19
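
Some headers concatenate several records, so the regex returns more than one name per line. The following is a minimal sketch, not part of the original script, that shows the full list for one header from the sample above and compares index 0 with index -1. The names `BRACKETED` and `species_names` are hypothetical helpers introduced only for illustration.

```python
import re

# Same pattern as the script above: non-greedy text between "[" and "]".
BRACKETED = re.compile(r'\[([\s\S]+?)\]')

def species_names(header):
    """Return every bracketed name in one header line, in order of appearance."""
    return BRACKETED.findall(header)

# One of the sample headers, split across string literals for readability.
header = (">WP_013184208.1 T3SS effector HopA1 family protein [Xenorhabdus nematophila]"
          "CBJ90088.1 Mcf2 [Xenorhabdus nematophila ATCC 19061]"
          "CEK22955.1 Mcf2 [Xenorhabdus nematophila AN6/1]")

names = species_names(header)
first_name = names[0]   # name attached to the first accession on the line
last_name = names[-1]   # what the script above prints
print(first_name)
print(last_name)
```

For this header the two indices differ: index 0 gives the plain species name, while index -1 gives the strain-level name of the last record. Which one is wanted depends on the downstream use.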

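Version 2: to be updated (more practical regular-expression usage, plus handling of the sequences themselves, to fully solve the sequence-name problem)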
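
Until that version appears, here is one possible sketch of applying the same regex to a FASTA file that also contains sequences. It is an assumption about how this could look, not the author's version 2. `rename_header` and `rename_fasta` are hypothetical names, and the residues in the demo record are placeholders, not a real sequence.

```python
import re

BRACKETED = re.compile(r'\[([\s\S]+?)\]')

def rename_header(line):
    """Replace a FASTA header with '>' plus its last bracketed name.

    Sequence lines, and headers with no [ ] pair, are returned unchanged.
    """
    if not line.startswith('>'):
        return line
    names = BRACKETED.findall(line)
    if not names:
        return line
    return '>' + names[-1] + '\n'

def rename_fasta(lines):
    """Yield the lines of a FASTA file with every header rewritten."""
    for line in lines:
        yield rename_header(line)

records = [
    ">WP_230086495.1 hypothetical protein [Providencia alcalifaciens]\n",
    "MKTAYIAKQR\n",  # placeholder residues, not the real sequence
]
renamed = list(rename_fasta(records))
print(''.join(renamed), end='')
```

Note that several records can share a species, so headers rewritten this way are not guaranteed to be unique; keeping the accession in front of the species name would be one way around that.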


Author: luanxins@163.con
Feedback is welcome!
