head virus_taxid
28883
28883
10662
28883
...
```
# 1. Get the lineage of each taxid
taxonkit lineage virus_taxid > virus_line
```
Contents of virus_line (taxid, then lineage):
28883 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales
10662 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Myoviridae
28883 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales
28883 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales
```
# 2. Drop the taxid column and keep the last name in the lineage
awk '{$1="";print $0}' virus_line | awk -F ";" '{print $NF}' > virus_name
# 3. Remove leading spaces and tabs
sed -i 's/^[ \t]*//g' virus_name
```
Resulting virus_name:
Caudovirales
Caudovirales
Caudovirales
Myoviridae
Caudovirales
Caudovirales
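As a sketch, steps 2 and 3 can also be folded into a single awk call. This assumes virus_line is tab-separated (taxid, then lineage), which is how taxonkit lineage writes it when virus_taxid has a single column. Splitting on the tab instead of on all whitespace means no leading space is ever produced, so the sed step is not needed. The file names demo_line.tsv and demo_name.txt and the sample rows are hypothetical, used only so the block runs without TaxonKit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample rows in "taxid<TAB>lineage" layout (stand-in for taxonkit output)
printf '%s\t%s\n' \
  28883 'Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales' \
  10662 'Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Myoviridae' \
  10239 'Viruses' \
  1173749 'Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;Guernseyvirinae;Cornellvirus;Salmonella virus SP31;Salmonella phage FSL SP-031' \
  > demo_line.tsv

# Split fields on the tab only, then split the lineage (column 2) on ";"
# and print the last piece. Spaces inside names are left untouched.
awk -F '\t' '{ n = split($2, ranks, ";"); print ranks[n] }' demo_line.tsv > demo_name.txt

# Expected lines: Caudovirales, Myoviridae, Viruses, Salmonella phage FSL SP-031
cat demo_name.txt
```

On the real files the equivalent would be awk -F '\t' '{ n = split($2, ranks, ";"); print ranks[n] }' virus_line > virus_name, again assuming the tab-separated layout.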
This also works for:
1. Entries with no lower ranks, just "Viruses":
10239 Viruses
10239 Viruses
becomes (with a small leading space):
Viruses
Viruses
Viruses
Viruses
2. Long names that contain spaces:
1173749 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;Guernseyvirinae;Cornellvirus;Salmonella virus SP31;Salmonella phage FSL SP-031
1173749 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;Guernseyvirinae;Cornellvirus;Salmonella virus SP31;Salmonella phage FSL SP-031
1173749 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;Guernseyvirinae;Cornellvirus;Salmonella virus SP31;Salmonella phage FSL SP-031
becomes:
Salmonella phage FSL SP-031
Salmonella phage FSL SP-031
Salmonella phage FSL SP-031
Salmonella phage FSL SP-031
Salmonella phage FSL SP-031
Salmonella phage FSL SP-031
This is exactly the desired format. The leading space left in case 1 is what step 3 removes:
sed -i 's/^[ \t]*//g' virus_name
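To see why step 3 is needed, the following sketch runs the original steps 2 and 3 on one row of each case. The sample rows and the file names demo_cases.tsv and demo_cases_name.txt are hypothetical. sed -i without a backup suffix assumes GNU sed (BSD/macOS sed expects -i ''). Each line is wrapped in brackets when printed so the leading space is visible.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical rows in "taxid<TAB>lineage" layout:
# case 1 (only "Viruses") and case 2 (long name containing spaces)
printf '%s\t%s\n' \
  10239 'Viruses' \
  1173749 'Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;Caudovirales;Siphoviridae;Guernseyvirinae;Cornellvirus;Salmonella virus SP31;Salmonella phage FSL SP-031' \
  > demo_cases.tsv

# Step 2: blanking $1 makes awk rebuild the line as " <lineage>" (OFS is a space).
# The second awk splits on ";" only, so spaces inside names survive.
# "Viruses" has no ";", so $NF is the whole line, leading space included.
awk '{$1="";print $0}' demo_cases.tsv | awk -F ";" '{print $NF}' > demo_cases_name.txt
sed 's/.*/[&]/' demo_cases_name.txt   # [ Viruses] and [Salmonella phage FSL SP-031]

# Step 3: strip leading spaces/tabs in place (GNU sed)
sed -i 's/^[ \t]*//g' demo_cases_name.txt
sed 's/.*/[&]/' demo_cases_name.txt   # [Viruses] and [Salmonella phage FSL SP-031]
```

One side effect of step 2 worth knowing: because assigning to $1 makes awk rebuild the whole record, any run of several spaces inside a name would be collapsed to a single space.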
How to get names from taxids