eggNOG-mapper: local installation and troubleshooting

努力的猪猪包

Last modified: 2022-11-21 16:16:13

Categories: Bioinformatics Analysis, Data Analysis. Tags: linux

First published: 2022-11-21 16:10:41

Article link: https://blog.csdn.net/m0_55059521/article/details/127966005

Online server (accepts no more than 1,000 gene sequences per submission)

eggNOG-mapper (embl.de)
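
If a FASTA file exceeds that limit, one option is to split it into smaller files and submit them one at a time. The following is a minimal sketch, not part of eggNOG-mapper: the function name `split_fasta`, the default chunk size, and the output naming scheme are all assumptions made for illustration.

```shell
# Split a FASTA file into numbered chunks of at most MAX_SEQS sequences each.
# Usage: split_fasta INPUT.fasta [MAX_SEQS] [OUTPUT_PREFIX]
# (hypothetical helper; not shipped with eggNOG-mapper)
split_fasta() {
    local input="$1" max="${2:-1000}" prefix="${3:-chunk}"
    awk -v max="$max" -v prefix="$prefix" '
        /^>/ {
            # Open a new output file before sequence 1, max+1, 2*max+1, ...
            if (n % max == 0) {
                if (out != "") close(out)
                out = sprintf("%s_%03d.fasta", prefix, ++part)
            }
            n++
        }
        # Copy every line (header or sequence) to the current chunk.
        out != "" { print > out }
    ' "$input"
}
```

For example, `split_fasta genes.fasta 1000 genes_part` would write `genes_part_001.fasta`, `genes_part_002.fasta`, and so on, each small enough for one submission.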

Local installation (the GitHub wiki is the best reference)

GitHub: eggNOG mapper v2.1.5 to v2.1.8 · eggnogdb/eggnog-mapper Wiki (github.com)

Installation

1. Recommended: install with pip, which is a bit quicker

eggnog-mapper · PyPI

pip install eggnog-mapper

2. Alternatively, clone the repository and add emapper.py to the PATH of your base environment

# Download (clone) the repository
git clone https://github.com/eggnogdb/eggnog-mapper.git
# Put emapper.py and the bundled binaries on PATH (adjust /home/user to where you cloned)
export PATH=/home/user/eggnog-mapper:/home/user/eggnog-mapper/eggnogmapper/bin:"$PATH"
# Directory where the eggNOG databases are stored
export EGGNOG_DATA_DIR=/home/user/eggnog-mapper-data
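
After setting the two variables, it can help to confirm that the shell actually sees them before starting a long run. The sketch below is only a sanity check written for this post; `check_emapper_env` is a hypothetical helper name, and it relies on nothing beyond standard shell built-ins.

```shell
# Report whether emapper.py is reachable on PATH and the data directory exists.
# Returns 0 only when both checks pass. (hypothetical helper)
check_emapper_env() {
    local status=0
    if command -v emapper.py >/dev/null 2>&1; then
        echo "emapper.py: $(command -v emapper.py)"
    else
        echo "emapper.py: not found on PATH"
        status=1
    fi
    if [ -n "${EGGNOG_DATA_DIR:-}" ] && [ -d "$EGGNOG_DATA_DIR" ]; then
        echo "EGGNOG_DATA_DIR: $EGGNOG_DATA_DIR"
    else
        echo "EGGNOG_DATA_DIR: unset or not a directory"
        status=1
    fi
    return "$status"
}
```

Running `check_emapper_env` in a fresh terminal also shows whether the `export` lines were persisted (for example in `~/.bashrc`) or only applied to the previous session.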

Downloading the databases

1. Basic download: supports DIAMOND search only

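download_eggnog_data.py --data_dir eggnog-mapper-data/ -y -f
# --data_dir  directory where the databases are stored
# -y          answer "yes" to every prompt automatically
# -f          force the download (overwrites files of the same name downloaded earlier)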

2. To search with HMMER or MMseqs2, download the corresponding databases as well

The -P flag is required to download the PFAM database.
The -M flag is required to download the MMseqs2 database.
The -H and -d taxID flags are required to download a HMMER database for a given taxID.

# Download the MMseqs2 database
download_eggnog_data.py --data_dir eggnog-mapper-data/ -M -y -f
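
Downloads of this size sometimes stop partway, so a quick check of the data directory can save a failed run later. The sketch below is a hypothetical helper, `check_eggnog_db`. The default file names `eggnog.db` and `eggnog_proteins.dmnd` are an assumption about the layout used by the v2.1.x releases; pass your own list of file names if your data directory differs.

```shell
# Check that the expected database files exist and are non-empty.
# Usage: check_eggnog_db DATA_DIR [FILE ...]
# The default file names are an assumption about eggNOG-mapper v2.1.x.
check_eggnog_db() {
    local dir="$1" missing=0 f
    shift
    # Fall back to the assumed default files when none are given.
    [ "$#" -gt 0 ] || set -- eggnog.db eggnog_proteins.dmnd
    for f in "$@"; do
        if [ -s "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `check_eggnog_db eggnog-mapper-data/` returns a non-zero status when any listed file is absent or empty, which makes it easy to chain before `emapper.py` with `&&`.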

Common commands

1. Nucleotide sequences

Gene prediction with Prodigal followed by annotation; input type: genome

emapper.py --itype genome --genepred prodigal -i query.fasta -o /outdir/filename --cpu 60
# --cpu: number of threads; set it generously, because the run is quite slow
# Run search and annotation for a genome, using Diamond search on proteins predicted by Prodigal
# MMseqs2 can be used for the search instead
emapper.py -m mmseqs --itype genome --genepred prodigal -i query.fasta -o /outdir/filename --cpu 60

Possible error:

running prodigal: (Consider running with the -p meta option or finding more contigs from the same genome contig)

Cause: the input file is smaller than 100 kb. See: -p meta option is missing · Issue #7 · padlocbio/padloc (github.com)

Fix: merge several files so that the combined input is larger than 100 kb
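
The merge can be sketched as below. The helper names `count_bases` and `merge_fasta` are hypothetical, and the 100000-base threshold simply mirrors the 100 kb figure above. As the error message itself suggests, merging only makes sense for contigs that come from the same genome.

```shell
# Count sequence characters in a FASTA file (headers and whitespace excluded).
count_bases() {
    grep -v '^>' "$1" | tr -d ' \t\r\n' | wc -c | tr -d ' '
}

# Concatenate FASTA files into one and report whether the result is long enough.
# Usage: merge_fasta OUTPUT.fasta INPUT1.fasta [INPUT2.fasta ...]
# Returns 1 when the merged file is still below the assumed 100000-base threshold.
merge_fasta() {
    local out="$1" min_bases=100000 total
    shift
    # "awk 1" copies every line and adds a final newline if a file lacks one,
    # so the next header never gets glued to the previous sequence.
    awk 1 "$@" > "$out"
    total=$(count_bases "$out")
    echo "merged $# file(s): $total bases"
    if [ "$total" -lt "$min_bases" ]; then
        echo "warning: below $min_bases bases; Prodigal may still complain" >&2
        return 1
    fi
}
```

For example, `merge_fasta merged.fasta contig1.fasta contig2.fasta` writes `merged.fasta`, which can then be passed to `emapper.py` with `-i`.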

Annotation with an MMseqs2 search; input: nucleotide sequences

emapper.py -m mmseqs -i query.fasta -o outdir/filename --cpu 30 --dmnd_ignore_warnings --translate
# The input is a nucleotide FASTA file
# --translate              translate the nucleotide sequences to proteins before the search
# --dmnd_ignore_warnings   ignore warnings

2. Protein (amino acid) sequences

Annotation with a DIAMOND blastp search; input: protein sequences

emapper.py -i FASTA_FILE_PROTEINS -o test
# This basic example will run a diamond blastp search, and for those queries with hits to eggNOG proteins, will carry out functional annotation.
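
Once the run finishes, the annotations are written to a tab-separated file named after the `-o` prefix (here `test.emapper.annotations`), in which comment and header lines start with `#`. A rough way to see how many proteins were annotated is sketched below; `annotation_summary` is a hypothetical helper, and it assumes protein input, where each FASTA record is one query.

```shell
# Print how many input sequences received an annotation.
# Usage: annotation_summary INPUT.fasta OUTPUT_PREFIX
# (hypothetical helper; reads OUTPUT_PREFIX.emapper.annotations)
annotation_summary() {
    local fasta="$1" annot="$2.emapper.annotations" total annotated
    # Every FASTA header line marks one query sequence.
    total=$(awk '/^>/ { n++ } END { print n + 0 }' "$fasta")
    # Data rows are the non-empty lines that do not start with '#'.
    annotated=$(awk '!/^#/ && NF { n++ } END { print n + 0 }' "$annot")
    echo "$annotated of $total queries annotated"
}
```

For the command above, this would be called as `annotation_summary FASTA_FILE_PROTEINS test`.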