Installing and Using Prokka

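Prokka is a tool for rapid prokaryotic genome annotation that can be installed via conda or Docker. This article walks through the installation steps, including the conda command and the Docker image, lists the commonly used parameters and output file formats, and gives a command for batch-annotating FASTA files. It then shows how to collect and organize the result files once annotation has finished.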

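Project repository: https://github.com/tseemann/prokka

1. Introduction to Prokka

Prokka is a command-line tool for rapid annotation of prokaryotic genomes.

2. Installing Prokka

In my experience, two installation methods work well.

1. Install with conda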

conda install -c conda-forge -c bioconda -c defaults prokka
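As a rough sketch of how the conda route might look in practice, the snippet below checks that the `prokka` executable is reachable after installation. The helper name `check_prokka` and the environment name `prokka_env` in the comments are assumptions made for illustration; they do not come from Prokka itself.

```shell
#!/usr/bin/env sh
# Report whether the prokka executable is on PATH; print its version if so.
# check_prokka is a hypothetical helper, not part of Prokka.
check_prokka() {
    if command -v prokka >/dev/null 2>&1; then
        prokka --version
    else
        echo "prokka not found on PATH" >&2
        return 1
    fi
}

# Possible manual usage (the environment name prokka_env is an assumption):
#   conda create -n prokka_env -c conda-forge -c bioconda -c defaults prokka
#   conda activate prokka_env
#   check_prokka
```

Installing into a dedicated environment rather than the base environment tends to reduce dependency conflicts, since Prokka pulls in quite a few external tools.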

2. Install with Docker

docker pull staphb/prokka:latest
docker run staphb/prokka:latest prokka -h

Docker must already be installed on your Linux system for this to work. In my experience it is easier to use than conda.
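The `docker run` line above only prints the help text. To annotate files that live on the host, the container needs access to them, usually through a volume mount. The following is a minimal sketch under that assumption: the wrapper name `run_prokka_docker`, the `DRY_RUN` variable, and the choice of `/data` as the mount point are illustrative, not something the image requires.

```shell
#!/usr/bin/env sh
# Run Prokka from the staphb/prokka image on files in a host directory.
# Arguments: <host_dir> <prokka arguments...>
# The host directory is mounted at /data and used as the working directory,
# so relative paths in the Prokka arguments resolve inside host_dir.
# Set DRY_RUN to any non-empty value to print the command instead of running it.
run_prokka_docker() {
    host_dir="$1"
    shift
    # Prepend the docker invocation to the remaining Prokka arguments.
    set -- docker run --rm -v "${host_dir}:/data" -w /data \
        staphb/prokka:latest prokka "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Possible usage:
#   run_prokka_docker "$PWD" --outdir mutant --prefix K12_mut contigs.fa
```

With this layout the output folder is written back into the mounted host directory, so the results remain after the container exits.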

3. Using the software

prokka --proteins MG1655.gbk --outdir mutant --prefix K12_mut contigs.fa

Parameters

General:
  --help              This help
  --version           Print version and exit
  --citation          Print citation for referencing Prokka
  --quiet             No screen output (default OFF)
  --debug             Debug mode: keep all temporary files (default OFF)
Setup:
  --listdb            List all configured databases
  --setupdb           Index all installed databases
  --cleandb           Remove all database indices
  --depends           List all software dependencies
Outputs:
  --outdir [X]        Output folder [auto] (default '')
  --force             Force overwriting existing output folder (default OFF)
  --prefix [X]        Filename output prefix [auto] (default '')
  --addgenes          Add 'gene' features for each 'CDS' feature (default OFF)
  --locustag [X]      Locus tag prefix (default 'PROKKA')
  --increment [N]     Locus tag counter increment (default '1')
  --gffver [N]        GFF version (default '3')
  --compliant         Force Genbank/ENA/DDJB compliance: --genes --mincontiglen 200 --centre XXX (default OFF)
  --centre [X]        Sequencing centre ID. (default '')
Organism details:
  --genus [X]         Genus name (default 'Genus')
  --species [X]       Species name (default 'species')
  --strain [X]        Strain name (default 'strain')
  --plasmid [X]       Plasmid name or identifier (default '')
Annotations:
  --kingdom [X]       Annotation mode: Archaea|Bacteria|Mitochondria|Viruses (default 'Bacteria')
  --gcode [N]         Genetic code / Translation table (set if --kingdom is set) (default '0')
  --prodigaltf [X]    Prodigal training file (default '')
  --gram [X]          Gram: -/neg +/pos (default '')
  --usegenus          Use genus-specific BLAST databases (needs --genus) (default OFF)
  --proteins [X]      Fasta file of trusted proteins to first annotate from (default '')
  --hmms [X]          Trusted HMM to first annotate from (default '')
  --metagenome        Improve gene predictions for highly fragmented genomes (default OFF)
  --rawproduct        Do not clean up /product annotation (default OFF)
Computation:
  --fast              Fast mode - skip CDS /product searching (default OFF)
  --cpus [N]          Number of CPUs to use [0=all] (default '8')
  --mincontiglen [N]  Minimum contig size [NCBI needs 200] (default '1')
  --evalue [n.n]      Similarity e-value cut-off (default '1e-06')
  --rfam              Enable searching for ncRNAs with Infernal+Rfam (SLOW!) (default '0')
  --norrna            Don't run rRNA search (default OFF)
  --notrna            Don't run tRNA search (default OFF)
  --rnammer           Prefer RNAmmer over Barrnap for rRNA prediction (default OFF)
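To show how several of these options combine, here is a small sketch of a wrapper that fills in the organism details for one assembly. It uses only flags from the list above. The function name `annotate_genome` and the `PROKKA_BIN` and `CPUS` variables are hypothetical conveniences added for this example.

```shell
#!/usr/bin/env sh
# Annotate one assembly with organism details filled in.
# Arguments: <contigs.fa> <sample_name> <genus> <species>
# PROKKA_BIN (assumption) lets you substitute another executable, e.g. for a dry run.
# CPUS (assumption) overrides the CPU count; 8 matches Prokka's documented default.
annotate_genome() {
    contigs="$1"
    sample="$2"
    genus="$3"
    species="$4"
    "${PROKKA_BIN:-prokka}" \
        --outdir "$sample" --prefix "$sample" \
        --kingdom Bacteria --genus "$genus" --species "$species" \
        --cpus "${CPUS:-8}" \
        "$contigs"
}

# Possible usage:
#   annotate_genome contigs.fa K12_mut Escherichia coli
```

Using the sample name for both `--outdir` and `--prefix` keeps every output file name unique, which helps later when files from many runs are copied into one folder.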

Output files

Extension	Description
.gff	This is the master annotation in GFF3 format, containing both sequences and annotations. It can be viewed directly in Artemis or IGV.
.gbk	This is a standard Genbank file derived from the master .gff. If the input to prokka was a multi-FASTA, then this will be a multi-Genbank, with one record for each sequence.
.fna	Nucleotide FASTA file of the input contig sequences.
.faa	Protein FASTA file of the translated CDS sequences.
.ffn	Nucleotide FASTA file of all the prediction transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA)
.sqn	An ASN1 format "Sequin" file for submission to Genbank. It needs to be edited to set the correct taxonomy, authors, related publication etc.
.fsa	Nucleotide FASTA file of the input contig sequences, used by "tbl2asn" to create the .sqn file. It is mostly the same as the .fna file, but with extra Sequin tags in the sequence description lines.
.tbl	Feature Table file, used by "tbl2asn" to create the .sqn file.
.err	Unacceptable annotations - the NCBI discrepancy report.
.log	Contains all the output that Prokka produced during its run. This is a record of what settings you used, even if the --quiet option was enabled.
.txt	Statistics relating to the annotated features found.
.tsv	Tab-separated file of all features: locus_tag,ftype,len_bp,gene,EC_number,COG,product
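The `.tsv` file is the easiest of these outputs to summarize from the command line. As a sketch, assuming the column order listed above (feature type in column 2, product in column 7) and a single header line, the helpers below count features per type and count CDS entries whose product is "hypothetical protein". Both function names are made up for this example.

```shell
#!/usr/bin/env sh
# Count annotated features per type (column 2, ftype) in a Prokka .tsv file.
# The first line is treated as a header and skipped.
count_feature_types() {
    awk -F '\t' 'NR > 1 { n[$2]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Count CDS features whose product (column 7) is exactly "hypothetical protein".
count_hypothetical() {
    awk -F '\t' 'NR > 1 && $2 == "CDS" && $7 == "hypothetical protein" { c++ } END { print c + 0 }' "$1"
}

# Possible usage:
#   count_feature_types K12_mut/K12_mut.tsv
#   count_hypothetical  K12_mut/K12_mut.tsv
```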

Batch annotation with Prokka

for i in *.fna; do k="${i%.fna}"; prokka "$i" --prefix "$k" --outdir "$k" --cpus 24; done

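Note: check which extension your own FASTA files use (.fna, .fa, .fasta, etc.) and change it in both places in the command: the file pattern and the suffix that is stripped to build the sample name.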
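If a folder mixes several FASTA extensions, editing the one-liner for each case gets tedious. The sketch below is one possible variant that handles `.fna`, `.fa`, and `.fasta` in a single pass and skips genomes whose output folder already exists, since Prokka refuses to write into an existing folder unless `--force` is given. The names `sample_name`, `batch_annotate`, `PROKKA_BIN`, and `CPUS` are assumptions introduced for this example.

```shell
#!/usr/bin/env sh
# Strip a known FASTA extension (.fna, .fa, .fasta) from a file name.
sample_name() {
    case "$1" in
        *.fasta) echo "${1%.fasta}" ;;
        *.fna)   echo "${1%.fna}" ;;
        *.fa)    echo "${1%.fa}" ;;
        *)       echo "$1" ;;
    esac
}

# Annotate every FASTA file in the current directory, one output folder per genome.
# PROKKA_BIN and CPUS are hypothetical overrides (defaults: prokka, 8).
batch_annotate() {
    for f in *.fna *.fa *.fasta; do
        [ -e "$f" ] || continue    # skip patterns that matched no file
        name=$(sample_name "$f")
        if [ -d "$name" ]; then
            echo "skipping $f: output folder $name already exists" >&2
            continue
        fi
        "${PROKKA_BIN:-prokka}" "$f" --prefix "$name" --outdir "$name" --cpus "${CPUS:-8}"
    done
}

# Possible usage, from the folder that holds the assemblies:
#   CPUS=24 batch_annotate
```

Skipping existing folders also makes the loop safe to rerun after an interruption, although a folder left behind by a failed run would need to be deleted first.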

After annotation finishes, collect the result files

mkdir outlet
cp */*.gff outlet

Note: to collect a different output format, change the file extension in the cp command accordingly.
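The same collection step can be wrapped in a small function so the extension becomes an argument instead of something to edit by hand. This is only a sketch: `collect_outputs` is a hypothetical name, and it assumes each genome was annotated into its own subfolder of the current directory with a unique `--prefix`, as in the batch command above.

```shell
#!/usr/bin/env sh
# Copy every Prokka output file with the given extension into one folder.
# Arguments: <extension without the dot> [destination folder, default "outlet"]
collect_outputs() {
    ext="$1"
    dest="${2:-outlet}"
    mkdir -p "$dest"
    for f in */*."$ext"; do
        [ -e "$f" ] || continue                      # pattern matched nothing
        case "$f" in "$dest"/*) continue ;; esac     # skip files already in the destination
        cp "$f" "$dest"/
    done
}

# Possible usage:
#   collect_outputs gff            # copies */*.gff into outlet/
#   collect_outputs faa proteins   # copies */*.faa into proteins/
```

Files with the same name in different subfolders would overwrite each other in the destination, which is why a unique prefix per genome matters here.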