【生物信息】VFDB数据库预测毒力因子

VFDB数据库介绍

官网:VFDB: Virulence Factors of Bacterial Pathogens (mgc.ac.cn)

About VFDB

The virulence factor database (VFDB) is an integrated and comprehensive online resource for curating information about virulence factors of bacterial pathogens. Since its inception in 2004, VFDB has been dedicated to providing up-to-date knowledge of VFs from various medically significant bacterial pathogens.
The motivation for constructing VFDB was twofold:First, to provide in-depth coverage major virulence factors of the best-characterized bacterial pathogens, with the structure features, functions and mechanisms used by these pathogens to allow them to conquer new niches and to circumvent host defense mechanisms, and cause disease. Second, to provide current knowledge of the wide variety of mechanisms used by bacterial pathogens for researchers to elucidate pathogenic mechanisms in bacterial diseases that are not yet well characterized and to develop new rational approaches to the treatment and prevention of infectious diseases.

Definitions

bacterial pathogen is usually defined as any bacterium that has the capacity to cause disease. Its ability to cause disease is called pathogenicity.
Virulence provides a quantitative measure of the pathogenicity or the likelihood of causing disease.
Virulence factors refer to the properties (i.e., gene products) that enable a microorganism to establish itself on or within a host of a particular species and enhance its potential to cause disease. Virulence factors include bacterial toxins, cell surface proteins that mediate bacterial attachment, cell surface carbohydrates and proteins that protect a bacterium, and hydrolytic enzymes that may contribute to the pathogenicity of the bacterium.

About VF category 

Traditional microbiological studies generally rely on laboratory bacterium isolation and cultivation, while recent NGS-based metagenomic analyses have revealed a large number of unculturable bacteria, the majority of which are yet uncharacterized. Thus, the in-depth mining of the panbacterial microbiome data in terms of pathogenesis requires a well-organized reference category of all established bacterial VFs from various known pathogens. However, the independent naming of homologous VFs in different bacteria leads to considerable confusion and hampers follow-up panbacterial analyses. Therefore, a well-defined classification scheme with a unified nomenclature is essential for future efficient data mining of bacterial VFs.
To better organize and present bacterial VFs in the database, the VFDB proposed an individual simplified classification scheme for each bacterial genus based on field conventions since inception in 2004. Nevertheless, the previous classification schemes of different bacteria are generally independent of each other, although they share certain similarities. Since the release of 2021, we introduced a general VF category applicable to various bacterial pathogens in the database and reorganized the VFDB dataset accordingly to make it readily applicable for future panbacterial data mining.
Since the majority of the current classifications of various bacterial VFs have proven very useful and durable for phylogenetic analyses, we have tried to maximally follow the existing conventions based on extensive literature mining. Thus, the newly proposed general classification scheme would be instantly familiar to and readily acceptable by traditional bacteriologists. However, unlike the previous scheme proposed in the 2012 release, which contains only four major bacterial VF categories (i.e., adhesion and invasion, secretion system, toxin, and iron acquisition), the newly established general classification scheme was designed to be a comprehensive system capable of covering all known bacterial VFs.

数据准备

  1. 获取vfdb数据库:Protein sequences of core dataset(VFDB_setA_protein.fas)

  2. 获取基因组序列:(示例数据:GenBank: CP002956.1),进行比对的数据需要为编码蛋白质的氨基酸序列,这里下载的是已经完成开放阅读框预测的氨基酸序列。

实现毒力因子的预测

软件版本:

Linux 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

conda 23.10.0

diamond version 2.1.8

  1. 构建数据库

    conda activate diamond
    diamond makedb --in VFDB_setA_pro.fas --db VFDB_setA_pro
  2. 进行比对注释

    diamond blastp --db VFDB_setA_pro.dmnd --query sequence.txt --evalue 1e-5 --query-cover 90 --subject-cover 90 --range-cover 90 --id 60 --out vf.txt

    注释结果:各列含义qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

  3. 在注释结果中加入毒力因子信息:查看注释结果发现只显示毒力因子id,没有详细的名称类别等信息,我们可以通过把注释结果文件与VFDB_setA_pro.fas进行比对来添加毒力因子的详细信息。

    grep "^>" VFDB_setA_pro.fas | sed 's/^>//g' > VFDB_setA_id.txt
    awk '{print $2}' vf.txt | while read -r line
    do
      grep -F "$line" VFDB_setA_id.txt >> matches.txt
    done
    paste vf.txt matches.txt > vf_detail.txt

    查看添加毒力因子信息后的数据:

  4. 分割毒力因子信息列(原始信息太长,不方便查看)

    主要目的是获取毒力因子VF开头的VFid,比如VF0394,可以通过这个id在VFDB官网上查找关于该毒力因子的详细信息。

    awk -F'\t' '{match($NF, /\[([^\]]+)\]/, a); print $0 FS a[1]}' vf_detail.txt > output.txt
    mv output.txt vf_detail.txt
    awk -F'\t' '{match($NF, /\(([^\)]+)\) -/, a); match($NF, /- ([^-\n]*)$/, b); print $0 FS a[1] FS b[1]}' vf_detail.txt > output.txt
    mv output.txt vf_detail.txt

    最终数据:

  5. 利用最终数据进行后续的统计分析。

  • 24
    点赞
  • 23
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值