[Bioinformatics] Predicting antibiotic resistance genes with the CARD database

1 Introduction to the CARD database

Official website: The Comprehensive Antibiotic Resistance Database (https://card.mcmaster.ca)

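The Comprehensive Antibiotic Resistance Database (CARD) provides data, models and algorithms relating to the molecular basis of antimicrobial resistance (AMR). CARD offers curated reference sequences and SNPs organized by the Antibiotic Resistance Ontology (ARO). These data and models can be downloaded, and genome sequences can be analyzed against them with the Resistance Gene Identifier (RGI), either online or as a standalone tool.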

CARD: Expert-curated collection of molecular sequences and mutations underlying AMR, organized by the Antibiotic Resistance Ontology. 

RGI: Prediction of complete resistome from genomic and metagenomic data. 

Resistomes & Variants: Pre-compiled resistomes, allelic variants, and AMR gene prevalence data for priority pathogens. 

Annotation Services: Have the CARD team annotate your genomic data. 

Hosting Services: Host your own genome sequence collections on the CARD website, private & password-secure, with annotation constantly updated by RGI. 

Bait Capture: Hybridization bait enrichment of AMR alleles for your metagenomic sequencing projects.

2 Data preparation

2.1 Obtaining the genome data

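Example data: CP002956.1. For alignment with diamond blastp, the input must be the amino-acid sequences of the encoded proteins, so here we downloaded the amino-acid sequences of a Yersinia pestis strain whose open reading frames have already been predicted.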
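Because diamond blastp expects protein input, it can be worth checking that the downloaded file really contains amino-acid sequences before aligning. The function below is a minimal sketch built on a simple heuristic of my own (not a DIAMOND feature): if 90% or more of the residues are A, C, G, T or N, the file is treated as probably nucleotide. The function name looks_like_protein and the 90% threshold are illustrative assumptions.

```shell
# Hypothetical helper: exit status 0 = looks like protein,
# 1 = probably nucleotide, 2 = no sequence data found.
# Heuristic assumption: >= 90% A/C/G/T/N residues means nucleotide.
looks_like_protein() {
  awk '
    /^>/ { next }                       # skip FASTA header lines
    {
      seq = toupper($0)
      gsub(/[^A-Z]/, "", seq)           # drop whitespace, "*", "-" and similar
      total += length(seq)
      nuc += gsub(/[ACGTN]/, "", seq)   # gsub returns the number of replacements
    }
    END {
      if (total == 0) exit 2
      if (nuc / total >= 0.9) exit 1
      exit 0
    }
  ' "$1"
}
```

For example, `looks_like_protein sequence.txt || echo "sequence.txt may not be a protein FASTA" >&2` prints a warning before DIAMOND is run on the wrong kind of input.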

2.2 Obtaining the antibiotic resistance gene database

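Download CARD Data from the CARD website.

After extraction, the folder contains several data files, including the model sequence FASTA files, card.json and aro_index.tsv.

Select protein_fasta_protein_homolog_model.fasta as the antibiotic resistance gene (ARG) database.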

The official website explains the different models as follows:

The Comprehensive Antibiotic Resistance Database uses bioinformatic models for the detection of molecular determinants. For example, a Protein Homolog Model (PHM) can contain sequences of antimicrobial resistance genes that do not include mutation as a determinant of resistance, whereas a Protein Variant Model (PVM) will contain reference wild-type sequences used for mapping mutations conferring antimicrobial resistance. The Comprehensive Antibiotic Resistance Database additionally uses meta-models for the detection of combinations of individual molecular determinants. For example, efflux pump systems consist of multiple subunits and regulators that are detected together using the Efflux Pump System Meta-Model (EPS). In CARD, detection models are applied to the detection of antimicrobial resistance elements, but they are broadly applicable to other systems as well, with modifications.
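To get a feel for how many reference sequences each detection model contributes, the records in the extracted FASTA files can be counted. This is a small sketch; the helper name count_fasta_records is hypothetical, and it simply counts header lines that start with ">".

```shell
# Hypothetical helper: print "<file><TAB><number of FASTA records>" per file.
count_fasta_records() {
  for f in "$@"; do
    # grep -c prints 0 (and exits 1) when a file has no headers; the count is still printed
    printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
  done
}
```

Assuming the protein model files share the protein_fasta_ prefix seen in protein_fasta_protein_homolog_model.fasta, `count_fasta_records protein_fasta_*.fasta` lists the size of each one.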

3 ARG prediction

Software versions:

Linux 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

conda 23.10.0

diamond version 2.1.8

rgi main version 6.0.3

ARGs can be predicted either with a conventional alignment tool such as DIAMOND, or with rgi, the tool CARD provides specifically for ARG prediction.

3.1 Using DIAMOND

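# Build a DIAMOND database (homolog.dmnd) from the CARD protein homolog model sequences
diamond makedb --in protein_fasta_protein_homolog_model.fasta --db homolog
# Align the query proteins: E-value <= 1e-5, >= 90% query and subject coverage, >= 60% identity.
# args.txt is written in DIAMOND's default tabular format (BLAST outfmt 6, 12 columns)
diamond blastp --db homolog.dmnd --query sequence.txt --evalue 1e-5 --query-cover 90 --subject-cover 90 --range-cover 90 --id 60 --out args.txt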
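With default settings DIAMOND may report several hits per query (up to 25 targets), so args.txt can contain more than one line per protein. The sketch below keeps only the highest-bitscore hit per query, assuming args.txt is in the default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name best_hit_per_query is hypothetical.

```shell
# Hypothetical helper: keep the best hit (highest bitscore, column 12) per query (column 1).
best_hit_per_query() {
  # Sort by query ID, then by bitscore from high to low,
  # and print only the first line seen for each query.
  sort -t "$(printf '\t')" -k1,1 -k12,12nr "$1" | awk -F '\t' '!seen[$1]++'
}
```

Usage: `best_hit_per_query args.txt > args.best.txt`. Passing `-k 1` (`--max-target-seqs 1`) to diamond blastp is an alternative; filtering afterwards keeps the full hit list available for inspection.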

Partial results:

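ARO: Antibiotic Resistance Ontology. Using a hit's ARO accession, you can look up detailed information about that resistance gene in aro_index.tsv, such as its drug class and resistance mechanism.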
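This lookup can be automated. The sketch below assumes that DIAMOND's sseqid keeps the CARD FASTA header up to the first space, i.e. a pipe-separated string containing a field of the form ARO:&lt;accession&gt;, and that aro_index.tsv has columns named ARO Accession, Drug Class and Resistance Mechanism. Columns are located by header name rather than position, so check those names against your CARD release. The function name annotate_with_aro is hypothetical.

```shell
# Hypothetical helper: print query, ARO accession, percent identity,
# drug class and resistance mechanism for every DIAMOND hit.
# $1: DIAMOND hits (default 12-column tabular output, e.g. args.txt)
# $2: aro_index.tsv from the CARD data download
annotate_with_aro() {
  awk -F '\t' -v OFS='\t' '
    NR == FNR {                               # first file: aro_index.tsv
      if (FNR == 1) {                         # header: map column names to numbers
        for (i = 1; i <= NF; i++) col[$i] = i
        if (!("ARO Accession" in col) || !("Drug Class" in col) || !("Resistance Mechanism" in col)) {
          print "unexpected aro_index.tsv header" > "/dev/stderr"
          exit 1
        }
        next
      }
      key = $(col["ARO Accession"])
      drug[key] = $(col["Drug Class"])
      mech[key] = $(col["Resistance Mechanism"])
      next
    }
    {                                         # second file: DIAMOND hits
      aro = "NA"
      n = split($2, f, "|")                   # assumed sseqid shape: gb|accession|ARO:...|name
      for (i = 1; i <= n; i++) if (f[i] ~ /^ARO:/) aro = f[i]
      print $1, aro, $3, ((aro in drug) ? drug[aro] : "NA"), ((aro in mech) ? mech[aro] : "NA")
    }
  ' "$2" "$1"
}
```

Usage: `annotate_with_aro args.txt aro_index.tsv > args.annotated.tsv`. Hits whose ARO accession is missing from the index are kept and marked NA rather than dropped.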

3.2 Using rgi

rgi:https://github.com/arpcard/rgi?tab=readme-ov-file

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

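RGI offers three prediction criteria: Perfect (an exact match to a CARD reference sequence), Strict (a match that passes the curated bitscore cut-off of a CARD detection model) and Loose (a match below that cut-off, covering distant homologs as well as partial or spurious hits). Choosing a stricter or looser homology criterion yields candidate resistance genes of different confidence and in different numbers; the Loose criterion can help discover new resistance genes, at the cost of more false positives.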

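# Install rgi into a new conda environment (rgi is distributed through bioconda)
conda create --name rgi --channel conda-forge --channel bioconda --channel defaults rgi
conda activate rgi
# Load the local CARD database (card.json from the CARD Data download)
rgi load --card_json card.json --local
# Run the prediction on protein input; --include_loose also reports Loose hits.
# Results are written to rgiargs.txt (tab-delimited) and rgiargs.json
rgi main --input_sequence sequence.fasta --output_file rgiargs --local --clean --include_loose -t protein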
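To see how the hits in the tab-delimited report (rgiargs.txt here) split across the Perfect, Strict and Loose criteria, rows can be tallied by their Cut_Off value. This is a sketch that assumes the report's header contains a column named Cut_Off; the function name count_by_cutoff is hypothetical.

```shell
# Hypothetical helper: count RGI report rows per value of the "Cut_Off" column.
# $1: RGI tab-delimited report (e.g. rgiargs.txt)
count_by_cutoff() {
  awk -F '\t' '
    FNR == 1 {                                # header: find the Cut_Off column by name
      for (i = 1; i <= NF; i++) if ($i == "Cut_Off") c = i
      if (!c) { print "no Cut_Off column found" > "/dev/stderr"; exit 1 }
      next
    }
    { n[$c]++ }                               # tally rows per criterion
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | sort
}
```

Usage: `count_by_cutoff rgiargs.txt` prints one line per criterion with its number of hits.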

Partial results:

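rgi's output is quite detailed. With --include_loose, Loose hits are reported alongside Perfect and Strict hits (see the Cut_Off column), so the results can be filtered further by percent identity (the Best_Identities column).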
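A minimal sketch of such an identity filter, assuming the report has a Best_Identities column holding percent identity. The function name filter_by_identity and the 80% cut-off in the usage example are illustrative, not recommended values.

```shell
# Hypothetical helper: print the header plus rows with Best_Identities >= a threshold.
# $1: RGI tab-delimited report (e.g. rgiargs.txt); $2: minimum percent identity
filter_by_identity() {
  awk -F '\t' -v min="$2" '
    FNR == 1 {                                # header: find the identity column by name
      for (i = 1; i <= NF; i++) if ($i == "Best_Identities") c = i
      if (!c) { print "no Best_Identities column found" > "/dev/stderr"; exit 1 }
      print
      next
    }
    ($c + 0) >= (min + 0)                     # numeric comparison; matching rows are printed
  ' "$1"
}
```

Usage: `filter_by_identity rgiargs.txt 80 > rgiargs.id80.txt`. The header row is kept so the filtered file can still be opened as a table.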
