ART: Installation and Basic Usage

[TOC]

Environment

Ubuntu 18.10

ART-bin-MountRainier-2016.06.05-Linux64

Installation

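```shell
# Assumes the downloaded tarball has been placed in /opt (writing to /opt usually requires sudo)
cd /opt
tar xzvf artbinmountrainier2016.06.05linux64.tgz
# Append the extracted directory to PATH for future shells, then reload ~/.bashrc
echo "export PATH=\$PATH:/opt/art_bin_MountRainier" >> ~/.bashrc
source ~/.bashrc
# Running the program with no arguments should print its usage message
art_illumina
```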

That completes the installation.
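To double-check the PATH step from a script, here is a minimal sketch. dir_on_path and art_illumina_location are hypothetical helper names, not part of ART. A Python process inherits PATH from the shell that started it, so run the sketch from a shell opened after `source ~/.bashrc`.

```python
import os
import shutil
from typing import Optional

# Install directory used in the steps above.
ART_DIR = "/opt/art_bin_MountRainier"


def dir_on_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the entries of a PATH-style string."""
    wanted = os.path.normpath(directory)
    return any(
        os.path.normpath(entry) == wanted
        for entry in path_value.split(os.pathsep)
        if entry  # ignore empty entries such as those from "::" or a trailing ":"
    )


def art_illumina_location() -> Optional[str]:
    """Return the full path of art_illumina as found on PATH, or None."""
    return shutil.which("art_illumina")


if __name__ == "__main__":
    print("ART directory on PATH:", dir_on_path(ART_DIR, os.environ.get("PATH", "")))
    print("art_illumina found at:", art_illumina_location())
```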

Introduction

Set of Simulation Tools

ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles. ART supports simulation of single-end, paired-end/mate-pair reads of three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can be used to test or benchmark a variety of methods or tools for next-generation sequencing data analysis, including read alignment, de novo assembly, SNP and structural variation discovery. ART was used as a primary tool for the simulation study of the 1000 Genomes Project. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in the FASTQ format, and alignments in the ALN format. ART can also generate alignments in the SAM alignment or UCSC BED file format. ART can be used together with genome variant simulators (e.g. VarSim) for evaluating variant calling tools or methods.

Sometimes we need a simulator to generate synthetic data for testing software. In this post I introduce ART, a popular tool for simulating sequencing data.

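ART was published in Bioinformatics in 2012 and, at the time of writing, had been cited 476 times. It simulates single-end and paired-end/mate-pair reads for the three major commercial next-generation sequencing platforms (Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD). The simulated reads can then be used to benchmark tools for read alignment, de novo assembly, SNP calling and more, so it covers a lot of ground. ART was the primary simulation tool of the 1000 Genomes Project. It is written in C++ and ships with some Perl scripts; its algorithms are optimized and it runs very fast, although it currently does not support multithreading. Output formats include FASTQ, alignments in the ALN format and SAM, and the bundled scripts can convert ALN files to BED format.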

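ART runs on Linux and macOS, and there are also versions for Windows. The official page (https://www.niehs.nih.gov/research/resources/software/biostatistics/art/) links to the latest Linux release. Download it and extract it with tar, and the bundled programs are ready to use.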

Because Illumina data are the most common today, this post only covers ART's Illumina program, art_illumina. A typical run looks like this:

```shell
./art_illumina -ss HS25 -i ./testSeq.fa -o ./paired_end_com -l 150 -f 10 -p -m 500 -s 10 -sam
```
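If you drive ART from a script, you can assemble the same call as an argument list. This is a minimal sketch. build_art_command and run_art are hypothetical helper names, not part of ART, and the defaults simply mirror the command above. Passing a list to subprocess avoids shell quoting problems with unusual file names. The warning reflects the note on -p that ART switches to mate-pair simulation when the mean fragment length is 2000 or more.

```python
import subprocess
import warnings
from typing import List


def build_art_command(reference: str, prefix: str, read_len: int = 150,
                      coverage: float = 10, frag_mean: int = 500,
                      frag_sd: int = 10, system: str = "HS25",
                      paired: bool = True, sam: bool = True,
                      exe: str = "art_illumina") -> List[str]:
    """Build the art_illumina argument list; defaults mirror the command above."""
    cmd = [exe, "-ss", system, "-i", reference, "-o", prefix,
           "-l", str(read_len), "-f", str(coverage)]
    if paired:
        if frag_mean >= 2000:
            # Per the -p description: ART switches to mate-pair simulation here.
            warnings.warn("mean fragment length >= 2000: ART will simulate mate-pair reads")
        cmd += ["-p", "-m", str(frag_mean), "-s", str(frag_sd)]
    if sam:
        cmd.append("-sam")
    return cmd


def run_art(cmd: List[str]) -> str:
    """Run the command and return its standard output; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `print(run_art(build_art_command("./testSeq.fa", "./paired_end_com")))` runs the same simulation as the command above, provided art_illumina is on PATH.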

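./art_illumina is the program to run (once /opt/art_bin_MountRainier is on PATH, plain art_illumina works from any directory)

-i the input reference genome (FASTA)

-o the output prefix; paired_end_com is the prefix shared by all output files

-p simulate paired-end reads; if the mean fragment length given by -m is >= 2000, ART automatically switches to mate-pair simulation

-m the mean fragment length for the paired-end simulation

-s the standard deviation of the fragment length given by -m

-f the fold coverage of the simulated reads, here 10X

-l 150 the length of each simulated read, here 150 bp at each end of a pair

-sam also write a SAM alignment file

-ef also write an error-free SAM file alongside the regular output (same alignment positions, no sequencing errors); add it only if you need it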

-ss the name of the Illumina sequencing system whose built-in profile is used for the simulation. Each Illumina platform has its own code, listed below; HS25 is currently the most common.

GA1 - GenomeAnalyzer I (36bp,44bp)

GA2 - GenomeAnalyzer II (50bp, 75bp)

HS10 - HiSeq 1000 (100bp)

HS20 - HiSeq 2000 (100bp)

HS25 - HiSeq 2500 (125bp, 150bp)

HSXn - HiSeqX PCR free (150bp)

HSXt - HiSeqX TruSeq (150bp)

MinS - MiniSeq TruSeq (50bp)

MSv1 - MiSeq v1 (250bp)

MSv3 - MiSeq v3 (250bp)

NS50 - NextSeq500 v2 (75bp)
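When choosing -ss, it helps to know which profiles list a given read length. The sketch below stores the list above as a small lookup table. BUILTIN_PROFILES and profiles_for_length are hypothetical names. The lengths are only those listed above and do not say what other values ART may accept.

```python
from typing import Dict, List, Tuple

# -ss codes, platform names and read lengths exactly as listed above.
BUILTIN_PROFILES: Dict[str, Tuple[str, Tuple[int, ...]]] = {
    "GA1": ("GenomeAnalyzer I", (36, 44)),
    "GA2": ("GenomeAnalyzer II", (50, 75)),
    "HS10": ("HiSeq 1000", (100,)),
    "HS20": ("HiSeq 2000", (100,)),
    "HS25": ("HiSeq 2500", (125, 150)),
    "HSXn": ("HiSeqX PCR free", (150,)),
    "HSXt": ("HiSeqX TruSeq", (150,)),
    "MinS": ("MiniSeq TruSeq", (50,)),
    "MSv1": ("MiSeq v1", (250,)),
    "MSv3": ("MiSeq v3", (250,)),
    "NS50": ("NextSeq500 v2", (75,)),
}


def profiles_for_length(read_len: int) -> List[str]:
    """Return the -ss codes whose listed read lengths include read_len."""
    return [code for code, (_, lengths) in BUILTIN_PROFILES.items()
            if read_len in lengths]
```

For the 150 bp reads used in this post, `profiles_for_length(150)` returns `["HS25", "HSXn", "HSXt"]`.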

A quick test run

Input: sample.fa, a 20.4 MB simulated reference sequence
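If you don't have a reference file handy, any FASTA file works for a trial run. This minimal sketch writes a uniformly random A/C/G/T sequence. random_fasta and write_random_fasta are hypothetical helpers, and a random sequence is not a realistic genome, but it is enough to exercise art_illumina.

```python
import random


def random_fasta(seq_id: str = "test_chr", length: int = 100_000,
                 line_width: int = 60, seed: int = 1) -> str:
    """Return one FASTA record containing a uniformly random A/C/G/T sequence."""
    rng = random.Random(seed)  # fixed seed -> the same reference every time
    seq = "".join(rng.choices("ACGT", k=length))
    lines = [f">{seq_id}"]
    lines += [seq[i:i + line_width] for i in range(0, length, line_width)]
    return "\n".join(lines) + "\n"


def write_random_fasta(path: str, **kwargs) -> None:
    """Write the record produced by random_fasta(**kwargs) to `path`."""
    with open(path, "w") as fh:
        fh.write(random_fasta(**kwargs))
```

For example, `write_random_fasta("sample.fa", length=20_000_000)` produces a file of roughly the same size as the one used here.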

```shell
art_illumina -ss HS25 -i ./sample.fa -o ./paired_end_com -l 150 -f 10 -p -m 500 -s 10 -sam
```

The console output:

```text
====================ART====================
ART_Illumina (2008-2016)
Q Version 2.5.8 (June 7, 2016)
Contact: Weichun Huang
-------------------------------------------
Paired-end sequencing simulation
Total CPU time used: 0.65
The random seed for the run: 1542454628
Parameters used during run
Read Length: 150
Genome masking 'N' cutoff frequency: 1 in 150
Fold Coverage: 10X
Mean Fragment Length: 500
Standard Deviation: 10
Profile Type: Combined
ID Tag:
Quality Profile(s)
First Read: HiSeq 2500 Length 150 R1 (built-in profile)
First Read: HiSeq 2500 Length 150 R2 (built-in profile)
Output files
FASTQ Sequence Files:
the 1st reads: ./paired_end_com1.fq
the 2nd reads: ./paired_end_com2.fq
ALN Alignment Files:
the 1st reads: ./paired_end_com1.aln
the 2nd reads: ./paired_end_com2.aln
SAM Alignment File:
./paired_end_com.sam
```
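The log reports a read length of 150 and 10X fold coverage. As a back-of-envelope check, coverage is roughly the total number of simulated bases divided by the reference length, and each pair contributes 2 × 150 bases. The helper below does this arithmetic. expected_reads is a hypothetical name, and ART's own rounding and its handling of masked 'N' regions may make the actual count differ slightly.

```python
def expected_reads(genome_len: int, coverage: float, read_len: int,
                   paired: bool = True) -> int:
    """Approximate number of reads (read pairs if paired) for a target fold coverage.

    Uses coverage ~= total simulated bases / genome length, where each
    pair contributes 2 * read_len bases and each single read read_len bases.
    """
    bases_per_unit = 2 * read_len if paired else read_len
    return int(genome_len * coverage / bases_per_unit)
```

For a hypothetical 20,000,000 bp reference, `expected_reads(20_000_000, 10, 150)` gives 666666 read pairs.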

The output files are shown in the screenshot below:

[Figure: art_illumina_result.png]
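To sanity-check the two FASTQ files, you can confirm that they contain the same number of records and that mates line up. This sketch assumes unwrapped 4-line FASTQ records and mate names that differ at most by a trailing /1 or /2. Adjust it if your files look different. fastq_names, count_pairs and count_pairs_in_files are hypothetical helpers.

```python
from itertools import zip_longest
from typing import Iterable, Iterator


def fastq_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield read names from FASTQ text laid out as 4 lines per record."""
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each record
            name = line.rstrip("\n").split()[0].lstrip("@")
            # Drop a trailing /1 or /2 mate suffix so mates compare equal.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def count_pairs(lines1: Iterable[str], lines2: Iterable[str]) -> int:
    """Count read pairs, checking that both inputs list mates in the same order."""
    n = 0
    for a, b in zip_longest(fastq_names(lines1), fastq_names(lines2)):
        if a is None or b is None:
            raise ValueError("FASTQ inputs contain different numbers of reads")
        if a != b:
            raise ValueError(f"mate names differ at pair {n + 1}: {a} vs {b}")
        n += 1
    return n


def count_pairs_in_files(fq1: str, fq2: str) -> int:
    """Open both FASTQ files and count their read pairs."""
    with open(fq1) as f1, open(fq2) as f2:
        return count_pairs(f1, f2)
```

For this run that would be `count_pairs_in_files("./paired_end_com1.fq", "./paired_end_com2.fq")`.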
