star cd linux: the STAR aligner [updated]

Latest recommended article published 2024-07-05 19:16:52

BLACK.VOW

STAR download:

https://github.com/alexdobin/STAR

Advantages of STAR:

1. It is fast.

2. It is recommended for RNA-seq data.

Where STAR commonly appears:

10x Cell Ranger

RNA-seq data

Single-cell data

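Download and install (unpack a release tarball from the GitHub page above and build inside the source directory, which is where the STAR binary ends up):

tar -xzf 2.5.3a.tar.gz
cd STAR-2.5.3a/source
make STAR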

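Step 1: build the index

Every aligner needs an index of the reference before it can align reads. The index exists to cut alignment time and algorithmic complexity (it is inherent to how the algorithm works).

(1) Use a prebuilt index

The 10x Genomics reference data ships with prebuilt index files. You can download them from the official site and use them directly, but only for the matching reference.

For example, if you downloaded refdata-cellranger-GRCh38-3.0.0,

it already includes the index files STAR needs for alignment, in the star directory of the reference. Cell Ranger also bundles its own STAR binary.

(2) Build your own:

Two files are needed, genome.fa and a GTF annotation file. The command is: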

/cygene/work/STAR-2.5.3a/source/STAR \
    --runThreadN 20 \
    --runMode genomeGenerate \
    --genomeDir ./ \
    --genomeFastaFiles /home/dushiyi/database/refdata-cellranger-GRCh38-1.2.0/fasta/genome.fa \
    --sjdbGTFfile /home/dushiyi/database/refdata-cellranger-GRCh38-1.2.0/genes/genes.gtf

$ sh work.sh
Aug 28 09:16:14 ..... started STAR run
Aug 28 09:16:14 ... starting to generate Genome files
Aug 28 09:17:10 ... starting to sort Suffix Array. This may take a long time...
Aug 28 09:17:24 ... sorting Suffix Array chunks and saving them to disk...
Aug 28 09:58:43 ... loading chunks from disk, packing SA...
Aug 28 10:01:53 ... finished generating suffix array
Aug 28 10:01:53 ... generating Suffix Array index
Aug 28 10:04:37 ... completed Suffix Array index
Aug 28 10:04:37 ..... processing annotations GTF
Aug 28 10:04:50 ..... inserting junctions into the genome indices
Aug 28 10:07:26 ... writing Genome to disk ...
Aug 28 10:07:28 ... writing Suffix Array to disk ...
Aug 28 10:09:05 ... writing SAindex to disk
Aug 28 10:09:22 ..... finished successfully

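With 20 threads the run took about 53 minutes going by the log timestamps (09:16:14 to 10:09:22) and used about 30 GB of memory.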
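Because index generation is slow, it can be worth checking that a directory holds a finished index before starting an alignment. The sketch below is only a sanity check under one assumption: that the three files the log reports writing (Genome, SA, SAindex) are present and non-empty. A real index contains more files than these, and the function name index_ready is made up for this example.

```shell
# Sanity-check a STAR index directory before aligning.
# Assumption: checking Genome, SA and SAindex (the files named in the
# genomeGenerate log) is enough to catch an interrupted build.
index_ready() {
    local dir=$1 f
    for f in Genome SA SAindex; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Usage (the path is whatever was passed to --genomeDir):
#   index_ready ./ && echo "index looks complete"
```

The check does not confirm that the index was built by a compatible STAR version; it only catches builds that stopped partway.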

Alignment:

The simplest set of alignment parameters:

STAR --runThreadN $CPU --genomeDir $index_dir --readFilesIn [PE_1.fq] [PE_2.fq] --outFileNamePrefix [prefix.] --outSAMtype BAM SortedByCoordinate

The output is a BAM file.
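When the same command is run for many samples, it can help to print it for review before executing it. The following is a minimal sketch, not part of STAR: star_cmd and its argument order are invented for this example, and it only assembles the options shown above, adding --readFilesCommand zcat when the first input ends in .gz.

```shell
# Print (do not run) a paired-end STAR command for review.
# star_cmd is a hypothetical helper; only the STAR options come from the post.
star_cmd() {
    threads=$1
    index_dir=$2
    r1=$3
    r2=$4
    prefix=$5
    extra=""
    # Compressed input needs a decompression command.
    case "$r1" in
        *.gz) extra=" --readFilesCommand zcat" ;;
    esac
    printf 'STAR --runThreadN %s --genomeDir %s --readFilesIn %s %s%s --outFileNamePrefix %s --outSAMtype BAM SortedByCoordinate\n' \
        "$threads" "$index_dir" "$r1" "$r2" "$extra" "$prefix"
}

# Usage (hypothetical file names):
#   star_cmd 20 "$index_dir" s1_R1.fq.gz s1_R2.fq.gz s1.
```

The printed line is not quoted for the shell, so paths containing spaces would need extra care before pasting it back into a terminal.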

Example 1 (several single-end files passed as one comma-separated list):

STAR --runThreadN 20 --genomeDir $star_index \
    --readFilesCommand zcat \
    --outSAMtype BAM Unsorted \
    --readFilesIn sample1.fastq.gz.tagged.fastq.gz,sample2.fastq.gz.tagged.fastq.gz,sample3.fastq.gz.tagged.fastq.gz,sample4.fastq.gz.tagged.fastq.gz \
    --outFileNamePrefix L006

Example 2:

/cygene/work/STAR-2.5.3a/source/STAR \
    --runThreadN 20 \
    --genomeDir /cygene/work/02.dropEst/star \
    --readFilesCommand zcat \
    --outSAMtype BAM Unsorted \
    --readFilesIn /cygene/work/02.dropEst/01_dropTag/sample1.fastq.gz.tagged.fastq.gz

Example 3: making STAR write out the unmapped reads

STAR-2.7.6a/bin/Linux_x86_64/STAR --runThreadN 10 --genomeDir /path/to/database/mm10/STAR-2.7.6a --readFilesCommand zcat --readFilesIn myfile1_1.fq.gz myfile2_2.fq.gz --outFileNamePrefix myfile_prefix. --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx --outSAMattributes All

Note that --outSAMattributes All writes every tag, for example: NH:i:1 HI:i:1 AS:i:202 nM:i:3 NM:i:2 MD:Z:57T14C32 jM:B:c,-1 jI:B:i,-1 MC:Z:45S105M

By default (without this option) only NH:i:1 HI:i:1 AS:i:292 nM:i:3 are written, which corresponds to the Standard setting. You can also list just the attributes you want. This matters because some downstream tools require NM in the BAM file to compute their statistics.
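To see whether an alignment record carries a tag that a downstream tool needs, the optional fields can be scanned directly. This is a rough sketch: has_tag is a made-up helper, and it relies only on the SAM layout in which optional fields start at column 12 and have the form TAG:TYPE:VALUE. The match is case-sensitive, so nM does not count as NM.

```shell
# Print "yes" or "no" for each SAM record on stdin, depending on whether
# it carries the given optional tag. has_tag is a hypothetical helper.
has_tag() {
    awk -F '\t' -v tag="$1" '
        {
            found = 0
            # Optional fields begin at column 12: TAG:TYPE:VALUE
            for (i = 12; i <= NF; i++)
                if (substr($i, 1, 3) == tag ":") found = 1
            print (found ? "yes" : "no")
        }'
}

# Usage, assuming samtools is installed (it is not mentioned in the post):
#   samtools view myfile_prefix.Aligned.sortedByCoord.out.bam | head -n 1 | has_tag NM
```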

A collection of errors encountered while using STAR, with fixes.

STAR error 1:

STAR Segmentation Fault

$ STAR --runThreadN 10 \
    --genomeDir refdata-cellranger-GRCh38-1.2.0/star \
    --readFilesCommand zcat \
    --readFilesIn /my/data/G88E3L2_R1.fq.gz /my/data/G88E3L2_R2.fq.gz \
    --outFileNamePrefix mysamplename. \
    --outSAMtype BAM SortedByCoordinate
Mar 27 11:00:11 ..... started STAR run
Mar 27 11:00:11 ..... loading genome
Mar 27 11:00:19 ..... started mapping
Segmentation fault (core dumped)

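Attempted fixes:

1. Leaving --runThreadN unset: still fails.

2. Setting --outSAMtype BAM Unsorted: still fails.

3. Setting --genomeLoad LoadAndRemove --limitBAMsortRAM 10000000000: still fails.

4. Setting --outSAMtype SAM: still fails.

5. Checking whether the read1 and read2 files are the same size: they are.

6. Switching to a different STAR version: with the STAR bundled with Cell Ranger the error no longer occurs. Since the index came from the Cell Ranger reference data, a mismatch between the standalone STAR version and the version that built the index is a plausible cause.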

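STAR error 2:

STAR Error: the read ID should start with @ or >

This error usually means the input FASTQ files are compressed, so STAR is reading compressed bytes as if they were plain text.

Fix: tell STAR how to read the files:

--readFilesCommand zcat

or

--readFilesCommand "gunzip -c"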
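File extensions are not always reliable, so one option is to look at the file content instead. The sketch below checks for the two gzip magic bytes (1f 8b) at the start of the file. read_files_command is a hypothetical helper, and printing an empty string for uncompressed input is a choice made for this example.

```shell
# Print "zcat" if the file starts with the gzip magic bytes, else nothing.
# read_files_command is a hypothetical helper, not a STAR feature.
read_files_command() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "zcat"
    else
        echo ""
    fi
}

# Usage (hypothetical file name):
#   cmd=$(read_files_command sample_R1.fastq.gz)
#   [ -n "$cmd" ] && echo "add: --readFilesCommand $cmd"
```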

STAR error 3:

FATAL ERROR, number of bytes expected from the BAM bin does not agree with the actual size on disk

Fix:

Set --outSAMtype SAM

or update to the latest version.

STAR error 4:

$ STAR --version
2.7.3a
$ STAR --runThreadN 70 \
    --genomeDir /hg19_star2.7_index \
    --readFilesCommand zcat \
    --readFilesIn XXX_R1.fastq.gz XXX_R2.fastq.gz \
    --outFileNamePrefix samplename. \
    --outSAMtype BAM SortedByCoordinate
Apr 10 22:15:50 ..... started STAR run
Apr 10 22:15:50 ..... loading genome
Apr 10 22:16:35 ..... started mapping
BAMoutput.cpp:27:BAMoutput: exiting because of *OUTPUT FILE* error: could not create output file GB001._STARtmp//BAMsort/20/16
SOLUTION: check that the path exists and you have write permission for this file. Also check ulimit -n and increase it to allow more open files.
Apr 10 22:16:37 ...... FATAL ERROR, exiting

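Fix: set --runThreadN to 20, or update to the latest version. The cause was not tracked down, but the SOLUTION line in the error points at the open-file limit (ulimit -n): coordinate sorting writes temporary files per thread and per sorting bin (the failing path is BAMsort/20/16), so a high thread count can plausibly exhaust that limit.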
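The SOLUTION line in the error suggests checking ulimit -n, and a rough estimate can be made before launching a run. The sketch below rests on an assumption that the post does not confirm: that coordinate sorting keeps about threads x bins temporary files open, with 50 bins (the default of --outBAMsortingBinsN in recent STAR versions). enough_file_handles is a made-up name.

```shell
# Estimate whether BAM sorting might exceed the open-file limit.
# Assumption: roughly threads * bins files are open at once, bins = 50.
enough_file_handles() {
    threads=$1
    limit=$2
    bins=${3:-50}
    needed=$((threads * bins))
    echo "about $needed files needed, limit $limit"
    # "unlimited" always passes; otherwise compare the numbers.
    [ "$limit" = "unlimited" ] || [ "$needed" -le "$limit" ]
}

# Usage:
#   enough_file_handles 70 "$(ulimit -n)" || echo "lower --runThreadN or raise ulimit -n"
```

Under this estimate, 70 threads would need about 3500 files while 20 threads would need about 1000, which would fit a limit of 1024 and is consistent with the fix that worked here.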

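STAR error 5:

(The screenshot of this error was not saved; it will be added if the error comes up again.)

Fix:

Check that read1 and read2 are consistent. Mismatched paired-end reads are the most likely cause.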
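One way to check consistency is to compare the number of reads in the two files. This is a minimal sketch that assumes standard four-line FASTQ records (no wrapped sequence lines); count_reads and pairs_match are names invented for this example. Equal counts do not prove the read IDs are in the same order, so it catches truncated or mismatched files only.

```shell
# Count reads in a FASTQ file (plain or gzip-compressed): lines / 4.
count_reads() {
    case "$1" in
        *.gz) lines=$(gzip -cd "$1" | wc -l | tr -d ' ') ;;
        *)    lines=$(wc -l < "$1" | tr -d ' ') ;;
    esac
    echo $((lines / 4))
}

# Succeed only when both mate files hold the same number of reads.
pairs_match() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1=$n1 R2=$n2"
    [ "$n1" -eq "$n2" ]
}

# Usage (hypothetical file names):
#   pairs_match sample_R1.fq.gz sample_R2.fq.gz || echo "read counts differ"
```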

STAR error 6:

EXITING because of fatal ERROR: not enough memory for BAM sorting

Fix:

Following the error message, add --limitBAMsortRAM 36949420170. Setting a somewhat larger value is also fine.

Summary: the latest version has fewer bugs.

Revised and extended 2021-03-01

Revised and extended 2020-04-22
